An LLM evaluation project focused on planning.

Papers

PlanBench: An Extensible Benchmark for Evaluating Large Language...