An LLM evaluation project focused on planning.
PlanBench: An Extensible Benchmark for Evaluating Large Language...