Summary
They note that LLMs have trouble planning, but OpenAI's o1 (Strawberry) does a good job.
They do not offer much of a theory for why o1 would behave this way.
Talks about PlanBench.
Notes
Did it Work?
Questions
[ ]