Summary

The authors note that LLMs have trouble planning, but OpenAI's o1 ("Strawberry") does a good job
The authors offer little theory about why o1 would behave this way
Discusses PlanBench, a benchmark for LLM planning

Notes

Did it Work?

Questions

[ ]