Discussion about this post

Bayesian

I think the second argument is neat and convincing. −1 points for giving the reader an exercise 😠. But yeah, basically: for any harder verifiable task, an implicit subcapability that will be verifiably trained is long-term planning, because when the problem is hard you need to make plans and break it down into parts. I would further guess that part of what makes current models not great at this is that the math environments with verifiable rewards usually consist of pretty short problems, so there's possibly much more long-term-planning progress to be made. Then the reasoning traces naturally learn to make long-term plans, which in a way is what 'asking the right questions' means in math? Not entirely, but it's a big part of it. I don't really know about the step after that, and clearly you don't either…
