Yeah deepmind had good results with IMO problems, but only geometry problems. They scored almost at the level of gold medalist. That's only a fraction of IMO problems, though. They did it by combining a formal verification system with a LLM to propose solution paths, and then doing some tree search I think.
This is one way to improve large AI systems and will probably be incorporated in some way in the future, for example by integrating with a language like lean (for math proofs).
They will also be improved by combining with tool use like calculators, code interpreters, web search, calendars, etc. This is already starting to happen to some extent.
LLMs by themselves, at least with current architectures using transformers, are not great at reasoning (counting, arithmetic, symbolic reasoning)
Yeah deepmind had good results with IMO problems, but only geometry problems. They scored almost at the level of gold medalist. That's only a fraction of IMO problems, though. They did it by combining a formal verification system with a LLM to propose solution paths, and then doing some tree search I think.
This is one way to improve large AI systems and will probably be incorporated in some way in the future, for example by integrating with a language like lean (for math proofs).
They will also be improved by combining with tool use like calculators, code interpreters, web search, calendars, etc. This is already starting to happen to some extent.
LLMs by themselves, at least with current architectures using transformers, are not great at reasoning (counting, arithmetic, symbolic reasoning)