“The model is definitely better at solving the AP math test than I am, and I was a math minor in college,” OpenAI’s chief research officer, Bob McGrew, tells me. He says OpenAI also tested o1 against a qualifying exam for the International Mathematics Olympiad, and while GPT-4o correctly solved only 13 percent of problems, o1 scored 83 percent.
That's still unreliable enough that I wouldn't trust it to actually do anything. If it scoured its database for a trigonometry textbook and cited a solution to a problem that was as correct as any web calculator, cool. That'd be as useful as Google was in 2010. But 83% is the kind of score I get on advanced mathematics tests when I have no idea what I'm doing but half-remember the basic steps to get an answer.