GPT is getting better at math

Nevertheless, at least one university academic would anticipate a higher level of thoroughness from their doctoral candidates.


A recent study by Epoch AI, a non-profit research institute, indicates that OpenAI’s GPT-5.2 Pro demonstrates enhanced capability in tackling complex mathematical challenges compared to prior iterations of the company’s leading large language model.

According to Epoch, GPT-5.2 Pro solved four problems that no other AI model had managed to crack, as well as 11 of the 13 problems that other models had previously solved.

Together, those 15 solved problems amount to 31% of Epoch AI’s challenges, up from the previous top score of 19%.

AI has long struggled with mathematical problems. Some researchers theorize that the difficulty stems from AI systems’ inability to recognize their own limits, whereas others suggest the core problem is that these models are built primarily to process language rather than numbers, which leads to occasional errors.

The Epoch AI experiment provides evidence that AI is progressively improving its proficiency in handling more intricate mathematical challenges. During the assessment, GPT-5.2 Pro encountered problems spanning diverse mathematical fields.

One of the problems successfully tackled by GPT-5.2 Pro was provided by Joel Hass, a mathematics professor at the University of California, Davis. He expressed to Epoch AI his admiration for how the model deciphered his topological puzzle, stating, “GPT-5.2 Pro resolved the problem with accurate logic. Remarkably, it could identify the precise geometry of a surface as specified by a polynomial in the problem description.”

Another challenge came from Ken Ono, a number theorist at the University of Virginia. He remarked that the AI model had “grasped the fundamental theoretical approach and performed the required calculations” to arrive at the solution. However, he qualified his praise by noting, “Were this a PhD student, I would assign a score of merely 6/10 for thoroughness because of absent specifics.”
