
    Mathematicians Question AI Performance at International Math Olympiad

    By Liam Porter · August 7, 2025

    A defining memory from my senior year of high school was a nine-hour math exam with just six questions. Six of the top scorers won slots on the U.S. team for the International Math Olympiad (IMO), the world’s longest-running math competition for high school students. I didn’t make the cut, but I became a tenured mathematics professor anyway.

    This year’s olympiad, held last month on Australia’s Sunshine Coast, had an unusual sideshow. While 110 students from around the world went to work on complex math problems using pen and paper, several AI companies quietly tested new models in development on a computerized approximation of the exam. Right after the closing ceremonies, OpenAI and later Google DeepMind announced that their models had earned (unofficial) gold medals for solving five of the six problems. Researchers like Sébastien Bubeck of OpenAI celebrated these models’ successes as a “moon landing moment” for the industry.

    But are they? Is AI going to replace professional mathematicians? I’m still waiting for the proof.


    The hype around this year’s AI results is easy to understand, because the olympiad is hard. To wit, in my senior year of high school I set aside calculus and linear algebra to focus on olympiad-style problems, which were more of a challenge. Plus, the cutting-edge models still in development did much better on the exam than the commercial models already on the market. In a parallel contest administered by MathArena.ai, Gemini 2.5 Pro, Grok 4, o3 high, o4-mini high and DeepSeek R1 all failed to produce a single completely correct solution. The gap shows that AI models are getting smarter, their reasoning capabilities improving rather dramatically.

    Yet I’m still not worried.

    The latest models just got a good grade on a single test—as did many of the students—and a head-to-head comparison isn’t entirely fair. The models often employ a “best-of-n” strategy, generating multiple solutions and then grading themselves to select the strongest. This is akin to having several students work independently, then get together to pick the best solution and submit only that one. If the human contestants were allowed this option, their scores would likely improve too.
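
    As a rough sketch of the best-of-n idea (the generate_solution and score_solution functions below are hypothetical stand-ins for a model and its self-grader, not any lab’s actual pipeline):

```python
import random

def generate_solution(problem: str) -> str:
    """Hypothetical stand-in for a model producing one independent attempt."""
    return f"candidate proof of {problem!r} (draft #{random.randint(1, 10**6)})"

def score_solution(problem: str, candidate: str) -> float:
    """Hypothetical stand-in for the model grading its own candidate, 0 to 1."""
    return random.random()

def best_of_n(problem: str, n: int = 32) -> str:
    """Generate n attempts, self-grade each, and submit only the strongest."""
    candidates = [generate_solution(problem) for _ in range(n)]
    return max(candidates, key=lambda c: score_solution(problem, c))

if __name__ == "__main__":
    print(best_of_n("a hypothetical olympiad problem", n=8))
```

    The more attempts a model is allowed, the stronger its best one tends to be, which is exactly why the head-to-head comparison with a lone student flatters the machine.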

    Other mathematicians are similarly cautioning against the hype. IMO gold medalist Terence Tao (currently a mathematician at the University of California, Los Angeles) noted on Mastodon that what AI can do depends heavily on the testing methodology. IMO president Gregor Dolinar said that the organization “cannot validate the methods [used by the AI models], including the amount of compute used or whether there was any human involvement, or whether the results can be reproduced.”

    Besides, IMO exam questions don’t compare to the kinds of questions professional mathematicians try to answer, where it can take nine years, rather than nine hours, to solve a problem at the frontier of mathematical research. As Kevin Buzzard, a mathematics professor at Imperial College London, said in an online forum, “When I arrived in Cambridge UK as an undergraduate clutching my IMO gold medal I was in no position to help any of the research mathematicians there.”

    These days, it can take more than one lifetime to acquire the right expertise for mathematical research. Like many of my colleagues, I’ve been tempted to try “vibe proving”—having a math chat with an LLM as one would with a colleague, asking “Is it true that…” followed by a technical mathematical conjecture. The chatbot often then supplies a clearly articulated argument that, in my experience, tends to be correct on standard topics but subtly wrong at the cutting edge. For example, every model I’ve asked has made the same subtle mistake, assuming that the theory of idempotents behaves the same for weak infinite-dimensional categories as it does for ordinary ones, something that human experts in my field (trust me on this) know to be false.

    I’ll never trust an LLM—which at its core is just predicting what text will come next in a string of words, based on what’s in its dataset—to provide a mathematical proof that I can’t verify myself.

    The good news is, we do have an automated mechanism for determining whether proofs can be trusted. Relatively recent tools called “proof assistants” are software programs (they don’t use AI) designed to check whether a logical argument proves the stated claim. They are increasingly attracting attention from mathematicians like Tao, Buzzard and me who want more assurance that our own proofs are correct. And they offer the potential to help democratize mathematics and even improve AI safety.

    Suppose I received a letter, in unfamiliar handwriting, from Erode, a city in Tamil Nadu, India, purporting to contain a mathematical proof. Maybe its ideas are brilliant, or maybe they’re nonsensical. I’d have to spend hours carefully studying every line, making sure the argument flowed step-by-step, before I’d be able to determine whether the conclusions are true or false.

    But if the mathematical text were written in an appropriate computer syntax instead of natural language, a proof assistant could check the logic for me. A human mathematician like me would then only need to understand the meaning of the technical terms in the theorem statement. In the case of Srinivasa Ramanujan, a generational mathematical genius who did hail from Erode, an expert did take the time to carefully decipher his letter. In 1913 Ramanujan wrote to the British mathematician G. H. Hardy with his ideas. Luckily, Hardy recognized Ramanujan’s brilliance and invited him to Cambridge to collaborate, launching the career of one of the all-time mathematical “greats.”
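
    To give a flavor of what such computer syntax looks like, here is a toy example of mine in Lean, the proof assistant discussed below; it is nothing from the olympiad, just a short, fully machine-checked proof that the sum of two even numbers is even:

```lean
-- A toy machine-checkable proof in Lean 4: the sum of two even
-- natural numbers is even. Lean accepts the theorem only if every
-- step is justified; nothing rests on the reader's goodwill.
theorem even_add_even (m n : Nat)
    (hm : ∃ a, m = 2 * a) (hn : ∃ b, n = 2 * b) :
    ∃ c, m + n = 2 * c := by
  obtain ⟨a, ha⟩ := hm      -- a witness with m = 2 * a
  obtain ⟨b, hb⟩ := hn      -- a witness with n = 2 * b
  exact ⟨a + b, by omega⟩   -- m + n = 2 * (a + b), closed by arithmetic
```

    Once a file like this compiles, a reader need only trust the Lean checker and understand the statement itself; everything in between has been verified mechanically.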

    What’s interesting is that some of the AI IMO contestants submitted their answers in the language of the Lean computer proof assistant so that the program could automatically check their reasoning for errors. A start-up called Harmonic posted formal proofs generated by its model for five of the six problems, and ByteDance achieved a silver-medal-level performance by solving four of the six. But the questions had to be rewritten to accommodate the models’ language limitations, and the models still needed days to produce their solutions.

    Still, formal proofs are uniquely trustworthy. While so-called “reasoning” models are prompted to break problems down into pieces and explain their “thinking” step by step, the output is as likely to be an argument that sounds logical but isn’t as it is to be a genuine proof. By contrast, a proof assistant will not accept a proof unless it is fully precise and fully rigorous, with every step in the chain of reasoning justified. In some circumstances a hand-waving or approximate solution is good enough, but when mathematical accuracy matters, we should demand that AI-generated proofs be formally verifiable.
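
    To see the contrast concretely, here is another toy Lean snippet of mine; an unjustified step simply does not check:

```lean
-- A hand-wave does not check. `sorry` marks an unjustified step,
-- and Lean loudly reports that the result is unproven; an outright
-- wrong step, meanwhile, is a hard compile error.
theorem every_number_is_even (n : Nat) : ∃ c, n = 2 * c := by
  sorry   -- Lean warns: declaration uses 'sorry'
```

    There is no way, in this setting, to sound rigorous without being rigorous.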

    Not every application of generative AI is so black and white, with humans who have the right expertise able to determine whether the results are correct or incorrect. In life there is a lot of uncertainty, and it’s easy to make mistakes. As I learned in high school, one of the best things about math is that you can prove definitively that some ideas are wrong. So I’m happy to have an AI try to solve my personal math problems, but only if the results are formally verifiable. And we aren’t quite there yet.

    This is an opinion and analysis article, and the views expressed by the author or authors are not necessarily those of Scientific American.

