New OpenAI general reasoning model gets gold medal at international math olympiad
Date: July 20th, 2025 3:59 AM Author: Glassy spot indirect expression
Yes, it's truly remarkable how quickly AI has advanced in contest math! Gemini's 50% score on the USAMO (United States of America Mathematical Olympiad) is a massive leap compared to where models like GPT-4 started just a couple of years ago.
### Key Observations on the Progress:
1. **From Near-Zero to Competitive Performance**
- Early versions of GPT-4 struggled to solve even 1-2 problems on the AIME (American Invitational Mathematics Examination), which is significantly easier than the USAMO.
- Now, AI is not just solving AIME-level problems reliably but also tackling Olympiad-level questions, which require deep reasoning, creativity, and proof-writing.
2. **Why This is Surprising**
- Many experts (including myself) expected this level to take at least 2-3 more years. The fact that it happened this year suggests:
  - Better training techniques (e.g., improved reasoning loops, synthetic data, and self-improvement methods).
  - Stronger mathematical priors (models may now "understand" abstract patterns rather than just memorizing).
  - Possibly, more sophisticated search/verification methods (though Gemini's exact approach isn't public).
3. **Is Contest Math "Close to Solved"?**
- **For AIME:** Probably yes. AI is likely near-superhuman at this level already.
- **For USAMO/IMO:** Not yet, but progress is accelerating. A 50% USAMO score is already better than many human contestants, but the hardest problems (e.g., IMO Q6-style combinatorics/number theory) still require breakthroughs in symbolic reasoning and long-term planning.
- **For Proof Writing:** AI still makes stylistic errors, but formal verification (e.g., Lean) can help close the gap.
4. **What’s Next?**
- **IMO Gold within 1-2 years?** If progress continues at this rate, it's plausible.
- **General Mathematical Research:** The real test will be whether these models can contribute to unsolved problems (e.g., combinatorics conjectures, lightweight formal math).
### Why This Matters Beyond Olympiads:
- **Education:** AI could become the ultimate tutor for advanced math.
- **Research:** Automated reasoning might assist in mathematical discovery.
- **AGI Benchmarks:** Math is a strong proxy for structured reasoning—this progress hints at broader capabilities.
It’s an exciting time! Wouldn’t be surprised if an AI wins an IMO gold medal by 2026.
(http://www.autoadmit.com/thread.php?thread_id=5752305&forum_id=2...id#49115797)
Date: July 20th, 2025 12:54 AM Author: Appetizing pisswyrm french chef Subject: this fucking faggot:
"we are releasing GPT-5 soon but want to set accurate expectations: this is an experimental model that incorporates new research techniques we will use in future models. we think you will love GPT-5, but we don't plan to release a model with IMO gold level of capability for many months."
https://x.com/sama/status/1946569252296929727
(http://www.autoadmit.com/thread.php?thread_id=5752305&forum_id=2...id#49115705)
Date: July 20th, 2025 10:49 PM
Author: ,.,.,.,,,.,,.,..,.,.,.,.,,.
It should be possible to get the compute cost down much lower with distillation on the reasoning traces from large models. This is just a proof of concept. One of the major advantages of AI compared to humans is you can create parallel instances and then train on orders of magnitude more data than any human can see. Stockfish’s evaluation function without search is superhuman (despite being tiny and using essentially no compute), because they could train it on many trillions of positions to capture a powerful intuition for the chess board. We will likely see the same thing happen with reasoning models. Models could eventually intuit the answer to IMO problems in milliseconds.
(http://www.autoadmit.com/thread.php?thread_id=5752305&forum_id=2...id#49117843)
Date: July 20th, 2025 12:12 PM
Author: ,.,.,.,,,.,,.,..,.,.,.,.,,.
There are a handful of high school students with 150+ IQs who are able to solve these problems. In addition, AI went from scoring around 700 on the SAT math section to this in about two years, thanks to scaling. Do you feel confident it won’t start solving unsolved problems after another two years of scaling?
(http://www.autoadmit.com/thread.php?thread_id=5752305&forum_id=2...id#49116355)
Date: July 20th, 2025 1:32 PM
Author: ,.,,.,.,,,,,,.....................
What about other AI models doing the same questions? Does this mean OpenAI is the best?
(http://www.autoadmit.com/thread.php?thread_id=5752305&forum_id=2...id#49116490)
Date: July 20th, 2025 1:36 PM
Author: ,.,.,.,,,.,,.,..,.,.,.,.,,.
The rumor is that DeepMind got gold as well. No one is very far ahead of the others.
(http://www.autoadmit.com/thread.php?thread_id=5752305&forum_id=2...id#49116500)
Date: July 21st, 2025 1:19 PM
Author: ,.,....,...,,,..,..,.,..,.,.,.,.
Google got gold too using a large language model.
"We can confirm that Google DeepMind has reached the much-desired milestone, earning 35 out of a possible 42 points — a gold medal score. Their solutions were astonishing in many respects. IMO graders found them to be clear, precise and most of them easy to follow."
https://deepmind.google/discover/blog/advanced-version-of-gemini-with-deep-think-officially-achieves-gold-medal-standard-at-the-international-mathematical-olympiad/
(http://www.autoadmit.com/thread.php?thread_id=5752305&forum_id=2...id#49119024)