Mythos is actually a step change in AI capabilities
Date: June 9th, 2026 9:52 PM
Author: .,.,...,..,.,.,:,,:,...,:::,...,:,.,.:..:.
https://www.oneusefulthing.org/p/what-it-feels-like-to-work-with-mythos
https://x.com/victortaelin/status/2064448425936994742?s=46
https://metr.org/time-horizons/
https://www.anthropic.com/news/claude-fable-5-mythos-5
Meanwhile it’s showing large improvements on the sorts of software engineering tasks necessary for recursive self-improvement. EPAH crying, losing hope.
(http://www.autoadmit.com/thread.php?thread_id=5872645&forum_id=2...id#49926681) |
Date: June 9th, 2026 9:55 PM
Author: Genius Bear on the loose in Japan
that step change in question?
it's doing the exact same stuff it did before (language tasks), but slightly better
Wow. A Step Change Was Performed. This. Changes. Everything.
(http://www.autoadmit.com/thread.php?thread_id=5872645&forum_id=2...id#49926686) |
Date: June 9th, 2026 9:59 PM
Author: .,.,...,..,.,.,:,,:,...,:::,...,:,.,.:..:.
Language tasks include writing code for self-improving AGI. This is not comforting.
(http://www.autoadmit.com/thread.php?thread_id=5872645&forum_id=2...id#49926695) |
Date: June 9th, 2026 11:55 PM
Author: .,.,...,..,.,.,:,,:,...,:::,...,:,.,.:..:.
8.10 USAMO 2026
The USA Mathematical Olympiad (USAMO) is a six-problem, two-day proof-based competition for high school students. It is the next step of the math olympiad track in the US after the AIME, which was a popular AI benchmark last year but is now saturated.
The 2026 USAMO took place on March 21–22, 2026, after almost all of Mythos’s training data was collected, and we are confident that there was no contamination.
Because USAMO solutions are proofs rather than short answers, grading can be challenging and subjective. We follow the MathArena [41] grading methodology, where each proof is rewritten by a neutral model (Gemini 3.1 Pro) and judged by a panel of 3 frontier models (we used Gemini 3.1 Pro, Claude Opus 4.6, and Claude Mythos Preview) according to defined rubrics. The final score is the minimum given by any judge.
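A minimal sketch of the "minimum given by any judge" rule described above, not the actual MathArena or Anthropic grading code. It assumes the standard USAMO scale of 7 points per problem and that the headline percentage is the mean per-proof score over all attempts; the function names and the sample scores are hypothetical.

```python
from statistics import mean

MAX_POINTS = 7  # assumption: standard USAMO scale of 7 points per problem


def proof_score(judge_scores):
    """Final score for one proof: the minimum given by any judge."""
    if not judge_scores:
        raise ValueError("need at least one judge score")
    return min(judge_scores)


def benchmark_percent(attempts):
    """Mean of the per-proof minimum over all attempts, as a percentage of full marks."""
    return 100 * mean(proof_score(scores) for scores in attempts) / MAX_POINTS


# Hypothetical scores from a three-judge panel for four attempts (not real data).
attempts = [
    [7, 7, 7],
    [7, 6, 7],  # one judge docks a point, so this proof counts as 6
    [7, 7, 7],
    [7, 7, 7],
]

print([proof_score(scores) for scores in attempts])  # [7, 6, 7, 7]
print(round(benchmark_percent(attempts), 1))         # 96.4
```

Taking the minimum is the strictest way to combine a panel: a single dissenting judge is enough to pull a proof below full marks.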
Mythos 5 scored 99.8% at medium, high, and xhigh reasoning effort, and 98.3% at low effort, averaging over 10 attempts per problem. Across all 240 attempts, the only proof that more than one judge scored below full marks was a low-effort attempt on Problem 6, where the model itself declined to claim a complete solution and proved a restricted subcase instead. Average token usage per attempt ranged from roughly 42K at low effort to 100K at xhigh. Under similar settings, Opus 4.8 scored 96.7% and Opus 4.7 scored 69.3%.
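A quick check of where the 240 attempts plausibly come from. This assumes the total is simply problems times attempts per problem times effort levels, which the excerpt does not state outright; the variable names are illustrative.

```python
problems = 6                  # from the excerpt: six-problem competition
attempts_per_problem = 10     # from the excerpt: 10 attempts per problem
effort_levels = ["low", "medium", "high", "xhigh"]  # the four levels named in the excerpt

# Assumption: every problem is attempted 10 times at every effort level.
total_attempts = problems * attempts_per_problem * len(effort_levels)
print(total_attempts)  # 240
```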
-----------
an LLM is definitely going to 100% the IMO this year.
(http://www.autoadmit.com/thread.php?thread_id=5872645&forum_id=2...id#49926915) |