7/10/25 AI thread
Date: July 10th, 2025 11:32 AM Author: ,.,.,.,....,.,..,.,.,.
i would like to see Grok 4 evaluated on a broader set of benchmarks, but the preliminary numbers seem to strongly imply that LLM progress is not stalling. the FrontierMath score will be interesting.
(http://www.autoadmit.com/thread.php?thread_id=5748664&forum_id=2...id#49089895) |
Date: July 10th, 2025 12:30 PM Author: ,.,.,.,....,.,..,.,.,.
I am reminded of the debates about the meaningfulness of IQ and the existence of the g factor. It sounds intuitively reasonable that the labs are just fitting to benchmarks, just as it's reasonable to think IQ is only what IQ tests measure. But then people would create alternative measures of intelligence, and the first principal component would be nearly identical to the one from other intelligence measures, and the g factor would explain a large percentage of the subtest variance. Similarly, when people create new LLM benchmarks, the rank orderings of LLMs are highly similar to those on other benchmarks. There are some caveats, like Claude models being especially good at coding relative to other measures, but it's generally true.
(http://www.autoadmit.com/thread.php?thread_id=5748664&forum_id=2...id#49090044) |
Date: July 10th, 2025 6:01 PM Author: black box of digital vectors raping will stancil
https://x.com/vincentweisser/status/1943427747717722490
apparently grok 4 leaned heavily into RL for their reported performance gains
(http://www.autoadmit.com/thread.php?thread_id=5748664&forum_id=2...id#49090910) |
Date: July 10th, 2025 6:07 PM Author: Business school fucking ROCKS!!! (🧐)
you do this round of coping every time a new model comes out
https://x.com/TimSweeneyEpic/status/1943398745762116029
https://x.com/arcprize/status/1943168950763950555
etc
by all accounts it seems to be a leading model. maybe not as practically good for all-around stuff as o3 (which has top-tier tool calling) or as good as sonnet/opus 4 for coding (which anthropic is increasingly specializing in), but nonetheless pretty much all feedback involving actual use seems positive
(http://www.autoadmit.com/thread.php?thread_id=5748664&forum_id=2...id#49090931) |
Date: July 10th, 2025 6:23 PM Author: ,.,....,...,,,..,..,.,..,.,.,.,.
Can you buy like $5 in API credits?
(http://www.autoadmit.com/thread.php?thread_id=5748664&forum_id=2...id#49090998) |
Date: July 10th, 2025 6:31 PM Author: ,.,....,...,,,..,..,.,..,.,.,.,.
lame. weird how quickly xAI became a leader in this space
(http://www.autoadmit.com/thread.php?thread_id=5748664&forum_id=2...id#49091035) |
Date: July 10th, 2025 6:21 PM Author: ,.,....,...,,,..,..,.,..,.,.,.,.
Wait a month and someone else will release a better model
(http://www.autoadmit.com/thread.php?thread_id=5748664&forum_id=2...id#49090990) |
Date: July 10th, 2025 6:43 PM Author: black box of digital vectors raping will stancil
https://x.com/ramez/status/1943431212766294413
lmao the new grok 4 just straight up looks up what elon musk's personal beliefs are and then incorporates them into its answer and shows it in its chain of thought
that's hilarious and honestly kind of based
(http://www.autoadmit.com/thread.php?thread_id=5748664&forum_id=2...id#49091100) |
Date: July 10th, 2025 7:46 PM Author: ,.,.,.,....,.,..,.,.,.
(http://www.autoadmit.com/thread.php?thread_id=5748664&forum_id=2...id#49091377) |