The crazy thing about LLMs is nobody even theorized they were possible
| ''''"'''""'"" | 11/02/25 | | AZNgirl asking Othani why he didn't hit 4 homers | 11/02/25 | | ,.,...,.,.,...,.,,. | 11/02/25 | | ''''"'''""'"" | 11/02/25 | | culture hasn't existed for 20 years | 11/02/25 | | culture hasn't existed for 20 years | 11/02/25 | | .,.,,..,..,.,.,:,,:,...,:::,...,:,.,.:..:. | 11/02/25 | | Jew in Wolfs clothing | 11/02/25 | | culture hasn't existed for 20 years | 11/02/25 | | ,.,...,.,.,...,.,,. | 11/02/25 | | ''''"'''""'"" | 11/02/25 | | culture hasn't existed for 20 years | 11/02/25 | | beautiful kike | 11/02/25 | | ''''"'''""'"" | 11/02/25 | | ,.,...,.,.,...,.,,. | 11/02/25 |
Poast new message in this thread
Date: November 2nd, 2025 1:04 PM Author: ,.,...,.,.,...,.,,.
people had been training LSTMs for language modeling for years before GPT-1 was released, so that's not true. the idea that training for predictive loss on text gradually produces language fluency and understanding makes deep intuitive sense if you think about it for two seconds. many people in ML were surprised and disappointed that the architectures for doing this could look stupidly simple and that all you needed to do was train at scale (everything not based on scale, and focused instead on clever model design, failed). certainly GPT-3 was not surprising at all given GPT-2.
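as a rough illustration of what "training for predictive loss on text" means in practice, here is a minimal character-level LSTM language model trained with next-token cross-entropy in PyTorch. the corpus, model size, and hyperparameters are all invented for the demo; real language models apply the same objective at enormously larger scale.

```python
# Minimal sketch of next-token prediction with an LSTM (toy corpus and sizes).
import torch
import torch.nn as nn

text = "the quick brown fox jumps over the lazy dog. " * 50
chars = sorted(set(text))
stoi = {c: i for i, c in enumerate(chars)}
data = torch.tensor([stoi[c] for c in text])

class CharLSTM(nn.Module):
    def __init__(self, vocab, dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.lstm = nn.LSTM(dim, dim, batch_first=True)
        self.head = nn.Linear(dim, vocab)

    def forward(self, x):
        h, _ = self.lstm(self.embed(x))
        return self.head(h)  # logits for the next character at every position

model = CharLSTM(len(chars))
opt = torch.optim.Adam(model.parameters(), lr=3e-3)
seq_len = 32

for step in range(200):
    idx = torch.randint(0, len(data) - seq_len - 1, (16,)).tolist()
    x = torch.stack([data[j:j + seq_len] for j in idx])          # input characters
    y = torch.stack([data[j + 1:j + seq_len + 1] for j in idx])  # same text shifted by one
    logits = model(x)
    loss = nn.functional.cross_entropy(logits.reshape(-1, len(chars)), y.reshape(-1))
    opt.zero_grad()
    loss.backward()
    opt.step()
    if step % 50 == 0:
        print(f"step {step}: next-char cross-entropy {loss.item():.3f}")
```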
(http://www.autoadmit.com/thread.php?thread_id=5792672&forum_id=2most#49395372) |
Date: November 2nd, 2025 1:38 PM Author: .,.,,..,..,.,.,:,,:,...,:::,...,:,.,.:..:.
cr, lots of people thought we needed hand-engineered syntactic processing. Some of them are still deluded enough to believe their precious cognitive theories will be worth something in the long run.
(http://www.autoadmit.com/thread.php?thread_id=5792672&forum_id=2most#49395459) |
Date: November 2nd, 2025 2:36 PM Author: ,.,...,.,.,...,.,,.
seems premature to conclude that given the progress we have seen in the last year. within the last year LLMs got gold at the IMO, gold at the International Olympiad in Informatics, and gold at the International Collegiate Programming Contest, and saw rapid growth in math and software engineering benchmark performance. GPT-5 hallucination rates are also down compared to prior models. people now have weird expectations for this technology given the rate of improvement: 3 or 4 months without sizeable improvements means it has "plateaued". there's now a long history of people being wrong about this.
compute-based models of AI progress have predicted recent history well. if you project those models out to the sort of datacenters and chips that are currently being built, it's very hard to buy the notion of AI progress meaningfully stalling. compute also buys algorithmic improvements, since model designers can rapidly test and iterate over architectures, so model quality per unit of compute cost is likely to keep increasing.
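to make "compute-based models of AI progress" concrete, here is a hedged sketch using the parametric loss fit from the Chinchilla paper (Hoffmann et al. 2022), L(N, D) = E + A/N^alpha + B/D^beta, with its published constants and the usual rough estimate that training compute is about 6*N*D FLOPs. the constants are approximate fits, and mapping predicted loss onto real-world capability is exactly the contested step.

```python
# Sketch: project pretraining loss as a function of scale using the Chinchilla
# parametric fit. Constants are the published approximate values, not gospel.
def chinchilla_loss(params, tokens,
                    E=1.69, A=406.4, B=410.7, alpha=0.34, beta=0.28):
    return E + A / params**alpha + B / tokens**beta

def compute_flops(params, tokens):
    return 6 * params * tokens  # standard rough estimate for dense transformers

# roughly compute-optimal allocation: about 20 training tokens per parameter
for params in [1e9, 1e10, 1e11, 1e12]:
    tokens = 20 * params
    print(f"{params:.0e} params, {tokens:.0e} tokens, "
          f"~{compute_flops(params, tokens):.1e} FLOPs -> "
          f"predicted loss {chinchilla_loss(params, tokens):.3f}")
```

the predicted loss keeps falling as compute scales up; whether that translates into the capabilities people care about is the part the rest of this thread is arguing over.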
(http://www.autoadmit.com/thread.php?thread_id=5792672&forum_id=2most#49395596) |
Date: November 2nd, 2025 2:58 PM Author: culture hasn't existed for 20 years
It is absolutely far-fetched to imagine that LLMs can make novel discoveries in physics. LLMs cannot do this. They cannot make out-of-distribution inferences.
The reason AI was able to discover novel chess strategy and understanding is that that form of AI is reinforcement learning, not LLMs. AlphaZero and similar chess engines train against themselves, creating novel data and testing against it for feedback, which results in the synthesis of genuinely new "understanding" that lies entirely outside the distribution of known human chess understanding and experience.
LLMs don't work like that and aren't capable of that. They will get better, yes, and people will figure out clever ways to set up agentic superstructures that let them do practically useful tasks that are very economically efficient and productive, but LLMs are absolutely not *ever* going to be capable of novel inference. At best they can assist in and facilitate the training of other RL-based AIs that can.
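For reference, here is a bare-bones sketch of the self-play loop being described: the agent plays both sides of a game, generates its own data, and updates a value estimate from the outcomes. Tic-tac-toe with a tabular value function stands in for AlphaZero's search plus neural network; all names and constants here are illustrative.

```python
# Toy self-play: generate games against yourself, then learn values from the results.
import random
from collections import defaultdict

# the eight three-in-a-row lines on a 3x3 board, squares indexed 0..8
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]

def winner(board):
    for a, b, c in LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None

def legal_moves(board):
    return [i for i, cell in enumerate(board) if cell == " "]

# value of a position from the perspective of the player who just produced it
values = defaultdict(float)
EPSILON, LR = 0.2, 0.1

def choose_move(board, player):
    moves = legal_moves(board)
    if random.random() < EPSILON:  # exploration keeps generating novel data
        return random.choice(moves)
    def score(m):
        return values[board[:m] + player + board[m + 1:]]
    return max(moves, key=score)

def self_play_game():
    board, player, history = " " * 9, "X", []
    while True:
        move = choose_move(board, player)
        board = board[:move] + player + board[move + 1:]
        history.append((board, player))
        w = winner(board)
        if w or not legal_moves(board):
            return history, w
        player = "O" if player == "X" else "X"

# self-play loop: the agent's own games are the training data
for _ in range(20000):
    history, result = self_play_game()
    for state, player in history:
        target = 0.0 if result is None else (1.0 if result == player else -1.0)
        values[state] += LR * (target - values[state])

print(f"learned value estimates for {len(values)} positions")
```

The point of contention in the thread is whether this kind of closed feedback loop, which manufactures its own supervision signal, is something LLM training pipelines can or cannot reproduce.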
(http://www.autoadmit.com/thread.php?thread_id=5792672&forum_id=2most#49395642) |
Date: November 2nd, 2025 2:59 PM Author: ,.,...,.,.,...,.,,.
classical statistical inference models are highly focused on preventing something called overfitting.
imagine you have a data sample that you are trying to fit a function to so that it will predict new datapoints. if you have a function approximator (such as a neural network) with many tunable parameters, then you can often fit many, many functions to your data. most of them will have very little ability to predict data points that aren't in your sample, so you end up with an overfitted function that models your data well but is essentially useless. most people predicted this as the standard failure mode of neural networks that are either 1) very large, with many tunable parameters (more opportunity to fit a complicated function with no generalization power), or 2) trained very hard on the data (so the model fits the data too closely and doesn't extract broadly useful patterns). they thought you would then need to keep the number of parameters suitably low and stop model training after a certain point to prevent overfitting.
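the classical overfitting story is easy to reproduce in a toy setting: a high-degree polynomial fit to a handful of noisy points matches the sample almost exactly while predicting held-out points badly. the target function, noise level, and degrees below are invented for the demo.

```python
# Toy demo of classical overfitting: flexible fits nail the sample, miss new points.
import numpy as np

rng = np.random.default_rng(1)

def target(x):
    return np.cos(2 * x)

n = 10
x_train = np.sort(rng.uniform(-1, 1, n))
y_train = target(x_train) + 0.1 * rng.standard_normal(n)
x_test = np.linspace(-1, 1, 200)
y_test = target(x_test)

for degree in [1, 3, 9]:
    coeffs = np.polyfit(x_train, y_train, degree)  # least-squares polynomial fit
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree={degree}  train MSE={train_mse:.4f}  test MSE={test_mse:.4f}")
```

with these settings the degree-9 fit usually interpolates the training points almost perfectly and does noticeably worse on the held-out grid, which is the failure mode the classical advice is guarding against.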
it turns out that if you just keep making neural networks larger, or keep training them over very long time horizons, you'll often see generalization error drop to a new, lower level. this is deeply counterintuitive for standard statistical models, and why it happens is not completely understood. weight decay, which penalizes model complexity, is likely a big part of it: the function approximator wanders around the loss landscape for a while and eventually finds a simple encoding of the data, and simpler models have lower generalization error.
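one way to see the "bigger eventually generalizes better" part of this is the double-descent pattern in a random-features regression: test error typically spikes when the number of features is near the number of training points and then falls again as the model gets much larger. this is a toy stand-in with made-up settings, illustrating the size effect rather than the weight-decay story.

```python
# Toy double descent: random ReLU features + minimum-norm least squares.
import numpy as np

rng = np.random.default_rng(0)

def target(x):
    return np.sin(3 * x)

n_train, n_test, noise = 30, 500, 0.1
x_train = rng.uniform(-1, 1, n_train)
y_train = target(x_train) + noise * rng.standard_normal(n_train)
x_test = rng.uniform(-1, 1, n_test)
y_test = target(x_test)

def random_relu_features(x, W, b):
    # fixed random first layer; only the linear readout is fitted
    return np.maximum(0.0, np.outer(x, W) + b)

for width in [5, 10, 20, 28, 30, 35, 50, 200, 2000]:
    W = rng.standard_normal(width)
    b = rng.standard_normal(width)
    Phi_train = random_relu_features(x_train, W, b)
    Phi_test = random_relu_features(x_test, W, b)
    # pinv gives the minimum-norm interpolating solution once width >= n_train
    w = np.linalg.pinv(Phi_train) @ y_train
    test_mse = np.mean((Phi_test @ w - y_test) ** 2)
    print(f"width={width:5d}  test MSE={test_mse:.3f}")
```

with these settings the test error usually peaks around width ≈ n_train (the interpolation threshold) and comes back down at large widths, which is exactly the behavior classical overfitting intuition says shouldn't happen.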
(http://www.autoadmit.com/thread.php?thread_id=5792672&forum_id=2most#49395644) |