i'm Yann LeCun pilled on AI now
Date: June 5th, 2025 11:05 PM Author: dave portnoy being baked in a pizza oven
LLMs are pretty much maxed out and we need to train AI on real-world physical experience now
current AI companies are mostly going to pivot to really dystopian commercial use cases to try to get some of their money back. romantic chat bots, targeted ads, surveillance/data selling, etc
(http://www.autoadmit.com/thread.php?thread_id=5734039&forum_id=2:#48991123) |
Date: June 6th, 2025 12:00 PM Author: Peter Andreas Thiel (🧐)
Anthropic tripled their revenue over just the last couple of months, and for now they focus almost exclusively on being good at coding workflows
even if the limit of current AI is automating much of white collar knowledge work (programming, corporate law, finance, etc.), that's still a very large and fundamental shift. there's also stuff like an interactive personal tutor, available on demand, that adapts to your exact skill level and knowledge. and multimodal models may be the "missing piece" that makes general purpose robots feasible, even if their use is very limited at first.
meta has also been an absolute fucking disaster in the AI race given the amount of resources they've put into it
(http://www.autoadmit.com/thread.php?thread_id=5734039&forum_id=2:#48992167) |
Date: June 6th, 2025 12:26 PM Author: dave portnoy being baked in a pizza oven
https://www.youtube.com/watch?v=qvNCVYkHKfg
this whole interview is quite good tbh
(http://www.autoadmit.com/thread.php?thread_id=5734039&forum_id=2:#48992216) |
Date: June 6th, 2025 12:42 PM Author: .,,.,,.,,,....
that isn't an interesting argument. some people were wrong in the past, but that doesn't imply different people are wrong now about a completely different paradigm. the level of generality in the current wave of progress is clearly much, much greater than in the 60s, and there was nothing backing up their predictions then other than overconfidence in the idea that the mind is built out of symbolic logic. transformers successfully being used for game playing, language generation, audio generation, image generation and understanding, video generation, etc. is the sort of thing you would only expect to happen in worlds where connectionism is true and notions of AI being built out of complicated, hand-engineered modules are wrong.
(http://www.autoadmit.com/thread.php?thread_id=5734039&forum_id=2:#48992253) |
Date: June 6th, 2025 12:53 PM Author: .,,.,,.,,,....
i think what gave them credibility is that they were right for the past several years. if you looked at GPT-2 and GPT-3 and then extrapolated from there based on likely increases in training compute, what has happened in 2022-2025 was predictable. even after GPT-3, many people were surprised by the advances in benchmark performance when they shouldn't have been. if you do the same exercise with the likely increases in training compute in 2026 and 2027 and inference scaling, it's not too hard to imagine highly competent AI agents that could substantially automate AI research.
it's interesting to note that while certain things with easily verifiable rewards are still advancing rapidly (such as programming and math benchmark performance), other things seem to be stalling. the Claude 4 benchmarks don't seem too promising for the LLM maximalist point of view, but it's hard to know and we only have limited data points. the other issue is that we have very little idea what the labs are working on internally. transformers with chain of thought might soon hit a wall, but how can we be confident that someone doesn't have something else cooking that could address the remaining problems? LLMs are not necessarily synonymous with transformers trained with stochastic gradient descent.
(http://www.autoadmit.com/thread.php?thread_id=5734039&forum_id=2:#48992303) |
Date: June 6th, 2025 2:03 PM Author: .,,.,,.,,,....
They have slowed down in most ways, but SWE-bench/Aider-type benchmarks are still rapidly improving. These are the benchmarks that most closely approximate autonomous software engineering work. The models can handle larger context windows (now up to 1 million tokens with 2.5 pro), can more effectively use tools and interact with large code bases, and are significantly less error-prone and capable of correcting their own errors when they do occur. I think it's very hard to predict how much 2027 (or even 2026) models can speed up AI research. Part of the issue is that labs are still compute constrained, so even if the models can do useful AI research, they might not be able to leverage that effectively. I think what makes me more open to the idea this might be possible is that deep learning has been fundamentally an empirical field, not a theory-driven one. If you totally embrace that idea, autonomous, not-super-smart SWE AIs trying a lot of crap and training on the results is a plausible way to significantly boost AI performance.
(http://www.autoadmit.com/thread.php?thread_id=5734039&forum_id=2:#48992456) |
Date: June 6th, 2025 3:41 PM Author: dave portnoy being baked in a pizza oven
JEPA, or Joint Embedding Predictive Architecture, is a self-supervised learning framework designed to encourage models to form internal “world models” by predicting abstract representations of future (or missing) data rather than reconstructing raw inputs. Below is an overview of how JEPA works and why it is particularly well-suited for letting AI systems learn their own latent understanding of the world.
1. Core Idea: Predicting in Latent Space
Traditional self-supervised approaches, like autoencoders or generative masked modeling, often try to reconstruct pixels or raw tokens, which forces the model to spend capacity on both relevant and irrelevant details (e.g., exact pixel colors). JEPA sidesteps this by having two networks:
A context encoder that processes observed parts of the input (e.g., an image with masked regions or a video clip missing certain frames) and produces a “context embedding.”
A target encoder that separately encodes the actual data (or future frames) into “target embeddings.”
The training objective is to align the context embedding with the correct target embedding (or to distinguish it from incorrect ones) in latent space, rather than to reconstruct raw pixels or tokens. By comparing embeddings directly, the model can discard unpredictable noise (e.g., lighting variations, background clutter) and focus on stable, high-level features that are useful for prediction and planning (turingpost.com, arxiv.org).
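A minimal PyTorch sketch of that idea follows. This is not Meta's actual implementation: real I-JEPA/V-JEPA use ViT backbones and block masking, and the toy linear encoders, input sizes, and plain L2 loss here are illustrative assumptions.

```python
# A minimal sketch of a JEPA training step, assuming toy linear encoders
# and a plain L2 loss; real JEPAs use ViT backbones and block masking.
import torch
import torch.nn as nn
import torch.nn.functional as F

EMBED_DIM = 256  # assumed embedding width

context_encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, EMBED_DIM))
target_encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, EMBED_DIM))
predictor = nn.Linear(EMBED_DIM, EMBED_DIM)  # context latent -> predicted target latent

def jepa_step(context_view: torch.Tensor, target_view: torch.Tensor) -> torch.Tensor:
    """Predict the target's embedding from the context; no pixels are reconstructed."""
    z_context = context_encoder(context_view)   # embed the observed regions
    with torch.no_grad():                       # gradients never flow into the targets
        z_target = target_encoder(target_view)
    z_pred = predictor(z_context)               # prediction happens in latent space
    return F.mse_loss(z_pred, z_target)         # compare embeddings, not raw inputs

loss = jepa_step(torch.randn(8, 3, 64, 64), torch.randn(8, 3, 64, 64))
```

The key point the sketch makes: the loss lives entirely in embedding space, so nothing ever asks the network to reproduce pixels.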
2. Architecture Variants (I-JEPA, V-JEPA, etc.)
I-JEPA (Image JEPA): Given a single image, a large "context crop" (covering a broad spatial area) is encoded, and the model predicts embeddings of several "target crops" from that image. Target crops are often chosen at scales large enough to require understanding semantics (e.g., object identities), not trivial low-level details (ai.meta.com).
V-JEPA (Video JEPA): Extends I-JEPA to video by having the context encoder ingest previous frames (and possibly actions), then predicting the embedding of future frames. Because it only needs to predict abstract representations, the model can choose which features of the future are predictable (e.g., object positions) and ignore the unpredictable (e.g., exact pixel noise) (linkedin.com, ai.meta.com).
By operating in this "embedding space" rather than pixel space, JEPA-based models learn world-model representations: latent features that capture how a scene or environment evolves over time (e.g., object motion, physical interactions) without being burdened by pixel-level reconstruction (arxiv.org, medium.com).
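For concreteness, here is a toy version of the I-JEPA-style block sampling described above. The grid size, block counts, and scales are made-up placeholders, not the paper's actual sampling hyperparameters.

```python
# Illustrative only: I-JEPA-style block sampling on a 14x14 ViT patch grid
# (e.g., a 224x224 image with 16x16 patches). All sizes are placeholders.
import random

GRID = 14  # patches per side

def sample_block(h: int, w: int) -> set[int]:
    """Flat patch indices of one h x w block at a random grid position."""
    top = random.randint(0, GRID - h)
    left = random.randint(0, GRID - w)
    return {r * GRID + c
            for r in range(top, top + h)
            for c in range(left, left + w)}

context = sample_block(10, 10)                    # one large, semantic-scale context block
targets = [sample_block(4, 4) for _ in range(4)]  # several smaller target blocks

# Target patches are hidden from the context encoder, so their embeddings
# must be predicted from surrounding semantics rather than copied.
for t in targets:
    context -= t
```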
3. Loss Function and Training Dynamics
JEPA typically uses a contrastive or predictive loss at the embedding level. A common choice is InfoNCE: the context embedding must be close (in representation space) to the true target embedding and far from negative samples (embeddings of unrelated patches or frames). In some variants, an exponential moving average is used to stabilize the target encoder, ensuring that the targets change more slowly than the context encoder (similar to BYOL or MoCo strategies) (arxiv.org).
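A sketch of those two mechanisms, an InfoNCE loss over embeddings and an EMA-stabilized target encoder, is below; the temperature and momentum values are illustrative assumptions, not published settings.

```python
# Sketch of InfoNCE over embeddings plus an EMA target-encoder update.
import torch
import torch.nn as nn
import torch.nn.functional as F

def info_nce(z_pred: torch.Tensor, z_target: torch.Tensor,
             temperature: float = 0.1) -> torch.Tensor:
    """Each prediction should match its own target (the diagonal) and be
    far from every other target in the batch (the negatives)."""
    z_pred = F.normalize(z_pred, dim=-1)
    z_target = F.normalize(z_target, dim=-1)
    logits = z_pred @ z_target.t() / temperature                 # (B, B) similarities
    labels = torch.arange(z_pred.size(0), device=z_pred.device)  # positives on diagonal
    return F.cross_entropy(logits, labels)

@torch.no_grad()
def ema_update(target_enc: nn.Module, context_enc: nn.Module,
               momentum: float = 0.996) -> None:
    """Targets drift slowly: target weights track a running average of the
    context encoder's weights, as in BYOL/MoCo."""
    for p_t, p_c in zip(target_enc.parameters(), context_enc.parameters()):
        p_t.mul_(momentum).add_(p_c, alpha=1.0 - momentum)
```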
Because the model is encouraged to predict only abstracted features, it effectively learns which aspects of the environment are predictable and worth modeling. For instance, in V-JEPA, predicting where a car will be in the next frame is feasible, whereas predicting the precise noise pattern on its surface is not. By focusing capacity on the predictable latent variables, JEPA induces a more robust internal "world model" that can be reused for downstream tasks (classification, reinforcement learning, planning) with far fewer labeled samples (linkedin.com, arxiv.org).
4. Why JEPA Enables Self-Formed World-Models
Abstract Prediction vs. Generative Modeling: Generative models (e.g., diffusion, autoregressive transformers) must allocate capacity to model every detail, including inherently unpredictable factors. JEPA's abstraction means that if some aspect of the future cannot be predicted from the context (e.g., random background flicker), the model can "discard" it and focus on the stable dynamics (e.g., object trajectories) (ai.meta.com, arxiv.org).
Efficiency & Generalization: Empirically, JEPA variants (I-JEPA, V-JEPA) show 1.5×–6× gains in sample efficiency compared to pixel-based generative pre-training, because they aren't forced to learn noise patterns or outliers. This leads to embeddings that capture "common sense" world dynamics (e.g., gravity, object permanence), encouraging the model to form its own latent simulation or predictive engine that generalizes to new tasks with minimal adaptation (linkedin.com, medium.com).
Scalability & Modularity: The separation between context encoder and target encoder (or predictor) means that JEPA can be stacked hierarchically; a minimal sketch of the stacking idea follows this list. A higher-level JEPA might predict scene-level embeddings (e.g., "a red car turns right"), while a lower-level JEPA predicts pixel embeddings or optical flow. This hierarchy mirrors how humans build world models: first conceptualizing objects and actions, then filling in details (rohitbandaru.github.io, medium.com).
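Purely as an illustration of that stacking idea (no published JEPA variant is being quoted here), a two-level sketch might look like this, with every module name and shape hypothetical:

```python
# Purely illustrative: the "stacked JEPA" idea as two latent-prediction levels.
import torch
import torch.nn as nn

low_encoder = nn.Linear(768, 256)     # stands in for a patch/optical-flow-level encoder
high_encoder = nn.Linear(256, 128)    # stands in for a scene-level encoder
high_predictor = nn.Linear(128, 128)  # predicts the next scene embedding

def predict_next_scene(frame_feats: torch.Tensor) -> torch.Tensor:
    """frame_feats: (T, 768) per-frame features for a short clip."""
    z_low = low_encoder(frame_feats)           # per-frame latents, (T, 256)
    z_scene = high_encoder(z_low.mean(dim=0))  # pooled scene latent, (128,)
    return high_predictor(z_scene)             # the upper level predicts in its own
                                               # latent space and never sees pixels
```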
5. Practical Outcomes & Extensions
Recent work has shown that JEPA-trained backbones (e.g., ViT with I-JEPA) outperform standard self-supervised baselines on tasks like object detection, depth estimation, and policy learning when used as initializations (arxiv.org). Furthermore, extensions like seq-JEPA incorporate sequences of views plus "action embeddings," allowing the model to learn representations that are both invariant (for classification) and equivariant (for tasks requiring precise spatial dynamics), effectively learning a richer world model in a single architecture (arxiv.org).
In summary, JEPA’s strength lies in its ability to force the model to abstract away unpredictable noise and extract only the predictable, semantically meaningful features of its inputs. By learning to align context embeddings with the embeddings of masked or future data, the model inherently constructs an internal world model—a latent simulation of its environment—that can be leveraged for downstream reasoning, planning, and decision-making with high efficiency.
(http://www.autoadmit.com/thread.php?thread_id=5734039&forum_id=2:#48992758) |