My Articles
2025

Why Do LLMs Get Smarter as They Get Bigger? The Hidden Geometry of “Superposition”
Beyond Diffusion: How Visual Autoregressive Modeling (VAR) Rewrites the Rules of Image Generation
Breaking the Transformer Bottleneck: Gating for Scaled Non-Linearity and “Attention-Sink-Free” LLMs
Beyond RAG: Why CLaRa’s “Latent Reasoning” Changes the Game for Retrieval-Augmented Generation
KAN: The Shift from Fixed Nodes to Learnable Edges
The Magic Number is 3.6: Measuring the True Capacity of Large Language Models
Beyond the Click: Inside the Architecture of SAM 3 and Promptable Concept Segmentation
We’ve Been Overcomplicating Diffusion: Why “Just Image Transformers” Are All You Need
Nested Learning: A Technical Reflection on a New Way to Think About Deep Learning
The Transformer Paradigm: Rethinking Sequence Modeling Through Attention
Inside PaliGemma: Building an Open, Transferable 3B Vision-Language Model
Less Is More: My Deep Dive into Recursive Reasoning with Tiny Networks