The Brain-AI Gap

Thought Experiments with Kush • January 03, 2026 • Solo Episode



Description

This article argues that artificial intelligence’s path to general intelligence will require architectural changes rather than continued scaling. We’re not facing a temporary bottleneck but a fundamental mismatch between transformer architectures and biological intelligence. Recent research reveals two critical gaps: 1) biological neurons are vastly more complex than their artificial counterparts - each human neuron contains 3-5 computational subunits capable of sophisticated nonlinear processing; 2) brains use fundamentally different learning mechanisms than transformers, leveraging localized, timing-based learning without requiring backward passes. This isn’t a technical limitation. It’s a design gap that scaling alone can’t bridge. The path forward requires architectures that mimic how brains process information. Let’s examine the evidence in concrete terms.

Why the Scaling Hypothesis is Fundamentally Flawed

Industry leaders now acknowledge this reality. Microsoft CEO Satya Nadella admitted at Microsoft Ignite in late 2024: “there is a lot of debate on whether we’ve hit the wall with scaling laws... these are not physical laws. They’re just empirical observations.” Similarly, OpenAI co-founder Ilya Sutskever told Reuters that “everyone is looking for the next thing” to scale AI models, while industry reports confirm OpenAI’s Orion model showed diminishing returns compared to previous generation leaps.

What’s happening here is not a temporary slowdown. It’s a fundamental limit that reveals how our current approach misunderstands intelligence itself. Consider the ARC benchmark developed by François Chollet - it tests genuine abstraction, not just memorization. LLM-based systems score only around 15% on it, while humans score about 80%. This isn’t about “slower computers” - it’s about an architecture that can’t replicate human reasoning.

The deeper truth: bringing the brain into AI isn’t about “scaling” but about recognizing that intelligence emerges from biological mechanisms that transformers ignore entirely. When you consider how the brain processes information, it becomes clear: we’ve been building systems that process text - not intelligence. How does this gap manifest in practical terms?

How Brains Outperform AI: Concrete Evidence

Biological neurons aren’t simple switches - they’re sophisticated computational engines. Artificial neurons compute a weighted sum followed by a nonlinear activation, a simplification that dates back to the McCulloch-Pitts model of 1943. Biological neurons, by contrast, use their dendritic trees as semi-independent processors. Each neuron contains 3-5 computational subunits that detect patterns like XOR - tasks once thought impossible for single neurons.

Consider a real-world example: when you flip a coin, it seems random. But if you slow it down, you see the physics: air resistance, gravity, and even the coin’s microscopic imperfections affect the outcome. Similarly, biological neurons detect patterns through subcellular mechanisms - no “black box” needed.

Why this matters: human brains operate on 12-20 watts - roughly the power of a dim light bulb - while training GPT-4 required energy equivalent to powering 1,000 homes for five to six years. This 200-million-fold efficiency gap stems from biology’s “local processing” approach: no global error signals, only local, millisecond-scale plasticity.

Think about city navigation: you don’t process every light and street sign at once - you focus on what’s relevant to your current path. Similarly, the brain uses sparse coding, where only 5-10% of neurons activate at any moment. This creates an energy-efficient system that processes information without overload; a minimal sketch of the idea follows.
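As a rough illustration of that sparse-coding idea (a generic sketch, not any specific neuroscience model - the 5% fraction and the population size are arbitrary), a k-winners-take-all activation keeps only the most strongly driven units active and silences the rest:

```python
import numpy as np

def k_winners_take_all(drive: np.ndarray, active_fraction: float = 0.05) -> np.ndarray:
    """Keep only the top `active_fraction` of units active; zero out the rest.

    A toy stand-in for sparse coding: most units output exactly zero, so any
    downstream computation (and, on event-driven hardware, energy) scales with
    the handful of active units rather than with the whole population.
    """
    k = max(1, int(active_fraction * drive.size))
    threshold = np.partition(drive, -k)[-k]          # k-th largest drive value
    return np.where(drive >= threshold, drive, 0.0)  # sparse activity vector

rng = np.random.default_rng(0)
drive = rng.normal(size=10_000)                      # input drive to 10,000 units
activity = k_winners_take_all(drive, active_fraction=0.05)
print(f"active units: {np.count_nonzero(activity)} of {drive.size}")  # ~500, i.e. 5%
```

On dense hardware a mask like this saves little, but on event-driven hardware only the nonzero units trigger any downstream work at all.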
Another concrete illustration: imagine identifying a cat. You don’t process every hair individually - you recognize the shape, size, and movement patterns. Your brain’s visual system filters out irrelevant details through hierarchical processing. This isn’t “faster” processing - it’s selective information handling that brains achieve through local computation.

The Core Limitations of Transformer Architecture

The scaling hypothesis is crumbling. Here’s why:

* Transformers use global error signals (backward passes) to update weights.
* Brains use local learning rules (e.g., spike-timing-dependent plasticity) that require no global gradients.

The real problem isn’t size - it’s architecture. Even a transformer with as many units as the brain has neurons won’t match the brain’s computational density. Why? Because brains use:

* Dendritic computation (100+ effective units per neuron)
* Glial cells that actively process information (not just support neurons)
* Neuromodulators like dopamine that control learning rates

This is more than theoretical. Consider the 2024 Nature study showing that dopamine and serotonin work in opposition during reward learning: dopamine increases with reward while serotonin decreases, and blocking serotonin alone actually enhanced learning. This three-factor learning rule (pre × post × neuromodulator) allows the same spike timing to produce different outcomes based on behavioral relevance - enabling what neuroscientists call “gated plasticity” (a toy code sketch of such a rule appears below).

The computational gap: while a transformer pushes every input through billions of parameters, biological systems achieve similar results through localized learning. When you see a car approaching, your brain doesn’t process each pixel individually - it quickly identifies the vehicle through hierarchical processing that prioritizes relevant features.

Consider another example: imagine solving a puzzle. A transformer might look at every piece individually, but brains focus on patterns and relationships. The brain uses gated plasticity to strengthen connections only when they are relevant - no global gradient calculations needed.

Let’s examine a specific case: when learning a new language, humans don’t memorize every word - they detect patterns through contextual learning. Similarly, the brain uses neuromodulators to adjust learning rates based on attention and relevance. This isn’t “better memory” - it’s adaptive learning that transformers cannot replicate.

Why Scaling Isn’t the Answer

The industry is recognizing this shift. After OpenAI’s Orion model showed diminishing returns, OpenAI and others pivoted toward “test-time compute” methods, allowing models more time to reason at inference. The evidence is clear:

* The ARC benchmark tests genuine abstraction: tasks require inferring novel transformation rules from just a few examples, as humans easily do. Human performance reaches approximately 80%; the best AI systems achieve only 31% using non-LLM approaches, with LLM approaches scoring around 15%.
* Compositional reasoning reveals especially severe limitations. A 2024 study of transformers trained from scratch found that 62.88% of novel compounds failed consistent translation, even when the models had learned all component parts.
* Hallucination appears to be an inescapable feature rather than a fixable bug. Xu et al. (2024) proved formally that hallucination cannot be eliminated in LLMs used as general problem solvers - a consequence of the computability-theoretic fact that LLMs cannot learn all computable functions.
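Before turning to the industry response, here is the toy sketch of the three-factor idea promised above: a purely local update of the form pre-activity × post-activity × neuromodulatory gate. The learning rate, decay constant, spike probabilities, and reward schedule are illustrative placeholders, not values from the cited study:

```python
import numpy as np

def three_factor_update(w, pre, post, neuromodulator, trace, lr=0.01, decay=0.9):
    """One step of a toy three-factor learning rule.

    pre, post      : activity of presynaptic and postsynaptic units (0/1 spikes here)
    neuromodulator : broadcast scalar gate (a dopamine-like relevance/reward signal)
    trace          : eligibility trace remembering recent pre-post coincidences

    Weights change only when the local coincidence trace AND the gate are nonzero,
    so identical spike patterns can produce learning or no learning depending on
    behavioral relevance ("gated plasticity"). No global error signal is used.
    """
    trace = decay * trace + np.outer(post, pre)   # local Hebbian coincidence, decaying
    w = w + lr * neuromodulator * trace           # plasticity gated by the third factor
    return w, trace

rng = np.random.default_rng(1)
w = np.zeros((4, 8))                              # 8 presynaptic -> 4 postsynaptic weights
trace = np.zeros_like(w)
for step in range(100):
    pre = (rng.random(8) < 0.2).astype(float)     # sparse presynaptic spikes
    post = (rng.random(4) < 0.2).astype(float)    # sparse postsynaptic spikes
    reward = 1.0 if step % 10 == 0 else 0.0       # gate opens only on "relevant" steps
    w, trace = three_factor_update(w, pre, post, reward, trace)
print("mean |w| after gated updates:", round(float(np.abs(w).mean()), 4))
```

The structural point is that every quantity the update touches is local to the synapse except one broadcast scalar - no backward pass, no weight transport.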
The industry response is shifting. By late 2024, leaders who built their careers on scaling began hedging. Marc Andreessen reported that current models are “sort of hitting the same ceiling on capabilities.” OpenAI’s o1 models represent this pivot, performing explicit chain-of-thought reasoning that can be extended at test time. This implicitly acknowledges that raw pattern matching cannot substitute for deliberate reasoning.

Academic analysis questions whether the scaling hypothesis is even falsifiable. A 2024 paper from Pittsburgh’s philosophy of science community argues it “yields an impoverished framework” due to its reliance on unpredictable “emergent abilities,” its sensitivity to metric choice, and the lack of construct validity in applying human intelligence tests to language models. The strong claim that intelligence emerges automatically from scale remains unproven and increasingly challenged.

A deeper look at the scaling paradox: if intelligence truly emerged from scaling, we’d see consistent improvements with more parameters. But we don’t. Even with GPT-4’s rumored trillion-plus parameters, performance on compositional tasks plateaus far short of human reliability. This isn’t an engineering problem - it’s a fundamental mismatch between how we model intelligence and how intelligence actually works.

The real question: what if intelligence isn’t about pattern recognition but about biological computation? That’s the insight we’re missing in our scaling approach.

How to Fix AI Without Scaling

The path forward isn’t bigger models - it’s smarter designs.

* Build event-driven systems: instead of processing all data simultaneously (like transformers), mimic the brain’s sparse coding, where only 5-10% of neurons activate at any moment. Intel’s Loihi 2 chip already does this, running 1 million neurons at about 1 watt.
* Use neuromorphic hardware: IBM’s NorthPole chip achieves 22x faster inference than GPUs on vision tasks while using 25x less energy. It’s not just better hardware - it’s biologically inspired architecture.
* Prioritize local learning: backpropagation requires global error signals. Brains use local plasticity - no backward passes needed. This avoids the weight transport problem and the non-local credit assignment that plague backpropagation-trained networks.

Real-world impact:

* World models like V-JEPA 2 let robots grasp objects without task-specific training (Meta, 2025).
* AlphaGeometry combines neural and symbolic reasoning to solve olympiad-level geometry problems - evidence that hybrid approaches work better than pure scaling.

Let’s examine a practical application: surgical decision support running on Loihi 2 achieves a 94% energy reduction versus GPUs while maintaining sub-50ms response times - critical for life-saving interventions. For a surgical robot, NorthPole’s faster inference likewise translates into faster decisions in emergency situations. This isn’t just “better efficiency” - it’s biologically inspired architecture that replicates what brains do naturally.

The key is architectural change, not scale. Consider how the brain handles visual processing: it doesn’t process every pixel in detail - it extracts essential features through hierarchical processing. Transformers, by contrast, process inputs as flat sequences of tokens rather than through this kind of spatial hierarchy. A toy sketch of event-driven processing follows.
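Here is that sketch: a generic leaky integrate-and-fire neuron in plain Python (not the Loihi 2 programming model or any vendor API - the threshold, leak, and weight are arbitrary), showing the essence of event-driven computation: work happens only when an input event arrives.

```python
def lif_neuron(input_events, threshold=1.0, leak=0.95, weight=0.4):
    """Leaky integrate-and-fire neuron driven by a sparse stream of events.

    input_events: iterable of (timestep, spike) pairs. Between events the only
    "cost" is a closed-form decay applied lazily, so silence is literally free.
    Returns the timesteps at which the neuron itself emits a spike.
    """
    potential, last_t, output_spikes = 0.0, 0, []
    for t, spike in input_events:
        potential *= leak ** (t - last_t)   # passive decay since the last event
        last_t = t
        potential += weight * spike         # integrate the incoming event
        if potential >= threshold:          # fire and reset on threshold crossing
            output_spikes.append(t)
            potential = 0.0
    return output_spikes

# A sparse stream: across 1,000 timesteps only these six events cost any work.
events = [(3, 1), (5, 1), (6, 1), (400, 1), (401, 1), (402, 1)]
print(lif_neuron(events))   # -> [6, 402]: spikes only when inputs cluster in time
```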
Let’s explore a specific implementation: the Hala Point system - announced in April 2024 - deploys 1,152 Loihi 2 processors containing 1.15 billion neurons and 128 billion synapses while consuming a maximum of 2,600 watts. This isn’t “scaling” - it’s biologically inspired architecture that replicates what brains do naturally.

The path forward requires multiple innovations working together:

* The efficiency of event-driven computation
* The compositional rigor of symbolic reasoning
* The predictive power of world models
* The flexibility of neural pattern recognition
* The developmental self-organization that shapes biological intelligence

The next breakthrough in AI may come not from training a larger transformer, but from architectures that learn more like brains actually do.

Counterarguments: Why Scaling Might Still Work

A reasonable objection is that scaling might eventually work. After all, models like GPT-4 show remarkable capabilities. But this overlooks the fundamental difference between what these systems do and how brains process information. The strongest version of this view holds that:

* Transformers can eventually overcome current limitations.
* The brain’s mechanisms aren’t yet understood well enough to replicate.

Here’s the response: these objections stem from an overestimation of transformer capabilities and an underestimation of biological complexity. The brain’s mechanisms - like spike-timing-dependent plasticity - don’t require global error signals; they use millisecond-precise timing to detect causal relationships. This is fundamentally different from transformer architectures that process static inputs.

The evidence is clear. Neuromorphic hardware already approaches brain-like efficiency while scaling to billion-neuron systems - 47x more efficient audio spectrogram encoding, a 90x computation reduction in optical flow, and a 94% energy reduction with sub-50ms latency for surgical decision support on Loihi 2 (figures detailed below).

Why scaling won’t solve it: the ARC benchmark shows that compositional tasks require understanding relationships, not just memorization. Humans solve them because we understand how things work together. Transformers lack this because they can’t replicate the brain’s gated-plasticity mechanisms.

Let’s examine the practical implications: consider a robot trying to grasp a cup. A transformer might recognize the cup’s shape from thousands of training examples, but it won’t understand how to manipulate it in real time. The brain learns through sensorimotor interaction and context - exactly what the V-JEPA 2 system demonstrates. This is more than theoretical: the 2024 study discussed earlier, in which blocking serotonin alone enhanced reward learning, shows that biological systems operate through mechanisms transformers simply can’t replicate.

Why This Isn’t About “Smarter” AI

Bringing the brain into AI isn’t about replacing transformers. It’s about:

* Energy efficiency: brains run on 12-20 watts, while training GPT-4 consumed on the order of 50,000+ megawatt-hours.
* Developmental plasticity: humans learn through critical periods - AI lacks this.
* Embodied understanding: robots learn by doing (V-JEPA 2) rather than by processing static text.

The biggest mistake? Assuming intelligence emerges from “scaling.” It doesn’t.
The brain’s architecture - dendritic computation, glial cells, neuromodulators - creates intelligence at the systems level. Scaling transformers won’t replicate this.

Consider another concrete example: a child learning to ride a bike. They don’t just memorize instructions - they develop the skill through hands-on experience. Similarly, biological intelligence emerges from sensorimotor interaction with the environment, not from static datasets. This isn’t about “AI being too small.” It’s about biological intelligence operating through mechanisms we’ve ignored. Scaling transformers won’t fix this. The path forward requires architectures that mimic how brains process information.

Let’s examine the developmental aspect: critical periods in human learning - such as language acquisition - require specific environmental input during windows of opportunity. AI lacks this because it doesn’t develop through interaction. Another example: the visual system’s critical period for ocular dominance is well studied - deprivation during this window produces permanent deficits. Language acquisition shows similar constraints, with second-language learning after puberty becoming “conscious and labored.” These aren’t just human traits - they’re biological mechanisms that transformers ignore.

The implications for AI: if we build AI purely on transformers, we’ll never achieve the embodied intelligence that humans naturally develop through experience. This isn’t a technical limitation - it’s a design gap that scaling alone can’t bridge.

The Brain’s Architecture

Biological neurons aren’t simple switches - they’re sophisticated computational engines. Each neuron contains 3-5 independent computational subunits within its dendritic tree, with different branches exhibiting distinct integration rules: proximal inputs sum linearly while distal inputs are amplified with high gain. This lets a single neuron detect complex patterns like XOR - something a standard artificial neuron can’t do (a toy code sketch appears below). The brain’s 86 billion neurons thus contain hundreds of billions of effective computational units. This isn’t just “more processing power” - it’s parallel computation that works in ways transformers simply can’t replicate. The brain also exploits spike timing itself: patterns are detected through when neurons fire, not just how strongly.

The role of glial cells: for decades, the brain’s non-neuronal cells were dismissed as mere support infrastructure. That view is now obsolete. Astrocytes, which comprise roughly 20% of brain cells, each contact up to one million synapses in the hippocampus. They exhibit calcium-based excitability operating on seconds-to-minutes timescales - a “slow computation” channel complementing neurons’ millisecond-scale processing.
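Here is the toy dendritic sketch promised above: a unit whose two nonlinear “branches” feed a somatic threshold can compute XOR, while a single weighted-sum-and-threshold unit cannot, because XOR is not linearly separable. The weights and thresholds are hand-picked for illustration, not fitted to any biological data:

```python
def step(x, threshold):
    """Hard threshold - a crude stand-in for a dendritic or somatic spike."""
    return 1.0 if x >= threshold else 0.0

def point_neuron(x1, x2):
    """Classic artificial neuron: one weighted sum, one threshold.
    No setting of two weights and one threshold can realize XOR;
    this particular setting computes AND."""
    return step(1.0 * x1 + 1.0 * x2, 1.5)

def dendritic_neuron(x1, x2):
    """Toy neuron with two nonlinear dendritic subunits feeding the soma.
    Branch A fires for (x1 and not x2), branch B for (x2 and not x1);
    the soma fires if either branch fires - i.e. XOR, inside one 'neuron'."""
    branch_a = step(1.0 * x1 - 1.0 * x2, 1.0)
    branch_b = step(1.0 * x2 - 1.0 * x1, 1.0)
    return step(branch_a + branch_b, 1.0)

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print((x1, x2), "->", dendritic_neuron(x1, x2))   # prints 0, 1, 1, 0
```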
The “tripartite synapse” concept: introduced by Araque et al. in 1998, this recognizes that synaptic transmission involves not two parties but three: the presynaptic neuron, the postsynaptic neuron, and an astrocytic process. Astrocytes release neuroactive substances including glutamate, D-serine, and ATP that modulate synaptic transmission. IBM researchers demonstrated that neuron-astrocyte networks achieve the best-known scaling for memory capacity among biological dense associative memory implementations. This isn’t just “better memory” - it’s biologically inspired architecture that replicates what brains do naturally.

Microglia and neural pruning: traditionally viewed as immune cells, microglia sculpt neural circuits through complement-dependent synaptic pruning. Wang et al. (2020) found that depleting microglia after learning extended memory retention, implicating these cells in adaptive forgetting.

The efficiency gap: the human brain operates on approximately 12-20 watts - roughly the power of a dim light bulb - while processing information across 86 billion neurons. Training GPT-4 consumed an estimated 51,773-62,319 megawatt-hours, equivalent to powering 1,000 US homes for five to six years. A single GPT-4o query requires 0.3-0.42 watt-hours; with ChatGPT serving roughly one billion queries daily, inference alone demands continuous power equivalent to a small power plant (a quick numeric check of this claim appears below). The 200-million-fold efficiency gap stems from fundamental architectural differences: biological brains achieve efficiency through sparse coding (only 5-10% of neurons fire at any moment), event-driven computation (no processing when nothing changes), co-located memory and computation (eliminating the von Neumann bottleneck), and local learning rules (no global gradient computation).

Neuromorphic hardware: Intel’s Loihi 2 chip supports 1 million neurons and 120 million synapses at approximately one watt, while the Hala Point system scales to 1.15 billion neurons. In April 2025, researchers demonstrated the first large language model running on neuromorphic hardware at ICLR, suggesting these architectures may eventually support sophisticated language processing.

Benchmark results: neuromorphic systems achieve 47x more efficient spectrogram encoding from audio and a 90x computation reduction in optical flow compared to conventional deep learning. Surgical decision support on Loihi 2 showed a 94% energy reduction versus GPUs with sub-50ms response times.

The neuromorphic ecosystem is expanding: SynSense’s Speck chip operates at 0.7 milliwatts for real-time visual processing. BrainScaleS-2 at Heidelberg University provides analog neuromorphic computing at 1,000-10,000x biological time acceleration for research applications. SpiNNcloud partnered with Sandia National Labs in May 2024 on national defense applications, signaling growing military interest.

Conclusion: Architecture Matters as Much as Scale

The evidence assembled here challenges the assumption that general intelligence will emerge from scaling current architectures. Biological brains achieve their capabilities through mechanisms fundamentally different from transformers: dendritic computation multiplies the effective neuron count, glial cells participate actively in information processing, local learning rules eliminate the need for global gradient computation, and neuromodulators provide context-dependent control over plasticity. The 200-million-fold energy efficiency gap between brains and AI suggests these differences are not cosmetic but fundamental.
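Before turning to the alternatives, a quick back-of-the-envelope check on the inference-power claim above, treating the per-query energy and query volume quoted in this article as the rough estimates they are:

```python
# Rough check of the "small power plant" claim using the figures quoted above.
queries_per_day = 1e9                            # ~1 billion ChatGPT queries/day (cited estimate)
wh_per_query_low, wh_per_query_high = 0.3, 0.42  # Wh per GPT-4o query (cited estimate)

daily_mwh_low = queries_per_day * wh_per_query_low / 1e6    # Wh/day -> MWh/day
daily_mwh_high = queries_per_day * wh_per_query_high / 1e6
avg_mw_low = daily_mwh_low / 24                             # MWh/day -> average MW
avg_mw_high = daily_mwh_high / 24

print(f"daily inference energy: {daily_mwh_low:.0f}-{daily_mwh_high:.0f} MWh")
print(f"average continuous draw: {avg_mw_low:.1f}-{avg_mw_high:.1f} MW")
# -> roughly 300-420 MWh per day, a continuous draw of about 12-18 MW:
#    small-power-plant territory, versus roughly 20 W for a human brain.
```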
Alternative architectures are maturing rapidly. State space models offer linear-time sequence processing competitive with transformers. World models enable sample-efficient learning and planning from imagined experience. Neuromorphic hardware approaches brain-like efficiency while scaling to billion-neuron systems. Neurosymbolic integration achieves breakthroughs on mathematical reasoning that pure neural approaches cannot match. Each addresses limitations inherent to transformer architecture rather than simply scaling it further.

The path forward likely requires multiple innovations working together: the efficiency of event-driven computation, the compositional rigor of symbolic reasoning, the predictive power of world models, the flexibility of neural pattern recognition, and the developmental self-organization that shapes biological intelligence. The next breakthrough in AI may come not from training a larger transformer, but from architectures that learn more like brains actually do.

The ultimate truth: we’ve been building systems that process text - not intelligence. The brain’s architecture creates intelligence at the systems level. Scaling transformers won’t replicate this. The path forward requires architectures that mimic how brains process information.
