Mark Lubin


Toward Emergent Memory Architecture for AI Systems

Current LLM + memory augmentation approaches face a fundamental architectural limitation: they attempt to add memory to a frozen system that cannot truly learn. This is analogous to giving someone with amnesia an excellent notebook — helpful, but not equivalent to actual memory.


Biological Analogy

Two distinct biological learning systems are worth distinguishing:

System        Substrate    Timescale    Function
Phylogenetic  DNA/Genome   Generations  Constructs the brain itself
Ontogenetic   Synapses     Lifetime     Learns from experience

Critically, one constructs the other. DNA doesn't store memories — it encodes the machinery capable of forming memories.


Mapping to Current AI

Biological   AI Analog                                 Modifiable?
DNA          Architecture definition + training code   No
Development  Training run                              One-time
Body         Trained model (weights)                   No (frozen)
Brain        Missing                                   N/A

The “brain” — a system with genuine plasticity that continues learning after deployment — has no analog in current LLM systems. Memory augmentation is prosthetic, not neural.


The Selective Pressure Gap

In biology, selective pressure operates on individuals, and the results propagate back into the architecture through reproduction. In LLMs, that loop is broken: existing feedback mechanisms (RLHF, fine-tuning) are offline, aggregated, and human-mediated. They are not continuous selective pressure.


Proposed Architecture: Dual-Component System

┌─────────────────────────────────────┐
│  Online Learning Controller         │  ← Has plasticity, actually learns
│  (planning, memory, tool use)       │
└──────────────┬──────────────────────┘
               │ orchestrates
               ▼
┌─────────────────────────────────────┐
│  Frozen LLM Scaffold                │  ← Reasoning/linguistic capacity
└─────────────────────────────────────┘

The controller learns through RLHF-style feedback while the LLM provides stable reasoning infrastructure.
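A minimal sketch of this split, assuming a hypothetical `frozen_llm` stand-in and a toy controller that adapts action preferences from scalar feedback (all names and the preference-update rule are illustrative, not an existing API):

```python
def frozen_llm(prompt: str) -> str:
    """Stand-in for the frozen scaffold: fixed behavior, no learning."""
    return f"response to: {prompt}"

class Controller:
    """Online component with plasticity: the only part that changes post-deployment."""
    def __init__(self, actions):
        self.prefs = {a: 0.0 for a in actions}

    def act(self):
        # Greedy for determinism; a real controller would also explore.
        return max(self.prefs, key=self.prefs.get)

    def update(self, action, reward, lr=0.1):
        # Simple preference update driven by RLHF-style scalar feedback.
        self.prefs[action] += lr * (reward - self.prefs[action])

ctrl = Controller(["retrieve_memory", "call_tool", "answer_directly"])
ctrl.update("retrieve_memory", reward=1.0)
ctrl.update("answer_directly", reward=-0.5)
choice = ctrl.act()
print(choice, "->", frozen_llm("user query"))
```

The point of the split: feedback only ever updates `Controller.prefs`, while `frozen_llm` stays untouched, mirroring the plasticity/scaffold division above.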


The Encoding Problem

Even with a learned controller over memory primitives, we still prespecify the primitives themselves (put, get, search, compress) and the data encoding (how information is represented).
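To make the problem concrete, here is a toy version of those prespecified primitives; the substring `search` and truncation-based `compress` are deliberately crude illustrations of designer-fixed encoding choices, not claims about any real system:

```python
class MemoryStore:
    """Hand-designed primitives. A learned controller may choose *when* to call
    these, but the operations and the encoding itself are fixed by the designer."""
    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)

    def search(self, query):
        # Fixed encoding assumption: values are strings, search is substring match.
        return [k for k, v in self._data.items() if query in v]

    def compress(self, key, max_len=32):
        # Crude fixed compression: truncate. What to keep is not learned.
        v = self._data.get(key)
        if isinstance(v, str) and len(v) > max_len:
            self._data[key] = v[:max_len]

mem = MemoryStore()
mem.put("fact1", "the controller learns online; the scaffold stays frozen")
print(mem.search("frozen"))
```

Every design decision baked into this class (string values, substring matching, truncation) is exactly the kind of prespecified choice the three approaches below try to learn instead.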

Three approaches to learning memory systems:

1. Program Synthesis (Search-Based): search a space of candidate memory-management programs, scoring them by downstream task performance.

2. Policy Over Primitives (Gradient-Based): keep the primitives fixed and learn, via gradient descent, a policy for when and how to invoke them.

3. Selective Pressure on Encoding (Evolutionary): let encoding schemes themselves vary and compete, with task performance as fitness.


The Entanglement Problem

Neural networks entangle operations and content in the same weights. Unlike classical computers (code separate from data), you cannot extract “the algorithm the network learned” independent of training distribution. This connects to interpretability research — finding circuits, features, and causal structure.


Proposed Hybrid: Two-Timescale Learning

Fast timescale (gradient-based):
  Controller policy adapts to current encoding
  Individual learning, within-lifetime

Slow timescale (evolutionary):
  Encoding schemes compete in population
  Winners reproduce, losers die
  Structure emerges from selection

This mirrors biology: fast gradient-based adaptation plays the role of within-lifetime (ontogenetic) learning, while slow evolutionary search over encodings plays the role of phylogenetic selection.

Requirements for true selective pressure on encoding

  1. Population — many encoding variants
  2. Variation — mutations in encoding space
  3. Selection — task performance determines fitness
  4. Reproduction — good encodings proliferate

This is evolutionary computation applied to the encoding scheme itself — neural architecture search (NAS), but for memory encoding, running continuously with real task feedback as fitness.
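The four requirements above can be sketched as a toy two-timescale loop. Everything here is illustrative: an "encoding" is reduced to a single integer (a vector dimensionality), fast adaptation is modeled as noisy local evaluation rather than real gradient steps, and fitness is distance to a synthetic task optimum:

```python
import random

random.seed(0)
TARGET = 12  # synthetic "ideal" encoding size for the task

def fitness(dim):
    # Selection signal: task performance, best when encoding matches the task.
    return -abs(dim - TARGET)

def fast_adapt(dim, steps=3):
    # Fast timescale: within-lifetime controller tuning for a fixed encoding,
    # modeled here as a few noisy evaluations (a stand-in for gradient steps).
    best = fitness(dim)
    for _ in range(steps):
        best = max(best, fitness(dim) + random.uniform(-0.5, 0.5))
    return best

def evolve(pop_size=8, generations=20):
    # Slow timescale: population, variation, selection, reproduction.
    pop = [random.randint(1, 64) for _ in range(pop_size)]      # 1. population
    for _ in range(generations):
        scored = sorted(pop, key=fast_adapt, reverse=True)      # 3. selection
        parents = scored[: pop_size // 2]                       #    winners survive
        children = [max(1, p + random.choice([-2, -1, 1, 2]))   # 2. variation
                    for p in parents]                           # 4. reproduction
        pop = parents + children
    return max(pop, key=fitness)

best = evolve()
print("best encoding size:", best)
```

The design point: `fast_adapt` never changes the encoding, only evaluates it, while `evolve` never touches within-lifetime state — the two timescales communicate only through fitness, as in the biological analogy.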


Key Insight

Memory, learning, compression, and attention are biologically unified but architecturally separated in current AI. Memory augmentation approaches may be ceiling-limited — useful for current practical applications, but insufficient for true adaptive intelligence.

The deeper problem isn't “how do we give LLMs better memory” but “how do we give LLMs a brain at all” — a component with genuine plasticity where individual experience feeds back into architecture through selective pressure.


Open Questions

  1. Can encoding search be made gradient-friendly (DARTS-style) without bounding the search space?
  2. What's the minimal viable “brain” that provides plasticity without catastrophic forgetting?
  3. How do you create continuous selective pressure on encoding without evolutionary timescales?
  4. Is there a way to separate “algorithm learned” from “content learned” in neural networks?

Related Research