research·May 2026·9 min read

Three architectures of agent memory — and why Trail picked Compile

Karpathy, Tan, and Liu all start from the same diagnosis: your agent is a retriever, not a thinker. They reach three different architectures: retrieve (RAG), compile (LLM Wiki), and act (Fat Skills / GBrain). Trail picked Compile, deliberately. This essay sets out the trade-offs we accept and why we believe the three architectures are converging.

The shared diagnosis everyone keeps re-deriving

When Andrej Karpathy posted his "LLM Wiki" gist in April 2026, it hit five thousand stars in days. When Garry Tan open-sourced GBrain a few weeks later, the same crowd was reading both. The two systems take opposite engineering directions, but they start from an identical complaint: an agent that holds a million tokens in context still re-reads its source material from scratch every session, never compounds, never learns, never connects yesterday's insight to today's question. Karpathy put it sharply: RAG rereads the same books for every exam, never actually learning the material. The Luxembourg-based finance practitioner Yanli Liu, surveying the field for AI Advances in late April, summarised the diagnosis most cleanly: it's a retriever, not a thinker.

This is the consensus, and it has been the consensus for at least two years. What is interesting is what people are doing about it.

Liu, observing the field after both Karpathy and Tan had shipped their respective patterns, identified three distinct architectural responses. RAG is the retrieve-pattern, mature and ubiquitous. The LLM Wiki is the compile-pattern, a public artefact in April but in fact much older: Niklas Luhmann was running a paper-based instance of it from 1952 to 1998. GBrain is the act-pattern, an autonomous-skill harness built around twenty-one cron jobs and seventeen thousand interlinked pages. The three responses are not in competition. They are solving different versions of the same problem, and the question of which one to use turns out to depend on a question most teams skip: what is your agent's job?

Trail picked one of these three. We picked it deliberately. This essay is the explanation of which, and why, and what we know we are giving up.

Three patterns, three trade-offs

The retrieve-pattern is the path of least resistance and the path most production teams take. You embed your documents into vectors, store them in Pinecone or Chroma or pgvector, and when a query arrives you find the nearest chunks, inject them into the prompt, and let the model generate. The architecture is mature. The frameworks have standardised around it. You can ship a working internal-knowledge assistant in a long weekend.
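To make the retrieve loop concrete, here is a minimal sketch in Python. It assumes a toy bag-of-words embedding in place of a real embedding model and an in-memory list in place of Pinecone, Chroma or pgvector. The names embed, top_k and build_prompt are ours, chosen for illustration, and do not come from any particular framework.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (a stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks nearest to the query."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


def build_prompt(query: str, chunks: list[str], k: int = 2) -> str:
    """Inject the nearest chunks into the prompt handed to the model."""
    context = "\n".join(f"- {c}" for c in top_k(query, chunks, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


# Hypothetical corpus, already split into chunks.
CHUNKS = [
    "Invoices must be archived for seven years.",
    "The cafeteria opens at eight in the morning.",
    "Archived invoices are stored in the finance vault.",
]

if __name__ == "__main__":
    print(build_prompt("How long are invoices archived?", CHUNKS))
```

Nothing in this loop persists between queries: the same chunks are scored again on every call, which is the property the rest of this section turns on.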

What you cannot do with retrieve is build expertise that compounds. Liu walks through three failure modes — chunking, re-derivation, passivity — and the first two are structural rather than fixable with better tooling. When a thirty-page technical specification gets split into five-hundred-token fragments, the chunk that mentions a compliance requirement and the chunk that explains why the requirement exists land in different vectors. The retriever finds one and misses the other. Your agent gives a technically correct but dangerously incomplete answer. This is not a chunk-size tuning problem. It is what retrieval is.
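The chunking failure is easy to reproduce on a small scale. The sketch below is illustrative only: it splits on a fixed word count rather than five hundred tokens, and the specification text, the requirement label R7 and the helper chunk_words are all made up for the example.

```python
def chunk_words(text: str, size: int) -> list[str]:
    """Split text into consecutive fragments of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


# Hypothetical specification: a requirement, some unrelated text, then the
# reason the requirement exists.
SPEC = (
    "Requirement R7: all exports must be encrypted at rest. "
    + "Filler sentence about formatting. " * 6
    + "Rationale for R7: unencrypted exports leaked in a past audit."
)

chunks = chunk_words(SPEC, size=12)

# Locate the fragment holding the requirement and the one holding its rationale.
requirement_chunk = next(i for i, c in enumerate(chunks) if "Requirement R7" in c)
rationale_chunk = next(i for i, c in enumerate(chunks) if "Rationale" in c)

if __name__ == "__main__":
    print(requirement_chunk, rationale_chunk)
```

The requirement and its rationale land in different fragments, so a retriever that returns only the best-matching fragment for a question about R7 can surface the rule without the reason behind it.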

The compile-pattern attacks this directly by moving the synthesis work from query-time to write-time. You read a new source through a language model that has already loaded the existing wiki. The model writes summary pages, updates affected entity profiles, files cross-references, flags contradictions, and lands the result back as durable markdown. Future queries do not retrieve raw chunks; they read against the compiled product. Liu's vivid framing for what happens at ingest is that a single new document touches ten to fifteen wiki pages — every one of them slightly richer than it was yesterday. This is the architecture Karpathy described and the one we built Trail around.
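A heavily simplified sketch of write-time synthesis follows. It is not Trail's implementation or Karpathy's: plain substring matching stands in for the language model, the list of known entities is assumed to exist up front, and contradiction flagging is left out. The Wiki class and its method names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Wiki:
    # Page title -> accumulated notes. A real system would hold markdown files.
    pages: dict[str, list[str]] = field(default_factory=dict)

    def ingest(self, source_id: str, text: str, known_entities: list[str]) -> list[str]:
        """Update every page the new source touches; return the touched pages."""
        lowered = text.lower()
        # Stand-in for the model deciding which entity pages are affected.
        mentioned = [e for e in known_entities if e.lower() in lowered]
        for entity in mentioned:
            related = [e for e in mentioned if e != entity]
            note = f"[{source_id}] {text}"
            if related:
                # File a cross-reference to the other pages this source touched.
                note += " (see also: " + ", ".join(related) + ")"
            self.pages.setdefault(entity, []).append(note)
        return mentioned

    def read(self, entity: str) -> str:
        """Query-time read against the compiled page, not the raw sources."""
        return "\n".join(self.pages.get(entity, []))


if __name__ == "__main__":
    wiki = Wiki()
    entities = ["ginger", "nausea", "insomnia"]
    wiki.ingest("s1", "Ginger eases nausea.", entities)
    wiki.ingest("s2", "Ginger may thin the blood.", entities)
    print(wiki.read("ginger"))
```

The point of the shape is that the second ingest adds to the page the first ingest created, so a later read sees both sources already consolidated.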

The act-pattern goes further still. Garry Tan's GBrain treats knowledge not as a thing to be queried but as a thing to be acted upon. Twenty-one cron jobs run in the background — scrape Hacker News every six hours, enrich newly mentioned entities daily, draft a digest every Monday morning. Each job is a "fat skill," a markdown file declaring its triggers, its tools, its write-targets, and whether it is allowed to mutate the brain. The intelligence lives in the skills, not in the harness. The agent is awake while you sleep.
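To show what such a declaration could look like to a harness, here is a sketch under stated assumptions. The "key: value" header format, the field names and the may_write rule are invented for this example and are not GBrain's actual file format or permission model; only the four declared properties (triggers, tools, write-targets, mutation permission) come from the description above.

```python
from dataclasses import dataclass

# Hypothetical skill file: a small header, a blank line, then the instructions.
SKILL_FILE = """\
name: hn-scrape
trigger: every 6h
tools: fetch, summarize
write_targets: inbox/hn.md
may_mutate: false

Scrape Hacker News and summarise new threads.
"""


@dataclass(frozen=True)
class Skill:
    name: str
    trigger: str
    tools: tuple[str, ...]
    write_targets: tuple[str, ...]
    may_mutate: bool
    body: str


def _split_list(value: str) -> tuple[str, ...]:
    return tuple(part.strip() for part in value.split(",") if part.strip())


def parse_skill(text: str) -> Skill:
    """Parse the assumed header-plus-body layout into a Skill."""
    header, _, body = text.partition("\n\n")
    fields = {}
    for line in header.splitlines():
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    return Skill(
        name=fields["name"],
        trigger=fields["trigger"],
        tools=_split_list(fields["tools"]),
        write_targets=_split_list(fields["write_targets"]),
        may_mutate=fields["may_mutate"].lower() == "true",
        body=body.strip(),
    )


def may_write(skill: Skill, target: str, is_existing_brain_page: bool) -> bool:
    """Assumed rule: only declared targets, and existing pages only if may_mutate."""
    if target not in skill.write_targets:
        return False
    return skill.may_mutate or not is_existing_brain_page


if __name__ == "__main__":
    skill = parse_skill(SKILL_FILE)
    print(skill.name, skill.trigger, may_write(skill, "inbox/hn.md", False))
```

The harness here does nothing clever. It reads the declaration and enforces it, which is the sense in which the intelligence sits in the skill file.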

What separates the three is not which one is correct — they are all correct for the problems they were designed to solve — but which problem you have. Retrieve handles two hundred thousand documents that change daily; it does not handle compounding expertise. Compile handles a thousand sources of deep domain material; it does not handle a million Confluence pages, and it does not act on what it knows. Act handles autonomous workflows for a single power user; it requires months of engineering investment and a level of system understanding that most knowledge workers do not have.

Why Trail picked Compile

The argument for compile, for our specific tenants and our specific positioning, runs as follows.

The problems Trail addresses are not search problems. They are accumulation problems. An acupuncturist with twenty-five years of clinical material wants the system to be smarter about her practice on her hundredth question than it was on her first. A solo founder building a knowledge base of customer interviews wants the trail of Neurons to compound into something denser than the sum of the calls. A research consultancy ingesting two hundred regulatory filings wants the third filing to update what was learned from the first two, not replace it.

In all three cases the ceiling is set by the depth of cross-reference, not by the breadth of the corpus. Most of these tenants will never have ten thousand source documents, let alone two hundred thousand. They will have a few hundred to a few thousand, each one carefully chosen, and they will return to the same KB for years. This is the regime where the compile-pattern wins decisively. It is also the regime where retrieve-only RAG quietly under-performs: it scales effortlessly, but the agent never gets smarter. The hundredth query is no better than the first, because nothing was ever consolidated.

Our trade-off, accepted explicitly, is that we are not trying to be the right tool for the half-million-page Confluence migration. That is a retrieve problem. If you have it, run RAG. We are trying to be the right tool for the two-hundred-source domain corpus where every source matters and every connection between them is valuable. There is a real frontier somewhere around five thousand Neurons per knowledge base where pure markdown navigation begins to creak; we believe we can push that ceiling considerably with FTS5 today and a vector retrieval layer when the data demands it. Liu makes the same point — at one hundred thousand sources, the LLM Wiki pattern is unusable without adding a retrieval layer on top, which then starts to look like RAG again. We agree. The convergence is real, and Trail's design assumes it.
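For readers who have not used FTS5, the sketch below shows full-text search over a handful of Neuron-like records using Python's built-in sqlite3 module. It assumes the local SQLite build includes the FTS5 extension, and the table name, column names and sample rows are hypothetical, not Trail's schema.

```python
import sqlite3


def build_index(rows: list[tuple[str, str]]) -> sqlite3.Connection:
    """Create an in-memory FTS5 index over (title, body) pairs."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE VIRTUAL TABLE neurons USING fts5(title, body)")
    conn.executemany("INSERT INTO neurons (title, body) VALUES (?, ?)", rows)
    return conn


def search(conn: sqlite3.Connection, query: str, limit: int = 5) -> list[str]:
    """Return titles of matching rows, best match first (FTS5 built-in rank)."""
    cursor = conn.execute(
        "SELECT title FROM neurons WHERE neurons MATCH ? ORDER BY rank LIMIT ?",
        (query, limit),
    )
    return [title for (title,) in cursor]


# Hypothetical sample data.
ROWS = [
    ("Ginger", "Ginger root eases nausea in early pregnancy."),
    ("Sleep protocol", "Standing protocol for insomnia uses evening sessions."),
    ("Intake notes", "Patient reports nausea after meals."),
]

if __name__ == "__main__":
    conn = build_index(ROWS)
    print(search(conn, "nausea"))
    print(search(conn, "insomnia"))
```

Keyword search of this kind needs no embedding model and no separate service, which is why it is a reasonable first step before a vector layer is justified.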

Our other trade-off, also accepted explicitly, is that the compile-pattern is passive. The system knows things. It does not, today, do things. It will not silently flag that a new clinical paper contradicts a standing protocol. It will not draft a Monday digest of last week's ingested material on its own. We have a planned feature for this — internally we call it Trail Routines, user-authored cron-fired workflows that read against the KB and emit candidates back into the curation queue — but it is a planned feature, not a shipped one. We are deferring it because we believe the foundation has to be rock-solid for one paying tenant before the action layer becomes useful rather than dangerous.

What stays the same when scale changes

The version of this argument that goes wrong is the tribal one. Compile is the right architecture, retrieve is dead, act is the future. That is not what we believe. Liu's closing thesis, which we agree with, is that the three patterns are converging. A production-grade knowledge layer built in 2026 will eventually combine all three: retrieval at the bottom to handle the scale the corpus eventually reaches, compile in the middle for the synthesis that retrieval cannot produce, and action at the top for the workflows that operate on the compiled material. Trail's design assumes this convergence and is deliberately built as the middle layer.

What this means in practice is that Trail does not try to be the agent platform. We try to be the persistent compiled memory the agent platform reads from. Our KBs are exposed to MCP clients as a first-class read interface — point any agent at the Neuron trail, and the trail becomes that agent's institutional memory. A future GBrain-style harness running on top of Trail would invoke our compiled Neurons rather than re-deriving the same understanding on every query. The compile cost has already been paid, once, at ingest. The agent reads against that work for its remaining lifetime.

The same is true at the retrieval boundary. As individual KBs grow large enough to start straining markdown-and-FTS5 navigation, we will add a vector layer that backs the same Neurons. Retrieval becomes a performance optimisation under the existing compile model rather than a replacement for it. The agent still reads compiled Neurons; the retrieval layer just helps narrow the candidate set first.
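The two-stage shape described here can be sketched as follows. This is an illustration, not Trail's code: token overlap stands in for vector similarity, and the Neuron ids, their text and the function names narrow and context_for are hypothetical.

```python
import re

# Hypothetical compiled Neurons: id -> compiled text.
NEURONS = {
    "ginger": "Ginger: eases nausea; caution with blood thinners (sources s1, s4).",
    "insomnia": "Insomnia: evening sessions preferred; see also ginger tea note (s2).",
    "billing": "Billing: invoices are sent monthly (s3).",
}


def tokens(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def narrow(query: str, neurons: dict[str, str], k: int = 2) -> list[str]:
    """Stage 1: cheap scoring picks at most k candidate Neuron ids."""
    q = tokens(query)
    ranked = sorted(neurons, key=lambda nid: len(q & tokens(neurons[nid])), reverse=True)
    # Drop candidates that share no terms with the query at all.
    return [nid for nid in ranked[:k] if q & tokens(neurons[nid])]


def context_for(query: str, neurons: dict[str, str], k: int = 2) -> list[str]:
    """Stage 2: the agent reads whole compiled Neurons, not raw source chunks."""
    return [neurons[nid] for nid in narrow(query, neurons, k)]


if __name__ == "__main__":
    for text in context_for("does ginger help nausea", NEURONS):
        print(text)
```

Swapping the scoring function for a real vector index would change stage 1 only. What the agent reads in stage 2 stays the compiled Neuron.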

This is the bet. The three architectures look like rivals in 2026 because they were proposed in isolation — Karpathy's gist, Tan's repo, the established RAG ecosystem — and the discourse around them has framed the choice as a zero-sum one. We think the dichotomy is temporary. The same way databases evolved from "pick SQL or NoSQL" into hybrid stores that handle both, agent memory will evolve from "pick retrieve, compile, or act" into a layered stack where each layer does one thing well. Trail's choice is to be the durable, governed, schema-bound layer in the middle that the others read against and write back to.

The decision framework, restated

The question Liu asks at the end of her piece is the right one for any team thinking about this. What is your agent's job? If the job is finding answers in a large corpus that changes daily, run RAG and pair it with a reranker; you will cover most of what enterprise knowledge assistants are asked to do. If the job is operating autonomously on workflows that recur, budget months for engineering and build the skill layer carefully; the payoff is an agent that does, not just responds. If the job is in between — building expertise that should compound over time, where the value is in the connections between sources rather than in any single document — that is the regime where the compile-pattern earns its keep, and that is where Trail lives.
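One way to write the framework down is as a small function. This is our reading of it, with hypothetical parameter names, and it returns a list rather than a single answer because a job can span more than one layer.

```python
from enum import Enum


class Pattern(Enum):
    RETRIEVE = "retrieve"
    COMPILE = "compile"
    ACT = "act"


def layers_for(
    *,
    large_changing_corpus: bool,
    expertise_should_compound: bool,
    recurring_autonomous_workflows: bool,
) -> list[Pattern]:
    """Map a description of the agent's job to the layers it needs, bottom to top."""
    layers = []
    if large_changing_corpus:
        layers.append(Pattern.RETRIEVE)
    if expertise_should_compound:
        layers.append(Pattern.COMPILE)
    if recurring_autonomous_workflows:
        layers.append(Pattern.ACT)
    if not layers:
        raise ValueError("Name the agent's job before picking an architecture.")
    return layers


if __name__ == "__main__":
    # A few hundred carefully chosen sources, revisited for years.
    print(layers_for(
        large_changing_corpus=False,
        expertise_should_compound=True,
        recurring_autonomous_workflows=False,
    ))
```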

The teams that get this wrong tend to default to RAG because it is easiest to ship and then notice, six months in, that their agent never got smarter. That is not a tooling problem. It is a category mismatch. Retrieve-shaped problems are fine; compile-shaped problems are different. Naming the difference up front is the cheapest piece of architecture work you can do.

We picked Compile. We picked it for a class of users we know how to serve. We accept the ceilings, and we have a plan for them. We do not believe Compile wins everywhere, and we do not believe RAG is dead. We believe each architecture is the right answer to a different question, and we believe knowing which question your agent is actually trying to answer matters more than which architecture is currently fashionable.

Start with the job. The architecture follows.