Technical Notes
Memory is not a search problem

Thrindex
Ask most teams how their agent's memory works and the answer is some version of the same sentence: we embed the text, store the vectors, and run a similarity search at query time. That sentence describes a search engine. It does not describe a memory system, and the difference is the entire problem.
Search and memory look alike from the outside. Both take a query and return stored items. But they answer different questions. A search engine answers "what did I store that resembles this?" A memory system has to answer "what is true right now that I should act on?" The first is a geometry problem: distance between vectors. The second is not a geometry problem at all.
Where the work actually is
It helps to separate a memory system into two paths, because they have opposite shapes.
The read path is what runs while the agent waits. It must be fast — tens of milliseconds, not hundreds — because every millisecond here is latency the user feels. The temptation is to make it smart: call a model, reason about the query, traverse a graph of related facts. Every one of those choices is a mistake. Anything expensive on the read path is latency you cannot get back.
The write path is what runs when a new memory arrives. The agent is not waiting on it. This is where every expensive, intelligent operation belongs — and there are more of them than people expect:
| Operation | Question it answers | Why it cannot live on the read path |
|---|---|---|
| Deduplication | Have we already stored this? | Comparing against the corpus is too slow to do per query |
| Importance scoring | How much should this memory matter? | Needs signals that take time to compute |
| Compression | What is the short version of this? | Summarizing is expensive; do it once, not per read |
| Conflict resolution | Does this contradict something we know? | Requires looking up and reasoning over related memories |
The pattern is the same across all four rows. Each is a piece of cognition — judgment about the memory, not just storage of it. None of it can happen while an agent is waiting for an answer. So it happens before: continuously, in the background, every time a memory comes in. By the time a query arrives, the thinking is already done. The read path only has to look up the result.
This is the load-bearing idea, and it is worth stating plainly: a fast memory system is not fast because retrieval is optimized. It is fast because all the slow work was moved off the read path entirely.
    WRITE PATH (agent is not waiting)           READ PATH (agent is waiting)
    ────────────────────────────────            ────────────────────────────
    new memory                                  query
        │                                         │
        ▼                                         ▼
    dedup → score → compress → resolve          look up pre-computed result
        │   (cognition, runs in background)       │   (no model calls, no graph walk)
        ▼                                         ▼
    durable store ────────────────────────────▶ ranked answer
    (the slow, smart work lands here)           target: tens of ms
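The split in the diagram can be sketched in a few lines of Python. This is a minimal illustration, not a reference design: `MemoryStore`, `Memory`, `score_importance`, and `compress` are hypothetical names, the two helpers are toy stand-ins for what would normally be model calls, and "the newest statement wins" is an assumed conflict policy. What it is meant to show is the shape: all four operations run inside `write`, and `read` is a plain lookup.

```python
import hashlib
from dataclasses import dataclass, field


def score_importance(text: str) -> float:
    """Toy stand-in for a real scorer: strong wording counts as important."""
    words = set(text.lower().replace(".", " ").split())
    return 1.0 if words & {"must", "always", "never"} else 0.5


def compress(text: str) -> str:
    """Toy stand-in for a summarizer: keep only the first sentence."""
    first = text.split(". ", 1)[0]
    return first if first.endswith(".") else first + "."


@dataclass
class Memory:
    subject: str            # what the memory is about (hypothetical key)
    text: str               # raw content as it arrived
    summary: str = ""       # filled in by the compress step
    importance: float = 0.0
    superseded: bool = False


@dataclass
class MemoryStore:
    by_subject: dict = field(default_factory=dict)  # subject -> current Memory
    seen: set = field(default_factory=set)          # fingerprints for dedup

    # Write path: the agent is not waiting, so the slow work goes here.
    def write(self, subject: str, text: str) -> bool:
        normalized = " ".join(text.lower().split())
        fingerprint = hashlib.sha256(normalized.encode()).hexdigest()
        if fingerprint in self.seen:                 # dedup
            return False
        self.seen.add(fingerprint)

        memory = Memory(subject=subject, text=text)
        memory.importance = score_importance(text)   # score
        memory.summary = compress(text)              # compress

        previous = self.by_subject.get(subject)      # resolve
        if previous is not None:
            previous.superseded = True               # assumption: newest wins
        self.by_subject[subject] = memory
        return True

    # Read path: the agent is waiting, so this is only a dictionary lookup.
    def read(self, subject: str):
        return self.by_subject.get(subject)
```

Writing "The user lives in Lisbon." and then "The user moved to Porto. They must be reachable there." under the same subject leaves the second memory as the one `read` returns, already summarized and scored, with no model call at query time.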
What this changes
Once memory is framed this way, a lot of design questions answer themselves.
You stop asking which vector database is fastest and start asking what cognition runs on the write path, because that is what determines whether the answers are any good. A faster vector search returns the wrong memory faster. It does not return a better one.
You stop treating the store as a passive bucket and start treating it as something that is always working — comparing, scoring, compressing, resolving — even when no agent is talking to it. The memory is being maintained between queries, not just during them.
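One way to picture a store that keeps working between queries is a background worker draining a queue. The sketch below uses only the Python standard library; `BackgroundMemory`, `submit`, `lookup`, and `drain` are hypothetical names, and `process` stands in for whatever write-path cognition a real system would run.

```python
import queue
import threading


class BackgroundMemory:
    def __init__(self, process):
        self._process = process        # hypothetical write-path function
        self._inbox = queue.Queue()    # memories waiting to be processed
        self._ready = {}               # key -> pre-computed result
        self._lock = threading.Lock()
        worker = threading.Thread(target=self._run, daemon=True)
        worker.start()

    def submit(self, key, raw):
        """Called when a memory arrives; returns without doing the work."""
        self._inbox.put((key, raw))

    def _run(self):
        # Runs for the life of the process, whether or not anyone is querying.
        while True:
            key, raw = self._inbox.get()
            try:
                result = self._process(raw)          # the slow part
                with self._lock:
                    self._ready[key] = result
            except Exception:
                pass  # sketch only: a real system would log or retry
            finally:
                self._inbox.task_done()

    def lookup(self, key):
        """Read path: returns whatever has already been computed."""
        with self._lock:
            return self._ready.get(key)

    def drain(self):
        """Block until the inbox is empty (useful in tests, not on the read path)."""
        self._inbox.join()
```

A single in-process thread is the simplest possible version of this. The same shape applies if the worker is a separate service or a job queue; the point is only that `submit` and `lookup` never wait on `process`.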
And you stop measuring quality by similarity alone. Similarity tells you a returned memory is related to the query. It does not tell you the memory is current, or important, or relevant to the task the agent is actually doing. Those are separate signals, and a memory system worth the name has to account for all of them.
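As a rough illustration of treating these as separate signals, the sketch below blends similarity with recency, importance, and task relevance. Everything here is an assumption made for the example: the `Candidate` fields, the weights, and the 30-day half-life are hypothetical and would need tuning against real data.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    text: str
    similarity: float   # 0..1, from the vector search
    importance: float   # 0..1, pre-computed on the write path
    age_days: float     # time since the memory was last confirmed
    task_match: float   # 0..1, relevance to the agent's current task


def rank_score(c: Candidate, half_life_days: float = 30.0) -> float:
    # Recency halves every `half_life_days`; a fresh memory scores close to 1.
    recency = 0.5 ** (c.age_days / half_life_days)
    # Hypothetical weights: similarity still matters most, but not alone.
    return (0.4 * c.similarity
            + 0.2 * recency
            + 0.2 * c.importance
            + 0.2 * c.task_match)


stale = Candidate("User lives in Lisbon", similarity=0.95,
                  importance=0.5, age_days=365, task_match=0.2)
fresh = Candidate("User moved to Porto", similarity=0.80,
                  importance=0.8, age_days=2, task_match=0.9)

best = max([stale, fresh], key=rank_score)
```

Ranked on similarity alone, the stale memory wins. With the other signals included, `best` is the fresh one, even though it is the less similar of the two.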
Retrieval is the easy half. It is well understood, and the tools for it are mature and commoditized. The hard half — the half that decides whether an agent can be trusted — is everything that happens to a memory before anyone asks for it. That is not a search problem. That is the problem.