Why we shipped Dreaming for AI memory

Once agents work across sessions, memory cannot just accumulate. It needs to be curated, cleaned, corrected, and made usable again. Multi-session memory is not a storage problem. It is a maintenance problem.

Anthropic has now made that idea explicit in Claude Managed Agents. Their Dreams feature lets Claude reflect on past sessions, deduplicate and reorganize memory, surface new insights, and produce a new memory store for review. That is a useful direction for the whole agent ecosystem. It says that memory consolidation is becoming a primitive, not a nice-to-have.

MemexAI Dreaming comes from the same broad observation, but from a different product stance. We are not building a hosted agent runtime. We are building a memory layer that product teams can own: Postgres backed, inspectable, model-agnostic, and wired into the app where durable memory already has governance, consent, and operational consequences.

1-100 sessions

Anthropic Dreams can take a memory store plus 1 to 100 past sessions as input during the research preview.

97% fewer first-pass errors

Anthropic reports Rakuten used Managed Agents memory to reduce first-pass errors within workspace-scoped boundaries.

30% faster verification

Anthropic reports Wisedocs sped document verification by remembering recurring document issues across sessions.

~6x completion-rate lift

Anthropic reports Harvey saw a roughly 6x completion-rate improvement in tests using Managed Agents with dreaming.

Figures above are Anthropic-reported customer and internal results. They are evidence that the category matters — not MemexAI benchmark claims.

The research thread

This idea has been circling in agent research for a while. Generative Agents used reflection to synthesize higher-level memories from lower-level observations, making agents more coherent over time. Reflexion showed that language agents can improve by writing verbal feedback about prior attempts. MemGPT framed memory as an operating-systems problem: the context window is limited, so agents need explicit memory tiers and movement between them.

The common thread is not that agents should remember everything. It is that raw experience is too messy to reuse directly. Something has to decide what matters, what changed, what conflicts, and what should be carried forward.

Agent writes local memory during work

Memory accumulates duplicates, fragments, and corrections

Background consolidation runs after quiet time

Durable memory becomes easier to trust next session

Multi-session memory is not a pile of old text. It is a maintained working set that future agent sessions rely on.

What Anthropic got right

Anthropic's Managed Agents memory is file-backed, permissioned, auditable, and programmatically controllable. That matters. Agents are already good at using files during serious work, so filesystem-like memory gives the model a familiar control surface instead of forcing every memory operation through opaque retrieval.

Their Dreams design is also explicit about the maintenance problem. Managed agents write local, incremental memories as they work. Across sessions, that creates duplicates, contradictions, and stale entries. A dream reads a memory store and past sessions, then produces a reorganized output store. The input store is not modified, so the result can be inspected before it is adopted.

I like this framing because it moves the industry away from the shallow version of AI memory: embed every conversation, retrieve some chunks, and call it personalization. The harder problem is not whether you can find an old fact. It is whether the durable memory state is still worth trusting.

Where MemexAI is different

MemexAI Dreaming is not a clone of Anthropic Dreams. It solves the same class of problem at a different layer of the stack.

Anthropic Dreams

Managed Agents → memory store + sessions → dream job → separate output store

MemexAI Dreaming

your app → Postgres memory files → bounded dream pass → revisioned patches

Runtime

Anthropic: Hosted Claude Managed Agents platform.

MemexAI: Self-hostable memory infrastructure for your app, service, or framework.

Input

Anthropic: A memory store plus selected past session transcripts.

MemexAI: Durable MemexAI memory files; logs and dream logs are excluded.

Output model

Anthropic: Produces a separate output memory store that can be reviewed or discarded.

MemexAI: Writes bounded patches through normal memory tools, creating revisions and access logs.

Primary goal

Anthropic: Help managed agents learn patterns across sessions and teams.

MemexAI: Keep app-owned memory clean, readable, and inspectable over time.

The biggest philosophical difference is the input. Anthropic Dreams can mine selected session transcripts for patterns and insights. MemexAI Dreaming deliberately works on the durable memory files MemexAI already owns. It excludes user/log.md, user/dream-log.md, files ending in -log.md, and files ending in .log.

That is a product decision. Raw transcripts are useful, but they are not the same thing as durable memory. In many products, transcripts include one-off debugging, accidental phrasing, sensitive context, or temporary exploration. MemexAI's first version of Dreaming is intentionally narrower: clean the memory record, do not mine the entire conversation archive.
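The exclusion rule above can be sketched as a small path filter. This is a minimal illustration, not MemexAI's implementation: the function name `is_dream_input` and the case-sensitive exact and suffix matching are assumptions, and only the four exclusion patterns come from the description above.

```python
# Paths and suffixes that are never dream input, as listed in the text.
EXCLUDED_PATHS = {"user/log.md", "user/dream-log.md"}
EXCLUDED_SUFFIXES = ("-log.md", ".log")


def is_dream_input(path: str) -> bool:
    """Return True if a memory file may be read by a dream pass (hypothetical helper)."""
    if path in EXCLUDED_PATHS:
        return False
    # str.endswith accepts a tuple and matches any of the suffixes.
    return not path.endswith(EXCLUDED_SUFFIXES)


paths = [
    "user/profile.md",
    "user/log.md",
    "user/dream-log.md",
    "shared/company.md",
    "user/build.log",       # hypothetical file name
    "user/travel-log.md",   # hypothetical file name
]
eligible = [p for p in paths if is_dream_input(p)]
```

Note that `user/log.md` needs the explicit entry, because it does not end in `-log.md`.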

Why we built it this way

MemexAI stores memory as scoped virtual files such as user/profile.md and shared/company.md. Agents can list, read, write, patch, and search those files. Humans can inspect them in the admin UI. Postgres stores the current file state, revisions, access logs, and runtime dream configuration.

That shape gives us a simple rule for background consolidation: if a dream changes memory, it should go through the same write path as every other memory change. No hidden side channel. No privileged rewrite path. No separate store that operators have to reconcile manually.

Dream writes use memory_write and memory_patch, create normal mx_revision rows, and appear in access logs with actor = dream-agent. Operators can inspect the before and after. If the dream agent finds nothing worth changing, it records zero files touched and does not add noise to user/dream-log.md.

The rationale is less about being conservative for its own sake and more about preserving the difference between evidence and memory. Evidence can be messy: transcripts, tool traces, temporary notes, and failed attempts. Memory is the maintained record the next session will trust. Mixing those too early makes the agent feel smart in the short term and brittle in the long term.
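To make the single write path concrete, here is a toy in-memory sketch. The names `memory_write`, `memory_patch`, `mx_revision`, and the `dream-agent` actor come from the description above. The `MemoryStore` class, the argument lists, the row fields, and the `agent` actor name are assumptions for illustration, and the real store is Postgres, not a Python dict.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryStore:
    """Toy stand-in for the Postgres-backed store (hypothetical)."""
    files: dict = field(default_factory=dict)
    revisions: list = field(default_factory=list)    # stand-in for mx_revision rows
    access_log: list = field(default_factory=list)

    def _commit(self, op, path, after, actor):
        # Every change, from any actor, records before/after state and an access-log row.
        before = self.files.get(path)
        self.files[path] = after
        self.revisions.append({"path": path, "before": before, "after": after})
        self.access_log.append({"op": op, "path": path, "actor": actor})

    def memory_write(self, path, content, actor):
        self._commit("write", path, content, actor)

    def memory_patch(self, path, old, new, actor):
        current = self.files[path]
        if old not in current:
            raise ValueError(f"patch target not found in {path}")
        # Replace only the first occurrence so the patch stays bounded.
        self._commit("patch", path, current.replace(old, new, 1), actor)


store = MemoryStore()
store.memory_write("user/profile.md", "Prefers tea.\nPrefers tea.\n", actor="agent")
# The dream agent removes the duplicate line through the same tool an agent would use.
store.memory_patch(
    "user/profile.md", "Prefers tea.\nPrefers tea.\n", "Prefers tea.\n", actor="dream-agent"
)
```

In this sketch the only thing that sets the dream's change apart is the actor value in the access log. The before and after states sit in the same revision list as every other write.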

The scheduler is part of the design

Dreaming is opt-in. When MEMEX_DREAM_ENABLED=true, the service starts a background scheduler, but the database dream_enabled key remains the runtime master switch. The loop only selects users with qualifying user/ memory writes newer than their last dream run, waits for a quiet grace period, respects per-user pause flags, and enforces a write budget.

Those details sound operational, but they are product decisions. A memory cleanup system that runs too eagerly can erase useful ambiguity. A system with no write budget can change too much state at once. A system with no per-user pause makes debugging and user-level opt-out harder than it needs to be.
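The selection rules can be sketched as a pure function over per-user state. This is an illustration under assumptions: `UserDreamState`, `select_users`, and their fields are hypothetical names, and the `dream_enabled` argument stands in for the database key described above. The actual scheduler queries Postgres.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class UserDreamState:
    user_id: str
    last_user_write: Optional[datetime]   # most recent qualifying user/ memory write
    last_dream_run: Optional[datetime]    # None if the user has never been dreamed
    paused: bool = False                  # per-user pause flag


def select_users(states: List[UserDreamState], now: datetime,
                 dream_enabled: bool, grace: timedelta) -> List[str]:
    """Return the user ids that qualify for a dream pass right now."""
    if not dream_enabled:                 # runtime master switch
        return []
    selected = []
    for s in states:
        if s.paused or s.last_user_write is None:
            continue
        # Skip users with no writes since their last dream run.
        if s.last_dream_run is not None and s.last_user_write <= s.last_dream_run:
            continue
        # Skip users whose memory has not been quiet for the grace period.
        if now - s.last_user_write < grace:
            continue
        selected.append(s.user_id)
    return selected


now = datetime(2025, 1, 1, 12, 0)
states = [
    UserDreamState("alice", now - timedelta(hours=2), now - timedelta(days=1)),
    UserDreamState("bob", now - timedelta(minutes=5), None),                     # still active
    UserDreamState("carol", now - timedelta(hours=3), now - timedelta(hours=1)), # nothing new
    UserDreamState("dave", now - timedelta(hours=2), None, paused=True),         # paused
]
chosen = select_users(states, now, dream_enabled=True, grace=timedelta(minutes=30))
```

With these sample rows only `alice` qualifies. Passing `dream_enabled=False` returns an empty list regardless of user state.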

select changed users

wait for quiet memory

skip logs

run bounded dream agent

patch normal files

record revisions

Background consolidation should be automatic enough to keep memory healthy, but controlled enough that operators can understand and stop it.
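One way to picture the write budget and the quiet no-op behavior is a bounded apply loop. This sketch assumes the budget counts file writes per pass, which the text does not specify. `run_dream_pass`, the summary fields, and the sample file names other than `user/profile.md` are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple


def run_dream_pass(proposed: List[Tuple[str, str]],
                   write: Callable[[str, str], None],
                   write_budget: int) -> Dict[str, Optional[object]]:
    """Apply at most write_budget proposed changes and summarize the pass."""
    touched: List[str] = []
    for path, content in proposed:
        if len(touched) >= write_budget:
            break                        # budget spent: leave the rest for a later pass
        write(path, content)
        touched.append(path)
    return {
        "files_touched": len(touched),
        "skipped": len(proposed) - len(touched),
        # Only produce a dream-log entry when something actually changed.
        "log_entry": f"dream touched {', '.join(touched)}" if touched else None,
    }


memory: Dict[str, str] = {}
proposed = [
    ("user/profile.md", "Prefers tea."),
    ("user/prefs.md", "Avoids late meetings."),
    ("user/goals.md", "Ship v2."),
]
result = run_dream_pass(proposed, memory.__setitem__, write_budget=2)
empty = run_dream_pass([], memory.__setitem__, write_budget=2)
```

Here the third proposed change is skipped because the budget is two writes, and the empty pass reports zero files touched with no log entry.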

The product reason

The deeper product bet is taste memory. For consumer AI products, the useful question is not simply whether the AI can retrieve something a user said. It is whether the product can maintain a useful model of the user: what they prefer, what they avoid, how they make decisions, what constraints matter, and what has changed over time.

If that model gets noisier with every interaction, personalization eventually becomes a liability. The AI may technically remember more while feeling less trustworthy. That is why memory maintenance matters. A good memory layer should make the user record cleaner with use, not slowly turn it into a junk drawer.

Any AI product with persistent user memory eventually needs a memory hygiene loop, because user trust depends on the quality of the maintained record, not the total volume of remembered text.

What shipped

The core design is now live: opt-in background scheduling with a quiet grace period, write budgets that cap how much a single dream pass can change, per-user pause controls, and full revision and access-log preservation through the normal memory write path. Dream writes look like any other agent write in the audit trail, except that the actor is dream-agent. If a dream run finds nothing worth changing, it records zero files touched and stays silent.

Still ahead: source-aware consolidation, consent controls, conflict detection for context-dependent preferences, quality evals for dream runs, and policy hooks for sensitive domains. But this first version draws the line that matters: memory is product state, and background maintenance should be inspectable infrastructure, not a hidden rewrite.
