
It Remembers Everything — Stays on Your Laptop

PostgreSQL · pgvector · Memory · Compaction · Multi-Channel · Self-Hosted
"I had spent eight months building a therapeutic relationship with ChatGPT, processing trauma. When the memory was wiped, it was like losing a trusted counselor who suddenly couldn't remember our sessions." — ChatGPT user, OpenAI Community Forum, February 2025

The Story

February 5, 2025. An OpenAI backend update rolls out with no warning. Millions of ChatGPT users open their conversations to find their AI has forgotten everything. Not just the last conversation — everything. Months of accumulated context, preferences, project history, personal details, creative collaborations. Gone.

The reports poured in. Creative writers who had spent months building fictional universes with ChatGPT — character backstories, plot arcs, world-building details — all erased. Therapy users who had been processing trauma through carefully built conversational relationships found themselves talking to a stranger. Students who had built complex study systems over an entire semester lost all of it overnight.

"My AI feels like it's far gone in dementia." That was one of the milder reactions.

The February wipe wasn't an isolated incident. It was the most visible symptom of a fundamental architectural problem: cloud AI memory is borrowed, not owned. A 2025 MIT study found an 83% failure rate in ChatGPT's memory system — memories silently dropped, incorrectly recalled, or contaminated with details from other users' conversations. The system also caps stored memories at roughly 24,000 words. Beyond that, older memories are silently discarded to make room for new ones. You're never told which memories were dropped or why.

Then there's context rot. Over long conversations, AI models gradually lose coherence. Early context fades. Contradictions accumulate. The AI starts confidently asserting things that contradict what it said two hundred messages ago. And cross-contamination — details from one conversation bleeding into another, names from one user appearing in another user's session. Your personal context mixed with a stranger's, processed on shared infrastructure you can't inspect.

This is the reality of AI memory in 2025: unreliable, uncontrollable, and deletable without notice by a company whose priorities are not yours.

The Idea

Three tiers of memory, all local, all in PostgreSQL on your hardware.

Session transcripts — Every conversation is stored as a tree-structured message history. Not a flat log — a tree, with branching. When you explore a tangent and then come back, the branch is preserved. When you retry a question with different framing, both paths exist. You can navigate, revisit, and branch from any point in any conversation you've ever had.

Working memory — The agent's current understanding of active tasks, ongoing projects, recent entities, and short-term goals. This is what lets the agent pick up where you left off without you re-explaining context. Working memory is updated continuously during conversations and persists between sessions.

Long-term memory — Everything else. Preferences, decisions, facts, relationships, project history, technical opinions — stored as embeddings in pgvector and retrieved via hybrid search (BM25 for keyword matching, cosine similarity for semantic matching, cross-encoder re-ranking for precision). When you mention something relevant to a conversation from three months ago, the memory system surfaces it automatically.

You (March)

I've been thinking about the module system. I strongly prefer composition over inheritance — interfaces should be small and composable, not deep hierarchies.

Agent

Noted. I'll use composition by default for new module designs. Small interfaces, embed for reuse, avoid deep type hierarchies.

Six months later:

You (September)

Write a new notification module — needs to support email, Slack, and webhooks.

Agent

Here's the design. I'm using a Notifier interface with a single Send(ctx, message) method, and separate implementations for each channel. The dispatcher composes them — no base class, no inheritance. Consistent with how you prefer to structure these things.

I also referenced the event bus pattern from the scheduler module (which uses the same approach) to keep the style uniform across the codebase.

The agent didn't ask. It didn't need a reminder. It retrieved a six-month-old architectural preference because the current task was semantically relevant. This is what memory is supposed to do — not store facts in a list, but surface the right context at the right time.
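
The retrieval behind this can be sketched in miniature. The code below is a toy illustration, not the real pipeline: a term-overlap score stands in for BM25, the "embeddings" are two-dimensional, and the cross-encoder re-ranking stage is only noted in a comment:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def keyword_score(query: str, doc: str) -> float:
    """Toy stand-in for BM25: fraction of query terms present in the doc."""
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / len(q) if q else 0.0

def hybrid_rank(query, query_vec, memories, top_k=3, alpha=0.5):
    """Blend keyword and semantic scores. In a full pipeline, a
    cross-encoder would re-score this shortlist for precision."""
    scored = [(alpha * keyword_score(query, text)
               + (1 - alpha) * cosine(query_vec, vec), text)
              for text, vec in memories]
    return [text for _, text in sorted(scored, reverse=True)[:top_k]]

# Two stored memories with toy 2-dimensional "embeddings".
memories = [
    ("prefers composition over inheritance", [1.0, 0.0]),
    ("bought PLTR at 78 in February", [0.0, 1.0]),
]
print(hybrid_rank("module design composition", [0.9, 0.1], memories, top_k=1))
# → ['prefers composition over inheritance']
```

The point of blending the two signals is exactly the scenario in the dialogue: the six-month-old preference surfaces whether the new request repeats the old wording or merely lands near it in embedding space.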

Memory works across channels. What you told the agent on Slack yesterday is available when you talk to it on Telegram today.

You (Telegram, Tuesday)

Remind me about the Marcus call — what did we discuss yesterday?

Agent

Yesterday on Slack you discussed three things with me about Marcus: the partnership pricing model (you were leaning toward revenue share over flat fee), the April timeline concern (he wants to launch Q2 but you think the API isn't ready), and the data residency question (his European clients need EU hosting, which affects your infrastructure plan). You asked me to draft a counter-proposal — it's in your working memory. Want me to pull it up?

And memory persists indefinitely. There's no word cap, no silent pruning, no "oldest memories get dropped" policy.

You (August)

What was my investment thesis on PLTR back in February?

Agent

In February you were bullish on PLTR for three reasons: government contract momentum (specifically the NATO deal), the AIP platform's enterprise adoption curve, and what you called "the only company that actually makes money from AI deployments." Your concern was valuation — you said the P/E was "insane" but you thought the TAM justified it if AIP penetration hit 15% of Fortune 500 by 2027. You bought at $78 and set a mental target of $120 by year-end.

What It Looks Like

You control it. Memory isn't a black box. You can search it, browse it, edit it, and delete it. Found something the agent remembered incorrectly? Correct it. Want to remove a sensitive conversation from memory entirely? Delete it. Want to export your entire memory for backup or migration? It's a PostgreSQL database — pg_dump and you're done.

It never degrades. There's no word limit. There's no point at which the system starts silently dropping old memories. A conversation from your first week is just as retrievable as one from yesterday. The retrieval quality doesn't degrade over time — hybrid search with re-ranking means relevance scoring is consistent regardless of database size.

Cross-channel continuity. Desktop, Telegram, Slack, CLI — every channel reads from and writes to the same memory. The agent you talk to on your phone at lunch is the same agent that helped you debug code on your desktop this morning. There's no "Telegram context" and "desktop context" — there's just your context.

Private by architecture. Memory lives in PostgreSQL on your machine. Not "encrypted in the cloud." Not "de-identified and stored securely." On your machine. There's no API call that transmits your memories to a third party. There's no model training pipeline that ingests your conversations. The privacy isn't a policy — it's a physical constraint.

Compaction, not deletion. As conversations grow long, the system uses iterative summarisation to compress older context without losing it. A thousand-message conversation becomes a detailed summary that preserves key decisions, facts, and context — while the full transcript remains available in the session tree if you need the exact words. This is how memory scales: not by forgetting, but by organising.
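
A minimal sketch of that iterative approach, with a stub in place of the actual LLM summarisation call (the function names, chunk sizes, and truncation here are illustrative, not the product's implementation):

```python
def summarise(summary: str, chunk: list[str]) -> str:
    """Stub for the LLM summarisation call; it concatenates truncated
    lines so the sketch stays self-contained and runnable."""
    points = "; ".join(line[:40] for line in chunk)
    return f"{summary} | {points}" if summary else points

def compact(transcript: list[str], keep_recent: int = 2, chunk_size: int = 2):
    """Iteratively fold older messages into a rolling summary.
    The caller keeps the full transcript; nothing is deleted."""
    older, recent = transcript[:-keep_recent], transcript[-keep_recent:]
    summary = ""
    for i in range(0, len(older), chunk_size):
        summary = summarise(summary, older[i:i + chunk_size])
    return summary, recent

transcript = [f"message {n}" for n in range(1, 6)]
summary, recent = compact(transcript)
print(recent)  # the live tail stays verbatim: ['message 4', 'message 5']
```

The essential property is in the return value: compaction produces a summary alongside the conversation's tail, while the original transcript remains intact for exact-words lookup in the session tree.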

How It Works

  • Long-term memory — Stored as vector embeddings in PostgreSQL with pgvector. Retrieval uses a three-stage pipeline: BM25 keyword search for exact matches, cosine similarity for semantic matches, and cross-encoder re-ranking to score the combined results. This hybrid approach means the system finds relevant memories whether you use the exact same words or describe the same concept differently.
  • Working memory — A structured representation of the agent's current context: active tasks, recent entities, ongoing projects, and short-term goals. Updated continuously during conversations. Persists between sessions so the agent can resume context without re-explanation.
  • Session trees — Conversations are stored as tree-structured message histories with id and parent_id fields. Branching, navigation, and compaction all operate on this tree. You can revisit any point in any conversation, branch from it, and explore alternative paths — all preserved in the session history.
  • Compaction — Three triggers initiate compaction: token count thresholds, message count thresholds, and time-based triggers. The compaction process uses iterative summarisation — each pass produces a more concise summary while preserving key facts, decisions, and file operations. The full transcript is always retained; compaction creates navigable summaries, not lossy compression.
  • Self-hosted PostgreSQL + pgvector — The entire memory system runs on a local PostgreSQL instance with the pgvector extension. No external database service. No cloud storage. Backup is pg_dump. Migration is pg_restore. You can inspect every byte of your agent's memory with standard SQL queries.
  • Multi-channel — All channels (desktop, Telegram, Slack, CLI, HTTP) read from and write to the same memory store. A memory created during a Telegram conversation is available in every subsequent interaction, regardless of channel. The agent's identity and knowledge are channel-independent.
  • OpenAI embeddings — Text is converted to vector embeddings using OpenAI's embedding model for storage in pgvector. This is the one external API call in the memory system — the embedding itself. The resulting vectors are stored locally. For users who want zero external calls, local embedding models (via Ollama) are supported as an alternative.

What Breaks Without This

ChatGPT has an 83% memory failure rate according to MIT research. Memories are silently dropped, incorrectly recalled, or contaminated with cross-user data. The system caps at approximately 24,000 words of stored memories — beyond that, older memories are discarded without notification. And as February 2025 demonstrated, the entire memory system can be wiped by a backend update you have no control over and receive no warning about.

Claude has no persistent memory system. Every conversation starts from scratch. Claude Projects offer limited shared context via project documents, but these are static files you maintain manually — not dynamic memory that builds over time. There's no mechanism for the agent to remember what you discussed last week, let alone last month.

Perplexity has no memory at all. Each query is independent. Ask the same question twice and it may give you different answers with no awareness that you asked before. It's a search tool, not an agent — and search tools don't remember their users.

Cursor resets context between windows. Open a new editor window and the AI has no idea what you were working on in the previous one. Close the editor and context is gone. There's no persistent memory across sessions, let alone across days or weeks.

Claude Code uses a static CLAUDE.md file for project context. This is a manually maintained document — you write it, you update it, you decide what goes in it. It's useful, but it's not memory. It's a readme file. The agent doesn't add to it from conversations, doesn't retrieve from it selectively, and doesn't build understanding over time.

Build This

This is not a concept — it's buildable today.

Salmex I/O's three-tier memory — session, working, and long-term — lives in PostgreSQL on your hardware. Hybrid retrieval with pgvector means your agent's recall compounds over months, never degrades, and never gets wiped by a backend update you didn't ask for.
