Lattice: A Memory Engine That Actually Remembers
News
/ Will Killebrew


Ask any AI assistant what you told it five minutes ago and it performs flawlessly. Ask it what you told it last Tuesday and you get a blank stare. This is the fundamental gap in modern AI: conversations are stateless by default. Every session starts from zero.

The industry’s answer has been retrieval-augmented generation (RAG). Dump documents into a vector database, retrieve relevant chunks, paste them into the prompt. It works for knowledge bases. It fails spectacularly for knowing the user.

RAG answers the question “what does this document say?” It cannot answer “what does this person care about?” These are fundamentally different problems, and conflating them is why most AI memory systems feel hollow.

We set out to build something different. Lattice is a persistent memory engine for AI agents that continuously learns about users from every interaction: conversations, explicit statements, even uploaded documents. It does this without hallucinating, without drowning in duplicates, and without the user ever having to say “remember this.”

The Hallucination That Started Everything

During development of our agent framework, we discovered a critical bug in our memory extraction. When we said:

“I need to finish the quarterly report by Friday.”

The system extracted:

“The user’s manager is named Sarah and she works in the finance department.”

Not a typo. Not a misinterpretation. A complete fabrication. The extraction LLM, given no grounding constraints, hallucinated a person, a relationship, and a department that appeared nowhere in the conversation.

This is not an edge case. This is the default behavior of any memory system that asks an LLM to “extract facts” without verifying the output against the source material. The model fills in what it thinks should be there, not what is there. Every memory system without source validation is one prompt away from poisoning its own knowledge with fabricated data.

This failure forced us to rethink the entire architecture from the ground up.


Why Existing Approaches Fall Short

We studied the memory landscape before building Lattice. Most systems we found shared a common architecture: extract facts from conversation using an LLM, embed them, store them in a vector database, and retrieve them later by similarity search. Some added useful features like user profiles, intelligent decay, or memory relationships.

But none of them solved the problem that kept us up at night: extraction reliability. Every system we evaluated trusted the LLM to extract facts faithfully. None verified that the extracted facts actually appeared in the source conversation. LLMs are confident liars. Without grounding constraints, any extraction pipeline can silently poison the knowledge base with fabricated data: exactly the failure we experienced firsthand.

We also wanted memory that was not a separate service to integrate with, but a native capability built directly into the agent itself, something that runs inside the agent loop rather than alongside it.

These two requirements, grounded extraction and native integration, became the foundation of Lattice.


Lattice: Architecture Overview

Lattice is not a standalone service or external API. It is a native subsystem built directly into SAGE, our agentic framework, meaning every agent we ship gets persistent memory as a first-class capability. No external API calls, no separate infrastructure. Memory lives inside the agent.

The architecture has three ingestion paths and a six-stage retrieval pipeline, all connected through a memory graph with typed relationships.

[Figure: Lattice architecture diagram showing three ingestion paths (Conversations, Agent Tools, Documents) flowing through Source Validation and Semantic Deduplication into the Memory Graph, then through the 6-Stage Retrieval Pipeline to produce Agent Context]

The Memory Entry

Every memory in Lattice is a rich structured object with the following properties:

  • Content: The actual fact, preference, or instruction (e.g., “User loves chocolate chip cookies”)
  • Category: Classified as a fact, preference, instruction, or context
  • Source: Whether it came from the user, the agent, or the system
  • Confidence: A score from 0.0 to 1.0 that strengthens with use and decays with neglect
  • Memory type: Episodic (events), semantic (facts), or procedural (how-to)
  • Importance: A score from 0.0 to 1.0
  • Persistence level: Ephemeral, short-term, or long-term
  • Entities: People, places, and concepts mentioned
  • Relations: Graph edges connecting this memory to related memories, typed as “updates,” “extends,” or “derives”
  • Embedding: A 1024-dimensional vector for semantic search
  • Access count: How many times this memory has been retrieved (a reinforcement signal)

This is not a key-value store. It is a graph of interconnected, evolving knowledge about a person.
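
The post does not show Lattice’s actual schema, but the properties above map naturally onto a structured record. A minimal sketch in Python, with field names and defaults assumed for illustration:

```python
from dataclasses import dataclass, field

@dataclass
class MemoryEntry:
    # Field names and defaults are illustrative, not Lattice's actual schema.
    content: str                     # e.g. "User loves chocolate chip cookies"
    category: str                    # fact | preference | instruction | context
    source: str                      # user | agent | system
    confidence: float = 0.5          # 0.0-1.0; strengthens with use, decays with neglect
    memory_type: str = "semantic"    # episodic | semantic | procedural
    importance: float = 0.5          # 0.0-1.0, assigned at extraction
    persistence: str = "short-term"  # ephemeral | short-term | long-term
    entities: list[str] = field(default_factory=list)
    # Typed graph edges to related memories: ("updates" | "extends" | "derives", memory_id)
    relations: list[tuple[str, str]] = field(default_factory=list)
    embedding: list[float] = field(default_factory=list)  # 1024-dim in production
    access_count: int = 0            # reinforcement signal
```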


Ingestion: Three Paths to Memory

Path 1: Automatic Extraction with Source Validation

After every agent response, Lattice runs a background extraction pass on the conversation. The most recent messages are sent to an LLM with a grounded extraction prompt that includes the user’s actual name and, critically, requires a source quote for every extracted memory.

The extraction prompt enforces strict grounding: every memory must be traceable to a specific phrase in the conversation. If the LLM cannot point to text that supports a fact, it is instructed not to extract it.

After the LLM responds, each memory undergoes source validation: the provided quote is checked against the actual conversation text using a fuzzy matching algorithm (60%+ word overlap threshold). Memories that fail validation are rejected before they ever reach storage.

This is what would have caught the hallucination described above. The LLM would have been required to provide a quote from the conversation supporting the extraction of “Sarah” and “finance department.” No such quote exists; the user only said “I need to finish the quarterly report by Friday,” so the memory would have been rejected.

The principle: never trust the LLM’s output without verifying it against the input.
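
The fuzzy match itself can be very simple. A sketch of the validation step, assuming word-set overlap as the matching metric (the post specifies the 60% threshold but not the exact algorithm):

```python
import re

def validate_source_quote(quote: str, conversation: str, threshold: float = 0.6) -> bool:
    """Accept a memory only if >= 60% of the quote's words appear in the conversation."""
    def words(text: str) -> set[str]:
        return set(re.findall(r"[a-z0-9']+", text.lower()))
    quote_words = words(quote)
    if not quote_words:
        return False
    overlap = len(quote_words & words(conversation)) / len(quote_words)
    return overlap >= threshold

conversation = "I need to finish the quarterly report by Friday."
validate_source_quote("finish the quarterly report by Friday", conversation)      # grounded: True
validate_source_quote("manager Sarah works in the finance department", conversation)  # fabricated: False
```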

[Figure: Source validation flow showing how Lattice catches hallucinated extractions by requiring source quotes verified against the actual conversation, with a grounded extraction passing validation and a hallucinated extraction being rejected]

Path 2: Explicit Memory (Agent Tools)

The agent has four memory tools it can call during conversation:

  • Remember: Store a fact with maximum confidence and long-term persistence. The agent uses this proactively and silently, with no “I’ll remember that!” announcements.
  • Recall: Search memories before answering personal questions. Uses the full retrieval pipeline.
  • Update: Modify a memory when the user corrects something or circumstances change (moved cities, changed jobs).
  • Forget: Delete a memory when the user requests it.

The agent is instructed to use these tools continuously and quietly. It remembers names, relationships, preferences, work context, communication style, goals, and corrections, all without asking permission.
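
A toy sketch of the four tools against an in-memory store. The real tools run inside SAGE’s agent loop and use the full retrieval pipeline, so everything here (the dict store, the substring-based recall) is a stand-in:

```python
def remember(store: dict, mem_id: str, content: str) -> None:
    # Explicit memories get maximum confidence and long-term persistence.
    store[mem_id] = {"content": content, "confidence": 1.0, "persistence": "long-term"}

def recall(store: dict, query: str) -> list[str]:
    # Stand-in for the full retrieval pipeline: naive substring match.
    return [m["content"] for m in store.values() if query.lower() in m["content"].lower()]

def update(store: dict, mem_id: str, content: str) -> None:
    # Used when the user corrects something or circumstances change.
    store[mem_id]["content"] = content

def forget(store: dict, mem_id: str) -> None:
    # Hard delete on user request.
    store.pop(mem_id, None)
```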

Path 3: Document Ingestion

When a user uploads a document to a project, Lattice does more than chunk it for RAG. It also extracts what the document reveals about the user. The initial chunks are analyzed to answer: “What does this tell us about the user’s interests and work?”

Upload a TypeScript configuration file and Lattice might extract: “User works with TypeScript projects using strict mode.” Upload a research paper and it might extract: “User is researching distributed systems consensus algorithms.”

This bridges the gap between document knowledge and user knowledge. The RAG system gets the content; Lattice gets the context.


The Deduplication Problem (and the Memory Graph)

Naive memory systems accumulate garbage. Tell the agent you like coffee on Monday, mention it again on Wednesday, and you have two identical memories. Say “I moved to San Francisco” after previously saying “I live in New York” and now you have a contradiction with no resolution.

Lattice solves both problems with semantic deduplication on write, using a tiered similarity system:

Very high similarity (> 0.95): The new memory is essentially a duplicate. Skip it and reinforce the existing one instead (+0.1 confidence, +1 access count).

High similarity (> 0.88): The new memory updates or contradicts the old one. Supersede the old memory and link them with an “updates” relationship. The old memory remains in storage but is hidden from retrieval.

Moderate similarity (> 0.72): The memories are related but distinct. Keep both and classify the relationship as “extends” (complementary information) or “derives” (inferred connection) based on word overlap and entity matching.

Low similarity (≤ 0.72): The new memory is unrelated. Save it as a standalone entry.

The relationship classification uses word overlap analysis and entity matching:

  • Same category + high word overlap = “updates” (contradiction or correction)
  • Shared entities = “extends” (complementary information)
  • Different category or low overlap = “derives” (loose connection)

This creates a memory graph where every memory is potentially connected to related memories. Superseded memories are never shown in retrieval results but remain in storage, maintaining a complete history of how the system’s knowledge about the user has evolved.
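
The tiering logic reduces to a threshold cascade plus a relationship classifier. A sketch, where the word-overlap threshold for “updates” (0.4 here) is an assumption the post does not specify:

```python
def dedup_action(similarity: float) -> str:
    """Map write-time similarity against the nearest existing memory to an action."""
    if similarity > 0.95:
        return "reinforce"   # near-duplicate: +0.1 confidence, +1 access count
    if similarity > 0.88:
        return "supersede"   # link via an "updates" edge; hide the old memory
    if similarity > 0.72:
        return "relate"      # keep both; classify as "extends" or "derives"
    return "store"           # unrelated: standalone entry

def word_overlap(a: str, b: str) -> float:
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / max(len(wa | wb), 1)

def classify_relation(new: dict, old: dict) -> str:
    """Same category + high overlap -> updates; shared entities -> extends; else derives."""
    if new["category"] == old["category"] and word_overlap(new["content"], old["content"]) > 0.4:
        return "updates"
    if set(new["entities"]) & set(old["entities"]):
        return "extends"
    return "derives"
```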


Retrieval: The Six-Stage Pipeline

Getting memories in is only half the problem. Getting the right memories out is where most systems fall short. Lattice uses a multi-stage retrieval pipeline with seven scoring signals.

[Figure: Lattice 6-stage retrieval pipeline showing the flow from User Query through Query Rewriting, Broad Search, Entity Boosting, Temporal Filtering, Composite Scoring with 7 weighted signals, and Diversity Filtering to produce the top 15 categorized memories]

Stage 0: Query Rewriting. A single embedding vector cannot capture every angle of a query. “What’s my favorite candy?” matches differently than “preferred sweets and snacks” or “candy and treat preferences.” Lattice generates two alternative phrasings using an LLM, then searches all three in parallel.

Stage 1: Broad Search. Runs vector search (semantic similarity via embeddings) and keyword search (exact text matching) simultaneously. Vector search catches conceptual matches; keyword search catches exact terms that embeddings might miss. Results are merged, deduplicated by ID, with the highest score kept for each memory.
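
The merge step is a keep-the-max union over the two result sets. A sketch, assuming each search returns (memory_id, score) pairs:

```python
def merge_results(vector_hits: list[tuple[str, float]],
                  keyword_hits: list[tuple[str, float]]) -> list[tuple[str, float]]:
    """Union both result lists, dedup by memory ID, keep the higher score per memory."""
    best: dict[str, float] = {}
    for mem_id, score in [*vector_hits, *keyword_hits]:
        if score > best.get(mem_id, -1.0):
            best[mem_id] = score
    return sorted(best.items(), key=lambda item: item[1], reverse=True)
```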

Stage 2: Entity Boosting. Extracts proper nouns, dates, emails, and capitalized phrases from the query. Memories that share entities with the query get a scoring boost in Stage 4.

Stage 3: Temporal Filtering. Detects time expressions in the query (“yesterday,” “last week,” “in March,” “recently”) and filters candidates to the appropriate time range. Gracefully degrades: if the temporal filter would eliminate all results, it is skipped.
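
Graceful degradation here is just a fallback to the unfiltered candidates. A sketch, with `in_range` standing in for whatever time-expression parser produced the range:

```python
def temporal_filter(candidates: list, in_range) -> list:
    """Apply the detected time-range filter, but skip it if it would wipe out all results."""
    kept = [c for c in candidates if in_range(c)]
    return kept if kept else candidates
```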

Stage 4: Composite Scoring. Each candidate receives a composite score from seven weighted signals:

Semantic similarity (weight: 1.0): Vector cosine similarity from embedding search.

Recency (weight: 0.8): Linear decay from newest (1.0) to oldest (0.0).

Confidence (weight: 1.0): The memory’s confidence score, which evolves over time.

Access count (weight: 0.3): Normalized usage frequency. Popular memories rank higher.

Entity match (weight: 0.8): Proportion of query entities found in the memory’s entities.

Importance (weight: 0.7): Importance score assigned during extraction.

Persistence level (weight: 0.3): Long-term memories score 1.0, short-term 0.5, ephemeral 0.2.

This multi-signal approach means a memory does not need to be a perfect semantic match to surface. A highly confident, frequently accessed memory about a related entity will rank above a moderately similar memory that has not been touched in months.
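
With the weights above, the composite score is a plain weighted sum. A sketch, assuming each signal has already been normalized to [0, 1]:

```python
WEIGHTS = {
    "semantic": 1.0, "recency": 0.8, "confidence": 1.0, "access": 0.3,
    "entity": 0.8, "importance": 0.7, "persistence": 0.3,
}
PERSISTENCE_SCORE = {"long-term": 1.0, "short-term": 0.5, "ephemeral": 0.2}

def composite_score(signals: dict[str, float]) -> float:
    """Weighted sum of the seven retrieval signals, each in [0, 1]."""
    return sum(WEIGHTS[name] * signals[name] for name in WEIGHTS)
```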

Stage 5: Diversity Filtering. After sorting by composite score, the pipeline prevents returning N memories that all say the same thing. Each candidate is compared against already-selected memories using content-normalized word overlap. If more than 80% of words match, it is considered a duplicate and skipped.
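
The diversity pass is a greedy scan down the ranked list. A sketch, measuring overlap relative to the candidate’s own word set (the post does not pin down the exact normalization):

```python
def diversity_filter(ranked_contents: list[str], max_overlap: float = 0.8) -> list[str]:
    """Walk the score-ranked list, skipping candidates that mostly repeat a kept memory."""
    kept: list[str] = []
    for content in ranked_contents:
        words = set(content.lower().split())
        is_dup = any(
            len(words & set(k.lower().split())) / max(len(words), 1) > max_overlap
            for k in kept
        )
        if not is_dup:
            kept.append(content)
    return kept
```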

Stage 6: Return Top N. The top 15 memories (configurable) are returned, formatted by category with source indicators. Memories the user explicitly stated are marked as such; memories the system inferred from conversation are marked as learned.


Reinforcement and Decay

Lattice memories are not static. They evolve through two mechanisms:

Reinforcement. Every time a memory is retrieved, it receives a timestamp update, an incremented access count, and a small confidence boost (+0.05, capped at 1.0). Memories that are frequently relevant become stronger over time.

Pruning. Periodic cleanup removes memories that are both stale (not accessed in 90 days) and low-confidence (below 0.3). Memories that have not been accessed in 270 days are pruned regardless of confidence. This is more nuanced than simple time-based expiration. A high-confidence memory survives much longer than a speculative one.
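
Both mechanisms are a few lines each. A sketch of reinforcement and the two-condition pruning rule, using epoch-second timestamps:

```python
DAY = 86_400  # seconds

def reinforce(memory: dict, now: float) -> None:
    """Called on every retrieval: refresh the timestamp, bump usage, boost confidence."""
    memory["last_accessed"] = now
    memory["access_count"] = memory.get("access_count", 0) + 1
    memory["confidence"] = min(1.0, memory["confidence"] + 0.05)

def should_prune(memory: dict, now: float) -> bool:
    """Prune if stale AND low-confidence, or untouched for 270 days regardless."""
    idle_days = (now - memory["last_accessed"]) / DAY
    if idle_days > 270:
        return True
    return idle_days > 90 and memory["confidence"] < 0.3
```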


User Profile System

Beyond individual memory retrieval, Lattice maintains a structured user profile that partitions memories into:

  • Static profile: Long-term memories, user-explicit memories, and high-confidence memories (>= 0.9), organized by category: Identity and Facts, Preferences, Instructions, Ongoing Context
  • Dynamic profile: Recent memories (last 4 weeks) with short-term or ephemeral persistence, sorted by recency

This provides the agent with both a stable understanding of who the user is and an awareness of what they have been working on recently. The static profile captures identity; the dynamic profile captures momentum.
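
The partition rule can be sketched directly from the two bullet points, with one tie-break (a memory qualifying for both goes static) assumed:

```python
FOUR_WEEKS = 28 * 86_400  # seconds

def partition_profile(memories: list[dict], now: float) -> tuple[list[dict], list[dict]]:
    """Split memories into a static profile (identity) and a dynamic one (momentum)."""
    static, dynamic = [], []
    for m in memories:
        if m["persistence"] == "long-term" or m["source"] == "user" or m["confidence"] >= 0.9:
            static.append(m)
        elif m["persistence"] in ("short-term", "ephemeral") and now - m["created_at"] <= FOUR_WEEKS:
            dynamic.append(m)
    dynamic.sort(key=lambda m: m["created_at"], reverse=True)  # most recent first
    return static, dynamic
```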


Design Principles

Several principles guided the architecture of Lattice:

1. Never trust unverified extraction. Any fact extracted by an LLM must be traceable to specific text in the source material. The source quote mechanism makes hallucination a detectable, rejectable event rather than an invisible corruption of the knowledge base.

2. Memory is a graph, not a list. Real human knowledge is interconnected. Knowing someone moved from New York to San Francisco is more useful than having two unrelated location entries. The typed relationship system (updates, extends, derives) captures how knowledge evolves.

3. Retrieval is multi-dimensional. A single similarity score is insufficient. Recency, confidence, entity overlap, importance, persistence level, and access frequency all contribute to whether a memory is relevant right now.

4. The agent should never ask permission to remember. Memory should be invisible infrastructure. The user says something meaningful, and it is captured silently, accurately, and with appropriate confidence. No “Would you like me to remember that?” interruptions.

5. Every retrieval is a learning opportunity. The reinforcement mechanism means the system’s understanding of what matters improves passively over time. Frequently relevant memories strengthen; irrelevant ones fade.

6. Validate at the boundaries, trust internally. Input validation (source quote checking) and output diversity (dedup filtering) are strict. Everything in between (storage, embedding, graph linking) can be trusted because the boundaries are controlled.


What’s Next

This initial version of Lattice solves the foundational problems: hallucination, deduplication, contradiction resolution, multi-signal retrieval, and proactive agent behavior. Future work includes:

  • Memory consolidation: Periodically merging related short-term memories into higher-level long-term summaries
  • Cross-user knowledge: Shared team memories with access control
  • Emotional context: Detecting and storing sentiment alongside facts
  • Proactive surfacing: The agent volunteers relevant memories before being asked, based on conversation trajectory prediction
  • Memory importance re-ranking: Using conversation outcomes to retroactively adjust memory importance scores

The AI industry has treated memory as a search problem: store text, embed it, retrieve it. But human memory is not search. It is a living system that strengthens with use, resolves contradictions, connects related concepts, forgets what does not matter, and, most importantly, never makes things up.

Lattice is our attempt to build memory that works the way memory should. Not as a bolted-on feature, but as fundamental infrastructure that makes AI agents genuinely understand the people they work with.

Every conversation is an opportunity to learn. Lattice makes sure nothing is wasted.