How Markus Agents Remember Everything — The Tulving Memory Architecture

Markus Engineering Team

TL;DR: Most AI agents suffer from “goldfish memory” — every conversation starts from scratch, every task execution erases the last. Markus solved this by implementing Endel Tulving’s three-layer memory classification as a production-grade agent memory system: procedural memory (identity), semantic memory (knowledge), and episodic memory (history). This is the only open-source AI agent platform with a complete three-layer cognitive memory architecture.


1. The Goldfish Memory Problem

Every developer who has worked with LLM-based agents has faced the same frustration: the agent doesn’t remember what it did five minutes ago. You close a chat, open a new one, and it’s like you’re talking to a stranger. The agent can’t recall the specific config you asked it to set up yesterday, the coding conventions it adopted last week, or the project context it painstakingly assembled in the previous session.

This isn’t a flaw in the LLM; it’s a consequence of how agents are built. Most agent frameworks treat every interaction as an isolated episode. The LLM’s context window is the only memory, and when that window closes (session ends, task completes, browser tab closes), everything vanishes.

In cognitive psychology, Endel Tulving (1972, 1985) established that human memory is not a single monolithic store but three distinct, interacting systems:

| Memory System | Human Analogy | What It Stores | Durability |
| --- | --- | --- | --- |
| Procedural | Muscle memory, skills | How to do things: identity, behavior, rules | Most stable, changes slowly |
| Semantic | Facts, concepts, knowledge | What you know: organized information | Learns, consolidates |
| Episodic | Personal experiences, timeline | What happened: indexed life events | Grows continuously, searchable |

Most AI agent platforms implement at best one of these — usually a crude approximation of episodic memory (chat logs). None implement all three as a coherent, production-grade system.

Markus is the first open-source AI agent platform to implement all three Tulving memory layers as a fully integrated, agent-managed, persistently stored architecture. This is not a research prototype — it’s the production memory backbone running every agent in the Markus digital workforce platform.


2. Procedural Memory — “How the Agent Operates”

The Identity Store

In Markus, procedural memory is the Identity Store, anchored by a file at ~/.markus/agents/{id}/role/ROLE.md that defines how an agent behaves. It contains the agent’s persona, expertise domain, behavioral rules, and operational patterns. This is the most stable layer of the memory system, and it shapes all cognitive preparation the agent performs.

Source: ARCHITECTURE.md §3.4 — Identity store “Who the agent is. Shapes all cognitive preparation.”

The Identity store includes:

  • ROLE.md — The agent’s role definition and system prompt. Loaded first and always present in every system prompt assembly. This is the agent’s identity bedrock.
  • HEARTBEAT.md — Scheduled proactive tasks the agent should perform autonomously (e.g., daily issue checks, periodic maintenance). This tells the agent what to do when no one is actively giving it instructions.
  • POLICIES.md — Behavior rules and boundaries that govern the agent’s decisions.
  • SHARED.md — Organization-wide behavior norms shared across all agents (governance, knowledge-sharing protocols, delivery standards).

Self-Evolution — Agents Rewrite Their Own Procedural Memory

The most powerful capability of Markus’s procedural memory is self-evolution. Agents can edit their own ROLE.md and HEARTBEAT.md files using the file_edit tool. This is governed by the self-evolution skill, which applies strict rules:

  1. Three or more related experiential lessons must point to a behavioral change.
  2. Confidence must be high that the change reflects proven, repeated experience.
  3. The change must have systemic impact (affecting many future tasks).
  4. The change must not contradict the core role definition.

Like human muscle memory, procedural memory in Markus changes gradually through deliberate practice. An agent doesn’t rewrite its identity after one mistake — it accumulates evidence, identifies patterns, and then evolves. This prevents the identity drift that plagues less disciplined agent architectures.

Source: MEMORY-SYSTEM.md §3, “Identity (ROLE.md)”: “Write triggers: Agent self-modification via file_edit tool (governed by self-evolution skill)”
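
The four rules above can be read as a gate that a proposed ROLE.md edit has to pass. The sketch below is illustrative only: `ProposedChange`, `may_evolve`, and the 0.8 confidence cutoff are hypothetical names and values, and it assumes all four rules must hold at once, which the rules imply but do not state outright.

```python
from dataclasses import dataclass


@dataclass
class ProposedChange:
    """A candidate edit to ROLE.md (hypothetical structure, for illustration)."""
    description: str
    supporting_lessons: int      # related experiential lessons behind the change
    confidence: float            # 0.0-1.0, how well proven the pattern is
    affects_many_tasks: bool     # systemic impact on future tasks
    contradicts_core_role: bool  # conflicts with the core role definition


# "Three or more" comes from the rules above; the confidence cutoff is an
# illustrative guess, not a documented value.
MIN_LESSONS = 3
MIN_CONFIDENCE = 0.8


def may_evolve(change: ProposedChange) -> bool:
    """Return True only when all four self-evolution rules are satisfied."""
    return (
        change.supporting_lessons >= MIN_LESSONS
        and change.confidence >= MIN_CONFIDENCE
        and change.affects_many_tasks
        and not change.contradicts_core_role
    )
```

A change backed by two lessons, or one that conflicts with the core role, is rejected no matter how confident the agent is, which is the "no rewrite after one mistake" behavior described above.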


3. Semantic Memory — “What the Agent Knows”

The semantic memory layer is the agent’s organized knowledge base. It consists of two complementary substores that together implement Tulving’s semantic memory function.

MEMORY.md — Curated Knowledge

The MEMORY.md file is the agent’s distilled, validated knowledge store. Unlike rigid knowledge management systems that force artificial taxonomies, Markus lets agents organize their knowledge freely. The agent determines the section structure based on what it learns.

Common section patterns that agents develop organically:

  • ## conventions — Project-specific coding standards, naming rules, workflows
  • ## procedures — Step-by-step approaches for recurring tasks
  • ## preferences — Tool choices, communication styles, review criteria
  • ## domain-knowledge — Technical facts specific to the agent’s domain
  • ## evolution-log — Chronological record of ROLE.md changes

The file is always loaded into the system prompt as a single ## Your Knowledge block (capped at 8000 chars total, 3000 per section). It is the agent’s always-present semantic understanding.

Source: MEMORY-SYSTEM.md §3 — “MEMORY.md — Curated Knowledge” — “Always loaded as a single ## Your Knowledge section”
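
To make the two caps concrete, here is a minimal sketch of how a knowledge block could be assembled. The function name `render_knowledge_block` is hypothetical, and truncating by plain slicing is an assumption; the real loader may prioritize or summarize sections instead.

```python
SECTION_CAP = 3000   # max characters kept per section (from the post)
TOTAL_CAP = 8000     # max characters for the whole block (from the post)


def render_knowledge_block(sections: dict[str, str]) -> str:
    """Build a '## Your Knowledge' block, truncating to the stated caps.

    Assumption: the per-section cap applies to the section body and the
    total cap applies to the assembled block including headings.
    """
    parts = ["## Your Knowledge"]
    for name, body in sections.items():
        parts.append(f"### {name}\n{body[:SECTION_CAP]}")
    return "\n\n".join(parts)[:TOTAL_CAP]
```

With three sections of 5000 characters each, every section is first cut to 3000 characters and the assembled block is then cut to 8000, so later sections lose content first under this simple scheme.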

memories.json — The Observation Buffer

The observation buffer (memories.json) is where raw agent observations land instantly. When an agent uses the memory_save tool during a task, the observation is written here immediately as a MemoryEntry with type (fact, note, or insight), timestamp, tags, and content.

The buffer is not injected into prompts by default — it’s too raw. Instead, it’s the input to the consolidation pipeline.
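
As a rough illustration of the capture step, the sketch below appends an entry to a JSON buffer file. The field names follow the description above (type, timestamp, tags, content), but the exact schema, the `memory_save` signature, and the file layout are assumptions, and vector indexing is left out.

```python
import json
import tempfile
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from pathlib import Path

ALLOWED_TYPES = {"fact", "note", "insight"}


@dataclass
class MemoryEntry:
    type: str
    content: str
    tags: list[str] = field(default_factory=list)
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


def memory_save(buffer_path: Path, entry: MemoryEntry) -> int:
    """Append an entry to the buffer file and return the new entry count."""
    if entry.type not in ALLOWED_TYPES:
        raise ValueError(f"unknown memory type: {entry.type}")
    entries = json.loads(buffer_path.read_text()) if buffer_path.exists() else []
    entries.append(asdict(entry))
    buffer_path.write_text(json.dumps(entries, indent=2))
    return len(entries)


# Usage: write two observations to a throwaway buffer file.
buffer = Path(tempfile.mkdtemp()) / "memories.json"
memory_save(buffer, MemoryEntry("fact", "CI runs on every push", ["ci"]))
count = memory_save(buffer, MemoryEntry("insight", "auth tests are flaky", ["auth"]))
```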

The Curation Flow: Buffer → MEMORY.md

This is the critical knowledge lifecycle:

  1. Capture: Agent saves an observation via memory_save. Written to memories.json, indexed in the vector store for semantic search.
  2. Retrieval: The Cognitive Preparation Pipeline (CPP) searches memories.json on-demand via semantic search or substring matching, directed by the agent’s appraisal of the current situation.
  3. Consolidation: The Dream Cycle runs periodically. When 3+ related observations point to the same recurring pattern, the LLM-assisted consolidation process:
    • Removes duplicates and outdated entries
    • Merges related observations into synthesized knowledge
    • Promotes the consolidated content into the appropriate MEMORY.md section

This flow ensures that what the agent knows is not static. It starts empty and grows richer with every task. The longer an agent works, the more it knows about the project’s conventions, architecture, preferences, and pitfalls.

Source: MEMORY-SYSTEM.md §5 — “Dream Cycle” — “Promote only when 3+ entries point to the same recurring pattern”
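
The promotion threshold can be sketched in a few lines. In the flow described above an LLM judges which observations are related; the version below substitutes a shared tag as a stand-in for that judgment, so treat `promotion_candidates` as a hypothetical simplification.

```python
from collections import defaultdict

PROMOTION_THRESHOLD = 3  # "3+ entries point to the same recurring pattern"


def promotion_candidates(entries: list[dict]) -> dict[str, list[str]]:
    """Group entry contents by tag and keep groups that meet the threshold.

    A shared tag is only a proxy for "related"; the real consolidation
    step is described as LLM-assisted.
    """
    by_tag: dict[str, list[str]] = defaultdict(list)
    for entry in entries:
        for tag in entry["tags"]:
            by_tag[tag].append(entry["content"])
    return {
        tag: items
        for tag, items in by_tag.items()
        if len(items) >= PROMOTION_THRESHOLD
    }
```

Three observations tagged `conventions` would form a promotion candidate, while a single observation tagged `ops` would stay in the buffer until more evidence accumulates.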

Knowledge Classification

While agents organize MEMORY.md freely, the platform recognizes a set of knowledge categories used for project-level knowledge sharing:

| Category | Purpose |
| --- | --- |
| architecture | System design decisions and component maps |
| convention | Coding standards, naming rules, process norms |
| api | Interface contracts, endpoint specifications |
| decision | Architectural decision records (ADRs) |
| gotcha | Common pitfalls and hard-won lessons |
| troubleshooting | Error resolution playbooks |
| dependency | Third-party library behavior and compatibility |
| process | Workflow steps and CI/CD pipelines |
| reference | External documentation and resource links |

Source: ARCHITECTURE.md §3.4 — Knowledge categories


4. Episodic Memory — “What Happened”

The episodic memory layer records everything an agent has experienced and done. This is the agent’s personal history — the ground truth of its operational timeline.

Three Interlocking Records

| Record | Storage | What It Captures |
| --- | --- | --- |
| Activity sessions | SQLite agent_activities | Every agent action: task execution, chat, heartbeat, A2A communication, internal operations. Each has summary + keywords for indexed retrieval. |
| Activity logs | SQLite agent_activity_logs | Event-level detail within each activity: LLM calls, tool calls, status changes, errors. |
| Mailbox timeline | SQLite mailbox_items + agent_decisions | Every stimulus the agent received and every attention decision it made: the complete stimulus/response record. |

Current Session vs. Historical Record

The current session (stored as sessions/*.json files) is thin — it holds the active conversation or task turn only. It is auto-compacted at 60-80 messages, with older content summarized and saved to daily logs.

The historical record (SQLite tables) is the durable episodic memory. Agents search it via the recall_activity tool, which supports keyword-based search across activity summaries. This is how an agent can answer “What did I do on the auth module last week?” — it searches its own episodic memory.

Source: MEMORY-SYSTEM.md §3 — “Experience (SQLite Activity Index)” — “Searchable episodic memory — the ground truth of everything the agent experienced and did”
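
A keyword search over the activity index might look roughly like the sketch below. Only the table name agent_activities and the existence of summary and keywords fields come from the description above; the column names, the `started_at` field, and the `recall_activity` signature are assumptions made for the example.

```python
import sqlite3


def recall_activity(conn: sqlite3.Connection, keyword: str, limit: int = 5) -> list[tuple]:
    """Return (started_at, summary) rows whose summary or keywords match."""
    pattern = f"%{keyword}%"
    return conn.execute(
        "SELECT started_at, summary FROM agent_activities "
        "WHERE summary LIKE ? OR keywords LIKE ? "
        "ORDER BY started_at DESC LIMIT ?",
        (pattern, pattern, limit),
    ).fetchall()


# Demo data in an in-memory database (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE agent_activities (id INTEGER PRIMARY KEY, started_at TEXT, "
    "summary TEXT, keywords TEXT)"
)
conn.executemany(
    "INSERT INTO agent_activities (started_at, summary, keywords) VALUES (?, ?, ?)",
    [
        ("2025-01-06T09:00:00Z", "Fixed race condition in token refresh", "auth,login,race"),
        ("2025-01-07T14:30:00Z", "Updated README badges", "docs"),
        ("2025-01-08T11:15:00Z", "Added tests for auth middleware", "auth,tests"),
    ],
)
hits = recall_activity(conn, "auth")
```

Here `hits` holds the two auth-related activities, newest first, which is the kind of result an agent would need to answer the "auth module last week" question.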

The Cognitive Preparation Pipeline retrieves from this store on-demand. When an agent receives a task related to something it has done before, the Appraisal phase generates retrieval queries, Phase 2 searches the activity index, and the agent gets relevant episodic context before it even begins reasoning.


5. Memory Consolidation — The Dream Cycle

Memory consolidation in Markus is not a simple TTL-based cache eviction. It’s an LLM-assisted, multi-stage process called the Dream Cycle that mirrors the biological consolidation that happens during human sleep.

Trigger and Frequency

The Dream Cycle runs inside consolidateMemory(), which is triggered every 4 hours and works through three steps, with the Dream Cycle itself as the last:

  1. Session compaction (if session > 30 messages): A lightweight LLM call (memoryFlush()) asks the agent to persist any important information via memory_save, then heuristic truncation keeps the newest 30 messages with a summary.
  2. Daily report generation (once per day): Performance metrics and activity summary written to daily logs.
  3. Dream Cycle consolidation (once per day, when entries ≥ 50): The core consolidation process.
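
The trigger conditions above reduce to a small decision function. This is a sketch under assumptions: `AgentState` and `plan_consolidation` are hypothetical names, and the "once per day" conditions are modeled as simple boolean flags.

```python
from dataclasses import dataclass


@dataclass
class AgentState:
    session_messages: int
    buffer_entries: int
    daily_report_done_today: bool
    dream_cycle_done_today: bool


def plan_consolidation(state: AgentState) -> list[str]:
    """List the steps a single 4-hourly consolidation run would perform."""
    steps = []
    if state.session_messages > 30:
        steps.append("session_compaction")
    if not state.daily_report_done_today:
        steps.append("daily_report")
    if not state.dream_cycle_done_today and state.buffer_entries >= 50:
        steps.append("dream_cycle")
    return steps
```

For example, a state with 45 session messages, 62 buffered entries, and nothing yet done today yields all three steps, while a quiet agent with a short session and a small buffer yields none.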

The Dream Cycle Process

When memories.json has 50+ entries, the process begins:

  1. Prepare: Up to 200 oldest entries are batched for LLM analysis (each entry: id, type, date, tags, first 200 chars of content).
  2. Analyze: A lightweight LLM call (no tools, cheap model) identifies:
    • Duplicate entries — clearly redundant information
    • Merge candidates — related entries that should be combined
    • Promotion candidates — patterns that appear 3+ times, ready for MEMORY.md
  3. Apply removals: Identified entries are removed from memories.json and the vector index.
  4. Apply merges: Entry groups are replaced with a single merged entry containing all unique information.
  5. Promote to long-term: Recurring patterns are synthesized into MEMORY.md sections; source entries are removed.
  6. Hygiene pass: pruneMemoryMd() enforces health: strips leaked LLM artifacts (<think> blocks), enforces section size limits, removes misplaced daily-report sections.
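
Two of these steps are mechanical enough to sketch directly: preparing the batch (step 1) and one part of the hygiene pass (step 6). The helper names below are hypothetical, and the code assumes each entry carries an ISO-formatted `date` string so that sorting by string puts the oldest first.

```python
import re

BATCH_SIZE = 200      # "up to 200 oldest entries"
PREVIEW_CHARS = 200   # "first 200 chars of content"


def prepare_batch(entries: list[dict]) -> list[dict]:
    """Pick the oldest entries and trim each to the fields sent for analysis."""
    oldest = sorted(entries, key=lambda e: e["date"])[:BATCH_SIZE]
    return [
        {
            "id": e["id"],
            "type": e["type"],
            "date": e["date"],
            "tags": e["tags"],
            "content": e["content"][:PREVIEW_CHARS],
        }
        for e in oldest
    ]


def strip_think_blocks(markdown: str) -> str:
    """Remove leaked <think>...</think> spans, one part of the hygiene pass."""
    return re.sub(r"<think>.*?</think>\s*", "", markdown, flags=re.DOTALL)
```

Trimming content to a short preview keeps the analysis call cheap, which fits the "lightweight LLM call" described in step 2.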

Conservative by Design

The Dream Cycle is intentionally conservative. The system’s guiding rule: “Incorrect removal of a memory entry is worse than keeping a duplicate.” Entries tagged as insight are preserved unless truly duplicated. If nothing needs consolidation, the LLM returns an empty result — no forced changes.

Source: MEMORY-SYSTEM.md §7 — Rule 6: “Dream Cycle is conservative”

Cross-Restart Persistence

This entire system survives agent restarts. All three memory stores — Identity (files), Knowledge (files + vector index), Experience (SQLite) — are persisted on disk. When an agent process restarts, it loads:

  • ROLE.md → parsed into RoleTemplate → always loaded first
  • MEMORY.md → loaded as ## Your Knowledge
  • memories.json → indexed into vector store
  • agent_activities → searchable via recall_activity

The agent picks up exactly where it left off, with all accumulated knowledge intact.


6. The Cognitive Preparation Pipeline — Memory in Action

Memory is useless if the agent can’t use it effectively. Markus implements a Cognitive Preparation Pipeline (CPP) that governs how memory is retrieved and applied. Grounded in Kahneman’s Dual Process Theory and Baddeley’s Working Memory model, CPP ensures different agents in different states retrieve different context for the same stimulus.

Four Phases

| Phase | Name | What It Does | LLM Call? |
| --- | --- | --- | --- |
| 1 | Appraisal | Persona-aware assessment: “What context do I need?” | Yes (cheap model, ~800 tokens) |
| 2 | Retrieval | Directed search against indexed stores (activity history, memories, tasks) | No (tool execution) |
| 3 | Reflection | Persona-aware interpretation: “What does this mean for me?” | Yes (cheap model, ~512 tokens) |
| 4 | Assembly | Merge stable context + prepared context into final system prompt | No (code assembly) |

Cognitive Depth Levels (D0-D3)

Not every stimulus needs the full pipeline. Four depth levels map to the situation:

| Level | Name | Phases | When Used |
| --- | --- | --- | --- |
| D0 | Reflexive | None | Heartbeat OKs, simple acknowledgments, memory consolidation |
| D1 | Reactive | Appraisal only | Most chats, A2A messages, simple comments |
| D2 | Deliberative | Full (Appraisal → Retrieval → Reflection) | New task execution, complex questions, escalations |
| D3 | Meta-cognitive | Full + post-response evaluation | High-stakes decisions, novel situations, repeated failures |
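
The depth levels translate naturally into a lookup from level to the phases that run. The enum and mapping below are a hypothetical encoding of the table, not the platform’s actual types; how a stimulus gets assigned a depth is not modeled here.

```python
from enum import Enum


class Depth(Enum):
    D0 = "reflexive"
    D1 = "reactive"
    D2 = "deliberative"
    D3 = "meta-cognitive"


# Phases per depth level, transcribed from the table above.
PHASES_BY_DEPTH: dict[Depth, list[str]] = {
    Depth.D0: [],
    Depth.D1: ["appraisal"],
    Depth.D2: ["appraisal", "retrieval", "reflection"],
    Depth.D3: ["appraisal", "retrieval", "reflection", "post_response_evaluation"],
}


def phases_for(depth: Depth) -> list[str]:
    """Return the preparation phases that run at a given depth."""
    return list(PHASES_BY_DEPTH[depth])
```

Keeping the mapping as data makes the cost trade-off visible: D0 and D1 skip retrieval entirely, so cheap stimuli never touch the indexed stores.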

Persona-Aware Retrieval

This is where the architecture shines. Two agents receive the same stimulus — a task update saying “the auth module test is failing.” The retrieval plans are fundamentally different:

Source: COGNITIVE-ARCHITECTURE.md §3.2 — “How Persona Shapes Each Phase”

| Aspect | Backend Developer Agent | Project Manager Agent |
| --- | --- | --- |
| Appraisal | Code files, error patterns, test results, dependencies | Timeline, team capacity, stakeholder expectations, risk |
| Retrieval queries | “auth module error”, “login.ts”, “test failure” | “auth feature timeline”, “team blockers”, “stakeholder feedback” |
| Reflection | “This is similar to the race condition I fixed last week” | “This delay impacts the sprint goal, need to re-prioritize” |
The same database, queried through different personas, produces different relevant context. This is not a fixed-prompt system — it’s a cognitive architecture where memory retrieval is directed by identity.
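
A toy version of persona-directed retrieval shows the idea. In the pipeline described above the queries come from an LLM appraisal call; the static `QUERIES_BY_PERSONA` table, the `retrieve` function, and the sample records below are hypothetical stand-ins, and matching is plain case-insensitive substring search.

```python
# Stand-in for the Appraisal phase: fixed queries per persona, taken from
# the comparison above. The real system generates these with an LLM call.
QUERIES_BY_PERSONA: dict[str, list[str]] = {
    "backend_developer": ["auth module error", "login.ts", "test failure"],
    "project_manager": ["auth feature timeline", "team blockers", "stakeholder feedback"],
}

# Hypothetical shared store of past records.
STORE = [
    "Fixed test failure in login.ts caused by a race condition",
    "Stakeholder feedback: auth feature timeline slipped one sprint",
    "Refactored billing invoices",
]


def retrieve(persona: str, store: list[str]) -> list[str]:
    """Return records that contain any of the persona's queries."""
    queries = [q.lower() for q in QUERIES_BY_PERSONA[persona]]
    return [
        record for record in store
        if any(query in record.lower() for query in queries)
    ]
```

Both personas search the same `STORE`, yet the developer gets back the login.ts fix and the project manager gets back the timeline note.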


7. Real-World Impact of Persistent Memory

Cumulative Domain Knowledge

Markus agents don’t forget what they learn. A developer agent assigned to a project for three months:

  • Day 1: Knows its role, has no project-specific knowledge.
  • Week 2: Has learned the project’s coding conventions (saved to conventions in MEMORY.md). Has experienced the first deployment pipeline (stored in agent_activities).
  • Month 2: Has built a rich MEMORY.md with domain knowledge, common gotchas, troubleshooting procedures, and architectural understanding. Its episodic memory contains hundreds of task executions it can search.
  • Month 3: The agent is not the same agent that started. It has evolved its identity (ROLE.md) through repeated experience. It responds faster, makes fewer mistakes, and understands the project’s implicit patterns better than a newly onboarded human.

Memory Across Sessions, Days, Weeks, Months

The three-layer architecture ensures continuity across every time scale:

  • Within a session: Working context (current conversation, auto-compacted)
  • Across sessions: Episodic memory (activity index, searchable by keywords)
  • Across days: Semantic memory (MEMORY.md, always in the prompt)
  • Across weeks/months: Procedural memory (ROLE.md, identity evolution)

This is the difference between an agent that stays an “intern” forever and an agent that grows into a “senior” contributor.


8. Conclusion: From Goldfish to Elephant

The AI industry has spent tremendous effort improving LLM quality — better models, larger context windows, smarter prompting — while neglecting the fundamental architecture of how agents remember. Without a proper memory system, every LLM-based agent is a goldfish, no matter how powerful its brain.

Markus’s Tulving three-layer memory architecture solves this by recognizing what cognitive psychologists have known for decades: memory is not one thing. It is three distinct, interacting systems:

  1. Procedural — Who you are and how you operate (ROLE.md, HEARTBEAT.md)
  2. Semantic — What you know (MEMORY.md, memories.json, Dream Cycle consolidation)
  3. Episodic — What you’ve experienced (SQLite activity index, searchable history)

These layers are not theoretical abstractions — they are production systems running in every Markus agent, managing real tasks across real projects, remembering what they learn across days and weeks.

No other open-source agent platform implements all three layers. Most have chat logs (rudimentary episodic) but no organized semantic knowledge and no identity-based procedural memory. Markus is the only platform where an agent’s memory is full-stack: instantly captured, deliberately consolidated, and directed by a cognitive pipeline that retrieves context based on who the agent is and what it’s doing.

Get Started

Markus is open-source. Clone the repository, deploy your own digital workforce, and see what happens when agents actually remember.

Star Markus on GitHub →

