
Deep Dive into Markus Architecture: Memory, A2A Protocol & Multi-Agent Runtime

Markus Team

May 16, 2026 15 min read



1. Introduction

As AI agents evolve from simple chatbots into autonomous digital employees, the underlying architecture must support memory persistence, inter-agent communication, task governance, and self-directed operation. Markus is an open-source multi-agent runtime that takes a principled approach to all of these challenges.

Inspired by cognitive psychology, distributed systems, and production-grade software engineering, Markus provides a complete infrastructure for deploying teams of AI agents that can remember past interactions, communicate with each other, delegate tasks, follow governance policies, and even initiate work on their own through a Heartbeat mechanism.

This deep dive explores the core architectural components that make Markus a compelling choice for developers building multi-agent systems in production. We’ll cover:

The three-layer architecture (Web UI → Org Manager → Agent Runtime)
Tulving three-tier memory (Procedural, Semantic, Episodic) and the Dream Cycle
The A2A protocol (Agent-to-Agent communication) with mailbox system and attention controller
Cognitive Preparation Pipeline (CPP) with four depth levels
Task governance: 9-state state machine, approval gates, trust levels, and workspace isolation
The Heartbeat mechanism for proactive agent behavior

2. The Three-Layer Architecture

Markus follows a clean separation of concerns with three distinct layers. Each layer has a clearly defined responsibility, and they communicate through well-defined interfaces.

┌──────────────────────────────────────────────────────────────┐
│               Web UI (React + Vite + Tailwind)               │
│    Dashboard · Chat · Project Management · Builder · Hub     │
└──────────────────────────────┬───────────────────────────────┘
                               │ REST + WebSocket
┌──────────────────────────────┴───────────────────────────────┐
│                   Org Manager (API Server)                   │
│    Authentication · Task Governance · Project Management     │
│                 Reporting · User Management                  │
└──────────────────────────────┬───────────────────────────────┘
                               │
┌──────────────────────────────┴───────────────────────────────┐
│                 Agent Runtime (Core Runtime)                 │
│    Agent · LLM Router · Tool System · Memory · Heartbeat     │
│       Mailbox · Attention Controller · Context Engine        │
└──────────┬──────────────────────────────┬────────────────────┘
           │                              │
┌──────────┴──────────────────┐ ┌─────────┴────────────────────┐
│ Storage (SQLite/PostgreSQL) │ │ Comms Bridges                │
│                             │ │ Slack · Feishu · WhatsApp    │
│                             │ │ Telegram · WeCom             │
└─────────────────────────────┘ └──────────────────────────────┘

2.1 Web UI (Presentation Layer)

The frontend is built with React + Vite + Tailwind CSS, providing a responsive dashboard that works across desktop and mobile. It offers workspaces for chat, project management, agent configuration (Builder), capability discovery (Hub), and system settings. Communication with the server layer uses REST for standard CRUD operations and WebSocket for real-time updates like task status changes and agent messages.

2.2 Org Manager (API / Governance Layer)

The Org Manager serves as the central API server. It handles:

Authentication & Authorization — user and agent identity management
Task Governance — state machine transitions, approval routing, and policy enforcement
Project Management — project creation, milestone tracking, deliverable management
Reporting & User Management — audit logs, team organization

From the agents’ perspective this layer is stateless: it orchestrates governance without interfering with agent execution logic.

2.3 Agent Runtime (Core Cognitive Layer)

The Agent Runtime is where the actual intelligence lives. It manages:

Agent lifecycle — creation, session management, sub-agent spawning
LLM Router — intelligent model selection, failover, circuit breaker
Tool System — tool registration, execution, sandboxing
Memory System — Procedural, Semantic, and Episodic tiers
A2A Communication — mailbox, message routing, delegation
Heartbeat Scheduler — autonomous periodic task execution
Context Engine — system prompt assembly from up to 24 segments, with KV-cache optimization

The runtime is designed so that multiple agents can coexist, each with isolated workspaces and independent memory profiles.

3. Tulving Three-Tier Memory System

Named after cognitive psychologist Endel Tulving, Markus implements a three-tier memory architecture that mirrors human memory systems. This is a defining feature of its cognitive architecture and one of the key differentiators from simpler agent frameworks.

┌─────────────────────────────────────────────────────────────────┐
│                     Memory System Overview                      │
├─────────────┬─────────────────┬─────────────────────────────────┤
│  Procedural │    Semantic     │         Episodic                │
│  ("How to") │  ("Know what")  │      ("What happened")          │
├─────────────┼─────────────────┼─────────────────────────────────┤
│  ROLE.md    │  MEMORY.md      │  sessions/*.json (current)      │
│  Skills     │  memories.json  │  SQLite agent_activities        │
│  Behavior   │  Long-term      │  (past activities, on-demand)   │
│  Rules      │  Knowledge      │                                 │
└─────────────┴─────────────────┴─────────────────────────────────┘

3.1 Procedural Memory — “How to Act”

Procedural memory encodes the agent’s identity, behavioral rules, and skill definitions. It answers the question: Who am I, and how should I behave?

Aspect	Detail
Storage	`role/ROLE.md` + Skill definition files
Content	Agent identity, system prompts, behavior boundaries, action policies
Loading	Prepended to the system prompt at every inference cycle
Mutability	ROLE.md is immutable by the agent — only human users can modify core identity

This layer ensures that an agent cannot rewrite its own fundamental character. It creates a stable anchor for identity, preventing drift during extended autonomous operation.

3.2 Semantic Memory — “What I Know”

Semantic memory stores factual knowledge, verified patterns, workflows, and domain expertise. It is the agent’s accumulated long-term knowledge base.

Aspect	Detail
Storage	`MEMORY.md` (curated, always in prompt) + `memories.json` (observation buffer, searchable)
Capacity	MEMORY.md: 3,000 characters per section, 15,000 total
Key Tools	`memory_save` (save observation), `memory_search` (retrieve), `memory_update_longterm` (consolidate to MEMORY.md)

Semantic memory is the primary mechanism for learning from experience. The agent saves observations, searches for relevant knowledge during tasks, and periodically consolidates important patterns into its permanent knowledge base.
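A minimal Python sketch of how the three tools divide the work, assuming simple in-process stores: the list and dict below stand in for `memories.json` and `MEMORY.md`, the class and method signatures are hypothetical, and the substring match is a placeholder for whatever retrieval Markus actually uses. Only the 3,000 and 15,000 character limits come from the table above.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Limits from the table above; all names below are hypothetical.
SECTION_LIMIT = 3_000   # characters per MEMORY.md section
TOTAL_LIMIT = 15_000    # characters across all sections


@dataclass
class SemanticMemory:
    observations: List[str] = field(default_factory=list)  # stands in for memories.json
    longterm: Dict[str, str] = field(default_factory=dict)  # stands in for MEMORY.md

    def memory_save(self, text: str) -> None:
        """Append a raw observation to the searchable buffer."""
        self.observations.append(text)

    def memory_search(self, query: str) -> List[str]:
        """Placeholder retrieval: case-insensitive substring match."""
        q = query.lower()
        return [o for o in self.observations if q in o.lower()]

    def memory_update_longterm(self, section: str, content: str) -> None:
        """Replace one curated section, enforcing both size caps."""
        if len(content) > SECTION_LIMIT:
            raise ValueError(f"section {section!r} exceeds {SECTION_LIMIT} characters")
        other = sum(len(v) for k, v in self.longterm.items() if k != section)
        if other + len(content) > TOTAL_LIMIT:
            raise ValueError(f"MEMORY.md would exceed {TOTAL_LIMIT} characters")
        self.longterm[section] = content


mem = SemanticMemory()
mem.memory_save("Staging deploys require the VPN")
mem.memory_save("Unit tests run with pytest -q")
print(mem.memory_search("vpn"))  # ['Staging deploys require the VPN']
mem.memory_update_longterm("Workflows", "Connect to the VPN before any staging deploy.")
```

Putting the caps on `memory_update_longterm` rather than on `memory_save` mirrors the split described above: the observation buffer is cheap and searchable, while the curated file is always in the prompt and therefore has to be budgeted.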

3.3 Episodic Memory — “What Happened”

Episodic memory records the agent’s past experiences — tasks it performed, messages it received, sessions it participated in.

Aspect	Detail
Storage	`sessions/*.json` (current + recent sessions), SQLite `agent_activities` (historical)
Retrieval	`recall_activity` tool — query by task, type, or keyword
Use Case	Contextual awareness, learning from past outcomes, continuity across sessions

Unlike semantic memory, which stores generalized knowledge, episodic memory preserves specific experiences. This allows agents to answer questions like “What did I work on yesterday?” or “How did I solve a similar problem last time?”
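To make the retrieval side concrete, here is a hedged sketch of a `recall_activity`-style query using Python’s built-in `sqlite3`. The columns of `agent_activities` and the function’s parameters are assumptions for illustration; only the table name and the three filters (task, type, keyword) come from the text.

```python
import sqlite3

# Hypothetical schema; the real agent_activities columns may differ.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE agent_activities ("
    "id INTEGER PRIMARY KEY, agent_id TEXT, task_id TEXT,"
    " type TEXT, summary TEXT, created_at TEXT)"
)
conn.executemany(
    "INSERT INTO agent_activities (agent_id, task_id, type, summary, created_at)"
    " VALUES (?, ?, ?, ?, ?)",
    [
        ("dev-1", "T-12", "task", "Fixed flaky login test by mocking the clock", "2026-05-14"),
        ("dev-1", "T-13", "message", "Asked reviewer about API naming", "2026-05-15"),
        ("dev-1", "T-14", "task", "Refactored login form validation", "2026-05-15"),
    ],
)


def recall_activity(agent_id, task_id=None, type_=None, keyword=None):
    """Return (task_id, type, summary) rows for one agent, newest first."""
    sql = "SELECT task_id, type, summary FROM agent_activities WHERE agent_id = ?"
    params = [agent_id]
    if task_id is not None:
        sql += " AND task_id = ?"
        params.append(task_id)
    if type_ is not None:
        sql += " AND type = ?"
        params.append(type_)
    if keyword is not None:
        sql += " AND summary LIKE ?"  # parameterized: user text never spliced into SQL
        params.append(f"%{keyword}%")
    sql += " ORDER BY created_at DESC, id DESC"
    return conn.execute(sql, params).fetchall()


# "How did I solve that login problem last time?"
print(recall_activity("dev-1", type_="task", keyword="login"))
```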

3.4 The Dream Cycle — Memory Consolidation

Markus features an autonomous memory consolidation process called the Dream Cycle, inspired by how human brains consolidate memories during sleep.

Trigger: memories.json > 50 entries AND not run today
    │
    ▼
LLM reviews all observations
    │
    ├── Merge duplicates
    ├── Prune outdated entries
    ├── Identify recurring patterns
    │
    ▼
Pattern appears 3+ times?
    ├── Yes → Promote to MEMORY.md
    └── No  → Retain or discard
    │
    ▼
Prune source entries from memories.json

The Dream Cycle ensures that:

Noise is filtered out — one-off events don’t clutter long-term memory
Patterns are promoted — recurring observations graduate to permanent knowledge
Storage is bounded — the observation buffer stays within reasonable limits

This is a critical feature for long-running agents that accumulate thousands of observations over time. Without consolidation, memory would become unwieldy and retrieval would degrade.
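The flow above can be sketched in Python under two loud assumptions: the LLM’s judgement of which observations are “the same” is replaced by simple text normalization, and pruning of outdated entries is not modeled. The 50-entry trigger, the once-per-day rule, and the 3+ promotion threshold are taken from the diagram; the function names are hypothetical.

```python
from collections import Counter
from datetime import date
from typing import List, Optional, Tuple

TRIGGER_SIZE = 50   # run only when memories.json holds more than 50 entries
PROMOTE_AT = 3      # a pattern seen 3+ times graduates to MEMORY.md


def normalize(text: str) -> str:
    """Stand-in for the LLM's "same observation?" judgement."""
    return " ".join(text.lower().split())


def should_run(buffer: List[str], last_run: Optional[date], today: date) -> bool:
    return len(buffer) > TRIGGER_SIZE and last_run != today


def dream_cycle(buffer: List[str], longterm: List[str]) -> Tuple[List[str], List[str]]:
    """One consolidation pass; returns (new_buffer, new_longterm)."""
    counts = Counter(normalize(o) for o in buffer)
    # Promote recurring patterns that are not already in long-term memory.
    promoted = [p for p, n in counts.items() if n >= PROMOTE_AT and p not in longterm]
    # Merge duplicates (one copy each) and prune entries whose pattern was promoted.
    kept = [p for p in dict.fromkeys(normalize(o) for o in buffer) if counts[p] < PROMOTE_AT]
    return kept, longterm + promoted


buffer = ["Use pnpm, not npm"] * 3 + ["use PNPM,  not npm"] + [f"one-off note {i}" for i in range(48)]
if should_run(buffer, last_run=None, today=date(2026, 5, 16)):
    buffer, longterm = dream_cycle(buffer, [])
    print(longterm)  # ['use pnpm, not npm']
```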

4. A2A Agent-to-Agent Communication Protocol

Agents don’t work in isolation; they communicate. Markus implements its own A2A (Agent-to-Agent) protocol, designed specifically for communication between AI agents and built on a persistent, database-backed mailbox system.

Agent A                   Mailbox DB                  Agent B
   │                           │                         │
   │── agent_send_message ────►│  queued in B's INBOX    │
   │                           │                         │
   │                           │── on context switch ───►│
   │                           │   (INBOX → MAIL)        │
   │                           │                         │
   │◄── agent_send_message ────│─────────────────────────│
   │  (reply, awaited when     │                         │
   │   wait_for_reply = true)  │                         │

4.1 Mailbox System

Every agent has a persistent mailbox stored in the database:

OUTBOX — Messages the agent has sent (for audit trail)
INBOX — Incoming messages waiting to be processed
MAIL — Processed messages (archived)
PARKED — Messages addressed but not yet picked up by their target agent

Messages are asynchronous by default — sending does not block either the sender or the receiver. The receiver processes messages on its own schedule during context switches.
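A toy model of the mailbox life cycle shows why sending is non-blocking: the sender only appends records, and the receiver moves its INBOX items to MAIL when it next switches context. This is an in-memory Python sketch with hypothetical class and method names; PARKED is listed for completeness, but its rules are not modeled.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Box(Enum):
    OUTBOX = "outbox"   # sender's copy, kept for the audit trail
    INBOX = "inbox"     # waiting for the recipient to process
    MAIL = "mail"       # processed and archived
    PARKED = "parked"   # addressed but not yet picked up (not modeled below)


@dataclass
class Message:
    sender: str
    recipient: str
    body: str
    box: Box


@dataclass
class MailboxDB:
    rows: List[Message] = field(default_factory=list)

    def send(self, sender: str, recipient: str, body: str) -> None:
        """Non-blocking: write an audit copy and queue the delivery, then return."""
        self.rows.append(Message(sender, recipient, body, Box.OUTBOX))
        self.rows.append(Message(sender, recipient, body, Box.INBOX))

    def drain(self, agent: str) -> List[str]:
        """Called at the recipient's context switch: take INBOX items, archive to MAIL."""
        picked = [m for m in self.rows if m.recipient == agent and m.box is Box.INBOX]
        for m in picked:
            m.box = Box.MAIL
        return [m.body for m in picked]


db = MailboxDB()
db.send("alice", "bob", "build is green")
db.send("alice", "bob", "please review my PR")
print(db.drain("bob"))  # ['build is green', 'please review my PR']
```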

4.2 Synchronous vs. Asynchronous Communication

Mode	Tool	Behavior	Use Case
Async	`agent_send_message` (default)	Fire-and-forget; sender continues immediately	Status updates, notifications, non-blocking coordination
Sync	`agent_send_message({ wait_for_reply: true })`	Sender blocks until receiver responds	Questions requiring immediate answers, decisions

The `wait_for_reply: true` mode is powerful but should be used judiciously: it pauses the sending agent’s execution until the receiver responds, so a slow or busy receiver stalls the sender for that whole time.
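The difference between the two modes can be illustrated with Python’s `asyncio`, assuming each agent is a coroutine with its own queue. `send_message` here is a hypothetical analogue of `agent_send_message`, not its real signature: in async mode it returns as soon as the message is queued, and with `wait_for_reply=True` it suspends on a future that only the receiver can resolve.

```python
import asyncio
from typing import Optional


class Agent:
    def __init__(self, name: str) -> None:
        self.name = name
        self.inbox: asyncio.Queue = asyncio.Queue()

    async def send_message(self, to: "Agent", text: str,
                           wait_for_reply: bool = False) -> Optional[str]:
        """Hypothetical analogue of agent_send_message."""
        reply = asyncio.get_running_loop().create_future() if wait_for_reply else None
        await to.inbox.put((self.name, text, reply))
        if reply is None:
            return None          # async: continue immediately
        return await reply       # sync: suspended until the receiver answers

    async def process_one(self) -> None:
        """Receiver side: handle the next message on its own schedule."""
        sender, text, reply = await self.inbox.get()
        if reply is not None:
            reply.set_result(f"{self.name} ack: {text}")


async def demo() -> str:
    a, b = Agent("A"), Agent("B")
    assert await a.send_message(b, "FYI: deploy done") is None  # fire-and-forget
    await b.process_one()
    asking = asyncio.create_task(a.send_message(b, "Ship today?", wait_for_reply=True))
    await b.process_one()        # A's task stays suspended until this resolves its future
    return await asking


print(asyncio.run(demo()))  # B ack: Ship today?
```

The waiting sender in this sketch makes no progress until the receiver runs, which is why the blocking mode is best kept for questions that genuinely gate the sender’s next step.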

4.3 Attention Controller

Linked to the mailbox system is the Attention Controller, which determines how the agent spends its cognitive cycles. In each execution loop, the agent:

Checks for high-priority tasks (blockers, reviews, urgent messages)
Checks mailbox for new A2A messages
Processes pending tasks in priority order

This ensures that an agent doesn’t get stuck on a single task while urgent messages pile up.
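One way to picture the controller’s ordering is a priority queue keyed first by the three checks above and then by a per-item priority. The tier constants, the `priority` field, and the class itself are hypothetical; the sketch illustrates the ordering only, not Markus’s scheduler.

```python
import heapq
from dataclasses import dataclass, field
from typing import List, Optional

# Tiers follow the three checks listed above; lower is handled first.
URGENT, MAILBOX, PENDING = 0, 1, 2


@dataclass(order=True)
class WorkItem:
    tier: int
    priority: int                     # hypothetical within-tier priority, lower first
    seq: int                          # arrival order breaks remaining ties
    label: str = field(compare=False)


class AttentionController:
    def __init__(self) -> None:
        self._heap: List[WorkItem] = []
        self._seq = 0

    def add(self, tier: int, label: str, priority: int = 0) -> None:
        heapq.heappush(self._heap, WorkItem(tier, priority, self._seq, label))
        self._seq += 1

    def next_item(self) -> Optional[str]:
        """Pop whatever deserves attention now, or None when idle."""
        return heapq.heappop(self._heap).label if self._heap else None


ac = AttentionController()
ac.add(PENDING, "write docs", priority=2)
ac.add(MAILBOX, "reply to reviewer")
ac.add(PENDING, "fix bug", priority=1)
ac.add(URGENT, "unblock teammate")
print([ac.next_item() for _ in range(4)])
# ['unblock teammate', 'reply to reviewer', 'fix bug', 'write docs']
```

A strict tier order like this can starve low-tier work if high-tier items keep arriving; a production scheduler would usually add aging or time slicing.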

5. Cognitive Preparation Pipeline (CPP)

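The Cognitive Preparation Pipeline is the system that assembles the agent’s running context before every inference call. It draws on the ordered segments shown below and operates at one of four depth levels, which determine how many of those segments are included.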

Context Assembly

Incoming Request
       │
       ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│ 1. System Message     — Role identity, behavioral rules, system context     │
│ 2. Profile            — Agent name, role, team, assigned skills             │
│ 3. Environment        — OS, tools, resources, installed packages            │
│ 4. State (varies)     — Task details, context window management             │
│ 5. Instructions       — Tool usage rules, process guidelines                │
│ 6. Notifications      — Task status, DAG resolution, system updates         │
│ 7. Skills             — Activated skill instructions / MCP definitions      │
│ 8. Memory             — Semantic (MEMORY.md) + Episodic (recall results)    │
│ 9. Conversation       — Recent tool calls, results, and decisions           │
│ 10. Long Context      — Past conversation history (compressed if large)     │
│ 11. Tools             — Available function definitions                      │
│ 12. Tool Results      — Results from tool calls in current cycle            │
│ 13+ ...               — Additional dynamic context                          │
└─────────────────────────────────────────────────────────────────────────────┘

CPP Depth Levels

Level	Name	Segments	When Used
0	Minimal	1-5 (skips memory search)	Simple tool calls, context switch decisions
1	Standard	1-12 (with memory recall)	Normal task execution, most operations
2	Extended	1-10 (without tools)	Planning, decomposition, reflection
3	Full	1-10 (with long context)	Complex reasoning, architecture decisions, task creation

The system optimizes context size by excluding irrelevant segments at shallow depths, reducing token consumption by up to 60% for simple operations.
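Depth-based selection can be sketched as a lookup from level to segment numbers followed by a filtered join. The segment names and ranges mirror the two tables above; the Markdown-style section headers and the `assemble` function are illustrative assumptions.

```python
from typing import Dict

SEGMENTS = {
    1: "System Message", 2: "Profile", 3: "Environment", 4: "State",
    5: "Instructions", 6: "Notifications", 7: "Skills", 8: "Memory",
    9: "Conversation", 10: "Long Context", 11: "Tools", 12: "Tool Results",
}

# Segment ranges per depth level, read off the CPP Depth Levels table.
# Levels 2 and 3 share the range 1-10; what else distinguishes them is not modeled.
DEPTH_RANGES = {0: range(1, 6), 1: range(1, 13), 2: range(1, 11), 3: range(1, 11)}


def assemble(depth: int, content: Dict[int, str]) -> str:
    """Join the non-empty segments allowed at `depth`, in segment order."""
    parts = [f"## {SEGMENTS[i]}\n{content[i]}"
             for i in DEPTH_RANGES[depth] if content.get(i)]
    return "\n\n".join(parts)


content = {i: f"<segment {i} text>" for i in SEGMENTS}
minimal = assemble(0, content)
standard = assemble(1, content)
print(minimal.count("## "), standard.count("## "))  # 5 12
```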

KV-Cache Optimization

One of Markus’s more advanced features is multi-agent KV-cache management:

Each agent maintains an independent KV-cache session with the LLM provider
Agent-specific caching ensures inference efficiency across an agent’s session lifespan
Context switching between agents preserves cache state, avoiding redundant computation when the agent resumes its session

This optimization matters more than it first appears. In a 10-agent team without caching, an agent would reprocess its entire prompt every time it is called upon; with KV-caching, a resuming agent reuses its cached prefix and only processes what changed, so context switches become much cheaper.
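Provider-side prompt caching generally reuses work only for an identical prompt prefix, so the practical rule is to keep each agent’s stable segments first and unchanged between turns. The Python sketch below models that with a hash of the agent id and its stable segments; `PrefixCache` is a hypothetical stand-in that counts how many characters would need fresh processing, not a real provider API or Markus’s implementation.

```python
import hashlib
from typing import List, Set


def prefix_key(agent_id: str, stable: List[str]) -> str:
    """Hash of the agent id plus the segments that stay identical between turns."""
    h = hashlib.sha256(agent_id.encode())
    for seg in stable:
        h.update(b"\x00" + seg.encode())  # separator keeps segment boundaries distinct
    return h.hexdigest()


class PrefixCache:
    """Toy stand-in for provider-side caching; counts characters processed."""

    def __init__(self) -> None:
        self.entries: Set[str] = set()
        self.processed = 0

    def run(self, agent_id: str, stable: List[str], dynamic: str) -> bool:
        """Simulate one inference call; return True on a cache hit."""
        key = prefix_key(agent_id, stable)
        if key in self.entries:
            self.processed += len(dynamic)  # reuse the cached prefix
            return True
        self.entries.add(key)
        self.processed += sum(map(len, stable)) + len(dynamic)  # cold start
        return False


cache = PrefixCache()
role_a = ["ROLE.md text", "MEMORY.md text"]
print(cache.run("agent-a", role_a, "task 1"))          # False: cold start
print(cache.run("agent-b", ["other role"], "task 9"))  # False: different agent
print(cache.run("agent-a", role_a, "task 2"))          # True: agent-a resumes warm
```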

6. Task Governance System

Task governance is the backbone of Markus’s reliability layer. Every piece of work — from code changes to content creation — flows through a precisely defined state machine.

6.1 9-State Finite State Machine

  pending ─► in_progress ─► review ─► completed
     │            │            │
     │            ▼            │
     ├───► blocked ◄───────────┘
     │
     ▼
  rejected

  (Also: failed, cancelled, archived)

State	Description
`pending`	Created but not yet started
`in_progress`	Assigned and actively being worked on
`blocked`	Waiting on external dependency
`review`	Submitted for peer review
`completed`	Approved by reviewer
`failed`	Execution error, agent or tool failure
`rejected`	Requirement rejected; the task will not be pursued
`cancelled`	Explicitly cancelled by manager
`archived`	Stored for historical reference
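The state table translates naturally into an enum plus an allowed-transition map. The edges below are an assumption: they combine the arrows in the diagram with the review-to-`in_progress` path from the SRM workflow in 6.2, and the exits from `blocked`, `failed`, `rejected` and `cancelled` are guesses for illustration only.

```python
from enum import Enum


class TaskState(str, Enum):
    PENDING = "pending"
    IN_PROGRESS = "in_progress"
    BLOCKED = "blocked"
    REVIEW = "review"
    COMPLETED = "completed"
    FAILED = "failed"
    REJECTED = "rejected"
    CANCELLED = "cancelled"
    ARCHIVED = "archived"


S = TaskState
# Assumed edges: diagram arrows + review -> in_progress (SRM rejection);
# exits from blocked/failed/rejected/cancelled are illustrative guesses.
TRANSITIONS = {
    S.PENDING: {S.IN_PROGRESS, S.BLOCKED, S.REJECTED, S.CANCELLED},
    S.IN_PROGRESS: {S.REVIEW, S.BLOCKED, S.FAILED, S.CANCELLED},
    S.BLOCKED: {S.IN_PROGRESS, S.CANCELLED},
    S.REVIEW: {S.COMPLETED, S.IN_PROGRESS, S.BLOCKED},
    S.COMPLETED: {S.ARCHIVED},
    S.FAILED: {S.PENDING, S.ARCHIVED},
    S.REJECTED: {S.ARCHIVED},
    S.CANCELLED: {S.ARCHIVED},
    S.ARCHIVED: set(),
}


def transition(current: TaskState, target: TaskState) -> TaskState:
    """Return `target` if the move is allowed, else raise."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition: {current.value} -> {target.value}")
    return target


state = transition(S.PENDING, S.IN_PROGRESS)
state = transition(state, S.REVIEW)
state = transition(state, S.COMPLETED)
print(state.value)  # completed
```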

6.2 Submit-Review-Merge (SRM) Workflow

This is Markus’s built-in quality gate. No deliverable can be completed without going through this cycle:

Submit: The worker agent calls `task_submit_review` with a summary and references to its deliverables
Review: The agent designated by `reviewer_agent_id` inspects the submission
Approve: The reviewer approves, and the task moves to `completed` automatically
Reject: The task returns to `in_progress` with the reviewer’s feedback
Retry: The worker revises and resubmits

This enforces a four-eyes principle on every deliverable: a mistake made by one agent has to get past a second agent before it can reach production.
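A compact Python sketch of the cycle, reusing the two identifiers named above (`task_submit_review` as the model for `submit_review`, and `reviewer_agent_id`); the `Task` class, the string states, and the `decide` method are hypothetical. The reviewer check enforces the four-eyes rule by refusing a decision from anyone other than the designated, independent reviewer.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Task:
    worker: str
    reviewer_agent_id: str
    state: str = "in_progress"
    summary: Optional[str] = None
    feedback: List[str] = field(default_factory=list)

    def submit_review(self, agent: str, summary: str) -> None:
        """Worker hands off a deliverable (modeled on task_submit_review)."""
        if agent != self.worker or self.state != "in_progress":
            raise PermissionError("only the assigned worker can submit an in-progress task")
        self.summary, self.state = summary, "review"

    def decide(self, reviewer: str, approve: bool, note: str = "") -> None:
        """Reviewer approves (task completes) or rejects (back to the worker)."""
        if reviewer != self.reviewer_agent_id or reviewer == self.worker:
            raise PermissionError("only the designated, independent reviewer may decide")
        if self.state != "review":
            raise ValueError("task is not awaiting review")
        if approve:
            self.state = "completed"
        else:
            self.feedback.append(note)
            self.state = "in_progress"


task = Task(worker="dev-1", reviewer_agent_id="lead-1")
task.submit_review("dev-1", "Added retry logic")
task.decide("lead-1", approve=False, note="Cover the timeout path with a test")
task.submit_review("dev-1", "Added retry logic and a timeout test")
task.decide("lead-1", approve=True)
print(task.state, task.feedback)  # completed ['Cover the timeout path with a test']
```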

6.3 Trust Levels and Approval Gates

Agents build trust over time based on their delivery track record:

Level	Autonomy	Review Requirement	Promotion Criteria
Probation	Low	All tasks reviewed by senior agent	Successful deliveries
Standard	Medium	Complex tasks require review	Consistent quality
Trusted	High	Significant tasks only need review	Track record of first-pass approvals
Senior	Full	Can review others’ work	Mentorship and quality leadership

Approval gates sit between task creation and execution, allowing organizations to define which operations require human sign-off before proceeding.
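The table and the approval gate can be expressed as two small predicates. This is a sketch under stated assumptions: the three complexity labels, the gated operation names, and treating Senior agents like Trusted ones for their own work are all hypothetical, since the table does not specify them.

```python
from enum import IntEnum


class Trust(IntEnum):
    PROBATION = 0
    STANDARD = 1
    TRUSTED = 2
    SENIOR = 3


def needs_review(level: Trust, complexity: str) -> bool:
    """complexity is a hypothetical label: 'routine', 'complex' or 'significant'."""
    if level is Trust.PROBATION:
        return True                                   # everything is reviewed
    if level is Trust.STANDARD:
        return complexity in ("complex", "significant")
    return complexity == "significant"                # TRUSTED; SENIOR assumed alike


# Approval gate: operations an organization flags for human sign-off (hypothetical names).
GATED_OPERATIONS = {"deploy_production", "delete_repository", "send_external_email"}


def may_start(operation: str, human_approved: bool) -> bool:
    """The gate sits between task creation and execution."""
    return operation not in GATED_OPERATIONS or human_approved


print(needs_review(Trust.STANDARD, "complex"), may_start("deploy_production", False))  # True False
```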

6.4 Workspace Isolation

Each agent gets an isolated workspace directory. The system enforces file access boundaries:

Agents can read anywhere on disk that OS permissions allow
Agents can write anywhere OS permissions allow, except other agents’ workspace directories
A shared workspace is readable by all agents and writable only by designated agents

This prevents an agent from accidentally (or intentionally) corrupting another agent’s files while allowing collaboration via the shared space.
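A write check for these rules might resolve the target path first, so that `..` segments and symlinks cannot be used to escape, and then compare it against the workspace layout. The directory layout (`<root>/<agent>/` plus one shared directory) and the function signature are assumptions; `Path.is_relative_to` requires Python 3.9 or later.

```python
from pathlib import Path
from typing import Set


def can_write(agent: str, target: str, root: str, shared: str,
              shared_writers: Set[str]) -> bool:
    """Hypothetical write check; layout assumed to be <root>/<agent>/ plus a shared dir."""
    path = Path(target).resolve()          # collapses '..' and follows symlinks
    root_p, shared_p = Path(root).resolve(), Path(shared).resolve()
    if path.is_relative_to(shared_p):
        return agent in shared_writers     # shared space: designated writers only
    if path.is_relative_to(root_p):
        return path.is_relative_to(root_p / agent)  # own workspace only
    return True                            # elsewhere: defer to OS permissions


ROOT, SHARED = "/srv/markus/workspaces", "/srv/markus/workspaces/shared"
print(can_write("alice", "/srv/markus/workspaces/bob/notes.md", ROOT, SHARED, {"lead"}))  # False
```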

7. The Heartbeat Mechanism

The Heartbeat is what transforms Markus from a reactive system (waiting for input) into a proactive workforce that initiates work autonomously.

7.1 How It Works

Heartbeat Tick (configurable interval: every 60–300s)
       │
       ▼
Check agent's mailbox for unread messages
       │
       ▼
Check pending tasks (any assigned but unstarted?)
       │
       ▼
Check scheduled tasks (any recurring tasks due?)
       │
       ▼
Check own patrol items (defined in HEARTBEAT.md)
       │
       ▼
If nothing urgent: process next pending task

Each agent has a HEARTBEAT.md file where it defines its own personal patrol checklist — things it should regularly check or monitor.
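A single tick of the loop above can be sketched as a pure function over the agent’s state. The `AgentState` and `Schedule` types are hypothetical, and patrol items from HEARTBEAT.md are represented as callables that return a finding or `None`, since the file’s format is not described here. The 60–300s interval would live in whatever scheduler calls `heartbeat_tick`.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Callable, List, Optional


@dataclass
class Schedule:
    name: str
    next_due: datetime
    every: timedelta


@dataclass
class AgentState:
    unread: List[str] = field(default_factory=list)
    pending: List[str] = field(default_factory=list)
    schedules: List[Schedule] = field(default_factory=list)
    patrol: List[Callable[[], Optional[str]]] = field(default_factory=list)


def heartbeat_tick(state: AgentState, now: datetime) -> List[str]:
    """Collect mailbox, schedule and patrol work; if none, start the next pending task."""
    actions = [f"read: {m}" for m in state.unread]
    state.unread.clear()
    for s in state.schedules:
        if s.next_due <= now:
            actions.append(f"due: {s.name}")
            while s.next_due <= now:      # advance the recurring job past `now`
                s.next_due += s.every
    for check in state.patrol:            # HEARTBEAT.md items, as callables
        finding = check()
        if finding:
            actions.append(f"patrol: {finding}")
    if not actions and state.pending:     # nothing urgent: start next pending task
        actions.append(f"start: {state.pending.pop(0)}")
    return actions


state = AgentState(unread=["review request"], pending=["T-7"],
                   schedules=[Schedule("daily codebase scan", datetime(2026, 5, 16, 8), timedelta(days=1))])
print(heartbeat_tick(state, datetime(2026, 5, 16, 9)))
# ['read: review request', 'due: daily codebase scan']
print(heartbeat_tick(state, datetime(2026, 5, 16, 9, 5)))  # ['start: T-7']
```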

7.2 What Heartbeat Enables

Scenario	Without Heartbeat	With Markus Heartbeat
Codebase scan	Need to schedule via CI/CD	Agent scans daily on its own
Content publishing	Manual trigger required	Agent publishes on schedule
System monitoring	Requires external tools (Prometheus, Datadog)	Agent checks and reports hourly
Task management	Need human to assign	Agent picks up pending tasks autonomously

The Heartbeat makes Markus fundamentally different from chat-based AI tools. Your AI team doesn’t wait for you to give it work. It actively looks for things to do, within the boundaries you’ve set.

8. Context Engine & System Prompt Assembly

Every time an agent processes an input, the Context Engine assembles a system prompt from up to 24 segments. This dynamic assembly ensures that the agent always has the right context for the current operation without wasting tokens on irrelevant information.

Assembly Priority

The system prompt is built in this order:

System Message — Core role identity
Context Window Management — Conversation length tracking, compression triggers
State Overrides — Current attention state (pending mailbox items, task context)
Announcements — System-wide directives from human operators
Policies — Security, workspace, delivery rules
Team Working Norms — Project-specific procedures (NORMS.md)
Notifications — Task status changes, dependency resolutions
Skill Instructions — Active skill documentation
Memory — Curated knowledge (MEMORY.md) + goal-relevant memories
Human Feedback — Direct manager comments
Conversation History — Recent interaction log
Tool Definitions — Available function signatures
Tool Results — Return values from prior tool calls
Context from Prior Sessions — Compressed session summaries when the context window is at risk

Dynamic Compression: When the total context would exceed the LLM’s context window, the system automatically compresses the oldest segments (typically conversation history and prior session summaries) into a condensed form.
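Oldest-first compression can be sketched as folding segments into a single summary until the total fits the budget. Token counting and summarization are passed in as functions because Markus’s actual tokenizer and summarization prompt are not described here; the `protected` prefix (for example the system message) and the function name are assumptions.

```python
from typing import Callable, List


def fit_to_window(segments: List[str], budget: int,
                  count_tokens: Callable[[str], int],
                  summarize: Callable[[List[str]], str],
                  protected: int = 1) -> List[str]:
    """Fold the oldest unprotected segments into one summary until the total fits.

    `segments` is ordered oldest-first after the first `protected` entries,
    which (like the system message) are never compressed.
    """
    head, tail = segments[:protected], list(segments[protected:])
    folded: List[str] = []

    def assembled() -> List[str]:
        return head + ([summarize(folded)] if folded else []) + tail

    while sum(map(count_tokens, assembled())) > budget and tail:
        folded.append(tail.pop(0))        # oldest remaining segment
    return assembled()


def words(s: str) -> int:
    return len(s.split())                 # crude token estimate for the demo


def summary(parts: List[str]) -> str:
    return f"[summary of {len(parts)} segments]"


history = ["system msg", "a " * 20, "b " * 20, "latest question"]
print(fit_to_window(history, 30, words, summary)[1])  # [summary of 1 segments]
```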

9. Conclusion

The Markus architecture represents a principled approach to building a production-grade multi-agent runtime. It doesn’t take shortcuts — memory is not a vector store hack, communication is not shared chat history, and governance is not an afterthought.

If you are building multi-agent systems for real work, the Markus architecture offers concrete answers to the hard problems:

Memory — Three-tier, self-consolidating system inspired by human cognition
Communication — A2A protocol with mailbox system and attention controller
Governance — 9-state task FSM with trust levels, approval gates, and SRM workflow
Proactivity — Heartbeat-driven autonomous operation
Extensibility — Skill system with Markus Hub marketplace

Markus is free and open source (AGPL-3.0), available at github.com/markus-global/markus.

Markus is an open source AI Workforce Platform. Install it today with curl -fsSL https://markus.global/install.sh | bash.

