Deep Dive into Markus Architecture: Memory, A2A Protocol & Multi-Agent Runtime
Markus Team
Meta Description: Explore the Markus multi-agent architecture — a production-grade cognitive runtime featuring Tulving three-tier memory, Agent-to-Agent (A2A) protocol, Cognitive Preparation Pipeline, 9-state task governance, and Heartbeat-driven autonomous agents. Learn how AI agents think, remember, and collaborate.
1. Introduction
As AI agents evolve from simple chatbots into autonomous digital employees, the underlying architecture must support memory persistence, inter-agent communication, task governance, and self-directed operation. Markus is an open-source multi-agent runtime that takes a principled approach to all of these challenges.
Inspired by cognitive psychology, distributed systems, and production-grade software engineering, Markus provides a complete infrastructure for deploying teams of AI agents that can remember past interactions, communicate with each other, delegate tasks, follow governance policies, and even initiate work on their own through a Heartbeat mechanism.
This deep dive explores the core architectural components that make Markus a compelling choice for developers building multi-agent systems in production. We’ll cover:
- The three-layer architecture (Web UI → Org Manager → Agent Runtime)
- Tulving three-tier memory (Procedural, Semantic, Episodic) and the Dream Cycle
- The A2A protocol (Agent-to-Agent communication) with mailbox system and attention controller
- Cognitive Preparation Pipeline (CPP) with four depth levels
- Task governance: 9-state state machine, approval gates, trust levels, and workspace isolation
- The Heartbeat mechanism for proactive agent behavior
2. The Three-Layer Architecture
Markus follows a clean separation of concerns with three distinct layers. Each layer has a clearly defined responsibility, and they communicate through well-defined interfaces.
┌──────────────────────────────────────────────────────────────┐
│ Web UI (React + Vite + Tailwind) │
│ Dashboard · Chat · Project Management · Builder · Hub │
└──────────────────────────────┬───────────────────────────────┘
│ REST + WebSocket
┌──────────────────────────────┴───────────────────────────────┐
│ Org Manager (API Server) │
│ Authentication · Task Governance · Project Management │
│ Reporting · User Management │
└──────────────────────────────┬───────────────────────────────┘
│
┌──────────────────────────────┴───────────────────────────────┐
│ Agent Runtime (Core Runtime) │
│ Agent · LLM Router · Tool System · Memory · Heartbeat │
│ Mailbox · Attention Controller · Context Engine │
└──────────┬──────────────────────────────┬────────────────────┘
│ │
┌──────────┴──────────────┐ ┌──────────┴────────────────────┐
│ Storage                 │ │ Comms Bridges                 │
│ (SQLite/PostgreSQL)     │ │ Slack · Feishu · WhatsApp     │
│                         │ │ Telegram · WeCom              │
└─────────────────────────┘ └───────────────────────────────┘
2.1 Web UI (Presentation Layer)
The frontend is built with React + Vite + Tailwind CSS, providing a responsive dashboard that works across desktop and mobile. It offers workspaces for chat, project management, agent configuration (Builder), capability discovery (Hub), and system settings. Communication with the server layer uses REST for standard CRUD operations and WebSocket for real-time updates like task status changes and agent messages.
2.2 Org Manager (API / Governance Layer)
The Org Manager serves as the central API server. It handles:
- Authentication & Authorization — user and agent identity management
- Task Governance — state machine transitions, approval routing, and policy enforcement
- Project Management — project creation, milestone tracking, deliverable management
- Reporting & User Management — audit logs, team organization
This layer is stateless from the agent perspective; it orchestrates governance without interfering with agent execution logic.
2.3 Agent Runtime (Core Cognitive Layer)
The Agent Runtime is where the actual intelligence lives. It manages:
- Agent lifecycle — creation, session management, sub-agent spawning
- LLM Router — intelligent model selection, failover, circuit breaker
- Tool System — tool registration, execution, sandboxing
- Memory System — Procedural, Semantic, and Episodic tiers
- A2A Communication — mailbox, message routing, delegation
- Heartbeat Scheduler — autonomous periodic task execution
- Context Engine — 24-segment system prompt assembly with KV-cache optimization
The runtime is designed so that multiple agents can coexist, each with isolated workspaces and independent memory profiles.
3. Tulving Three-Tier Memory System
Markus implements a three-tier memory architecture named after cognitive psychologist Endel Tulving and modeled on human memory systems. It is a defining feature of the cognitive architecture and one of the key differentiators from simpler agent frameworks.
┌─────────────────────────────────────────────────────────────────┐
│ Memory System Overview │
├─────────────┬─────────────────┬─────────────────────────────────┤
│ Procedural │ Semantic │ Episodic │
│ ("How to") │ ("Know what") │ ("What happened") │
├─────────────┼─────────────────┼─────────────────────────────────┤
│ ROLE.md │ MEMORY.md │ sessions/*.json (current) │
│ Skills │ memories.json │ SQLite agent_activities │
│ Behavior │ Long-term │ (past activities, on-demand) │
│ Rules │ Knowledge │ │
└─────────────┴─────────────────┴─────────────────────────────────┘
3.1 Procedural Memory — “How to Act”
Procedural memory encodes the agent’s identity, behavioral rules, and skill definitions. It answers the question: Who am I, and how should I behave?
| Aspect | Detail |
|---|---|
| Storage | role/ROLE.md + Skill definition files |
| Content | Agent identity, system prompts, behavior boundaries, action policies |
| Loading | Prepended to the system prompt at every inference cycle |
| Mutability | ROLE.md is immutable by the agent — only human users can modify core identity |
This layer ensures that an agent cannot rewrite its own fundamental character. It creates a stable anchor for identity, preventing drift during extended autonomous operation.
3.2 Semantic Memory — “What I Know”
Semantic memory stores factual knowledge, verified patterns, workflows, and domain expertise. It is the agent’s accumulated long-term knowledge base.
| Aspect | Detail |
|---|---|
| Storage | MEMORY.md (curated, always in prompt) + memories.json (observation buffer, searchable) |
| Capacity | MEMORY.md: 3,000 characters per section, 15,000 total |
| Key Tools | memory_save (save observation), memory_search (retrieve), memory_update_longterm (consolidate to MEMORY.md) |
Semantic memory is the primary mechanism for learning from experience. The agent saves observations, searches for relevant knowledge during tasks, and periodically consolidates important patterns into its permanent knowledge base.
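The capacity limits in the table can be illustrated with a small check. This is a minimal sketch, assuming MEMORY.md is modeled as a mapping from section name to text and that a write replaces the whole section; `can_store` is a hypothetical helper, not a Markus API.

```python
SECTION_LIMIT = 3_000   # characters per MEMORY.md section (from the table above)
TOTAL_LIMIT = 15_000    # characters across all sections (from the table above)


def can_store(sections: dict, name: str, text: str) -> bool:
    """Return True if writing `text` to section `name` keeps both limits.

    Hypothetical helper: assumes the write replaces the section's content.
    """
    if len(text) > SECTION_LIMIT:
        return False
    # Size of every other section, plus the new content of this one.
    others = sum(len(body) for key, body in sections.items() if key != name)
    return others + len(text) <= TOTAL_LIMIT


memory = {"workflows": "x" * 2_900, "facts": "y" * 3_000}
print(can_store(memory, "workflows", "z" * 3_000))
print(can_store(memory, "workflows", "z" * 3_001))
```

A check like this would sit in front of the consolidation step, so that an oversized update is trimmed or rejected before it reaches the always-in-prompt file.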
3.3 Episodic Memory — “What Happened”
Episodic memory records the agent’s past experiences — tasks it performed, messages it received, sessions it participated in.
| Aspect | Detail |
|---|---|
| Storage | sessions/*.json (current + recent sessions), SQLite agent_activities (historical) |
| Retrieval | recall_activity tool — query by task, type, or keyword |
| Use Case | Contextual awareness, learning from past outcomes, continuity across sessions |
Unlike semantic memory, which stores generalized knowledge, episodic memory preserves specific experiences. This allows agents to answer questions like “What did I work on yesterday?” or “How did I solve that similar problem last time?”
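To make the retrieval path concrete, here is a rough sketch of a query by task, type, or keyword against a SQLite table. Only the table name `agent_activities` and the tool name `recall_activity` come from the text above; the column names, the sample rows, and the function signature are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Column names are hypothetical; only the table name comes from the article.
conn.execute(
    "CREATE TABLE agent_activities ("
    "id INTEGER PRIMARY KEY, task_id TEXT, type TEXT, summary TEXT)"
)
conn.executemany(
    "INSERT INTO agent_activities (task_id, type, summary) VALUES (?, ?, ?)",
    [
        ("T-1", "task", "Fixed flaky login test"),
        ("T-2", "message", "Asked reviewer about login timeout"),
        ("T-3", "task", "Wrote release notes"),
    ],
)


def recall_activity(task_id=None, activity_type=None, keyword=None):
    """Filter past activities by any combination of task, type, and keyword."""
    clauses, params = [], []
    if task_id is not None:
        clauses.append("task_id = ?")
        params.append(task_id)
    if activity_type is not None:
        clauses.append("type = ?")
        params.append(activity_type)
    if keyword is not None:
        clauses.append("summary LIKE ?")
        params.append(f"%{keyword}%")
    where = f" WHERE {' AND '.join(clauses)}" if clauses else ""
    sql = f"SELECT task_id, type, summary FROM agent_activities{where} ORDER BY id"
    return conn.execute(sql, params).fetchall()


print(recall_activity(activity_type="task", keyword="login"))
```

Values are always passed as bound parameters; only fixed clause text is interpolated into the SQL string.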
3.4 The Dream Cycle — Memory Consolidation
Markus features an autonomous memory consolidation process called the Dream Cycle, inspired by how human brains consolidate memories during sleep.
Trigger: memories.json > 50 entries AND not run today
│
▼
LLM reviews all observations
│
├── Merge duplicates
├── Prune outdated entries
├── Identify recurring patterns
│
▼
Pattern appears 3+ times?
├── Yes → Promote to MEMORY.md
└── No → Retain or discard
│
▼
Prune source entries from memories.json
The Dream Cycle ensures that:
- Noise is filtered out — one-off events don’t clutter long-term memory
- Patterns are promoted — recurring observations graduate to permanent knowledge
- Storage is bounded — the observation buffer stays within reasonable limits
This is a critical feature for long-running agents that accumulate thousands of observations over time. Without consolidation, memory would become unwieldy and retrieval would degrade.
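The trigger and promotion rules can be sketched in a few lines. This is a simplified illustration, not the Markus implementation: exact string equality stands in for the LLM review that merges duplicates and identifies patterns, and the function names are hypothetical.

```python
from collections import Counter
from datetime import date
from typing import Optional

BUFFER_THRESHOLD = 50   # trigger when memories.json holds more than 50 entries
PROMOTE_AT = 3          # a pattern seen 3+ times is promoted to MEMORY.md


def should_run(entry_count: int, last_run: Optional[date], today: date) -> bool:
    """Trigger rule from the flow above: buffer > 50 AND not run today."""
    return entry_count > BUFFER_THRESHOLD and last_run != today


def consolidate(observations: list):
    """Return (promoted patterns, remaining buffer).

    Assumption: identical strings count as the same pattern. The real
    process uses an LLM to judge similarity and staleness.
    """
    counts = Counter(observations)
    promoted = [text for text, n in counts.items() if n >= PROMOTE_AT]
    # Duplicates are merged: one copy of each unpromoted observation is kept.
    remaining = [text for text, n in counts.items() if n < PROMOTE_AT]
    return promoted, remaining


promoted, remaining = consolidate(["a", "b", "a", "a", "c", "b"])
print(promoted, remaining)
```

Promoted entries leave the buffer, which corresponds to the final "prune source entries" step in the flow.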
4. A2A Agent-to-Agent Communication Protocol
Agents don’t work in isolation; they communicate. Markus implements its own A2A (Agent-to-Agent) protocol, designed specifically for AI agent communication and built on top of a persistent mailbox system.
Agent A Mailbox DB Agent B
│ │ │
│── agent_send_message ──►│ (queued as INBOX) │
│ │ │
│ │── context switched ──► │
│ │ (picked from MAIL) │
│ │ │
│◄── agent_send_message ──│─────────────────────────│
│ (reply, wait_for_reply│ │
│ = true) │ │
4.1 Mailbox System
Every agent has a persistent mailbox stored in the database:
- OUTBOX — Messages the agent has sent (for audit trail)
- INBOX — Incoming messages waiting to be processed
- MAIL — Processed messages (archived)
- PARKED — Messages addressed but not yet picked up by their target agent
Messages are asynchronous by default — sending does not block either the sender or the receiver. The receiver processes messages on its own schedule during context switches.
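The folder semantics can be illustrated with an in-memory stand-in. This sketch assumes a message moves from INBOX to MAIL once the receiver processes it, as the folder descriptions suggest; PARKED is left out for brevity, and the class and function names are hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Message:
    sender: str
    recipient: str
    body: str


@dataclass
class Mailbox:
    """In-memory stand-in for the database-backed mailbox."""
    outbox: list = field(default_factory=list)    # sent messages (audit trail)
    inbox: deque = field(default_factory=deque)   # waiting to be processed
    mail: list = field(default_factory=list)      # processed (archived)


def send(boxes: dict, msg: Message) -> None:
    """Record the message for the sender and queue it for the recipient."""
    boxes[msg.sender].outbox.append(msg)
    boxes[msg.recipient].inbox.append(msg)   # returns at once: nobody blocks


def process_next(box: Mailbox):
    """Called by the receiver on its own schedule; archives what it reads."""
    if not box.inbox:
        return None
    msg = box.inbox.popleft()
    box.mail.append(msg)
    return msg


boxes = {"a": Mailbox(), "b": Mailbox()}
send(boxes, Message("a", "b", "build is green"))
print(process_next(boxes["b"]))
```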
4.2 Synchronous vs. Asynchronous Communication
| Mode | Tool | Behavior | Use Case |
|---|---|---|---|
| Async | agent_send_message (default) | Fire-and-forget; sender continues immediately | Status updates, notifications, non-blocking coordination |
| Sync | agent_send_message({ wait_for_reply: true }) | Sender blocks until receiver responds | Questions requiring immediate answers, decisions |
The wait_for_reply: true mode is powerful but should be used judiciously — it pauses the sending agent’s execution until the receiver responds.
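The difference between the two modes can be sketched with `asyncio`. This is an analogy under stated assumptions rather than the Markus runtime: an `asyncio.Queue` plays the receiver's inbox, and a future carries the reply back to a sender that chose to wait. Only the names `agent_send_message` and `wait_for_reply` come from the table above.

```python
import asyncio


async def agent_send_message(inbox: asyncio.Queue, body: str, wait_for_reply: bool = False):
    """Queue a message; optionally pause the sender until a reply arrives."""
    loop = asyncio.get_running_loop()
    reply = loop.create_future() if wait_for_reply else None
    await inbox.put((body, reply))
    if reply is None:
        return None          # async mode: the sender continues immediately
    return await reply       # sync mode: the sender is paused here


async def receiver(inbox: asyncio.Queue, handled: list):
    """Process messages in arrival order and answer those that expect a reply."""
    while True:
        body, reply = await inbox.get()
        handled.append(body)
        if reply is not None:
            reply.set_result(f"ack: {body}")


async def demo():
    inbox, handled = asyncio.Queue(), []
    task = asyncio.create_task(receiver(inbox, handled))
    first = await agent_send_message(inbox, "status: build green")
    second = await agent_send_message(inbox, "ship it?", wait_for_reply=True)
    task.cancel()
    return first, second, handled


print(asyncio.run(demo()))
```

A practical implementation would likely add a timeout to the waiting branch so that a sender cannot stall indefinitely on an unresponsive receiver.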
4.3 Attention Controller
Linked to the mailbox system is the Attention Controller, which determines how the agent spends its cognitive cycles. In each execution loop, the agent:
- Checks for high-priority tasks (blockers, reviews, urgent messages)
- Checks mailbox for new A2A messages
- Processes pending tasks in priority order
This ensures that an agent doesn’t get stuck on a single task while urgent messages pile up.
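One way to picture the loop is as a function that picks a single next action in the order listed above. This is a hypothetical sketch: the queue shapes, the `(priority, name)` task tuples, and the convention that a lower number means higher priority are all assumptions.

```python
def next_action(urgent: list, mailbox: list, pending: list):
    """Pick the next item following the three checks in the list above."""
    if urgent:
        return "urgent", urgent.pop(0)
    if mailbox:
        return "message", mailbox.pop(0)
    if pending:
        # Lower number = higher priority (an assumption of this sketch).
        pending.sort(key=lambda task: task[0])
        return "task", pending.pop(0)
    return "idle", None


urgent = ["review T-9"]
mailbox = ["question from agent B"]
pending = [(2, "write docs"), (1, "fix bug")]
for _ in range(5):
    print(next_action(urgent, mailbox, pending))
```

Because the function is re-evaluated on every loop iteration, a newly arrived urgent item or message is seen before the agent commits to the next ordinary task.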
5. Cognitive Preparation Pipeline (CPP)
The Cognitive Preparation Pipeline is the system that assembles the agent’s running context before every inference call. It operates at one of four depth levels:
Context Assembly
Incoming Request
│
▼
┌─────────────────────────────────────────────────────────────────────────────┐
│ 1. System Message — Role identity, behavioral rules, system context │
│ 2. Profile — Agent name, role, team, assigned skills │
│ 3. Environment — OS, tools, resources, installed packages │
│ 4. State (varies) — Task details, context window management │
│ 5. Instructions — Tool usage rules, process guidelines │
│ 6. Notifications — Task status, DAG resolution, system updates │
│ 7. Skills — Activated skill instructions / MCP definitions │
│ 8. Memory — Semantic (MEMORY.md) + Episodic (recall results) │
│ 9. Conversation — Recent tool calls, results, and decisions │
│ 10. Long Context — Past conversation history (compressed if large) │
│ 11. Tools — Available function definitions │
│ 12. Tool Results — Results from tool calls in current cycle │
│ 13+ ... — Additional dynamic context │
└─────────────────────────────────────────────────────────────────────────────┘
CPP Depth Levels
| Level | Name | Segments | When Used |
|---|---|---|---|
| 0 | Minimal | 1-5 (skips memory search) | Simple tool calls, context switch decisions |
| 1 | Standard | 1-12 (with memory recall) | Normal task execution, most operations |
| 2 | Extended | 1-10 (without tools) | Planning, decomposition, reflection |
| 3 | Full | 1-10 (with long context) | Complex reasoning, architecture decisions, task creation |
The system optimizes context size by excluding irrelevant segments at shallow depths, reducing token consumption by up to 60% for simple operations.
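The depth table can be read as a lookup from level to the last segment included. The sketch below takes the ranges at face value; since the table gives the same range for levels 2 and 3, it does not try to model how they differ. `select_segments` is a hypothetical name.

```python
SEGMENTS = [
    "System Message", "Profile", "Environment", "State", "Instructions",
    "Notifications", "Skills", "Memory", "Conversation", "Long Context",
    "Tools", "Tool Results",
]

# Last segment number included at each depth level, read off the table above.
LAST_SEGMENT = {0: 5, 1: 12, 2: 10, 3: 10}


def select_segments(depth: int) -> list:
    """Return the segment names assembled at the given CPP depth."""
    return SEGMENTS[: LAST_SEGMENT[depth]]


for level in range(4):
    print(level, select_segments(level))
```

At level 0 the Memory segment is absent, which matches the "skips memory search" note; at level 2 the two tool segments are absent, which matches "without tools".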
KV-Cache Optimization
One of Markus’s more advanced features is multi-agent KV-cache management:
- Each agent maintains an independent KV-cache session with the LLM provider
- Agent-specific caching ensures inference efficiency across an agent’s session lifespan
- Context switching between agents preserves cache state, avoiding redundant computation when the agent resumes its session
This is a deceptively important optimization. In a 10-agent team, without proper caching, an agent would pay cold-start latency every time it’s called upon. With KV-caching, context switches are near-instantaneous.
6. Task Governance System
Task governance is the backbone of Markus’s reliability layer. Every piece of work — from code changes to content creation — flows through a precisely defined state machine.
6.1 9-State Finite State Machine
pending ─► in_progress ─► review ─► completed
│ │ │
│ ▼ │
└───► blocked ◄───────────┘
│
▼
rejected
┌────────────────────┐
│ archived ◄─────────│
└────────────────────┘
(Also: failed, cancelled)
| State | Description |
|---|---|
| pending | Created but not yet started |
| in_progress | Assigned and actively being worked on |
| blocked | Waiting on external dependency |
| review | Submitted for peer review |
| completed | Approved by reviewer |
| failed | Execution error, agent or tool failure |
| rejected | Requirement rejected, not pursued |
| cancelled | Explicitly cancelled by manager |
| archived | Stored for historical reference |
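A state machine like this is commonly encoded as a table of allowed transitions. The sketch below is an illustration only: the nine state names come from the table, but the article does not list every legal transition, so the entries marked "assumed" are guesses and `transition` is a hypothetical helper.

```python
# Transitions read off the diagram and the review workflow. Entries marked
# "assumed" are guesses made for this sketch, not documented behavior.
TRANSITIONS = {
    "pending": {"in_progress", "blocked", "cancelled"},            # cancelled: assumed
    "in_progress": {"review", "blocked", "failed", "cancelled"},   # failed, cancelled: assumed
    "blocked": {"in_progress", "rejected"},                        # in_progress: assumed
    "review": {"completed", "in_progress", "blocked"},
    "completed": {"archived"},                                     # assumed
    "failed": {"archived"},                                        # assumed
    "rejected": {"archived"},                                      # assumed
    "cancelled": {"archived"},                                     # assumed
    "archived": set(),
}


def transition(current: str, target: str) -> str:
    """Return the new state, or raise if the move is not allowed."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition: {current} -> {target}")
    return target


state = "pending"
for step in ("in_progress", "review", "completed"):
    state = transition(state, step)
print(state)
```

Keeping the rules in one table means that skipping review (for example, `pending` directly to `completed`) fails loudly instead of silently succeeding.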
6.2 Submit-Review-Merge (SRM) Workflow
This is Markus’s built-in quality gate. No deliverable can be completed without going through this cycle:
- Submit: Worker agent calls task_submit_review with summary and deliverable references
- Review: The designated reviewer_agent_id inspects the submission
- Approve: Reviewer marks the task as completed (auto-completes)
- Reject: Task returns to in_progress with feedback
- Retry: Worker revises and resubmits
This enforces a four-eyes principle on every deliverable, so a single agent’s mistake is far less likely to reach production.
6.3 Trust Levels and Approval Gates
Agents build trust over time based on their delivery track record:
| Level | Autonomy | Review Requirement | Promotion Criteria |
|---|---|---|---|
| Probation | Low | All tasks reviewed by senior agent | Successful deliveries |
| Standard | Medium | Complex tasks require review | Consistent quality |
| Trusted | High | Only significant tasks need review | Track record of first-pass approvals |
| Senior | Full | Can review others’ work | Mentorship and quality leadership |
Approval gates sit between task creation and execution, allowing organizations to define which operations require human sign-off before proceeding.
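The "Review Requirement" column can be read as a decision function. This is a loose interpretation for illustration: the article does not define how a task is classified as complex or significant, so the `Weight` scale is hypothetical, as is the choice to treat the Senior level as needing no review.

```python
from enum import IntEnum


class Trust(IntEnum):
    PROBATION = 0
    STANDARD = 1
    TRUSTED = 2
    SENIOR = 3


class Weight(IntEnum):
    """Hypothetical task weight; the article does not define this scale."""
    ROUTINE = 0
    COMPLEX = 1
    SIGNIFICANT = 2


def requires_review(trust: Trust, weight: Weight) -> bool:
    """Map the 'Review Requirement' column to a yes/no decision."""
    if trust is Trust.PROBATION:
        return True                           # all tasks reviewed
    if trust is Trust.STANDARD:
        return weight >= Weight.COMPLEX       # complex tasks and above
    if trust is Trust.TRUSTED:
        return weight >= Weight.SIGNIFICANT   # only significant tasks
    return False                              # Senior: assumed full autonomy


print(requires_review(Trust.STANDARD, Weight.ROUTINE))
print(requires_review(Trust.TRUSTED, Weight.SIGNIFICANT))
```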
6.4 Workspace Isolation
Each agent gets an isolated workspace directory. The system enforces file access boundaries:
- Agents can read and write anywhere on disk that OS permissions allow, with the exceptions below
- They cannot write to other agents’ workspace directories
- The shared workspace is readable by all agents but writable only by designated ones
This prevents an agent from accidentally (or intentionally) corrupting another agent’s files while allowing collaboration via the shared space.
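The write rules can be expressed as a path check. The sketch below is an assumption-laden illustration: the directory layout under `/srv/markus` is invented, `can_write` is a hypothetical helper, and paths are normalized lexically so that `..` segments cannot be used to step into another agent's directory. It requires Python 3.9+ for `is_relative_to`.

```python
import posixpath
from pathlib import PurePosixPath

WORKSPACES = PurePosixPath("/srv/markus/workspaces")   # hypothetical layout
SHARED = PurePosixPath("/srv/markus/shared")           # hypothetical layout


def can_write(agent: str, target: str, shared_writers: set) -> bool:
    """Apply the three write rules from the list above to one path."""
    # Collapse "." and ".." segments before comparing prefixes.
    path = PurePosixPath(posixpath.normpath(target))
    if path.is_relative_to(SHARED):
        return agent in shared_writers
    if path.is_relative_to(WORKSPACES):
        # Inside the workspace root, only the agent's own directory is writable.
        return path.is_relative_to(WORKSPACES / agent)
    return True  # elsewhere on disk: left to OS permissions


print(can_write("alice", "/srv/markus/workspaces/alice/notes.md", set()))
print(can_write("alice", "/srv/markus/workspaces/alice/../bob/notes.md", set()))
```

Lexical normalization does not follow symlinks, so a real enforcement layer would also need to resolve the path on disk before applying the check.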
7. The Heartbeat Mechanism
The Heartbeat is what transforms Markus from a reactive system (waiting for input) into a proactive workforce that initiates work autonomously.
7.1 How It Works
Heartbeat Tick (configurable interval: every 60–300s)
│
▼
Check agent's mailbox for unread messages
│
▼
Check pending tasks (any assigned but unstarted?)
│
▼
Check scheduled tasks (any recurring tasks due?)
│
▼
Check own patrol items (defined in HEARTBEAT.md)
│
▼
If nothing urgent: process next pending task
Each agent has a HEARTBEAT.md file where it defines its own personal patrol checklist — things it should regularly check or monitor.
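A single tick can be sketched as a function that turns the results of the four checks into an ordered action list. This is one hypothetical reading of the flow: anything found in the mailbox, the schedule, or the patrol counts as urgent, and a pending task is started only when none of those produced work. The format of HEARTBEAT.md is not described, so patrol findings are passed in as plain strings.

```python
MIN_INTERVAL_S, MAX_INTERVAL_S = 60, 300   # configurable range from the flow above


def heartbeat_tick(unread: list, pending: list, due: list, patrol_findings: list) -> list:
    """Return the actions for one tick, in the order the checks run."""
    actions = [("read_message", m) for m in unread]
    actions += [("run_scheduled", t) for t in due]
    actions += [("patrol", p) for p in patrol_findings]
    # Nothing urgent found: pick up the next assigned-but-unstarted task.
    if not actions and pending:
        actions.append(("start_task", pending[0]))
    return actions


print(heartbeat_tick([], ["T-1", "T-2"], [], []))
print(heartbeat_tick(["msg from B"], ["T-1"], ["daily scan"], []))
```

A scheduler would call this once per interval for each agent, with the interval kept between `MIN_INTERVAL_S` and `MAX_INTERVAL_S`.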
7.2 What Heartbeat Enables
| Scenario | Without Heartbeat | With Markus Heartbeat |
|---|---|---|
| Codebase scan | Need to schedule via CI/CD | Agent scans daily on its own |
| Content publishing | Manual trigger required | Agent publishes on schedule |
| System monitoring | Requires external tools (Prometheus, Datadog) | Agent checks and reports hourly |
| Task management | Need human to assign | Agent picks up pending tasks autonomously |
The Heartbeat makes Markus fundamentally different from chat-based AI tools. Your AI team doesn’t wait for you to give it work. It actively looks for things to do, within the boundaries you’ve set.
8. Context Engine & System Prompt Assembly
Every time an agent processes an input, the Context Engine assembles a system prompt from up to 24 segments. This dynamic assembly ensures that the agent always has the right context for the current operation without wasting tokens on irrelevant information.
Assembly Priority
The system prompt is built in this order:
- System Message — Core role identity
- Context Window Management — Conversation length tracking, compression triggers
- State Overrides — Current attention state (pending mailbox items, task context)
- Announcements — System-wide directives from human operators
- Policies — Security, workspace, delivery rules
- Team Working Norms — Project-specific procedures (NORMS.md)
- Notifications — Task status changes, dependency resolutions
- Skill Instructions — Active skill documentation
- Memory — Curated knowledge (MEMORY.md) + goal-relevant memories
- Human Feedback — Direct manager comments
- Conversation History — Recent interaction log
- Tool Definitions — Available function signatures
- Tool Results — Return values from prior tool calls
- Context from Prior Sessions (14+) — Compressed session summaries when context window is at risk
Dynamic Compression: When the total context exceeds the LLM’s context window, the system automatically compresses the oldest segments (typically conversation history and prior session summaries) into a condensed form.
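The oldest-first strategy can be sketched as a loop that compresses one segment at a time until the total fits. This is a simplified illustration: sizes are counted in characters where a real system would count tokens, truncation stands in for an LLM-written summary, and both function names are hypothetical.

```python
def compress(text: str, limit: int = 40) -> str:
    """Placeholder summarizer: truncation stands in for an LLM summary."""
    return text if len(text) <= limit else text[:limit] + "..."


def fit_to_window(segments: list, budget: int) -> list:
    """Compress segments oldest-first until the total length fits the budget.

    `segments` is ordered oldest to newest. Newer segments are left
    untouched as soon as the total is within budget.
    """
    result = list(segments)
    for i in range(len(result)):
        if sum(len(s) for s in result) <= budget:
            break
        result[i] = compress(result[i])
    return result


history = ["a" * 100, "b" * 100, "c" * 20]
print([len(s) for s in fit_to_window(history, 170)])
```

If every segment has been compressed and the total still exceeds the budget, this sketch returns the over-budget result as is; a production system would need a further fallback such as dropping segments.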
9. Conclusion
The Markus architecture represents a principled approach to building a production-grade multi-agent runtime. It doesn’t take shortcuts — memory is not a vector store hack, communication is not shared chat history, and governance is not an afterthought.
If you are building multi-agent systems for real work, the Markus architecture offers proven solutions to the hard problems:
- Memory — Three-tier, self-consolidating system inspired by human cognition
- Communication — A2A protocol with mailbox system and attention controller
- Governance — 9-state task FSM with trust levels, approval gates, and SRM workflow
- Proactivity — Heartbeat-driven autonomous operation
- Extensibility — Skill system with Markus Hub marketplace
Markus is free and open source (AGPL-3.0), available at github.com/markus-global/markus.
Markus is an open source AI Workforce Platform. Install it today with curl -fsSL https://markus.global/install.sh | bash.