
How Markus Builds AI Teams That Actually Ship — Not Just Chat

Markus Engineering Team

1. The ‘Alice in Wonderland’ Problem of LLMs

Large language models excel at conversation. Give one a question, and it returns a polished answer. Give it a code request, and it produces a working function. But ask it to build a feature, coordinate a code review, deploy to production, and report the outcome — and the illusion breaks.

This is the Alice in Wonderland problem of LLMs: strong at chatter, weak at delivery. A single AI agent can write code, but it cannot form a team. It cannot delegate a subtask to a specialist, review the result for quality, maintain context across a week-long project, or escalate a blocker to a human manager. The agent sits in a chat window, waiting for the next prompt — forever reactive, never proactive.

The industry response has been to build better tools. Agent frameworks, prompt chaining libraries, and LLM orchestrators all attempt to squeeze more capability out of a single agent. But the limit is not the agent. The limit is the organizational layer. A company of one — even a brilliant one — cannot match the throughput of a coordinated team with roles, governance, memory, and parallel execution.

Markus solves this problem by providing that organizational layer: an open-source AI workforce platform that runs complete AI teams, not just chat agents.


2. Problem: Single AI Agent Limitations

A single agent — whether Claude Code, Codex, ChatGPT, or any copilot — is effective at one task at a time. But as the Markus README states, single agents do not:

  • Coordinate. They cannot delegate subtasks to other agents or track dependencies across parallel workstreams.
  • Remember. Context evaporates when the session ends. Every new conversation starts from zero, even if the agent spent six hours on the same project yesterday.
  • Operate proactively. They wait for your prompt, every time. They do not check on a long-running build or surface a blocker unless you explicitly ask.
  • Review each other. There is no quality gate between “agent said done” and “actually done.” The output of a single agent goes straight from LLM to user with no peer review.
  • Scale. Running ten agents means ten independent sessions with zero shared visibility. There is no dashboard, no task board, no unified view of what the team is doing.

These limitations are not fixable by improving the underlying LLM. They are structural. A single agent, no matter how capable, cannot be in two places at once. It cannot read its own output from a different context. It cannot enforce a review policy on itself.

The missing ingredient is an organizational layer — roles, teams, task boards, reviews, governance, persistent memory, and a dashboard that shows what every agent is doing. Markus provides exactly this layer.


3. Markus’s Solution: The Operating System for an AI Workforce

Markus is an open-source AI employee platform. It is not an agent framework or an LLM orchestrator. It is a platform for running AI companies.

What sets Markus apart from other approaches is its three layers:

| Layer | What It Provides | How It Works |
|---|---|---|
| Agent Runtime | Full LLM-powered workers with built-in tools | Each agent talks directly to LLM APIs (no proxying to external CLI tools) and uses shell, file I/O, git, web search, code analysis, and MCP servers. |
| Team Layer | Role-based collaboration with the A2A protocol | Agents delegate tasks, spawn subagents, send structured messages, and collaborate through a built-in Agent-to-Agent protocol. Managers route work; workers execute. |
| Governance Layer | Progressive trust, formal delivery, audit trail | Trust levels (probation → standard → trusted → senior) control autonomy. A submit-review-merge pipeline enforces quality gates. Every action is logged. |

Markus includes the full agent runtime — it does not wrap external agent tools. Each agent is a complete worker with identity (ROLE.md), skills (SKILLS.md), proactive tasks (HEARTBEAT.md), behavioral rules (POLICIES.md), and persistent memory (MEMORY.md). The platform works with any LLM provider: Anthropic, OpenAI, Google, DeepSeek, MiniMax, SiliconFlow, OpenRouter, Z.AI, Ollama, and more, with automatic failover between providers.
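Automatic failover between providers can be illustrated as trying each configured provider in order until one answers. This is a minimal sketch, assuming a hypothetical `Provider` interface and `completeWithFailover` helper; the post does not describe Markus's actual provider adapters or its failover policy (for example, whether it retries a provider before moving on).

```typescript
// Hypothetical provider interface; Markus's real adapter types are not shown in the post.
interface Provider {
  name: string;
  complete(prompt: string): Promise<string>;
}

// Try providers in priority order and return the first successful completion.
// Each failure is recorded so the final error explains why every provider failed.
async function completeWithFailover(
  providers: Provider[],
  prompt: string,
): Promise<{ provider: string; text: string }> {
  const errors: string[] = [];
  for (const p of providers) {
    try {
      return { provider: p.name, text: await p.complete(prompt) };
    } catch (err) {
      errors.push(`${p.name}: ${err instanceof Error ? err.message : String(err)}`);
    }
  }
  throw new Error(`All providers failed: ${errors.join("; ")}`);
}

// Usage with two stub providers: the primary is "down", the fallback answers.
const primary: Provider = {
  name: "primary",
  complete: async () => {
    throw new Error("503 Service Unavailable");
  },
};
const fallback: Provider = {
  name: "fallback",
  complete: async (prompt) => `echo: ${prompt}`,
};

completeWithFailover([primary, fallback], "hello").then((r) => console.log(r.provider, r.text));
// prints: fallback echo: hello
```

Ordering the list by preference keeps the policy explicit; a production version would also distinguish retryable errors (rate limits, timeouts) from permanent ones such as invalid credentials.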

As documented in the Architecture guide, the TypeScript monorepo is organized into modular packages: core (agent runtime), org-manager (REST API, governance, task lifecycle), web-ui (React + Vite + Tailwind dashboard), storage (SQLite persistence), a2a (Agent-to-Agent protocol), gui (GUI automation with VNC + OmniParser), and comms (external platform bridges).


4. Core Technical Architecture

4.1 Three-Layer Memory System (Tulving)

Markus agents use a memory architecture based on Tulving’s cognitive classification, as specified in the Memory System documentation:

| Layer | Storage | Role |
|---|---|---|
| Procedural | ROLE.md + skills | How the agent operates. Identity, behavioral rules, tool permissions. |
| Semantic | MEMORY.md + memories.json | What the agent knows. Agent-organized knowledge, consolidated through the Dream Cycle. |
| Episodic | sessions/*.json (current) + SQLite agent_activities (past) | What happened. Current conversation context plus searchable activity history accessed via the recall_activity tool. |

Memory persists across restarts, not just within a single conversation. The Dream Cycle runs periodically to consolidate memories.json entries, merge duplicates, and promote recurring patterns into MEMORY.md. This means an agent that learned a project’s coding conventions on Tuesday applies that knowledge on Wednesday without being re-prompted.
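The Dream Cycle's merge-and-promote step can be sketched as grouping near-identical entries and promoting the ones that recur. This is a minimal illustration under stated assumptions: the `MemoryEntry` shape, the case-and-whitespace normalization, and the promotion threshold of three are all hypothetical, and a real consolidator would more likely ask the LLM to judge similarity than rely on exact text matching.

```typescript
// Hypothetical shape of a memories.json entry; the post does not give the schema.
interface MemoryEntry {
  text: string;
  createdAt: string; // ISO-8601 in UTC ("...Z"), so string comparison orders by time
}

interface ConsolidationResult {
  kept: MemoryEntry[]; // deduplicated entries that stay in memories.json
  promoted: string[]; // recurring patterns to append to MEMORY.md
}

// Collapse case and whitespace so trivially different duplicates compare equal.
const normalize = (s: string): string => s.trim().toLowerCase().replace(/\s+/g, " ");

// Group duplicates, promote groups seen at least `promoteAt` times (assumed threshold),
// and keep only the newest copy of every other group.
function consolidate(entries: MemoryEntry[], promoteAt = 3): ConsolidationResult {
  const groups = new Map<string, MemoryEntry[]>();
  for (const e of entries) {
    const key = normalize(e.text);
    groups.set(key, [...(groups.get(key) ?? []), e]);
  }
  const result: ConsolidationResult = { kept: [], promoted: [] };
  groups.forEach((group) => {
    if (group.length >= promoteAt) {
      result.promoted.push(group[0].text.trim());
    } else {
      result.kept.push(group.reduce((a, b) => (a.createdAt >= b.createdAt ? a : b)));
    }
  });
  return result;
}

const consolidated = consolidate([
  { text: "Use pnpm, not npm", createdAt: "2026-05-05T09:00:00Z" },
  { text: "use pnpm,  not npm", createdAt: "2026-05-06T09:00:00Z" },
  { text: "Use pnpm, not npm", createdAt: "2026-05-07T09:00:00Z" },
  { text: "API base path is /v1", createdAt: "2026-05-06T10:00:00Z" },
]);
// consolidated.promoted is ["Use pnpm, not npm"]; consolidated.kept holds only the API note.
```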

Agents accumulate knowledge autonomously via memory_save and memory_update_longterm tools. The semantic layer (MEMORY.md) is a single unified section — the agent organizes it freely into whatever structure makes sense for its work, with no rigid system-enforced taxonomy.

4.2 Single-Thread Attention Model: The Mailbox System

Each Markus agent processes one thing at a time. This is enforced through the Mailbox and Attention Controller system, detailed in the Mailbox System documentation.

The AgentMailbox is a priority queue that accepts 13 message types: human_chat, a2a_message, task_status_update, task_comment, requirement_comment, mention, review_request, requirement_update, session_reply, daily_report, heartbeat, memory_consolidation, and system_event.
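A priority queue over these 13 message types can be sketched as follows. The numeric priorities are invented for illustration (the post only tells us that user chat preempts ongoing work), and `AgentMailboxSketch` is a hypothetical name, not Markus's `AgentMailbox` API.

```typescript
// The 13 message types listed above.
type MailboxItemType =
  | "human_chat" | "a2a_message" | "task_status_update" | "task_comment"
  | "requirement_comment" | "mention" | "review_request" | "requirement_update"
  | "session_reply" | "daily_report" | "heartbeat" | "memory_consolidation"
  | "system_event";

// Hypothetical priorities (lower = more urgent); Markus's real ordering is not documented here.
const PRIORITY: Record<MailboxItemType, number> = {
  human_chat: 0, mention: 1, review_request: 1, a2a_message: 2, session_reply: 2,
  task_status_update: 3, task_comment: 3, requirement_comment: 3, requirement_update: 3,
  system_event: 4, daily_report: 5, heartbeat: 6, memory_consolidation: 6,
};

interface MailboxItem {
  type: MailboxItemType;
  payload: string;
  seq: number; // arrival order, keeps FIFO within a priority level
}

class AgentMailboxSketch {
  private items: MailboxItem[] = [];
  private nextSeq = 0;

  enqueue(type: MailboxItemType, payload: string): void {
    this.items.push({ type, payload, seq: this.nextSeq++ });
  }

  // Remove and return the most urgent item (oldest first on ties), or undefined if empty.
  dequeue(): MailboxItem | undefined {
    if (this.items.length === 0) return undefined;
    let best = 0;
    for (let i = 1; i < this.items.length; i++) {
      const a = this.items[i];
      const b = this.items[best];
      if (PRIORITY[a.type] < PRIORITY[b.type] ||
          (PRIORITY[a.type] === PRIORITY[b.type] && a.seq < b.seq)) {
        best = i;
      }
    }
    return this.items.splice(best, 1)[0];
  }

  get size(): number {
    return this.items.length;
  }
}

const box = new AgentMailboxSketch();
box.enqueue("heartbeat", "tick");
box.enqueue("a2a_message", "PR ready for review");
box.enqueue("human_chat", "What's the status?");
console.log(box.dequeue()?.type); // human_chat
```

A linear scan keeps the sketch short; a binary heap would be the usual choice if a mailbox could grow large.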

The AttentionController manages which item the agent focuses on at any moment. It uses:

  • Yield Points — Safe checkpoints in the tool loop where the agent pauses to evaluate interrupts.
  • Decision Engine — Produces one of six decisions: continue, preempt, cancel, merge, defer, drop. Heuristic rules handle clear cases (e.g., a user chat always preempts ongoing work). An LLM interrupt judge handles ambiguous cases with semantic understanding.
  • Triage with Read-Only Tools — When multiple items compete for attention, the triage LLM invokes read-only tools (task_list, task_get, requirement_list) to gather context before deciding priority.
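The split between heuristic rules and the LLM interrupt judge might look like the sketch below. Only the first rule (user chat preempts work) comes from the post; the other rules, the `CurrentWork` and `Incoming` shapes, and the injected `llmJudge` callback are assumptions made for illustration.

```typescript
type Decision = "continue" | "preempt" | "cancel" | "merge" | "defer" | "drop";

// Hypothetical, minimal views of the current focus and the incoming item.
interface CurrentWork {
  kind: "task" | "chat" | "heartbeat";
  taskId?: string;
}
interface Incoming {
  type: "human_chat" | "a2a_message" | "task_comment" | "task_status_update" | "heartbeat";
  taskId?: string;
  cancelsTask?: boolean; // e.g. the task was cancelled upstream
}

// Clear cases are settled by cheap rules; anything else goes to the LLM judge.
async function decide(
  current: CurrentWork,
  incoming: Incoming,
  llmJudge: (c: CurrentWork, i: Incoming) => Promise<Decision>,
): Promise<Decision> {
  const sameTask = incoming.taskId !== undefined && incoming.taskId === current.taskId;
  if (incoming.type === "human_chat") return "preempt"; // stated in the post
  if (incoming.cancelsTask && sameTask) return "cancel";
  if (incoming.type === "task_comment" && sameTask) return "merge"; // fold into current work
  if (incoming.type === "heartbeat") return current.kind === "heartbeat" ? "drop" : "defer";
  return llmJudge(current, incoming); // ambiguous: needs semantic judgment
}

const alwaysContinue = async (): Promise<Decision> => "continue";
decide({ kind: "task", taskId: "T-1" }, { type: "human_chat" }, alwaysContinue).then(console.log);
// prints: preempt
```

Running the rules first means the relatively expensive judge call only happens for the cases that actually need semantic understanding.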

No LLM call bypasses the attention controller. Every invocation flows through the mailbox, so the agent handles exactly one item at a time. This serialization avoids the memory contamination and cognitive interference that concurrent architectures cause.

4.3 Cognitive Preparation Pipeline

Before the main LLM call, agents run a four-phase Cognitive Preparation Pipeline (CPP), grounded in Kahneman’s Dual Process Theory and Baddeley’s Working Memory model, as specified in the Cognitive Architecture documentation:

  1. Appraisal — A lightweight LLM call assesses the situation from the agent’s perspective and produces a context preparation plan. “What do I need to know to handle this?”
  2. Retrieval — Executes the plan: searches activity history, semantic memory, task context, and team status.
  3. Reflection — A lightweight LLM call processes retrieved context through the agent’s persona, extracting patterns, lessons, and warnings.
  4. Assembly — Code merges the prepared context into the system prompt for the main LLM call.

Four cognitive depth levels control preparation intensity:

  • D0 Reflexive: No preparation (heartbeat acknowledgments, simple responses).
  • D1 Reactive: Appraisal only (most chats, A2A messages).
  • D2 Deliberative: Full pipeline (task execution, complex questions).
  • D3 Meta-cognitive: Full pipeline plus post-response evaluation (high-stakes decisions).
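The mapping from depth level to pipeline phases can be written as a lookup table. The phases per level follow the D0 to D3 list above, with `evaluation` standing in for D3's post-response check; the `depthFor` classifier that assigns a stimulus to a level is a hypothetical simplification, not Markus's actual selection logic.

```typescript
type Depth = 0 | 1 | 2 | 3;
type Phase = "appraisal" | "retrieval" | "reflection" | "assembly" | "evaluation";

// Phases per depth level, following the D0-D3 descriptions.
const PHASES: Record<Depth, readonly Phase[]> = {
  0: [],
  1: ["appraisal"],
  2: ["appraisal", "retrieval", "reflection", "assembly"],
  3: ["appraisal", "retrieval", "reflection", "assembly", "evaluation"],
};

// Hypothetical stimulus description and classifier.
interface Stimulus {
  kind: "heartbeat_ack" | "chat" | "a2a_message" | "task" | "question";
  complex?: boolean;
  highStakes?: boolean;
}

function depthFor(s: Stimulus): Depth {
  if (s.highStakes) return 3;
  if (s.kind === "task" || (s.kind === "question" && s.complex)) return 2;
  if (s.kind === "heartbeat_ack") return 0;
  return 1; // most chats and A2A messages
}

// Return a copy so callers cannot mutate the shared table.
function phasesFor(s: Stimulus): Phase[] {
  return [...PHASES[depthFor(s)]];
}

console.log(phasesFor({ kind: "chat" })); // [ 'appraisal' ]
```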

A backend developer and a project manager receiving the same stimulus produce different appraisal plans, different retrieval queries, and different reflections — because the CPP prompts are persona-aware and state-aware. The same data, filtered through different roles, yields different relevant context.

4.4 Heartbeat Mechanism: Agents Work While You Sleep

Agents are not purely reactive. The HeartbeatScheduler drives periodic check-ins on a configured schedule. During each heartbeat, the agent operates under the “Patrol, Don’t Build” principle:

  • Checks active tasks and updates stale states via task_list.
  • Retries failed tasks.
  • Processes background completion notifications.
  • Saves insights and sends proactive messages via notify_user.
  • Creates tasks for work that requires heavy implementation.

The heartbeat system includes infinite-loop protection through a configurable tool-iteration safety cap (default 200 iterations). Background process completions from background_exec sessions are drained and injected into the agent’s session as [BACKGROUND PROCESS COMPLETED] notifications.
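A heartbeat's safety cap and background-drain step can be sketched as below. The 200-iteration default and the `[BACKGROUND PROCESS COMPLETED]` prefix come from the post; `runHeartbeat`, `drainBackground`, and the `step` callback (which stands in for one LLM tool-call turn) are hypothetical.

```typescript
// Hypothetical result of one tool-loop turn.
interface ToolStep {
  done: boolean;
  note?: string;
}

const DEFAULT_MAX_TOOL_ITERATIONS = 200; // default cap stated in the post

// Move every queued background completion into the session as a notification.
function drainBackground(queue: string[], session: string[]): void {
  while (queue.length > 0) {
    session.push(`[BACKGROUND PROCESS COMPLETED] ${queue.shift()!}`);
  }
}

// One heartbeat: inject pending completions, then run tool turns until the
// model reports it is done or the safety cap stops a runaway loop.
function runHeartbeat(
  step: (iteration: number) => ToolStep,
  backgroundQueue: string[],
  session: string[],
  maxIterations = DEFAULT_MAX_TOOL_ITERATIONS,
): { iterations: number; hitCap: boolean } {
  drainBackground(backgroundQueue, session);
  for (let i = 1; i <= maxIterations; i++) {
    const { done, note } = step(i);
    if (note) session.push(note);
    if (done) return { iterations: i, hitCap: false };
  }
  return { iterations: maxIterations, hitCap: true };
}

const session: string[] = [];
const r1 = runHeartbeat((i) => ({ done: i === 3 }), ["nightly build exited 0"], session);
// r1 is { iterations: 3, hitCap: false }; session[0] is the injected completion notice.
const r2 = runHeartbeat(() => ({ done: false }), [], session);
// r2 is { iterations: 200, hitCap: true }: the loop is cut off at the cap.
```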

This is the mechanism that transforms an agent from a chat assistant into a proactive digital employee that works around the clock.


5. Team Collaboration in Practice

5.1 A2A Protocol: Agent-to-Agent Communication

Agents communicate through a built-in Agent-to-Agent (A2A) protocol. Any agent can send a structured message to any other agent via agent_send_message. The message arrives in the target agent’s mailbox, is triaged by the AttentionController, and is processed at the appropriate cognitive depth.

This enables a manager-worker architecture: a Manager agent delegates tasks to Worker agents, monitors progress, and handles escalations. Workers report blockers, request clarification, and submit deliverables — all through the A2A protocol.
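Delivery of an A2A message into the target agent's mailbox can be sketched with an in-memory bus. The `A2AMessage` fields, the `kind` values, and the `A2ABus` class are assumptions: the post names `agent_send_message` but not its payload schema, and real delivery would pass through the AttentionController's triage rather than land in a plain array.

```typescript
// Hypothetical structured A2A message.
interface A2AMessage {
  from: string;
  to: string;
  kind: "delegate" | "blocker" | "clarification" | "deliverable";
  body: string;
  sentAt: number; // ms since epoch
}

class A2ABus {
  private mailboxes = new Map<string, A2AMessage[]>();

  register(agentId: string): void {
    if (!this.mailboxes.has(agentId)) this.mailboxes.set(agentId, []);
  }

  // Rough analogue of agent_send_message: append to the target agent's mailbox.
  send(from: string, to: string, kind: A2AMessage["kind"], body: string): A2AMessage {
    const box = this.mailboxes.get(to);
    if (!box) throw new Error(`Unknown agent: ${to}`);
    const msg: A2AMessage = { from, to, kind, body, sentAt: Date.now() };
    box.push(msg);
    return msg;
  }

  inbox(agentId: string): readonly A2AMessage[] {
    return this.mailboxes.get(agentId) ?? [];
  }
}

// Manager delegates; the worker replies with a blocker.
const bus = new A2ABus();
["manager", "backend-dev"].forEach((id) => bus.register(id));
bus.send("manager", "backend-dev", "delegate", "Implement pagination for GET /users");
bus.send("backend-dev", "manager", "blocker", "Need the maximum page size");
console.log(bus.inbox("manager").map((m) => m.kind)); // [ 'blocker' ]
```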

5.2 Subagent Spawning

For complex tasks, any agent can spawn lightweight LLM subagents using spawn_subagent or spawn_subagents. These are parallel workers that handle focused subtasks and return results to the parent agent. Subagent limits (parallelism, retry policy, preview truncation) are centralized in packages/shared/src/limits.ts.

The parent agent retains full control: it can spawn multiple subagents in parallel, collect their results, and synthesize the output. This is how single agents scale to handle tasks that would require a team.
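Bounded parallelism for subagents is a classic worker-pool pattern. The sketch assumes a placeholder `MAX_PARALLEL_SUBAGENTS` of 3 (the real values in packages/shared/src/limits.ts are not listed in the post) and a caller-supplied `runSubagent` function; `spawnSubagentsSketch` is a hypothetical name, and retries and preview truncation are left out.

```typescript
const MAX_PARALLEL_SUBAGENTS = 3; // placeholder, not Markus's configured limit

// Run one subagent per input with at most `maxParallel` in flight,
// returning results in input order.
async function spawnSubagentsSketch<T, R>(
  inputs: T[],
  runSubagent: (input: T) => Promise<R>,
  maxParallel = MAX_PARALLEL_SUBAGENTS,
): Promise<R[]> {
  const results = new Array<R>(inputs.length);
  let next = 0;
  // Each worker claims the next unclaimed index; JS runs this synchronously, so no races.
  async function worker(): Promise<void> {
    while (next < inputs.length) {
      const i = next++;
      results[i] = await runSubagent(inputs[i]);
    }
  }
  await Promise.all(Array.from({ length: Math.min(maxParallel, inputs.length) }, () => worker()));
  return results;
}

// Usage: track how many fake subagents run at once.
let active = 0;
let peak = 0;
const fakeSubagent = async (topic: string): Promise<string> => {
  active++;
  peak = Math.max(peak, active);
  await new Promise((resolve) => setTimeout(resolve, 10));
  active--;
  return `summary of ${topic}`;
};

spawnSubagentsSketch(["auth", "billing", "search", "docs", "ci"], fakeSubagent, 2)
  .then((out) => console.log(out.length, peak)); // prints: 5 2
```

If any subagent rejects, `Promise.all` rejects immediately while the other workers finish their current job; a retry policy would wrap `runSubagent`.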

5.3 Progressive Trust Levels

Markus implements progressive trust, as documented in the Architecture guide. Trust levels control what agents can do autonomously:

| Trust Level | Condition | Permissions |
|---|---|---|
| probation | New agent or score < 40 | All tasks require human approval |
| standard | Score ≥ 40, ≥ 5 deliveries | Routine tasks auto-approved |
| trusted | Score ≥ 60, ≥ 15 deliveries | Higher autonomy, can review peers |
| senior | Score ≥ 80, ≥ 25 deliveries | Highest autonomy, key reviewer role |

Agents earn trust through successful deliveries. A senior agent can review and approve work from standard agents. A probation agent requires human approval for every task. This creates a natural career progression that mirrors real engineering organizations.
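The trust table translates directly into a threshold check, evaluated from the highest level down. One assumption is needed: an agent that meets a level's score but not its delivery count (say, score 70 with 4 deliveries) is placed at the highest level whose conditions it fully meets, here probation. `trustLevel` is a hypothetical helper name.

```typescript
type TrustLevel = "probation" | "standard" | "trusted" | "senior";

// Thresholds from the table, checked from the highest level down.
const TRUST_RULES: { level: TrustLevel; minScore: number; minDeliveries: number }[] = [
  { level: "senior", minScore: 80, minDeliveries: 25 },
  { level: "trusted", minScore: 60, minDeliveries: 15 },
  { level: "standard", minScore: 40, minDeliveries: 5 },
];

function trustLevel(score: number, deliveries: number): TrustLevel {
  for (const rule of TRUST_RULES) {
    if (score >= rule.minScore && deliveries >= rule.minDeliveries) return rule.level;
  }
  return "probation"; // new agent, score < 40, or too few deliveries for any higher level
}

console.log(trustLevel(85, 20)); // trusted: score qualifies for senior, deliveries do not
```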

5.4 Submit-Review-Merge Pipeline

Every deliverable passes through a formal pipeline:

Agent completes work
  → task_submit_review (summary, branch, test results)
  → Quality gates (TypeScript build, ESLint, Vitest)
  → Merge conflict pre-check (dry-run merge)
  → Task state → review
  → Reviewer accepts or requests revision
  → Accept → merge branch → completed
  → Revision → agent reworks → resubmit

This pipeline, specified in ARCHITECTURE.md §4.3, guarantees that no code reaches the “completed” state without passing TypeScript compilation, ESLint checks, and Vitest tests. Because the platform runs these gates itself instead of trusting the agent's report, “works on my machine” claims are verified before review, and the dry-run merge surfaces conflicts before the reviewer even sees the submission.
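The gate sequence can be sketched as an ordered list of commands that stops at the first failure. The exact invocations (`npx tsc --noEmit`, `npx eslint .`, `npx vitest run`, and `git merge-tree --write-tree`, which computes a merge without touching the work tree and needs git 2.38 or later) are assumptions: the post names the tools but not how Markus calls them, nor whether it stops at the first failing gate. `defaultGates`, `runQualityGates`, and the injectable `Runner` are hypothetical.

```typescript
import { spawnSync } from "node:child_process";

interface Gate {
  name: string;
  cmd: string;
  args: string[];
}

// Assumed gate commands, in the order listed in the pipeline above.
function defaultGates(targetBranch: string, branch: string): Gate[] {
  return [
    { name: "typescript", cmd: "npx", args: ["tsc", "--noEmit"] },
    { name: "eslint", cmd: "npx", args: ["eslint", "."] },
    { name: "vitest", cmd: "npx", args: ["vitest", "run"] },
    // Exits 1 on conflicts, 0 on a clean merge; nothing is written to the work tree.
    { name: "merge-precheck", cmd: "git", args: ["merge-tree", "--write-tree", targetBranch, branch] },
  ];
}

type Runner = (gate: Gate) => number; // returns the process exit code

// Real runner: spawn the command; a missing or killed process counts as failure.
const spawnRunner: Runner = (g) => spawnSync(g.cmd, g.args, { stdio: "inherit" }).status ?? 1;

// Run gates in order and stop at the first non-zero exit.
function runQualityGates(gates: Gate[], run: Runner = spawnRunner): { passed: boolean; failedGate?: string } {
  for (const gate of gates) {
    if (run(gate) !== 0) return { passed: false, failedGate: gate.name };
  }
  return { passed: true };
}
```

In a real checkout, `runQualityGates(defaultGates("main", "agent/feature-x"))` would spawn each command with inherited output; injecting a fake `Runner` lets the ordering logic be tested without a repository.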


6. Real Delivery Flow: From Task to Done

The delivery process is not a suggestion — it is enforced at the platform level.

  1. Task creation: An agent creates a task via task_create. The governance layer checks the approval tier: auto-approved tasks start in_progress immediately; manager-level tasks wait for the team manager; human-level tasks pause for manual approval.

  2. Execution: The assigned agent works in its isolated workspace (~/.markus/agents/<agentId>/workspace/). Cross-agent write isolation prevents interference — agents cannot write to other agents’ directories. Git commit metadata is auto-injected with agent ID, name, team, and org, making every commit fully traceable.

  3. Submission: The agent calls task_submit_review with a summary, branch details, and test results. The system transitions the task to review status and notifies the reviewer.

  4. Quality gates: The system automatically runs TypeScript type checking, ESLint linting, and Vitest unit tests. A dry-run merge verifies there are no conflicts.

  5. Review: The assigned reviewer (human or agent) examines the deliverable. They can accept (merges the branch, sets status to completed) or request revision (transitions back to in_progress for rework).

  6. Audit trail: Every action — every tool call, every LLM invocation, every state transition — is logged to the audit_logs table. The full timeline is visible in the web UI.
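Cross-agent write isolation boils down to a path-containment check. This sketch is deliberately conservative: it only allows writes that resolve inside the agent's own `~/.markus/agents/<agentId>/workspace/`, which is stricter than the rule as stated, and it does not resolve symlinks (a production guard would also call `fs.realpath`). `workspaceOf` and `canWrite` are hypothetical names.

```typescript
import * as os from "node:os";
import * as path from "node:path";

// Workspace layout from the post: ~/.markus/agents/<agentId>/workspace/
function workspaceOf(agentId: string, home = os.homedir()): string {
  return path.join(home, ".markus", "agents", agentId, "workspace");
}

// Allow a write only if the resolved target stays inside the agent's own workspace.
// Relative targets resolve against the workspace; "../" escapes and absolute
// paths elsewhere (including other agents' directories) are rejected.
function canWrite(agentId: string, target: string, home = os.homedir()): boolean {
  const root = workspaceOf(agentId, home);
  const rel = path.relative(root, path.resolve(root, target));
  return rel === "" || (rel !== ".." && !rel.startsWith(".." + path.sep) && !path.isAbsolute(rel));
}

console.log(canWrite("alice", "../../bob/workspace/app.ts", "/home/u")); // false
```

Checking `..` as a whole path segment, rather than any name starting with two dots, keeps legitimate files such as `..notes.md` writable.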

The task state machine (defined in ARCHITECTURE.md §3.6) follows a strict flow:

pending → in_progress → review → completed → archived

Tasks can also transition to blocked (on hold), failed (unrecoverable error), rejected (proposal denied before work), or cancelled (stopped after work began). Every state transition is recorded with a reason.
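The state machine can be encoded as an adjacency table plus a guard that records every transition with its reason. The happy path and the revision loop (`review` back to `in_progress`) come from the post; the remaining edges are inferred from the state descriptions, and `blocked` returning to `in_progress` is an outright assumption. `TaskSketch` is a hypothetical class, not Markus's task model.

```typescript
type TaskState =
  | "pending" | "in_progress" | "review" | "completed" | "archived"
  | "blocked" | "failed" | "rejected" | "cancelled";

// Allowed moves; terminal states have no outgoing edges.
const TRANSITIONS: Record<TaskState, TaskState[]> = {
  pending: ["in_progress", "rejected"], // rejected: denied before work began
  in_progress: ["review", "blocked", "failed", "cancelled"],
  review: ["completed", "in_progress"], // accept, or request revision
  blocked: ["in_progress", "cancelled"], // assumed
  completed: ["archived"],
  archived: [],
  failed: [],
  rejected: [],
  cancelled: [],
};

interface TransitionRecord {
  from: TaskState;
  to: TaskState;
  reason: string;
  at: string; // ISO timestamp
}

class TaskSketch {
  state: TaskState = "pending";
  readonly history: TransitionRecord[] = [];

  // Reject illegal moves; record every legal one with its reason.
  transition(to: TaskState, reason: string): void {
    if (TRANSITIONS[this.state].indexOf(to) === -1) {
      throw new Error(`Illegal transition ${this.state} -> ${to}`);
    }
    this.history.push({ from: this.state, to, reason, at: new Date().toISOString() });
    this.state = to;
  }
}
```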

Stall detection automatically identifies stuck tasks: tasks in_progress for over 24 hours (or 2x the average completion time) trigger a warning to the agent and escalation to the manager. Tasks in review unhandled for over 12 hours escalate to the human. Tasks assigned but not started after 4 hours trigger a reminder.
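Stall detection reduces to comparing how long a task has sat in its current state against per-state thresholds. The 24-, 12-, and 4-hour limits come from the post; reading “24 hours (or 2x the average completion time)” as whichever limit is reached first, and modeling “assigned but not started” as an assigned `pending` task, are assumptions of this sketch. `checkStall` and the action names are hypothetical.

```typescript
interface TaskTiming {
  state: "pending" | "in_progress" | "review";
  assigned: boolean;
  enteredStateAt: number; // ms since epoch when the task entered its current state
}

type StallAction = "warn_agent_and_escalate_to_manager" | "escalate_to_human" | "remind_agent" | null;

const HOUR = 60 * 60 * 1000;

function checkStall(t: TaskTiming, now: number, avgCompletionMs?: number): StallAction {
  const age = now - t.enteredStateAt;
  if (t.state === "in_progress") {
    // Assumed: the shorter of 24 hours and twice the average completion time.
    const limit = avgCompletionMs === undefined ? 24 * HOUR : Math.min(24 * HOUR, 2 * avgCompletionMs);
    return age > limit ? "warn_agent_and_escalate_to_manager" : null;
  }
  if (t.state === "review") return age > 12 * HOUR ? "escalate_to_human" : null;
  if (t.state === "pending" && t.assigned) return age > 4 * HOUR ? "remind_agent" : null;
  return null;
}
```

A scheduler (the heartbeat is a natural fit) would call `checkStall` for every open task and route each non-null action to the agent, manager, or human.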


7. Conclusion: Why Markus Is Different

Markus positions itself as an operating system for AI companies. The distinction matters:

| Factor | LangChain Agents / CrewAI / AutoGen | Markus |
|---|---|---|
| Runtime | Orchestrator with external CLI tools | Full embedded agent runtime with built-in tools |
| Memory | Session-scoped or minimal | Three-layer persistent memory (Tulving model) |
| Proactivity | Reactive: waits for user input | Heartbeat-driven, works autonomously |
| Governance | None or minimal | Progressive trust, submit-review-merge, audit trail |
| Team model | Manual orchestration code | A2A protocol, subagent spawning, manager/worker roles |
| Quality gates | None | TypeScript, ESLint, Vitest enforced per submission |
| Observability | CLI logs per agent | Centralized dashboard, real-time WebSocket events, full activity history |

CrewAI and AutoGen provide valuable building blocks for multi-agent conversations. But they remain agent frameworks — they give you the components to build a multi-agent system. Markus is an agent platform — it gives you the running system, complete with governance, memory, collaboration protocols, and a delivery pipeline that enforces quality.

Markus is open source (AGPL-3.0) and installs with a single command:

curl -fsSL https://markus.global/install.sh | bash

No Docker. No PostgreSQL. No Go compiler. SQLite database, bundled web UI, zero external dependencies. Deploy it on a cloud server and manage your entire AI workforce from your phone — the responsive dashboard works on desktop and mobile.

The age of single-agent chat is over. The age of AI teams is here.

Get started on GitHub →


{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "How Markus Builds AI Teams That Actually Ship — Not Just Chat",
  "description": "Markus is an open-source AI workforce platform with governance, three-layer memory, peer review, and proactive agents that deliver real work — not just chat.",
  "keywords": ["AI agent platform", "multi-agent system", "open source AI team", "AI automation platform", "AI workforce", "agent runtime", "agent collaboration"],
  "datePublished": "2026-05-12",
  "author": {
    "@type": "Organization",
    "name": "Markus Engineering Team",
    "url": "https://github.com/markus-global/markus"
  },
  "publisher": {
    "@type": "Organization",
    "name": "Markus",
    "url": "https://www.markus.global"
  },
  "mainEntityOfPage": {
    "@type": "WebPage",
    "@id": "https://github.com/markus-global/markus"
  }
}