Architecture Guide

A comprehensive look at how Hippocortex captures, processes, stores, and retrieves agent memory. This guide covers the full system from event ingestion to context synthesis.

System Overview

Hippocortex is a deterministic memory layer for AI agents. It operates on a three-phase cycle: Capture raw agent events, Compile them into reusable knowledge artifacts, and Synthesize compressed context within token budgets.
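
The three phases map directly onto the public API. The sketch below shows one full cycle from an agent's point of view; the base URL, API key handling, and any request fields beyond those documented in this guide (type, sessionId, payload, query, maxTokens) are illustrative assumptions rather than the canonical SDK.

  // Minimal sketch of one capture -> learn -> synthesize cycle (TypeScript).
  // BASE_URL and API_KEY are placeholders.
  const BASE_URL = "https://api.example.com";
  const API_KEY = process.env.HIPPOCORTEX_API_KEY ?? "hx_test_placeholder";

  async function call(path: string, body: unknown) {
    const res = await fetch(`${BASE_URL}${path}`, {
      method: "POST",
      headers: {
        Authorization: `Bearer ${API_KEY}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify(body),
    });
    return res.json();
  }

  // 1. Capture: enqueue a raw event; the API answers 202 immediately.
  await call("/v1/capture", {
    type: "tool_call",
    sessionId: "session-123",
    payload: { tool: "deploy", input: { service: "payments" } },
  });

  // 2. Learn: trigger asynchronous compilation of captured events.
  await call("/v1/learn", {});

  // 3. Synthesize: retrieve a token-budgeted context pack for the next turn.
  const pack = await call("/v1/synthesize", {
    query: "deploy payment service",
    maxTokens: 4000,
  });
  console.log(pack.totalTokens, pack.sections.length);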

High-Level Architecture

  Agent / SDK                    Hippocortex Cloud
  +-----------+                  +------------------------------------------+
  |           |   POST /capture  |  +--------+    +---------+    +--------+ |
  |  capture  | ----------------->  | Hono   | -> | Redis   | -> | Worker | |
  |  events   |   202 Accepted   |  | API    |    | Queue   |    | (Bull) | |
  |           |                  |  +--------+    +---------+    +---+----+ |
  +-----------+                  |                                   |      |
                                 |                              +----v----+ |
  +-----------+                  |                              | Postgres| |
  |           |  POST /synthesize|  +--------+    +---------+  | Events  | |
  | synthesize| <---------------->  | Hono   | -> | Semantic|  | Memories| |
  |  context  |   200 OK        |  | API    |    | Search  |  | Artifacts|
  |           |                  |  +--------+    +---------+  +---------+ |
  +-----------+                  +------------------------------------------+

Event Capture Pipeline

The capture pipeline is designed for high throughput and reliability. Events are accepted by the Hono API server, validated, and immediately queued to Redis via BullMQ. The API returns a 202 Accepted response without waiting for processing, keeping agent latency minimal.
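
A minimal sketch of such a handler, using Hono and BullMQ as named above, is shown below. Only the endpoint path, the required fields, the idempotencyKey-as-job-ID behaviour, and the 202 response shape come from this guide; the tenant lookup, queue name, and Redis connection details are placeholder assumptions.

  import { Hono } from "hono";
  import { Queue } from "bullmq";

  // Assumed queue name and Redis connection; adjust to the deployment.
  const captureQueue = new Queue("capture", {
    connection: { host: "localhost", port: 6379 },
  });

  const app = new Hono();

  app.post("/v1/capture", async (c) => {
    // 1. Auth check: resolve the bearer token to a tenant (placeholder helper).
    const token = c.req.header("Authorization")?.replace("Bearer ", "");
    if (!token) return c.json({ error: "unauthorized" }, 401);
    const tenantId = await lookupTenant(token);

    // 2. Validate body: type, sessionId, and payload are required.
    const body = await c.req.json();
    if (!body.type || !body.sessionId || !body.payload) {
      return c.json({ error: "type, sessionId and payload are required" }, 400);
    }

    // 3 + 4. Dedup and queue: the idempotencyKey becomes the BullMQ job ID,
    // so resubmitting the same key does not create a second job.
    const job = await captureQueue.add(
      "capture-event",
      { tenantId, ...body },
      body.idempotencyKey ? { jobId: body.idempotencyKey } : undefined
    );

    // 5. Return 202 without waiting for the worker.
    return c.json({ jobId: job.id, status: "queued" }, 202);
  });

  // Placeholder tenant lookup; a real implementation would query Postgres.
  async function lookupTenant(token: string): Promise<string> {
    return "tenant-for-" + token.slice(0, 8);
  }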

Capture Flow

  SDK Client
     |
     | POST /v1/capture (or /v1/capture/batch)
     v
  +------------------+
  | Hono API Server  |
  |                  |
  | 1. Auth check    |  Bearer token -> tenant lookup
  | 2. Validate body |  type, sessionId, payload required
  | 3. Dedup check   |  idempotencyKey -> BullMQ job ID
  | 4. Queue event   |  BullMQ: captureQueue.add()
  | 5. Return 202    |  { jobId, status: "queued" }
  +--------+---------+
           |
           v
  +------------------+
  | Redis (BullMQ)   |
  |                  |
  | hx:bull:capture  |  Waiting -> Active -> Completed
  +--------+---------+
           |
           v
  +------------------+
  | Capture Worker   |
  |                  |
  | 1. Persist event |  -> PostgreSQL events table
  | 2. Salience score|  Assign relevance weight
  | 3. Metrics       |  Track capture throughput
  +------------------+
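
A sketch of the corresponding worker is shown below. It assumes a pg connection pool and an events table with illustrative column names; the salience heuristic is a stand-in for the real scoring logic, which this guide does not specify.

  import { Worker } from "bullmq";
  import { Pool } from "pg";

  const pool = new Pool({ connectionString: process.env.DATABASE_URL });

  // Processes jobs from the capture queue: persist, score, record metrics.
  const captureWorker = new Worker(
    "capture",
    async (job) => {
      const { tenantId, type, sessionId, payload } = job.data;

      // 1. Persist event (table and column names are illustrative).
      await pool.query(
        `INSERT INTO events (tenant_id, session_id, type, payload)
         VALUES ($1, $2, $3, $4)`,
        [tenantId, sessionId, type, JSON.stringify(payload)]
      );

      // 2. Salience score: placeholder heuristic, not the production model.
      const salience = type === "test_run" || type === "tool_result" ? 0.8 : 0.5;

      // 3. Metrics: a real worker would export throughput counters.
      console.log(`captured ${type} for ${tenantId} (salience ${salience})`);
    },
    { connection: { host: "localhost", port: 6379 } }
  );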

Event Types

Hippocortex accepts eight event types, each representing a different kind of agent interaction. The event type determines how the payload is interpreted during compilation.

  message        - Conversational messages between user and agent. Contains role and content.
  tool_call      - Tool invocations by the agent. Records tool name and input parameters.
  tool_result    - Results returned from tool execution. Captures output and status.
  file_edit      - File system modifications. Tracks path, diff, and action type.
  test_run       - Test suite execution results. Records pass/fail counts and suite name.
  command_exec   - Shell command executions. Captures command, exit code, and output.
  browser_action - Browser automation actions. Records URL, action type, and selector.
  api_result     - External API call results. Tracks endpoint, method, status, and response.
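
Expressed as TypeScript types, the descriptions above suggest payload shapes roughly like the following; the exact field names are illustrative assumptions, not the canonical wire format.

  // Illustrative payload shapes inferred from the event type descriptions;
  // field names are assumptions, not the canonical schema.
  type CaptureEvent =
    | { type: "message"; role: "user" | "assistant"; content: string }
    | { type: "tool_call"; tool: string; input: Record<string, unknown> }
    | { type: "tool_result"; output: unknown; status: "ok" | "error" }
    | { type: "file_edit"; path: string; diff: string; action: "create" | "modify" | "delete" }
    | { type: "test_run"; suite: string; passed: number; failed: number }
    | { type: "command_exec"; command: string; exitCode: number; output: string }
    | { type: "browser_action"; url: string; action: string; selector?: string }
    | { type: "api_result"; endpoint: string; method: string; status: number; response: unknown };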

Idempotency and Deduplication

Events can include an idempotencyKey field. This key is used as the BullMQ job ID, so submitting the same event twice with the same idempotency key is a no-op. The second submission returns the result from the first. This is essential for at-least-once delivery semantics in distributed agent systems.

Batch Processing

The batch endpoint accepts up to 1,000 events per request. Each event is validated independently. Valid events are queued even if some fail validation. The response includes per-event error details for any rejected events. A batch ID groups all events from the same request.
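
A sketch of a batch submission follows, using the /v1/capture/batch path from the capture flow diagram. The request wrapper (events) and the response field names (batchId, accepted, errors) are assumptions beyond the documented batch ID and per-event error details.

  // Sketch of a batch capture call; wrapper and response field names are assumptions.
  const res = await fetch("https://api.example.com/v1/capture/batch", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.HIPPOCORTEX_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      events: [
        { type: "message", sessionId: "s-1", payload: { role: "user", content: "deploy" } },
        { type: "tool_call", sessionId: "s-1", payload: { tool: "deploy" } },
        { sessionId: "s-1", payload: {} }, // missing "type": rejected individually
      ],
    }),
  });

  // Valid events are queued even when others fail validation; the response
  // groups everything under a batch ID and lists per-event errors.
  const result = await res.json();
  console.log(result.batchId, result.accepted, result.errors);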

Memory Compilation

The Memory Compiler transforms raw events into structured knowledge. This is the core intelligence of Hippocortex. Compilation runs asynchronously, triggered by POST /v1/learn requests.

Compilation Pipeline

  POST /v1/learn
       |
       v
  +--------------------+
  | Learn Queue (Bull) |
  +--------+-----------+
           |
           v
  +--------------------+
  | Compilation Worker |
  |                    |
  | Phase 1: COLLECT   |  Fetch unprocessed events from Postgres
  |   |                |  Group by session and agent
  |   v                |
  | Phase 2: ANALYZE   |  Pattern extraction
  |   |                |  - Sequence analysis (task schemas)
  |   |                |  - Failure correlation (playbooks)
  |   |                |  - Causal inference (patterns)
  |   |                |  - Decision extraction (policies)
  |   v                |
  | Phase 3: COMPILE   |  Transform patterns into artifacts
  |   |                |  - Merge with existing artifacts
  |   |                |  - Version tracking
  |   |                |  - Confidence scoring
  |   v                |
  | Phase 4: PERSIST   |  Write artifacts to Postgres
  |                    |  Update semantic memory index
  +--------------------+

Artifact Types

The compiler produces four types of knowledge artifacts, each serving a different purpose in agent reasoning:

Task Schema

Learned procedures and step sequences for recurring tasks. Extracted by analyzing successful tool call chains across sessions. Contains ordered steps, preconditions, expected outputs, and common variations.

Failure Playbook

Error patterns and recovery strategies compiled from past failures. Built by correlating error events with subsequent recovery actions. Contains error signatures, root causes, and proven remediation steps.

Causal Pattern

Cause-and-effect relationships identified across sessions. Discovered through temporal and contextual analysis of event sequences. Captures triggers, conditions, and outcomes.

Decision Policy

Decision criteria and preferences extracted from agent behavior. Identifies choice points where agents consistently select specific approaches. Contains decision contexts, options evaluated, and rationale.
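
One way to picture the four artifact types as data is sketched below. The shapes layer illustrative field names on top of the properties this guide does describe (versioning, confidence, the active flag, and source tracking); none of them are the canonical schema.

  // Illustrative shapes for the four artifact types; field names beyond
  // version, confidence, active status, and provenance are assumptions.
  interface ArtifactBase {
    id: string;
    tenantId: string;
    version: number;
    confidence: number;       // 0.0 - 1.0, see Confidence Scoring
    isActive: boolean;        // see Knowledge Lifecycle
    sourceEventIds: string[]; // provenance
  }

  interface TaskSchema extends ArtifactBase {
    kind: "task_schema";
    steps: { order: number; action: string; preconditions: string[]; expectedOutput: string }[];
    variations: string[];
  }

  interface FailurePlaybook extends ArtifactBase {
    kind: "failure_playbook";
    errorSignature: string;
    rootCause: string;
    remediationSteps: string[];
  }

  interface CausalPattern extends ArtifactBase {
    kind: "causal_pattern";
    trigger: string;
    conditions: string[];
    outcome: string;
  }

  interface DecisionPolicy extends ArtifactBase {
    kind: "decision_policy";
    decisionContext: string;
    optionsEvaluated: string[];
    rationale: string;
  }

  type Artifact = TaskSchema | FailurePlaybook | CausalPattern | DecisionPolicy;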

Compilation Modes

Incremental (default)

Processes only events captured since the last compilation run. Efficient for regular updates. New patterns are merged with existing artifacts, updating confidence scores and evidence counts.

Full

Reprocesses all events from scratch. Useful after schema changes, major data imports, or when artifacts need complete reconstruction. More resource-intensive but produces the most accurate artifacts.
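
A sketch of triggering each mode via POST /v1/learn is shown below. The mode field name is an assumption; minPatternStrength is the parameter described under Confidence Scoring later in this guide.

  // Shared request headers for both calls.
  const headers = {
    Authorization: `Bearer ${process.env.HIPPOCORTEX_API_KEY}`,
    "Content-Type": "application/json",
  };

  // Incremental compilation (default): only events since the last run.
  // "mode" is an assumed field name; minPatternStrength is the documented
  // minimum confidence threshold for artifact creation.
  await fetch("https://api.example.com/v1/learn", {
    method: "POST",
    headers,
    body: JSON.stringify({ mode: "incremental", minPatternStrength: 0.6 }),
  });

  // Full recompilation: reprocess every event from scratch.
  await fetch("https://api.example.com/v1/learn", {
    method: "POST",
    headers,
    body: JSON.stringify({ mode: "full" }),
  });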

Context Synthesis

Context synthesis is the retrieval phase. Given a query (typically the current user message or task), the synthesizer searches across all memory layers, retrieves relevant entries, and packs them within a specified token budget. The result is a compressed context pack ready for injection into an LLM prompt.

Synthesis Pipeline

  POST /v1/synthesize
  { query: "deploy payment service", maxTokens: 4000 }
       |
       v
  +---------------------+
  | Synthesis Engine     |
  |                      |
  | 1. SEARCH            |  Semantic search over memories
  |    - Vector search   |  pgvector similarity search
  |    - Keyword match   |  Full-text search fallback
  |    - Recent recall   |  Time-weighted recency
  |                      |
  | 2. RETRIEVE          |  Fetch matching artifacts
  |    - Active only     |  Filter by is_active flag
  |    - Relevance rank  |  Score by query similarity
  |    - Top-K selection |  Limit to most relevant
  |                      |
  | 3. PACK              |  Token-budgeted assembly
  |    - Priority sort   |  Highest confidence first
  |    - Token counting  |  ~4 chars per token estimate
  |    - Budget check    |  Stop when limit reached
  |    - Drop overflow   |  Track entriesDropped
  |                      |
  | 4. RETURN            |  Context pack response
  |    - sections[]      |  Array of context entries
  |    - totalTokens     |  Actual tokens consumed
  |    - contextPack     |  Budget summary
  +---------------------+

Reasoning Sections

Synthesized context is organized into reasoning sections, each serving a specific purpose in agent decision-making:

  procedures - Known steps and workflows for the task at hand.
  failures   - Past failure patterns and how they were resolved.
  decisions  - Previous decision contexts and outcomes.
  facts      - Established facts and domain knowledge.
  causal     - Cause-and-effect relationships relevant to the query.
  context    - General contextual information from past sessions.

Token Budget Management

The synthesizer uses a token estimation model (approximately 4 characters per token) to pack entries within the specified budget. Entries are prioritized by confidence score and relevance to the query. When the budget is exhausted, remaining entries are dropped and counted in entriesDropped. The compression ratio indicates how much source content was condensed into the output.
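
Under those assumptions, the packing step can be sketched as a greedy, confidence-ordered loop; the real synthesizer may differ in details such as tie-breaking and per-section limits.

  interface ContextEntry {
    section: string;    // e.g. "procedures", "failures"
    content: string;
    confidence: number;
  }

  // Roughly four characters per token, as described above.
  const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

  // Greedy packing: highest confidence first, stop when the budget is hit,
  // count everything that no longer fits as entriesDropped.
  function packContext(entries: ContextEntry[], maxTokens: number) {
    const sorted = [...entries].sort((a, b) => b.confidence - a.confidence);
    const sections: ContextEntry[] = [];
    let totalTokens = 0;
    let entriesDropped = 0;

    for (let i = 0; i < sorted.length; i++) {
      const cost = estimateTokens(sorted[i].content);
      if (totalTokens + cost > maxTokens) {
        entriesDropped = sorted.length - i; // budget exhausted
        break;
      }
      sections.push(sorted[i]);
      totalTokens += cost;
    }

    return { sections, totalTokens, entriesDropped };
  }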

Knowledge Lifecycle

Artifacts follow a lifecycle that reflects their relevance and usage over time. This ensures the knowledge base stays current while preserving historical context.

Artifact Lifecycle

  +--------+     compilation      +--------+
  | Events | ------------------> | ACTIVE |
  +--------+                     +---+----+
                                     |
                     superseded by   |   no longer matches
                     newer version   |   current patterns
                                     |
                                +----v-------+
                                | DEPRECATED |
                                +----+-------+
                                     |
                                     |  retention period
                                     |  expires
                                     |
                                +----v-------+
                                | ARCHIVED   |
                                +------------+

  Status Meanings:
  ACTIVE       - Currently used in synthesis. Returned by queries.
  DEPRECATED   - Superseded or outdated. Not used in synthesis.
  ARCHIVED     - Retained for audit. Not queryable.

Versioning

Artifacts are versioned. Each compilation run can update existing artifacts by merging new evidence. The version counter increments on each update. When an artifact is fundamentally restructured (e.g., a task schema gains new steps), the old version is deprecated and a new artifact is created. Source events and memories are tracked for full provenance.

Confidence Scoring

Every artifact carries a confidence score between 0.0 and 1.0. Confidence is computed from the evidence count (how many source events support the pattern), consistency (how reliably the pattern appears), and recency (more recent evidence carries higher weight). The minPatternStrength parameter in the learn API controls the minimum confidence threshold for artifact creation.
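
One illustrative way to combine those three signals into a single score is sketched below; the weights, decay constants, and functional form are not specified in this guide and should be treated as assumptions.

  // Illustrative confidence blend of evidence count, consistency, and recency.
  // All weights and curves here are assumptions, not the production formula.
  function scoreConfidence(opts: {
    evidenceCount: number;          // how many source events support the pattern
    consistency: number;            // 0..1, how reliably the pattern appears
    daysSinceLastEvidence: number;  // recency of supporting evidence
  }): number {
    const evidence = 1 - Math.exp(-opts.evidenceCount / 10);    // diminishing returns
    const recency = Math.exp(-opts.daysSinceLastEvidence / 30); // recent evidence weighs more
    const confidence = 0.4 * evidence + 0.4 * opts.consistency + 0.2 * recency;
    return Math.min(1, Math.max(0, confidence));
  }

  // An artifact below minPatternStrength would not be created.
  const strongEnough =
    scoreConfidence({ evidenceCount: 12, consistency: 0.9, daysSinceLastEvidence: 3 }) >= 0.6;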

Predictive Context Assembly

Beyond reactive retrieval, Hippocortex supports predictive context assembly. By analyzing the current session's event sequence, the system can anticipate what knowledge the agent will need next and pre-assemble relevant context. This reduces synthesis latency for common workflows.

Example: If the agent is in the middle of a deployment workflow and has already executed steps 1-3 of a known task schema, the system can pre-load the failure playbooks for steps 4-5, so the agent has recovery strategies ready if something goes wrong.

Memory Fingerprints

Memory fingerprints enable cross-agent knowledge transfer. When an artifact is compiled, a content-based fingerprint is generated. This fingerprint can be used to identify equivalent knowledge across different agent instances or tenants (with explicit permission).

Use cases include: sharing best practices across a team of agents, bootstrapping a new agent with proven task schemas from an experienced agent, and detecting knowledge drift between agent versions.
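
A sketch of content-based fingerprinting is shown below, assuming a normalize-then-hash approach; the guide does not specify the hash function or normalization actually used, so both are assumptions here.

  import { createHash } from "node:crypto";

  // Canonicalize so that incidental differences (key order) do not change the hash.
  function canonicalize(value: unknown): unknown {
    if (Array.isArray(value)) return value.map(canonicalize);
    if (value && typeof value === "object") {
      return Object.fromEntries(
        Object.entries(value as Record<string, unknown>)
          .sort(([a], [b]) => a.localeCompare(b))
          .map(([k, v]) => [k, canonicalize(v)])
      );
    }
    return value;
  }

  // Content-based fingerprint; SHA-256 is an assumed choice for this sketch.
  function fingerprintArtifact(content: Record<string, unknown>): string {
    return createHash("sha256")
      .update(JSON.stringify(canonicalize(content)))
      .digest("hex");
  }

  // Equivalent knowledge produces the same fingerprint, which is what
  // makes cross-agent matching possible.
  const a = fingerprintArtifact({ steps: ["build", "test", "deploy"], task: "release" });
  const b = fingerprintArtifact({ task: "release", steps: ["build", "test", "deploy"] });
  console.log(a === b); // true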

Infrastructure

Infrastructure Stack

  +------------------------------------------------------------------+
  |                        Load Balancer / CDN                        |
  +------+----------------------------+------------------------------+
         |                            |
  +------v------+              +------v------+
  |  Hono API   |              |  Hono API   |      Stateless API
  |  Instance 1 |              |  Instance 2 |      servers
  +------+------+              +------+------+
         |                            |
         +------------+---------------+
                      |
         +------------v---------------+
         |      Redis (Upstash)       |
         |                            |
         |  - BullMQ job queues       |     Job queuing and
         |  - Rate limit counters     |     rate limiting
         |  - Session cache           |
         +------------+---------------+
                      |
         +------------v---------------+
         |   PostgreSQL (Neon)        |
         |                            |
         |  - events table            |     Primary data store
         |  - semantic_memories       |     with pgvector
         |  - artifacts table         |     extension
         |  - tenants, users          |
         |  - api_keys, billing       |
         |  - pgvector indexes        |
         +----------------------------+

PostgreSQL (Neon)

The primary data store. All events, memories, artifacts, and tenant data are stored in PostgreSQL with the pgvector extension for semantic search. Neon provides serverless PostgreSQL with autoscaling, branching, and point-in-time recovery. Connection pooling is managed at the application level with configurable pool sizes.

Redis (Upstash)

Used for job queuing (BullMQ), rate limiting, and ephemeral caching. Upstash Redis provides a serverless Redis instance with built-in persistence. The health endpoint monitors Redis memory usage, persistence status (AOF), and maxmemory policy (should be noeviction for queue data safety).

Hono API Server

The API layer is built on Hono, a lightweight web framework. Hono provides minimal overhead and runs on multiple runtimes (Node.js, Deno, Bun, Cloudflare Workers). The API is stateless with all state persisted in PostgreSQL and Redis, enabling horizontal scaling.

BullMQ Workers

Background workers process capture and learn jobs from Redis queues. Workers run as separate processes and can be scaled independently. Two queue types exist:

  capture queue - Processes individual and batch capture events. Persists to PostgreSQL, assigns salience scores.
  learn queue   - Runs compilation jobs. Analyzes events, extracts patterns, creates/updates artifacts.

Multi-Tenant Isolation

Every API request is authenticated and scoped to a tenant. Tenant isolation is enforced at multiple levels to prevent cross-tenant data access:

  • API Layer: Bearer token authentication resolves to a tenant ID. All subsequent operations are scoped to this tenant.
  • Database Layer: All queries include a tenant_id WHERE clause. There are no admin endpoints that bypass tenant scoping.
  • Queue Layer: Job payloads include the tenant ID, ensuring workers process events in the correct tenant context.
  • Search Layer: Semantic search is filtered by tenant ID before similarity matching, so results from other tenants are never returned.
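
To make the database- and search-layer rules above concrete, here is a sketch of a tenant-scoped similarity query against the semantic_memories table. The embedding and content column names are assumptions; <=> is pgvector's cosine-distance operator.

  import { Pool } from "pg";

  const pool = new Pool({ connectionString: process.env.DATABASE_URL });

  // The tenant filter sits in the WHERE clause, so rows from other tenants
  // are excluded before similarity ranking ever happens. Column names other
  // than tenant_id are illustrative.
  async function searchMemories(tenantId: string, queryEmbedding: number[], limit = 10) {
    const { rows } = await pool.query(
      `SELECT id, content, 1 - (embedding <=> $2::vector) AS similarity
         FROM semantic_memories
        WHERE tenant_id = $1
          AND is_active = true
        ORDER BY embedding <=> $2::vector
        LIMIT $3`,
      [tenantId, JSON.stringify(queryEmbedding), limit]
    );
    return rows;
  }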

Authentication Architecture

Hippocortex supports two authentication methods:

API Keys (Bearer Token)

Used for programmatic access. Keys are generated per-tenant with configurable permissions (read, write, admin). The raw key is shown once at creation. Server-side, keys are hashed with SHA-256 before storage. Keys come in two modes: hx_live_* for production and hx_test_* for development.
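
The verification path implied above can be sketched as re-hashing the presented key and comparing against the stored hash. The prefix convention and SHA-256 hashing come from this guide; the storage lookup is a placeholder.

  import { createHash } from "node:crypto";

  // Only the SHA-256 hash of a key is stored; the raw key is shown once at creation.
  function hashApiKey(rawKey: string): string {
    return createHash("sha256").update(rawKey).digest("hex");
  }

  // Placeholder for the real Postgres lookup against the api_keys table.
  declare function findKeyByHash(
    hash: string
  ): Promise<{ tenantId: string; permissions: string[] } | null>;

  // Verify an incoming key: check the prefix, re-hash, and look it up.
  async function verifyApiKey(rawKey: string) {
    if (!rawKey.startsWith("hx_live_") && !rawKey.startsWith("hx_test_")) {
      return null; // not a Hippocortex key
    }
    return findKeyByHash(hashApiKey(rawKey));
  }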

JWT (Dashboard)

Used for dashboard access. Access tokens (1 hour TTL) and refresh tokens (7 day TTL) are issued on login. Passwords are hashed with Argon2id. Rate limiting is applied per-email for login and registration attempts (5 attempts per 15 minutes).

Monitoring and Alerting

The system exposes comprehensive health and metrics endpoints. The health check reports subsystem status for PostgreSQL, Redis, and worker queues. Internal metrics collectors track capture throughput, synthesis latency, and queue backlog depth.

  Health endpoint    - GET /v1/health reports postgres, redis, worker, and alert status.
  Prometheus metrics - Internal Prometheus endpoint for scraping capture rates and latencies.
  Alert engine       - Configurable alerting for queue backlog, error rates, and system degradation.
  Backpressure       - Queue depth monitoring with automatic worker scaling signals.

End-to-End Data Flow

Complete Data Flow

  1. CAPTURE
     Agent SDK ---> API ---> Redis Queue ---> Worker ---> PostgreSQL
                    202                                   (events table)

  2. LEARN
     API ---> Redis Queue ---> Compiler Worker
              202              |
                               +--> Analyze events
                               +--> Extract patterns
                               +--> Create/update artifacts ---> PostgreSQL
                               |                                 (artifacts)
                               +--> Update semantic memories --> PostgreSQL
                                                                 (memories)

  3. SYNTHESIZE
     API ---> Semantic Search (pgvector) ---> Artifact Lookup
     200      |                               |
              +--> Rank by relevance          +--> Filter active
              +--> Token budget packing       +--> Merge into sections
              |
              +--> Return context pack
                   { sections[], totalTokens, contextPack }

  4. USE
     Agent receives context pack
     +--> Inject into system prompt
     +--> LLM generates response with memory context
     +--> Capture response event (back to step 1)