Metadata-Version: 2.4
Name: cogito-ergo
Version: 0.3.0
Summary: Memory retrieval for AI agents — integer-pointer fidelity guarantee, 93.4% R@1 on LongMemEval_S (hybrid tier). By Hermes Labs.
Project-URL: Homepage, https://hermes-labs.ai
Project-URL: Repository, https://github.com/roli-lpci/cogito-ergo
Project-URL: Documentation, https://github.com/roli-lpci/cogito-ergo#readme
Project-URL: Issues, https://github.com/roli-lpci/cogito-ergo/issues
Project-URL: Changelog, https://github.com/roli-lpci/cogito-ergo/blob/main/CHANGELOG.md
Author-email: Hermes Labs <roli@hermes-labs.ai>
License: MIT
License-File: LICENSE
Keywords: agent,agents,ai,chromadb,fidelity,integer-pointer,llm,mem0,memory,ollama,rag,retrieval,vector-store,zer0dex
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python :: 3
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.10
Requires-Dist: chromadb>=0.5.0
Requires-Dist: mem0ai>=1.0.5
Provides-Extra: dev
Requires-Dist: httpx; extra == 'dev'
Requires-Dist: pytest; extra == 'dev'
Requires-Dist: pytest-asyncio; extra == 'dev'
Requires-Dist: ruff; extra == 'dev'
Provides-Extra: hybrid
Requires-Dist: bm25s>=0.2; extra == 'hybrid'
Provides-Extra: ollama
Description-Content-Type: text/markdown

# cogito-ergo

Memory retrieval for AI agents with structural fidelity. Two purpose-built paths:
the existing `/recall` scores **85% R@1** on cogito's internal 31-case atomic-fact
eval; the new `/recall_hybrid` path scores **93.4% R@1** on LongMemEval_S
(a session-retrieval benchmark; see [Hybrid recall](#hybrid-recall-9334-r1-on-longmemeval_s)).
Different workloads, different tools. The filter LLM outputs only integers — it
structurally cannot corrupt, rephrase, or hallucinate into the content returned
to your agent.

[![PyPI version](https://img.shields.io/pypi/v/cogito-ergo)](https://pypi.org/project/cogito-ergo/)
[![Python 3.10+](https://img.shields.io/badge/python-3.10%2B-blue)](https://pypi.org/project/cogito-ergo/)
[![License: MIT](https://img.shields.io/badge/license-MIT-green)](LICENSE)
[![Made by Hermes Labs](https://img.shields.io/badge/made%20by-Hermes%20Labs-purple)](https://hermes-labs.ai)

---

## The Problem

Every retrieval system that uses an LLM to select or rank memories has the same failure mode: the LLM rephrases on the way out. You store `"auth tokens expire after 3600 seconds"` and get back `"authentication has a configurable timeout."` The specific fact is gone.

- **Raw vector search** returns candidates by similarity, but precision plateaus at 50–60% R@1 on real workloads
- **LLM-based re-rankers** improve relevance but generate text — they summarize, merge, or hallucinate into the content your agent receives
- **Full RAG pipelines** add latency and cost without solving the fidelity problem

cogito-ergo fixes this structurally. The filter LLM outputs only integer pointers (`[3, 7, 12]`). The server dereferences them to verbatim stored text. The LLM never sees, generates, or touches memory content. Fidelity is architectural, not a prompting convention.

| Mode | R@1 | hit@any | Latency |
|---|---|---|---|
| **Combined (snapshot + recall)** | **85%** | **96%** | 1303ms |
| recall only | 63% | 81% | 1197ms |
| recall_b (zero-LLM) | 56% | 96% | 127ms |

31 test cases, qwen3.5:2b filter model, fully local. $0/month.

---

## Architecture

```
                    ┌─────────────────────────────────────────────┐
                    │              cogito-ergo server              │
                    │                 :19420                       │
                    └──────────────────┬──────────────────────────┘
                                       │
                    ┌──────────────────▼──────────────────────────┐
                    │             SNAPSHOT LAYER                   │
                    │   Compressed markdown index (~741 tokens)    │
                    │   Built once from corpus via `cogito snapshot`│
                    │   Returned with /recall — no vector search   │
                    │   Solves cross-reference queries (0%→50% R@1)│
                    └──────────────────┬──────────────────────────┘
                                       │
                    ┌──────────────────▼──────────────────────────┐
                    │         STAGE 1 — recall_b (zero-LLM)       │
                    │   Query decomposition → sub-queries          │
                    │   Stop-word stripping + bigrams + trigrams   │
                    │   Vocab expansion (cogito calibrate)         │
                    │   Up to 8 sub-queries, merged with RRF       │
                    │   Latency: ~127ms                            │
                    └──────────────────┬──────────────────────────┘
                                       │ up to 100 candidates
                    ┌──────────────────▼──────────────────────────┐
                    │       STAGE 2 — integer-pointer filter       │
                    │   Filter LLM sees: [1] text  [2] text ...   │
                    │   Filter LLM outputs: [3, 7, 12]  ← ONLY    │
                    │   Server fetches candidates[3], [7], [12]   │
                    │   Returns: verbatim stored text              │
                    │   Added latency: ~1176ms                     │
                    └─────────────────────────────────────────────┘
```

The filter LLM never generates memory text. Out-of-range integers are silently ignored. Fidelity is a structural property of the pipeline, not a prompting convention.

---

## Benchmarks

Measured 2026-03-28. 31 test cases, qwen3.5:2b as filter model (local Ollama).

| Mode | R@1 | hit@any | MRR | Latency |
|---|---|---|---|---|
| Combined (snapshot + recall) | **85%** | **96%** | **0.878** | 1303ms |
| recall only | 63% | 81% | — | 1197ms |
| recall_b (zero-LLM) | 56% | 96% | — | 127ms |
| snapshot only | 41% | — | — | — |

Key results:
- Snapshot layer contributes **+15% hit@any** vs recall-only
- Cross-reference queries: recall alone gets **0% R@1**; combined gets **50%**
- recall_b matches combined hit@any (96%) at 10x lower latency — use it when cost matters

---

## Quick Start

**1. Install**

```bash
pip install cogito-ergo
```

**2. Pull Ollama models**

```bash
ollama pull mistral:7b
ollama pull nomic-embed-text
```

Requires a running [Ollama](https://ollama.ai) instance.

**3. Configure filter LLM**

```bash
# Option A: direct Anthropic key
export ANTHROPIC_API_KEY=sk-ant-...

# Option B: any OpenAI-compatible gateway (local LM Studio, OpenClaw, etc.)
export COGITO_FILTER_ENDPOINT=http://your-gateway/
export COGITO_FILTER_TOKEN=your-token
export COGITO_FILTER_MODEL=anthropic/claude-haiku-4-5
```

Or via `.cogito.json` in your working directory:

```json
{
  "filter_endpoint": "http://your-gateway/",
  "filter_token": "your-token",
  "filter_model": "anthropic/claude-haiku-4-5"
}
```

**4. Start the server**

```bash
cogito-server
# or: cogito server
```

**5. First recall**

```bash
cogito recall "what did we decide about the auth architecture"
```

```bash
curl -X POST http://127.0.0.1:19420/recall \
  -H "Content-Type: application/json" \
  -d '{"text": "auth architecture decisions"}'
```
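
The same call from Python, standard library only; the `memories` and `method` fields follow the `/recall` contract documented under [HTTP API](#http-api):

```python
import json
import urllib.request

req = urllib.request.Request(
    "http://127.0.0.1:19420/recall",
    data=json.dumps({"text": "auth architecture decisions"}).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    body = json.loads(resp.read())

print(body["method"])      # "filter" on success, "fallback_*" on degradation
for m in body["memories"]:
    print(m["text"])       # verbatim stored text, never rephrased
```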

---

## Hybrid recall (93.4% R@1 on LongMemEval_S)

`recall_hybrid` is an opt-in retrieval path that ports the architecture
benchmarked at **93.4% R@1 on LongMemEval_S** (up from a 56% mem0 baseline). It
keeps the same integer-pointer fidelity contract as `/recall`: only
indices cross the LLM boundary.

```
Query
  │
  ▼
┌───────────────────────────────────────────────────────────────┐
│ Stage 1 — hybrid retrieval (zero-LLM)                          │
│   • sub-query decomposition + vocab expansion (recall_b logic) │
│   • dense retrieval with nomic search_query: / search_document:│
│     prefixes                                                    │
│   • BM25 over the candidate pool (bm25s, optional extra)       │
│   • Reciprocal Rank Fusion across runs                         │
│   • cosine-blended rerank against the original query           │
└───────────────────────────────────────────────────────────────┘
  │
  ▼
┌───────────────────────────────────────────────────────────────┐
│ Router (regex classifier)                                      │
│   • "you told me" / "you said"  → skip (keep Stage 1)          │
│   • "how many" / "what date"    → call cheap filter            │
│   • everything else             → keep Stage 1 at filter tier  │
└───────────────────────────────────────────────────────────────┘
  │
  ▼
┌───────────────────────────────────────────────────────────────┐
│ Stage 2 — cheap filter  (tier="filter", default)               │
│   500-char snippets, integer-pointer output                    │
└───────────────────────────────────────────────────────────────┘
  │
  ▼
┌───────────────────────────────────────────────────────────────┐
│ Stage 3 — flagship rerank  (tier="flagship", opt-in)           │
│   2000-char snippets, stronger model, integer output           │
│   Called when Stage 1 confidence is low or filter failed       │
└───────────────────────────────────────────────────────────────┘
```
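
The fusion and rerank steps in Stage 1 are standard techniques. A minimal sketch, assuming the common `k=60` RRF constant and a simple max-normalization (cogito's internals may differ), with the blend weight matching `COGITO_HYBRID_COSINE_WEIGHT`:

```python
def rrf(runs: list[list[str]], k: int = 60) -> dict[str, float]:
    """Reciprocal Rank Fusion: each run is a ranked list of candidate ids."""
    scores: dict[str, float] = {}
    for run in runs:
        for rank, doc in enumerate(run, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return scores


def blend(rrf_scores: dict[str, float], cosine: dict[str, float], w: float = 0.7) -> dict[str, float]:
    """Blend per-candidate cosine similarity (vs the original query) with RRF evidence."""
    max_rrf = max(rrf_scores.values(), default=1.0)
    return {
        doc: w * cosine.get(doc, 0.0) + (1.0 - w) * (score / max_rrf)
        for doc, score in rrf_scores.items()
    }
```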

### Tier tradeoffs

| Tier | Latency | R@1 (LongMemEval_S) | External calls | When to use |
|---|---|---|---|---|
| `zero_llm` | ~500ms | — | none | latency-sensitive paths, cost-sensitive ops |
| `filter` | ~1300ms | 90%+ | cheap filter LLM | default; temporal/counting queries benefit |
| `flagship` | ~3500ms | **93.4%** | filter + flagship model | hard queries, long sessions, benchmark setup |
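
The router box in the diagram above can be approximated with a few patterns. The regexes here are illustrative, lifted from the diagram's examples rather than the shipped classifier:

```python
import re

def route(query: str) -> str:
    """Pick an escalation path for a query, mirroring the router diagram."""
    q = query.lower()
    if re.search(r"\byou (told me|said)\b", q):
        return "keep_stage1"    # attribution queries: Stage 1 order already wins
    if re.search(r"\bhow many\b|\bwhat date\b", q):
        return "cheap_filter"   # counting/temporal queries benefit from the filter
    return "filter_tier"        # default: stay at the filter tier
```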

### Quick start

```bash
# Optional dependency for better BM25 fusion (a zero-dependency fallback is used if absent)
pip install "cogito-ergo[hybrid]"

# Opt-in: set a filter endpoint (any OpenAI-compatible API)
export COGITO_FILTER_ENDPOINT=https://dashscope-intl.aliyuncs.com/compatible-mode/v1
export COGITO_FILTER_TOKEN=sk-your-key
export COGITO_FILTER_MODEL=qwen-turbo

# Optional: flagship tier (stronger model, 4x larger context window)
export COGITO_FLAGSHIP_ENDPOINT=https://dashscope-intl.aliyuncs.com/compatible-mode/v1
export COGITO_FLAGSHIP_TOKEN=sk-your-key
export COGITO_FLAGSHIP_MODEL=qwen-max
# Or simpler — if DASHSCOPE_API_KEY is set, the flagship tier auto-configures
export DASHSCOPE_API_KEY=sk-your-key

cogito recall-hybrid "auth architecture decisions" --tier filter
```

HTTP:

```bash
curl -X POST http://127.0.0.1:19420/recall_hybrid \
  -H "Content-Type: application/json" \
  -d '{"text": "how many auth migrations have we done", "tier": "flagship", "limit": 10}'
```

Python:

```python
from cogito.recall_hybrid import recall_hybrid
from cogito.config import load, mem0_config
from mem0 import Memory

cfg = load()
mem = Memory.from_config(mem0_config(cfg))
hits, method = recall_hybrid(mem, "auth tokens", user_id="agent", cfg=cfg, tier="filter")
```

Graceful degradation: if the filter endpoint isn't configured, `tier="filter"`
falls back to `zero_llm`. If the flagship endpoint isn't configured,
`tier="flagship"` falls back to `filter` (which itself may degrade to
`zero_llm`). Nothing raises on missing credentials.
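
Spelled out as code, the fallback chain is just two checks (a sketch; the config keys are the ones from the table in [Configuration](#configuration)):

```python
def effective_tier(cfg: dict, requested: str) -> str:
    """Downgrade silently when credentials are missing; never raise."""
    if requested == "flagship" and not cfg.get("flagship_endpoint"):
        requested = "filter"
    if requested == "filter" and not cfg.get("filter_endpoint"):
        requested = "zero_llm"
    return requested
```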

### Regression notice

The hybrid path was validated at **93.4% R@1 on LongMemEval_S** (multi-turn
dialog retrieval with turn-level chunking and session-date scaffolds).
**On the internal 31-case eval** (which measures keyword-recall over a mem0
store of short atomic memories), `/recall_hybrid` scores lower on R@1 than
the existing `/recall` path — they solve slightly different problems. The
hybrid path wins on hit@any, semantic-gap queries, and multi-memory
aggregation; the existing path wins on prefix-style direct lookup. Default
behavior is unchanged: `/recall` still drives `cogito recall`. Use
`/recall_hybrid` or `cogito recall-hybrid` when you need the hybrid path's strengths.

---


## Claude Code session memory

cogito-ergo v0.3.0 can ingest your Claude Code sessions and query them with the
same turn-pair chunking used in the LongMemEval benchmark.

**Why it matters:** The 93.4% R@1 benchmark result requires role-structured session
data (user+assistant turn pairs). Flat atomic memories don't have this structure.
Claude Code stores every session as role-structured JSONL — wiring it to cogito
uses the same retrieval architecture as the benchmark, though Claude Code sessions
have not been independently evaluated at the same scale.
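
For illustration, turn-pair chunking over role-structured JSONL looks roughly like this (the `role`/`content` field names are assumptions; the real logic lives in `cogito.ingest_claude_sessions`):

```python
import json

def turn_pairs(jsonl_path: str) -> list[str]:
    """Group each user message with the assistant reply into one chunk."""
    chunks, pending_user = [], None
    with open(jsonl_path, encoding="utf-8") as f:
        for line in f:
            msg = json.loads(line)
            if msg.get("role") == "user":
                pending_user = msg.get("content", "")
            elif msg.get("role") == "assistant" and pending_user is not None:
                chunks.append(f"USER: {pending_user}\nASSISTANT: {msg.get('content', '')}")
                pending_user = None
    return chunks
```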

**Install:**
```bash
# No new dependencies — uses existing chromadb + Ollama (nomic-embed-text)
```

**Ingest:**
```bash
# Preview (dry run)
python3 -m cogito.ingest_claude_sessions --since 2026-04-11 --dry-run

# Ingest last 7 days
python3 -m cogito.ingest_claude_sessions --since 2026-04-11

# Ingest all sessions (may take a few minutes)
python3 -m cogito.ingest_claude_sessions
```

**Query:**
```python
from cogito.recall_sessions import query_sessions, query_both

# Session history only
results = query_sessions("what did we discuss about LPCI last week?", top_k=3)

# Atomic facts + session history side-by-side (no auto-merge)
both = query_both("cogito architecture decisions")
```

**MCP tools (new in v0.3.0):**
- `cogito_recall_sessions` — query Claude Code sessions
- `cogito_recall_both` — 3 atomic + 3 session results side-by-side

**Expected accuracy:** ~80-93% depending on query specificity (same pipeline as
the benchmark; accuracy drops when querying across very long sessions because the
session embedding averages across all turn content).

**Privacy:** All data stays local. Embedding uses Ollama (localhost:11434, nomic-embed-text).
ChromaDB at `~/.cogito/store`. Nothing sent to any cloud API. Ingestion is explicit —
nothing runs automatically. Re-running ingest is idempotent (dedup via content hash).
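
The idempotency guarantee falls out of deriving each record's id from its content (a sketch of the idea, not necessarily the exact scheme):

```python
import hashlib

def chunk_id(text: str) -> str:
    """Same content -> same id, so re-ingesting the same chunk is a no-op."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()
```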

Full demo: [`docs/claude-code-memory-demo.md`](docs/claude-code-memory-demo.md)

---
## HTTP API

All endpoints return JSON. Server runs on port `19420` by default.

### `GET /health`

```json
{"status": "ok", "count": 1484, "version": "0.2.0", "calibrated": true, "snapshot": true}
```

Fields: `count` = total memories in store; `calibrated` = vocab_map present; `snapshot` = snapshot.md exists.

---

### `GET /snapshot`

Returns the compressed index markdown built by `cogito snapshot`.

```json
{"snapshot": "## Projects\n- **cogito-ergo** — ...", "path": "/home/user/.cogito/snapshot.md"}
```

Returns 404 if no snapshot has been built yet.

---

### `POST /query`

Narrow vector search. L2 threshold filter only. No LLM call. Fast.

Request:
```json
{"text": "query string", "limit": 5}
```

Response:
```json
{"memories": [{"text": "...", "score": 93.4}]}
```

---

### `POST /recall`

Broad search + integer-pointer filter. Two stages: zero-LLM RRF candidate pool, then cheap LLM selects by index.

Request:
```json
{"text": "query string", "limit": 50, "threshold": 400}
```

Response:
```json
{"memories": [{"text": "...", "score": 93.4}], "method": "filter"}
```

`method` field: `"filter"` = filter ran successfully; `"fallback_*"` = graceful degradation, all candidates returned instead. Possible fallbacks: `fallback_no_endpoint`, `fallback_unreachable`, `fallback_parse_error`, `fallback_error`.
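
A client that wants to know whether the fidelity filter actually ran can branch on `method` (hypothetical helper, not part of the package):

```python
def handle_recall(response: dict) -> list[dict]:
    method = response.get("method", "")
    if method.startswith("fallback_"):
        # degraded: the filter did not run, so all candidates came back unfiltered
        print(f"warning: recall degraded ({method})")
    return response["memories"]
```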

---

### `POST /recall_b`

Zero-LLM recall only. Sub-query decomposition + RRF. 127ms latency. Same hit@any as combined (96%) at lower cost.

Request:
```json
{"text": "query string", "limit": 50}
```

Response:
```json
{"memories": [{"text": "...", "score": 0.016}], "method": "decompose_4_v"}
```

`method` field: `"decompose_N"` = N sub-queries ran; `"decompose_N_v"` = vocab expansion applied.

---

### `POST /recall_hybrid`

Hybrid BM25 + dense + RRF retrieval with tiered LLM escalation. Port of
the architecture that reached **93.4% R@1 on LongMemEval_S**. See
[Hybrid recall](#hybrid-recall-9334-r1-on-longmemeval_s) for the full
diagram and tradeoffs.

Request:
```json
{"text": "query string", "limit": 50, "tier": "filter", "top_k": 5}
```

`tier` is one of `"zero_llm"`, `"filter"` (default), `"flagship"`.
`top_k` is how many candidates the reranker sees (default 5).

Response:
```json
{"memories": [{"text": "...", "score": 0.72}], "method": "hybrid_12_bm25|filter"}
```

`method` field encodes the path taken (e.g. `"hybrid_24_bm25|default_s1"` =
Stage 1 order kept; `"hybrid_12_bm25|filter"` = cheap filter reranked;
`"hybrid_8_bm25|filter|flagship"` = both reranks ran; `"hybrid_6_nobm25_v|…"`
= bm25s not installed, vocab expansion active).
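
Since stages are joined with `|`, decoding the path is a one-liner (hypothetical helper):

```python
def parse_hybrid_method(method: str) -> tuple[str, list[str]]:
    """'hybrid_8_bm25|filter|flagship' -> ('hybrid_8_bm25', ['filter', 'flagship'])"""
    stage1, *reranks = method.split("|")
    return stage1, reranks
```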

---

### `POST /store`

Write one memory verbatim. No extraction LLM. Agent decides the content.

This is the preferred write path for agent-curated content.

Request:
```json
{"text": "Switched from JWT to session tokens on 2026-03-27 due to compliance requirement", "id": "<optional uuid>"}
```

Response:
```json
{"id": "abc123...", "text": "Switched from JWT to session tokens..."}
```

---

### `POST /add`

Feed text through mem0's extraction LLM before storing. Extracts multiple atomic facts from unstructured input.

Use when you have raw/unstructured text and want automatic fact extraction.

Request:
```json
{"text": "free-form text to remember"}
```

Response:
```json
{"count": 3, "memories": ["extracted fact 1", "extracted fact 2", "extracted fact 3"]}
```

---

## CLI Reference

All CLI commands talk to the running HTTP server.

| Command | Description |
|---|---|
| `cogito recall "query"` | Two-stage recall via running server |
| `cogito recall "query" --limit 50 --raw` | Raw JSON output |
| `cogito recall-hybrid "query" --tier filter` | Hybrid BM25+dense+RRF recall (93.4% R@1 arch) |
| `cogito recall-hybrid "query" --tier flagship --top-k 5` | + flagship escalation on hard queries |
| `cogito query "query"` | Simple vector query, no filter |
| `cogito add "text"` | Add a memory via /add (mem0 extraction) |
| `cogito seed ~/notes/` | Bulk-seed from markdown files via /store |
| `cogito seed ~/notes/ --add` | Bulk-seed using /add (extraction mode) |
| `cogito seed ~/notes/ --dry-run` | Preview without writing |
| `cogito seed ~/notes/ --glob "*.txt"` | Custom file pattern |
| `cogito snapshot` | Build compressed index layer |
| `cogito snapshot --rebuild` | Force rebuild of snapshot |
| `cogito snapshot --dry-run` | Preview snapshot without writing |
| `cogito calibrate` | Build vocab bridge from corpus (one-time) |
| `cogito calibrate --dry-run` | Preview vocab mappings |
| `cogito health` | Check server status |
| `cogito server` | Start the server (alias for cogito-server) |
| `cogito-server --port 19420` | Start server directly |
| `cogito-server --config /path/to.json` | Start with explicit config file |

---

## Configuration

Priority: env vars > `.cogito.json` > defaults.

Config file is searched at `./.cogito.json` (cwd) then `~/.cogito/config.json`.
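
Resolution of a single key therefore works like this (an illustrative sketch; `load()` in `cogito.config` is the real implementation):

```python
import json
import os
from pathlib import Path

def resolve(key: str, env_var: str, default):
    """Env var beats config file beats default, per the table below."""
    if env_var in os.environ:
        return os.environ[env_var]
    for path in (Path(".cogito.json"), Path.home() / ".cogito" / "config.json"):
        if path.exists():
            cfg = json.loads(path.read_text())
            if key in cfg:
                return cfg[key]
    return default

port = resolve("port", "COGITO_PORT", 19420)
```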

| Env var | Config key | Default | Description |
|---|---|---|---|
| `COGITO_PORT` | `port` | `19420` | Server port |
| `COGITO_USER_ID` | `user_id` | `"agent"` | Memory namespace (isolates stores) |
| `COGITO_FILTER_ENDPOINT` | `filter_endpoint` | — | OpenAI-compatible base URL for filter LLM |
| `COGITO_FILTER_TOKEN` | `filter_token` | — | Bearer token for filter endpoint |
| `COGITO_FILTER_MODEL` | `filter_model` | `anthropic/claude-haiku-4-5` | Filter LLM model name |
| `COGITO_FILTER_TIMEOUT_MS` | `filter_timeout_ms` | `12000` | Filter LLM timeout in ms |
| `ANTHROPIC_API_KEY` | `anthropic_api_key` | — | Direct Anthropic key (alternative to endpoint+token) |
| `COGITO_STORE_PATH` | `store_path` | `~/.cogito/store` | ChromaDB persistence path |
| `COGITO_COLLECTION` | `collection` | `cogito_memory` | ChromaDB collection name |
| `COGITO_OLLAMA_URL` | `ollama_url` | `http://localhost:11434` | Ollama base URL |
| `COGITO_LLM_MODEL` | `llm_model` | `mistral:7b` | LLM for fact extraction (/add) |
| `COGITO_EMBED_MODEL` | `embed_model` | `nomic-embed-text` | Embedding model |
| `COGITO_RECALL_LIMIT` | `recall_limit` | `50` | Candidate pool size for /recall and /recall_b |
| `COGITO_RECALL_THRESHOLD` | `recall_threshold` | `400.0` | L2 cutoff for /recall candidates |
| `COGITO_QUERY_THRESHOLD` | `query_threshold` | `250.0` | L2 cutoff for /query results |
| `COGITO_FLAGSHIP_ENDPOINT` | `flagship_endpoint` | — | OpenAI-compatible base URL for flagship rerank (recall_hybrid tier="flagship") |
| `COGITO_FLAGSHIP_TOKEN` | `flagship_token` | — | Bearer token for flagship endpoint |
| `COGITO_FLAGSHIP_MODEL` | `flagship_model` | — | Flagship model name (e.g. `qwen-max`) |
| `COGITO_FLAGSHIP_TIMEOUT_MS` | `flagship_timeout_ms` | `30000` | Flagship LLM timeout in ms |
| `DASHSCOPE_API_KEY` | — | — | If set, recall_hybrid auto-configures flagship to DashScope qwen-max |
| `COGITO_HYBRID_COSINE_WEIGHT` | `hybrid_cosine_weight` | `0.7` | Cosine vs RRF blend weight for hybrid retrieval (0..1) |

`filter_endpoint` accepts any OpenAI-compatible API: Anthropic gateway, LM Studio, Ollama's `/v1` compat layer, OpenClaw, etc.

For Ollama qwen3/qwen3.5 models used as filter, cogito automatically switches to the native Ollama `/api/chat` endpoint with `think: false` to suppress thinking mode.
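
For reference, a native Ollama chat request with thinking suppressed has this shape (a sketch; Ollama's `/api/chat` accepts a `think` flag for thinking-capable models, and the model tag below is illustrative):

```python
import json
import urllib.request

payload = {
    "model": "qwen3:4b",  # illustrative tag; use whichever qwen3 model you pulled
    "messages": [{"role": "user", "content": "Output ONLY a JSON array of integers."}],
    "think": False,       # suppress thinking mode
    "stream": False,
}
req = urllib.request.Request(
    "http://localhost:11434/api/chat",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["message"]["content"])
```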

---

## Python API

```python
from cogito.recall import recall
from cogito.recall_b import recall_b
from cogito.recall_hybrid import recall_hybrid
from cogito.config import load, mem0_config
from mem0 import Memory

cfg = load()  # reads .cogito.json + env vars
memory = Memory.from_config(mem0_config(cfg))

# Two-stage recall (recommended default)
memories, method = recall(memory, "auth architecture", user_id=cfg["user_id"], cfg=cfg)
for m in memories:
    print(m["text"])  # verbatim stored text, never rephrased

# Zero-LLM recall (fast path)
memories, method = recall_b(memory, "auth architecture", user_id=cfg["user_id"], cfg=cfg)

# Hybrid recall (BM25 + dense + RRF + tiered LLM; 93.4% R@1 on LongMemEval_S)
memories, method = recall_hybrid(
    memory, "auth architecture", user_id=cfg["user_id"], cfg=cfg,
    tier="filter",  # "zero_llm" | "filter" | "flagship"
)

print(method)  # e.g. "filter", "decompose_4_v", "hybrid_12_bm25|filter"
```

---

## Why integers?

When the filter LLM outputs only `[3, 7, 12]`:

- It cannot rephrase memory text — it never generates it
- It cannot hallucinate new facts into the output
- It cannot summarize two memories into one
- An out-of-range integer is silently ignored; it cannot inject noise

Compare to asking the LLM to "return the relevant passages" — even with careful prompting, LLMs will reword, compress, or merge content. The integer-pointer pattern makes fidelity a structural property of the pipeline, not a prompt engineering goal.

The filter prompt:
```
Output ONLY a JSON array of integers, ordered from most to least relevant.
Examples: [1, 4, 7]   or   []
```

The server then picks `candidates[i]` — verbatim stored text — for each valid integer `i`. The path from storage to retrieval contains no text generation.
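
The whole contract fits in a few lines (a sketch; the shipped server adds timeouts and the fallback modes listed under [`POST /recall`](#post-recall)):

```python
import json

def dereference(candidates: list[str], llm_output: str) -> list[str]:
    """Map the filter LLM's integers back to verbatim stored text."""
    try:
        indices = json.loads(llm_output)   # e.g. "[1, 4, 7]"
    except json.JSONDecodeError:
        return candidates                  # parse failure -> graceful fallback
    return [
        candidates[i - 1]                  # prompt numbers candidates from [1]
        for i in indices
        if isinstance(i, int) and 1 <= i <= len(candidates)  # out-of-range ignored
    ]
```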

---

## Setup Recommendations

**Before benchmarking or deploying — run zer0lint first.**

Ingestion quality directly limits retrieval quality. A poorly formatted memory store will underperform regardless of retrieval method. The technical extraction prompt baked into cogito's default config was validated with zer0lint diagnostics, which took ingestion quality from 0% to 100% on those checks.

**Session start pattern (agent integration):**

```python
# 1. Load snapshot into context once at session start
import urllib.request, json
resp = urllib.request.urlopen("http://127.0.0.1:19420/snapshot")
snapshot = json.loads(resp.read())["snapshot"]
# inject snapshot into system prompt or first user message

# 2. Query per-message via /recall
# 3. Write new facts via /store (agent-curated) or /add (extraction)
```

**Calibrate for domain-specific vocabulary:**

```bash
cogito calibrate  # reads your corpus, writes vocab_map to .cogito.json
# then restart server to pick up new vocab_map
```

Calibration builds a plain-English → technical term bridge. Example: "how fast" → ["latency", "throughput", "ms"]. Improves recall_b on domain-specific queries without adding LLM calls.
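
After calibration, the bridge lives in `.cogito.json` as plain data, shaped roughly like this (the second entry is invented for illustration):

```json
{
  "vocab_map": {
    "how fast": ["latency", "throughput", "ms"],
    "auth": ["jwt", "session token", "oauth"]
  }
}
```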

---

## Built by Hermes Labs

cogito-ergo is part of the [Hermes Labs](https://hermes-labs.ai) AI agent tooling suite:

- **[zer0lint](https://github.com/roli-lpci/zer0lint)** — Memory extraction diagnostics. Run before benchmarking to verify store quality. The technical extraction prompt in cogito's default config was validated against zer0lint.
- **[zer0dex](https://github.com/roli-lpci/zer0dex)** — Dual-layer memory architecture pattern that cogito-ergo implements.
- **[lintlang](https://github.com/roli-lpci/lintlang)** — Static linter for AI agent tool descriptions and prompts.
- **[Little Canary](https://github.com/roli-lpci/little-canary)** — Prompt injection detection.
- **[Suy Sideguy](https://github.com/roli-lpci/suy-sideguy)** — Runtime policy enforcement for agents.
- **cogito-ergo** — Two-stage memory retrieval ← you are here

---

## Roadmap

- [ ] Pluggable vector backends (pgvector, Qdrant, LlamaIndex)
- [ ] Pluggable extraction backends (non-Ollama)
- [ ] Session flush utility (end-of-session seeding)
- [ ] Benchmark harness as public CLI (`cogito bench`)
- [ ] Streaming /recall response

---

## License

MIT
