Metadata-Version: 2.1
Name: gleanr
Version: 0.5.0
Summary: Gleanr - Session-scoped memory layer for AI agents
Project-URL: Homepage, https://github.com/Saket-Kr/gleanr
Project-URL: Documentation, https://github.com/Saket-Kr/gleanr#readme
Project-URL: Repository, https://github.com/Saket-Kr/gleanr
Project-URL: Changelog, https://github.com/Saket-Kr/gleanr/blob/main/CHANGELOG.md
Project-URL: Issues, https://github.com/Saket-Kr/gleanr/issues
Author: Saket Kumar
License: MIT
License-File: LICENSE
Keywords: agents,ai,context,llm,memory
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Typing :: Typed
Requires-Python: >=3.11
Requires-Dist: pydantic>=2.0
Provides-Extra: all
Requires-Dist: gleanr[anthropic,chroma,http,openai,postgres,sqlite,tiktoken]; extra == 'all'
Provides-Extra: anthropic
Requires-Dist: anthropic>=0.20; extra == 'anthropic'
Provides-Extra: chroma
Requires-Dist: chromadb>=0.4.0; extra == 'chroma'
Provides-Extra: dev
Requires-Dist: httpx>=0.25.0; extra == 'dev'
Requires-Dist: mypy>=1.6.0; extra == 'dev'
Requires-Dist: pytest-asyncio>=0.21.0; extra == 'dev'
Requires-Dist: pytest-cov>=4.1.0; extra == 'dev'
Requires-Dist: pytest>=7.4.0; extra == 'dev'
Requires-Dist: ruff>=0.1.0; extra == 'dev'
Provides-Extra: examples
Requires-Dist: aiosqlite>=0.19.0; extra == 'examples'
Requires-Dist: httpx>=0.25.0; extra == 'examples'
Requires-Dist: python-dotenv>=1.0; extra == 'examples'
Requires-Dist: rich>=13.0; extra == 'examples'
Provides-Extra: http
Requires-Dist: httpx>=0.25.0; extra == 'http'
Provides-Extra: openai
Requires-Dist: openai>=1.0; extra == 'openai'
Provides-Extra: postgres
Requires-Dist: asyncpg>=0.29.0; extra == 'postgres'
Requires-Dist: pgvector>=0.2.0; extra == 'postgres'
Provides-Extra: sqlite
Requires-Dist: aiosqlite>=0.19.0; extra == 'sqlite'
Provides-Extra: tiktoken
Requires-Dist: tiktoken>=0.5.0; extra == 'tiktoken'
Description-Content-Type: text/markdown

# Gleanr — Agent Context Management System

**Session-scoped memory for AI agents that actually remembers.**

Gleanr is a Python SDK that gives your AI agents persistent, structured memory across conversations. Unlike RAG systems that retrieve external knowledge, Gleanr manages the agent's *internal state*—what it decided, what constraints it discovered, what failed, and what the user prefers.

```python
from gleanr import Gleanr

# Initialize with your session
gleanr = Gleanr(session_id="user_123", storage=storage, embedder=embedder)
await gleanr.initialize()

# Ingest conversation turns
await gleanr.ingest("user", "Let's use PostgreSQL for the database")
await gleanr.ingest("assistant", "Decision: We'll use PostgreSQL for its robust JSON support")

# Recall relevant context (token-budgeted)
context = await gleanr.recall("What database are we using?", token_budget=2000)
# Returns: [ContextItem(content="Decision: We'll use PostgreSQL...", markers=["decision"], ...)]
```

## Why Gleanr?

Current LLM applications treat agent memory as a search problem. But **agent memory is not knowledge retrieval**:

| Aspect | Knowledge Retrieval (RAG) | Agent Memory (Gleanr) |
|--------|--------------------------|---------------------|
| Scope | External corpus | Internal session state |
| Lifespan | Persistent | Session-bound with decay |
| Trigger | Explicit queries | Every turn, automatically |
| Content | Documents, facts | Decisions, constraints, outcomes |

After 30-40 turns, agents without proper memory:
- Forget decisions made earlier ("Didn't we decide to use PostgreSQL?")
- Repeat failed approaches
- Lose track of user preferences
- Contradict themselves

Gleanr solves this by automatically tracking what matters and recalling it when relevant.

## Key Features

- **Automatic marker detection** — Identifies decisions, constraints, failures, and goals in conversation
- **Token-budgeted recall** — Always fits in your context window
- **Episode management** — Groups related turns, triggers reflection on close
- **LLM reflection with consolidation** — Extracts durable facts from episodes and keeps them accurate as requirements evolve
- **Staleness management** — Consolidation detects changes first; facts describe current state, never carry stale references. Old versions are superseded (not deleted), maintaining an audit trail
- **Two-level deduplication** — Store-level dedup supersedes paraphrases after reflection; recall-time dedup filters near-duplicates before budget allocation
- **Contradiction detection** — Consolidation prompt identifies changes first and resolves conflicting facts
- **Observability** — Built-in reflection tracing for debugging and monitoring
- **Pluggable storage** — SQLite for persistence, in-memory for testing
- **Provider agnostic** — Works with OpenAI, Anthropic, Ollama, or any embedder
- **Evaluation harness** — Automated testing across 6 scenarios with latency profiling

## Installation

```bash
# Core package (in-memory storage, no provider dependencies)
pip install gleanr

# With specific extras
pip install "gleanr[sqlite]"         # SQLite storage backend
pip install "gleanr[openai]"         # OpenAI provider
pip install "gleanr[anthropic]"      # Anthropic provider
pip install "gleanr[all]"            # All optional dependencies
```

For development:

```bash
git clone https://github.com/Saket-Kr/gleanr.git
cd gleanr
pip install -e ".[dev]"
```

## Quick Start

### 1. Basic Usage

```python
import asyncio
from gleanr import Gleanr
from gleanr.storage import InMemoryBackend

async def main():
    gleanr = Gleanr(
        session_id="demo",
        storage=InMemoryBackend(),
        embedder=your_embedder,    # See Providers section
        reflector=your_reflector,  # LLM-based fact extraction
    )
    await gleanr.initialize()

    # Ingest conversation turns
    await gleanr.ingest("user", "I need help building a REST API")
    await gleanr.ingest("assistant", "Decision: We'll use FastAPI for its automatic OpenAPI docs.")

    # Recall relevant context (token-budgeted)
    context = await gleanr.recall("What framework are we using?")
    print(context[0].content)  # "We'll use FastAPI..."

    await gleanr.close()

asyncio.run(main())
```

Defaults work out of the box — no config needed. Gleanr automatically detects episode boundaries, extracts facts via reflection, deduplicates, and manages staleness.

### 2. With SQLite Persistence

```python
from gleanr.storage import get_sqlite_backend

SQLiteBackend = get_sqlite_backend()
storage = SQLiteBackend("./agent_memory.db")

gleanr = Gleanr(
    session_id="user_123",
    storage=storage,
    embedder=embedder,
    reflector=reflector,
)
```

Sessions persist across restarts. Resume anytime with the same `session_id`.

### 3. How Reflection Works

When episodes close, Gleanr reflects on the conversation and extracts durable facts. On subsequent episodes, **consolidation** kicks in — existing facts are sent alongside new turns, and the reflector returns actions (keep/update/add/remove) to keep facts accurate:

```
Episode 1 → Reflects → "Database is PostgreSQL", "API style is REST"
Episode 2 → User says "switch to MySQL"
         → Consolidates → UPDATE "Database is MySQL" (supersedes PostgreSQL fact)
                        → KEEP "API style is REST"
```

The old "PostgreSQL" fact is preserved with a `superseded_by` pointer for an audit trail, but only the current "MySQL" fact appears in recall results.

**Short episode carry-forward**: If an episode has fewer turns than `min_episode_turns`, those turns are buffered and included in the next episode's reflection. No data is ever silently dropped.

## Memory Model

Gleanr uses a three-level memory hierarchy:

### L0: Raw Turns
Every message in the conversation. Short-lived, used for immediate context.

### L1: Episodes
Groups of related turns around a goal or task. Automatically detected via:
- Turn count thresholds
- Time gaps between messages
- Topic boundaries
- Tool result patterns

### L2: Semantic Facts
Extracted from episodes via LLM reflection. Captures:
- **Decisions** — Choices made and their rationale
- **Constraints** — Limitations discovered
- **Failures** — What didn't work (to avoid repeating)
- **Goals** — User objectives

## Observability (Reflection Tracing)

```python
from gleanr import Gleanr, ReflectionTrace

def on_trace(trace: ReflectionTrace):
    print(f"Reflection on episode {trace.episode_id} ({trace.mode})")
    print(f"  Input: {trace.input_turn_count} turns")
    if trace.prior_facts:
        print(f"  Prior facts: {len(trace.prior_facts)}")
    print(f"  Saved: {len(trace.saved_facts)} facts")
    print(f"  Superseded: {len(trace.superseded_facts)} facts")
    print(f"  Elapsed: {trace.elapsed_ms}ms")

gleanr = Gleanr(session_id="demo", storage=storage, embedder=embedder, reflector=reflector)
gleanr.set_trace_callback(on_trace)
await gleanr.initialize()
```

Traces capture the full reflection pipeline: input turns, prior facts, scoped facts, raw LLM output (actions or facts), saved facts, and superseded facts. Use `trace.to_dict()` for JSON serialization.

## Markers

Gleanr uses markers to signal importance. They're auto-detected or manually specified:

```python
# Auto-detected from content
await gleanr.ingest("assistant", "Decision: We'll use React for the frontend")
# Marker "decision" auto-detected

# Manually specified
await gleanr.ingest("user", "Important: Never use eval() in this codebase", markers=["constraint"])
```

Built-in marker types:
- `decision` — Choices made
- `constraint` — Limitations/requirements
- `failure` — Things that didn't work
- `goal` — Objectives to achieve
- `custom:*` — Your own markers

Marked content gets priority in recall and influences fact extraction.
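Auto-detection can be approximated with prefix and keyword rules. The patterns below are hypothetical and for illustration only; Gleanr's built-in detector may use different heuristics:

```python
import re

# Illustrative patterns only — not Gleanr's actual rule set.
MARKER_PATTERNS: dict[str, re.Pattern[str]] = {
    "decision": re.compile(r"\b(decision:|we'll use|decided to)", re.IGNORECASE),
    "constraint": re.compile(r"\b(never|must not|important:|requirement:)", re.IGNORECASE),
    "failure": re.compile(r"\b(failed|didn't work|error:)", re.IGNORECASE),
    "goal": re.compile(r"\b(goal:|i need|objective:)", re.IGNORECASE),
}

def detect_markers(content: str) -> list[str]:
    """Return the names of all marker patterns that match the content."""
    return [name for name, pattern in MARKER_PATTERNS.items() if pattern.search(content)]
```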

## Recall

Recall is automatic and token-budgeted:

```python
context = await gleanr.recall(
    query="authentication",
    token_budget=2000,  # Max tokens to return
)

for item in context:
    print(f"[{item.role}] {item.content}")
    print(f"  Score: {item.score}, Markers: {item.markers}")
```

Recall prioritizes:
1. High-relevance semantic matches
2. Marked content (decisions, constraints, etc.)
3. Current episode turns
4. L2 facts from past episodes
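That priority order combines with the hard token budget as a greedy pass: rank candidates, then take items while they still fit. A simplified sketch (the boost value and ranking details are illustrative, not Gleanr's exact algorithm):

```python
from dataclasses import dataclass, field

@dataclass
class Candidate:
    content: str
    score: float
    token_count: int
    markers: list[str] = field(default_factory=list)

def pack_by_budget(candidates: list[Candidate], token_budget: int) -> list[Candidate]:
    """Greedy selection: highest-scored items first, hard token cap."""
    # Marked items get a small additive boost before ranking.
    ranked = sorted(
        candidates,
        key=lambda c: c.score + (0.1 if c.markers else 0.0),
        reverse=True,
    )
    selected, used = [], 0
    for item in ranked:
        if used + item.token_count <= token_budget:
            selected.append(item)
            used += item.token_count
    return selected
```

The budget is a hard limit: an item that does not fit is skipped, never truncated into the budget.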

## Providers

### Embeddings

**OpenAI:**
```python
from gleanr.providers.openai import OpenAIEmbedder
embedder = OpenAIEmbedder(api_key="sk-...")
```

**Anthropic:**
```python
from gleanr.providers.anthropic import AnthropicEmbedder
embedder = AnthropicEmbedder(api_key="sk-ant-...")
```

**Ollama (local):**
```python
# See examples/test_agent/llm.py for implementation
embedder = OllamaEmbedder(client)
```

**Custom:**
```python
from gleanr.providers import Embedder

class MyEmbedder(Embedder):
    async def embed(self, texts: list[str]) -> list[list[float]]:
        # Your implementation
        ...

    @property
    def dimension(self) -> int:
        return 384
```
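For tests, a deterministic local embedder avoids any network dependency. The function below hashes character trigrams into buckets and normalizes the result; wrap it in an `Embedder` subclass like the one above to use it with Gleanr (the trigram scheme is our own illustration, not something Gleanr ships):

```python
import hashlib
import math

def hash_embed(text: str, dimension: int = 384) -> list[float]:
    """Deterministic pseudo-embedding: hash character trigrams into buckets."""
    vec = [0.0] * dimension
    for i in range(len(text) - 2):
        trigram = text[i : i + 3].lower()
        bucket = int.from_bytes(hashlib.md5(trigram.encode()).digest()[:4], "big")
        vec[bucket % dimension] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0  # avoid div-by-zero for short text
    return [x / norm for x in vec]
```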

### Reflection

Reflection requires an LLM to extract facts:

```python
from gleanr.providers.openai import OpenAIReflector
reflector = OpenAIReflector(api_key="sk-...")
```

Or implement your own:
```python
from gleanr.providers import Reflector

class MyReflector(Reflector):
    async def reflect(self, episode, turns) -> list[Fact]:
        # Call your LLM to extract facts
        ...
```
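For offline experiments, a reflector can be as crude as scanning turns for marker prefixes. The `Fact` fields below are illustrative, and a real reflector would call an LLM as shown above:

```python
from dataclasses import dataclass

@dataclass
class Fact:
    content: str
    kind: str
    confidence: float

# Hypothetical prefix-to-kind mapping for this sketch.
PREFIXES = {"Decision:": "decision", "Constraint:": "constraint", "Goal:": "goal"}

def naive_reflect(turns: list[str]) -> list[Fact]:
    """Extract a Fact for every turn that starts with a known marker prefix."""
    facts = []
    for turn in turns:
        for prefix, kind in PREFIXES.items():
            if turn.startswith(prefix):
                facts.append(Fact(turn[len(prefix):].strip(), kind, 0.7))
    return facts
```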

## Test Agent

Gleanr includes a fully functional test agent powered by Ollama for interactive experimentation.

### Setup

```bash
# Install example dependencies
pip install -e ".[examples]"

# Start Ollama (if not running)
ollama serve

# Pull required models
ollama pull mistral:7b-instruct
ollama pull nomic-embed-text
```

### Run

```bash
# Start a new session
python -m examples.test_agent.run --session my_test

# Resume an existing session
python -m examples.test_agent.run --session my_test

# Debug mode (shows recall items and Gleanr timings)
python -m examples.test_agent.run --session my_test --debug
```

Commands:
- `/stats` — Show session statistics (turns, episodes, facts)
- `/recall <query>` — Test recall directly
- `/episode` — Close current episode (triggers reflection)
- `/debug` — Toggle debug mode
- `/help` — Show all commands
- `/quit` — Exit

## Evaluation Harness

Gleanr ships with an automated evaluation framework for measuring memory accuracy and latency across multi-turn conversations.

### Quick Test

```bash
# Sanity check — 1 iteration, 10 turns
python -m examples.evaluation.run --quick
```

### Full Evaluation

```bash
# Default: 80 sessions across 8 turn counts (10-80), decision_tracking scenario
python -m examples.evaluation.run

# Test consolidation accuracy with the progressive_requirements scenario
python -m examples.evaluation.run --scenario progressive_requirements --quick

# Custom configuration
python -m examples.evaluation.run \
    --scenario progressive_requirements \
    --turns 10,20,30,40 \
    --iterations 5 \
    --max-concurrent 3 \
    --verbose

# List all scenarios
python -m examples.evaluation.run --list-scenarios
```

### Available Scenarios

| Scenario | Tests |
|----------|-------|
| `decision_tracking` | Recall of architectural decisions over time |
| `constraint_awareness` | Recall of constraints when relevant |
| `failure_memory` | Avoiding repeated failures |
| `multi_fact_tracking` | Independent recall of multiple facts |
| `goal_tracking` | Persistence of goals and objectives |
| `progressive_requirements` | Fact updates via consolidation — probes check updated facts, not originals |

Reports are saved as JSON and Markdown in `./evaluation_output/`.

## Configuration

Defaults work for most use cases. You only need `GleanrConfig` if you want to tune behavior.

### Common Tuning

```python
from gleanr import GleanrConfig
from gleanr.core.config import RecallConfig, ReflectionConfig

config = GleanrConfig(
    recall=RecallConfig(
        default_token_budget=4000,     # Match to your LLM's context window
    ),
    reflection=ReflectionConfig(
        max_facts_per_episode=10,      # Increase for dense conversations
    ),
)
```

| Setting | Default | When to change |
|---------|---------|----------------|
| `recall.default_token_budget` | 4000 | Your LLM can handle more/less context |
| `reflection.max_facts_per_episode` | 10 | Episodes are very dense or very sparse |
| `episode_boundary.max_turns` | 6 | Episodes are closing too early/late |

### Full Reference

<details>
<summary>All configuration options</summary>

```python
from gleanr import GleanrConfig
from gleanr.core.config import EpisodeBoundaryConfig, RecallConfig, ReflectionConfig

config = GleanrConfig(
    auto_detect_markers=True,

    episode_boundary=EpisodeBoundaryConfig(
        max_turns=6,                # Close episode after N turns
        max_time_gap_seconds=1800,  # Close after 30min gap
        close_on_tool_result=True,  # Close after tool completion
    ),

    recall=RecallConfig(
        default_token_budget=4000,
        current_episode_budget_pct=0.2,  # Budget fraction for current episode
        min_relevance_threshold=0.3,     # Min embedding similarity for facts
        max_fact_candidates=20,          # Top-K facts after relevance filter
        facts_only_recall=True,          # Skip raw turns when facts exist
        current_episode_boost=0.2,       # Additive boost for current episode turns
        recall_dedup_threshold=0.85,     # Filter near-duplicate facts at recall
    ),

    reflection=ReflectionConfig(
        min_episode_turns=2,
        max_facts_per_episode=10,
        min_confidence=0.7,                       # Min confidence to save a fact
        max_active_facts=100,                     # Archive excess by confidence
        dedup_similarity_threshold=0.90,          # Save-time duplicate detection
        store_dedup_threshold=0.80,               # Post-reflection paraphrase dedup
        consolidation_similarity_threshold=0.15,  # Scoping for large fact sets
        consolidation_max_unscoped_facts=100,     # Send all facts below this count
        background=True,                          # Async reflection after episode close
    ),
)
```

</details>

## API Reference

### Gleanr Class

```python
class Gleanr:
    async def initialize() -> None
    async def ingest(role: str, content: str, markers: list[str] | None = None) -> Turn
    async def recall(query: str, token_budget: int | None = None) -> list[ContextItem]
    async def close_episode(reason: str = "manual") -> str | None
    async def get_session_stats() -> SessionStats
    async def close() -> None
```

### Models

```python
@dataclass
class Turn:
    id: str
    session_id: str
    episode_id: str
    role: Role
    content: str
    markers: list[str]
    token_count: int
    created_at: datetime

@dataclass
class ContextItem:
    content: str
    role: Role
    markers: list[str]
    score: float
    token_count: int
    source_type: str  # "turn", "fact"
    source_id: str
```

## Design Philosophy

Gleanr follows these principles:

1. **Store conclusions, not evidence** — Don't store raw RAG results or chain-of-thought. Store what was decided and why.

2. **Memory is always-on** — Unlike tools that are invoked, memory recall happens every turn automatically.

3. **Token budgets are hard limits** — Never exceed the budget. Gracefully degrade by dropping lower-priority items.

4. **Episodes are mandatory** — All turns belong to episodes. This enables reflection and provides natural grouping.

5. **Reflection is essential** — L2 facts are the maintained, current-truth representation of session state. Without reflection, recall degrades significantly over long conversations.

## Development

```bash
# Install dev dependencies
pip install -e ".[dev]"

# Run tests
pytest

# Run with coverage
pytest --cov=gleanr

# Type checking
mypy gleanr
```

## Roadmap

- [x] Consolidating reflection — Facts update as requirements change
- [x] Deduplication — Embedding-based duplicate prevention
- [x] Contradiction detection — Resolve conflicting facts during consolidation
- [x] Observability — Reflection tracing with full input/output visibility
- [x] Evaluation harness — Automated accuracy and latency testing
- [ ] L3 Themes — Cross-episode patterns and user profiles
- [x] Async reflection queue — Non-blocking fact extraction (background mode)
- [ ] Multi-agent support — Shared memory across agents
- [ ] Cloud storage backends — Redis, PostgreSQL

## License

MIT License — See [LICENSE](LICENSE) for details.

## Contributing

Contributions welcome! Please read the design docs in `PLAN.md` to understand the architecture before submitting PRs.

---

**Gleanr** — Because agents should remember what matters.
