Metadata-Version: 2.4
Name: letta-ejentum
Version: 0.1.0
Summary: Letta tools for the Ejentum Reasoning Harness. Four agent-callable functions (harness_reasoning, harness_code, harness_anti_deception, harness_memory) registered with a Letta server via tools.upsert_from_function. Each call returns a task-matched cognitive operation engineered in two layers: a natural-language procedure plus an executable reasoning topology (graph DAG with gates, parallel branches, and meta-cognitive exits).
Project-URL: Homepage, https://ejentum.com
Project-URL: Documentation, https://ejentum.com/docs/api_reference
Project-URL: Repository, https://github.com/ejentum/letta-ejentum
Project-URL: Issues, https://github.com/ejentum/letta-ejentum/issues
Project-URL: Changelog, https://github.com/ejentum/letta-ejentum/blob/main/CHANGELOG.md
Project-URL: Pricing, https://ejentum.com/pricing
Author-email: Ejentum <info@ejentum.com>
License-Expression: MIT
License-File: LICENSE
Keywords: agentic-ai,ai,anti-deception,cognitive-scaffold,ejentum,letta,llm,memgpt,reasoning-harness,stateful-agents
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: <3.14,>=3.10
Requires-Dist: letta-client>=0.1.0
Requires-Dist: requests>=2.31.0
Provides-Extra: dev
Requires-Dist: build>=1.2.0; extra == 'dev'
Requires-Dist: pytest-cov>=5.0.0; extra == 'dev'
Requires-Dist: pytest>=8.0.0; extra == 'dev'
Requires-Dist: ruff>=0.6.0; extra == 'dev'
Description-Content-Type: text/markdown

# letta-ejentum

[Letta](https://letta.com) tools for the [Ejentum](https://ejentum.com) Reasoning Harness. Four agent-callable functions (`harness_reasoning`, `harness_code`, `harness_anti_deception`, `harness_memory`) that upload to a Letta server via `client.tools.upsert_from_function`, plus a `register_ejentum_tools(client)` one-liner that registers all four in a single call.

Each operation in the Ejentum library (679 of them, organized across four harnesses) is engineered in **two layers**:

- a **natural-language procedure** the model can read, naming the steps to take and the failure pattern to refuse, and
- an **executable reasoning topology**: a graph-shaped plan over those steps. The plan names explicit decision points where the model branches, parallel branches that run and rejoin, bounded loops that run until convergence, named meta-cognitive moments where the model is asked to stop, look at its own working, and re-enter at a specific step, plus escape paths for when the prescribed plan stops fitting the task at hand.

The natural-language layer tells the model *what* to do. The topology layer pins down *how* those steps connect: where to decide, where to loop, where to stop and look at itself. Together they act as a persistent attention anchor that survives long context windows and multi-turn execution chains, which is precisely where a model's own reasoning template typically decays.

Letta is a particularly natural host for the harness because Letta agents are stateful by design (core memory, archival memory, recall memory). The `harness_memory` tool is meant for exactly this kind of long-running stateful context: sharpening an observation the agent has already formed about cross-turn drift.

## Installation

```bash
pip install letta-ejentum
```

## Configuration

Unlike most integrations, the harness functions execute on the **Letta server**, not in the caller's process. `EJENTUM_API_KEY` must therefore be set in the Letta deployment's environment rather than in your local shell. See the Letta docs on tool-env configuration for your deployment (self-hosted, Letta Cloud, etc.).

Get an Ejentum API key at <https://ejentum.com/pricing> (free and paid tiers).

## Usage

### One-liner (recommended)

```python
import os
from letta_client import Letta
from letta_ejentum import register_ejentum_tools

client = Letta(api_key=os.environ["LETTA_API_KEY"])

tools = register_ejentum_tools(client)
tool_ids = [t.id for t in tools]

agent = client.agents.create(
    model="anthropic/claude-sonnet-4-6",
    embedding="openai/text-embedding-3-small",
    tool_ids=tool_ids,
)

response = client.agents.messages.create(
    agent_id=agent.id,
    messages=[
        {"role": "user", "content":
            "We've spent three months on the GraphQL gateway. "
            "Should we keep going or pivot to REST?"},
    ],
)
```

### Register one tool at a time

```python
from letta_client import Letta
from letta_ejentum import harness_anti_deception

client = Letta(api_key="...")

tool = client.tools.upsert_from_function(func=harness_anti_deception)
```
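
Registering every function by hand amounts to a loop over `HARNESS_FUNCTIONS`. The sketch below is an assumption about the general shape of that loop, not the actual source of `register_ejentum_tools`. `register_all`, `harness_demo`, and the stand-in client are hypothetical names used only so the snippet runs without a Letta server.

```python
from types import SimpleNamespace


def register_all(client, functions):
    """Upload each function via client.tools.upsert_from_function.

    Hypothetical helper for illustration; not part of letta-ejentum.
    """
    return [client.tools.upsert_from_function(func=fn) for fn in functions]


# Stand-in client so the sketch runs offline. A real letta_client.Letta
# instance would take its place.
def _fake_upsert(func):
    return SimpleNamespace(id=f"tool-{func.__name__}", name=func.__name__)


fake_client = SimpleNamespace(
    tools=SimpleNamespace(upsert_from_function=_fake_upsert)
)


def harness_demo(query: str) -> str:
    """Placeholder standing in for one of the four harness functions."""
    return query


tools = register_all(fake_client, [harness_demo])
tool_ids = [t.id for t in tools]
```

With a real client you would pass `HARNESS_FUNCTIONS` instead of the placeholder list and hand the resulting IDs to `client.agents.create`.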

### Human-in-the-loop staging

```python
# Assumes `client` is the Letta instance created in the one-liner example.
# Every harness call will require manual approval before execution.
tools = register_ejentum_tools(client, default_requires_approval=True)
```

## The four tools

| Function | Best for | Library size |
|---|---|---|
| `harness_reasoning` | Analytical, diagnostic, planning, multi-step tasks spanning abstraction, time, causality, simulation, spatial, and metacognition | 311 operations |
| `harness_code` | Code generation, refactoring, review, and debugging across the software-engineering layer | 128 operations |
| `harness_anti_deception` | Prompts that pressure the agent to validate, certify, or soften an honest assessment | 139 operations |
| `harness_memory` | Sharpening an observation already formed about cross-turn drift. Filter-oriented, not write-oriented. Format query as `"I noticed X. This might mean Y. Sharpen: Z."` | 101 operations |
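
The query shape recommended for `harness_memory` can be assembled with a small helper. This is a minimal sketch: `format_memory_query` and its parameter names are hypothetical and not part of the package.

```python
def format_memory_query(observation: str, hypothesis: str, focus: str) -> str:
    """Build a query in the 'I noticed X. This might mean Y. Sharpen: Z.' shape."""
    # Trim whitespace and trailing periods so the template adds exactly one.
    parts = [s.strip().rstrip(".") for s in (observation, hypothesis, focus)]
    if not all(parts):
        raise ValueError("observation, hypothesis, and focus must all be non-empty")
    x, y, z = parts
    return f"I noticed {x}. This might mean {y}. Sharpen: {z}."


query = format_memory_query(
    "the user stopped mentioning the deadline after turn 12",
    "the deadline moved or was dropped",
    "whether my stored deadline is still current",
)
```

Keeping the three parts separate makes it harder to send the tool a bare question, which would not fit its filter-oriented role.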

## What an injection looks like

A real `reasoning` mode response to the query `investigate why our nightly ETL job has started failing intermittently over the past two weeks; nothing in the code or schema has changed`:

```
[NEGATIVE GATE]
The server's response time was accepted as average, despite a suspicious
rhythm break in its timing pattern.

[PROCEDURE]
Step 1: Establish baseline timing profiles by extracting historical
durations and intervals for each event type. Step 2: Compare each observed
timing against its baseline and compute deviation magnitude. Step 3:
Classify anomalies as too fast, too slow, too early, or too late, and rank
by severity. ... Step 5: If deviation exceeds two standard deviations,
probe root cause by tracing upstream dependencies. ...

[REASONING TOPOLOGY]
S1:durations -> FIXED_POINT[baselines] -> N{dismiss_timing_deviations_
without_investigation} -> for_each: S2:compare -> S3:deviation ->
G1{>2sigma?} --yes-> S4:classify -> S5:probe_cause -> FLAG -> continue --no->
S6:validate -> continue -> all_checked -> OUT:anomaly_report

[TARGET PATTERN]
Establish timing baselines by extracting historical response intervals.
Compare current server response time to this baseline. ...

[FALSIFICATION TEST]
If no event timing is flagged as suspiciously fast or slow relative to
baseline, temporal anomaly detection was not active.

Amplify: timing baseline comparison; anomaly classification; security
context elevation
Suppress: average timing acceptance; outlier normalization
```

The agent reads both the natural-language `[PROCEDURE]` and the graph-logic `[REASONING TOPOLOGY]` before generating its user-facing answer. The bracketed labels are instructions to the agent, not content to display.
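
For logging or debugging, the bracketed labels make it easy to split an injection into its parts. The helper below is a hypothetical sketch, not part of the package. It assumes each label sits alone on its own line, as in the sample above, and uses an abridged copy of that sample as input.

```python
import re


def split_injection(text: str) -> dict[str, str]:
    """Split harness output into {label: body} using the [LABEL] header lines."""
    sections: dict[str, str] = {}
    label = None
    for line in text.splitlines():
        match = re.fullmatch(r"\[([A-Z ]+)\]", line.strip())
        if match:
            # A new bracketed header starts a new section.
            label = match.group(1)
            sections[label] = ""
        elif label is not None:
            sections[label] += line + "\n"
    return {name: body.strip() for name, body in sections.items()}


# Abridged from the sample response shown above.
sample = """[NEGATIVE GATE]
The server's response time was accepted as average.

[PROCEDURE]
Step 1: Establish baseline timing profiles.

[FALSIFICATION TEST]
If no event timing is flagged, temporal anomaly detection was not active.
"""

sections = split_injection(sample)
print(list(sections))
```

Unlabeled trailing lines such as `Amplify:` and `Suppress:` would be folded into the last section by this sketch, so handle them separately if you need them.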

## Why the unusual design

Letta's tool model differs fundamentally from the `BaseTool`-subclass pattern (LangChain, agno, smolagents) and from factory toolsets (PydanticAI, CrewAI). Tools are plain Python functions whose source is serialized and executed in Letta's sandboxed runtime. That imposes three constraints:

- **Imports inside the function body**, not at module top. Letta's serializer captures what the function needs at execution time.
- **No constructor**, no instance state. Configuration lives in the Letta server's environment (`EJENTUM_API_KEY`).
- **Google-style docstrings**, which Letta parses into the OpenAI tool schema.

This shim respects all three constraints. The four functions are intentionally verbose (some imports and the API URL repeated four times) because each one must stand alone for Letta's serializer.
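
A function that follows these three constraints might look like the sketch below. It is illustrative only and not the package's source: `example_harness_tool`, the placeholder URL, the request payload, and the use of a Bearer header are all assumptions. Only the `EJENTUM_API_KEY` variable name comes from this README.

```python
def example_harness_tool(query: str) -> str:
    """Fetch a reasoning scaffold for a task description.

    Args:
        query (str): Plain-language description of the task at hand.

    Returns:
        str: The scaffold text, or a human-readable error message.
    """
    # Imports live inside the body so the serialized source stands alone.
    import os

    api_key = os.environ.get("EJENTUM_API_KEY")
    if not api_key:
        return "Error: EJENTUM_API_KEY is not set in the tool's environment."

    try:
        import requests

        # Placeholder URL and payload: assumptions, not the real Ejentum API.
        response = requests.post(
            "https://example.invalid/harness",
            headers={"Authorization": f"Bearer {api_key}"},
            json={"query": query},
            timeout=10,
        )
        response.raise_for_status()
        return response.text
    except Exception as exc:  # never let an exception cross the tool boundary
        return f"Error: harness call failed ({type(exc).__name__}: {exc})"
```

Checking the key before anything else means a misconfigured deployment produces a readable message for the agent instead of a failed step.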

## API reference

```python
from letta_ejentum import (
    harness_reasoning,
    harness_code,
    harness_anti_deception,
    harness_memory,
    HARNESS_FUNCTIONS,           # tuple of all four
    register_ejentum_tools,      # uploads all four to a Letta server
)

# Signature, for reference:
#
# def register_ejentum_tools(
#     client,                                # letta_client.Letta instance
#     default_requires_approval: bool = False,
# ) -> list[letta_client.types.Tool]: ...
```

Each function returns a string. Errors are returned as human-readable strings (no exceptions cross the function boundary, so an agent step never crashes the run).

> **MCP alternative.** This package uses Letta's tool-upload mechanism. Letta also has an MCP client that can consume the hosted Ejentum MCP endpoint at `https://api.ejentum.com/mcp` with Bearer auth. The PyPI package skips that wiring and keeps tool-attach down to one line.

## Compatibility

- Python 3.10 through 3.13 (`Requires-Python: >=3.10,<3.14`)
- `letta-client>=0.1.0`
- `requests>=2.31.0` (installed locally mainly for testing; the actual `requests` call happens inside the function on the Letta server, which provides its own runtime)

## Resources

- Ejentum homepage: <https://ejentum.com>
- Pricing: <https://ejentum.com/pricing>
- API reference: <https://ejentum.com/docs/api_reference>
- "Why LLM Agents Fail" essay: <https://ejentum.com/blog/why-llm-agents-fail>
- "Under Pressure" research paper: <https://doi.org/10.5281/zenodo.19392715>
- Letta documentation: <https://docs.letta.com>

## License

[MIT](./LICENSE)


## Measured effects

The Ejentum harness is benchmarked publicly under CC BY 4.0 at [github.com/ejentum/benchmarks](https://github.com/ejentum/benchmarks):

- **ELEPHANT** sycophancy: 5.8% composite on GPT-4o (40 real Reddit scenarios)
- **LiveCodeBench Hard**: 85.7% to 100% on Claude Opus (28 competitive programming tasks)
- **Memory retention**: 50% fewer stale facts served (20-turn implicit state changes)
- Plus per-harness numbers across BBH/CausalBench/MuSR, ARC-AGI-3, SciCode, and perception tasks

Methodology, scenarios, run scripts, and raw outputs are all in-repo.
