Metadata-Version: 2.4
Name: local-agent-core
Version: 0.1.0
Summary: Standalone agentic framework for local LLMs via Ollama — reliable tool calling, session persistence, and loop guards
License: MIT
Project-URL: Homepage, https://github.com/chibokocl/local-agent-core
Project-URL: Repository, https://github.com/chibokocl/local-agent-core
Project-URL: Bug Tracker, https://github.com/chibokocl/local-agent-core/issues
Project-URL: Changelog, https://github.com/chibokocl/local-agent-core/blob/main/CHANGELOG.md
Keywords: ollama,agent,llm,tool-calling,local-llm,agentic,openai-compatible,qwen,flask
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.10
Description-Content-Type: text/markdown
Requires-Dist: httpx>=0.27
Provides-Extra: dev
Requires-Dist: pytest>=7.0; extra == "dev"
Requires-Dist: build>=1.0; extra == "dev"
Requires-Dist: twine>=5.0; extra == "dev"

# local-agent-core

[![PyPI version](https://badge.fury.io/py/local-agent-core.svg)](https://pypi.org/project/local-agent-core/)
[![Python 3.10+](https://img.shields.io/badge/python-3.10+-blue.svg)](https://www.python.org/downloads/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

A standalone Python framework that makes **Qwen3-8B behave reliably as an agent** when running via Ollama. It solves the three production problems you will hit within the first day of building a Qwen3 tool-calling loop:

| Problem | Root Cause | Fix |
|---|---|---|
| Model gets stuck / loops forever | No termination conditions | `guards.py` — MaxTurns + Budget + Repetition guards |
| Tool call results not handled cleanly | No orphan injection on crash | `tool_result.py` — always injects a result, even on failure |
| No session persistence between queries | History reconstructed per request | `session_store.py` — `Session` holds history in-memory |

Architecture is borrowed from Claude Code's `QueryEngine.ts`, `query.ts`, `Task.ts`, and `Tool.ts`.

---

## Install

```bash
pip install local-agent-core
```

**Prerequisites:** Ollama must be running locally with Qwen3:8b pulled.

```bash
ollama serve          # terminal tab 1
ollama pull qwen3:8b  # first time only
```

---

## Quick Start

```python
from local_agent_core import AgentSession, GuardTripped

agent = AgentSession(system_prompt="You are a helpful assistant.")

@agent.tool(
    name="calculator",
    description="Evaluate a math expression",
    parameters={
        "type": "object",
        "properties": {"expression": {"type": "string"}},
        "required": ["expression"]
    }
)
def calculator(expression: str) -> str:
    return str(eval(expression, {"__builtins__": {}}))

response = agent.chat("What is 42 * 7?")   # → "The result is 294."
response2 = agent.chat("Double that.")      # history preserved automatically
```

---

## Flask Integration

```python
from flask import Flask, request, jsonify
from local_agent_core import AgentSession, GuardTripped, DiskSessionStore

app = Flask(__name__)
store = DiskSessionStore("/var/data/sessions")

SYSTEM_PROMPT = "You are a helpful assistant."

@app.route("/chat", methods=["POST"])
def chat():
    body = request.json
    session_id = body.get("session_id")   # None = new session
    message    = body["message"]

    agent = AgentSession(
        system_prompt=SYSTEM_PROMPT,
        session_id=session_id,
        store=store,
        max_turns=15,
    )
    _register_tools(agent)   # attach your tools here

    try:
        response = agent.chat(message)
        return jsonify({
            "response": response,
            "session_id": agent.session_id,
            "guards": agent.guard_summary(),
        })
    except GuardTripped as g:
        return jsonify({
            "response": f"I had to stop: {g.reason}",
            "session_id": agent.session_id,
            "guard_tripped": g.guard_name,
        })
```

---

## API Reference

### `AgentSession`

The only class you need to import in application code.

```python
AgentSession(
    system_prompt           = "",               # Injected as role=system at turn 0
    model                   = "qwen3:8b",
    base_url                = "http://localhost:11434/v1",
    max_turns               = 20,               # Hard loop ceiling
    max_tokens_budget       = 32_000,           # Cumulative token ceiling per session
    max_tokens_per_response = 4096,             # Per-response output limit
    temperature             = 0.0,              # 0.0 = deterministic tool calling
    session_id              = None,             # Provide to resume existing session
    store                   = None,             # InMemorySessionStore by default
)
```

| Method | Purpose |
|---|---|
| `agent.chat(message)` | Send message, get response. Mutates session history. |
| `@agent.tool(name, description, parameters)` | Register a tool via decorator |
| `agent.register_tool(ToolDef)` | Register a pre-built ToolDef |
| `agent.guard_summary()` | Returns dict with turns/tokens/tool_calls used |
| `agent.history()` | Full raw message list |
| `agent.session_id` | The session's UUID string |
| `agent.reset_guards()` | Reset guard counters without clearing history |

---

### Loop Guards

Three independent loop-breakers. Any one raising `GuardTripped` terminates the loop.

```python
LoopGuards(
    max_turns            = 20,      # Raise after this many model responses
    max_tokens           = 32_000,  # Raise when cumulative tokens exceed this
    repetition_threshold = 3,       # Raise when same tool+args called this many times
)
```

**Tuning guide:**

| Task type | Recommended `max_turns` |
|---|---|
| Simple Q&A | 5 |
| Single tool lookup | 8 |
| Multi-step analysis | 15 |
| Complex coding / research | 30–50 |

**Catching guard trips:**

```python
from local_agent_core import GuardTripped

try:
    response = agent.chat(user_input)
except GuardTripped as g:
    print(g.guard_name)   # "MaxTurns" | "Budget" | "Repetition"
    print(g.reason)       # human-readable explanation
```

---

### Session Stores

```python
from local_agent_core import InMemorySessionStore, DiskSessionStore

# In-memory (default) — fast, resets on process restart
store = InMemorySessionStore()

# Disk-based — survives restarts, good for single-server deployments
store = DiskSessionStore("/var/data/sessions")
```

For multi-worker production (gunicorn with >1 worker), implement `RedisSessionStore` with the same `get()`, `create()`, `save()` interface as `DiskSessionStore`.

---

### Why temperature=0.0?

Deterministic output means the model makes the same tool call decision given the same history. With temperature > 0, a model might randomly decide not to use a tool, making the agent unreliable.

### Why strip Qwen3 reasoning?

Qwen3:8b includes a `reasoning` field in every response (~150–300 tokens of chain-of-thought). If re-injected into history, a 10-turn conversation wastes 2,000+ tokens on reasoning the model never needs to see again. `OllamaClient.extract_message()` strips it before adding to session history.

### Why not LangChain / LlamaIndex?

Both add abstraction layers that hide the `finish_reason` state machine and make it harder to inspect and fix broken message history. This framework exposes the raw OpenAI message format directly — you always know exactly what is being sent to the model.

---

## Observed Qwen3:8b Behaviour

| Metric | Value |
|---|---|
| Tokens for simple Q&A | ~150 total (15 prompt, 135 completion incl. reasoning) |
| Tokens for tool call (42*7) | ~666 total across 2 turns |
| Tool call format | Standard OpenAI — `finish_reason: "tool_calls"`, `arguments` as JSON string |
| Reasoning field | Always present, ~100–200 tokens, stripped by this library |
| Temperature=0 consistency | Deterministic across repeated runs |

---

## Known Limitations

- `requires_confirmation` flag in `ToolDef` is defined but not yet wired into the loop.
- `RedisSessionStore` not yet implemented — needed for multi-worker Flask (gunicorn with >1 worker).
- No async support — `OllamaClient` uses `httpx.Client` (sync). For FastAPI or async Flask, replace with `httpx.AsyncClient` and add `async/await` to `loop.py`.
- `BudgetGuard.max_tokens=32_000` is conservative — Qwen3:8b context window is 128k. Raise this for long research tasks.

---

## Running the Tests

Tests run against a live Ollama instance (not mocked — that is intentional).

```bash
pip install local-agent-core
ollama serve && ollama pull qwen3:8b
python -m pytest tests/ -v
```

---

## License

MIT
