Metadata-Version: 2.4
Name: clovis
Version: 0.5.8
Summary: cloooooo — SGLang + RAG hybrid + tools + router + structured outputs + eval
Author: Clovis Sfeir
License: MIT
Keywords: ai,llm,local-ai,ollama,openai
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.10
Requires-Dist: bm25s>=0.2
Requires-Dist: ddgs>=0.1
Requires-Dist: fastapi>=0.111
Requires-Dist: fastembed>=0.3
Requires-Dist: httpx>=0.27
Requires-Dist: jsonschema>=4.0
Requires-Dist: lancedb>=0.6
Requires-Dist: ollama>=0.3
Requires-Dist: pillow>=10.0
Requires-Dist: pydantic>=2.0
Requires-Dist: pymupdf>=1.24
Requires-Dist: python-docx>=1.1
Requires-Dist: rich>=13.0
Requires-Dist: trafilatura>=2.0
Requires-Dist: typer>=0.12
Requires-Dist: uvicorn[standard]>=0.30
Description-Content-Type: text/markdown

# clovis

<p align="center">
  <strong>Local LLM API · Web Search · Deep Research · RAG · Embeddings · Structured Outputs</strong>
</p>

<p align="center">
  <a href="https://pypi.org/project/clovis/"><img src="https://img.shields.io/pypi/v/clovis?color=blue" alt="PyPI version"></a>
  <a href="https://pypi.org/project/clovis/"><img src="https://img.shields.io/pypi/pyversions/clovis" alt="Python versions"></a>
  <a href="https://opensource.org/licenses/MIT"><img src="https://img.shields.io/badge/License-MIT-green.svg" alt="License"></a>
  <a href="https://pypi.org/project/clovis/"><img src="https://img.shields.io/pypi/dm/clovis" alt="Downloads"></a>
</p>

---

**clovis** is a Python client and API server for local LLMs, built on top of [SGLang](https://github.com/sgl-project/sglang) and Ollama. It ships with multi-step web research, a full RAG pipeline, vector embeddings, reranking, structured JSON outputs, vision, and an agentic deep-research mode, all served over HTTP, with most inference modes behind the single `/ia` endpoint.

## Features

- **Simple inference** — one-line calls with streaming, negative prompts, and extended reasoning
- **Web search** — live SearXNG results, together with the current date, injected into the context
- **Deep thinking** — multi-step agentic research pipeline (MiroFlow) with source citations
- **Ultra deep thinking** — multi-axis research with automated gap analysis, typically synthesizing 250–300 sources into a structured report
- **RAG** — ingest PDF, DOCX, TXT, and Markdown documents; semantic search over your corpus
- **Embeddings** — 768-dim dense vectors via nomic-embed-text-v1.5
- **Reranking** — cross-encoder reranking of document candidates
- **Structured output** — JSON Schema-constrained generation
- **Vision** — image description from URL, file path, or base64
- **Auto-routing** — automatic mode selection based on query type
- **Conversation memory** — short-term history per conversation ID

---

## Installation

```bash
pip install clovis
```

**Requirements:** Python 3.10+ · SGLang or Ollama running locally · SearXNG (optional, enables web search)

---

## Quick start

```python
from clovis import cloooooo

ai = cloooooo()  # connects to SGLang on localhost:61005

# Direct call
response = ai("Explain transformer architecture")
print(response)

# With options
response = ai(
    "Write a sonnet about entropy",
    negative_prompt="no rhymes",
    thinking=True,            # enables extended chain-of-thought
    context="You are a physicist who loves poetry.",
)

# Streaming
for token in ai.stream("Describe the Big Bang in detail"):
    print(token, end="", flush=True)

# Multi-turn conversation
conv = ai.conversation(context="You are a senior software engineer.")
conv("Explain dependency injection")
conv("Show me a Python example")  # remembers previous turn
conv("How would you test it?")
```

---

## API server

Start the server:

```bash
clovis serve --port 8000
clovis serve --port 8000 --key sk-your-secret-key   # with API key auth
```

All POST endpoints take a JSON body (`Content-Type: application/json`). Streaming responses use `text/plain`.

---

### `POST /ia` — Universal endpoint

The main endpoint. Handles all inference modes.

```bash
curl -X POST http://localhost:8000/ia \
  -H "Content-Type: application/json" \
  -d '{"prompt": "What is quantum entanglement?", "use_web": true}'
```

#### Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `prompt` | `str` | **required** | The question or instruction |
| `mode` | `str` | `null` | `"simple"` · `"deep_thinking"` · `"ultra_deep_thinking"` |
| `use_web` | `bool` | `false` | Inject live web search results with current date |
| `thinking` | `bool` | `false` | Enable extended reasoning (chain-of-thought) |
| `stream` | `bool` | `false` | Stream tokens via `text/plain` |
| `use_memory` | `bool` | `false` | Load and save conversation history |
| `conversation_id` | `str` | `null` | Key for conversation memory |
| `context` | `str` | `null` | System-level context injected before the prompt |
| `negative_prompt` | `str` | `null` | Instructions for what to avoid |

#### Response

```json
{
  "response": "Quantum entanglement is a phenomenon where..."
}
```

For `deep_thinking` and `ultra_deep_thinking`, the response includes:

```json
{
  "answer": "...",
  "sources": ["https://...", "https://..."],
  "model_used": "miroflow:Qwen/Qwen3-32B-AWQ",
  "fallback_used": false
}
```
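Because `simple` mode returns `response` while the research modes return `answer` and `sources`, client code that switches between modes has to handle both shapes. A minimal sketch, assuming only the two response shapes shown above; `extract_answer` and `extract_sources` are hypothetical helper names:

```python
from typing import Any


def extract_answer(payload: dict[str, Any]) -> str:
    """Return the generated text from an /ia JSON response.

    Simple mode returns {"response": ...}; deep_thinking and
    ultra_deep_thinking return {"answer": ..., "sources": [...], ...}.
    """
    if "answer" in payload:
        return payload["answer"]
    if "response" in payload:
        return payload["response"]
    raise KeyError("neither 'answer' nor 'response' in payload")


def extract_sources(payload: dict[str, Any]) -> list[str]:
    # Simple-mode responses carry no sources, so default to an empty list.
    return list(payload.get("sources", []))
```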

---

### Modes

#### `simple` — Direct inference

Fast, direct LLM call. Optionally augmented with web search (`use_web: true`) or reasoning (`thinking: true`).

```python
import httpx

r = httpx.post("http://localhost:8000/ia", json={
    "prompt": "Latest news on fusion energy",
    "use_web": True,
    "thinking": True,
}, timeout=120)  # web search + reasoning easily exceeds httpx's 5 s default
print(r.json()["response"])
```

---

#### `deep_thinking` — Agentic web research

Multi-step research pipeline. Performs web searches, reasons over the results, and returns a structured answer with source citations. Designed for complex questions that require up-to-date information.

```python
r = httpx.post("http://localhost:8000/ia", json={
    "prompt": "What are the geopolitical implications of AGI development?",
    "mode": "deep_thinking",
}, timeout=300)

data = r.json()
print(data["answer"])       # full structured answer
print(data["sources"])      # list of URLs cited
print(data["fallback_used"])  # False = MiroFlow pipeline used
```

Streaming mode returns progress updates then the final JSON:

```bash
curl -N -X POST http://localhost:8000/ia \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Impact of interest rates on tech stocks", "mode": "deep_thinking", "stream": true}'

# [deep_thinking... 5s]
# [deep_thinking... 10s]
# ...
# {"answer": "...", "sources": [...], "fallback_used": false}
```
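The streamed body mixes plain-text progress markers with a final JSON object. One way to handle it, assuming markers and the final JSON are newline-separated as in the example above, is to buffer the whole stream and split it afterwards. `parse_deep_thinking_stream` is a hypothetical helper, not a clovis function:

```python
import json
import re

# Matches progress markers such as "[deep_thinking... 10s]".
PROGRESS = re.compile(r"^\[deep_thinking\.\.\. (\d+)s\]$")


def parse_deep_thinking_stream(text: str) -> tuple[list[int], dict]:
    """Split a buffered deep_thinking stream into elapsed seconds and the final JSON."""
    elapsed: list[int] = []
    payload_lines: list[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        match = PROGRESS.match(line)
        if match:
            elapsed.append(int(match.group(1)))
        else:
            # Anything that is not a marker is treated as part of the JSON payload.
            payload_lines.append(line)
    if not payload_lines:
        raise ValueError("stream ended without a final JSON payload")
    return elapsed, json.loads("\n".join(payload_lines))
```

With `httpx.stream(...)`, pass `"".join(r.iter_text())` once the response has finished. Buffering first avoids dealing with chunk boundaries that fall in the middle of a line.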

---

#### `ultra_deep_thinking` — Multi-axis deep research

The most thorough mode. Decomposes the question into independent research axes, runs parallel searches on each, identifies knowledge gaps, fills them with additional targeted searches, then synthesizes a comprehensive structured report. Typically produces reports of 10,000–15,000 characters drawing on 250–300 unique sources.

```python
import httpx

# httpx.stream() yields chunks as they arrive; httpx.post() would read the whole body first.
with httpx.stream("POST", "http://localhost:8000/ia", json={
    "prompt": "How does reinforcement learning from human feedback (RLHF) work?",
    "mode": "ultra_deep_thinking",
    "stream": True,
}, timeout=600) as r:
    for chunk in r.iter_text():
        print(chunk, end="", flush=True)
```

Streaming output example:

```
[axe:Definition and mechanism] researching...
[axe:Definition and mechanism] OK — 5 832 chars, 36 sources
[axe:Historical context] researching...
...
[gap analysis round 1/2] 5 gaps identified
[axe:Gap-1.1] researching...
...
[synthesis] 15 sections · 63 000 chars · 288 sources...
{"answer": "## RLHF: Complete Technical Overview\n\n...", "sources_count": 281}
```
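To monitor progress programmatically, the per-axis completion lines can be parsed. This sketch assumes the exact line format shown above (including spaces as thousands separators), which is not documented as stable; `summarize_axes` is a hypothetical helper:

```python
import re

# Matches completed-axis lines such as
#   [axe:<axis name>] OK <dash> 5 832 chars, 36 sources
# The character count may use spaces as thousands separators.
AXIS_DONE = re.compile(
    r"^\[axe:(?P<axis>[^\]]+)\] OK\s+\S+\s+(?P<chars>\d[\d\s]*) chars, (?P<sources>\d+) sources$"
)


def summarize_axes(log: str) -> dict[str, tuple[int, int]]:
    """Map each completed research axis to (characters, sources) from the progress log."""
    done: dict[str, tuple[int, int]] = {}
    for line in log.splitlines():
        match = AXIS_DONE.match(line.strip())
        if match:
            # Remove thousands separators before converting, e.g. "5 832" -> 5832.
            chars = int(re.sub(r"\s", "", match.group("chars")))
            done[match.group("axis")] = (chars, int(match.group("sources")))
    return done
```

Per-axis source counts can overlap, so summing them need not match the `sources_count` in the final JSON.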

Presets (selectable via the standalone `/ultra_deep_thinking` endpoint):

| Preset | Axes | Depth | Gap rounds | Searches/axis |
|--------|------|-------|-----------|---------------|
| `fast` | 3 | 2 | 1 | 2 |
| `deep` *(default)* | 5 | 3 | 2 | 3 |
| `ultra` | 8 | 4 | 3 | 3 |

---

### `GET /health` — Server status

```bash
curl http://localhost:8000/health
```

```json
{
  "status": "ok",
  "version": "0.5.6",
  "model": "Qwen/Qwen3-32B-AWQ",
  "sglang_url": "http://localhost:61005",
  "modes": ["simple", "search", "thinking", "deep_thinking", "ultra_deep_thinking", "embed", "rerank", "vision"]
}
```
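`/health` also works as a readiness probe before sending long research jobs. A minimal polling sketch, assuming only the `status` and `modes` fields shown above; `wait_for_mode` is hypothetical and takes the HTTP call as a function so it can be exercised without a running server:

```python
import time
from typing import Callable


def wait_for_mode(
    fetch_health: Callable[[], dict],
    mode: str,
    attempts: int = 10,
    delay: float = 1.0,
) -> bool:
    """Poll /health until status is "ok" and `mode` is listed, or give up."""
    for _ in range(attempts):
        try:
            health = fetch_health()
        except Exception:
            # Server not reachable yet (e.g. connection refused); retry after a pause.
            health = {}
        if health.get("status") == "ok" and mode in health.get("modes", []):
            return True
        time.sleep(delay)
    return False
```

For example: `wait_for_mode(lambda: httpx.get("http://localhost:8000/health", timeout=5).json(), "deep_thinking")`.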

---

### `POST /embed` — Text embeddings

Generate 768-dimensional dense vectors (nomic-embed-text-v1.5).

```python
r = httpx.post("http://localhost:8000/embed", json={
    "texts": ["Hello world", "Machine learning basics", "Deep neural networks"],
    "prefix": "search_document",   # or "search_query"
})
data = r.json()
print(data["dim"])         # 768
print(len(data["embeddings"]))  # 3
```
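The `prefix` field follows the nomic-embed convention of embedding documents with `search_document` and queries with `search_query`. Once both sets of vectors are back, semantic search reduces to ranking documents by cosine similarity. A dependency-free sketch, assuming `data["embeddings"]` is a list of float lists; `cosine` and `nearest` are illustrative names:

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def nearest(query_vec: list[float], doc_vecs: list[list[float]]) -> list[tuple[int, float]]:
    """Rank document indices by similarity to the query, best first."""
    scored = [(i, cosine(query_vec, v)) for i, v in enumerate(doc_vecs)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

This loop is fine for a handful of texts; for a real corpus, a vector store (clovis itself depends on LanceDB) or NumPy is the better tool.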

---

### `POST /rerank` — Document reranking

Re-order documents by relevance to a query using a cross-encoder.

```python
r = httpx.post("http://localhost:8000/rerank", json={
    "query": "machine learning optimization",
    "documents": [
        "Gradient descent is an optimization algorithm for ML",
        "The weather in Paris is sunny today",
        "Adam optimizer adapts learning rates per parameter",
        "Football match results from last weekend",
    ],
    "top_k": 3,
})
for item in r.json()["results"]:
    print(f"{item['score']:.3f}  {item['document'][:60]}")
```

---

### `POST /structured` — JSON Schema output

Generate output that conforms to a JSON Schema (schema-constrained generation).

```python
schema = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "year": {"type": "integer"},
        "genres": {"type": "array", "items": {"type": "string"}},
        "rating": {"type": "number"},
    },
    "required": ["title", "year", "genres", "rating"],
}

r = httpx.post("http://localhost:8000/structured", json={
    "prompt": "Describe the movie Inception",
    "schema": schema,
}, timeout=120)
print(r.json()["result"])
# {"title": "Inception", "year": 2010, "genres": ["sci-fi", "thriller"], "rating": 8.8}
```
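Even with schema-constrained generation, a cheap client-side check at the boundary of your code is a reasonable safeguard. The sketch below validates only the `type`, `properties`, `required`, and `items` keywords used in the example schema, and assumes `r.json()["result"]` is already a parsed object rather than a JSON string. For complete validation, the `jsonschema` package (already a clovis dependency) provides `jsonschema.validate(instance, schema)`.

```python
from typing import Any

# Python types accepted for each JSON Schema primitive type (subset only).
JSON_TYPES: dict[str, tuple[type, ...]] = {
    "string": (str,),
    "integer": (int,),
    "number": (int, float),
    "boolean": (bool,),
    "array": (list,),
    "object": (dict,),
}


def check(instance: Any, schema: dict, path: str = "$") -> list[str]:
    """Return violations for a small JSON Schema subset (type, properties, required, items)."""
    expected = schema.get("type")
    if expected:
        ok = isinstance(instance, JSON_TYPES[expected])
        # bool is a subclass of int in Python, but JSON does not treat it as a number.
        if expected in ("integer", "number") and isinstance(instance, bool):
            ok = False
        if not ok:
            return [f"{path}: expected {expected}, got {type(instance).__name__}"]
    errors: list[str] = []
    if expected == "object":
        for key in schema.get("required", []):
            if key not in instance:
                errors.append(f"{path}: missing required key {key!r}")
        for key, sub in schema.get("properties", {}).items():
            if key in instance:
                errors.extend(check(instance[key], sub, f"{path}.{key}"))
    if expected == "array" and "items" in schema:
        for i, item in enumerate(instance):
            errors.extend(check(item, schema["items"], f"{path}[{i}]"))
    return errors
```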

---

### `POST /vision` — Image understanding

Describe or analyze images from a URL, local file path, or base64 string.

```python
r = httpx.post("http://localhost:8000/vision", json={
    "image": "https://upload.wikimedia.org/wikipedia/commons/thumb/4/47/PNG_transparency_demonstration_1.png/280px-PNG_transparency_demonstration_1.png",
    "prompt": "What objects do you see in this image?",
}, timeout=120)
print(r.json()["response"])
```

---

### `POST /rag/ingest` + `POST /rag/ask` — Retrieval-augmented generation

Ingest your documents and ask questions over them.

```python
# Ingest a document
httpx.post("http://localhost:8000/rag/ingest", json={
    "path": "/path/to/your/document.pdf"
}, timeout=300)  # parsing and embedding a large PDF can take a while

# Ask a question
r = httpx.post("http://localhost:8000/rag/ask", json={
    "question": "What are the main conclusions of the report?",
    "top_k": 5,
}, timeout=120)
print(r.json()["response"])
```

Supported formats: PDF, DOCX, TXT, Markdown.

---

### `POST /route` — Auto-routing

Automatically select the best inference mode for a given prompt.

```python
r = httpx.post("http://localhost:8000/route", json={
    "prompt": "Write a Python function to sort a list",
}, timeout=120)
print(r.json())
# {"response": "...", "task_type": "code", "model": "Qwen/Qwen3-32B-AWQ", "confidence": 0.92}
```

---

### Other endpoints

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/deep_think` | POST | Standalone multi-iteration deep research with streaming |
| `/ultra_deep_thinking` | POST | Standalone ultra deep research with preset control |
| `/tools/exec` | POST | Execute a registered tool |
| `/tools` | GET | List available tools |
| `/eval/run` | POST | Evaluate model responses against expected answers |
| `/rag/sources` | GET | List ingested document sources |
| `/openapi.json` | GET | OpenAPI schema |
| `/docs` | GET | Interactive API documentation |

---

## CLI

```bash
# Direct question
clovis "Explain the Riemann hypothesis"

# With options
clovis "Write a haiku about code" --no "no syllable counting"
clovis "Solve this integral" --think
clovis "Latest AI news" --web

# Interactive REPL
clovis repl

# Start API server
clovis serve --port 8000
clovis serve --port 8000 --key sk-your-secret-key
```

---

## Configuration

```bash
export CLOVIS_LOCAL_URL="http://localhost:61005"   # SGLang or Ollama endpoint
export CLOVIS_MODEL="Qwen/Qwen3-32B-AWQ"          # model name
export CLOVIS_API_KEY="sk-..."                    # bearer token for the API server
export SEARXNG_URL="http://localhost:8888"        # SearXNG instance (enables web search)
```
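When the server is started with `--key` (or `CLOVIS_API_KEY`), clients need to send that key. The sketch below assumes the standard `Authorization: Bearer <key>` header, which is what "bearer token" conventionally means, and falls back to the values shown in this README; `load_settings` and `auth_headers` are hypothetical helpers, and clovis's own defaults may differ:

```python
import os
from typing import Mapping, Optional

# Fallbacks taken from the examples in this README (assumptions, not clovis internals).
DEFAULTS = {
    "CLOVIS_LOCAL_URL": "http://localhost:61005",
    "CLOVIS_MODEL": "Qwen/Qwen3-32B-AWQ",
}


def load_settings(env: Mapping[str, str] = os.environ) -> dict:
    settings: dict = {key: env.get(key, default) for key, default in DEFAULTS.items()}
    settings["CLOVIS_API_KEY"] = env.get("CLOVIS_API_KEY")
    # Web search is only available when a SearXNG instance is configured.
    settings["web_search_enabled"] = bool(env.get("SEARXNG_URL"))
    return settings


def auth_headers(api_key: Optional[str]) -> dict[str, str]:
    """Bearer-token header for a server started with --key (empty if no key)."""
    return {"Authorization": f"Bearer {api_key}"} if api_key else {}
```

Pass `headers=auth_headers(load_settings()["CLOVIS_API_KEY"])` to each `httpx` call.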

---

## Async usage

To send concurrent requests to the API server from async code, use `httpx.AsyncClient`:

```python
import asyncio
import httpx

async def ask(prompt: str) -> str:
    async with httpx.AsyncClient(timeout=300) as client:
        r = await client.post("http://localhost:8000/ia", json={"prompt": prompt})
        return r.json()["response"]

async def main():
    results = await asyncio.gather(
        ask("What is Python?"),
        ask("What is Rust?"),
        ask("What is Go?"),
    )
    for r in results:
        print(r[:80])

asyncio.run(main())
```
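The example above parallelizes HTTP calls. To run the synchronous Python client (`ai = cloooooo()`) from async code instead, hand each blocking call to a worker thread with `asyncio.to_thread`, which wraps `loop.run_in_executor` with the default executor. In this sketch `blocking_ask` is a hypothetical stand-in for `ai(prompt)`, and it assumes the client is safe to call from several threads at once, which is not documented:

```python
import asyncio
import time


def blocking_ask(prompt: str) -> str:
    """Hypothetical stand-in for a synchronous call such as ai(prompt)."""
    time.sleep(0.1)  # simulate network and model latency
    return f"answer to: {prompt}"


async def ask_all(prompts: list[str]) -> list[str]:
    # asyncio.to_thread runs each blocking call in the default thread pool,
    # so the event loop stays responsive while the calls run concurrently.
    return await asyncio.gather(*(asyncio.to_thread(blocking_ask, p) for p in prompts))


if __name__ == "__main__":
    for answer in asyncio.run(ask_all(["What is Python?", "What is Rust?"])):
        print(answer)
```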

---

## Streaming

Text-generating endpoints such as `/ia` accept `"stream": true`. Streaming responses use `Content-Type: text/plain` and emit tokens as they are generated.

```python
import httpx

with httpx.stream("POST", "http://localhost:8000/ia", json={
    "prompt": "Write a detailed explanation of CRISPR-Cas9",
    "stream": True,
    "thinking": True,
}, timeout=300) as r:
    for chunk in r.iter_text():
        print(chunk, end="", flush=True)
```

For `deep_thinking` streaming, progress markers are emitted every 5 seconds:

```
[deep_thinking... 5s]
[deep_thinking... 10s]
[deep_thinking... 55s]
{"answer": "...", "sources": [...], "fallback_used": false}
```

---

## License

MIT — [Clovis Sfeir](https://github.com/clovis-sfeir)
