Metadata-Version: 2.4
Name: clovis
Version: 0.5.12
Summary: Clovis's AI server, AKA cloclo
Author: Clovis Sfeir
License: MIT
Keywords: ai,llm,local-ai,ollama,openai
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.10
Requires-Dist: bm25s>=0.2
Requires-Dist: ddgs>=0.1
Requires-Dist: fastapi>=0.111
Requires-Dist: fastembed>=0.3
Requires-Dist: httpx>=0.27
Requires-Dist: jsonschema>=4.0
Requires-Dist: lancedb>=0.6
Requires-Dist: ollama>=0.3
Requires-Dist: pillow>=10.0
Requires-Dist: pydantic>=2.0
Requires-Dist: pymupdf>=1.24
Requires-Dist: python-docx>=1.1
Requires-Dist: rich>=13.0
Requires-Dist: trafilatura>=2.0
Requires-Dist: typer>=0.12
Requires-Dist: uvicorn[standard]>=0.30
Description-Content-Type: text/markdown

# clovis

<p align="center">
  <strong>Web Search · Deep Research · RAG · Embeddings · Structured Outputs</strong>
</p>

<p align="center">
  <a href="https://pypi.org/project/clovis/"><img src="https://img.shields.io/pypi/v/clovis?color=blue" alt="PyPI version"></a>
  <a href="https://pypi.org/project/clovis/"><img src="https://img.shields.io/pypi/pyversions/clovis" alt="Python versions"></a>
  <a href="https://opensource.org/licenses/MIT"><img src="https://img.shields.io/badge/License-MIT-green.svg" alt="License"></a>
  <a href="https://pypi.org/project/clovis/"><img src="https://img.shields.io/pypi/dm/clovis" alt="Downloads"></a>
</p>

---

**clovis** is a Python client for the cloooooo.com AI API — and optionally a self-hosted server for those running their own GPU stack. It ships with multi-step web research, a full RAG pipeline, vector embeddings, reranking, structured JSON outputs, vision, and an agentic deep-research mode — all accessible via a single HTTP endpoint.

## Features

- **Simple inference** — one-line calls with streaming, negative prompts, and extended reasoning
- **Web search** — live results injected into context, always date-aware
- **Deep thinking** — multi-step agentic research pipeline with source citations
- **Ultra deep thinking** — multi-axis research with automated gap analysis, 280+ sources synthesized into a structured report
- **RAG** — ingest PDF, DOCX, TXT documents; semantic search over your corpus
- **Embeddings** — 768-dim dense vectors
- **Reranking** — cross-encoder reranking of document candidates
- **Structured output** — JSON Schema-constrained generation
- **Vision** — image description from URL, file path, or base64
- **Auto-routing** — automatic mode selection based on query type
- **Conversation memory** — short-term history per conversation ID

---

## Installation

```bash
pip install clovis
```

**Requires Python 3.10+**

---

## Quick start

No local setup required — connect directly to the hosted server:

```python
from clovis import cloooooo

# Option A — Hosted server (no GPU, no config)
ai = cloooooo(base_url="http://cloooooo.com")
print(ai("Explain transformer architecture"))

# Option B — Self-hosted (see "Self-hosting" section below)
ai = cloooooo()  # connects to localhost:61005
```

```python
# With options
response = ai(
    "Write a sonnet about entropy",
    negative_prompt="no rhymes",
    thinking=True,
    context="You are a physicist who loves poetry.",
)

# Streaming
for token in ai.stream("Describe the Big Bang in detail"):
    print(token, end="", flush=True)

# Multi-turn conversation
conv = ai.conversation(context="You are a senior software engineer.")
conv("Explain dependency injection")
conv("Show me a Python example")
conv("How would you test it?")
```

---

## API server

All `POST` endpoints accept a JSON request body (`Content-Type: application/json`). Streaming responses are returned as `text/plain`.

Base URL: `http://cloooooo.com` (hosted) or `http://localhost:8000` (self-hosted).

---

### `POST /ia` — Universal endpoint

The main endpoint. Handles all inference modes.

```bash
curl -X POST http://cloooooo.com/ia \
  -H "Content-Type: application/json" \
  -d '{"prompt": "What is quantum entanglement?", "use_web": true}'
```

#### Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `prompt` | `str` | **required** | The question or instruction |
| `mode` | `str` | `null` | `"simple"` · `"deep_thinking"` · `"ultra_deep_thinking"` |
| `use_web` | `bool` | `false` | Inject live web search results with current date |
| `thinking` | `bool` | `false` | Enable extended reasoning (chain-of-thought) |
| `stream` | `bool` | `false` | Stream tokens via `text/plain` |
| `use_memory` | `bool` | `false` | Load and save conversation history |
| `conversation_id` | `str` | `null` | Key for conversation memory |
| `context` | `str` | `null` | System-level context injected before the prompt |
| `negative_prompt` | `str` | `null` | Instructions for what to avoid |

#### Response

```json
{"response": "Quantum entanglement is a phenomenon where..."}
```

For `deep_thinking`:

```json
{
  "answer": "...",
  "sources": ["https://...", "https://..."],
  "model_used": "miroflow:Qwen/Qwen3-32B-AWQ",
  "fallback_used": false
}
```

For `ultra_deep_thinking` (async job):

```json
{"job_id": "b45d79...", "status": "pending", "poll": "/job/b45d79..."}
```
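
Because the response shape depends on the mode, client code may want to normalize it before use. The sketch below assumes the three shapes shown above are the only ones returned by `/ia`; `summarize_ia_response` is a hypothetical helper, not a clovis function.

```python
def summarize_ia_response(data):
    """Return (kind, value) for the three /ia response shapes documented above."""
    if "job_id" in data:
        # ultra_deep_thinking: nothing to read yet, only a path to poll
        return "job", data["poll"]
    if "answer" in data:
        # deep_thinking: the answer comes with a list of sources
        return "deep_thinking", data["answer"]
    # simple mode: a single "response" field
    return "simple", data["response"]


kind, value = summarize_ia_response({"response": "Quantum entanglement is..."})
print(kind, value)
```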

---

### Modes

#### `simple` — Direct inference

Fast, direct LLM call. Optionally augmented with web search or reasoning.

```python
import httpx

r = httpx.post("http://cloooooo.com/ia", json={
    "prompt": "Latest news on fusion energy",
    "use_web": True,
    "thinking": True,
})
print(r.json()["response"])
```

---

#### `deep_thinking` — Agentic web research

Multi-step research pipeline. Performs web searches, reasons over the results, and returns a structured answer with source citations.

```python
import httpx

r = httpx.post("http://cloooooo.com/ia", json={
    "prompt": "What are the geopolitical implications of AGI development?",
    "mode": "deep_thinking",
}, timeout=300)

data = r.json()
print(data["answer"])
print(data["sources"])
```

In streaming mode, the server sends progress updates followed by the final JSON:

```bash
curl -N -X POST http://cloooooo.com/ia \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Impact of interest rates on tech stocks", "mode": "deep_thinking", "stream": true}'

# [deep_thinking... 5s]
# [deep_thinking... 10s]
# ...
# {"answer": "...", "sources": [...], "fallback_used": false}
```
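
On the client side, the streamed text has to be split into progress markers and the final result. The following sketch assumes the stream looks exactly like the sample above: markers of the form `[deep_thinking... Ns]` followed by a single JSON object. How the server chunks the stream is not documented here, so the helper works on the fully accumulated text. `parse_deep_thinking_stream` is a hypothetical name.

```python
import json
import re

# Matches progress markers such as "[deep_thinking... 5s]".
PROGRESS = re.compile(r"\[deep_thinking\.\.\. (\d+)s\]")


def parse_deep_thinking_stream(text):
    """Split a completed stream into elapsed-second markers and the final JSON result."""
    seconds = [int(s) for s in PROGRESS.findall(text)]
    # Assumption: the first "{" starts the final JSON object.
    start = text.find("{")
    result = json.loads(text[start:]) if start != -1 else None
    return seconds, result


sample = (
    "[deep_thinking... 5s]\n"
    "[deep_thinking... 10s]\n"
    '{"answer": "ok", "sources": [], "fallback_used": false}\n'
)
seconds, result = parse_deep_thinking_stream(sample)
```

With httpx, the accumulated text can be obtained with `"".join(r.iter_text())` inside an `httpx.stream(...)` block.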

---

#### `ultra_deep_thinking` — Multi-axis deep research

The most thorough mode. It decomposes the question into research axes, searches each axis independently, identifies knowledge gaps, fills them, and then synthesizes a structured report. A typical report is 10,000–15,000 characters long and cites 250–300 unique sources. The request runs as an async job that takes 15–25 minutes.

```python
import httpx, time

# 1. Submit
r = httpx.post("http://cloooooo.com/ia", json={
    "prompt": "How does reinforcement learning from human feedback (RLHF) work?",
    "mode": "ultra_deep_thinking",
})
job_id = r.json()["job_id"]

# 2. Poll until done
while True:
    job = httpx.get(f"http://cloooooo.com/job/{job_id}").json()
    print(f"status: {job['status']} | {len(job['progress'])} steps logged")
    if job["status"] == "done":
        print(job["result"]["answer"][:500])
        break
    time.sleep(30)
```

Progress messages visible at each poll (the server emits them in French):

```
[décomposition] 5 axes : ['Définition', 'Historique', ...]
[axe:Définition] recherche en cours...
[axe:Définition] OK — 4 127 chars, 36 sources
...
[lacunes 1/2] 5 lacunes identifiées → 5/5 comblées
[synthèse] 15 sections · 65 000 chars · 295 sources...
[terminé] 14 067 chars · 280 sources uniques
```
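
The polling loop shown earlier runs until the job reports `done`, so it never returns if the job stalls. Below is a sketch of a bounded variant, assuming only the `status` and `result` fields shown above. The helper name `wait_for_job`, the 45-minute default timeout, and the injectable `fetch_job` callable are illustrative choices, not part of clovis.

```python
import time


def wait_for_job(fetch_job, job_id, interval=30.0, timeout=45 * 60,
                 sleep=time.sleep, clock=time.monotonic):
    """Poll fetch_job(job_id) until its status is "done" or the timeout expires.

    fetch_job is any callable returning the decoded JSON of GET /job/{id}.
    """
    deadline = clock() + timeout
    while True:
        job = fetch_job(job_id)
        if job["status"] == "done":
            return job["result"]
        if clock() >= deadline:
            raise TimeoutError(f"job {job_id} still {job['status']!r} after {timeout}s")
        sleep(interval)


# Usage with httpx (requires network access, so left commented out):
# import httpx
# result = wait_for_job(
#     lambda job_id: httpx.get(f"http://cloooooo.com/job/{job_id}").json(),
#     "b45d79...",
# )
```

Passing the fetch function in as an argument keeps the loop independent of the HTTP client and makes it easy to exercise with canned responses.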

Presets (via `/ultra_deep_thinking` endpoint):

| Preset | Axes | Gap rounds | Duration |
|--------|------|-----------|----------|
| `fast` | 3 | 1 | ~5 min |
| `deep` *(default)* | 5 | 2 | ~15 min |
| `ultra` | 8 | 3 | ~30 min |

---

### `GET /health` — Server status

```bash
curl http://cloooooo.com/health
```

```json
{
  "status": "ok",
  "version": "0.5.12",
  "model": "Qwen/Qwen3-32B-AWQ",
  "modes": ["simple", "search", "thinking", "deep_thinking", "ultra_deep_thinking", "embed", "rerank", "vision"]
}
```

---

### `POST /embed` — Text embeddings

```python
import httpx

r = httpx.post("http://cloooooo.com/embed", json={
    "texts": ["Hello world", "Machine learning basics"],
})
print(r.json()["dim"])          # 768
print(len(r.json()["embeddings"]))  # 2
```
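
A common next step is to compare the returned vectors. The sketch below computes cosine similarity in plain Python, assuming `embeddings` is a list of equal-length float lists in the same order as `texts`. Whether the server already normalizes its vectors is not stated, so the norms are computed explicitly. Both function names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec, vectors):
    """Return the indices of vectors, most similar to query_vec first."""
    scores = [cosine_similarity(query_vec, v) for v in vectors]
    return sorted(range(len(vectors)), key=lambda i: scores[i], reverse=True)


# Toy 2-dimensional vectors stand in for the 768-dimensional embeddings.
order = rank_by_similarity([1.0, 0.0], [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
```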

---

### `POST /rerank` — Document reranking

```python
import httpx

r = httpx.post("http://cloooooo.com/rerank", json={
    "query": "machine learning optimization",
    "documents": [
        "Gradient descent is an optimization algorithm for ML",
        "The weather in Paris is sunny today",
        "Adam optimizer adapts learning rates per parameter",
    ],
    "top_k": 2,
})
for item in r.json()["results"]:
    print(f"{item['score']:.3f}  {item['document'][:60]}")
```

---

### `POST /structured` — JSON Schema output

```python
import httpx

r = httpx.post("http://cloooooo.com/structured", json={
    "prompt": "Describe the movie Inception",
    "schema": {
        "type": "object",
        "properties": {
            "title": {"type": "string"},
            "year": {"type": "integer"},
            "genres": {"type": "array", "items": {"type": "string"}},
        },
        "required": ["title", "year", "genres"],
    },
})
print(r.json()["result"])
# {"title": "Inception", "year": 2010, "genres": ["sci-fi", "thriller"]}
```

---

### `POST /vision` — Image understanding

```python
import httpx

r = httpx.post("http://cloooooo.com/vision", json={
    "image": "https://example.com/photo.jpg",
    "prompt": "What objects do you see?",
})
print(r.json()["response"])
```
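
The feature list mentions base64 input as an alternative to a URL. The following sketch shows one way to encode a local file for the `image` field. It assumes the server accepts a bare base64 string; it may instead expect a data URI, which this README does not specify. Both helper names are hypothetical.

```python
import base64
from pathlib import Path


def image_to_base64(path):
    """Read an image file and return its bytes as an ASCII base64 string."""
    return base64.b64encode(Path(path).read_bytes()).decode("ascii")


def build_vision_payload(path, prompt):
    """Build a JSON body for POST /vision from a local image file."""
    # Assumption: "image" accepts the bare base64 string (format unconfirmed).
    return {"image": image_to_base64(path), "prompt": prompt}
```

The resulting dictionary can be passed as `json=` to `httpx.post("http://cloooooo.com/vision", ...)`.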

---

### `POST /rag/ingest` + `POST /rag/ask` — RAG

```python
import httpx

httpx.post("http://cloooooo.com/rag/ingest", json={"path": "/path/to/doc.pdf"})

r = httpx.post("http://cloooooo.com/rag/ask", json={
    "question": "What are the main conclusions?",
    "top_k": 5,
})
print(r.json()["response"])
```

Supported formats: PDF, DOCX, TXT, Markdown.
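
When ingesting a whole folder, it can help to filter out unsupported files before calling `/rag/ingest`. This is a minimal sketch; the extension set is an assumption derived from the format list above (the README names formats, not file extensions), and `ingestable` is a hypothetical helper.

```python
from pathlib import Path

# Assumed mapping from the documented formats to file extensions.
SUPPORTED_SUFFIXES = {".pdf", ".docx", ".txt", ".md"}


def ingestable(paths):
    """Keep only the paths whose extension matches a supported format."""
    return [str(p) for p in map(Path, paths) if p.suffix.lower() in SUPPORTED_SUFFIXES]


candidates = ["report.PDF", "diagram.png", "notes.md", "contract.docx"]
to_ingest = ingestable(candidates)
```

Each remaining path can then be posted individually as `{"path": ...}` to `/rag/ingest`.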

---

### Other endpoints

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/job/{id}` | GET | Poll async job status + progress |
| `/route` | POST | Auto-select the best mode for a prompt |
| `/deep_think` | POST | Standalone iterative deep research (streaming) |
| `/tools` | GET | List available tools |
| `/rag/sources` | GET | List ingested documents |
| `/openapi.json` | GET | OpenAPI schema |
| `/docs` | GET | Interactive API documentation |

---

## Streaming

```python
import httpx

with httpx.stream("POST", "http://cloooooo.com/ia", json={
    "prompt": "Write a detailed explanation of CRISPR-Cas9",
    "stream": True,
}) as r:
    for chunk in r.iter_text():
        print(chunk, end="", flush=True)
```

---

## Self-hosting

> The components described below already run server-side on cloooooo.com. **No local installation is required** if you use the hosted server.

To run your own instance (requires an NVIDIA GPU with 24 GB+ VRAM):

```bash
# 1. Start SGLang with Qwen3-32B-AWQ
python -m sglang.launch_server \
  --model-path Qwen/Qwen3-32B-AWQ \
  --port 61005 \
  --quantization awq

# 2. Start SearXNG (for web search modes)
docker run -p 8888:8080 searxng/searxng

# 3. Start clovis
clovis serve --port 8000
```

```bash
# Environment variables
export CLOVIS_LOCAL_URL="http://localhost:61005"
export CLOVIS_MODEL="Qwen/Qwen3-32B-AWQ"
export CLOVIS_API_KEY="sk-..."
export SEARXNG_URL="http://localhost:8888"
```

Then point the client to your server:

```python
ai = cloooooo(base_url="http://your-server:8000")
```

---

## License

MIT — [Clovis Sfeir](https://github.com/clovis-sfeir)
