Metadata-Version: 2.4
Name: sulci
Version: 0.2.5
Summary: The AI-native, context-aware semantic cache for LLM apps — stop paying for the same answer twice
License: MIT License
        
        Copyright (c) 2026 [sulci.io]
        
        Permission is hereby granted, free of charge, to any person obtaining a copy
        of this software and associated documentation files (the "Software"), to deal
        in the Software without restriction, including without limitation the rights
        to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
        copies of the Software, and to permit persons to whom the Software is
        furnished to do so, subject to the following conditions:
        
        The above copyright notice and this permission notice shall be included in all
        copies or substantial portions of the Software.
        
        THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
        IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
        FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
        AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
        LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
        OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
        SOFTWARE.
        
Project-URL: Homepage, https://sulci.io
Project-URL: Repository, https://github.com/sulci-io/sulci-oss
Project-URL: Documentation, https://github.com/sulci-io/sulci-oss#readme
Project-URL: Bug Tracker, https://github.com/sulci-io/sulci-oss/issues
Keywords: sulci-cache,ai-native,ai-cache,context-aware,semantic-cache,llm,ai,anthropic,openai,vector-search,langchain,cost-optimization,cache,rag
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Software Development :: Libraries
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
License-File: NOTICE
Provides-Extra: chroma
Requires-Dist: chromadb>=0.4.0; extra == "chroma"
Requires-Dist: sentence-transformers>=2.2.0; extra == "chroma"
Provides-Extra: qdrant
Requires-Dist: qdrant-client>=1.7.0; extra == "qdrant"
Requires-Dist: sentence-transformers>=2.2.0; extra == "qdrant"
Provides-Extra: faiss
Requires-Dist: faiss-cpu>=1.7.4; extra == "faiss"
Requires-Dist: sentence-transformers>=2.2.0; extra == "faiss"
Provides-Extra: redis
Requires-Dist: redis>=5.0.0; extra == "redis"
Requires-Dist: redisvl>=0.1.0; extra == "redis"
Requires-Dist: sentence-transformers>=2.2.0; extra == "redis"
Provides-Extra: sqlite
Requires-Dist: sentence-transformers>=2.2.0; extra == "sqlite"
Provides-Extra: milvus
Requires-Dist: pymilvus>=2.3.0; extra == "milvus"
Requires-Dist: sentence-transformers>=2.2.0; extra == "milvus"
Provides-Extra: openai
Requires-Dist: openai>=1.0.0; extra == "openai"
Provides-Extra: all
Requires-Dist: chromadb>=0.4.0; extra == "all"
Requires-Dist: qdrant-client>=1.7.0; extra == "all"
Requires-Dist: faiss-cpu>=1.7.4; extra == "all"
Requires-Dist: redis>=5.0.0; extra == "all"
Requires-Dist: redisvl>=0.1.0; extra == "all"
Requires-Dist: pymilvus>=2.3.0; extra == "all"
Requires-Dist: sentence-transformers>=2.2.0; extra == "all"
Requires-Dist: openai>=1.0.0; extra == "all"
Provides-Extra: dev
Requires-Dist: pytest>=7.0; extra == "dev"
Requires-Dist: pytest-cov; extra == "dev"
Requires-Dist: build; extra == "dev"
Requires-Dist: twine; extra == "dev"
Dynamic: license-file

# Sulci

**The AI-native, context-aware semantic cache for LLM apps — stop paying for the same answer twice**

[![Tests](https://github.com/sulci-io/sulci-oss/actions/workflows/tests.yml/badge.svg)](https://github.com/sulci-io/sulci-oss/actions/workflows/tests.yml)
[![PyPI](https://img.shields.io/pypi/v/sulci)](https://pypi.org/project/sulci/)
[![Python](https://img.shields.io/pypi/pyversions/sulci)](https://pypi.org/project/sulci/)
[![License: MIT](https://img.shields.io/badge/License-MIT-green.svg)](./LICENSE)

Sulci is a drop-in Python library that caches LLM responses by **semantic meaning**, not exact string match. When a user asks _"How do I deploy to AWS?"_ and someone else later asks _"What's the process for deploying on AWS?"_, Sulci returns the cached answer instead of calling the LLM again — saving cost and latency.

---

## Why Sulci

| Without Sulci                | With Sulci                                               |
| ---------------------------- | -------------------------------------------------------- |
| Every query hits the LLM API | Semantically similar queries return instantly from cache |
| $0.005 per call, every time  | Cache hits cost ~$0.0001 (embedding only)                |
| 1–3 second response time     | Cache hits return in <10ms                               |
| No memory across sessions    | Context-aware: understands conversation history          |

**Benchmark results (v0.2.1, 5,000 queries):**

- Overall hit rate: **85.9%**
- Hit latency p50: **0.74ms** (vs ~1,840ms for a live LLM call)
- Cost saved per 10k queries: **$21.47**
- Context-aware mode: **+20.8pp resolution accuracy** over stateless

---

## Install

```bash
pip install "sulci[sqlite]"    # SQLite — zero infra, local dev
pip install "sulci[chroma]"    # ChromaDB
pip install "sulci[faiss]"     # FAISS
pip install "sulci[qdrant]"    # Qdrant
pip install "sulci[redis]"     # Redis + RedisVL
pip install "sulci[milvus]"    # Milvus Lite
```

> **zsh users:** always wrap extras in quotes — `"sulci[sqlite]"`, not `sulci[sqlite]`.

---

## Quickstart

### Stateless (v0.1 style)

```python
from sulci import Cache

cache = Cache(backend="sqlite", threshold=0.85)

# store a response
cache.set("How do I deploy to AWS?", "Use the AWS CLI with 'aws deploy'...")

# exact or semantic hit — returns 3-tuple
response, similarity, context_depth = cache.get("What's the process for deploying on AWS?")

if response:
    print(f"Cache hit (sim={similarity:.2f}): {response}")
else:
    # call your LLM here
    pass
```

### Context-aware (v0.2 style)

```python
from sulci import Cache

cache = Cache(
    backend        = "sqlite",
    threshold      = 0.85,
    context_window = 4,     # remember last 4 turns
    query_weight   = 0.70,  # α — weight of current query vs context
    context_decay  = 0.50,  # halve weight per older turn
)

# turn 1
cache.set("What is Python?", "Python is a high-level programming language.", session_id="s1")

# turn 2 — context from turn 1 blended into the lookup vector
response, sim, depth = cache.get("Tell me more about it", session_id="s1")
```

### Drop-in with `cached_call`

```python
import anthropic
from sulci import Cache

cache = Cache(backend="sqlite", threshold=0.85, context_window=4)
client = anthropic.Anthropic()

def call_llm(prompt: str) -> str:
    msg = client.messages.create(
        model="claude-opus-4-6",
        max_tokens=1024,
        messages=[{"role": "user", "content": prompt}]
    )
    return msg.content[0].text

result = cache.cached_call(
    query         = "How do I deploy to AWS?",
    llm_fn        = call_llm,
    session_id    = "user-123",
    cost_per_call = 0.005,
)

print(result["response"])
print(f"Source:  {result['source']}")        # "cache" or "llm"
print(f"Latency: {result['latency_ms']}ms")
print(f"Saved:   ${result['saved_cost']:.4f}")
```

---

## API Reference

### Constructor

```python
cache = Cache(
    backend         = "sqlite",   # sqlite | chroma | faiss | qdrant | redis | milvus
    threshold       = 0.85,       # cosine similarity cutoff (0–1)
    embedding_model = "minilm",   # minilm | openai
    ttl_seconds     = None,       # None = no expiry
    personalized    = False,      # partition cache per user_id
    db_path         = "./sulci",  # on-disk path for sqlite / faiss
    context_window  = 0,          # turns to remember; 0 = stateless
    query_weight    = 0.70,       # α in blending formula
    context_decay   = 0.50,       # per-turn decay weight
    session_ttl     = 3600,       # session expiry in seconds
)
```
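
For example, `personalized` and `ttl_seconds` compose: entries are partitioned per `user_id` and expire after the TTL. A hypothetical setup (the queries and IDs here are illustrative, the parameters are the ones documented above):

```python
from sulci import Cache

# per-user cache partitions whose entries expire after one hour
cache = Cache(backend="sqlite", personalized=True, ttl_seconds=3600)

cache.set("What's my current plan?", "You're on the Pro plan.", user_id="u-42")

# lookups are scoped to the same user_id, so other users never see this entry
response, similarity, depth = cache.get("Which plan am I on?", user_id="u-42")
```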

### Methods

| Method                                                                                 | Returns                   | Description                                                        |
| -------------------------------------------------------------------------------------- | ------------------------- | ------------------------------------------------------------------ |
| `cache.get(query, user_id=None, session_id=None)`                                      | `(str\|None, float, int)` | response, similarity, context_depth                                |
| `cache.set(query, response, user_id=None, session_id=None)`                            | `None`                    | Store entry, advance context window                                |
| `cache.cached_call(query, llm_fn, session_id=None, user_id=None, cost_per_call=0.005)` | `dict`                    | response, source, similarity, latency_ms, saved_cost, cache_hit, context_depth |
| `cache.get_context(session_id)`                                                        | `ContextWindow`           | Return session's context window                                    |
| `cache.clear_context(session_id)`                                                      | `None`                    | Reset session history                                              |
| `cache.context_summary(session_id=None)`                                               | `dict`                    | Snapshot of one or all sessions                                    |
| `cache.stats()`                                                                        | `dict`                    | hits, misses, hit_rate, saved_cost, total_queries, active_sessions |
| `cache.clear()`                                                                        | `None`                    | Evict all entries, reset stats and sessions                        |

> **Important:** `cache.get()` returns a **3-tuple** `(response, similarity, context_depth)` — not a 2-tuple like v0.1. Always unpack all three values.
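
The observability methods slot into logging with no extra setup. A small sketch using the keys from the table above (the formatting assumes `hit_rate` is a 0–1 fraction, which is an assumption):

```python
stats = cache.stats()
print(f"hit rate: {stats['hit_rate']:.1%} over {stats['total_queries']} queries")
print(f"saved so far: ${stats['saved_cost']:.4f}")

# reset one session's conversation history without evicting cached entries
cache.clear_context("user-123")
```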

---

## Context-Aware Blending

When `context_window > 0`, Sulci blends the current query vector with recent
conversation history before performing the similarity lookup:

```
lookup_vec = α · embed(query) + (1−α) · Σ(decay^i · turn_i)
```

- `α` = `query_weight` (default **0.70**) — how much the current query dominates
- `decay` = `context_decay` (default **0.50**) — halves weight per older turn
- Only **user query** vectors are stored in context (not LLM responses)
- Raw, un-blended vectors are stored in the cache; blending happens only at lookup time (see the sketch below)
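
In NumPy terms the blending is roughly the following (a minimal sketch of the formula above; the final re-normalization is our assumption, and the real logic lives in `sulci/core.py`):

```python
import numpy as np

def blend_lookup_vector(query_vec, context_vecs, query_weight=0.70, context_decay=0.50):
    """Blend the current query embedding with decayed prior-turn embeddings."""
    if not context_vecs:
        return query_vec  # stateless: no context yet, or context_window=0
    # decay^i weight for the i-th most recent prior user-query vector
    decays = np.array([context_decay**i for i in range(len(context_vecs))])
    context = (decays[:, None] * np.asarray(context_vecs)).sum(axis=0)
    blended = query_weight * np.asarray(query_vec) + (1 - query_weight) * context
    # re-normalize so cosine similarity against the raw stored vectors stays comparable
    return blended / np.linalg.norm(blended)
```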

**Context-aware benchmark results (800 conversation pairs, context_window=4):**

| Domain              | Stateless | Context-aware | Δ           |
| ------------------- | --------- | ------------- | ----------- |
| customer_support    | 32%       | 88%           | **+56pp**   |
| developer_qa        | 80%       | 96%           | +16pp       |
| medical_information | 40%       | 60%           | +20pp       |
| **overall**         | **64.0%** | **81.6%**     | **+17.6pp** |

---

## Backends

| Backend         | ID       | Hit latency | Best for                                |
| --------------- | -------- | ----------- | --------------------------------------- |
| SQLite          | `sqlite` | <8ms        | Local dev, edge, serverless, zero infra |
| ChromaDB        | `chroma` | <10ms       | Fastest path to working, Python-native  |
| FAISS           | `faiss`  | <3ms        | GPU acceleration, massive scale         |
| Qdrant          | `qdrant` | <5ms        | Production, metadata filtering          |
| Redis + RedisVL | `redis`  | <1ms        | Existing Redis infra, lowest latency    |
| Milvus Lite     | `milvus` | <7ms        | Dev-to-prod without code changes        |

All backends are free tier or self-hostable at zero cost.
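
Moving between backends is a one-argument change to the constructor, after installing the matching extra (a sketch; `db_path` applies only to the sqlite and faiss backends, per the API reference above):

```python
from sulci import Cache

# requires: pip install "sulci[faiss]"
cache = Cache(backend="faiss", threshold=0.85, db_path="./sulci")
```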

---

## Embedding Models

| ID       | Model                  | Dims | Latency | Notes                                        |
| -------- | ---------------------- | ---- | ------- | -------------------------------------------- |
| `minilm` | all-MiniLM-L6-v2       | 384  | 14ms    | **Default** — free, local, excellent quality |
| `openai` | text-embedding-3-small | 1536 | ~100ms  | Requires `OPENAI_API_KEY`                    |

The default `minilm` model runs entirely locally via `sentence-transformers`.
No network calls are made unless you explicitly configure `embedding_model="openai"`.
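
Opting into the hosted embedder is a single constructor argument (this assumes `OPENAI_API_KEY` is exported in your shell; the model IDs come from the table above):

```python
import os

from sulci import Cache

assert "OPENAI_API_KEY" in os.environ, "required for embedding_model='openai'"
cache = Cache(backend="sqlite", embedding_model="openai", threshold=0.85)
```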

---

## Project Structure

```
.
├── CHANGELOG.md
├── CONTRIBUTING.md
├── LICENSE
├── LOCAL_SETUP.md
├── README.md
├── benchmark
│   ├── README.md               ← benchmark methodology and results
│   └── run.py                  ← benchmark CLI
├── examples
│   ├── anthropic_example.py     ← requires ANTHROPIC_API_KEY
│   ├── basic_usage.py           ← stateless cache demo, no API key needed
│   ├── context_aware.py         ← 4-demo walkthrough, fully offline
│   └── context_aware_example.py ← additional context-aware patterns
├── pyproject.toml              ← name="sulci", version="0.2.5"
├── setup.py
├── sulci
│   ├── __init__.py             ← exports Cache, ContextWindow, SessionStore
│   ├── backends
│   │   ├── __init__.py
│   │   ├── chroma.py
│   │   ├── faiss.py
│   │   ├── milvus.py
│   │   ├── qdrant.py
│   │   ├── redis.py
│   │   └── sqlite.py
│   ├── context.py              ← ContextWindow + SessionStore
│   ├── core.py                 ← Cache engine (context-aware)
│   └── embeddings
│       ├── __init__.py
│       ├── minilm.py           ← default: all-MiniLM-L6-v2 (free, local)
│       └── openai.py           ← requires OPENAI_API_KEY
└── tests
    ├── test_backends.py        —  9 tests: per-backend contract + persistence
    ├── test_context.py         — 35 tests: ContextWindow, SessionStore, integration
    └── test_core.py            — 27 tests: cache.get/set, TTL, stats, personalization

7 directories, 29 files
```

---

## Running Tests

```bash
# full suite — 71 tests total
python -m pytest tests/ -v

# by file
python -m pytest tests/test_core.py -v       # 27 tests
python -m pytest tests/test_context.py -v    # 35 tests
python -m pytest tests/test_backends.py -v   #  9 tests (skipped if dep missing)

# single backend only
python -m pytest tests/test_backends.py -v -k sqlite
python -m pytest tests/test_backends.py -v -k chroma

# with coverage
python -m pytest tests/ -v --cov=sulci --cov-report=term-missing
```

Backend tests are **skipped — not failed** when their dependency isn't installed.
Install the backend extra to run its tests: `pip install -e ".[chroma]"`.

See [`LOCAL_SETUP.md`](./LOCAL_SETUP.md) for the full local development guide including
venv setup, backend installation, smoke testing, and troubleshooting.

---

## Benchmark

```bash
# fast run (~30 seconds)
python benchmark/run.py --no-sweep --queries 1000

# with context-aware pass
python benchmark/run.py --no-sweep --queries 1000 --context

# full benchmark
python benchmark/run.py --context
```

See [`benchmark/README.md`](./benchmark/README.md) for full methodology and results.

---

## Contributing

See [`CONTRIBUTING.md`](./CONTRIBUTING.md) for branching model, PR process, and coding standards.

---

## License

MIT — see [`LICENSE`](./LICENSE).

---

## Links

- **PyPI:** [sulci](https://pypi.org/project/sulci/)
- **GitHub:** [sulci-io/sulci-oss](https://github.com/sulci-io/sulci-oss)
- **Issues:** [github.com/sulci-io/sulci-oss/issues](https://github.com/sulci-io/sulci-oss/issues)
