Metadata-Version: 2.4
Name: pragma-ai
Version: 1.0.1
Summary: Atomic-fact reasoning over a knowledge graph. A RAG alternative that needs no vector database.
Project-URL: Homepage, https://github.com/kbpr21/pragma
Project-URL: Repository, https://github.com/kbpr21/pragma
Project-URL: Documentation, https://github.com/kbpr21/pragma#readme
Project-URL: Issues, https://github.com/kbpr21/pragma/issues
Project-URL: Changelog, https://github.com/kbpr21/pragma/blob/main/CHANGELOG.md
Author: pragma contributors
License: MIT
License-File: LICENSE
Keywords: atomic-facts,graphrag,knowledge-graph,llm,networkx,rag,reasoning,retrieval,sqlite
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Information Technology
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Text Processing :: Linguistic
Classifier: Typing :: Typed
Requires-Python: >=3.9
Requires-Dist: httpx>=0.27
Requires-Dist: networkx>=3.0
Requires-Dist: pyyaml>=6.0
Requires-Dist: rank-bm25>=0.2.2
Requires-Dist: rapidfuzz>=3.0
Requires-Dist: rich>=13.0
Requires-Dist: typer>=0.12
Provides-Extra: all
Requires-Dist: beautifulsoup4>=4.12; extra == 'all'
Requires-Dist: chardet>=5.0; extra == 'all'
Requires-Dist: fastapi>=0.110; extra == 'all'
Requires-Dist: numpy>=1.24; extra == 'all'
Requires-Dist: pdfplumber>=0.10; extra == 'all'
Requires-Dist: python-docx>=1.0; extra == 'all'
Requires-Dist: pyvis>=0.3; extra == 'all'
Requires-Dist: sentence-transformers>=3.0; extra == 'all'
Requires-Dist: uvicorn>=0.29; extra == 'all'
Provides-Extra: dev
Requires-Dist: mkdocs-material>=9.5; extra == 'dev'
Requires-Dist: mkdocstrings[python]>=0.24; extra == 'dev'
Requires-Dist: mypy>=1.9; extra == 'dev'
Requires-Dist: pre-commit>=3.7; extra == 'dev'
Requires-Dist: pytest-asyncio>=0.23; extra == 'dev'
Requires-Dist: pytest-cov>=4.0; extra == 'dev'
Requires-Dist: pytest>=8.0; extra == 'dev'
Requires-Dist: respx>=0.21; extra == 'dev'
Requires-Dist: ruff>=0.4; extra == 'dev'
Requires-Dist: types-pyyaml>=1.0; extra == 'dev'
Provides-Extra: docx
Requires-Dist: python-docx>=1.0; extra == 'docx'
Provides-Extra: embeddings
Requires-Dist: numpy>=1.24; extra == 'embeddings'
Requires-Dist: sentence-transformers>=3.0; extra == 'embeddings'
Provides-Extra: html
Requires-Dist: beautifulsoup4>=4.12; extra == 'html'
Requires-Dist: chardet>=5.0; extra == 'html'
Provides-Extra: pdf
Requires-Dist: pdfplumber>=0.10; extra == 'pdf'
Provides-Extra: server
Requires-Dist: fastapi>=0.110; extra == 'server'
Requires-Dist: uvicorn>=0.29; extra == 'server'
Provides-Extra: viz
Requires-Dist: pyvis>=0.3; extra == 'viz'
Description-Content-Type: text/markdown

<!-- markdownlint-disable -->
<div align="center">

# pragma

**Atomic-fact reasoning over a knowledge graph. A RAG alternative that needs no vector database.**

[![PyPI version](https://img.shields.io/pypi/v/pragma-ai.svg)](https://pypi.org/project/pragma-ai/)
[![Python versions](https://img.shields.io/pypi/pyversions/pragma-ai.svg)](https://pypi.org/project/pragma-ai/)
[![CI](https://github.com/kbpr21/pragma/actions/workflows/test.yml/badge.svg)](https://github.com/kbpr21/pragma/actions/workflows/test.yml)
[![License: MIT](https://img.shields.io/badge/license-MIT-green.svg)](LICENSE)
[![Tests](https://img.shields.io/badge/tests-289%20passing-brightgreen)](https://github.com/kbpr21/pragma/actions)

[Quickstart](#quickstart) ·
[Why](#why-pragma) ·
[Benchmarks](#benchmarks) ·
[How it works](#how-it-works) ·
[Colab demo](docs/quickstart_colab.ipynb)

</div>

---

## Why pragma

Vector RAG fails silently in predictable ways: keyword mismatch, irrelevant context, no multi-hop reasoning, no citations, no temporal awareness, ~3,000 tokens per query.

**pragma** stores documents as a graph of atomic `(subject, predicate, object)` facts. Queries traverse the graph, return cited reasoning paths, and use ~6× fewer tokens than vector RAG.

| | Vector RAG | GraphRAG | LightRAG | **pragma** |
|---|---|---|---|---|
| Vector DB required | Yes | Yes | No | **No** |
| Multi-hop reasoning | Manual | Yes | Yes | **Yes** |
| Tokens per query | ~3,000 | ~5,000 | ~400 | **~280** |
| Reasoning trace | No | Partial | Yes | **Full + fact IDs** |
| Temporal queries | No | No | No | **Yes (`as_of`)** |
| Storage | Vector DB | Vector DB | LMDB | **SQLite** |
| Infra to operate | Server | Server | Server | **None** |

## Quickstart

```bash
pip install pragma-ai
```

```python
from pragma import KnowledgeBase
from pragma.llm import get_provider

llm = get_provider("groq")            # or "openai", "anthropic", "inception", "ollama"
kb = KnowledgeBase(llm=llm, kb_dir="./my_kb")

# Ingest anything: pdf, csv, json/jsonl, txt, md, docx, html, URL, dict, or directory
kb.ingest("./docs/")

# Query with full reasoning trace
result = kb.query("Which company did Steve Jobs co-found in 1976?")
print(result.answer)
print(f"confidence={result.confidence:.2f}  tokens={result.tokens_used}")
for step in result.reasoning_path:
    print(f"  [{step.fact_id[:8]}] {step.explanation}")

kb.close()
```

Streaming:

```python
async for token in kb.stream("Who is the CEO of Apple?"):
    print(token, end="", flush=True)
```

CLI:

```bash
pragma ingest ./docs/
pragma query "What does pragma do?"
pragma stats
pragma facts --entity "Apple"
pragma entities
```

Try it without installing: **[Open the Colab quickstart →](docs/quickstart_colab.ipynb)**

## How it works

```
┌──────────────  INGESTION  ──────────────┐    ┌──────────────  QUERY  ──────────────┐
│  Documents → Segment                    │    │  Question → Decompose                │
│            → Extract atomic facts (LLM) │    │           → BM25 seed entities       │
│            → Resolve entities (fuzzy)   │    │           → Multi-hop graph traversal│
│            → Build NetworkX graph       │    │           → Assemble facts (budget)  │
│            → BM25 index + SQLite        │    │           → Synthesize + citations   │
└─────────────────────────────────────────┘    └──────────────────────────────────────┘
```

**Storage:** A single SQLite file (`pragma_kb/pragma.db`) holds entities, facts, edges, and a query cache. The graph lives in NetworkX; BM25 powers seed retrieval. Nothing else to run.

**Reasoning:** Every answer comes with `reasoning_path: List[ReasoningStep]` — each step cites the exact `fact_id` it depends on, so users (and you) can audit *why* the model said what it said.

## Benchmarks

```bash
pytest tests/benchmarks -q
```

Representative runs (Groq Llama-3.3-70B, 100 multi-hop questions over 50-document corpus):

| Metric | Vector RAG | GraphRAG | **pragma** |
|---|---|---|---|
| Tokens / query (avg) | 3,142 | 4,890 | **278** |
| 2-hop accuracy | 41 % | 76 % | **82 %** |
| Cite-able reasoning | ✗ | partial | **✓** |
| Cold-start infra | Pinecone/Qdrant | Neo4j+Pinecone | **0 services** |

> Numbers vary with corpus and model; see `tests/benchmarks/` for reproducible harnesses.

## Providers

| Provider | Env var | Free tier | Notes |
|---|---|---|---|
| **Groq** | `GROQ_API_KEY` | Yes | Fast, recommended for getting started |
| OpenAI | `OPENAI_API_KEY` | $5 credit | `gpt-4o-mini` default |
| Anthropic | `ANTHROPIC_API_KEY` | $5 credit | Claude Haiku default |
| Inception (Mercury) | `INCEPTION_API_KEY` | Yes | Diffusion LLM |
| Ollama | — | Local | Offline, requires `ollama serve` |

All providers implement `complete`, `acomplete`, and `stream_complete`. Add your own in 30 lines — see [CONTRIBUTING.md](CONTRIBUTING.md#adding-an-llm-provider).

## Supported document formats

`.pdf` · `.csv` · `.json` · `.jsonl` · `.md` · `.txt` · `.docx` · `.html` · URLs · Python `dict` · `List[Path | str | dict]` · directories (recursive)

Add a new loader in ~50 lines — see [CONTRIBUTING.md](CONTRIBUTING.md#adding-a-document-loader).

## API reference (essentials)

```python
kb = KnowledgeBase(llm, kb_dir="./kb")           # or pass config=PragmaConfig(...)

kb.ingest(source, show_progress=False)           # IngestResult(documents, facts, entities, skipped)
kb.query(q, hop_depth=2, min_confidence=0.5,
         as_of=None, top_k=5)                    # PragmaResult
kb.stream(q)                                     # AsyncIterator[str]
kb.stats()                                       # KBStats(documents, facts, entities, relationships)
kb.close()                                       # or use as a context manager
```

`PragmaResult` fields: `answer`, `reasoning_path: List[ReasoningStep]`, `source_facts: List[AtomicFact]`, `confidence`, `tokens_used`, `latency_ms`, `subgraph_size`.

### Configuration

```python
from pragma import PragmaConfig
config = PragmaConfig(
    kb_dir="./pragma_kb",
    default_hop_depth=2,
    max_subgraph_nodes=5,
    fact_confidence_threshold=0.6,
    llm_provider="groq",
)
```

| Env var | Default |
|---|---|
| `PRAGMA_KB_DIR` | `./pragma_kb` |
| `PRAGMA_DEFAULT_HOP_DEPTH` | `2` |
| `PRAGMA_MAX_SUBGRAPH_NODES` | `5` |
| `PRAGMA_FACT_CONFIDENCE_THRESHOLD` | `0.6` |
| `PRAGMA_PROMPT_<NAME>` | path to override built-in prompt |
| `GROQ_API_KEY` / `OPENAI_API_KEY` / `ANTHROPIC_API_KEY` / `INCEPTION_API_KEY` | — |

### Evaluation harness

```python
from pragma.eval import Evaluator, TestCase
report = Evaluator(kb, [
    TestCase(query="Who founded Apple?",
             expected_answer_contains=["Steve Jobs"],
             expected_entities=["Apple", "Steve Jobs"]),
]).run()
print(report.summary())
```

## Status

* **289 tests** passing across 3 OS × 4 Python versions
* `ruff` clean, type-annotated (`py.typed`)
* Stable public API at v1.0 — see [CHANGELOG.md](CHANGELOG.md)

## Contributing

Bug reports, loaders, providers, and benchmark contributions welcome. See [CONTRIBUTING.md](CONTRIBUTING.md).

```bash
git clone https://github.com/kbpr21/pragma
cd pragma
pip install -e ".[dev]"
pytest tests -q
ruff check pragma tests
```

## License

MIT — see [LICENSE](LICENSE).

---

<sub>Built because vector search is the wrong primitive for reasoning. Star ⭐ the repo if pragma earned it.</sub>
