Metadata-Version: 2.4
Name: studyengine
Version: 0.1.0
Summary: Retrieval-augmented study engine: document parsing, structure-aware chunking, hybrid retrieval, LLM backends, and generated study tools.
Requires-Python: >=3.11
Description-Content-Type: text/markdown
Requires-Dist: sentence-transformers>=3.0
Requires-Dist: chromadb>=1.5.9
Requires-Dist: pymupdf>=1.26.7
Requires-Dist: docling>=2.97
Requires-Dist: python-docx>=1.1
Requires-Dist: rank-bm25>=0.2.2
Requires-Dist: tiktoken>=0.7
Requires-Dist: httpx>=0.27
Requires-Dist: anthropic>=0.40
Provides-Extra: test
Requires-Dist: pytest>=8.0; extra == "test"

# studyengine

`studyengine` is a retrieval-augmented study engine. It covers document parsing, structure-aware chunking, hybrid retrieval, LLM backends, and generated study tools (summaries, quizzes, flashcards, close reading).

## Install

```bash
pip install studyengine
```

Requires Python 3.11 or later.

## Usage

Call `configure` once to point storage at an app-owned directory, then ingest documents and retrieve from them.

```python
import studyengine

studyengine.configure("./storage")  # creates chroma/, embed_cache.sqlite, covers/, markdown/

from studyengine.parser import parse
from studyengine.chunker import chunk_document
from studyengine import vectorstore
from studyengine.retriever import retrieve

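# Parse the file, then split the parsed document into chunks.
doc = parse("paper.pdf")
chunks = chunk_document(doc.document)

# Index the chunks under a document ID so queries can be scoped to it.
vectorstore.add_chunks(doc_id="paper", chunks=chunks)

# Retrieve, restricted to the document indexed above.
hits = retrieve("what is the main claim?", doc_ids=["paper"])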
```

## Modules

| Module | Purpose |
|--------|---------|
| `parser` | PDF/DOCX/text parsing via docling |
| `sections` | Heading-aware section detection |
| `chunker` | Structure-aware, token-budgeted chunking |
| `embedder` | Sentence-transformer embeddings with on-disk cache |
| `vectorstore` | Chroma-backed storage and hybrid query |
| `retriever` | Dense + BM25 retrieval fused with RRF |
| `composer` | Prompt assembly within a context budget |
| `summarizer` | Per-section summaries |
| `quiz` | Question generation and grading |
| `close_reading` | Scoped chat, comprehension, and "go deeper" streams |
| `llm` | Anthropic and Ollama backends behind a common interface |

## Configuration

`configure(root)` sets the storage layout under `root`. Backend and model choices are read from environment variables (`LLM_BACKEND`, `ANTHROPIC_MODEL`, `EMBED_MODEL`, `OLLAMA_BASE_URL`, and others defined in `studyengine.config`).
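
One way to supply these variables from Python is sketched below. The helper `apply_env_defaults` is hypothetical and not part of `studyengine`, and the values shown are placeholder assumptions: check `studyengine.config` for the accepted values. Only `http://localhost:11434` is a known quantity, being the default address of a local Ollama server.

```python
import os


def apply_env_defaults(defaults, environ=None):
    """Set each variable only if it is not already defined.

    Values exported in the shell therefore take precedence.
    Returns the subset of `defaults` that was actually applied.
    """
    target = os.environ if environ is None else environ
    applied = {}
    for name, value in defaults.items():
        if name not in target:
            target[name] = value
            applied[name] = value
    return applied


# Placeholder values (assumptions); see studyengine.config for real options.
EXAMPLE_DEFAULTS = {
    "LLM_BACKEND": "ollama",
    "OLLAMA_BASE_URL": "http://localhost:11434",
    "EMBED_MODEL": "sentence-transformers/all-MiniLM-L6-v2",
}

applied = apply_env_defaults(EXAMPLE_DEFAULTS)

# Then, in an application that has studyengine installed:
# import studyengine
# studyengine.configure("./storage")
```

Setting the variables before importing `studyengine` is the cautious order, since this README does not say whether they are read at import time or on first use.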
