Metadata-Version: 2.4
Name: lc0-vic
Version: 0.1.1
Summary: LC-0 VIC: Logical Controller Zero / Virtual Intelligent Controller — tiered storage-local retrieval (L0/L1/L2) with bridge-ready APIs.
Author: ARPA Hellenic Logical Systems
License: MIT
Project-URL: Repository, https://github.com/arpahls/lc0_vic
Project-URL: Security policy, https://github.com/arpahls/lc0_vic/blob/main/SECURITY.md
Keywords: vector,embeddings,lancedb,retrieval,storage,agent,tiered
Classifier: Development Status :: 2 - Pre-Alpha
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: System :: Filesystems
Classifier: License :: OSI Approved :: MIT License
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
License-File: AUTHORS.md
Requires-Dist: pydantic>=2.5
Provides-Extra: index
Requires-Dist: lancedb>=0.4; extra == "index"
Requires-Dist: numpy>=1.24; extra == "index"
Requires-Dist: pypdf>=4.0; extra == "index"
Provides-Extra: parse
Requires-Dist: docling>=1.0; extra == "parse"
Provides-Extra: ocr
Requires-Dist: easyocr>=1.7; extra == "ocr"
Provides-Extra: local-llm
Provides-Extra: bridge
Requires-Dist: uvicorn>=0.27; extra == "bridge"
Requires-Dist: starlette>=0.37; extra == "bridge"
Provides-Extra: milvus
Requires-Dist: pymilvus>=2.4.4; extra == "milvus"
Provides-Extra: dev
Requires-Dist: pytest>=8.0; extra == "dev"
Requires-Dist: pytest-cov>=4.1; extra == "dev"
Requires-Dist: httpx>=0.27; extra == "dev"
Requires-Dist: ruff>=0.4; extra == "dev"
Requires-Dist: black>=24.0; extra == "dev"
Requires-Dist: flake8>=7.0; extra == "dev"
Requires-Dist: mypy>=1.8; extra == "dev"
Requires-Dist: pyfakefs>=5.3; extra == "dev"
Dynamic: license-file

<div align="center">

## LC-0 VIC

**LC-0 VIC — Tiered filesystem retrieval:** ask natural-language questions about your local files, get ranked results with snippets. A reference implementation for intelligent, queryable storage.

<br/>

<img src="https://img.shields.io/badge/License-MIT-efcefa?style=flat-square" alt="License" />
<img src="https://img.shields.io/badge/Python-3.10+-bae6fd?style=flat-square" alt="Python Version" />
<a href="https://github.com/arpahls/lc0_vic"><img src="https://img.shields.io/badge/GitHub-lc0__vic-bbf7d0?style=flat-square" alt="GitHub" /></a>

<br/>

<a href="#why-this-exists">Why this exists</a> •
<a href="#mission">Mission</a> •
<a href="#architecture">Architecture</a> •
<a href="#quick-start">Quick Start</a> •
<a href="#documentation">Documentation</a> •
<a href="#contributing">Contributing</a> •
<a href="#references">References</a> •
<a href="#security">Security</a> •
<a href="#contact">Contact</a>

</div>

---

## Why this exists

**Kioxia Corporation** research line: **AiSAQ** (*All-in-Storage ANNS with Product Quantization for DRAM-free Information Retrieval*, [arXiv:2404.06004](https://arxiv.org/abs/2404.06004)) describes **product-quantized, all-in-storage** approximate nearest-neighbor search suited to **DRAM-constrained** and **flash-resident** indices. Open-source reference implementation: **[kioxia-jp/aisaq-diskann](https://github.com/kioxia-jp/aisaq-diskann)**. Context and deployment notes: **[docs/KIOXIA_ECOSYSTEM.md](docs/KIOXIA_ECOSYSTEM.md)**.

**LC-0 VIC** (Logical Controller Zero / Virtual Intelligent Controller) is an open-source **reference implementation** with a **real CLI and tests**: **`pip install`**, then **`vic index`**, **`vic ask`**, **`vic demo`**, and optionally **`vic bridge`**. Architecturally it is **tiered retrieval**—**L0** metadata, **L1** vectors, **L2** optional deep parsing (**skillware**)—orchestrated by a **controller** from a **Librarian** **`QueryPlan`** (offline **stub** or **Ollama**). Vectors default to **LanceDB**; **Milvus Lite** is optional. Code layout: **`controller/`**, **`librarian/`**, **`warehouse/`**, **`index/`**, **`skillware/`**, **`bridge/`**. It runs fully **local-first**: you control the model endpoint (Ollama local or remote), data boundary, TLS/exposure, quotas, and governance—while still being **more than a paper design** (CI-backed, integration-tested).

**Landscape — why this repo exists alongside “computational storage landscape”:** Retrieval over drives and tiers does not land in isolation. **[Computational storage landscape](https://github.com/rosspeili/computational_storage_landscape)** maps the broader space; **LC-0 VIC** is a **small, runnable, contract-heavy slice**: filesystem-first **L0 → L1 → L2**, optional **Ollama** planning, and a **JSON HTTP bridge** for tools and automation.

This codebase runs **on the host** today—it does **not** claim to ship inside any vendor’s **SSD firmware** binary. The **research goal** is to **explore** whether **tiered retrieval** and these **API contracts** could **map** to **firmware- or device-adjacent** runtimes (e.g. **Samsung Magician**–class host/device stacks), while this repository stays a **portable** reference for **host** tooling, bridges, and tests.

---

## Mission

Unstructured data on disk is usually searched by paths and keywords. LC-0 VIC explores **intent-aware retrieval**: narrow candidates cheaply (L0/L1), then **load to DRAM** for evidence only when needed (L2). The design fits research and product narratives around **queryable storage**, **local-first AI**, and **bridge-ready** HTTP surfaces for management UIs.

---

## Architecture

```text
lc0_vic/
├── README.md
├── LICENSE
├── AUTHORS.md
├── pyproject.toml
├── requirements.txt
├── docs/
├── src/lc0_vic/
│   ├── controller/       # L0 → L1 → L2 orchestration
│   ├── librarian/        # Natural language → QueryPlan (stub / Ollama)
│   ├── warehouse/       # Filesystem abstraction
│   ├── index/           # LanceDB or Milvus Lite + manifest + jobs
│   ├── skillware/       # L2 modular parsers
│   ├── bridge/          # HTTP JSON service
│   ├── integrations/    # Ollama HTTP client helpers
│   ├── training/        # Synthetic plans helper (`vic-synthetic-plans`)
│   └── hardware_sim/    # Demo timing hooks only
├── training/
├── scripts/
├── tests/
├── data/mock_storage/
├── data/vector_db/
├── .controller/
└── .firmware/
```

### Tiered retrieval

1. **L0** — Path, size, mtime, type hints, tags.  
2. **L1** — Chunk embeddings after **`vic index --full`**: default **LanceDB** (`vic_chunks`); optional **Milvus Lite** with **`VIC_VECTOR_BACKEND=milvus`** and **`pip install -e ".[milvus]"`**. Embed model default **`embeddinggemma`** via Ollama; override **`VIC_OLLAMA_EMBED_MODEL`**.  
3. **L2** — Skillware modules (PDF layout, OCR, logs) when the plan requests them.

### Librarian (Ollama)

- **`stub`**: default offline planner for CI and quick runs.  
- **`ollama`**: `POST /api/chat` to **`VIC_OLLAMA_BASE_URL`** for both **local** and **remote** Ollama-compatible endpoints. Set **`VIC_OLLAMA_MODEL`** to a tag from `ollama list` on that server.

Example families (verify tags locally): **Qwen** (e.g. `qwen2.5:4b`), **Gemma** (e.g. `gemma2:2b` or your pulled images). For remote inference, set **`privacy_mode=cloud_reasoning_ok`** when not using loopback; optional **`VIC_OLLAMA_API_KEY`** for Bearer auth on gated gateways. Never commit secrets; use `.env` (gitignored).

---

## Quick start

**Install (pip, all optional stacks used in CI and demos):**

```bash
pip install -e ".[dev,index,bridge]"
```

**Windows (PowerShell):** same `pip` line; use `$env:VAR = "value"` instead of `export`, or `set VAR=value` in **cmd.exe**. Prefer copying [`.env.example`](.env.example) to `.env` so you do not rely on shell-specific syntax.

```bash
vic --help

vic index --root ./data/mock_storage --db ./data/vector_db

vic ask --format human "Find PDFs related to contracts"

# One-shot demo (seed + index + sample asks; needs Ollama + `.[index]`)
vic demo
```

**Ollama Librarian (Windows example):**

```bat
set VIC_LIBRARIAN_BACKEND=ollama
set VIC_OLLAMA_MODEL=qwen2.5:4b
set VIC_OLLAMA_BASE_URL=http://127.0.0.1:11434
vic ask "Who signed the contract?"
```

On Unix: `export VAR=value`. Copy [`.env.example`](.env.example) to `.env` for persistent configuration.

**HTTP bridge** (after `pip install -e ".[bridge]"`): `vic bridge` — **`GET /health`**, **`POST /v1/ask`**, **`POST /v1/index/start`** (202 + `job_id`), **`GET /v1/index/status`**. Optional **`VIC_BRIDGE_API_KEY`** (Bearer on `/v1/*`), **`VIC_BRIDGE_RATE_LIMIT_PER_MIN`**, **`VIC_JOB_DB_PATH`** for SQLite job persistence. See **[docs/API.md](docs/API.md)**.

---

## Documentation

**Start here:** **[Documentation index](docs/README.md)** — links **every** Markdown file in the repo (root, `docs/`, `training/`, `scripts/`, `assets/`, `data/`, `.controller/`, `.firmware/`) so nothing is orphaned.

Highlights:

- **[Architecture](docs/ARCHITECTURE.md)** — component boundaries and data flow.  
- **[API / contracts](docs/API.md)** — `QueryPlan`, bridge routes, environment variables.  
- **[Threat model](docs/THREAT_MODEL.md)** — privacy modes and Ollama URL policy.  
- **[Roadmap & production deep dive](docs/ROADMAP_DEEP_DIVE.md)** — demo path (`scripts/demo.sh` / `demo.ps1`) and phased production work.  
- **[Terminal demo](docs/DEMO.md)** — `vic demo`, `vic ask --format human`, warehouse defaults.  
- **[Milvus & scale](docs/MILVUS_ROADMAP.md)** — Milvus **Lite** (`VIC_VECTOR_BACKEND=milvus`, optional `[milvus]`) plus AiSAQ / scale notes.  
- **[Testing](docs/TESTING.md)** — pyfakefs vs real `tmp_path` for Lance/native I/O.  
- **[Security policy](SECURITY.md)** — vulnerability reporting and bridge hardening reminders.  
- **[Changelog](CHANGELOG.md)** — release notes (PyPI/GitHub release steps are manual until you configure publishing).

**Notebook:** [notebooks/lc0_vic_live_demo.ipynb](notebooks/lc0_vic_live_demo.ipynb) — mirrors CLI demo steps (also listed from [docs/README.md](docs/README.md)).

**Contributing:** [CONTRIBUTING.md](CONTRIBUTING.md) (links the doc index). **Authors:** [AUTHORS.md](AUTHORS.md).

---

## Contributing

See **[CONTRIBUTING.md](CONTRIBUTING.md)** for scope, tests, and pull request expectations.

---

## References

**Full bibliography and implementation stance (JAX, TurboQuant, papers):**  
**[docs/REFERENCES_AND_RELATED_WORK.md](docs/REFERENCES_AND_RELATED_WORK.md)**

**Algorithms & cited methodology (per-component):** **[docs/METHODOLOGY_AND_ALGORITHMS.md](docs/METHODOLOGY_AND_ALGORITHMS.md)** — KIOXIA AiSAQ (L1/ANN scale), BitNet report (Librarian SFT/DPO), ExPAND (I/O topology), SolidAttention (KV/SSD inference lessons).

**Local PDF copies (with open-access pointers):** **[docs/papers/README.md](docs/papers/README.md)** — AiSAQ (`arXiv:2404.06004`), BitNet b1.58 2B4T (`arXiv:2504.12285`), ExPAND (`arXiv:2505.18577`), SolidAttention (USENIX FAST ’26).

**Selected entry points**

- **[Computational storage landscape](https://github.com/rosspeili/computational_storage_landscape)** — strategic analysis (TinyLM, SSD-tier retrieval, industry context).
- **[Milvus — AiSAQ](https://milvus.io/docs/aisaq.md)** — flash-friendly / all-in-storage ANN integration; code lineage: **[kioxia-jp/aisaq-diskann](https://github.com/kioxia-jp/aisaq-diskann)**.
- **[Software-Enabled Flash](https://github.com/SoftwareEnabledFlash)** (Linux Foundation) — host-visible flash control; [softwareenabledflash.org](https://softwareenabledflash.org).
- **Google [TurboQuant](https://research.google/blog/turboquant-redefining-ai-efficiency-with-extreme-compression/)** — KV / attention-side compression (inference memory); **not** a substitute for filesystem indexing in LC-0; see reference doc.
- Patent **US8780634B2** (CAM NAND) — landscape only.

---

## License

Distributed under the **MIT License**. See [`LICENSE`](LICENSE).

---

## Security

See **[SECURITY.md](SECURITY.md)** for how to report vulnerabilities and what is in scope. Product-facing privacy and Ollama URL rules: **[docs/THREAT_MODEL.md](docs/THREAT_MODEL.md)**.

---

## Contact

**Issues:** [github.com/arpahls/lc0_vic/issues](https://github.com/arpahls/lc0_vic/issues)

**Organization:** [ARPA Hellenic Logical Systems](https://arpacorp.net)

---

<div align="center">
  <img src="https://raw.githubusercontent.com/arpahls/skillware/main/assets/arpalogo.png" alt="ARPA Logo" width="50px" />
  <br/>
  Built &amp; Maintained by ARPA Hellenic Logical Systems &amp; the Community
</div>
