Metadata-Version: 2.4
Name: tablevault
Version: 0.2.6
Summary: Centralized data repository for cross-process data filtering in Python.
Author-email: Jinjin Zhao <j2zhao@uchicago.edu>
License-Expression: MIT
Keywords: example,package,python
Requires-Python: >=3.11
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: python-arango
Requires-Dist: psutil
Provides-Extra: dev
Requires-Dist: ruff; extra == "dev"
Requires-Dist: pytest; extra == "dev"
Requires-Dist: mkdocs-material; extra == "dev"
Requires-Dist: mike; extra == "dev"
Dynamic: license-file

# TableVault

TableVault is a Python package for storing and querying workflow data with lineage tracking across scripts and notebooks.

It uses ArangoDB as the backend and gives you a single API (`Vault`) to:

- Store typed data lists (`file`, `document`, `embedding`, `record`)
- Track upstream/downstream dependencies between items
- Search by text, code provenance, and embedding similarity
- Coordinate long-running processes with safe pause/stop checkpoints
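
To make the lineage idea concrete, here is a minimal plain-Python sketch of upstream/downstream tracking between named items. It is an illustration only and does not use or mirror the `Vault` API: the `LineageGraph` class, its methods, and the item names are all hypothetical, and the graph lives in memory rather than in ArangoDB.

```python
from collections import defaultdict


class LineageGraph:
    """Toy in-memory dependency graph (hypothetical; not TableVault's API)."""

    def __init__(self):
        self._upstream = defaultdict(set)    # item -> items it was derived from
        self._downstream = defaultdict(set)  # item -> items derived from it

    def add_dependency(self, item, parent):
        """Record that `item` was derived from `parent`."""
        self._upstream[item].add(parent)
        self._downstream[parent].add(item)

    def _walk(self, start, edges):
        # Depth-first traversal collecting every item reachable from `start`.
        seen, stack = set(), [start]
        while stack:
            for nxt in edges.get(stack.pop(), ()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    def ancestors(self, item):
        """All direct and transitive upstream items."""
        return self._walk(item, self._upstream)

    def descendants(self, item):
        """All direct and transitive downstream items."""
        return self._walk(item, self._downstream)


# Hypothetical item names, chosen only for illustration.
graph = LineageGraph()
graph.add_dependency("chunks", "raw_pdf")
graph.add_dependency("embeddings", "chunks")
graph.add_dependency("summary", "chunks")

print(sorted(graph.ancestors("embeddings")))
print(sorted(graph.descendants("raw_pdf")))
```

Keeping both edge directions makes "what produced this?" and "what is affected if this changes?" equally cheap to answer, which is the kind of question lineage tracking is meant to support.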

## Documentation

You can find the full documentation at [tablevault.org](https://tablevault.org).

## Installation

Install from PyPI:

```bash
pip install tablevault
```
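
After installing, a quick sanity check can confirm that the interpreter meets the package's `Requires-Python: >=3.11` constraint and that the distribution is visible to Python. The sketch below uses only the standard library; the helper name `installed_version` is an assumption made for this example and is not part of TableVault.

```python
import sys
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# The package metadata declares Requires-Python >=3.11.
python_ok = sys.version_info >= (3, 11)

print("Python >= 3.11:", python_ok)
print("tablevault version:", installed_version("tablevault"))
```

If the second line prints `None`, the package is not installed in the environment that is running the script.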

## Citation

If you use TableVault in research, cite:

- Zhao, J. and Krishnan, S. (2025). *TableVault: Managing Dynamic Data Collections for LLM-Augmented Workflows*. NOVAS @ SIGMOD.  
  arXiv: <https://arxiv.org/abs/2506.18257>

## License

MIT License. See `LICENSE`.
