Metadata-Version: 2.4
Name: digestkit-core
Version: 0.1.0
Summary: Core Source / Extractor protocols and concrete implementations for the digestkit ecosystem
Author: Koki NAKAMURA
License-Expression: Apache-2.0
License-File: LICENSE
Keywords: digestkit,extractor,protocol,source
Classifier: Operating System :: MacOS
Classifier: Operating System :: POSIX :: Linux
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Requires-Python: <3.14,>=3.11
Provides-Extra: all
Requires-Dist: httpx>=0.27; extra == 'all'
Requires-Dist: notion-client>=2; extra == 'all'
Requires-Dist: pypdf>=4; extra == 'all'
Requires-Dist: trafilatura>=1.12; extra == 'all'
Provides-Extra: dev
Requires-Dist: pyright; extra == 'dev'
Requires-Dist: pytest; extra == 'dev'
Requires-Dist: pytest-cov; extra == 'dev'
Requires-Dist: ruff; extra == 'dev'
Provides-Extra: notion
Requires-Dist: notion-client>=2; extra == 'notion'
Provides-Extra: pdf
Requires-Dist: pypdf>=4; extra == 'pdf'
Provides-Extra: web
Requires-Dist: httpx>=0.27; extra == 'web'
Requires-Dist: trafilatura>=1.12; extra == 'web'
Description-Content-Type: text/markdown

**English** | [日本語](README.ja.md)

# digestkit-core

[![CI](https://img.shields.io/github/actions/workflow/status/koki-nakamura22/inboxkit/digestkit-core-inspection.yml?label=CI)](https://github.com/koki-nakamura22/inboxkit/actions/workflows/digestkit-core-inspection.yml)
[![License: Apache 2.0](https://img.shields.io/badge/license-Apache--2.0-blue)](https://opensource.org/licenses/Apache-2.0)
![Python](https://img.shields.io/pypi/pyversions/digestkit-core?label=python)
[![Ruff](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/ruff/main/assets/badge/v2.json)](https://github.com/astral-sh/ruff)

Neutral core library providing `Source` / `Extractor` protocols and reusable
concrete implementations shared by [digestkit](../digestkit) (human-facing
1:1 digest pipeline) and [rag-ingest](../rag-ingest) (machine-facing 1:N
ingestion pipeline).

`digestkit-core` is deliberately kept free of LLM, vector-store, and
notification dependencies so it stays usable across both consumers.

## What's inside

| Module                                       | Provides                                                  |
| -------------------------------------------- | --------------------------------------------------------- |
| `digestkit_core.protocols`                   | `Source`, `Extractor` (`runtime_checkable` Protocols)     |
| `digestkit_core.types`                       | `Item`, `Digest`, `DigestkitError`, `FailureInfo`, ...    |
| `digestkit_core.sources.local_directory`     | `LocalDirectorySource` (filesystem glob)                  |
| `digestkit_core.sources.notion_database`     | `NotionDatabaseSource` (Notion DB query + ack callbacks)  |
| `digestkit_core.extractors.pdf`              | `PDFExtractor` + `ExtractionError`                        |
| `digestkit_core.extractors.webpage`          | `WebPageExtractor` (httpx + trafilatura)                  |
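
The sketch below illustrates how `runtime_checkable` Protocols of this kind allow structural typing: a class satisfies the protocol by having the right methods, without inheriting from it. The method names (`fetch`, `extract`), the `Item` fields, and the two demo classes are hypothetical stand-ins chosen for illustration; they are not taken from `digestkit_core` and the real signatures may differ.

```python
from collections.abc import Iterable
from dataclasses import dataclass
from typing import Protocol, runtime_checkable


@dataclass(frozen=True)
class Item:
    """Hypothetical stand-in for digestkit_core.types.Item."""

    id: str
    location: str


@runtime_checkable
class Source(Protocol):
    """Assumed shape: something that yields items to process."""

    def fetch(self) -> Iterable[Item]: ...


@runtime_checkable
class Extractor(Protocol):
    """Assumed shape: something that turns an item into text."""

    def extract(self, item: Item) -> str: ...


class InMemorySource:
    """Demo source; note that it does not inherit from Source."""

    def __init__(self, items: Iterable[Item]) -> None:
        self._items = list(items)

    def fetch(self) -> Iterable[Item]:
        return iter(self._items)


class UpperCaseExtractor:
    """Demo extractor that upper-cases the item location."""

    def extract(self, item: Item) -> str:
        return item.location.upper()


source = InMemorySource([Item(id="1", location="notes/a.txt")])
extractor = UpperCaseExtractor()

# isinstance() works because the protocols are runtime_checkable.
# The check only looks for the method names, not their signatures.
assert isinstance(source, Source)
assert isinstance(extractor, Extractor)

texts = [extractor.extract(item) for item in source.fetch()]
```

Because the check is structural, consumers such as `digestkit` or `rag-ingest` could plug in their own sources and extractors without importing a base class, which fits the goal of keeping the core package neutral.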

## Installation

> **Note**: digestkit-core is not yet published to PyPI. Install from the
> umbrella repository's `main` branch using a git URL until the first release.

```bash
pip install "digestkit-core @ git+https://github.com/koki-nakamura22/inboxkit.git@main#subdirectory=packages/digestkit-core"
```

For [uv](https://docs.astral.sh/uv/) projects:

```toml
[project]
dependencies = ["digestkit-core>=0.1,<0.2"]

[tool.uv.sources]
digestkit-core = { git = "https://github.com/koki-nakamura22/inboxkit.git", subdirectory = "packages/digestkit-core", branch = "main" }
```

End users typically do not need to depend on `digestkit-core` directly:
installing `digestkit` or `rag-ingest` pulls it in automatically.
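
The package metadata declares the optional extras `web` (`httpx`, `trafilatura`), `pdf` (`pypdf`), and `notion` (`notion-client`), plus `all`. As a rough sketch, the following checks whether the modules behind an extra are importable in the current environment. The `EXTRA_MODULES` mapping and the `missing_modules` helper are illustrative names, not part of `digestkit-core`.

```python
from importlib.util import find_spec

# Import names of the distributions listed under each extra in the metadata.
# (The notion-client distribution is imported as notion_client.)
EXTRA_MODULES: dict[str, tuple[str, ...]] = {
    "web": ("httpx", "trafilatura"),
    "pdf": ("pypdf",),
    "notion": ("notion_client",),
}


def missing_modules(extra: str) -> list[str]:
    """Return the modules for `extra` that cannot be found for import."""
    return [name for name in EXTRA_MODULES[extra] if find_spec(name) is None]


for extra in EXTRA_MODULES:
    missing = missing_modules(extra)
    status = "ok" if not missing else "missing " + ", ".join(missing)
    print(f"{extra}: {status}")
```

A check like this can give a clearer error message than a bare `ImportError` when, for example, the `pdf` extra was not installed.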

## Neutrality contract

`digestkit-core` must not depend on:

- LLM clients (`litellm`, provider SDKs)
- Vector stores (`sqlite-vec`, ...)
- Notification systems (SMTP, Slack SDK)
- `digestkit` or `rag-ingest` themselves (reverse-direction dependency)

This contract is enforced in CI by the
`.github/workflows/digestkit-core-inspection.yml` workflow. The rationale is
documented in
[ADR-0003](../../docs/adr/0003-digestkit-core-extraction-policy.md).
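
To make the contract concrete, here is a minimal sketch of a dependency check of this kind. It is not the logic of the actual CI workflow, which is not shown here. The `FORBIDDEN` set is an assumption derived from the list above (`slack-sdk` stands in for "Slack SDK"), and `find_violations` is a hypothetical helper.

```python
import re

# Assumed deny-list, based on the contract above; the real CI list may differ.
FORBIDDEN = {"litellm", "sqlite-vec", "slack-sdk", "digestkit", "rag-ingest"}

# Leading distribution name of a requirement string such as "httpx>=0.27".
_NAME = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)")


def normalize(name: str) -> str:
    """Normalize a distribution name (lowercase, runs of -_. become '-')."""
    return re.sub(r"[-_.]+", "-", name).lower()


def requirement_name(requirement: str) -> str:
    """Extract the normalized distribution name from a requirement string."""
    match = _NAME.match(requirement)
    if match is None:
        raise ValueError(f"cannot parse requirement: {requirement!r}")
    return normalize(match.group(1))


def find_violations(requirements: list[str]) -> list[str]:
    """Return the sorted forbidden distributions present in `requirements`."""
    declared = {requirement_name(req) for req in requirements}
    return sorted(declared & {normalize(name) for name in FORBIDDEN})


# Requirement strings as they appear in this package's metadata.
declared = [
    "httpx>=0.27; extra == 'web'",
    "trafilatura>=1.12; extra == 'web'",
    "pypdf>=4; extra == 'pdf'",
    "notion-client>=2; extra == 'notion'",
]
violations = find_violations(declared)
```

With the package installed, the list of requirement strings could be obtained from `importlib.metadata.requires("digestkit-core")`. Matching on the exact normalized name means `digestkit-core` itself never collides with the forbidden `digestkit` entry.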

## Contributing

See the umbrella [CONTRIBUTING.md](../../CONTRIBUTING.md) for development
setup, lint / format / typecheck targets, and the pre-commit hook.
