Metadata-Version: 2.4
Name: indx
Version: 0.0.1
Summary: Make directories AI-ready, not just files — turn a directory into a portable knowledge space.
Project-URL: Homepage, https://github.com/indxjp/indx
Project-URL: Documentation, https://docs.indx.jp
Project-URL: Repository, https://github.com/indxjp/indx
Project-URL: Issues, https://github.com/indxjp/indx/issues
Project-URL: Changelog, https://github.com/indxjp/indx/blob/main/CHANGELOG.md
Author: indx contributors
License-Expression: Apache-2.0
License-File: LICENSE
License-File: NOTICE
Keywords: document-ai,embeddings,knowledge-graph,rag,retrieval
Classifier: Development Status :: 3 - Alpha
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Typing :: Typed
Requires-Python: >=3.11
Requires-Dist: click>=8.1
Requires-Dist: pydantic-settings>=2.0
Requires-Dist: pydantic>=2.6
Requires-Dist: rich>=13.0
Requires-Dist: tomli>=2.0; python_version < '3.11'
Requires-Dist: typer>=0.12
Provides-Extra: agent
Requires-Dist: claude-agent-sdk>=0.1; extra == 'agent'
Requires-Dist: fastmcp>=2.0; extra == 'agent'
Requires-Dist: langchain-core; extra == 'agent'
Requires-Dist: openai-agents>=0.1; extra == 'agent'
Requires-Dist: pydantic-ai-slim>=0.1; extra == 'agent'
Provides-Extra: all
Requires-Dist: anthropic; extra == 'all'
Requires-Dist: azure-ai-documentintelligence>=1.0.0; extra == 'all'
Requires-Dist: azure-core>=1.30; extra == 'all'
Requires-Dist: azure-identity>=1.16; extra == 'all'
Requires-Dist: azure-search-documents>=11.5.0; extra == 'all'
Requires-Dist: boto3>=1.40; extra == 'all'
Requires-Dist: chromadb; extra == 'all'
Requires-Dist: claude-agent-sdk>=0.1; extra == 'all'
Requires-Dist: cohere; extra == 'all'
Requires-Dist: docling; extra == 'all'
Requires-Dist: fastapi>=0.110; extra == 'all'
Requires-Dist: fastmcp>=2.0; extra == 'all'
Requires-Dist: flagembedding; extra == 'all'
Requires-Dist: google-cloud-aiplatform>=1.50; extra == 'all'
Requires-Dist: google-cloud-bigquery>=3.20; extra == 'all'
Requires-Dist: google-cloud-documentai>=2.20; extra == 'all'
Requires-Dist: google-genai>=1.0; extra == 'all'
Requires-Dist: httpx; extra == 'all'
Requires-Dist: lancedb; extra == 'all'
Requires-Dist: langchain-core; extra == 'all'
Requires-Dist: litellm>=1.40; extra == 'all'
Requires-Dist: llama-cloud-services; extra == 'all'
Requires-Dist: llama-index-core; extra == 'all'
Requires-Dist: markitdown; extra == 'all'
Requires-Dist: ollama; extra == 'all'
Requires-Dist: openai; extra == 'all'
Requires-Dist: openai-agents>=0.1; extra == 'all'
Requires-Dist: openai>=1.40; extra == 'all'
Requires-Dist: opensearch-py>=2.4; extra == 'all'
Requires-Dist: pgvector; extra == 'all'
Requires-Dist: psycopg[binary]; extra == 'all'
Requires-Dist: pydantic-ai-slim>=0.1; extra == 'all'
Requires-Dist: qdrant-client; extra == 'all'
Requires-Dist: qwen-vl-utils; extra == 'all'
Requires-Dist: sentence-transformers; extra == 'all'
Requires-Dist: torch; extra == 'all'
Requires-Dist: transformers; extra == 'all'
Requires-Dist: unstructured; extra == 'all'
Requires-Dist: uvicorn[standard]>=0.27; extra == 'all'
Requires-Dist: vllm; extra == 'all'
Provides-Extra: anthropic
Requires-Dist: anthropic; extra == 'anthropic'
Provides-Extra: app
Requires-Dist: fastapi>=0.110; extra == 'app'
Requires-Dist: uvicorn[standard]>=0.27; extra == 'app'
Provides-Extra: aws
Requires-Dist: boto3>=1.40; extra == 'aws'
Provides-Extra: aws-opensearch
Requires-Dist: boto3>=1.40; extra == 'aws-opensearch'
Requires-Dist: opensearch-py>=2.4; extra == 'aws-opensearch'
Provides-Extra: azure
Requires-Dist: azure-ai-documentintelligence>=1.0.0; extra == 'azure'
Requires-Dist: azure-core>=1.30; extra == 'azure'
Requires-Dist: azure-identity>=1.16; extra == 'azure'
Requires-Dist: azure-search-documents>=11.5.0; extra == 'azure'
Requires-Dist: openai>=1.40; extra == 'azure'
Provides-Extra: bge
Requires-Dist: flagembedding; extra == 'bge'
Requires-Dist: torch; extra == 'bge'
Provides-Extra: chroma
Requires-Dist: chromadb; extra == 'chroma'
Provides-Extra: claude-agent
Requires-Dist: claude-agent-sdk>=0.1; extra == 'claude-agent'
Provides-Extra: cloud
Requires-Dist: docling; extra == 'cloud'
Requires-Dist: openai; extra == 'cloud'
Requires-Dist: qdrant-client; extra == 'cloud'
Provides-Extra: cohere
Requires-Dist: cohere; extra == 'cohere'
Provides-Extra: defaults
Requires-Dist: docling; extra == 'defaults'
Requires-Dist: flagembedding; extra == 'defaults'
Requires-Dist: ollama; extra == 'defaults'
Requires-Dist: qdrant-client; extra == 'defaults'
Requires-Dist: torch; extra == 'defaults'
Provides-Extra: dev
Requires-Dist: mypy; extra == 'dev'
Requires-Dist: nox; extra == 'dev'
Requires-Dist: numpy; extra == 'dev'
Requires-Dist: pytest-cov; extra == 'dev'
Requires-Dist: pytest>=8; extra == 'dev'
Requires-Dist: pyyaml; extra == 'dev'
Requires-Dist: ruff; extra == 'dev'
Provides-Extra: docling
Requires-Dist: docling; extra == 'docling'
Provides-Extra: e5
Requires-Dist: sentence-transformers; extra == 'e5'
Requires-Dist: torch; extra == 'e5'
Provides-Extra: gcp
Requires-Dist: google-cloud-bigquery>=3.20; extra == 'gcp'
Requires-Dist: google-cloud-documentai>=2.20; extra == 'gcp'
Requires-Dist: google-genai>=1.0; extra == 'gcp'
Provides-Extra: gcp-vectorsearch
Requires-Dist: google-cloud-aiplatform>=1.50; extra == 'gcp-vectorsearch'
Requires-Dist: google-cloud-bigquery>=3.20; extra == 'gcp-vectorsearch'
Requires-Dist: google-cloud-documentai>=2.20; extra == 'gcp-vectorsearch'
Requires-Dist: google-genai>=1.0; extra == 'gcp-vectorsearch'
Provides-Extra: lancedb
Requires-Dist: lancedb; extra == 'lancedb'
Provides-Extra: langchain
Requires-Dist: langchain-core; extra == 'langchain'
Provides-Extra: litellm
Requires-Dist: litellm>=1.40; extra == 'litellm'
Provides-Extra: llamaindex
Requires-Dist: llama-index-core; extra == 'llamaindex'
Provides-Extra: llamaparse
Requires-Dist: llama-cloud-services; extra == 'llamaparse'
Provides-Extra: local
Requires-Dist: docling; extra == 'local'
Requires-Dist: flagembedding; extra == 'local'
Requires-Dist: ollama; extra == 'local'
Requires-Dist: qdrant-client; extra == 'local'
Requires-Dist: torch; extra == 'local'
Provides-Extra: markitdown
Requires-Dist: markitdown; extra == 'markitdown'
Provides-Extra: mcp
Requires-Dist: fastmcp>=2.0; extra == 'mcp'
Provides-Extra: ollama
Requires-Dist: ollama; extra == 'ollama'
Provides-Extra: openai
Requires-Dist: openai; extra == 'openai'
Provides-Extra: openai-agents
Requires-Dist: openai-agents>=0.1; extra == 'openai-agents'
Provides-Extra: pgvector
Requires-Dist: pgvector; extra == 'pgvector'
Requires-Dist: psycopg[binary]; extra == 'pgvector'
Provides-Extra: pydantic-ai
Requires-Dist: pydantic-ai-slim>=0.1; extra == 'pydantic-ai'
Provides-Extra: qdrant
Requires-Dist: qdrant-client; extra == 'qdrant'
Provides-Extra: qwen-vl
Requires-Dist: qwen-vl-utils; extra == 'qwen-vl'
Requires-Dist: torch; extra == 'qwen-vl'
Requires-Dist: transformers; extra == 'qwen-vl'
Provides-Extra: unstructured
Requires-Dist: unstructured; extra == 'unstructured'
Provides-Extra: vllm
Requires-Dist: openai; extra == 'vllm'
Requires-Dist: vllm; extra == 'vllm'
Provides-Extra: vlm-local
Requires-Dist: httpx; extra == 'vlm-local'
Description-Content-Type: text/markdown

# indx

> **Make directories AI-ready, not just files.** Point indx at a folder and get back a
> *knowledge space*: structure, folder lineage, file-to-file relationships, and semantic
> metadata that AI agents and RAG systems can reason over. Open-source · Python · CLI + SDK
> · Apache-2.0.

## See it: `indx demo` (build → inspect → query, fully offline)

One command builds, inspects, and queries a bundled sample corpus: no data of your own,
no extras to install, no API keys. Real captured output:

```text
$ indx demo
indx demo — building a sample 'team handbook' knowledge space…

stage: walk
stage: parse
stage: chunk
stage: relate
stage: enrich
stage: embed-pack
✓ 7 docs · 7 chunks · 19 relations → /tmp/indx-demo-XXXX/demo (0.01s)
  components: parser=plaintext llm=none embedder=hash store=jsonl format=.indx

/tmp/indx-demo-XXXX/demo  schema=1 indx=0.0.1
  documents=7 chunks=7 relations=19 embeddings=7 embedding=hash/256
       Types                    Relations
  type       count        type         count
  markdown       6        references      14
  text           1        sibling          5

sample query (keyword/lexical, offline): how do I onboard?
  score  source                      text
  0.121  engineering/code-review.md  # Code Review  Code review keeps our codebase…
  0.098  people/remote-work.md       # Remote Work Policy  Acme Robotics is remote-…
  0.095  handbook/welcome.md         # Welcome to Acme Robotics  This is the Acme …

✓ that's the whole flow — built offline with keyword/lexical retrieval, no API key.
  run it on your own folder: indx ./your-docs --out ./ai-ready.indx --offline
```

> The transcript above is a trimmed, ANSI-stripped capture of an actual `indx demo` run.

```bash
pip install indx
indx demo                                    # instant: build → inspect → query a bundled sample, fully offline, no data needed
indx ./docs --out ./ai-ready.indx --offline  # index your own folder, fully offline (zero extra deps)
indx inspect ./ai-ready.indx
indx query   ./ai-ready.indx "how do I onboard?"
indx app                                     # visual, config-driven tester: build → inspect → query in the browser (pip install "indx[app]")
```

> The **default** stack targets cloud backends (docling parser, OpenAI LLM +
> embeddings, qdrant store). Install it with `pip install "indx[cloud]"` and set the
> matching API keys. `--offline` selects the zero-dependency core stack (plaintext
> parser → `hash` embedder → `jsonl` no-DB store → `.indx` archive), so every command
> above except `indx app` (which needs the `app` extra) runs as-is on a bare
> `pip install indx` with no extras and nothing to configure.
> For a fully managed single-vendor build, three cloud profile extras wire every slot to
> that cloud's services with one install and one flag:
> `pip install "indx[aws]"` → `indx ./docs --out ./out --aws` (Textract → Bedrock → Titan → S3 Vectors),
> `pip install "indx[azure]"` → `indx ./docs --out ./out --azure` (Document Intelligence → Azure OpenAI → AI Search),
> `pip install "indx[gcp]"` → `indx ./docs --out ./out --gcp` (Document AI → Gemini → gemini-embedding → BigQuery).
>
> Note what the offline core does and doesn't do. The `hash` embedder is a deterministic
> hashing trick, so offline `query` is **keyword/lexical** retrieval, **not** semantic
> vector search — true semantic search needs a real embedder extra (e.g. `bge` or
> `openai`) selected explicitly. Likewise, the offline `enrich` step derives metadata
> (type, topics, tags, summary) **locally and without an LLM call**; LLM/VLM enrichment is
> opt-in via the cloud/local extras.
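
To see why a hashing-trick embedder gives lexical rather than semantic matching, here is a
minimal sketch. It is not indx's actual `hash` embedder: the tokenizer, the SHA-256 bucket
choice, and the signed update are assumptions made for illustration. Only the 256-dimension
width is taken from the `hash/256` line in the demo output.

```python
import hashlib
import math
import re

DIM = 256  # matches the "hash/256" shown in the demo output; everything else is assumed


def hash_embed(text: str, dim: int = DIM) -> list[float]:
    """Map text to a fixed-size vector by hashing each token into a bucket."""
    vec = [0.0] * dim
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        bucket = int.from_bytes(digest[:4], "big") % dim
        # A hash-derived sign keeps colliding tokens from always adding up.
        sign = 1.0 if digest[4] % 2 == 0 else -1.0
        vec[bucket] += sign
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def cosine(a: list[float], b: list[float]) -> float:
    """Dot product of two vectors that are already L2-normalized."""
    return sum(x * y for x, y in zip(a, b))


query = hash_embed("how do I onboard?")
shared = hash_embed("Onboard new hires in week one")   # shares the token "onboard"
synonym = hash_embed("Getting started for newcomers")  # same idea, no shared tokens

print(f"shared token: {cosine(query, shared):.3f}")
print(f"synonym only: {cosine(query, synonym):.3f}")
```

A text that shares a token with the query lands in the same bucket and picks up score from
it. A paraphrase with no tokens in common contributes nothing except through accidental
bucket collisions, which is the gap a learned embedder such as `bge` or `openai` closes.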

indx **composes** file parsers (Docling, Unstructured, …) rather than replacing them, then
layers on what they discard — the *arrangement* of files. Every major component (parser,
LLM, embedder, vector store, output) is a swappable, typed slot, so you can run the
cloud default stack or the fully offline core from the same CLI.
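
As a rough illustration of what "arrangement" means, the sketch below derives two of the
relation types named in the demo output, `sibling` and `references`, from nothing but paths
and Markdown links. The corpus, the `relations` helper, and the rules themselves are
hypothetical; indx's real `relate` stage may define these relations differently.

```python
import re
from itertools import combinations
from pathlib import PurePosixPath

# Hypothetical three-file corpus, keyed by path relative to the indexed root.
corpus = {
    "handbook/welcome.md": "See [code review](../engineering/code-review.md).",
    "engineering/code-review.md": "Pair with [testing](testing.md).",
    "engineering/testing.md": "No links here.",
}

LINK = re.compile(r"\[[^\]]*\]\(([^)#\s]+)")  # target of a Markdown inline link


def normalize(path: PurePosixPath) -> str:
    """Resolve '.' and '..' segments without touching the filesystem."""
    parts: list[str] = []
    for part in path.parts:
        if part == "..":
            if parts:
                parts.pop()
        elif part != ".":
            parts.append(part)
    return "/".join(parts)


def relations(docs: dict[str, str]) -> list[tuple[str, str, str]]:
    rels = []
    # sibling: two documents that share a parent folder
    for a, b in combinations(sorted(docs), 2):
        if PurePosixPath(a).parent == PurePosixPath(b).parent:
            rels.append(("sibling", a, b))
    # references: a link whose target resolves to another document in the corpus
    for src, text in docs.items():
        for target in LINK.findall(text):
            dst = normalize(PurePosixPath(src).parent / target)
            if dst in docs and dst != src:
                rels.append(("references", src, dst))
    return rels


for rel in relations(corpus):
    print(rel)
```

Neither relation can be recovered by parsing one file in isolation, which is why it has to
be computed at the directory level.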

## Plug a knowledge space into an AI agent

A `.indx` archive is a portable knowledge space — carry it like a **USB drive** and plug it
into any agent framework in one line:

```python
from indx.agent import connect

kb = connect("ai-ready/handbook.indx")   # load the "USB drive"
tools = kb.openai()                       # OpenAI Agents SDK …or .langchain() / .pydantic_ai() / .claude()
```

Or serve it to any [MCP](https://modelcontextprotocol.io) client — Claude Desktop, Cursor, the
TypeScript [Mastra](https://mastra.ai) framework — with no Python glue on the client side:

```bash
pip install "indx[agent]"            # all framework adapters + the MCP server
indx mcp ai-ready/handbook.indx      # serve indx_search / indx_overview / indx_get_document
```

Every connector exposes the same three read-only tools — **search**, **overview**,
**get-document** — built on the same retrieval path as the CLI. See the
[AI agents guide](https://docs.indx.jp/guides/ai-agents/).
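
Many MCP clients (Claude Desktop and Cursor among them) register servers through a JSON file
with an `mcpServers` map. The snippet below prints such an entry for the command shown
above. It assumes `indx mcp` speaks the stdio transport by default, which this README does
not state, and the `handbook` label is an arbitrary name chosen for the sketch.

```python
import json
from pathlib import PurePosixPath

archive = PurePosixPath("ai-ready/handbook.indx")  # archive path from the example above

config = {
    "mcpServers": {
        "handbook": {  # arbitrary label, hypothetical
            "command": "indx",
            "args": ["mcp", str(archive)],  # same arguments as `indx mcp <archive>`
        }
    }
}

print(json.dumps(config, indent=2))
```

Clients often launch the server from their own working directory, so an absolute archive
path is usually the safer choice in a real config file.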

## Status

Alpha (`0.0.1`). The zero-dependency core path (`plaintext` parser → `hash` embedder →
`jsonl` no-DB store → `.indx` archive) runs end to end and is fully air-gapped — reach it
with `indx demo` or by adding `--offline` to any build. The optional cloud/local backends
(docling, openai, ollama, bge-m3, qdrant, plus the managed AWS/Azure/GCP profiles, …) are
implemented and selected through the registry: install the matching extra
(e.g. `pip install "indx[cloud]"`) and provide credentials to switch a slot onto it. The
`.indx` archive format is at `schema_version` `"1"`; public APIs may still shift before
`1.0` — see the [CHANGELOG](https://github.com/indxjp/indx/blob/main/CHANGELOG.md) and the
[documentation](https://docs.indx.jp).
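
The idea of a typed slot filled through a registry can be sketched in a few lines. This is
not indx's registry API: `Embedder`, `EMBEDDERS`, `register`, and `resolve` are hypothetical
names, and `LengthEmbedder` is a toy stand-in rather than the real `hash` embedder. The point
is only that an optional backend is imported lazily, so a missing extra fails with an
actionable message instead of breaking the core install.

```python
from typing import Callable, Protocol


class Embedder(Protocol):
    """Anything with a matching `embed` method can fill the embedder slot."""

    def embed(self, texts: list[str]) -> list[list[float]]: ...


# Hypothetical registry: component name -> zero-argument factory.
EMBEDDERS: dict[str, Callable[[], Embedder]] = {}


def register(name: str):
    def wrap(factory):
        EMBEDDERS[name] = factory
        return factory
    return wrap


@register("hash")
class LengthEmbedder:
    """Dependency-free toy backend (one dimension: the text length)."""

    def embed(self, texts):
        return [[float(len(t))] for t in texts]


@register("bge")
def make_bge() -> Embedder:
    # Import lazily so the core install never needs the heavy optional dependency.
    try:
        import FlagEmbedding  # noqa: F401
    except ImportError as exc:
        raise RuntimeError('embedder "bge" needs: pip install "indx[bge]"') from exc
    raise NotImplementedError("wrapping the real model is out of scope for this sketch")


def resolve(name: str) -> Embedder:
    try:
        return EMBEDDERS[name]()
    except KeyError:
        raise ValueError(f"unknown embedder {name!r}; known: {sorted(EMBEDDERS)}") from None


print(resolve("hash").embed(["abc", "hello"]))  # [[3.0], [5.0]]
```

Because the slot is a `Protocol`, a backend only has to match the method shape; it does not
need to inherit from a base class shipped by the core package.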

## Documentation

Full documentation — quickstart, guides, the pipeline & stages, and the API/CLI reference —
lives at **[docs.indx.jp](https://docs.indx.jp)**.

## Development

```bash
python -m venv .venv && . .venv/bin/activate
pip install -e ".[dev]"
nox -s tests          # fast offline suite: unit + corpus
nox -l                # list every session (integration / docker / airgap / live / record-fixtures)
```

## License

Apache-2.0.
