# Open Data Products Python SDK for AI Agents

This repository contains the AI-agent-first Python SDK for the
OpenDataProducts.org standards family: ODPS, ODPC, ODPG, and ODPV.

Use this file as the operational handoff for AI agents that need to use the SDK
or its MCP server. Use `AGENTS.md` only when contributing code to this
repository.

## Primary Agent Surfaces

- Python package: `open_data_products`
- Unified CLI: `open-data-products`
- MCP server command: `open-data-products serve`
- ARWS manifest command: `open-data-products manifest --json`
- Optional Data Contract adapter: `pip install "open-data-products[contracts]"`
- Codex project MCP config: `.codex/config.toml`
- Claude Code project MCP config: `.mcp.json`
- Main human README: `README.md`
- API reference: `docs/API.md`
- Beginner setup guide: `examples/guides/00-setup-python.md`
- Functional test report: `docs/functional-test-report.md`

## Install Before MCP Launch

The project-level MCP configs are intentionally PATH-based and contain no local
absolute paths. Before an agent host starts the MCP server, install the package
in the active environment:

```bash
pip install -e .
```

or install the published package so `open-data-products` is available on PATH.
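
A host or agent can confirm the command is reachable before it tries to launch the server. The following is a minimal sketch using only the standard library; the helper name `find_cli` is illustrative and not part of the SDK.

```python
import shutil
from typing import Optional


def find_cli(command: str = "open-data-products") -> Optional[str]:
    """Return the resolved path of `command` on PATH, or None if it is missing."""
    return shutil.which(command)


if __name__ == "__main__":
    path = find_cli()
    if path is None:
        print("open-data-products is not on PATH; install the package first")
    else:
        print(f"MCP server command available at {path}")
```

Because the project MCP configs are PATH-based, a `None` result here suggests the host would also fail to start `open-data-products serve`.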

## Core Python API

Use the top-level API when the input spec is unknown:

- `load_document(path)`
- `detect_document(document)`
- `validate_document(document_or_path)`
- `explain_document(document_or_path)`
- `resolve_references(document_or_path)`
- `load_summary(path)`
- `list_resources()`
- `get_resource(id)`
- `list_generation_prompts()`
- `load_generation_prompt(name)`
- `get_config(domain="generation")`
- `get_config_path(domain="generation")`
- `copy_config_template("generation", destination)`
- `copy_generation_prompts(destination)`
- `print_config("generation", path)`
- `validate_config("generation", path)`
- `ensure_ollama_model(model="qwen2.5")`
- `load_generation_config(path)`
- `resolve_generation_settings(...)`
- `create_generation_client(settings)`
- `generate_local_artifact(kind, source, output_dir, model="qwen2.5")`
- `generate_local_artifacts(source_dir, output_dir, model="qwen2.5")`
- `resolve_product_contracts(product_path_or_document)`
- `validate_contract(contract_path_or_url)`
- `summarize_contract(contract_path_or_url)`
- `extract_contract_schema(contract_path_or_url)`
- `check_product_contract_alignment(product_path_or_document, contract_path_or_url)`
- `generate_product_contract_report(product_path_or_document, contract_path=None)`
- `build_catalog(input_dir, output_path=None, catalog_id=None, name=None, description=None)`
- `render_catalog_html(document)`
- `write_catalog_html(path, document)`
- `render_catalog_schema_json()`
- `build_catalog_artifacts()`
- `write_catalog_artifacts(output_dir, check=False)`
- `resolve_vocabulary_term(query)`
- `explain_vocabulary_term(term_id)`
- `check_vocabulary_relationship(source, verb, target)`
- `agent_vocabulary_context(term_id)`

These functions support ODPS, ODPC, ODPG, and ODPV documents through one
consistent agent-facing surface.
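
As a rough illustration of the unknown-spec flow, the sketch below chains `load_document`, `detect_document`, and `validate_document`. The function names come from the list above; the return shapes are not documented here, so they are passed through untouched. `load_sdk` and `inspect_document` are hypothetical helpers, not SDK functions.

```python
import importlib
from types import ModuleType
from typing import Any, Dict, Optional


def load_sdk() -> Optional[ModuleType]:
    """Import the SDK if it is installed; return None otherwise."""
    try:
        return importlib.import_module("open_data_products")
    except ImportError:
        return None


def inspect_document(sdk: Any, path: str) -> Dict[str, Any]:
    """Load a document of unknown spec, then detect and validate it.

    The SDK return values are assumptions and are forwarded unchanged
    rather than interpreted.
    """
    document = sdk.load_document(path)
    return {
        "path": path,
        "detected": sdk.detect_document(document),
        "validation": sdk.validate_document(document),
    }
```

With the package installed, `inspect_document(load_sdk(), "product.yaml")` would run the three calls against a file of that (hypothetical) name. Passing the module in as an argument keeps the helper easy to exercise with a stand-in object.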

Data Contract functions are product-oriented. The SDK resolves native ODPS
`/product/contract` entries (`$ref`, `contractURL`, inline `spec`) and
extension-style `extensions.dataContract.href` entries. Linting and exporting
external contracts relies on the optional `datacontract-cli` dependency; inline
specs receive static checks only.

## CLI Commands

Use these commands for machine-readable workflows:

- `open-data-products validate <path> --json`
- `open-data-products explain <path> --json`
- `open-data-products refs <path> --json`
- `open-data-products resources --json`
- `open-data-products resources --id <resource-id> --json`
- `open-data-products summary <path>`
- `open-data-products generate <source-file> --kind signal --output <generated-dir> --model qwen2.5 --json`
- `open-data-products generate <product-source-dir> --kind product-reference --output <generated-dir> --model qwen2.5 --json`
- `open-data-products manifest --json`
- `open-data-products serve`
- `open-data-products odpc-build <fragments-dir> --output <catalog.yaml> --json`
- `open-data-products odpc-build <fragments-dir> --output <catalog.yaml> --html <catalog.html> --json`
- `open-data-products odpc-summary <catalog.yaml> --json`
- `open-data-products odpc-search "catalog data" --limit 3 --json`
- `open-data-products odpc-artifacts <output-dir> --check --json`
- `open-data-products odpv-summary --json`
- `open-data-products odpv-search "governance policy risk" --limit 3 --json`
- `open-data-products odpv-resolve "reusable data asset" --json`
- `open-data-products odpv-explain DataProduct --json`
- `open-data-products odpv-relationship DataProduct supports UseCase --json`
- `open-data-products odpv-context DataProduct --json`
- `open-data-products product resolve-contracts <product.yaml> --json`
- `open-data-products product check-contract <product.yaml> <contract.yaml> --json`
- `open-data-products product contract-report <product.yaml> [contract.yaml] --json`
- `open-data-products product align-contract <product.yaml> <contract.yaml> --json`
- `open-data-products product contract-schema <contract.yaml> --json`
- `open-data-products product export-contract <contract.yaml> --format jsonschema --json`
- `open-data-products product audit <product.yaml> --json`
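
An agent that shells out to these commands might wrap them as below. This is a sketch under two assumptions that the list above does not confirm: that `--json` writes a single JSON document to stdout, and that the exit code may be non-zero even when JSON is produced (so it is returned instead of raised). `run_json` and `validate_command` are illustrative names.

```python
import json
import subprocess
import sys
from typing import Any, List, Tuple


def run_json(argv: List[str]) -> Tuple[int, Any]:
    """Run a command and parse its stdout as one JSON document."""
    completed = subprocess.run(argv, capture_output=True, text=True)
    return completed.returncode, json.loads(completed.stdout)


def validate_command(path: str) -> List[str]:
    """Build the argv for `open-data-products validate <path> --json`."""
    return ["open-data-products", "validate", path, "--json"]


if __name__ == "__main__":
    # Stand-in command so the sketch runs without the SDK installed.
    demo = [sys.executable, "-c", "import json; print(json.dumps({'ok': True}))"]
    print(run_json(demo))
```

Passing arguments as a list avoids shell quoting issues with paths that contain spaces.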

ODPG graph reasoning commands:

- `open-data-products odpg-summary <graph.yaml>`
- `open-data-products odpg-traverse <graph.yaml> --start <node-id> --depth 2`
- `open-data-products odpg-analyze <graph.yaml>`
- `open-data-products odpg-agent-context <graph.yaml> --node <node-id> --depth 2`
- `open-data-products odpg-convert --input <graph.graphml> --output <graph.yaml> --json`
- `open-data-products odpg-generate <graph.yaml> --output <graph-explorer.html> --json`

## MCP Tools

The MCP server exposes safe, read-only tools:

- `validate_document`
- `explain_document`
- `resolve_references`
- `list_resources`
- `get_resource`
- `get_config`
- `validate_config`
- `load_summary`
- `catalog_artifacts`
- `search_terms`
- `resolve_vocabulary_term`
- `explain_vocabulary_term`
- `check_vocabulary_relationship`
- `vocabulary_term_context`
- `search_objects`
- `search_graph_objects`
- `summarize_graph`
- `traverse_graph`
- `analyze_graph`
- `agent_context`
- `resolve_product_contracts`
- `validate_product_contracts`
- `check_product_contract_alignment`
- `generate_product_contract_report`
- `summarize_product_contract_risks`
- `validate_data_contract`
- `summarize_data_contract`
- `extract_data_contract_schema`

Tool responses should stay compact and structured. Do not return full document
bodies from MCP handlers; use `load_summary` for lightweight artifact
references.
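
One way to keep a handler response compact is to return counts and a bounded sample instead of the document itself. The sketch below is not the SDK's handler code; the field names (`issue_count`, `truncated`, and so on) are hypothetical.

```python
from typing import Any, Dict, List


def compact_result(
    path: str, issues: List[Dict[str, Any]], max_issues: int = 5
) -> Dict[str, Any]:
    """Summarize a validation-style result without echoing the document body."""
    return {
        "path": path,  # a reference to the artifact, not its contents
        "valid": not issues,
        "issue_count": len(issues),
        "issues": issues[:max_issues],  # bounded sample
        "truncated": len(issues) > max_issues,
    }
```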

Contract MCP tools are safe/read-only and do not run live source tests. Use
`check_product_contract_alignment` or `generate_product_contract_report` for
agent-facing product-contract assessment.

## Bundled Resources

Agents can discover bundled resources with `list_resources()` or the
`list_resources` MCP tool. Important resource ids include:

- `odps.schema.json`
- `odpc.schema.yaml`
- `odpc.schema.json`
- `odpc.objects`
- `odpg.schema.yaml`
- `odpg.schema.json`
- `odpg.graph`
- `odpg.objects`
- `odpv.vocabulary`
- `odpv.terms`
- `generation.prompt.system`
- `generation.prompt.odps_data_product_fragment`
- `generation.prompt.odpc_use_case_fragment`
- `generation.prompt.odpc_objective_fragment`
- `generation.prompt.odpc_signal_fragment`
- `generation.prompt.odpg_graph_yaml`
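
The ids listed above share a dotted prefix per spec or domain. As a small sketch, an agent could group ids by that prefix to find what is available for one standard. The return shape of `list_resources()` is not shown here, so the helper works on plain id strings; `group_by_prefix` is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def group_by_prefix(resource_ids: Iterable[str]) -> Dict[str, List[str]]:
    """Group resource ids by the text before their first dot."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for resource_id in resource_ids:
        prefix = resource_id.split(".", 1)[0]
        groups[prefix].append(resource_id)
    return dict(groups)


if __name__ == "__main__":
    ids = ["odps.schema.json", "odpc.schema.yaml", "odpc.objects", "odpv.terms"]
    for prefix, members in group_by_prefix(ids).items():
        print(prefix, members)
```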

## Generation Prompts

Prompt templates for LLM fragment generation are stored as editable
Markdown files under `open_data_products/generation/data/prompts/`.

The default provider is a local Ollama server at `http://localhost:11434`
running the Qwen 2.5 model. Pull the model before generating:

```bash
ollama pull qwen2.5
```

External providers are configured with a YAML file. Store provider names,
models, endpoints, input/output folders, and secret environment variable names
in config. Do not store API key values in config. For OpenAI-compatible
providers, use env vars such as `OPENAI_API_KEY`, `OPENROUTER_API_KEY`, or
`GROQ_API_KEY`. For Anthropic Claude, use `ANTHROPIC_API_KEY` with a provider
entry of `type: anthropic`. The OpenAI-compatible provider path has been
smoke-tested with OpenAI `gpt-4.1-mini` and Groq `openai/gpt-oss-120b` for ODPC
signal fragment generation.
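
The rule that config holds the variable name and the environment holds the value can be sketched as follows. The config key `api_key_env` is an assumption made for illustration; the SDK's actual key name is not given in this file.

```python
import os
from typing import Mapping, Optional


def resolve_api_key(
    provider: Mapping[str, str], env: Optional[Mapping[str, str]] = None
) -> str:
    """Look up the API key named by the provider entry in the environment."""
    env = os.environ if env is None else env
    var_name = provider["api_key_env"]  # hypothetical config key
    value = env.get(var_name)
    if not value:
        raise RuntimeError(f"environment variable {var_name} is not set")
    return value
```

The provider entry then carries only a name such as `ANTHROPIC_API_KEY`, so the config file can be committed or shared without exposing a secret.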

When the SDK is installed from PyPI, the bundled generation config is a
template inside the package. Do not edit `site-packages`; run
`open-data-products config generation --copy-to my-generation.config.yaml`,
edit the copied file, then pass it with
`open-data-products generate --config my-generation.config.yaml --kind signal`.
A folder target such as `--copy-to config/` creates missing folders and writes
`generation.config.yaml` inside that folder.

Bundled prompts are templates too; run
`open-data-products config generation --copy-prompts-to prompts/`, edit the
copied Markdown files, then pass them with
`open-data-products generate --prompts prompts/ --kind signal`.

Validate a copied config before generation with
`open-data-products config generation --config my-generation.config.yaml --check`
or `validate_config("generation", "my-generation.config.yaml")`. Print the
selected YAML with
`open-data-products config generation --config my-generation.config.yaml --print`.
The check requires explicit provider/model settings, rejects key typos and
secret-looking values, and verifies that configured input and prompt paths
exist.

For local OpenAI-compatible servers such as LM Studio, vLLM, llama.cpp server,
or LocalAI, use provider entries with `type: openai-chat`; model names are
user-controlled strings and should match the selected server's loaded model.
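
The copy, check, generate sequence described above can be laid out as argv lists for scripting. This sketch only mirrors the commands quoted in the text; `config_workflow` is a hypothetical helper, and the copied file still needs to be edited between the first and second step.

```python
import shlex
from typing import List

CLI = "open-data-products"


def config_workflow(config_path: str, kind: str) -> List[List[str]]:
    """Return the copy, check, and generate commands in order."""
    return [
        [CLI, "config", "generation", "--copy-to", config_path],
        [CLI, "config", "generation", "--config", config_path, "--check"],
        [CLI, "generate", "--config", config_path, "--kind", kind],
    ]


if __name__ == "__main__":
    for argv in config_workflow("my-generation.config.yaml", "signal"):
        print(shlex.join(argv))
```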

Use `list_generation_prompts()` and `load_generation_prompt(name)` from Python,
or discover prompt resources with `open-data-products resources --json`.

Use `generate_local_artifact()`, `generate_local_artifacts_for_kind()`, or
`open-data-products generate --kind <kind>` to generate selected artifacts from
one source file or folder. Supported CLI kinds are `product-reference`,
`odps-product`, `use-case`, `objective`, `signal`, and `graph`.

For `--kind odps-product`, folder input processes each Markdown/text file into
one ODPS YAML product. The default `--profile minimal` is evidence-only.
`--profile complete-draft` drafts `SLA`, `dataQuality`, and `pricingPlans` for
review. `--include-components` accepts schema-backed product component names:
`contract`, `SLA`, `dataQuality`, `pricingPlans`, `license`, `dataAccess`,
`dataHolder`, `paymentGateways`, and `productStrategy`. `--max-source-chars`
chunks long source files into fact-extraction calls, merges facts, then
generates ODPS from the merged facts.
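
To make the chunk-then-merge idea concrete, here is a minimal sketch. It is not the SDK's implementation: the preference for paragraph boundaries and the order-preserving deduplication are assumptions chosen for illustration, and both function names are hypothetical.

```python
from typing import List


def chunk_text(text: str, max_chars: int) -> List[str]:
    """Split text into pieces of at most max_chars, preferring blank-line breaks."""
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    chunks: List[str] = []
    start = 0
    while start < len(text):
        end = min(start + max_chars, len(text))
        if end < len(text):
            # Prefer to cut just after the last blank line inside the window.
            cut = text.rfind("\n\n", start, end)
            if cut > start:
                end = cut + 2
        chunks.append(text[start:end])
        start = end
    return chunks


def merge_facts(fact_lists: List[List[str]]) -> List[str]:
    """Merge per-chunk facts, dropping duplicates and keeping first-seen order."""
    seen = set()
    merged: List[str] = []
    for facts in fact_lists:
        for fact in facts:
            if fact not in seen:
                seen.add(fact)
                merged.append(fact)
    return merged
```

Each chunk would be sent to one fact-extraction call, and the merged list would then feed a single ODPS generation call.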

The CLI requires an explicit `--kind`. ODPC outputs are separate fragment files
with singular roots: `productReference`, `useCase`, `businessObjective`, and
`signal`. Use `--kind graph` to pass those generated fragment YAML files to
ODPG graph generation as context, so that graph node ids match the fragments.
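
A quick post-generation check on a parsed fragment might look like the sketch below. The pairing of CLI kinds with root keys is an assumption inferred from the order in which the two lists appear above, and `has_expected_root` is a hypothetical helper that works on an already-parsed mapping.

```python
from typing import Any, Mapping

# Assumed pairing of CLI kinds with ODPC fragment roots (inferred, not confirmed).
FRAGMENT_ROOTS = {
    "product-reference": "productReference",
    "use-case": "useCase",
    "objective": "businessObjective",
    "signal": "signal",
}


def has_expected_root(kind: str, fragment: Mapping[str, Any]) -> bool:
    """Return True when the fragment contains the singular root for `kind`."""
    return FRAGMENT_ROOTS[kind] in fragment
```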

The prompt set is intended to guide a configured model from source documents to
standards-ready YAML fragments:

- ODPC productReference fragment for an ODPS data product
- ODPC use case fragment
- ODPC business objective fragment
- ODPC signal fragment
- ODPG graph YAML

## Spec Namespaces

- `open_data_products.odps`: ODPS data product models and validation helpers
- `open_data_products.odpc`: ODPC catalog building, catalog helpers, object
  retrieval records, and derived catalog schema artifact generation/checking
- `open_data_products.odpg`: ODPG graph validation, traversal, analysis,
  context, external graph conversion, and explorer generation
- `open_data_products.odpv`: ODPV vocabulary loading, validation, search,
  artifacts, term resolution, relationship compatibility, and term context
- `open_data_products.generation`: editable LLM prompt templates for generating
  standards-ready ODPS, ODPC, and ODPG YAML artifacts, plus configured Ollama
  and OpenAI generation helpers
- `open_data_products.contracts`: optional Data Contract validation adapter,
  ODPS contract reference resolution, schema extraction, alignment checks, and
  product-level reports

## Safety

- The MCP surface is safe/read-only.
- Live Data Contract tests are not exposed through MCP.
- Do not include local absolute paths in project MCP configs.
- Do not include secrets in config, examples, tests, or fixtures.
- Use logical resource ids where possible and let the resource registry resolve
  package paths.
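
The absolute-path rule can be checked mechanically. The sketch below walks a JSON config such as `.mcp.json` and reports string values that look like absolute POSIX or Windows paths. The sample uses the `mcpServers` / `command` / `args` layout; the server name in it is an assumption, and both helper names are hypothetical.

```python
import json
import ntpath
import posixpath
from typing import Any, Iterator, List


def iter_strings(node: Any) -> Iterator[str]:
    """Yield every string value nested inside dicts and lists."""
    if isinstance(node, str):
        yield node
    elif isinstance(node, dict):
        for value in node.values():
            yield from iter_strings(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_strings(item)


def find_absolute_paths(config_text: str) -> List[str]:
    """Return string values that are absolute POSIX or Windows paths."""
    config = json.loads(config_text)
    return [
        value
        for value in iter_strings(config)
        if posixpath.isabs(value) or ntpath.isabs(value)
    ]


if __name__ == "__main__":
    sample = json.dumps(
        {"mcpServers": {"open-data-products": {"command": "open-data-products", "args": ["serve"]}}}
    )
    print(find_absolute_paths(sample))
```

A check like this could run in CI against `.mcp.json`; a TOML file such as `.codex/config.toml` would need a TOML parser in place of `json.loads`.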
