Metadata-Version: 2.4
Name: ddigraph
Version: 0.4.1
Summary: DDI to Knowledge Graph toolkit - transform DDI metadata into graph databases (Neo4j, RDF, Gremlin, NetworkX)
Project-URL: Homepage, https://github.com/pbisson44/ddigraph
Project-URL: Repository, https://github.com/pbisson44/ddigraph
Project-URL: Documentation, https://pbisson44.github.io/ddigraph/
Project-URL: Changelog, https://github.com/pbisson44/ddigraph/blob/main/CHANGELOG.md
Project-URL: Discussion, https://github.com/pbisson44/ddigraph/discussions
Project-URL: Issues, https://github.com/pbisson44/ddigraph/issues
Author-email: ddigraph maintainers <ddigraph@noreply.github.com>
License: MIT License
        
        Copyright (c) 2025 Philippe Bisson
        
        Permission is hereby granted, free of charge, to any person obtaining a copy
        of this software and associated documentation files (the "Software"), to deal
        in the Software without restriction, including without limitation the rights
        to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
        copies of the Software, and to permit persons to whom the Software is
        furnished to do so, subject to the following conditions:
        
        The above copyright notice and this permission notice shall be included in all
        copies or substantial portions of the Software.
        
        THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
        IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
        FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
        AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
        LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
        OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
        SOFTWARE.
License-File: LICENSE
License-File: NOTICE
Keywords: ddi,ddi-codebook,ddi-l,ddi-lifecycle,fragment-instance,graph-database,gremlin,knowledge-graph,metadata,neo4j,networkx,rdf,sparql,survey,xml
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Classifier: Topic :: Database
Classifier: Topic :: Database :: Front-Ends
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Text Processing :: Markup :: XML
Classifier: Typing :: Typed
Requires-Python: <3.15,>=3.12
Requires-Dist: lxml>=5.1
Requires-Dist: neo4j>=5.19
Requires-Dist: orjson>=3.10
Requires-Dist: pydantic-settings>=2.13.1
Requires-Dist: pydantic>=2.7
Requires-Dist: xmlschema>=3.4
Provides-Extra: all
Requires-Dist: gremlinpython<4.0.0,>=3.8.1; extra == 'all'
Requires-Dist: networkx>=3.6.1; extra == 'all'
Requires-Dist: openpyxl>=3.1.5; extra == 'all'
Requires-Dist: pandas>=2.3.3; extra == 'all'
Requires-Dist: rdflib<8.0.0,>=7.5.0; extra == 'all'
Requires-Dist: sdmx1>=2.26; extra == 'all'
Provides-Extra: dev
Requires-Dist: build>=1.2; extra == 'dev'
Requires-Dist: check-manifest>=0.50; extra == 'dev'
Requires-Dist: mypy>=1.10; extra == 'dev'
Requires-Dist: pre-commit>=3.7; extra == 'dev'
Requires-Dist: psutil>=5.9; extra == 'dev'
Requires-Dist: pyroma>=4.2; extra == 'dev'
Requires-Dist: pytest-asyncio>=0.23; extra == 'dev'
Requires-Dist: pytest-cov>=4.1; extra == 'dev'
Requires-Dist: pytest>=8.2; extra == 'dev'
Requires-Dist: ruff>=0.15.10; extra == 'dev'
Requires-Dist: twine>=5.1; extra == 'dev'
Requires-Dist: types-lxml>=2026.2.16; extra == 'dev'
Provides-Extra: docs
Requires-Dist: mkdocs-material>=9.5; extra == 'docs'
Requires-Dist: mkdocs-static-i18n>=1.2; extra == 'docs'
Requires-Dist: mkdocs<2,>=1.6; extra == 'docs'
Requires-Dist: pymdown-extensions>=10.21.2; extra == 'docs'
Requires-Dist: textstat>=0.7; extra == 'docs'
Provides-Extra: gremlin
Requires-Dist: gremlinpython<4.0.0,>=3.8.1; extra == 'gremlin'
Provides-Extra: networkx
Requires-Dist: networkx>=3.6.1; extra == 'networkx'
Provides-Extra: pandas
Requires-Dist: openpyxl>=3.1.5; extra == 'pandas'
Requires-Dist: pandas>=2.3.3; extra == 'pandas'
Provides-Extra: rdf
Requires-Dist: rdflib<8.0.0,>=7.5.0; extra == 'rdf'
Provides-Extra: sdmx
Requires-Dist: sdmx1>=2.26; extra == 'sdmx'
Description-Content-Type: text/markdown

# ddigraph

[![CI](https://img.shields.io/github/actions/workflow/status/pbisson44/ddigraph/ci.yml?label=CI&logo=github)](https://github.com/pbisson44/ddigraph/actions)
[![codecov](https://codecov.io/gh/pbisson44/ddigraph/branch/main/graph/badge.svg)](https://codecov.io/gh/pbisson44/ddigraph)
[![PyPI](https://img.shields.io/pypi/v/ddigraph?logo=pypi&logoColor=white)](https://pypi.org/project/ddigraph/)
[![License](https://img.shields.io/badge/license-MIT-blue.svg)](LICENSE)
[![Python](https://img.shields.io/badge/Python-3.12%E2%80%933.14-blue?logo=python)](pyproject.toml)
[![Neo4j](https://img.shields.io/badge/Neo4j-5.x-green?logo=neo4j)](https://neo4j.com/docs/)
[![Code style](https://img.shields.io/badge/code%20style-ruff-000000?logo=ruff&logoColor=white)](https://docs.astral.sh/ruff/)
[![Type checking](https://img.shields.io/badge/type%20checking-mypy-1678be?logo=mypy&logoColor=white)](https://mypy-lang.org/)

A modern Python toolkit that transforms [DDI](https://ddialliance.org/) (Data Documentation
Initiative) XML metadata into knowledge graphs. Supports **DDI Codebook** and **DDI-L
FragmentInstance** formats with streaming parsing, batched writes, and full async I/O across
multiple graph backends.

[Documentation](https://pbisson44.github.io/ddigraph/) |
[Getting Started](https://pbisson44.github.io/ddigraph/getting-started/installation/) |
[PyPI](https://pypi.org/project/ddigraph/) |
[Source Code](https://github.com/pbisson44/ddigraph)

---

## Features

- **Multi-backend support** -- Neo4j, RDF/SPARQL, Gremlin, NetworkX, and pandas
- **Streaming XML processing** -- Memory-bounded `iterparse` for files of any size
- **Batched writes** -- UNWIND-based Cypher for 10-100x fewer database round trips
- **Async I/O** -- Concurrent parsing and writing with back-pressure control
- **Format auto-detection** -- Distinguishes DDI Codebook from DDI Lifecycle input without manual configuration
- **Unified schema** -- Single source of truth for all node and relationship definitions
- **Adapter pattern** -- Plug in custom graph backends via the `GraphWriteAdapter` protocol
- **Production-ready** -- Retry logic, observability hooks, pydantic-based configuration

## Quick Start

### Install

```bash
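pip install ddigraph            # core package, includes the Neo4j driver
pip install "ddigraph[all]"     # optional: adds the gremlin, networkx, pandas, rdf, and sdmx extras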
```
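
The package metadata declares optional extras (`gremlin`, `networkx`, `pandas`, `rdf`, `sdmx`) for the non-Neo4j backends. The sketch below checks which of them are importable in the current environment. The mapping from extra name to import name is an assumption based on how those third-party packages are usually imported, and `available_extras` is a hypothetical helper, not part of `ddigraph`.

```python
from importlib.util import find_spec

# Extra name -> top-level modules it is assumed to provide.
# These import names are assumptions, not taken from ddigraph itself.
EXTRA_MODULES: dict[str, list[str]] = {
    "gremlin": ["gremlin_python"],
    "networkx": ["networkx"],
    "pandas": ["pandas", "openpyxl"],
    "rdf": ["rdflib"],
    "sdmx": ["sdmx"],
}


def available_extras() -> dict[str, bool]:
    """Report, per extra, whether all of its modules can be found."""
    return {
        extra: all(find_spec(module) is not None for module in modules)
        for extra, modules in EXTRA_MODULES.items()
    }


status = available_extras()
for extra, installed in status.items():
    print(f"{extra:10s} {'installed' if installed else 'missing'}")
```

A missing extra can then be added with, for example, `pip install "ddigraph[rdf]"`.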

### Load DDI metadata (CLI)

```bash
# Set Neo4j connection
export DDIGRAPH_NEO4J_URI=bolt://localhost:7687
export DDIGRAPH_NEO4J_USER=neo4j
export DDIGRAPH_NEO4J_PASSWORD=secret

# Bootstrap schema and load data (format is auto-detected)
ddigraph bootstrap
ddigraph load survey.xml --dataset-id my-survey
```

### Load DDI metadata (Python)

```python
import asyncio

from neo4j import AsyncGraphDatabase

from ddigraph import DDIFragmentLoader, DDILoader, detect_ddi_format
from ddigraph.config import Settings


async def main() -> None:
    # Connection details come from Settings.
    settings = Settings()
    driver = AsyncGraphDatabase.driver(
        settings.neo4j_uri,
        auth=(settings.neo4j_user, settings.neo4j_password.get_secret_value()),
    )
    path = "survey.xml"
    try:
        if detect_ddi_format(path) == "lifecycle":
            # DDI-L FragmentInstance input
            loader = DDIFragmentLoader(driver, settings=settings)
            result = await loader.load(path)
        else:
            # DDI Codebook input, loaded under an explicit dataset ID
            loader = DDILoader(driver, settings=settings)
            result = await loader.load(path, dataset_id="my-survey")
        print(result)  # e.g. {'Instrument': 1, 'Sequence': 388, 'QuestionItem': 373, ...}
    finally:
        # Close the driver even if detection or loading raises.
        await driver.close()


asyncio.run(main())
```

## Supported Formats

| Format | Description | Use Case |
| ------ | ----------- | -------- |
| **DDI Codebook** | Traditional flat format with central Dataset node | Survey archives, data catalogs |
| **DDI-L FragmentInstance** | Lifecycle 3.x format with reusable fragments | Questionnaire design, CAPI/CAWI instruments |
| **DDI-CDI 1.0** | Cross-Domain Integration metadata | Data integration, statistical production |

### XSD Coverage

`ddigraph` covers 100 % of the concrete identifiable elements declared in the
bundled XSD schemas (`schemas/`). An audit script and a pytest guardrail enforce
this coverage, so new schema releases surface any gaps:

| Flavor      | Scope                                                                 | Target | Covered |
| ----------- | --------------------------------------------------------------------- | -----: | ------: |
| DDI-L 3.x   | Concrete Maintainable + Versionable + Identifiable elements           |    189 |  100 %  |
| DDI-C 2.x   | Codebook elements with the `GLOBALS` attribute group (no layout tags) |     73 |  100 %  |
| DDI-CDI 1.0 | Concrete top-level entity elements (associations excluded)            |    210 |  100 %  |

Run `python scripts/xsd_coverage.py` to regenerate the audit or
`python scripts/xsd_coverage.py --json` for machine-readable output.

## Supported Backends

| Backend | Description | Use Case |
| ------- | ----------- | -------- |
| **Neo4j** | Native graph database (Bolt) | Production deployments, complex queries |
| **RDF/SPARQL** | Semantic web triplestores | Linked data, ontology integration |
| **Gremlin** | Graph traversal language | JanusGraph, Neptune, Cosmos DB |
| **NetworkX** | Python graph library | Local analysis, prototyping |
| **pandas** | DataFrame-based | Tabular analysis, Excel export |

## Docker Quick Start

```bash
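# -d runs the container in the background so the same shell can run the commands below
docker run -d --rm --name neo4j-demo \
  -p 7474:7474 -p 7687:7687 \
  -e NEO4J_AUTH=neo4j/password \
  neo4j:5

export DDIGRAPH_NEO4J_URI=bolt://localhost:7687
export DDIGRAPH_NEO4J_USER=neo4j
export DDIGRAPH_NEO4J_PASSWORD=password

# Neo4j needs a few seconds to start; run these once port 7687 accepts connections
ddigraph bootstrap
ddigraph load your-file.xml --dataset-id demo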
```
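
A freshly started Neo4j container takes a few seconds before it accepts connections, so a script that runs `ddigraph bootstrap` right after `docker run` may fail. The sketch below polls the Bolt port until a TCP connection succeeds. `wait_for_port` is a hypothetical helper, not part of `ddigraph`, and an open port only suggests that the server is up; it does not prove that the database is ready for queries.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 60.0, interval: float = 0.5) -> bool:
    """Return True once a TCP connection succeeds, False after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on host:port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Refused or timed out: give up at the deadline, otherwise retry.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For the container above, `wait_for_port("localhost", 7687)` would be called before `ddigraph bootstrap`.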

## Documentation

Full documentation is available at **[pbisson44.github.io/ddigraph](https://pbisson44.github.io/ddigraph/)** in English and French.

- [Getting Started](https://pbisson44.github.io/ddigraph/getting-started/installation/) -- Installation, quick start, 10-minute tutorial
- [User Guide](https://pbisson44.github.io/ddigraph/user-guide/architecture/) -- Architecture, DDI formats, relationships, adapters
- [Graph Backends](https://pbisson44.github.io/ddigraph/backends/neo4j/) -- Neo4j, RDF/SPARQL, Gremlin, NetworkX
- [Reference](https://pbisson44.github.io/ddigraph/reference/cli/) -- CLI commands, configuration
- [Advanced](https://pbisson44.github.io/ddigraph/advanced/tuning/) -- Performance tuning, AI readiness, standards interoperability
- [Contributing](https://pbisson44.github.io/ddigraph/project/contributing/) -- How to contribute

## Development

```bash
git clone https://github.com/pbisson44/ddigraph.git
cd ddigraph
pip install -e ".[dev,docs]"

ruff check . && ruff format .
# Docstring linting is currently enforced for src/ddigraph only.
# pydocstyle is not listed in the dev extra, so install it first.
pip install pydocstyle
pydocstyle src/ddigraph
mypy .
pytest
mkdocs serve
```

## License

MIT -- see [LICENSE](LICENSE) for details.
