Metadata-Version: 2.4
Name: schema-quarry
Version: 0.0.1
Summary: Schema Quarry
Author: Team Statistikktjenester
Author-email: Team Statistikktjenester <contact-email@ssb.no>
License-Expression: MIT
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: License :: OSI Approved :: MIT License
Classifier: Development Status :: 4 - Beta
Classifier: Typing :: Typed
Requires-Dist: click>=8.0.1
Requires-Dist: gcsfs>=2025.3.0
Requires-Dist: openapi-spec-validator>=0.8.4
Requires-Dist: prance>=25.4.8.0
Requires-Dist: pyarrow>=23.0.1
Requires-Dist: requests>=2.32.5
Requires-Dist: ruamel-yaml>=0.19.1
Maintainer: Statistics Norway, Data enablement Department (724)
Requires-Python: >=3.12
Project-URL: homepage, https://github.com/statisticsnorway/schema-quarry
Project-URL: repository, https://github.com/statisticsnorway/schema-quarry
Project-URL: documentation, https://statisticsnorway.github.io/schema-quarry
Project-URL: Changelog, https://github.com/statisticsnorway/schema-quarry/releases
Description-Content-Type: text/markdown

# Schema Quarry

[![PyPI](https://img.shields.io/pypi/v/schema-quarry.svg)][pypi status]
[![Status](https://img.shields.io/pypi/status/schema-quarry.svg)][pypi status]
[![Python Version](https://img.shields.io/pypi/pyversions/schema-quarry)][pypi status]
[![License](https://img.shields.io/pypi/l/schema-quarry)][license]

[![Documentation](https://github.com/statisticsnorway/schema-quarry/actions/workflows/docs.yml/badge.svg)][documentation]
[![Tests](https://github.com/statisticsnorway/schema-quarry/actions/workflows/tests.yml/badge.svg)][tests]
[![Coverage](https://sonarcloud.io/api/project_badges/measure?project=statisticsnorway_schema-quarry&metric=coverage)][sonarcov]
[![Quality Gate Status](https://sonarcloud.io/api/project_badges/measure?project=statisticsnorway_schema-quarry&metric=alert_status)][sonarquality]

[![pre-commit](https://img.shields.io/badge/pre--commit-enabled-brightgreen?logo=pre-commit&logoColor=white)][pre-commit]
[![Black](https://img.shields.io/badge/code%20style-black-000000.svg)][black]
[![Ruff](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/ruff/main/assets/badge/v2.json)](https://github.com/astral-sh/ruff)
[![Poetry](https://img.shields.io/endpoint?url=https://python-poetry.org/badge/v0.json)][poetry]

[pypi status]: https://pypi.org/project/schema-quarry/
[documentation]: https://statisticsnorway.github.io/schema-quarry
[tests]: https://github.com/statisticsnorway/schema-quarry/actions?workflow=Tests
[sonarcov]: https://sonarcloud.io/summary/overall?id=statisticsnorway_schema-quarry
[sonarquality]: https://sonarcloud.io/summary/overall?id=statisticsnorway_schema-quarry
[pre-commit]: https://github.com/pre-commit/pre-commit
[black]: https://github.com/psf/black
[poetry]: https://python-poetry.org/

## Features

- Convert OpenAPI definitions to Parquet schemas
- Use Schema Quarry as a CLI or as a Python library

## Requirements

- Python 3.12+

## Installation

You can install _Schema Quarry_ via [pip] from [PyPI]:

```console
pip install schema-quarry
```

For local development with uv:

```console
uv sync --dev
```

## Usage

CLI:

```console
schema-quarry generate \
  --source "https://petstore3.swagger.io/api/v3/openapi.json" \
  --root-schema Pet \
  --output-file pet.parquet \
  --print-schema
```

Skattemelding example:

```console
schema-quarry generate \
  --source "https://app.swaggerhub.com/apiproxy/registry/skatteetaten/skattemelding-api/4.2.0" \
  --root-schema Skattemelding \
  --output-file skattemelding.parquet \
  --print-schema
```

Write directly to Google Cloud Storage:

```console
schema-quarry generate \
  --source "https://app.swaggerhub.com/apiproxy/registry/skatteetaten/skattemelding-api/4.2.0" \
  --root-schema Skattemelding \
  --output-file "gs://my-bucket/schemas/skattemelding.parquet"
```

Library:

```python
from schema_quarry import build_parquet_schema

result = build_parquet_schema(
    source="https://petstore3.swagger.io/api/v3/openapi.json",
    root_schema="Pet",
    output_file="pet.parquet",
)

print(result.schema)
print(result.parquet_path)
```

`output_file` also accepts `gs://...` URIs; such files are written with `gcsfs`.

Please see the [Reference Guide] for details.

## Tests

Run the full test suite locally with:

```console
uv run pytest
```

The tests are split into two groups:

- `tests/unit/` checks the building blocks of the project, such as the CLI, the Python library API, and specific OpenAPI-to-Parquet mapping rules
- `tests/golden/` checks that real example inputs still produce exactly the same output as the checked-in reference files

The checked-in test data is stored in `tests/resources/`:

- `tests/resources/master/openapi/` contains OpenAPI documents used as regression inputs
- `tests/resources/master/parquet/` contains the expected Parquet schemas for those inputs
- `tests/resources/snapshots/schema-text/` contains expected text output for `format_schema(...)`

In other words:

- if you change the converter logic, the golden tests will tell you whether the generated Parquet schema changed for any of the reference APIs
- if you change schema formatting, the snapshot tests will tell you whether the human-readable schema text changed
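
The golden and snapshot checks described above follow a common pattern: render the current output, then compare it byte for byte with a checked-in reference. The following is a minimal, standard-library sketch of that pattern, not the project's test code; `render_schema` and `check_golden` are hypothetical helpers, and the file name `pet.txt` is made up.

```python
import tempfile
from pathlib import Path


def render_schema(fields: dict[str, str]) -> str:
    """Hypothetical formatter: one 'name: type' line per field."""
    return "".join(f"{name}: {type_}\n" for name, type_ in fields.items())


def check_golden(actual: str, golden_path: Path, update: bool = False) -> bool:
    """Return True when `actual` matches the reference file.

    With update=True the reference is rewritten instead, which is how a
    deliberate change in output would be accepted.
    """
    if update:
        golden_path.write_text(actual, encoding="utf-8")
        return True
    return golden_path.read_text(encoding="utf-8") == actual


with tempfile.TemporaryDirectory() as tmp:
    golden = Path(tmp) / "pet.txt"
    # Record the reference output once.
    check_golden(render_schema({"id": "int64", "name": "string"}), golden, update=True)
    # Same output again: the check passes.
    unchanged = check_golden(render_schema({"id": "int64", "name": "string"}), golden)
    # A changed field type: the check fails and flags the regression.
    changed = check_golden(render_schema({"id": "int32", "name": "string"}), golden)

print(unchanged, changed)
```

In a pytest suite the boolean result would typically be wrapped in an `assert`, so that any difference from the reference file fails the test.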

You can also run the local quality checks with:

```console
uv run ruff check tests
uv run mypy tests
```

## Contributing

Contributions are very welcome.
To learn more, see the [Contributor Guide].

## License

Distributed under the terms of the [MIT license][license],
_Schema Quarry_ is free and open source software.

## Issues

If you encounter any problems,
please [file an issue] along with a detailed description.

## Credits

This project was generated from [Statistics Norway]'s [SSB PyPI Template].

[statistics norway]: https://www.ssb.no/en
[pypi]: https://pypi.org/
[ssb pypi template]: https://github.com/statisticsnorway/ssb-pypitemplate
[file an issue]: https://github.com/statisticsnorway/schema-quarry/issues
[pip]: https://pip.pypa.io/

<!-- github-only -->

[license]: https://github.com/statisticsnorway/schema-quarry/blob/main/LICENSE
[contributor guide]: https://github.com/statisticsnorway/schema-quarry/blob/main/CONTRIBUTING.md
[reference guide]: https://statisticsnorway.github.io/schema-quarry/reference.html
