Metadata-Version: 2.4
Name: czso
Version: 0.1.0
Summary: Python wrapper around Open Data from the Czech Statistical Office
Author: Mojmir Vinkler
License-Expression: MIT
License-File: LICENSE
Keywords: czech,czso,open-data,statistics
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering
Requires-Python: >=3.12
Requires-Dist: pandas>=2.0
Requires-Dist: requests>=2.28
Description-Content-Type: text/markdown

# czso

Python wrapper around Open Data from the [Czech Statistical Office (CZSO)](https://www.czso.cz/).

The real credit goes to CZSO for publishing hundreds of machine-readable datasets with proper metadata and codelists.

Inspired by the R package by Petr Bouchal: [petrbouchal/czso](https://github.com/petrbouchal/czso)

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Marigold/czso/blob/main/demo.ipynb)

## Installation

```bash
pip install czso
```

Or with [uv](https://docs.astral.sh/uv/):

```bash
uv add czso
```

## Quick start

> **📓 See [`demo.ipynb`](demo.ipynb) for an interactive walkthrough** — browse the catalogue, download datasets, plot wages over time, and explore codelists.

```python
import czso

# Browse available datasets
catalogue = czso.get_catalogue()
print(catalogue[["dataset_id", "title"]].head())

# Download a dataset as a DataFrame
df = czso.get_table("110079")

# Get dataset with metadata
df, meta = czso.get_table("110079", include_metadata=True)
print(meta["title"])

# Get raw (uncleaned) data
df_raw = czso.get_table("110079", clean=False)

# Retrieve a codelist (číselník)
codelist = czso.get_codelist(100)
```

## API

| Function | Description |
|----------|-------------|
| `get_catalogue()` | Full catalogue of available CZSO open datasets |
| `get_table(dataset_id, ...)` | Download and read a dataset as a DataFrame |
| `get_dataset_metadata(dataset_id)` | JSON-LD metadata for a dataset |
| `get_table_schema(dataset_id)` | JSON table schema for a dataset resource |
| `get_codelist(codelist_id)` | Retrieve a CZSO codelist (číselník) |
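`get_catalogue()` returns a DataFrame. The Quick start shows it has `dataset_id` and `title` columns, so looking up a dataset ID is ordinary pandas filtering. The sketch below relies only on those two columns. `find_datasets` is a hypothetical helper, not part of `czso`. A small placeholder DataFrame with made-up rows stands in for the real catalogue so the example runs offline.

```python
import pandas as pd


def find_datasets(catalogue: pd.DataFrame, keyword: str) -> pd.DataFrame:
    """Return catalogue rows whose title contains `keyword` (case-insensitive).

    Hypothetical helper; assumes only the `dataset_id` and `title` columns.
    """
    # regex=False matches the keyword literally; na=False skips missing titles
    mask = catalogue["title"].str.contains(keyword, case=False, regex=False, na=False)
    return catalogue.loc[mask, ["dataset_id", "title"]]


# Placeholder rows standing in for czso.get_catalogue(), so the sketch runs offline.
# In practice: catalogue = czso.get_catalogue()
catalogue = pd.DataFrame(
    {
        "dataset_id": ["ds-a", "ds-b", "ds-c"],
        "title": ["Average wages by region", "Population by age", None],
    }
)

print(find_datasets(catalogue, "wages"))
```

With the real catalogue, pass the matching `dataset_id` to `get_table`.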

### `get_table` options

- `resource_num` — which resource to download (default `0`)
- `force_redownload` — skip cache and re-download
- `dest_dir` — directory for caching downloaded files
- `clean` — drop code columns, rename to friendly names (default `True`)
- `include_metadata` — return `(DataFrame, metadata_dict)` tuple
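The only description of `clean=True` given here is that it drops code columns and renames the rest to friendly names. The sketch below illustrates that idea and is not the package's actual implementation. It assumes that code columns end in `_cis` (codelist ID) or `_kod` (code). The `FRIENDLY_NAMES` mapping and `clean_sketch` function are hypothetical, and the input values are placeholders.

```python
import pandas as pd

# Hypothetical mapping; the friendly names used by czso may differ.
FRIENDLY_NAMES = {"hodnota": "value", "vuzemi_txt": "region"}


def clean_sketch(df: pd.DataFrame) -> pd.DataFrame:
    """Illustrative only: drop code columns, then rename the remaining ones."""
    # Assumption: code columns are identified by the "_cis" / "_kod" suffixes
    code_cols = [c for c in df.columns if c.endswith(("_cis", "_kod"))]
    return df.drop(columns=code_cols).rename(columns=FRIENDLY_NAMES)


# Placeholder data shaped like a raw table (what clean=False would keep)
raw = pd.DataFrame(
    {
        "hodnota": [100, 200],
        "vuzemi_cis": [1, 1],
        "vuzemi_kod": ["A", "B"],
        "vuzemi_txt": ["Region A", "Region B"],
    }
)

print(clean_sketch(raw).columns.tolist())  # prints ['value', 'region']
```

Keeping the raw table with `clean=False` is useful when you need the code columns, for example to join against a codelist from `get_codelist`.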

## AI coding skill

This repo includes a skill that teaches AI coding agents how to use the `czso` package. Install it with:

```bash
npx skills add https://github.com/Marigold/czso --skill czso-data
```

## Related projects

- [mcp-csu](https://github.com/reloadcz/mcp-csu) — MCP server for the CZSO DataStat API. Gives AI assistants (Claude, etc.) direct access to 700+ statistical datasets. Run with `uvx mcp-csu`.
- [petrbouchal/czso](https://github.com/petrbouchal/czso) — R package for CZSO open data (the inspiration for this project).

## Development

```bash
make .venv          # install all deps (including dev: ruff, ty, pytest)
make format         # auto-format with ruff
make lint           # lint + fix with ruff
make check-typing   # typecheck with ty
make test           # format + lint + typecheck + unit tests
```

## License

MIT
