Metadata-Version: 2.4
Name: owid-catalog
Version: 1.1.1
Summary: Core data types used by OWID for managing data.
Project-URL: Homepage, https://ourworldindata.org
Project-URL: Documentation, https://docs.owid.io/projects/etl/libraries/catalog/
Project-URL: Repository, https://github.com/owid/etl
Project-URL: Issues, https://github.com/owid/etl/issues
Project-URL: Changelog, https://github.com/owid/etl/releases
Author-email: Our World in Data <tech@ourworldindata.org>
License-Expression: MIT
License-File: LICENSE
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Classifier: Topic :: Scientific/Engineering
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Classifier: Typing :: Typed
Requires-Python: <3.15,>=3.10
Requires-Dist: brotli>=1.2.0
Requires-Dist: dataclasses-json>=0.6.7
Requires-Dist: deprecated>=1.2.14
Requires-Dist: dynamic-yaml>=1.3.5
Requires-Dist: ipdb>=0.13.9
Requires-Dist: jinja2>=3.1.6
Requires-Dist: jsonschema>=3.2.0
Requires-Dist: marshmallow>=3.26.2
Requires-Dist: mistune>=3.2.1
Requires-Dist: owid-datautils
Requires-Dist: owid-repack
Requires-Dist: pandas<3,>=2.2.3
Requires-Dist: platformdirs>=4.0.0
Requires-Dist: pyarrow>=10.0.1
Requires-Dist: pydantic>=2.0.0
Requires-Dist: pyreadr>=0.5.0
Requires-Dist: pyyaml>=6.0.1
Requires-Dist: rapidfuzz>=3.14.3
Requires-Dist: rdata>=0.11.2
Requires-Dist: requests>=2.26.0
Requires-Dist: structlog>=21.5.0
Requires-Dist: tqdm>=4.66.0
Requires-Dist: typing-extensions>=4.7.1
Requires-Dist: unidecode>=1.3.4
Requires-Dist: urllib3>=2.7.0
Description-Content-Type: text/markdown

[![Build status](https://badge.buildkite.com/66cc67fc572120ca97b9ffff288d5d73cb33e019dd70323053.svg)](https://buildkite.com/our-world-in-data/owid-catalog-unit-tests)
[![PyPI version](https://badge.fury.io/py/owid-catalog.svg)](https://badge.fury.io/py/owid-catalog)
![](https://img.shields.io/badge/python-3.10%20%7C%203.11%20%7C%203.12%20%7C%203.13-blue.svg)

# owid-catalog

_A Pythonic library for working with OWID data._

The `owid-catalog` library is the foundation of Our World in Data's data management system. It provides:

1. **Data APIs**: Access OWID's published data through unified client interfaces
2. **Data Structures**: Enhanced pandas DataFrames with rich metadata support

## Installation

```bash
pip install owid-catalog
```

## Quick Examples

### Accessing OWID Data

```python
from owid.catalog import fetch, search

# Search for charts (default)
charts = search("population")
tb = charts[0].fetch()

# Fetch data from OWID Chart at ourworldindata.org/grapher/life-expectancy
tb = fetch("life-expectancy")

# Search for tables
tables = search("population", kind="table", namespace="un")
tb = tables[0].fetch()

# Search indicators (using semantic search)
search("renewable energy", kind="indicator")
```
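
The slug passed to `fetch()` is the last path segment of the chart's URL. As a rough illustration of that relationship only, the sketch below extracts a slug with the standard library. The helper name `slug_from_grapher_url` is hypothetical and this is not the library's own `parse_chart_slug()`, whose exact behavior may differ.

```python
from urllib.parse import urlparse


def slug_from_grapher_url(url: str) -> str:
    """Hypothetical helper: return the chart slug from a grapher URL.

    Illustration only; not the library's parse_chart_slug().
    """
    # Tolerate URLs written without a scheme, e.g. "ourworldindata.org/grapher/..."
    if "://" not in url:
        url = "https://" + url
    # urlparse separates the path from the query string, so "?tab=chart" is ignored
    parts = [p for p in urlparse(url).path.split("/") if p]
    # Expect a path of the form /grapher/<slug>
    if len(parts) != 2 or parts[0] != "grapher":
        raise ValueError(f"Not a grapher chart URL: {url}")
    return parts[1]


slug = slug_from_grapher_url("https://ourworldindata.org/grapher/life-expectancy?tab=chart")
print(slug)  # life-expectancy
```

The resulting string is what you would pass as `fetch("life-expectancy")` in the example above.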

### Working with Data Structures

```python
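import pandas as pd

from owid.catalog import Table
from owid.catalog import processing as pr

# Tables are pandas DataFrames with metadata
df = pd.DataFrame({"country": ["France", "France"], "year": [1990, 2010]})
tb = Table(df, metadata={"short_name": "population"})

# Metadata propagates through operations
tb_filtered = tb[tb["year"] > 2000]  # Keeps metadata
# tb1 and tb2 stand for any two existing Tables that share a "country" column
tb_merged = pr.merge(tb1, tb2, on="country")  # Merges metadata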
```
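
To make the idea of metadata propagation concrete without installing anything, here is a minimal standard-library sketch. `MiniTable` and its `filter` method are hypothetical stand-ins invented for this illustration; they are not part of `owid-catalog` and do not reflect how `Table` is implemented.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class MiniTable:
    """Hypothetical stand-in for a table: rows plus a metadata dict."""

    rows: list[dict[str, Any]]
    metadata: dict[str, Any] = field(default_factory=dict)

    def filter(self, keep: Callable[[dict[str, Any]], bool]) -> "MiniTable":
        # Keep only the matching rows and copy the metadata across,
        # so the derived table still carries its description.
        return MiniTable([r for r in self.rows if keep(r)], dict(self.metadata))


tb = MiniTable(
    rows=[{"country": "France", "year": 1990}, {"country": "France", "year": 2010}],
    metadata={"short_name": "population"},
)
recent = tb.filter(lambda r: r["year"] > 2000)
print(recent.metadata["short_name"])  # population
print(len(recent.rows))  # 1
```

The point of the design is that every derived object is built with a copy of the parent's metadata, so filtering never silently drops the description of the data.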

## Documentation

For detailed documentation, see:
- **[API Reference](https://docs.owid.io/projects/etl/en/latest/api/catalog/api/)**: ChartsAPI, IndicatorsAPI, TablesAPI
- **[Data Structures](https://docs.owid.io/projects/etl/en/latest/api/catalog/structures/)**: Dataset, Table, Variable, metadata handling
- **[Full Documentation](https://docs.owid.io/projects/etl/en/latest/api/catalog/)**: Complete library documentation

## Architecture

```mermaid
graph TB
etl -->|reads| snapshot[upstream datasets]
etl -->|generates| s3[data catalog]
catalog[owid-catalog] -->|queries| s3
```

This library is part of OWID's [ETL project](https://github.com/owid/etl), which contains recipes for all datasets we publish.

## Development

You need Python 3.10+, `uv`, and `make` installed. Clone the repo, then run:

```bash
# run all unit tests and CI checks
make test

# watch for changes, then run all checks
make watch
```

## Changelog

### `v1.1.0`
- **Remove processing log feature**
  - Removed `ProcessingLog` and `processing_log` module from `owid.catalog.core`
  - Removed `combine_indicators_processing_logs` helper
  - Removed `update_log` / `amend_log` methods on `Indicator`
  - Removed processing-log tracking from `Table` arithmetic operations (`__add__`, `__sub__`, `__mul__`, etc.)
  - Removed `processing_log` field from `VariableMeta`

### `v1.0.1`
- **ResponseSet ergonomics**
  - Remove deprecated `ResponseSet.results` property (use `.items` instead)
  - Add `.to_dict()` method for serializing results to plain dicts (useful for AI/LLM context windows)
  - Add `all_fields` parameter to `.to_frame()` to temporarily override display mode without mutating instance state

### `v1.0.0`
- **New unified Client API**
  - `owid.catalog.Client` as single entry point with `ChartsAPI`, `IndicatorsAPI`, `TablesAPI`
  - Quick access via `search()` and `fetch()` convenience functions
  - Rich result types: `ChartResult`, `IndicatorResult`, `TableResult` with `ResponseSet` container
- **Charts API**
  - Fetch chart data by slug, URL, or slug with query params
  - Parse chart slugs from grapher/explorer URLs via `parse_chart_slug()`
  - Explorer best-effort fetching with graceful error handling
  - `set_ui_advanced()` / `set_ui_basic()` for display configuration
- **Tables API**
  - Search catalog by table, namespace, version, dataset, and channel
  - Fetch tables directly by catalog path
  - Embedded catalog index with local caching
- **Indicators API**
  - Semantic search via `search.owid.io` vector embeddings
  - Sort by relevance (similarity + popularity blend) or similarity only
  - `fetch()` for single-column indicator or `fetch_table()` for the full table
- **Search & discovery**
  - Fuzzy, exact, contains, and regex matching modes
  - `.latest()` filtering to keep only newest versions
  - Popularity scores (0.0-1.0) from analytics views, results sorted by popularity
  - `refresh_index` parameter to force catalog index reload
- **Data structures integration**
  - All `fetch()` methods return `owid.catalog.Table` with full metadata
  - `CatalogPath` helper for parsing catalog paths
  - Lazy loading with `load_data=False` for deferred data access
- **Library reorganization**
  - Restructured into `owid.catalog.core` (data structures) and `owid.catalog.api` (remote access)
  - `catalog.find()` deprecated in favor of `Client().tables.search()` (backwards compat maintained)
  - Legacy code moved to `owid.catalog.api.legacy`
  - New dependencies: `pydantic` v2.0+
- **Private data support**
  - Private datasets served from separate R2 bucket
  - API can fetch private data from private bucket
- **Performance**
  - Vectorized operations replacing `iterrows()` in TablesAPI
  - Embedded catalog index loading (removed ETLCatalog dependency)
  - Modularized search into helper methods
- **Other**
  - Thumbnail display in `ResponseSet` for chart results
  - JSON output format support
  - Comprehensive exception handling: `ChartNotFoundError`, `LicenseError`
  - API URLs immutable with Pydantic `Field(frozen=True)`

<details>
<summary>See previous versions</summary>

#### `v0.4.5`
- Allow both `table` and `dataset` parameters in `find()` (they can now be used together)
- Migrate type checking from pyright to ty

#### `v0.4.4`
- Enhanced `find()` with better search capabilities:
  - Case-insensitive search by default (use `case=True` for case-sensitive)
  - Regex support enabled by default for `table` and `dataset` parameters
  - New fuzzy search with `fuzzy=True` - typo-tolerant matching sorted by relevance
  - Configurable fuzzy threshold (0-100) to control match strictness
- New dependency: `rapidfuzz` for fuzzy string matching

#### `v0.4.3`
- Fixed minor bugs

#### `v0.4.0`
- **Highlights**
  - Support for Python 3.10-3.13 (was 3.11-3.13)
  - Drop support for Python 3.9 (breaking change)
- **Others**
  - Deprecate Walden.
  - Dependencies: Change `rdata` for `pyreadr`.
  - Support: indicator dimensions.
  - Support: MDIMs.
  - Switched from Poetry to UV package manager.
  - New decorator `@keep_metadata` to propagate metadata in pandas functions.
- Fixes: `Table.apply`, `groupby.apply`, metadata propagation, type hinting, etc.

#### `v0.3.11`
- Add support for Python 3.12 in `pyproject.toml`

#### `v0.3.10`
- Add experimental chart data API in `owid.catalog.charts`

#### `v0.3.9`
- Switch from isort, black, and flake8 to ruff

#### `v0.3.8`
- Pin `dataclasses-json==0.5.8` to fix an error with Python 3.9

#### `v0.3.7`
- Fix bugs.
- Improve metadata propagation.
- Improve metadata YAML file handling, to have common definitions.
- Remove `DatasetMeta.origins`.

#### `v0.3.6`
- Fix many bugs
- `processing.py` module with pandas-like functions that propagate metadata
- Support for Dynamic YAML files
- Support for R2 alongside S3

#### `v0.3.5`
- Remove `catalog.frames`; use `owid-repack` package instead
- Relax dependency constraints
- Add optional `channel` argument to `DatasetMeta`
- Stop supporting metadata in Parquet format, load JSON sidecar instead
- Fix errors when creating new Table columns

#### `v0.3.4`
- Bump `pyarrow` dependency to enable Python 3.11 support

#### `v0.3.3`
- Add more arguments to `Table.__init__` that are often used in ETL
- Add `Dataset.update_metadata` function for updating metadata from YAML file
- Python 3.11 support via update of `pyarrow` dependency

#### `v0.3.2`
- Fix a bug in `Catalog.__getitem__()`
- Replace `mypy` type checker by `pyright`

#### `v0.3.1`
- Sort imports with `isort`
- Change black line length to 120
- Add `grapher` channel
- Support path-based indexing into catalogs

#### `v0.3.0`
- Update `OWID_CATALOG_VERSION` to 3
- Support multiple formats per table
- Support reading and writing `parquet` files with embedded metadata
- Optional `repack` argument when adding tables to dataset
- Underscore `|`
- Get `version` field from `DatasetMeta` init
- Resolve collisions of `underscore_table` function
- Convert `version` to `str` and load json `dimensions`

#### `v0.2.9`
- Allow multiple channels in `catalog.find` function

#### `v0.2.8`
- Update `OWID_CATALOG_VERSION` to 2

#### `v0.2.7`
- Split datasets into channels (`garden`, `meadow`, `open_numbers`, ...) and make garden default one
- Add `.find_latest` method to Catalog

#### `v0.2.6`
- Add flag `is_public` for public/private datasets
- Enforce snake_case for table, dataset and variable short names
- Add fields `published_by` and `published_at` to Source
- Add a list of supported and unsupported operations on columns
- Update `pyarrow`

#### `v0.2.5`
- Fix ability to load remote CSV tables

#### `v0.2.4`
- Update the default catalog URL to use a CDN

#### `v0.2.3`
- Fix methods for finding and loading data from a `LocalCatalog`

#### `v0.2.2`
- Repack frames to compact dtypes on `Table.to_feather()`

#### `v0.2.1`
- Fix key typo used in version check

#### `v0.2.0`
- Copy dataset metadata into tables, to make tables more traceable
- Add API versioning, and a requirement to update if your version of this library is too old

#### `v0.1.1`
- Add support for Python 3.8

#### `v0.1.0`

- Initial release, including searching and fetching data from a remote catalog

</details>
