Metadata-Version: 2.4
Name: lairs
Version: 0.3.0
Summary: A read/write dataset client for the Layers format, built on didactic.
Project-URL: Homepage, https://github.com/layers-pub/lairs
Project-URL: Documentation, https://layers.pub/lairs/
Project-URL: Repository, https://github.com/layers-pub/lairs
Project-URL: Issues, https://github.com/layers-pub/lairs/issues
Project-URL: Changelog, https://github.com/layers-pub/lairs/blob/main/CHANGELOG.md
Author-email: Aaron Steven White <aaronstevenwhite@gmail.com>
License-Expression: MIT
License-File: LICENSE
Keywords: annotation,atproto,bluesky,corpus,datasets,didactic,layers,lexicon,linguistics,nlp
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.14
Classifier: Topic :: Scientific/Engineering
Classifier: Topic :: Text Processing :: Linguistic
Classifier: Typing :: Typed
Requires-Python: >=3.14
Requires-Dist: didactic>=0.9.0
Requires-Dist: duckdb
Requires-Dist: httpx
Requires-Dist: libipld
Requires-Dist: multiformats
Requires-Dist: panproto>=0.56.0
Requires-Dist: pyarrow
Requires-Dist: textual>=8
Requires-Dist: websockets>=13
Provides-Extra: amr
Requires-Dist: penman; extra == 'amr'
Provides-Extra: appview
Requires-Dist: httpx; extra == 'appview'
Provides-Extra: audio
Requires-Dist: librosa; extra == 'audio'
Requires-Dist: soundfile; extra == 'audio'
Provides-Extra: bio
Requires-Dist: obonet; extra == 'bio'
Provides-Extra: brat
Requires-Dist: brat; extra == 'brat'
Provides-Extra: conllu
Requires-Dist: conllu; extra == 'conllu'
Provides-Extra: elan
Requires-Dist: pympi-ling; extra == 'elan'
Provides-Extra: fsspec
Requires-Dist: fsspec; extra == 'fsspec'
Provides-Extra: geo
Requires-Dist: geopandas; extra == 'geo'
Requires-Dist: shapely; extra == 'geo'
Provides-Extra: hf
Requires-Dist: datasets; extra == 'hf'
Requires-Dist: huggingface-hub; extra == 'hf'
Provides-Extra: labelstudio
Requires-Dist: label-studio-sdk; extra == 'labelstudio'
Provides-Extra: lexical
Requires-Dist: glazing>=0.2; extra == 'lexical'
Provides-Extra: ml
Requires-Dist: datasets; extra == 'ml'
Requires-Dist: fsspec; extra == 'ml'
Requires-Dist: huggingface-hub; extra == 'ml'
Requires-Dist: torch; extra == 'ml'
Provides-Extra: neural
Requires-Dist: mne; extra == 'neural'
Requires-Dist: xarray; extra == 'neural'
Provides-Extra: polars
Requires-Dist: polars; extra == 'polars'
Provides-Extra: reconciliation
Requires-Dist: httpx; extra == 'reconciliation'
Provides-Extra: scholarly
Requires-Dist: habanero; extra == 'scholarly'
Requires-Dist: pyalex; extra == 'scholarly'
Provides-Extra: spacy
Requires-Dist: spacy; extra == 'spacy'
Provides-Extra: textgrid
Requires-Dist: textgrid; extra == 'textgrid'
Provides-Extra: tf
Requires-Dist: tensorflow; (python_version < '3.14') and extra == 'tf'
Provides-Extra: torch
Requires-Dist: torch; extra == 'torch'
Provides-Extra: tracking
Requires-Dist: mlflow; extra == 'tracking'
Requires-Dist: wandb; extra == 'tracking'
Provides-Extra: video
Requires-Dist: av; extra == 'video'
Requires-Dist: decord; (python_version < '3.14') and extra == 'video'
Provides-Extra: webdataset
Requires-Dist: webdataset; extra == 'webdataset'
Provides-Extra: wikidata
Requires-Dist: qwikidata; extra == 'wikidata'
Requires-Dist: sparqlwrapper; extra == 'wikidata'
Description-Content-Type: text/markdown

<h1 align="center">lairs</h1>

<p align="center">
  <em>A read/write dataset client for the Layers format, built on didactic.</em>
</p>

<p align="center">
  <a href="https://github.com/layers-pub/lairs/actions/workflows/ci.yml"><img src="https://github.com/layers-pub/lairs/actions/workflows/ci.yml/badge.svg" alt="CI"></a>
  <a href="https://layers.pub/lairs/"><img src="https://img.shields.io/badge/docs-online-blue" alt="Docs"></a>
  <a href="https://pypi.org/project/lairs/"><img src="https://img.shields.io/pypi/v/lairs" alt="PyPI"></a>
  <a href="https://www.python.org/downloads/"><img src="https://img.shields.io/badge/python-3.14%2B-blue" alt="Python 3.14+"></a>
  <a href="LICENSE"><img src="https://img.shields.io/badge/license-MIT-green" alt="License: MIT"></a>
</p>

<p align="center">
  <a href="https://layers.pub/lairs/tutorial/"><strong>Tutorial</strong></a>
  ·
  <a href="https://layers.pub/lairs/guide/"><strong>Guides</strong></a>
  ·
  <a href="https://layers.pub/lairs/concepts/"><strong>Concepts</strong></a>
  ·
  <a href="https://layers.pub/lairs/reference/"><strong>API</strong></a>
  ·
  <a href="https://layers.pub/lairs/development/"><strong>Development</strong></a>
</p>

---

`lairs` is a Python client for reading and writing data in the
[Layers](https://github.com/layers-pub) format. It downloads `pub.layers.*`
records from ATProto Personal Data Servers, validates them against models
generated from the Layers lexicons, holds them in memory or in a local
content-addressed store, and exposes them through a `datasets`-like API with
tooling for the modalities Layers carries: audio, video, and time-series
signals. On the write side it constructs records, uploads media blobs, and
publishes records in bulk to the authenticated user's own repository, with the
local store doubling as schema-aware version control.

The mental model: `datasets` and `git` for decentralised linguistic annotation.

`lairs` is built on [didactic](https://github.com/panproto/didactic), which is
built on [panproto](https://github.com/panproto/phrom). Every structured value
in `lairs` is a `didactic` model. The project never uses dataclasses, pydantic,
or ad-hoc classes for its data, and type hints never use `Any`.

The ATProto lexicons are the single source of truth. The `pub.layers.*` models
are not written by hand. They are generated from the vendored lexicons and
committed to the repository. Updating to a new Layers version takes three steps:
re-vendor the lexicons, regenerate the models, and run the drift check
(`lairs gen --check`).
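To illustrate what a drift check does in general, here is a minimal sketch that compares freshly generated source text against a committed file and reports a diff. It does not show how `lairs gen --check` is implemented. The `check_drift` helper, the file name, and the sample model text are all hypothetical.

```python
import difflib
import tempfile
from pathlib import Path


def check_drift(generated: str, committed_path: Path) -> list[str]:
    """Return unified-diff lines between the committed file and fresh output.

    An empty list means the committed file matches the generator output.
    """
    committed = (
        committed_path.read_text(encoding="utf-8") if committed_path.exists() else ""
    )
    return list(
        difflib.unified_diff(
            committed.splitlines(),
            generated.splitlines(),
            fromfile=str(committed_path),
            tofile="<generated>",
            lineterm="",
        )
    )


# Hypothetical demo: a committed model file and a newer generator output.
with tempfile.TemporaryDirectory() as tmp:
    committed_file = Path(tmp) / "models.py"
    committed_file.write_text("class Corpus:\n    name: str\n", encoding="utf-8")

    fresh = "class Corpus:\n    name: str\n    version: str\n"
    drift = check_drift(fresh, committed_file)

    # A CI job would exit non-zero when any drift is reported.
    print("drift detected" if drift else "models are up to date")
```

Run in CI, a check of this shape keeps the committed models from silently diverging from the lexicons they were generated from.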

## Installation

The core install carries no integration dependencies. Each integration is an
optional extra, discovered at runtime through entry points, so importing `lairs`
never imports an integration's dependency.

```bash
pip install lairs                 # core
pip install "lairs[hf]"           # HuggingFace datasets and Hub
pip install "lairs[torch]"        # PyTorch exporter
pip install "lairs[audio]"        # audio decoding
pip install "lairs[conllu]"       # the CoNLL-U codec
```

## Usage

```python
import lairs

corpus = lairs.load_corpus(
    "at://did:plc:abc/pub.layers.corpus.corpus/ud-en",
    source="pds",
)
print(len(corpus.expressions))
print(corpus.expressions[0].text)
```
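The first argument is an AT URI, which for a record has the shape `at://<authority>/<collection>/<record key>`. The sketch below splits the example URI into those three parts. It is a standalone illustration rather than part of the `lairs` API: `AtUri` and `parse_at_uri` are hypothetical names, and the parser handles only this three-segment record form.

```python
from typing import NamedTuple


class AtUri(NamedTuple):
    authority: str   # a DID (or handle) identifying the repository
    collection: str  # the record type, as a namespaced identifier
    rkey: str        # the record key within that collection


def parse_at_uri(uri: str) -> AtUri:
    """Split a record-level AT URI into authority, collection, and record key."""
    prefix = "at://"
    if not uri.startswith(prefix):
        raise ValueError(f"not an AT URI: {uri!r}")
    parts = uri[len(prefix):].split("/")
    if len(parts) != 3 or not all(parts):
        raise ValueError(
            f"expected at://<authority>/<collection>/<rkey>, got {uri!r}"
        )
    return AtUri(*parts)


parsed = parse_at_uri("at://did:plc:abc/pub.layers.corpus.corpus/ud-en")
print(parsed.authority)   # did:plc:abc
print(parsed.collection)  # pub.layers.corpus.corpus
print(parsed.rkey)        # ud-en
```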

The `lairs` command-line tool vendors lexicons and regenerates models. It also
pulls, materialises, publishes, and inspects corpora:

```bash
lairs gen --check          # fail if the committed models drift from the lexicons
lairs pull did:plc:abc     # ingest an account's records into a local repository
lairs materialize <uri>    # build Arrow and Parquet views
lairs publish --repo ... --revision v0.1 --to did:plc:abc   # dry-run plan by default
```

## Documentation

The documentation follows the [Diátaxis](https://diataxis.fr/) structure: a
tutorial, task-oriented guides, conceptual explanation, and an API reference
rendered from the source docstrings. Build it locally with:

```bash
uv run --group docs mkdocs serve
```

## Development

```bash
uv sync
uv run ruff format --check lairs tests
uv run ruff check lairs tests
uv run ty check
uv run pytest                    # unit tests only
uv run pytest --run-integration  # include integration tests (docker, network, extras)
```

See [CONTRIBUTING.md](CONTRIBUTING.md) for the full contribution guide and the
[Development](https://layers.pub/lairs/development/) section of the
documentation for testing, code generation, and the release process. All
participants are expected to follow the [Code of Conduct](CODE_OF_CONDUCT.md).

## Changelog

Notable changes are recorded in [CHANGELOG.md](CHANGELOG.md).

## License

`lairs` is released under the [MIT License](LICENSE).
