Metadata-Version: 2.4
Name: annotationpro-format
Version: 0.1.0
Summary: Read and write Annotation PRO .antx files (XML format)
Author: Annotation PRO
License-Expression: MIT
Project-URL: Repository, https://github.com/annotationpro/annotationpro-format-python
Keywords: annotation,antx,annotation-pro,speech,transcription
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Provides-Extra: dev
Requires-Dist: pytest>=7; extra == "dev"
Dynamic: license-file

# annotationpro-format

Python library for reading and writing **Annotation PRO** `.antx` files (XML annotation format used by the Annotation PRO application).

## About ANTX files

ANTX files are XML documents that store speech or audio annotations: layers, segments (with labels and time bounds), and project configuration. They are produced and consumed by [Annotation PRO](https://annotationpro.org/).

**Time in ANTX is stored in samples**, not in seconds. Each annotation file records a **samplerate** (e.g. 44100 Hz) in its configuration, and segment `start` and `duration` values in the file are sample counts. To work in seconds, divide by the annotation's samplerate or use the helper functions provided by this library (see Usage).
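
The arithmetic behind that conversion is small enough to sketch without the library. The two functions below are illustrative stand-ins with made-up names (`samples_to_seconds_sketch`, `seconds_to_samples_sketch`); they are not the library's helpers and only assume that seconds equal samples divided by the samplerate:

```python
def samples_to_seconds_sketch(samples: float, samplerate: int) -> float:
    """Convert a sample count to seconds (hypothetical stand-in, not the library helper)."""
    if samplerate <= 0:
        raise ValueError("samplerate must be positive")
    return samples / samplerate


def seconds_to_samples_sketch(seconds: float, samplerate: int) -> float:
    """Convert seconds to a sample count (hypothetical stand-in, not the library helper)."""
    if samplerate <= 0:
        raise ValueError("samplerate must be positive")
    return seconds * samplerate


samplerate = 44100
# Sample counts as they might appear on a segment in a file
start, duration = 110250, 66150

start_sec = samples_to_seconds_sketch(start, samplerate)
end_sec = samples_to_seconds_sketch(start + duration, samplerate)
print(f"{start_sec:.2f}s to {end_sec:.2f}s")
```

The segment end is computed from `start + duration` in samples before converting, which avoids adding two separately rounded values.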

## Requirements

- Python 3.9+

## Installation

From PyPI:

```bash
pip install annotationpro-format
```

From source (development):

```bash
pip install -e .
```

## Usage

### Reading an ANTX file

```python
from annotationpro_format import deserialize_annotation

with open("project.antx", "r", encoding="utf-8") as f:
    xml_content = f.read()

annotation = deserialize_annotation(xml_content)
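# The samplerate is stored in the configuration; segment times below are in samples
samplerate = int(annotation.configuration["Samplerate"])
print(f"Samplerate: {samplerate} Hz")

for layer in annotation.layers:
    print(f"Layer: {layer.name}")
    for seg in layer.segments:
        print(f"  start={seg.start}, duration={seg.duration} (samples): {seg.label}")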
```

### Writing an ANTX file (using samples)

Segment `start` and `duration` are stored in samples. You can set them directly when you work in sample counts:

```python
from annotationpro_format import Annotation, Layer, Segment, serialize_annotation

annotation = Annotation(samplerate=44100)
layer = Layer("Transcription")
# start=0, duration=66150 means 0 to 1.5 seconds at 44100 Hz
layer.segments.append(Segment("hello", start=0.0, duration=66150.0))
annotation.layers.append(layer)

xml_str = serialize_annotation(annotation)
with open("output.antx", "w", encoding="utf-8") as f:
    f.write(xml_str)
```

### Working with time in seconds (helpers)

To avoid manual sample math, use the seconds helpers. They use the annotation’s samplerate to convert between seconds and samples.

**Writing segments with start/duration in seconds:**

```python
from annotationpro_format import (
    Annotation,
    Layer,
    Segment,
    serialize_annotation,
    set_segment_start_seconds,
    set_segment_duration_seconds,
)

annotation = Annotation(samplerate=44100)
layer = Layer("Transcription")

# Create segment (start/duration in samples will be set by helpers)
seg = Segment("hello", start=0.0, duration=0.0)
set_segment_start_seconds(annotation, seg, 0.0)   # 0 seconds
set_segment_duration_seconds(annotation, seg, 1.5)  # 1.5 seconds
layer.segments.append(seg)

# Another segment: from 2.5 s to 4.0 s (duration 1.5 s)
seg2 = Segment("world", start=0.0, duration=0.0)
set_segment_start_seconds(annotation, seg2, 2.5)
set_segment_duration_seconds(annotation, seg2, 1.5)
layer.segments.append(seg2)

annotation.layers.append(layer)
with open("output.antx", "w", encoding="utf-8") as f:
    f.write(serialize_annotation(annotation))
```

**Reading segment times in seconds:**

```python
from annotationpro_format import (
    deserialize_annotation,
    get_segment_start_seconds,
    get_segment_duration_seconds,
)

with open("project.antx", "r", encoding="utf-8") as f:
    annotation = deserialize_annotation(f.read())

for layer in annotation.layers:
    for seg in layer.segments:
        start_sec = get_segment_start_seconds(annotation, seg)
        duration_sec = get_segment_duration_seconds(annotation, seg)
        print(f"{start_sec:.2f}s – {start_sec + duration_sec:.2f}s: {seg.label}")
```

**Low-level conversion (seconds ↔ samples):**

```python
from annotationpro_format import seconds_to_samples, samples_to_seconds

samplerate = 44100
samples = seconds_to_samples(1.5, samplerate)  # 66150.0
seconds = samples_to_seconds(66150, samplerate)  # 1.5
```

## Model overview

- **Annotation** – Root object: `configuration` (dict, includes `Samplerate`), `layers` (list).
- **Layer** – One annotation layer: `name`, `id`, `segments` (list-like `SegmentCollection`), plus display options (colors, height, etc.).
- **Segment** – One segment: `label`, `start`, `duration` (in **samples**), and optional fields (`feature`, `language`, `group`, etc.).
- **SegmentCollection** – List of segments for a layer; the `id_layer` bookkeeping is handled automatically when you append segments.
- **AudioFile** – Model for audio file references (not used in the current serialization).
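
As a rough mental model of how these objects nest, the sketch below uses plain dataclasses. The `Sketch*` classes are hypothetical and are not the library's classes; they omit `id`, display options, and optional fields, and they assume that configuration values are stored as strings:

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SketchSegment:
    label: str
    start: float = 0.0     # in samples
    duration: float = 0.0  # in samples


@dataclass
class SketchLayer:
    name: str
    segments: List[SketchSegment] = field(default_factory=list)


@dataclass
class SketchAnnotation:
    # Assumption: configuration values are strings, e.g. {"Samplerate": "44100"}
    configuration: Dict[str, str] = field(default_factory=dict)
    layers: List[SketchLayer] = field(default_factory=list)


doc = SketchAnnotation(configuration={"Samplerate": "44100"})
layer = SketchLayer("Transcription")
layer.segments.append(SketchSegment("hello", start=0.0, duration=66150.0))
doc.layers.append(layer)

# Total annotated time across all layers, converted from samples to seconds
rate = int(doc.configuration["Samplerate"])
total_seconds = sum(s.duration for l in doc.layers for s in l.segments) / rate
print(total_seconds)
```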

## Publishing to PyPI (for maintainers)

Publishing the library to PyPI is **fully automated** via GitHub Actions.

- A release is published when a **tag** matching `vX.Y.Z` is **pushed to GitHub**, e.g.:

  ```bash
  git tag v0.1.1
  git push origin v0.1.1
  ```

- Before creating the tag, **bump the version** in `pyproject.toml`:

  ```toml
  [project]
  version = "0.1.1"
  ```

- The **tag name** (e.g. `v0.1.1`) and the version in `pyproject.toml` (e.g. `0.1.1`) **must match**.
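
The tag and version rule can be checked locally before pushing. The sketch below is a hypothetical helper, not part of this repository or its workflow; it assumes the tag has the form `vX.Y.Z` and that `pyproject.toml` contains a single `version = "..."` line, which it finds with a deliberately naive regular expression:

```python
import re

# Tag must look like v1.2.3; the capture group is the bare version
TAG_PATTERN = re.compile(r"^v(\d+\.\d+\.\d+)$")
# Naive match for a line such as: version = "0.1.1"
VERSION_LINE = re.compile(r'^\s*version\s*=\s*"([^"]+)"\s*$', re.MULTILINE)


def tag_matches_pyproject(tag: str, pyproject_text: str) -> bool:
    """Return True if the tag (minus its leading 'v') equals the declared version."""
    tag_match = TAG_PATTERN.match(tag)
    version_match = VERSION_LINE.search(pyproject_text)
    if tag_match is None or version_match is None:
        return False
    return tag_match.group(1) == version_match.group(1)


example = '[project]\nname = "annotationpro-format"\nversion = "0.1.1"\n'
ok = tag_matches_pyproject("v0.1.1", example)
mismatch = tag_matches_pyproject("v0.1.2", example)
print(ok, mismatch)
```

On Python 3.11 or newer, parsing the file with the standard-library `tomllib` module would be more robust than a regular expression; the regex is used here only to keep the sketch compatible with Python 3.9.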

When the tag is pushed:

1. The workflow in `.github/workflows/publish-pypi.yml` runs.
2. It checks out the code at that tag.
3. It builds the package with `python -m build` (sdist + wheel).
4. It uploads the artifacts in `dist/` to PyPI using `twine upload dist/*` and the `PYPI_API_TOKEN` secret configured in the repository.

If the workflow succeeds, the new version is published to PyPI as `annotationpro-format`.

## License

MIT – see [LICENSE](LICENSE).
