Metadata-Version: 2.4
Name: cjm-forced-alignment-adapter-interface
Version: 0.0.2
Summary: Typed forced-alignment task-adapter interface — ForcedAlignmentAdapter ABC, ForcedAlignResult/ForcedAlignItem wire DTOs, and forced-alignment persistence helpers.
Author-email: "Christian J. Mills" <9126128+cj-mills@users.noreply.github.com>
License: Apache-2.0
Project-URL: Repository, https://github.com/cj-mills/cjm-forced-alignment-adapter-interface
Project-URL: Documentation, https://cj-mills.github.io/cjm-forced-alignment-adapter-interface/
Keywords: nbdev
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3 :: Only
Requires-Python: >=3.12
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: cjm_plugin_system>=0.0.40
Dynamic: license-file

# cjm-forced-alignment-adapter-interface


<!-- WARNING: THIS FILE WAS AUTOGENERATED! DO NOT EDIT! -->

## Install latest from the GitHub [repository](https://github.com/cj-mills/cjm-forced-alignment-adapter-interface):

``` bash
pip install cjm_forced_alignment_adapter_interface
```

## Project Structure

    nbs/
    ├── adapter.ipynb # The typed forced-alignment task contract: the `ForcedAlignmentAdapter` ABC
    ├── core.ipynb    # Standardized word-level forced-alignment DTOs (wire-registered)
    └── storage.ipynb # Standardized SQLite storage for forced alignment results with content hashing

Total: 3 notebooks

## Module Dependencies

``` mermaid
graph LR
    adapter["adapter<br/>Forced Alignment Adapter"]
    core["core<br/>Core Data Structures"]
    storage["storage<br/>Forced Alignment Storage"]

    adapter --> core
```

*1 cross-module dependency detected*

## CLI Reference

No CLI commands found in this project.

## Module Overview

Detailed documentation for each module in the project:

### Forced Alignment Adapter (`adapter.ipynb`)

> The typed forced-alignment task contract: the `ForcedAlignmentAdapter`
> ABC

#### Import

``` python
from cjm_forced_alignment_adapter_interface.adapter import (
    ForcedAlignmentToolProtocol,
    ForcedAlignmentAdapter
)
```

#### Classes

``` python
@runtime_checkable
class ForcedAlignmentToolProtocol(Protocol):
    """
    PROVISIONAL structural contract for forced-alignment-capable tools.
    
    Mirrors the fused-era surface (task-shaped `execute`); re-derived from
    native tool surfaces when the Option C cascade splits tools (stage 8).
    """
    
    def execute(self, audio: Union[str, Path], text: str, **kwargs) -> Any: ...
```
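Because the protocol is `@runtime_checkable`, a host can check structurally whether an object looks like a forced-alignment tool. The sketch below illustrates this with a local stand-in protocol; `ToolProtocolSketch`, `EchoTool`, and `NotATool` are hypothetical names for illustration and are not part of this package.

``` python
from pathlib import Path
from typing import Any, Protocol, Union, runtime_checkable


@runtime_checkable
class ToolProtocolSketch(Protocol):
    """Local stand-in mirroring the shape of ForcedAlignmentToolProtocol (assumption)."""

    def execute(self, audio: Union[str, Path], text: str, **kwargs) -> Any: ...


class EchoTool:
    """Hypothetical tool: has an `execute` method, so it satisfies the protocol."""

    def execute(self, audio, text, **kwargs):
        return {"audio": str(audio), "words": text.split()}


class NotATool:
    """Hypothetical class without `execute`, so it does not satisfy the protocol."""

    def run(self):
        return None


# isinstance() on a runtime-checkable Protocol only checks that the
# method exists; it does not validate the signature or return type.
is_tool = isinstance(EchoTool(), ToolProtocolSketch)
is_not_tool = isinstance(NotATool(), ToolProtocolSketch)
```

The runtime check is deliberately shallow: it confirms the presence of `execute`, so argument and return types still need static checking or tests.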

``` python
class ForcedAlignmentAdapter(TaskAdapter):
    """
    Typed forced-alignment task adapter: model-ready audio + transcript
    text in, `ForcedAlignResult` (word-level timings) out.
    
    Input contract (carried over from the fused-era ForcedAlignmentPlugin):
    the caller guarantees MODEL-READY audio; `text` is the transcript to
    align against the audio.
    
    Persistence sits BESIDE the task method: the storage module's
    `ForcedAlignmentStorage` is the cache / persist seam — note the
    FOUR-key cache identity (`audio_path`, `audio_hash`, `text_hash`,
    `config_hash`): the task input is the (audio, transcript) PAIR, so the
    text hash is part of the cache key (unlike transcription's three-key).
    
    The result DTO is wire-registered ("forced_alignment.result"): items
    arrive at the host as typed `ForcedAlignItem`s, not dicts.
    """
    
    def align(
            self,
            audio: Union[str, Path],  # Path to MODEL-READY audio (converted upstream)
            text: str,                # Transcript text to align against the audio
            **kwargs,                 # Adapter-specific options (language, ...)
        ) -> ForcedAlignResult:       # Word-level alignment output
        "Align transcript text to model-ready audio at word level."
```

### Core Data Structures (`core.ipynb`)

> Standardized word-level forced-alignment DTOs (wire-registered)

#### Import

``` python
from cjm_forced_alignment_adapter_interface.core import (
    ForcedAlignItem,
    ForcedAlignResult
)
```

#### Classes

``` python
@dataclass
class ForcedAlignItem:
    "A single word-level alignment result."
    
    text: str  # The aligned word (punctuation typically stripped by model)
    start_time: float  # Start time in seconds
    end_time: float  # End time in seconds
```

``` python
@dataclass
class ForcedAlignResult:
    "Standardized output for all forced alignment plugins."
    
    items: List[ForcedAlignItem]  # Word-level alignments
    metadata: Dict[str, Any] = field(...)  # Plugin-specific metadata
    
    def from_dict(...)
        """
        Reconstruct from a wire payload, re-typing nested items.

        Nested-DTO reconstruction is why this class defines its own
        `from_dict` instead of relying on @wire_type's flat default
        (which would leave `items` as plain dicts).
        """
```
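The difference between a flat reconstruction and a nested one can be sketched with plain dataclasses. This is a minimal illustration under stated assumptions: `ItemSketch` and `ResultSketch` are hypothetical stand-ins, and the `from_dict` body here is a guess at the general technique, not the package's actual implementation or signature.

``` python
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List


@dataclass
class ItemSketch:
    """Stand-in for ForcedAlignItem (same three fields)."""
    text: str
    start_time: float
    end_time: float


@dataclass
class ResultSketch:
    """Stand-in for ForcedAlignResult."""
    items: List[ItemSketch]
    metadata: Dict[str, Any] = field(default_factory=dict)

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "ResultSketch":
        # Re-type each nested item dict into a dataclass instance.
        items = [ItemSketch(**d) for d in data.get("items", [])]
        return cls(items=items, metadata=dict(data.get("metadata") or {}))


# Simulate a wire payload: asdict() flattens nested dataclasses to dicts.
payload = asdict(
    ResultSketch(items=[ItemSketch("hello", 0.0, 0.42)], metadata={"model": "demo"})
)

flat = ResultSketch(**payload)            # flat rebuild: items stay plain dicts
typed = ResultSketch.from_dict(payload)   # nested rebuild: items are re-typed
```

With the flat rebuild, `flat.items[0]` is a `dict`; with the nested rebuild, `typed.items[0]` is an `ItemSketch` whose fields are reachable as attributes.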

### Forced Alignment Storage (`storage.ipynb`)

> Standardized SQLite storage for forced alignment results with content
> hashing

#### Import

``` python
from cjm_forced_alignment_adapter_interface.storage import (
    ForcedAlignmentRow,
    ForcedAlignmentStorage
)
```

#### Classes

``` python
@dataclass
class ForcedAlignmentRow:
    "A single row from the forced_alignments table."
    
    job_id: str  # Unique job identifier
    audio_path: str  # Path to the source audio file
    audio_hash: str  # Hash of source audio in "algo:hexdigest" format
    text: str  # Input transcript text that was aligned
    text_hash: str  # Hash of input text in "algo:hexdigest" format
    config_hash: str  # Hash of the effective alignment config used
    items: Optional[List[Dict[str, Any]]]  # Serialized ForcedAlignItems
    metadata: Optional[Dict[str, Any]]  # Plugin metadata
    created_at: Optional[float]  # Unix timestamp
```

``` python
class ForcedAlignmentStorage:
    "Standardized SQLite storage for forced alignment results."
    
    def __init__(
            self,
            db_path: str  # Absolute path to the SQLite database file
        )
        "Initialize storage, create table, run migrations, and build indexes."
    
    def save(
            self,
            job_id: str,        # Unique job identifier
            audio_path: str,    # Path to the source audio file
            audio_hash: str,    # Hash of source audio in "algo:hexdigest" format
            text: str,          # Input transcript text
            text_hash: str,     # Hash of input text in "algo:hexdigest" format
            config_hash: str,   # Hash of the effective alignment config
            items: Optional[List[Dict[str, Any]]] = None,  # Serialized ForcedAlignItems
            metadata: Optional[Dict[str, Any]] = None       # Plugin metadata
        ) -> None:
        "Save or replace a forced alignment result (upsert by audio_path + text_hash + config_hash)."
    
    def save_with_logging(
            self,
            *,
            job_id: str,        # Unique job identifier
            audio_path: str,    # Path to the source audio file
            audio_hash: str,    # Hash of source audio in "algo:hexdigest" format
            text: str,          # Input transcript text
            text_hash: str,     # Hash of input text in "algo:hexdigest" format
            config_hash: str,   # Hash of the effective alignment config
            items: Optional[List[Dict[str, Any]]] = None,  # Serialized ForcedAlignItems
            metadata: Optional[Dict[str, Any]] = None,      # Plugin metadata
            logger: Optional[logging.Logger] = None         # Optional logger for success/failure messages
        ) -> bool:  # True if saved; False if the save failed (error logged, not raised)
        """
        Save a result, logging success/failure. Failures are logged and swallowed (returns False).

        Centralizes the try/save/log/except block every forced-alignment plugin reimplements.
        Returns True on success so callers can gate post-save side effects on the result.
        """
    
    def get_cached(
            self,
            audio_path: str,   # Path to the source audio file
            audio_hash: str,   # Content hash of the audio (cache miss if the file changed)
            text_hash: str,    # Hash of the input transcript text (part of the cache key)
            config_hash: str   # Hash of the effective alignment config
        ) -> Optional[ForcedAlignmentRow]:  # Cached row or None
        """
        Retrieve a content-correct cached alignment for an (audio, transcript) pair.

        Matches on audio_path + audio_hash + text_hash + config_hash. A changed audio
        file (new audio_hash) misses even if a stale row exists at the same
        (audio_path, text_hash, config_hash); the next save() replaces it.
        """
    
    def get_by_job_id(
            self,
            job_id: str  # Job identifier to look up
        ) -> Optional[ForcedAlignmentRow]:  # Row or None if not found
        "Retrieve a forced alignment result by job ID."
    
    def list_jobs(
            self,
            limit: int = 100  # Maximum number of rows to return
        ) -> List[ForcedAlignmentRow]:  # List of forced alignment rows
        "List forced alignment jobs ordered by creation time (newest first)."
    
    def verify_audio(
            self,
            job_id: str  # Job identifier to verify
        ) -> Optional[bool]:  # True if audio matches, False if tampered, None if job not found
        "Verify the source audio file still matches its stored hash."
    
    def verify_text(
            self,
            job_id: str  # Job identifier to verify
        ) -> Optional[bool]:  # True if text matches, False if tampered, None if job not found
        "Verify the input text still matches its stored hash."
```
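The interplay between the three-column upsert key and the four-column cache lookup described above can be sketched with the standard library alone. Everything here is an assumption made for illustration: the table name, column layout, `content_hash` helper, and the module-level `save` / `get_cached` functions are hypothetical stand-ins, not the schema or code of `ForcedAlignmentStorage`.

``` python
import hashlib
import json
import sqlite3


def content_hash(data: bytes, algo: str = "sha256") -> str:
    """Hypothetical helper producing the "algo:hexdigest" format."""
    return f"{algo}:{hashlib.new(algo, data).hexdigest()}"


conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE alignments (
        job_id TEXT PRIMARY KEY,
        audio_path TEXT, audio_hash TEXT,
        text_hash TEXT, config_hash TEXT, items TEXT,
        UNIQUE (audio_path, text_hash, config_hash)
    )
    """
)


def save(job_id, audio_path, audio_hash, text_hash, config_hash, items):
    # Upsert: a row with the same (audio_path, text_hash, config_hash) is replaced.
    conn.execute(
        "INSERT OR REPLACE INTO alignments VALUES (?, ?, ?, ?, ?, ?)",
        (job_id, audio_path, audio_hash, text_hash, config_hash, json.dumps(items)),
    )


def get_cached(audio_path, audio_hash, text_hash, config_hash):
    # Lookup: all four keys must match, so changed audio content is a miss.
    row = conn.execute(
        "SELECT job_id, items FROM alignments WHERE audio_path = ? "
        "AND audio_hash = ? AND text_hash = ? AND config_hash = ?",
        (audio_path, audio_hash, text_hash, config_hash),
    ).fetchone()
    return None if row is None else (row[0], json.loads(row[1]))


text_hash = content_hash("hello world".encode("utf-8"))
config_hash = content_hash(json.dumps({"language": "en"}, sort_keys=True).encode("utf-8"))
audio_v1 = content_hash(b"fake-audio-bytes-v1")
audio_v2 = content_hash(b"fake-audio-bytes-v2")
items = [{"text": "hello", "start_time": 0.0, "end_time": 0.4}]

save("job-1", "/tmp/a.wav", audio_v1, text_hash, config_hash, items)
hit = get_cached("/tmp/a.wav", audio_v1, text_hash, config_hash)
miss = get_cached("/tmp/a.wav", audio_v2, text_hash, config_hash)  # audio changed

save("job-2", "/tmp/a.wav", audio_v2, text_hash, config_hash, items)  # replaces stale row
row_count = conn.execute("SELECT COUNT(*) FROM alignments").fetchone()[0]
fresh = get_cached("/tmp/a.wav", audio_v2, text_hash, config_hash)
```

In this sketch the stale row never serves a wrong result: the changed `audio_hash` forces a miss, and the following save overwrites the old row instead of accumulating duplicates.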
