Metadata-Version: 2.4
Name: cjm-transcription-adapter-interface
Version: 0.0.2
Summary: Typed transcription task-adapter interface — TranscriptionAdapter ABC, TranscriptionResult wire DTO, and transcription persistence helpers.
Author-email: "Christian J. Mills" <9126128+cj-mills@users.noreply.github.com>
License: Apache-2.0
Project-URL: Repository, https://github.com/cj-mills/cjm-transcription-adapter-interface
Project-URL: Documentation, https://cj-mills.github.io/cjm-transcription-adapter-interface/
Keywords: nbdev
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3 :: Only
Requires-Python: >=3.12
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: cjm_plugin_system>=0.0.40
Dynamic: license-file

# cjm-transcription-adapter-interface



## Install

``` bash
pip install cjm_transcription_adapter_interface
```

## Project Structure

    nbs/
    ├── adapter.ipynb # The typed transcription task contract: `TranscriptionAdapter` ABC + provisional `TranscriptionToolProtocol`
    ├── core.ipynb    # Standardized result DTO for the transcription task, wire-registered as "transcription.result"
    └── storage.ipynb # Standardized SQLite storage for transcription results with content hashing

Total: 3 notebooks

## Module Dependencies

``` mermaid
graph LR
    adapter["adapter<br/>Transcription Adapter"]
    core["core<br/>Core Data Structures"]
    storage["storage<br/>Transcription Storage"]

    adapter --> core
```

*1 cross-module dependency detected*

## CLI Reference

No CLI commands found in this project.

## Module Overview

Detailed documentation for each module in the project:

### Transcription Adapter (`adapter.ipynb`)

> The typed transcription task contract: the `TranscriptionAdapter` ABC
> and the provisional `TranscriptionToolProtocol`.

#### Import

``` python
from cjm_transcription_adapter_interface.adapter import (
    TranscriptionToolProtocol,
    TranscriptionAdapter
)
```

#### Classes

``` python
from pathlib import Path
from typing import Any, Protocol, Union, runtime_checkable


@runtime_checkable
class TranscriptionToolProtocol(Protocol):
    """
    PROVISIONAL structural contract for transcription-capable tools.
    
    Mirrors the fused-era surface (task-shaped `execute`); to be re-derived
    from native tool surfaces when the Option C cascade splits tools (stage 8).
    """
    
    def execute(self, audio: Union[str, Path], **kwargs) -> Any: ...
```
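Because the protocol is decorated with `@runtime_checkable`, an `isinstance` check accepts any object that defines `execute`; no inheritance is required. A minimal sketch follows. The protocol is re-declared locally so the example runs on its own, and `FakeTool` and `NotATool` are hypothetical classes used only for illustration. Keep in mind that a runtime protocol check only confirms that `execute` exists; it does not validate the method's signature or return type.

```python
from pathlib import Path
from typing import Any, Protocol, Union, runtime_checkable


# Local mirror of the documented protocol so the sketch is self-contained.
@runtime_checkable
class TranscriptionToolProtocol(Protocol):
    def execute(self, audio: Union[str, Path], **kwargs) -> Any: ...


class FakeTool:
    """Hypothetical tool: defining `execute` is enough to satisfy the protocol."""

    def execute(self, audio: Union[str, Path], **kwargs) -> Any:
        return {"text": f"transcript of {Path(audio).name}"}


class NotATool:
    """Hypothetical class without `execute`, so the structural check fails."""

    def run(self, audio: Union[str, Path]) -> None:
        return None


print(isinstance(FakeTool(), TranscriptionToolProtocol))  # True
print(isinstance(NotATool(), TranscriptionToolProtocol))  # False
```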

``` python
class TranscriptionAdapter(TaskAdapter):
    """
    Typed transcription task adapter: model-ready audio in,
    `TranscriptionResult` out.
    
    Input contract (carried over from the fused-era TranscriptionPlugin):
    the caller guarantees MODEL-READY audio — format / sample-rate /
    channel handling happens upstream (ffmpeg `convert_for_model`), never
    in the adapter.
    
    Persistence sits BESIDE the task method (pass-2 Thread 3): the storage
    module's `TranscriptionStorage` provides the adapter-level cache /
    persist seam (`get_cached(audio_path, audio_hash, config_hash)` +
    `save_with_logging(...)`).
    
    The result DTO is wire-registered ("transcription.result"): returned
    values cross the worker boundary typed via the substrate's `core.wire`
    envelope.
    """
    
    def transcribe(
            self,
            audio: Union[str, Path],  # Path to MODEL-READY audio (converted upstream)
            **kwargs,                 # Adapter-specific options (language, task, ...)
        ) -> TranscriptionResult:     # Typed transcription output
        "Transcribe model-ready audio to text."
```
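The docstring places persistence beside `transcribe` rather than inside it. The sketch below shows one possible wiring under that reading: a concrete adapter that only delegates to a tool and types its output, plus a caller-side `cached_transcribe` helper that consults `get_cached` before transcribing and calls `save_with_logging` afterwards. To stay self-contained, `TranscriptionResult` and the ABC are local mirrors (the real ABC also inherits `TaskAdapter` from `cjm_plugin_system`, whose other requirements are not shown here). `ToolBackedAdapter`, `DictStorage`, `cached_transcribe`, `EchoTool`, and the sha256 hashing are hypothetical stand-ins, not part of this package's API, and the real `get_cached` returns a `TranscriptionRow` rather than a dict.

```python
import hashlib
import tempfile
import uuid
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any, Dict, List, Optional, Tuple, Union


@dataclass
class TranscriptionResult:
    """Local mirror of the documented DTO (the metadata default is assumed)."""
    text: str
    confidence: Optional[float]
    segments: Optional[List[Dict[str, Any]]]
    metadata: Dict[str, Any] = field(default_factory=dict)


class TranscriptionAdapter(ABC):
    """Stand-in ABC; the real class also inherits cjm_plugin_system's TaskAdapter."""

    @abstractmethod
    def transcribe(self, audio: Union[str, Path], **kwargs) -> TranscriptionResult: ...


class ToolBackedAdapter(TranscriptionAdapter):
    """Hypothetical adapter: delegates to a tool's `execute` and types the output."""

    def __init__(self, tool: Any) -> None:
        self.tool = tool
        self.calls = 0  # counts real (uncached) transcriptions

    def transcribe(self, audio: Union[str, Path], **kwargs) -> TranscriptionResult:
        self.calls += 1
        raw = self.tool.execute(audio, **kwargs)  # assumed to return a plain dict
        return TranscriptionResult(raw["text"], raw.get("confidence"), raw.get("segments"))


def tag(data: bytes) -> str:
    """Return an "algo:hexdigest" string (sha256 is an assumed choice)."""
    return "sha256:" + hashlib.sha256(data).hexdigest()


class DictStorage:
    """Hypothetical in-memory stand-in for the two documented seam methods."""

    def __init__(self) -> None:
        self.rows: Dict[Tuple[str, str], Dict[str, Any]] = {}

    def get_cached(self, audio_path: str, audio_hash: str, config_hash: str):
        row = self.rows.get((audio_path, config_hash))
        return row if row and row["audio_hash"] == audio_hash else None  # stale row -> miss

    def save_with_logging(self, *, job_id, audio_path, audio_hash, config_hash,
                          text, text_hash, segments=None, metadata=None, logger=None) -> bool:
        self.rows[(audio_path, config_hash)] = dict(
            job_id=job_id, audio_hash=audio_hash, text=text,
            text_hash=text_hash, segments=segments, metadata=metadata)
        return True


def cached_transcribe(adapter, storage, audio_path: str, config_hash: str, **kwargs):
    """Look up the cache, transcribe on a miss, then persist (wiring beside `transcribe`)."""
    audio_hash = tag(Path(audio_path).read_bytes())
    row = storage.get_cached(audio_path, audio_hash, config_hash)
    if row is not None:
        return TranscriptionResult(row["text"], None, row["segments"], {"cached": True})
    result = adapter.transcribe(audio_path, **kwargs)
    storage.save_with_logging(
        job_id=uuid.uuid4().hex, audio_path=audio_path, audio_hash=audio_hash,
        config_hash=config_hash, text=result.text, text_hash=tag(result.text.encode()),
        segments=result.segments, metadata=result.metadata)
    return result


class EchoTool:
    """Hypothetical tool that satisfies the `execute` protocol."""

    def execute(self, audio, **kwargs):
        return {"text": "hello world", "confidence": 0.9}


adapter, storage = ToolBackedAdapter(EchoTool()), DictStorage()
with tempfile.TemporaryDirectory() as tmp:
    clip = str(Path(tmp) / "clip.wav")
    Path(clip).write_bytes(b"model-ready audio bytes")
    first = cached_transcribe(adapter, storage, clip, "sha256:cfg")
    second = cached_transcribe(adapter, storage, clip, "sha256:cfg")
print(adapter.calls, second.metadata)  # 1 {'cached': True}
```

Keeping the cache outside `transcribe` means the adapter stays a pure "audio in, result out" mapping, and the same adapter can run with or without persistence.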

### Core Data Structures (`core.ipynb`)

> Standardized result DTO for the transcription task, wire-registered as
> `"transcription.result"`.

#### Import

``` python
from cjm_transcription_adapter_interface.core import (
    TranscriptionResult
)
```

#### Classes

``` python
@dataclass
class TranscriptionResult:
    "Standardized output for all transcription plugins."
    
    text: str  # The transcribed text
    confidence: Optional[float]  # Overall confidence (0.0 to 1.0)
    segments: Optional[List[Dict[str, Any]]]  # Timestamped segments
    metadata: Dict[str, Any] = field(...)  # Additional metadata
```
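The DTO crosses the worker boundary through the substrate's `core.wire` envelope, whose API is not documented here. As a stand-in, the sketch below uses `dataclasses.asdict` plus `json` only to show that every field holds JSON-friendly data and that the object survives a serialize/rebuild round trip. The local class mirrors the listing above; its `default_factory=dict` for `metadata` is an assumption (the listing elides the default), and the segment keys `start`, `end`, and `text` are illustrative rather than a documented schema.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class TranscriptionResult:
    """Local mirror of the documented DTO; the metadata default is assumed."""
    text: str
    confidence: Optional[float]
    segments: Optional[List[Dict[str, Any]]]
    metadata: Dict[str, Any] = field(default_factory=dict)


result = TranscriptionResult(
    text="hello world",
    confidence=0.92,
    # Segment keys are illustrative, not a documented schema.
    segments=[{"start": 0.0, "end": 0.6, "text": "hello"},
              {"start": 0.6, "end": 1.1, "text": "world"}],
    metadata={"model": "example-model"},
)

payload = json.dumps(asdict(result))                   # plain-JSON stand-in for the wire envelope
restored = TranscriptionResult(**json.loads(payload))  # rebuild the typed object
print(restored == result)  # True
```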

### Transcription Storage (`storage.ipynb`)

> Standardized SQLite storage for transcription results with content
> hashing

#### Import

``` python
from cjm_transcription_adapter_interface.storage import (
    TranscriptionRow,
    TranscriptionStorage
)
```

#### Classes

``` python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class TranscriptionRow:
    "A single row from the transcriptions table."
    
    job_id: str  # Unique job identifier
    audio_path: str  # Path to the source audio file
    audio_hash: str  # Hash of source audio in "algo:hexdigest" format
    config_hash: str  # Hash of the effective transcription config used
    text: str  # Transcribed text output
    text_hash: str  # Hash of transcribed text in "algo:hexdigest" format
    segments: Optional[List[Dict[str, Any]]]  # Timestamped segments
    metadata: Optional[Dict[str, Any]]  # Plugin metadata
    created_at: Optional[float]  # Unix timestamp
```
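The row stores `audio_hash` and `text_hash` as `"algo:hexdigest"` strings, and `verify_audio` / `verify_text` report whether freshly recomputed content still matches them. The helpers below are a hypothetical sketch of that format: the names are invented for illustration, and both the sha256 default and the canonical-JSON scheme for `config_hash` are assumptions, not the package's documented behavior. Parsing the algorithm out of the stored string lets a verifier re-hash with whatever algorithm produced the original value.

```python
import hashlib
import json
from typing import Any, Dict


def content_hash(data: bytes, algo: str = "sha256") -> str:
    """Return a hash string in "algo:hexdigest" form (sha256 default is assumed)."""
    return f"{algo}:{hashlib.new(algo, data).hexdigest()}"


def config_hash(config: Dict[str, Any]) -> str:
    """Hash a config via canonical JSON so key order does not change the result."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return content_hash(canonical.encode("utf-8"))


def matches(stored: str, data: bytes) -> bool:
    """Re-hash `data` with the algorithm named in `stored` and compare."""
    algo, _, _ = stored.partition(":")
    return content_hash(data, algo) == stored


text_hash = content_hash("hello world".encode("utf-8"))
print(text_hash.startswith("sha256:"))  # True
print(config_hash({"language": "en", "task": "transcribe"})
      == config_hash({"task": "transcribe", "language": "en"}))  # True
print(matches(text_hash, b"hello world"), matches(text_hash, b"hello World"))  # True False
```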

``` python
import logging
from typing import Any, Dict, List, Optional

# TranscriptionRow (listed above) is defined in this same module.


class TranscriptionStorage:
    "Standardized SQLite storage for transcription results."
    
    def __init__(
            self,
            db_path: str  # Absolute path to the SQLite database file
        ):
        "Initialize storage, create table, run migrations, and build indexes."
    
    def save(
            self,
            job_id: str,        # Unique job identifier
            audio_path: str,    # Path to the source audio file
            audio_hash: str,    # Hash of source audio in "algo:hexdigest" format
            config_hash: str,   # Hash of the effective transcription config
            text: str,          # Transcribed text output
            text_hash: str,     # Hash of transcribed text in "algo:hexdigest" format
            segments: Optional[List[Dict[str, Any]]] = None,  # Timestamped segments
            metadata: Optional[Dict[str, Any]] = None         # Plugin metadata
        ) -> None:
        "Save or replace a transcription result (upsert by audio_path + config_hash)."
    
    def save_with_logging(
            self,
            *,
            job_id: str,        # Unique job identifier
            audio_path: str,    # Path to the source audio file
            audio_hash: str,    # Hash of source audio in "algo:hexdigest" format
            config_hash: str,   # Hash of the effective transcription config
            text: str,          # Transcribed text output
            text_hash: str,     # Hash of transcribed text in "algo:hexdigest" format
            segments: Optional[List[Dict[str, Any]]] = None,  # Timestamped segments
            metadata: Optional[Dict[str, Any]] = None,        # Plugin metadata
            logger: Optional[logging.Logger] = None           # Optional logger for success/failure messages
        ) -> bool:  # True if saved; False if the save failed (error logged, not raised)
        """Save a result, logging success or failure. Failures are logged and swallowed (returns False).

        Centralizes the try/save/log/except block that each transcription plugin would otherwise reimplement.
        Returns True on success so callers can gate post-save side effects on the result."""
    
    def get_cached(
            self,
            audio_path: str,   # Path to the source audio file
            audio_hash: str,   # Content hash of the audio (cache miss if the file changed)
            config_hash: str   # Hash of the effective transcription config
        ) -> Optional[TranscriptionRow]:  # Cached row or None
        """Retrieve a content-correct cached transcription result.

        Matches on audio_path + audio_hash + config_hash. A changed audio file
        (new audio_hash) misses even if a stale row exists at the same
        (audio_path, config_hash); the next save() replaces it."""
    
    def get_by_job_id(
            self,
            job_id: str  # Job identifier to look up
        ) -> Optional[TranscriptionRow]:  # Row or None if not found
        "Retrieve a transcription result by job ID."
    
    def list_jobs(
            self,
            limit: int = 100  # Maximum number of rows to return
        ) -> List[TranscriptionRow]:  # List of transcription rows
        "List transcription jobs ordered by creation time (newest first)."
    
    def verify_audio(
            self,
            job_id: str  # Job identifier to verify
        ) -> Optional[bool]:  # True if audio matches, False if tampered, None if job not found
        "Verify the source audio file still matches its stored hash."
    
    def verify_text(
            self,
            job_id: str  # Job identifier to verify
        ) -> Optional[bool]:  # True if text matches, False if tampered, None if job not found
        "Verify the transcription text still matches its stored hash."
```
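The interplay between `save()` (upsert keyed on `audio_path` + `config_hash`) and `get_cached()` (match on all three of `audio_path`, `audio_hash`, `config_hash`) can be illustrated with the standard-library `sqlite3` module. This is a hypothetical sketch: the table name, the reduced column set, and the use of `INSERT OR REPLACE` are assumptions chosen for brevity, not the package's actual schema or SQL. It shows the documented behavior that a changed audio file misses the cache even though a stale row exists, and that the next save replaces that row instead of adding a second one.

```python
import sqlite3
import time

# Hypothetical minimal schema; the package's real table and columns may differ.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE transcriptions (
        job_id      TEXT PRIMARY KEY,
        audio_path  TEXT NOT NULL,
        audio_hash  TEXT NOT NULL,
        config_hash TEXT NOT NULL,
        text        TEXT NOT NULL,
        created_at  REAL,
        UNIQUE (audio_path, config_hash)
    )
""")


def save(job_id, audio_path, audio_hash, config_hash, text):
    # A conflict on UNIQUE (audio_path, config_hash) deletes the old row first.
    conn.execute(
        "INSERT OR REPLACE INTO transcriptions "
        "(job_id, audio_path, audio_hash, config_hash, text, created_at) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (job_id, audio_path, audio_hash, config_hash, text, time.time()),
    )


def get_cached(audio_path, audio_hash, config_hash):
    # Content-correct lookup: all three keys must match, so a stale row is a miss.
    return conn.execute(
        "SELECT job_id, text FROM transcriptions "
        "WHERE audio_path = ? AND audio_hash = ? AND config_hash = ?",
        (audio_path, audio_hash, config_hash),
    ).fetchone()


save("job-1", "/data/clip.wav", "sha256:aaa", "sha256:cfg", "old text")
print(get_cached("/data/clip.wav", "sha256:aaa", "sha256:cfg"))  # ('job-1', 'old text')
print(get_cached("/data/clip.wav", "sha256:bbb", "sha256:cfg"))  # None (audio changed)
save("job-2", "/data/clip.wav", "sha256:bbb", "sha256:cfg", "new text")
print(conn.execute("SELECT COUNT(*) FROM transcriptions").fetchone()[0])  # 1
```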
