Metadata-Version: 2.4
Name: cjm-vad-adapter-interface
Version: 0.0.4
Summary: Typed voice-activity-detection task-adapter interface — VADAdapter ABC + GenericVADAdapter (cache/persist bookends around a pure-compute tool), the VADToolProtocol, and VAD persistence helpers. The VADResult data noun lives in cjm-capability-primitives.
Author-email: "Christian J. Mills" <9126128+cj-mills@users.noreply.github.com>
License: Apache-2.0
Project-URL: Repository, https://github.com/cj-mills/cjm-vad-adapter-interface
Project-URL: Documentation, https://cj-mills.github.io/cjm-vad-adapter-interface/
Keywords: nbdev
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3 :: Only
Requires-Python: >=3.12
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: cjm_plugin_system>=0.0.46
Requires-Dist: cjm_capability_primitives>=0.0.5
Dynamic: license-file

# cjm-vad-adapter-interface


<!-- WARNING: THIS FILE WAS AUTOGENERATED! DO NOT EDIT! -->

## Install

``` bash
pip install cjm_vad_adapter_interface
```

## Project Structure

    nbs/
    ├── adapter.ipynb # The typed voice-activity-detection task contract — the `VADAdapter` ABC + the `VADToolProtocol` structural contract (capability-unit Option C, pass-2 Thread 3).
    ├── generic.ipynb # The generic (tool-agnostic) VAD adapter — cache-check, invoke the bound tool's pure-compute `detect_speech`, persist. Reused across every tool capability satisfying `VADToolProtocol`, exactly as `GenericTranscriptionAdapter` is reused across transcribers.
    └── storage.ipynb # Standardized SQLite storage for voice-activity-detection results with content hashing.

Total: 3 notebooks

## Module Dependencies

``` mermaid
graph LR
    adapter["adapter<br/>VAD Adapter"]
    generic["generic<br/>Generic VAD Adapter"]
    storage["storage<br/>VAD Storage"]

    generic --> adapter
    generic --> storage
```

*2 cross-module dependencies detected*

## CLI Reference

No CLI commands found in this project.

## Module Overview

Detailed documentation for each module in the project:

### VAD Adapter (`adapter.ipynb`)

> The typed voice-activity-detection task contract — the `VADAdapter`
> ABC + the `VADToolProtocol` structural contract (capability-unit
> Option C, pass-2 Thread 3).

#### Import

``` python
from cjm_vad_adapter_interface.adapter import (
    VADToolProtocol,
    VADAdapter
)
```

#### Classes

``` python
@runtime_checkable
class VADToolProtocol(Protocol):
    """
    Structural contract for voice-activity-detection tool capabilities
    (born-final at stage 8 — derived from the native tool surface).
    
    Pure compute: `detect_speech` reads the model-ready audio + runs inference +
    builds the typed result. `get_current_config` supplies the effective config
    the generic adapter hashes for its cache key. Persistence is NOT here — the
    adapter owns it (the native-surface seam).
    """
    
    def detect_speech(self, audio: Union[str, Path], **kwargs) -> VADResult: ...
    
    def get_current_config(self) -> Dict[str, Any]: ...
```
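Because `VADToolProtocol` is `@runtime_checkable`, a tool conforms structurally. Any object that exposes `detect_speech` and `get_current_config` passes an `isinstance` check without inheriting from the protocol. The sketch below is self-contained and illustrative only. `VADToolLike` is a local mirror of the protocol. `FakeVADResult` and `EnergyThresholdTool` are hypothetical stand-ins. The real `VADResult` lives in cjm-capability-primitives, and its fields are not shown here.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any, Dict, List, Protocol, Union, runtime_checkable


@dataclass
class FakeVADResult:
    """Hypothetical stand-in for VADResult (the real fields live in cjm-capability-primitives)."""
    ranges: List[Dict[str, float]] = field(default_factory=list)


@runtime_checkable
class VADToolLike(Protocol):
    """Local mirror of VADToolProtocol's two methods, for illustration only."""
    def detect_speech(self, audio: Union[str, Path], **kwargs) -> FakeVADResult: ...
    def get_current_config(self) -> Dict[str, Any]: ...


class EnergyThresholdTool:
    """Hypothetical tool: pure compute, no storage, and it does NOT inherit from the protocol."""

    def __init__(self, threshold: float = 0.5):
        self.threshold = threshold

    def get_current_config(self) -> Dict[str, Any]:
        # The effective config an adapter would hash into its cache key.
        return {"threshold": self.threshold}

    def detect_speech(self, audio: Union[str, Path], **kwargs) -> FakeVADResult:
        # A real tool would read the audio and run inference here; this returns a fixed segment.
        return FakeVADResult(ranges=[{"start": 0.0, "end": 1.5}])


tool = EnergyThresholdTool()
print(isinstance(tool, VADToolLike))      # True: structural match, no inheritance
print(isinstance(object(), VADToolLike))  # False: required methods are missing
```

A runtime check only confirms that the methods exist. It does not validate their signatures or return types.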

``` python
class VADAdapter:
    def __init__(
        self,
        tool: VADToolProtocol,  # The bound tool capability instance (worker-side binding)
    )
    """
    Typed voice-activity-detection task adapter: model-ready audio in,
    `VADResult` out.
    
    Input contract: the caller guarantees MODEL-READY audio — format /
    sample-rate / channel handling happens upstream (ffmpeg `convert`), never
    in the adapter or tool.
    
    Native-surface model (stage 8 / PILLAR 1c): the TOOL is pure compute; the
    ADAPTER owns the cache + persistence bookends (see `GenericVADAdapter`) +
    the per-call `force` control. Storage resolves from the substrate-injected
    `PLUGIN_DATA_DIR`; `db_path` is not on the tool protocol.
    
    Implementations run in-worker beside their tool capability and are
    constructed with the bound tool instance: `AdapterClass(tool)` (mirrors
    `GraphStorageAdapter`). The result DTO is wire-registered ("vad.result"):
    returned values cross the worker boundary typed.
    """
    
    def __init__(
            self,
            tool: VADToolProtocol,  # The bound tool capability instance (worker-side binding)
        )
    
    def detect_speech(
            self,
            audio: Union[str, Path],  # Path to MODEL-READY audio (converted upstream)
            **kwargs,                 # Provenance + tool options
        ) -> VADResult:               # Typed VAD output
        "Detect speech segments in model-ready audio."
```
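The sketch below shows the `AdapterClass(tool)` construction pattern in minimal form. Everything here is a local stand-in. `VADAdapterSketch` mirrors the documented shape, with the tool bound at construction and a single `detect_speech` entry point, rather than importing the real `VADAdapter`. `FakeTool` and `PassThroughAdapter` are hypothetical. A real adapter returns a `VADResult`; this one returns a plain list for brevity.

```python
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Any, Dict, List, Union


class FakeTool:
    """Hypothetical pure-compute tool (stands in for any VADToolProtocol implementation)."""

    def __init__(self):
        self.calls = 0

    def get_current_config(self) -> Dict[str, Any]:
        return {"threshold": 0.5}

    def detect_speech(self, audio: Union[str, Path], **kwargs) -> List[Dict[str, float]]:
        self.calls += 1
        return [{"start": 0.0, "end": 1.5}]


class VADAdapterSketch(ABC):
    """Illustrative mirror of the VADAdapter shape: constructed with the bound tool."""

    def __init__(self, tool):
        self.tool = tool

    @abstractmethod
    def detect_speech(self, audio: Union[str, Path], **kwargs): ...


class PassThroughAdapter(VADAdapterSketch):
    """Hypothetical subclass: delegates straight to the tool, with no cache or persistence."""

    def detect_speech(self, audio: Union[str, Path], **kwargs):
        return self.tool.detect_speech(audio, **kwargs)


tool = FakeTool()
adapter = PassThroughAdapter(tool)                     # AdapterClass(tool) pattern
segments = adapter.detect_speech("clip_16k_mono.wav")  # caller supplies model-ready audio
```

Keeping storage out of the tool means a caching adapter can wrap the same tool later without changing the tool itself.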

### Generic VAD Adapter (`generic.ipynb`)

> The generic (tool-agnostic) VAD adapter — cache-check, invoke the
> bound tool’s pure-compute `detect_speech`, persist. Reused across
> every tool capability satisfying `VADToolProtocol`, exactly as
> `GenericTranscriptionAdapter` is reused across transcribers.

#### Import

``` python
from cjm_vad_adapter_interface.generic import (
    GenericVADAdapter
)
```

#### Classes

``` python
class GenericVADAdapter(VADAdapter):
    """
    Generic VAD adapter: cache-check -> pure-compute tool -> persist.
    
    Works against ANY tool satisfying `VADToolProtocol`. The bookends:
    
      1. cache check (file_path + file_hash + config_hash) BEFORE invoking the
         tool, so a hit never loads the model;
      2. the tool's pure-compute `detect_speech` on a miss / forced call;
      3. `save_with_logging` (upsert by file_path + config_hash).
    
    `config_hash` reuses `hash_dict_canonical(get_current_config())` (the SAME
    canonical hash the fused-era plugin used). `force` is carried on
    `CallEnvelope.control` rather than passed as a task kwarg, which keeps
    `detect_speech(audio)` pure. Storage lives at `<PLUGIN_DATA_DIR>/vad.db`;
    the substrate injects
    the per-capability `PLUGIN_DATA_DIR` at spawn, so the adapter neither
    hard-codes a path nor asks the tool for one.
    """
    
    def detect_speech(
            self,
            audio: Union[str, Path],  # Path to MODEL-READY audio (converted upstream)
            **kwargs,                 # Provenance + tool options
        ) -> VADResult:               # Typed VAD output
        "Cache-check, invoke the bound tool's pure-compute `detect_speech`, persist."
```
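The sketch below runs the three bookends end to end. It is a minimal illustration under stated assumptions, not the package's implementation:

- `canonical_config_hash` approximates canonical hashing with sorted-key JSON and SHA-256. The real `hash_dict_canonical` may differ.
- A dict keyed by `(file_path, config_hash)` stands in for `vad.db`.
- The `control` dict argument is a hypothetical stand-in for `CallEnvelope.control`.

```python
import hashlib
import json
import tempfile
from pathlib import Path
from typing import Any, Dict, List, Optional, Tuple


def canonical_config_hash(config: Dict[str, Any]) -> str:
    # Assumption: key-order-independent JSON + SHA-256 (the real helper is hash_dict_canonical).
    payload = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return "sha256:" + hashlib.sha256(payload.encode("utf-8")).hexdigest()


def file_content_hash(path: Path) -> str:
    # "algo:hexdigest" format, matching the storage docs.
    return "sha256:" + hashlib.sha256(path.read_bytes()).hexdigest()


class CountingTool:
    """Hypothetical pure-compute tool that counts how often it actually runs."""

    def __init__(self):
        self.calls = 0

    def get_current_config(self) -> Dict[str, Any]:
        return {"threshold": 0.5, "min_speech_ms": 250}

    def detect_speech(self, audio, **kwargs) -> List[Dict[str, float]]:
        self.calls += 1
        return [{"start": 0.0, "end": 1.5}]


class GenericAdapterSketch:
    """Illustrative cache-check -> tool -> persist flow (in-memory dict instead of SQLite)."""

    def __init__(self, tool):
        self.tool = tool
        # Keyed like the upsert: (file_path, config_hash) -> (file_hash, result)
        self.store: Dict[Tuple[str, str], Tuple[str, List[Dict[str, float]]]] = {}

    def detect_speech(self, audio, control: Optional[Dict[str, Any]] = None, **kwargs):
        force = bool((control or {}).get("force", False))  # stand-in for CallEnvelope.control
        path = str(audio)
        file_hash = file_content_hash(Path(audio))
        config_hash = canonical_config_hash(self.tool.get_current_config())
        # 1. Cache check before touching the tool, so a hit never runs inference.
        cached = self.store.get((path, config_hash))
        if not force and cached is not None and cached[0] == file_hash:
            return cached[1]
        # 2. Pure compute on a miss or a forced call.
        result = self.tool.detect_speech(audio, **kwargs)
        # 3. Persist: replace any entry for (file_path, config_hash).
        self.store[(path, config_hash)] = (file_hash, result)
        return result


with tempfile.TemporaryDirectory() as tmp:
    audio = Path(tmp) / "clip.wav"
    audio.write_bytes(b"model-ready audio bytes")
    tool = CountingTool()
    adapter = GenericAdapterSketch(tool)
    adapter.detect_speech(audio)                           # miss: tool runs
    adapter.detect_speech(audio)                           # hit: tool skipped
    adapter.detect_speech(audio, control={"force": True})  # forced: tool runs again
    audio.write_bytes(b"different bytes")                  # content changed
    adapter.detect_speech(audio)                           # new file_hash: miss
    calls_after_run = tool.calls
```

The call counter shows the intended behavior:

- A repeat call with the same content and config skips the tool.
- `force` re-runs the tool.
- A content change produces a new `file_hash` and therefore a miss.

Throughout, the store holds one entry per `(file_path, config_hash)`.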

### VAD Storage (`storage.ipynb`)

> Standardized SQLite storage for voice-activity-detection results with
> content hashing.

#### Import

``` python
from cjm_vad_adapter_interface.storage import (
    VADRow,
    VADStorage
)
```

#### Classes

``` python
@dataclass
class VADRow:
    "A single row from the vad_results table."
    
    file_path: str  # Path to the analyzed (model-ready) audio file
    file_hash: str  # Hash of the analyzed file in "algo:hexdigest" format
    config_hash: str  # Hash of the VAD detection config used
    ranges: Optional[List[Dict[str, Any]]]  # Detected speech segments (serialized TimeRanges)
    metadata: Optional[Dict[str, Any]]  # VAD metadata
    created_at: Optional[float]  # Unix timestamp
```

``` python
class VADStorage:
    def __init__(
        self,
        db_path: str  # Absolute path to the SQLite database file
    )
    "Standardized SQLite storage for voice-activity-detection results."
    
    def __init__(
            self,
            db_path: str  # Absolute path to the SQLite database file
        )
        "Initialize storage and create table if needed."
    
    def save(
            self,
            file_path: str,     # Path to the analyzed audio file
            file_hash: str,     # Hash of the analyzed file in "algo:hexdigest" format
            config_hash: str,   # Hash of the VAD detection config
            ranges: Optional[List[Dict[str, Any]]] = None,  # Detected speech segments
            metadata: Optional[Dict[str, Any]] = None        # VAD metadata
        ) -> None
        "Save or replace a VAD result (upsert by file_path + config_hash)."
    
    def save_with_logging(
            self,
            *,
            file_path: str,     # Path to the analyzed audio file
            file_hash: str,     # Hash of the analyzed file in "algo:hexdigest" format
            config_hash: str,   # Hash of the VAD detection config
            ranges: Optional[List[Dict[str, Any]]] = None,  # Detected speech segments
            metadata: Optional[Dict[str, Any]] = None,       # VAD metadata
            logger: Optional[logging.Logger] = None           # Optional logger for success/failure messages
        ) -> bool:  # True if saved; False if the save failed (error logged, not raised)
        """Save a result, logging success/failure. Failures are logged and swallowed (returns False).

        CR-14 follow-up: records a RESULT_SAVED account either way (ok flag +
        file/config references; the journal never carries content), so saves AND
        swallowed save-failures become auditable journal rows."""
    
    def get_cached(
            self,
            file_path: str,   # Path to the audio file
            file_hash: str,   # Content hash of the file (cache miss if the file changed)
            config_hash: str  # Config hash to match
        ) -> Optional[VADRow]:  # Cached row or None
        """Retrieve a content-correct cached VAD result.

        Matches on file_path + file_hash + config_hash, so a changed file (new
        file_hash) misses the cache even though a stale row may still exist at the
        same (file_path, config_hash); the next save() replaces it.

        CR-14 follow-up: a hit records a CACHE_HIT account (the cache-serving
        decision is an account-of-action)."""
    
    def list_jobs(
            self,
            limit: int = 100  # Maximum number of rows to return
        ) -> List[VADRow]:  # List of VAD rows
        "List VAD results ordered by creation time (newest first)."
    
    def verify_file(
            self,
            file_path: str,   # Path to the audio file
            config_hash: str  # Config hash to look up
        ) -> Optional[bool]:  # True if file matches, False if changed, None if not found
        "Verify the analyzed file still matches the hash stored for (file_path, config_hash)."
```
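The standard-library `sqlite3` module can illustrate three of the storage semantics:

- upsert by `file_path` + `config_hash`;
- cache hits that also require a matching `file_hash`;
- `save_with_logging` swallowing failures.

The schema, the free functions, and the `current_hash` parameter on `verify_file` are assumptions made for this sketch. The real `VADStorage` hashes the file itself and records journal accounts, and its actual table definition is not shown here.

```python
import json
import logging
import sqlite3
import time
from typing import Any, Dict, List, Optional

# Hypothetical schema: columns follow VADRow; the composite primary key
# is what makes INSERT OR REPLACE act as "upsert by file_path + config_hash".
SCHEMA = """
CREATE TABLE IF NOT EXISTS vad_results (
    file_path   TEXT NOT NULL,
    file_hash   TEXT NOT NULL,
    config_hash TEXT NOT NULL,
    ranges      TEXT,
    metadata    TEXT,
    created_at  REAL,
    PRIMARY KEY (file_path, config_hash)
)
"""


def save(conn: sqlite3.Connection, file_path: str, file_hash: str, config_hash: str,
         ranges: Optional[List[Dict[str, Any]]] = None,
         metadata: Optional[Dict[str, Any]] = None) -> None:
    # Replaces any existing row with the same (file_path, config_hash).
    conn.execute(
        "INSERT OR REPLACE INTO vad_results VALUES (?, ?, ?, ?, ?, ?)",
        (file_path, file_hash, config_hash,
         None if ranges is None else json.dumps(ranges),
         None if metadata is None else json.dumps(metadata),
         time.time()),
    )
    conn.commit()


def save_with_logging(conn: sqlite3.Connection, *, logger: Optional[logging.Logger] = None,
                      **fields: Any) -> bool:
    # Failures are logged and swallowed; the caller gets False instead of an exception.
    log = logger or logging.getLogger("vad_storage_sketch")
    try:
        save(conn, **fields)
    except sqlite3.Error as exc:
        log.error("VAD save failed for %s: %s", fields.get("file_path"), exc)
        return False
    log.info("VAD result saved for %s", fields.get("file_path"))
    return True


def get_cached(conn: sqlite3.Connection, file_path: str, file_hash: str,
               config_hash: str) -> Optional[Dict[str, Any]]:
    # Requiring file_hash as well means a stale row for a changed file is a miss.
    row = conn.execute(
        "SELECT ranges, metadata FROM vad_results"
        " WHERE file_path = ? AND file_hash = ? AND config_hash = ?",
        (file_path, file_hash, config_hash),
    ).fetchone()
    if row is None:
        return None
    return {"ranges": None if row[0] is None else json.loads(row[0]),
            "metadata": None if row[1] is None else json.loads(row[1])}


def verify_file(conn: sqlite3.Connection, file_path: str, config_hash: str,
                current_hash: str) -> Optional[bool]:
    # True if the stored hash matches, False if the file changed, None if no row exists.
    row = conn.execute(
        "SELECT file_hash FROM vad_results WHERE file_path = ? AND config_hash = ?",
        (file_path, config_hash),
    ).fetchone()
    return None if row is None else row[0] == current_hash


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
save(conn, "a.wav", "sha256:aaa", "sha256:cfg", ranges=[{"start": 0.0, "end": 1.5}])
hit = get_cached(conn, "a.wav", "sha256:aaa", "sha256:cfg")       # content unchanged: hit
miss = get_cached(conn, "a.wav", "sha256:bbb", "sha256:cfg")      # content changed: miss
changed = verify_file(conn, "a.wav", "sha256:cfg", "sha256:bbb")  # stored hash differs
save(conn, "a.wav", "sha256:bbb", "sha256:cfg", ranges=[])        # replaces the stale row
rows = conn.execute("SELECT COUNT(*) FROM vad_results").fetchone()[0]
```

Because the primary key covers `(file_path, config_hash)` and not `file_hash`, a re-analysis of a changed file overwrites the old row instead of accumulating history.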
