Metadata-Version: 2.4
Name: cjm-transcription-plugin-voxtral-vllm
Version: 0.0.24
Summary: Mistral Voxtral plugin for the cjm-transcription-plugin-system library - provides local speech-to-text transcription through vLLM with configurable model selection and parameter control.
Author-email: "Christian J. Mills" <9126128+cj-mills@users.noreply.github.com>
License: Apache-2.0
Project-URL: Repository, https://github.com/cj-mills/cjm-transcription-plugin-voxtral-vllm
Project-URL: Documentation, https://cj-mills.github.io/cjm-transcription-plugin-voxtral-vllm
Keywords: nbdev,jupyter,notebook,python
Classifier: Natural Language :: English
Classifier: Intended Audience :: Developers
Classifier: Development Status :: 3 - Alpha
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3 :: Only
Requires-Python: >=3.12
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: fastcore
Requires-Dist: cjm_transcription_plugin_system>=0.0.24
Requires-Dist: torch
Requires-Dist: numpy
Requires-Dist: soundfile
Requires-Dist: transformers
Requires-Dist: accelerate
Requires-Dist: librosa
Requires-Dist: mistral-common[audio]
Requires-Dist: uv
Requires-Dist: vllm
Requires-Dist: nest-asyncio
Dynamic: license-file

# cjm-transcription-plugin-voxtral-vllm


<!-- WARNING: THIS FILE WAS AUTOGENERATED! DO NOT EDIT! -->

## Install

``` bash
pip install cjm_transcription_plugin_voxtral_vllm
```

## Project Structure

    nbs/
    ├── meta.ipynb   # Metadata introspection for the Voxtral VLLM plugin used by cjm-ctl to generate the registration manifest.
    └── plugin.ipynb # Plugin implementation for Mistral Voxtral transcription through vLLM server

Total: 2 notebooks

## Module Dependencies

``` mermaid
graph LR
    meta["meta<br/>Metadata"]
    plugin["plugin<br/>Voxtral VLLM Plugin"]

    plugin --> meta
```

*1 cross-module dependency detected*

## CLI Reference

No CLI commands found in this project.

## Module Overview

Detailed documentation for each module in the project:

### Metadata (`meta.ipynb`)

> Metadata introspection for the Voxtral VLLM plugin used by cjm-ctl to
> generate the registration manifest.

#### Import

``` python
from cjm_transcription_plugin_voxtral_vllm.meta import (
    get_plugin_metadata
)
```

#### Functions

``` python
import os
import sys
from typing import Any, Dict

def get_plugin_metadata() -> Dict[str, Any]: # Plugin metadata for manifest generation
    """Return metadata required to register this plugin with the PluginManager."""
    # Fallback base path (current behavior, kept for backward compatibility)
    base_path = os.path.dirname(os.path.dirname(sys.executable))

    # Use the CJM config if available; otherwise fall back to env-relative paths
    cjm_data_dir = os.environ.get("CJM_DATA_DIR")
    cjm_models_dir = os.environ.get("CJM_MODELS_DIR")

    # Plugin data directory
    plugin_name = "cjm-transcription-plugin-voxtral-vllm"
    if cjm_data_dir:
        ...  # rest of the body not shown
```
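The visible part of `get_plugin_metadata` prefers the `CJM_DATA_DIR` / `CJM_MODELS_DIR` environment variables and otherwise falls back to a base path two directories above `sys.executable`, which is the environment root on a typical POSIX layout (`<env>/bin/python`). A minimal sketch of that lookup pattern follows. The helper name `resolve_dir` and the fallback subdirectory layout are hypothetical, since the rest of the function body is not shown.

```python
import os
import sys
from typing import Optional


def resolve_dir(
    env_var: str,                    # e.g. "CJM_DATA_DIR" or "CJM_MODELS_DIR"
    plugin_name: str,                # per-plugin subdirectory
    fallback_subdir: str,            # HYPOTHETICAL folder under the env root
    base_path: Optional[str] = None, # override for testing
) -> str:
    """Prefer an environment-variable root; otherwise use an env-relative path.

    The fallback layout (<env>/<fallback_subdir>/<plugin_name>) is an assumption.
    """
    if base_path is None:
        # Same fallback as the original: two levels above the interpreter.
        base_path = os.path.dirname(os.path.dirname(sys.executable))
    root = os.environ.get(env_var)
    if root:
        return os.path.join(root, plugin_name)
    return os.path.join(base_path, fallback_subdir, plugin_name)


# Usage: resolve the data directory for this plugin.
data_dir = resolve_dir("CJM_DATA_DIR", "cjm-transcription-plugin-voxtral-vllm", "data")
```

Checking the environment variable at call time (rather than at import) lets a launcher such as `cjm-ctl` set it just before loading the plugin.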

### Voxtral VLLM Plugin (`plugin.ipynb`)

> Plugin implementation for Mistral Voxtral transcription through vLLM
> server

#### Import

``` python
from cjm_transcription_plugin_voxtral_vllm.plugin import (
    VLLMServer,
    VoxtralVLLMPluginConfig,
    VoxtralVLLMPlugin
)
```

#### Functions

``` python
@patch
def supports_streaming(
    self: VoxtralVLLMPlugin # The plugin instance
) -> bool: # True if streaming is supported
    "Check if this plugin supports streaming transcription."
```

``` python
@patch
def execute_stream(
    self: VoxtralVLLMPlugin, # The plugin instance
    audio: Union[str, Path], # Audio data or path to audio file
    **kwargs # Additional plugin-specific parameters
) -> Generator[str, None, TranscriptionResult]: # Yields text chunks, returns final result
    "Stream transcription results chunk by chunk."
```
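`execute_stream` is typed `Generator[str, None, TranscriptionResult]`: it yields text chunks and *returns* the final result. Python delivers a generator's return value as `StopIteration.value`, so a plain `for` loop discards it. The sketch below drives the generator manually to capture both; `fake_execute_stream`, `FakeResult` (standing in for `TranscriptionResult`), and `consume_stream` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Generator, List, Tuple


@dataclass
class FakeResult:
    """HYPOTHETICAL stand-in for TranscriptionResult."""
    text: str


def fake_execute_stream(audio: str) -> Generator[str, None, FakeResult]:
    # Stand-in for plugin.execute_stream: yields chunks, then returns a result.
    chunks = ["Hello", " world", "."]
    for chunk in chunks:
        yield chunk
    return FakeResult(text="".join(chunks))


def consume_stream(
    stream: Generator[str, None, FakeResult],
    on_chunk: Callable[[str], None] = lambda chunk: None,
) -> Tuple[List[str], FakeResult]:
    """Collect every yielded chunk plus the generator's return value."""
    seen: List[str] = []
    while True:
        try:
            chunk = next(stream)
        except StopIteration as stop:
            # The generator's `return` value arrives here.
            return seen, stop.value
        seen.append(chunk)
        on_chunk(chunk)


chunks, result = consume_stream(fake_execute_stream("clip.wav"), on_chunk=print)
```

Inside another generator, `final = yield from plugin.execute_stream(audio)` achieves the same thing while re-yielding each chunk.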

#### Classes

``` python
class VLLMServer:
    def __init__(
        self,
        model: str = "mistralai/Voxtral-Mini-3B-2507", # Model name to serve
        port: int = 8000, # Port for the server
        host: str = "0.0.0.0", # Host address to bind to
        gpu_memory_utilization: float = 0.85, # Fraction of GPU memory to use
        log_level: str = "INFO", # Logging level (DEBUG, INFO, WARNING, ERROR)
        capture_logs: bool = True, # Whether to capture and display server logs
        **kwargs # Additional vLLM server arguments
    )
    "vLLM server manager for Voxtral models."
    
    def add_log_callback(
        self,
        callback: Callable[[str], None] # Function that receives log line strings
    ) -> None: # Returns nothing
        "Add a callback function to receive each log line."
    
    def start(self, ...)
        """Start the vLLM server.

        Session A 2026-05-27: dropped the wall-clock `timeout` argument. Startup
        time is unbounded (model download + CUDA graph capture), and any
        operator-set value would either race a slow network or be
        conservatively huge. The substrate's `proxy.prefetch` now drives stall
        detection via the `report_progress` callback: if no progress update
        arrives within `SubstrateConfig.prefetch_stall_threshold_seconds`, the
        substrate SIGTERMs the worker. When `report_progress` is None, startup
        still waits indefinitely; only the OS or the operator can abort it."""
    
    def stop(self) -> None: # Returns nothing
        "Stop the vLLM server."
    
    def restart(self) -> None: # Returns nothing
        """Restart the server."""
        self.stop()
        time.sleep(2)
        self.start()
    
    def is_running(self) -> bool: # True if server is running and responsive
        "Check if server is running and responsive."
    
    def get_recent_logs(
        self,
        n: int = 100 # Number of recent log lines to retrieve
    ) -> List[str]: # List of recent log lines
        "Get the most recent n log lines."
    
    def get_metrics_from_logs(self) -> dict: # Dictionary with performance metrics (e.g. "prompt_throughput")
        "Parse recent logs to extract performance metrics."
    
    def tail_logs(
        self,
        follow: bool = True, # Continue displaying new logs as they arrive
        n: int = 10 # Number of initial lines to display
    ) -> None: # Returns nothing
        "Tail the server logs (similar to tail -f)."
```
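The `start` docstring replaces a wall-clock timeout with stall detection: every vLLM log line counts as progress, and the worker is only terminated when no progress arrives for `SubstrateConfig.prefetch_stall_threshold_seconds`. A minimal sketch of that idea follows, assuming an injectable clock so it can be exercised without sleeping. `StallWatchdog` and `pump_logs` are hypothetical and are not part of this package or the substrate's API; only the `(float, str)` callback shape comes from the plugin's `report_progress` type.

```python
import time
from typing import Callable, Iterable, Optional


class StallWatchdog:
    """HYPOTHETICAL stall detector: remembers when progress was last reported
    and flags a stall once `threshold_s` elapses without any."""

    def __init__(self, threshold_s: float, clock: Callable[[], float] = time.monotonic):
        self.threshold_s = threshold_s
        self.clock = clock
        self.last_progress = clock()
        self.last_message = ""

    def report_progress(self, fraction: float, message: str) -> None:
        # Same Callable[[float, str], None] shape as the plugin's report_progress.
        self.last_progress = self.clock()
        self.last_message = message

    def is_stalled(self) -> bool:
        return self.clock() - self.last_progress > self.threshold_s


def pump_logs(
    lines: Iterable[str],
    report_progress: Optional[Callable[[float, str], None]],
) -> int:
    """Forward each server log line as a progress event; no overall deadline."""
    count = 0
    for line in lines:
        count += 1
        if report_progress is not None:
            # Startup has no meaningful fraction, so 0.0 is passed (an assumption).
            report_progress(0.0, line)
    return count


# Usage with a fake clock (seconds):
now = [0.0]
wd = StallWatchdog(threshold_s=30.0, clock=lambda: now[0])
now[0] = 25.0
pump_logs(["Loading weights...", "Capturing CUDA graphs..."], wd.report_progress)
now[0] = 50.0
print(wd.is_stalled())  # False: last progress at t=25, only 25 s ago
now[0] = 56.0
print(wd.is_stalled())  # True: 31 s without progress exceeds 30 s
```

A slow download that keeps logging never trips the watchdog, while a hung process that goes silent does, which is the trade-off the docstring argues for.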

``` python
@dataclass
class VoxtralVLLMPluginConfig:
    "Configuration for Voxtral VLLM transcription plugin."
    
    model_id: str = field(...)
    device: str = field(...)
    server_mode: str = field(...)
    server_url: str = field(...)
    server_port: int = field(...)
    gpu_memory_utilization: float = field(...)
    max_model_len: int = field(...)
    language: Optional[str] = field(...)
    temperature: float = field(...)
    auto_start_server: bool = field(...)
    capture_server_logs: bool = field(...)
    dtype: str = field(...)
    tensor_parallel_size: int = field(...)
```
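`initialize` accepts a configuration dataclass, a plain dict, or `None`. One common way to normalize those three inputs onto a dataclass is sketched below. `DemoConfig` and `normalize_config` are hypothetical: `DemoConfig` is a trimmed-down stand-in because the real `VoxtralVLLMPluginConfig` defaults are not shown here, and its model, port, and GPU-memory defaults are borrowed from the `VLLMServer` constructor. Rejecting unknown dict keys instead of ignoring them is a design assumption.

```python
from dataclasses import dataclass, fields, replace
from typing import Any, Dict, Optional, Union


@dataclass
class DemoConfig:
    """HYPOTHETICAL subset of the plugin config, for illustration only."""
    model_id: str = "mistralai/Voxtral-Mini-3B-2507"
    server_port: int = 8000
    gpu_memory_utilization: float = 0.85
    language: Optional[str] = None


def normalize_config(config: Union[DemoConfig, Dict[str, Any], None]) -> DemoConfig:
    """Turn a dataclass, dict, or None into a DemoConfig instance."""
    if config is None:
        return DemoConfig()                  # all defaults
    if isinstance(config, DemoConfig):
        return replace(config)               # defensive copy
    if isinstance(config, dict):
        known = {f.name for f in fields(DemoConfig)}
        unknown = set(config) - known
        if unknown:
            raise ValueError(f"Unknown config keys: {sorted(unknown)}")
        return DemoConfig(**config)          # omitted keys keep their defaults
    raise TypeError(f"Unsupported config type: {type(config).__name__}")


cfg = normalize_config({"server_port": 8001, "language": "en"})
```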

``` python
class VoxtralVLLMPlugin:
    def __init__(self):
        """Initialize the Voxtral VLLM plugin with default configuration."""
        self.logger = logging.getLogger(f"{__name__}.{type(self).__name__}")
        self.config: VoxtralVLLMPluginConfig = None
    "Mistral Voxtral transcription plugin via vLLM server."
    
    def name(self) -> str: # The plugin name identifier
        """Get the plugin name identifier (single source of truth: meta.py)."""
        return get_plugin_metadata()["name"]
    
    @property
    def version(self) -> str: # The plugin version string
        """Get the plugin version string (single source of truth: meta.py / __version__)."""
        return get_plugin_metadata()["version"]
    
    @property
    def supported_formats(self) -> List[str]: # List of supported audio formats
        """Get the list of supported audio file formats."""
        return ["wav", "mp3", "flac", "m4a", "ogg", "webm", "mp4", "avi", "mov"]
    
    def get_current_config(self) -> Dict[str, Any]: # Current configuration as dictionary
        "Return current configuration state."
    
    def get_config_schema(self) -> Dict[str, Any]: # JSON Schema for configuration
        """Return JSON Schema for UI generation."""
        return dataclass_to_jsonschema(VoxtralVLLMPluginConfig)
    
    @staticmethod
    def get_config_dataclass() -> VoxtralVLLMPluginConfig: # Configuration dataclass
        """Return dataclass describing the plugin's configuration options."""
        return VoxtralVLLMPluginConfig
    
    def initialize(
        self,
        config: Optional[Any] = None  # Configuration dataclass, dict, or None
    ) -> None:
        """First-time setup. CR-4: the manual server-restart diff checks are
        replaced by declarative RELOAD_TRIGGER metadata; the substrate's
        reconfigure path calls `_release_vllm_server` and then re-applies the
        config via `_apply_config`."""
    
    def prefetch(self) -> None:
        """CR-4 (SG-19): eagerly spawn the managed vLLM server so the first
        execute() doesn't pay the startup cost (model load, CUDA graph capture,
        and weight download for cold caches). No-op in external mode, where the
        caller manages the server. Idempotent via `_ensure_server_running`'s
        is_running() check.

        Session A 2026-05-27: passes `self.report_progress` (inherited from
        PluginInterface) through to VLLMServer so substrate.proxy.prefetch's
        stall detection sees a progress event on every vLLM log line. This
        replaces the former wall-clock `server_startup_timeout` config field, so
        operators no longer race network speed against an arbitrary timeout."""
    
    def on_disable(self) -> None:
        """CR-2: release the vLLM server subprocess when the operator disables
        the plugin while keeping the worker alive. After re-enabling, the next
        execute() lazily respawns it via `_ensure_server_running`."""
        self._release_vllm_server()
    
    def execute(
        self,
        audio: Union[str, Path], # Audio data or path to audio file to transcribe
        **kwargs # Additional arguments to override config
    ) -> TranscriptionResult: # Transcription result with text and metadata
        "Transcribe audio using Voxtral via vLLM."
    
    def is_available(self) -> bool: # True if vLLM and dependencies are available
        "Check if vLLM and required dependencies are available."
    
    def cleanup(self) -> None:
        """Release resources on unload. CR-4: delegates to `_release_vllm_server`
        so the worker-unload path and the operator-disable / reconfigure paths
        all converge on one release implementation."""
```
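The lifecycle docstrings describe a few invariants: `prefetch` only spawns the server in `managed` mode, repeated calls are idempotent because `_ensure_server_running` checks `is_running()` first, and `on_disable`, `cleanup`, and the reconfigure path all converge on a single `_release_vllm_server`. A minimal sketch of those invariants with a fake server follows. `FakeServer`, `LifecycleSketch`, `reconfigure`, and the `reload_triggers` set are hypothetical; the real RELOAD_TRIGGER metadata format is not shown in this README, so a plain set of field names stands in for it.

```python
from typing import Callable, Iterable, Optional, Set

ProgressFn = Callable[[float, str], None]


class FakeServer:
    """HYPOTHETICAL stand-in for VLLMServer that counts lifecycle calls."""

    def __init__(self) -> None:
        self.running = False
        self.starts = 0

    def start(self, report_progress: Optional[ProgressFn] = None) -> None:
        self.starts += 1
        if report_progress is not None:
            report_progress(0.0, "server starting")
        self.running = True

    def stop(self) -> None:
        self.running = False

    def is_running(self) -> bool:
        return self.running


class LifecycleSketch:
    def __init__(self, server_mode: str,
                 reload_triggers: Iterable[str] = ("model_id", "server_port")):
        self.server_mode = server_mode            # "managed" or "external"
        self.reload_triggers: Set[str] = set(reload_triggers)
        self.server: Optional[FakeServer] = None
        self.report_progress: Optional[ProgressFn] = None

    def _ensure_server_running(self) -> None:
        if self.server is None:
            self.server = FakeServer()
        if not self.server.is_running():          # idempotency check
            self.server.start(self.report_progress)

    def _release_vllm_server(self) -> None:
        # Single release path shared by on_disable, cleanup, and reconfigure.
        if self.server is not None:
            self.server.stop()
            self.server = None

    def prefetch(self) -> None:
        if self.server_mode == "managed":         # external: caller owns the server
            self._ensure_server_running()

    def on_disable(self) -> None:
        self._release_vllm_server()

    def cleanup(self) -> None:
        self._release_vllm_server()

    def reconfigure(self, old: dict, new: dict) -> bool:
        """Release the server only if a reload-trigger field changed."""
        changed = {k for k in self.reload_triggers if old.get(k) != new.get(k)}
        if changed:
            self._release_vllm_server()           # next prefetch/execute respawns it
        return bool(changed)


events = []
managed = LifecycleSketch("managed")
managed.report_progress = lambda fraction, msg: events.append(msg)
managed.prefetch()
managed.prefetch()                                # already running: no second start
first = managed.server
external = LifecycleSketch("external")
external.prefetch()                               # no-op in external mode
```

Routing every teardown through one release method keeps the "server owned by this plugin" state consistent no matter which path (disable, unload, or reconfigure) fires first.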
