Metadata-Version: 2.4
Name: subvocal
Version: 1.0.0rc1
Summary: Hardware-agnostic middleware connecting sEMG silent-speech interfaces to LLM agents
Project-URL: Homepage, https://github.com/PranavKalkunte/subvocal
Project-URL: Documentation, https://github.com/PranavKalkunte/subvocal/tree/main/docs
Project-URL: Changelog, https://github.com/PranavKalkunte/subvocal/blob/main/CHANGELOG.md
Author-email: Pranav Kalkunte <pranav.kalkunte@utexas.edu>
License-Expression: MIT
License-File: LICENSE
Keywords: accessibility,agents,bci,llm,mcp,semg,silent-speech,subvocal
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Scientific/Engineering :: Human Machine Interfaces
Classifier: Typing :: Typed
Requires-Python: >=3.10
Requires-Dist: numpy>=1.20
Requires-Dist: platformdirs>=3.0
Requires-Dist: pydantic>=2.0
Requires-Dist: pyyaml>=6.0
Provides-Extra: all
Requires-Dist: h5py>=3.0; extra == 'all'
Requires-Dist: joblib>=1.1; extra == 'all'
Requires-Dist: onnx>=1.14; extra == 'all'
Requires-Dist: prometheus-client>=0.14.0; extra == 'all'
Requires-Dist: pyttsx3>=2.90; extra == 'all'
Requires-Dist: scikit-learn>=1.0; extra == 'all'
Requires-Dist: scipy>=1.7; extra == 'all'
Requires-Dist: torch>=2.0; extra == 'all'
Provides-Extra: dev
Requires-Dist: build>=1.0; extra == 'dev'
Requires-Dist: h5py>=3.0; extra == 'dev'
Requires-Dist: joblib>=1.1; extra == 'dev'
Requires-Dist: markdown>=3.5; extra == 'dev'
Requires-Dist: onnx>=1.14; extra == 'dev'
Requires-Dist: prometheus-client>=0.14.0; extra == 'dev'
Requires-Dist: pyright>=1.1.350; extra == 'dev'
Requires-Dist: pytest-cov>=4; extra == 'dev'
Requires-Dist: pytest>=7; extra == 'dev'
Requires-Dist: pyttsx3>=2.90; extra == 'dev'
Requires-Dist: ruff>=0.4; extra == 'dev'
Requires-Dist: scikit-learn>=1.0; extra == 'dev'
Requires-Dist: scipy>=1.7; extra == 'dev'
Requires-Dist: torch>=2.0; extra == 'dev'
Requires-Dist: twine>=5.0; extra == 'dev'
Provides-Extra: export
Requires-Dist: onnx>=1.14; extra == 'export'
Provides-Extra: hardware
Requires-Dist: h5py>=3.0; extra == 'hardware'
Requires-Dist: scipy>=1.7; extra == 'hardware'
Provides-Extra: metrics
Requires-Dist: prometheus-client>=0.14.0; extra == 'metrics'
Provides-Extra: ml
Requires-Dist: joblib>=1.1; extra == 'ml'
Requires-Dist: scikit-learn>=1.0; extra == 'ml'
Requires-Dist: scipy>=1.7; extra == 'ml'
Requires-Dist: torch>=2.0; extra == 'ml'
Provides-Extra: tts
Requires-Dist: pyttsx3>=2.90; extra == 'tts'
Description-Content-Type: text/markdown

# Subvocal SDK: Physiological Silent Speech Interface Middleware

The **Subvocal SDK** is an open-source, hardware-agnostic middleware platform that connects surface electromyography (sEMG) interfaces to LLM-driven AI agents.

Rather than locking developers into a proprietary neckband or a closed whole-word vocabulary, the Subvocal SDK provides the software rails (signal conditioning, deep-learning training skeletons, articulatory phonetic shorthand simulators, and context-aware decoders) that enable high-accuracy, low-latency, open-vocabulary silent-speech control.

---

## 🛠️ Installation

```bash
pip install subvocal
```

The base install is lightweight (numpy, pydantic, PyYAML, platformdirs) and covers the pipeline, hardware drivers, shorthand decoding, context, and the MCP server. Optional extras pull in heavier subsystems:

| Extra | Enables | Installs |
|-------|---------|----------|
| `subvocal[ml]` | Classifier training, inference, calibration (`subvocal.emg_core`) | scipy, scikit-learn, joblib, torch |
| `subvocal[hardware]` | Public-dataset drivers (Ninapro, PutEMG, CSL-HDEMG) | scipy, h5py |
| `subvocal[tts]` | Audio feedback outside macOS | pyttsx3 |
| `subvocal[export]` | ONNX model export | onnx |
| `subvocal[metrics]` | Prometheus telemetry export | prometheus-client |
| `subvocal[all]` | Everything above | — |
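
To see which extras are usable in the current environment, a small standard-library check along these lines may help. It is only a sketch: `EXTRA_MODULES` and `missing_modules` are hypothetical names, not part of the SDK, and the module lists are derived from the package's declared dependencies.

```python
from importlib.util import find_spec

# Import names per extra, derived from the package's declared dependencies.
# Note: scikit-learn is imported as `sklearn`, prometheus-client as
# `prometheus_client`. This mapping is illustrative, not an SDK API.
EXTRA_MODULES = {
    "ml": ["scipy", "sklearn", "joblib", "torch"],
    "hardware": ["scipy", "h5py"],
    "tts": ["pyttsx3"],
    "export": ["onnx"],
    "metrics": ["prometheus_client"],
}


def missing_modules(extra: str) -> list[str]:
    """Return the modules required by `extra` that cannot be imported."""
    return [name for name in EXTRA_MODULES[extra] if find_spec(name) is None]


if __name__ == "__main__":
    for extra in EXTRA_MODULES:
        missing = missing_modules(extra)
        status = "ok" if not missing else "missing " + ", ".join(missing)
        print(f"subvocal[{extra}]: {status}")
```

`find_spec` only locates a module without importing it, so the check stays cheap even when heavy packages such as torch are installed.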

## 🚀 Quickstart

A complete pipeline—synthetic sEMG source through intent reconstruction to action execution—runs offline in a few lines:

```python
import time

from subvocal import SubvocalPipeline
from subvocal.core.models import CommandToken
from subvocal.core.testing import MockActionExecutor, MockContextProvider, MockLLMProvider
from subvocal.hardware.drivers import SyntheticSignalGenerator

hardware = SyntheticSignalGenerator(fs=1000.0, num_channels=8)

def classify(frame):
    """Replace with subvocal.emg_core.ml.infer.InferenceEngine for real models."""
    arr = frame.to_numpy()
    if abs(arr).max() > 1.0:  # a command burst is present
        return CommandToken(text="gt", confidence=0.95, timestamp=time.time())
    return None

pipeline = SubvocalPipeline(
    hardware=hardware,
    classify_fn=classify,
    llm_provider=MockLLMProvider(),       # or resolve_provider() / ClaudeProvider() ...
    context_provider=MockContextProvider(),
    executor=MockActionExecutor(),
    phrase_timeout_seconds=0.5,
    on_action=lambda action, status: print("observed:", action.action_type, status),
)

hardware.start()
hardware.trigger_command("gt", duration_ms=120)
for _ in range(30):
    action = pipeline.step(window_ms=50)
    if action:
        print("Executed:", action.action_type, action.params)
        # -> Executed: goto {'arguments': ['google.com'], 'resolved_text': 'GOTO google.com', ...}
        break
    time.sleep(0.05)  # real-time pacing: the phrase ends after 0.5 s of silence
```

Swap in a real LLM provider (`subvocal.core.llm_providers.ClaudeProvider`, `OpenAIProvider`, `GeminiProvider`, `LlamaProvider`), a real driver (`OpenBCICytonDriver`, `DelsysTrignoDriver`, `FileReplayDriver`), and a trained classifier (`subvocal.emg_core.ml.infer.InferenceEngine`) without changing the pipeline code. `subvocal.resolve_provider()` picks the best provider for the environment automatically — a real LLM when an API key is present, the offline `HeuristicProvider` otherwise.
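
The selection rule described above (a real LLM when a key is present, the offline heuristic otherwise) can be pictured with the sketch below. It is not the SDK's `resolve_provider()`: the function name `pick_provider_name`, the environment variable names, and the priority order are all assumptions made for illustration.

```python
import os
from collections.abc import Mapping

# Assumed environment variable names, checked in an assumed priority order.
# The SDK's actual resolve_provider() may consult different variables.
ASSUMED_KEY_VARS = [
    ("ANTHROPIC_API_KEY", "ClaudeProvider"),
    ("OPENAI_API_KEY", "OpenAIProvider"),
    ("GEMINI_API_KEY", "GeminiProvider"),
]


def pick_provider_name(env: Mapping[str, str] = os.environ) -> str:
    """Return the first provider whose key is set, else the offline fallback."""
    for var, provider in ASSUMED_KEY_VARS:
        if env.get(var, "").strip():  # blank values count as unset
            return provider
    return "HeuristicProvider"


print(pick_provider_name({}))                             # HeuristicProvider
print(pick_provider_name({"OPENAI_API_KEY": "sk-test"}))  # OpenAIProvider
```

Passing the environment as a mapping keeps the rule easy to test without touching real credentials.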

### Production behavior

- **Typed errors**: everything the SDK raises derives from `subvocal.SubvocalError` (`HardwareError`, `ProviderError`, `ConfigurationError`, `PolicyViolationError`, ...), each compatible with the builtin exception type it replaces.
- **Resilient providers**: configurable per-request timeouts and exponential-backoff retries for transient failures (connection errors, HTTP 408/429/5xx); non-retryable statuses fail fast.
- **Observability**: `pipeline.stats` exposes running counters (frames, tokens, intents, executed/blocked actions, errors, uptime), and `on_token` / `on_intent` / `on_action` / `on_error` observer callbacks stream pipeline lifecycle events without ever breaking the pipeline. Every phrase is JSONL-traced for audit.
- **Safety**: pluggable policy engine with dry-run mode; set `raise_on_policy_violation=True` to turn rejections into `PolicyViolationError`.
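
The retry behavior listed under resilient providers follows a common pattern, sketched here with the standard library only. `HTTPStatusError`, `is_retryable`, and `call_with_retries` are hypothetical names, and the defaults (four attempts, 0.5 s base delay, 10% jitter) are assumptions rather than the SDK's settings.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class HTTPStatusError(Exception):
    """Hypothetical error type carrying an HTTP status code."""

    def __init__(self, status: int) -> None:
        super().__init__(f"HTTP {status}")
        self.status = status


def is_retryable(exc: Exception) -> bool:
    """Treat connection errors and HTTP 408/429/5xx as transient."""
    if isinstance(exc, ConnectionError):
        return True
    if isinstance(exc, HTTPStatusError):
        return exc.status in (408, 429) or 500 <= exc.status < 600
    return False


def call_with_retries(
    request: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `request`, retrying transient failures with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return request()
        except Exception as exc:
            if attempt == max_attempts - 1 or not is_retryable(exc):
                raise  # out of attempts, or a non-retryable failure: fail fast
            # The delay doubles per attempt; jitter spreads out concurrent clients.
            delay = base_delay * 2 ** attempt
            sleep(delay + random.uniform(0, 0.1 * delay))
    raise AssertionError("unreachable")
```

Injecting `sleep` as a parameter lets tests record the delays instead of actually waiting.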

### MCP server

The SDK ships a stdio Model Context Protocol server so Claude Desktop (or any MCP client) can ingest subvocal commands as tools:

```bash
subvocal-mcp
```

Claude Desktop config:

```json
{
  "mcpServers": {
    "subvocal": { "command": "subvocal-mcp" }
  }
}
```
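
For clients other than Claude Desktop, it may help to see what travels over stdio. MCP uses JSON-RPC 2.0 messages, one per line. The sketch below only builds the opening messages a client would write to the server's stdin; `jsonrpc_request` is a hypothetical helper, the client name is a placeholder, and the tool names this server returns are not shown because they are not listed here.

```python
import json
from typing import Optional


def jsonrpc_request(request_id: int, method: str, params: Optional[dict] = None) -> str:
    """Serialize one JSON-RPC 2.0 request as a single newline-terminated line."""
    message = {"jsonrpc": "2.0", "id": request_id, "method": method}
    if params is not None:
        message["params"] = params
    return json.dumps(message) + "\n"


# Handshake: the client opens with `initialize` ...
initialize = jsonrpc_request(
    1,
    "initialize",
    {
        "protocolVersion": "2024-11-05",
        "capabilities": {},
        "clientInfo": {"name": "example-client", "version": "0.0.0"},
    },
)
# ... confirms with a notification (notifications carry no "id") ...
initialized = json.dumps({"jsonrpc": "2.0", "method": "notifications/initialized"}) + "\n"
# ... and can then ask the server which tools it exposes.
list_tools = jsonrpc_request(2, "tools/list")

for line in (initialize, initialized, list_tools):
    print(line, end="")
```

A real client would write these lines to the `subvocal-mcp` process's stdin and read one JSON response per line from its stdout.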

---

## 📂 Repository Structure

```
subvocal/
├── src/subvocal/           # The installable package
│   ├── core/               # Data models, interfaces, pipeline, security policies, LLM providers
│   ├── hardware/           # HAL drivers (file replay, synthetic, OpenBCI, Delsys) + dataset loaders
│   ├── emg_core/           # DSP filters, TD10 features, classifiers (RF/CNN/GRU/Transformer)
│   ├── shorthand/          # Phonetic shorthand vocabulary, simulator, hybrid decoder
│   ├── context/            # User context schemas and phonetic context matching
│   ├── mcp/                # Model Context Protocol stdio server
│   └── tts/                # Multi-backend TTS feedback engine
├── tests/                  # Pytest suite
├── benchmarks/             # 50-case intent-reconstruction eval harnesses
├── tools/                  # Site/API-page builders, license audit, benchmark runner
└── docs/                   # GitHub Pages site (landing, docs, platform corpus, API reference)
    └── content/            # Markdown sources for the platform corpus and walkthrough
```

---

## 🚀 Core Features

1. **Articulatory Shorthand Decoder**: Overcomes the whole-word sEMG vocabulary ceiling. Decodes compressed phonetic consonant shorthand inputs (e.g. `g gl` -> `Google`) under heavy muscle-movement noise.
2. **Asymmetric Levenshtein Distance**: A dynamic-programming edit distance whose cost matrix encodes physiological sEMG confusion clusters (Glottal, Labial, Alveolar, Velar, Rhotic) and discounts the vowel and consonant omissions typical of silent speech.
3. **Command-Aware Context Prioritization**: Dynamic target matching against active user contacts (`TYPE`), calendar events (`SEARCH`), browser URLs (`GOTO`), and active application screen elements (`CLICK`).
4. **Physiological Signal Conditioning**: Preprocessing filters default to AlterEgo's `1.3–50.0 Hz` bandpass (designed for low-velocity articulatory gestures), with configuration support for the standard `20.0–450.0 Hz` EMG band.
5. **Classifiers (RF + Deep Learning)**: Custom pipelines to train scikit-learn **Random Forest**, PyTorch **1D CNN**, **GRU**, and **Transformer** architectures on raw multi-channel sEMG traces.
6. **Asynchronous Execution (V2 Architecture)**: Low-latency, thread-safe asynchronous pipeline orchestration built on LiveKit's `OpsQueue` and `IncrementalDispatcher` design.
7. **Physiological Signal Monitoring**: Real-time EMA-smoothed signal level activity detection and MOS-like connection quality scoring (evaluating saturation, drift, and dropouts).
8. **Prometheus Telemetry**: Integrated Prometheus metric exporter and pre-built Grafana monitoring dashboards for tracking SDK errors, session lifecycles, and action execution statistics.
9. **HMAC-Signed Capability Grants**: Secure token-based credentials (`ActionGrants`) specifying allowed command scopes, confidence thresholds, and dry-run policies, verified dynamically via the `GrantsPolicy` middleware.
10. **MCP Integration**: A zero-dependency stdio JSON-RPC server exposing pipeline status, token injection, phrase processing, and calibration as MCP tools.
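
To make the asymmetric edit distance from item 2 concrete, here is a minimal sketch of the idea. It is not the SDK's implementation: the letter groupings, the cost values, and the function names are illustrative assumptions. A letter missing from the shorthand is cheap (especially a vowel), a spurious extra letter is expensive, and substitutions within one articulatory cluster are discounted.

```python
VOWELS = set("aeiou")

# Illustrative letter groupings only; the SDK's real clusters and weights
# are not documented here.
CLUSTERS = {
    "labial": set("pbmfvw"),
    "alveolar": set("tdnszl"),
    "velar": set("kg"),
    "glottal": set("h"),
    "rhotic": set("r"),
}

INSERTION_COST = 1.0  # a spurious letter in the shorthand is penalized fully


def substitution_cost(a: str, b: str) -> float:
    """Zero for a match, discounted within a cluster, full cost otherwise."""
    if a == b:
        return 0.0
    same_cluster = any(a in c and b in c for c in CLUSTERS.values())
    return 0.4 if same_cluster else 1.0


def omission_cost(target_char: str) -> float:
    """Cost of a target letter that is missing from the shorthand."""
    return 0.1 if target_char in VOWELS else 0.5


def asymmetric_distance(shorthand: str, target: str) -> float:
    """Weighted edit distance from a shorthand input to a candidate target."""
    s = shorthand.replace(" ", "").lower()
    t = target.lower()
    # dist[i][j] is the cheapest alignment of s[:i] with t[:j].
    dist = [[0.0] * (len(t) + 1) for _ in range(len(s) + 1)]
    for i in range(1, len(s) + 1):
        dist[i][0] = dist[i - 1][0] + INSERTION_COST
    for j in range(1, len(t) + 1):
        dist[0][j] = dist[0][j - 1] + omission_cost(t[j - 1])
    for i in range(1, len(s) + 1):
        for j in range(1, len(t) + 1):
            dist[i][j] = min(
                dist[i - 1][j] + INSERTION_COST,
                dist[i][j - 1] + omission_cost(t[j - 1]),
                dist[i - 1][j - 1] + substitution_cost(s[i - 1], t[j - 1]),
            )
    return dist[len(s)][len(t)]


candidates = ["google", "github", "gmail"]
print(min(candidates, key=lambda word: asymmetric_distance("g gl", word)))
```

Under these assumed weights, `g gl` aligns with `google` by skipping only three vowels, whereas the reverse direction (`google` as input, `ggl` as target) would pay for three full insertions. That difference is what makes the distance asymmetric.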

---

## 🧪 Development

```bash
git clone https://github.com/PranavKalkunte/subvocal.git
cd subvocal
pip install -e ".[all,dev]"

pytest                      # test suite
ruff check src tests        # lint
pyright                     # type check
python benchmarks/eval_runner.py   # 50-case heuristic benchmark
```

Runtime artifacts (traces, trained models) are written to the per-user data directory; override with `SUBVOCAL_DATA_DIR` / `SUBVOCAL_MODELS_DIR`.
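
The override rule can be sketched as follows. This is an illustration, not the SDK's code: `resolve_dir` is a hypothetical helper, and both fallback locations (a `.subvocal` folder in the home directory, and a `models` folder beneath the data directory) are assumptions standing in for the real per-user defaults.

```python
import os
from pathlib import Path


def resolve_dir(env_var: str, default: Path) -> Path:
    """Return the directory named by `env_var` if it is set, else `default`."""
    override = os.environ.get(env_var, "").strip()
    return Path(override).expanduser() if override else default


# Assumed defaults for illustration only; the SDK derives its real default
# from the platform's per-user data directory.
default_data = Path.home() / ".subvocal"
data_dir = resolve_dir("SUBVOCAL_DATA_DIR", default_data)
models_dir = resolve_dir("SUBVOCAL_MODELS_DIR", data_dir / "models")

print("traces:", data_dir)
print("models:", models_dir)
```

Setting `SUBVOCAL_DATA_DIR` before launching a process is therefore enough to redirect its traces, for example to a scratch directory in CI.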

Contributions are welcome — see [CONTRIBUTING.md](CONTRIBUTING.md) for the workflow and quality gates, and [SECURITY.md](SECURITY.md) for vulnerability reporting.

---

## 📄 License
This repository is released under the **MIT License**. See [LICENSE](LICENSE) for details.
