Introduction
Welcome to the Subvocal SDK documentation. Subvocal SDK is an open-source, hardware-agnostic middleware platform designed to connect surface electromyography (sEMG) biosensors directly to LLM-driven AI agents.
Traditional silent speech interfaces lock developers into proprietary neckbands or restricted, pre-trained whole-word vocabularies. Subvocal SDK overcomes these limits by providing the software infrastructure (digital signal preprocessing, deep learning training skeletons, articulatory phonetic shorthand simulators, and context-aware decoders) to enable open-vocabulary control with high accuracy and low execution latency.
Setting up
Get your development pipeline or documentation environment up and running in minutes.
Make it yours
Design a customizable silent speech input system that matches your hardware setup, custom training models, and intent prioritization rules.
Repository Structure
The Subvocal SDK is structured as a monorepo containing modular packages:
subvocal/
├── src/subvocal/ # The installable package (pip install subvocal)
│ ├── core/ # Data models, interfaces, pipeline, security policies, LLM providers
│ ├── hardware/ # HAL drivers (file replay, synthetic, OpenBCI, Delsys) + dataset loaders
│ ├── emg_core/ # DSP filters, TD10 features, ML classifiers (RF/CNN/GRU/Transformer)
│ ├── shorthand/ # Phonetic shorthand vocabulary, simulator, hybrid decoder
│ ├── context/ # User context schemas and phonetic context matching
│ ├── mcp/ # Model Context Protocol stdio server (subvocal-mcp)
│ └── tts/ # Multi-backend TTS feedback engine
├── tests/ # Pytest suite
├── benchmarks/ # 50-case intent-reconstruction eval harnesses
├── platform/ # Publishable specifications & documentation
├── LICENSE # MIT License
└── README.md # Monorepo overview
Quickstart Guide
Get a complete silent speech pipeline running offline in three steps.
Step 1: Install the SDK
The base install is lightweight (pydantic + numpy) and covers the pipeline, hardware drivers, shorthand decoding, context, and the MCP server:
pip install subvocal
Optional extras pull in heavier subsystems:
pip install "subvocal[ml]" # classifier training & inference (torch, scikit-learn)
pip install "subvocal[hardware]" # public-dataset drivers (Ninapro, PutEMG, CSL-HDEMG)
pip install "subvocal[tts]" # audio feedback outside macOS
pip install "subvocal[all]" # everything
Tip: CPU Inference Gating
PyTorch sequence models are configured by default to run on CPU threads. This avoids MPS device-pooling overhead on Apple Silicon, keeping gesture inference latencies under 1 millisecond.
Step 2: Run an end-to-end pipeline
The following runs the full pipeline — synthetic sEMG source, classification, intent reconstruction, and action execution — with no hardware or API keys:
from subvocal import SubvocalPipeline
from subvocal.core.testing import MockActionExecutor, MockContextProvider, MockLLMProvider
from subvocal.hardware.drivers import SyntheticSignalGenerator
from subvocal.core.models import CommandToken
import time
hardware = SyntheticSignalGenerator(fs=1000.0, num_channels=8)
def classify(frame):
    # Toy classifier: emit a "gt" token whenever the frame amplitude exceeds 1.0
    arr = frame.to_numpy()
    if abs(arr).max() > 1.0:
        return CommandToken(text="gt", confidence=0.95, timestamp=time.time())
    return None
pipeline = SubvocalPipeline(
    hardware=hardware,
    classify_fn=classify,
    llm_provider=MockLLMProvider(),
    context_provider=MockContextProvider(),
    executor=MockActionExecutor(),
    phrase_timeout_seconds=0.5,
)
hardware.start()
hardware.trigger_command("gt", duration_ms=120)
# Poll in 50 ms windows until an action is reconstructed and executed
for _ in range(30):
    action = pipeline.step(window_ms=50)
    if action:
        print("Executed:", action.action_type, action.params)
        break
    time.sleep(0.05)
Step 3: Execute target reconstruction benchmarks
From a repository checkout, run the evaluation harness to measure spelling shorthand reconstruction accuracy and execution latency under simulated physiological noise:
git clone https://github.com/PranavKalkunte/subvocal.git
cd subvocal
pip install -e ".[all,dev]"
python benchmarks/eval_runner.py
Expected Results
The evaluation harness simulates muscle-movement biopotential noise across 50 realistic shorthand command scenarios (e.g. g gl -> Google), returning a baseline heuristic accuracy of 74.0% at <0.72 ms execution latency.
Local Development & Integration
Configure the signal preprocessing pipeline, train models on your own raw biopotentials, and customize shorthand alignment costs.
1. Custom Signal Preprocessing (DSP)
Raw physiological signals are ingested at 250 Hz. Signal conditioning routines are located in subvocal.emg_core.dsp.filters and apply:
- An AlterEgo-inspired 1.3–50.0 Hz bandpass filter designed to capture slow, low-velocity articulatory gestures.
- A 60 Hz notch filter (configurable to 50 Hz for EU power grids) to remove AC line noise.
- Time-domain feature extraction in subvocal.emg_core.dsp.features yielding TD10 segment features (840-dimensional representations per segment window).
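The notch stage in the list above can be illustrated without the SDK. The following is a minimal standard-library sketch of a second-order (biquad) notch using the widely published Audio EQ Cookbook coefficients. It is not the code in subvocal.emg_core.dsp.filters; the function names and the quality factor of 30 are assumptions chosen for illustration.

```python
import math


def notch_coefficients(f0_hz, fs_hz, q=30.0):
    """Return normalized biquad notch coefficients (b, a) centred on f0_hz."""
    w0 = 2.0 * math.pi * f0_hz / fs_hz
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b = (1.0 / a0, -2.0 * math.cos(w0) / a0, 1.0 / a0)
    a = (1.0, -2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0)
    return b, a


def apply_biquad(samples, b, a):
    """Filter one channel with a direct-form I biquad (causal, single pass)."""
    out = []
    x1 = x2 = y1 = y2 = 0.0
    for x0 in samples:
        y0 = b[0] * x0 + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        out.append(y0)
        x2, x1 = x1, x0
        y2, y1 = y1, y0
    return out


def rms(samples):
    """Root-mean-square amplitude of a sequence of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


FS_HZ = 250.0  # sampling rate quoted in this section
b, a = notch_coefficients(60.0, FS_HZ)  # pass 50.0 instead for 50 Hz mains

t = [n / FS_HZ for n in range(1000)]  # 4 s of samples
hum = [math.sin(2 * math.pi * 60.0 * ti) for ti in t]      # line noise
gesture = [math.sin(2 * math.pi * 10.0 * ti) for ti in t]  # in-band signal

# Compare steady-state RMS over the final second, after the filter has settled.
hum_rms = rms(apply_biquad(hum, b, a)[-250:])
gesture_rms = rms(apply_biquad(gesture, b, a)[-250:])
print(f"60 Hz residual RMS: {hum_rms:.4f}, 10 Hz RMS: {gesture_rms:.4f}")
```

The 60 Hz tone is suppressed once the filter settles, while the 10 Hz tone passes almost unchanged. A production pipeline would more likely use a zero-phase filter from a DSP library; the single causal pass here is only meant to keep the sketch dependency-free.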
2. Training Custom Classifiers
Custom pipeline training skeletons are located in subvocal.emg_core.ml.train (requires the [ml] extra). The SDK supports:
- Random Forest: Standard lightweight baseline classifier.
- 1D CNN: Temporal convolutional model mapping multi-channel muscle traces.
- GRU (Gated Recurrent Unit): Sequence model trained on temporal sEMG feature frames.
- Transformer: Small attention encoder over multi-channel segments.
# Train a 1D CNN on a user's recorded calibration data
from subvocal.emg_core.ml.train import train_model
from subvocal.emg_core.ml.config_schema import TrainingConfig
config = TrainingConfig(model_type="cnn", epochs=20, batch_size=16, lr=1e-3, test_size=0.2)
metrics = train_model("your_user_id", model_type="cnn", config_obj=config)
print(metrics["accuracy"])
Where data and models live
Calibration recordings and trained weights resolve to the per-user data directory; override with the SUBVOCAL_DATA_DIR and SUBVOCAL_MODELS_DIR environment variables.
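One way to apply these overrides from Python is to set the variables before any SDK call that resolves paths. The sketch below assumes the variables are read from the process environment. The directory names are placeholders, and resolve_dir is a hypothetical helper that only mirrors the override-with-fallback pattern; it is not the SDK's own resolver.

```python
import os
from pathlib import Path


def resolve_dir(env_var, default):
    """Hypothetical helper: prefer the environment override, else the default."""
    override = os.environ.get(env_var)
    return Path(override).expanduser() if override else Path(default).expanduser()


# Placeholder locations; point these at wherever recordings and weights should live.
os.environ["SUBVOCAL_DATA_DIR"] = "/tmp/subvocal/data"
os.environ["SUBVOCAL_MODELS_DIR"] = "/tmp/subvocal/models"

data_dir = resolve_dir("SUBVOCAL_DATA_DIR", "~/subvocal-data")
models_dir = resolve_dir("SUBVOCAL_MODELS_DIR", "~/subvocal-models")
print(data_dir, models_dir)
```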
3. Live Inference & Cooldown Gating
Inference is managed by a unified InferenceEngine in subvocal.emg_core.ml.infer. It applies confidence thresholds and adaptive cooldown gating (e.g., ignoring predictions within 500 ms of a successful trigger) to prevent duplicate gesture triggers in real time.
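The gating idea can be sketched in a few lines. The class below is a hypothetical illustration, not the InferenceEngine API: it combines a confidence threshold with a fixed 500 ms refractory window, whereas the SDK describes its cooldown as adaptive. The threshold value of 0.8 is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CooldownGate:
    """Hypothetical gate: confidence threshold plus a fixed refractory window."""

    min_confidence: float = 0.8  # assumed threshold
    cooldown_s: float = 0.5      # 500 ms, as in the example above
    _last_trigger: Optional[float] = None

    def accept(self, confidence: float, now: float) -> bool:
        """Return True if the prediction at time `now` (seconds) should fire."""
        if confidence < self.min_confidence:
            return False
        if self._last_trigger is not None and now - self._last_trigger < self.cooldown_s:
            return False
        self._last_trigger = now
        return True


gate = CooldownGate()
# (timestamp in seconds, classifier confidence) for one sustained gesture
predictions = [(0.00, 0.95), (0.10, 0.97), (0.30, 0.60), (0.45, 0.92), (0.60, 0.91)]
fired = [t for t, conf in predictions if gate.accept(conf, t)]
print(fired)  # [0.0, 0.6]
```

Only the first prediction and the one arriving after the cooldown has elapsed are allowed through; the low-confidence frame at 0.30 s is dropped by the threshold rather than by the cooldown.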
4. Articulatory Shorthand Decoder Customization
Rather than classifying whole words (which degrades rapidly beyond small vocabularies), users silently articulate a compressed consonant shorthand (e.g. g gl for Google).
The decoder in subvocal.shorthand.decoder aligns shorthand strings using a dynamic-programming edit distance (an asymmetric Levenshtein variant). Its cost matrix is configured with the articulatory confusion groups defined in subvocal.shorthand.spec:
- Labial: physical gestures overlap (`p`, `b`, `m`, `f`, `v`).
- Alveolar: tongue placement overlap (`t`, `d`, `s`, `z`, `n`, `l`).
- Velar: back-of-tongue overlap (`k`, `g`, `ng`).
- Rhotic: throat muscle movement (`r`).
Consonant omissions are weighted based on biological gesture overlap. Update the target vocabulary directly in subvocal.shorthand.vocab to adapt command priorities.
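To make the alignment concrete, here is a minimal sketch of an asymmetric weighted edit distance over the four confusion groups listed above. It is not the implementation in subvocal.shorthand.decoder: the cost values, the flat omission penalty, the function names, and the three vocabulary entries other than the g g l spelling of Google are all assumptions for illustration.

```python
GROUPS = {
    "labial": {"p", "b", "m", "f", "v"},
    "alveolar": {"t", "d", "s", "z", "n", "l"},
    "velar": {"k", "g", "ng"},
    "rhotic": {"r"},
}
GROUP_OF = {sym: name for name, members in GROUPS.items() for sym in members}

# Assumed costs. Omitting a target consonant is cheaper than adding a spurious
# one, which is what makes the distance asymmetric.
SAME_GROUP_SUB = 0.3
CROSS_GROUP_SUB = 1.0
OMISSION = 0.5   # target symbol missing from the observed sequence
INSERTION = 1.0  # observed symbol with no counterpart in the target


def sub_cost(observed, target):
    """Cost of reading `observed` where the vocabulary entry has `target`."""
    if observed == target:
        return 0.0
    g_obs, g_tgt = GROUP_OF.get(observed), GROUP_OF.get(target)
    if g_obs is not None and g_obs == g_tgt:
        return SAME_GROUP_SUB
    return CROSS_GROUP_SUB


def alignment_cost(observed, target):
    """Weighted edit distance between observed symbols and a target entry."""
    rows, cols = len(observed) + 1, len(target) + 1
    dp = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):
        dp[i][0] = dp[i - 1][0] + INSERTION
    for j in range(1, cols):
        dp[0][j] = dp[0][j - 1] + OMISSION
    for i in range(1, rows):
        for j in range(1, cols):
            dp[i][j] = min(
                dp[i - 1][j] + INSERTION,
                dp[i][j - 1] + OMISSION,
                dp[i - 1][j - 1] + sub_cost(observed[i - 1], target[j - 1]),
            )
    return dp[-1][-1]


VOCAB = {  # hypothetical vocabulary: word -> canonical shorthand symbols
    "Google": ["g", "g", "l"],
    "Terminal": ["t", "r", "m", "n", "l"],
    "Play": ["p", "l"],
}


def decode(observed):
    """Return the vocabulary word whose shorthand aligns most cheaply."""
    return min(VOCAB, key=lambda word: alignment_cost(observed, VOCAB[word]))


print(decode(["k", "g", "l"]))       # velar k/g confusion -> Google
print(decode(["t", "m", "n", "l"]))  # omitted r -> Terminal
```

Symbols are handled as lists rather than raw strings so that a multi-character unit such as ng counts as one symbol. A fuller version would vary the omission cost per consonant, as the paragraph above describes.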
5. Zero-Dependency TTS Engine
Located in subvocal.tts.engine, the speech feedback system operates offline with zero external service requirements. It defaults to macOS system commands (say and afplay) to provide local audio confirmations.
MCP Integration
The SDK ships a stdio-based Model Context Protocol server so Claude Desktop — or any MCP-compatible client — can ingest subvocal commands as standard tool calls. It installs as a console command:
subvocal-mcp
Claude Desktop configuration
Add the server to claude_desktop_config.json:
{
"mcpServers": {
"subvocal": { "command": "subvocal-mcp" }
}
}
Exposed tools
- get_pipeline_status: Reports hardware connection, classifier model, and buffer state.
- get_token_buffer: Returns the accumulated command tokens awaiting phrase reconstruction.
- inject_token: Injects a shorthand token into the pipeline buffer (useful for testing without hardware).
- process_phrase: Forces immediate intent reconstruction and action dispatch on the buffered tokens.
- trigger_calibration: Starts a per-user classifier calibration run in the background (requires the [ml] extra).
Resources
The server also exposes subvocal://intent/history, a JSON log of recently executed intents, as an MCP resource.
Protocol details
The server implements JSON-RPC 2.0 over stdio against protocol version 2025-03-26 with zero third-party dependencies. The full low-bandwidth intent profile proposal lives in platform/mcp_intent_profile.md.
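For readers writing their own client, the sketch below builds two of the messages such a client would write to the server's stdin: the MCP initialize request and a tools/call request for inject_token. The envelope fields follow JSON-RPC 2.0 and the MCP specification. The client name and the inject_token argument names (text, confidence) are assumptions; the tool's actual input schema should be read from a tools/list response.

```python
import json


def frame(message):
    """Serialize one JSON-RPC message as a single newline-terminated line."""
    return json.dumps(message, separators=(",", ":")) + "\n"


initialize = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "initialize",
    "params": {
        "protocolVersion": "2025-03-26",
        "capabilities": {},
        "clientInfo": {"name": "example-client", "version": "0.1.0"},  # placeholder
    },
}

call_inject = {
    "jsonrpc": "2.0",
    "id": 2,
    "method": "tools/call",
    "params": {
        "name": "inject_token",
        # Argument names are assumed, not taken from the server's schema.
        "arguments": {"text": "gt", "confidence": 0.95},
    },
}

wire = frame(initialize) + frame(call_inject)
print(wire, end="")
```

In a real session the client waits for the initialize response and sends a notifications/initialized notification before issuing tool calls; the two requests are concatenated here only to show the newline-delimited framing.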