Project Resonance / System Architecture

Local, privacy-first audio restoration. Phone recording in, podcast-quality out. Version 6 — April 2026

Product Roadmap

Phase 1 — Now
CLI Tool
pip install, one command. Works with any format ffmpeg can read. Packaged as a Python library + CLI.
Phase 2
Validation Demo
HuggingFace Space. Strangers upload audio. Collect feedback. Learn what breaks across voices, mics, noise.
Phase 3
Consumer UI
Semantic controls: Clarity, Warmth, Brightness, Loudness, Smoothness. AI profile suggestions. Better Audacity for everyone.
End State
Real-Time On-Device
System-level audio filter. Every meeting, every video, podcast quality in real time. Your ears never suffer.

Current Pipeline — 5 Stages

1
Noise Suppression
Kills background noise — hum, room tone, fans, street. Leaves voice untouched.
DeepFilterNet3 • 1M params • Rust/C++ core • Real-time capable • ~3s / 2min
2
Speech Enhancement
Improves voice clarity, fills in spectral detail lost by phone mics. Single-pass discriminative model.
MossFormer2_SE_48K (ClearVoice/Alibaba) • 48kHz native • File I/O mode • ~7s / 2min
3
Professional Mastering
The same DSP chain used by podcast engineers: high-pass, EQ, dual compression, de-essing, presence, air.
Spotify Pedalboard • C++ JUCE core • HPF 80Hz → EQ -3dB@300Hz → Comp 2:1+3:1 → De-ess -4dB@6kHz → +2.5dB@3kHz → +2dB@10kHz
4
Loudness Normalization
Targets −18 LUFS, a common loudness target for podcast distribution platforms.
pyloudnorm • ITU-R BS.1770 measurement • EBU R128 normalization
5
Brick-Wall Limiter
Prevents clipping. Ceiling at −1.5 dB ensures safe playback on all devices.
Pedalboard Limiter • −1.5 dB ceiling • True peak limiting

Module Architecture

User Surfaces

Phase 1
CLI
Command-line interface. Format conversion via ffmpeg. Signal handling + cleanup.
cli.py
Phase 2
API Server
FastAPI endpoints for programmatic access. HuggingFace Spaces deployment.
api.py
Phase 3
Consumer UI
Gradio / web UI with semantic controls. Drag-and-drop. AI profile suggestion.
app.py

Public API Layer

Library Interface
import phonepod
phonepod.enhance("input.m4a", "output.wav")          # one-shot, file to file
engine = phonepod.Engine(preset="podcast")
enhanced, sr = engine.enhance(tensor, sample_rate)   # tensor in, tensor out
__init__.py
presets.py
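
A hypothetical sketch of what presets.py could hold: named parameter bundles that Engine(preset=...) resolves by name. The preset names, keys, and values here are illustrative assumptions, not shipped defaults.

```python
# Illustrative preset registry; every name and value is an assumption.
PRESETS = {
    "podcast": {
        "target_lufs": -18.0,        # stage 4 loudness target
        "limiter_ceiling_db": -1.5,  # stage 5 ceiling
        "hpf_hz": 80,
        "mud_cut_db": -3.0,          # EQ cut at 300 Hz
        "presence_db": 2.5,          # boost at 3 kHz
        "air_db": 2.0,               # shelf at 10 kHz
    },
    "voice_memo": {
        "target_lufs": -16.0,
        "limiter_ceiling_db": -1.0,
        "hpf_hz": 100,
        "mud_cut_db": -2.0,
        "presence_db": 1.5,
        "air_db": 1.0,
    },
}

def get_preset(name: str) -> dict:
    """Return a copy of a named preset so callers can tweak it safely."""
    try:
        return dict(PRESETS[name])
    except KeyError:
        raise ValueError(f"unknown preset {name!r}; choose from {sorted(PRESETS)}")
```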

Core

Processing Engine
PodcastEngine
5-stage pipeline. DeepFilterNet → MossFormer2 → Pedalboard → LUFS → Limiter. Pure tensor-in, tensor-out. No user-facing file I/O.
engine.py
File I/O
Processor
Loads audio, converts to mono, passes to engine, saves output. Owns the file boundary.
processor.py
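
The engine/processor split above can be sketched as follows. The engine body is an identity stub standing in for the real 5-stage pipeline (so the boundary is visible without model weights), and soundfile is an assumed I/O library, not a stated dependency.

```python
import numpy as np

class PodcastEngine:
    """Pure tensor-in, tensor-out. No file I/O lives here."""
    def enhance(self, audio: np.ndarray, sample_rate: int) -> tuple[np.ndarray, int]:
        # Real pipeline: DeepFilterNet -> MossFormer2 -> Pedalboard -> LUFS -> Limiter.
        # Identity stub for illustration.
        return audio, sample_rate

class Processor:
    """Owns the file boundary: load, downmix to mono, enhance, save."""
    def __init__(self, engine: PodcastEngine):
        self.engine = engine

    def run(self, in_path: str, out_path: str) -> None:
        import soundfile as sf  # assumed I/O backend for this sketch
        audio, sr = sf.read(in_path, dtype="float32")
        if audio.ndim == 2:            # downmix stereo to mono
            audio = audio.mean(axis=1)
        enhanced, sr = self.engine.enhance(audio, sr)
        sf.write(out_path, enhanced, sr)
```

The payoff of the split: the engine is testable on raw arrays, and every user surface (CLI, API, UI) reuses the same Processor for files.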

Infrastructure

Compatibility
_compat.py
Isolates the torchaudio.backend monkey-patch. One place, documented, tested.
tech debt
Quality Metrics
metrics.py
DNSMOS blind quality estimation. PESQ when reference available. Spectral analysis.
phase 2
Feedback
feedback.py
Opt-in submission to HuggingFace Dataset. Three consent tiers. No telemetry.
phase 2

Semantic Controls — Phase 3 Vision

Human language for audio effects. What the user sees vs what the DSP does.

Clarity
→ DeepFilterNet attenuation + MossFormer2 strength
Warmth
→ Low-shelf EQ + tube saturation drive
Brightness
→ Presence boost (3kHz) + air boost (10kHz)
Loudness
→ LUFS target (-24 to -14) + compressor ratio
Smoothness
→ De-esser strength + compressor attack/release
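
One way to realize the mapping above: each 0-1 slider linearly interpolates its two underlying DSP parameters. The parameter names and ranges here are assumptions for illustration, not tuned values; only the LUFS range (−24 to −14) comes from the table.

```python
def controls_to_params(clarity=0.5, warmth=0.5, brightness=0.5,
                       loudness=0.5, smoothness=0.5) -> dict:
    """Map 0-1 semantic sliders to DSP parameters (illustrative ranges)."""
    lerp = lambda lo, hi, t: lo + (hi - lo) * t
    return {
        # Clarity -> denoise attenuation + enhancement strength
        "df_attenuation_db": lerp(6, 30, clarity),
        "mossformer_mix": lerp(0.2, 1.0, clarity),
        # Warmth -> low-shelf gain + saturation drive
        "low_shelf_db": lerp(-2, 4, warmth),
        "saturation_drive_db": lerp(0, 6, warmth),
        # Brightness -> presence (3 kHz) + air (10 kHz) boosts
        "presence_db": lerp(0, 5, brightness),
        "air_db": lerp(0, 4, brightness),
        # Loudness -> LUFS target (-24 quiet .. -14 loud) + compressor ratio
        "target_lufs": lerp(-24, -14, loudness),
        "comp_ratio": lerp(1.5, 4.0, loudness),
        # Smoothness -> de-esser depth + slower compressor release
        "deess_db": lerp(0, -8, smoothness),
        "comp_release_ms": lerp(50, 250, smoothness),
    }
```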

Feedback Loop Architecture

CLI / Space / App
User processes audio. Locally. Always.
Opt-In Prompt
"How'd it sound?" Rating + optional issue tags. Per-submission consent. Never automatic.
Consent Tier Selection
Metadata only, or metadata + audio, or metadata + audio + contact. User picks every time.
HuggingFace Dataset
Public, versioned, inspectable. Users can see exactly what's collected. Audio column support built in.
Automated Analysis
Cluster by spectral profile. Quality distribution per cluster. Find where the pipeline fails. Generate candidate parameter adjustments.
Manual Review
Listen to 1-2 star submissions. Categorize failure mode. Decide: parameter tune vs new model vs wontfix.
Pipeline Update
New presets from real clusters. Updated parameters. Regression tests pinned to real audio scores.
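
The "cluster by spectral profile" step of Automated Analysis could look like this: a tiny k-means over the {bass, mid, presence, air} vectors, then mean rating per cluster to surface where the pipeline fails. Plain numpy for visibility; a real run would likely reach for scikit-learn instead.

```python
import numpy as np

def kmeans(X: np.ndarray, k: int, iters: int = 50, seed: int = 0) -> np.ndarray:
    """Minimal k-means returning a cluster label per row of X."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)].astype(float)
    for _ in range(iters):
        # Assign each point to its nearest center.
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        # Move each center to the mean of its assigned points.
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def quality_by_cluster(profiles, ratings, k=3) -> dict:
    """Mean user rating per spectral cluster; low means = failure pockets."""
    labels = kmeans(np.asarray(profiles, dtype=float), k)
    ratings = np.asarray(ratings, dtype=float)
    return {int(j): float(ratings[labels == j].mean()) for j in set(labels.tolist())}
```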

Privacy Model

1
Metadata Only
No audio leaves your device.
Sends: rating, issue tags, device info, spectral stats, pipeline version, duration
Value: know WHAT fails
2
Metadata + Audio
Input and output .wav files included.
Sends: everything in Tier 1 + audio files
Value: hear WHY it fails
Consent: explicit per-submission, deletion available via submission ID
3
Full Research
Contact info for follow-up.
Sends: everything in Tier 2 + email/handle
Value: tune for specific setups
Rare: power users and beta testers only

Never collected: IP address (stripped at ingestion), location, OS/hardware beyond what user shares, anything without explicit per-submission consent. No telemetry. No phone-home. No analytics.

Feedback Data Schema

Field              Type            Description
submission_id      UUID            Unique ID for deletion requests
timestamp          ISO 8601        When submitted
source             enum            "cli" | "space" | "app"
version            string          Pipeline version (e.g. "0.1.0")
rating             int 1-5         User quality rating
issues[]           string[]        "noisy", "robotic", "quiet", "loud", "thin", "muddy"
device             string | null   Recording device (optional)
input_sr           int             Input sample rate
input_duration     float           Duration in seconds
input_lufs         float           Measured input loudness
output_lufs        float           Measured output loudness
pipeline_params    object          Full DSP config used for this run

has_audio          bool            Whether audio files are included
input_audio        Audio | null    Raw input (Tier 2+ only)
output_audio       Audio | null    Enhanced output (Tier 2+ only)

input_snr_est      float           Auto-computed: estimated input SNR (dB)
spectral_profile   object          Auto-computed: {bass, mid, presence, air} energy in dB
noise_type_est     string          Auto-computed: "room", "street", "wind", "hum", "clean"
dnsmos_score       float           Auto-computed: blind quality estimate (1-5)
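
The schema above, expressed as a hypothetical Python dataclass. Field names follow the table; the Audio payloads are modeled as optional file paths here, whereas the real HuggingFace Audio feature type differs.

```python
from dataclasses import dataclass, field
from typing import Optional
import datetime
import uuid

@dataclass
class FeedbackRecord:
    # Core (Tier 1)
    rating: int                              # 1-5
    source: str                              # "cli" | "space" | "app"
    version: str                             # e.g. "0.1.0"
    input_sr: int
    input_duration: float                    # seconds
    input_lufs: float
    output_lufs: float
    pipeline_params: dict                    # full DSP config for this run
    issues: list[str] = field(default_factory=list)
    device: Optional[str] = None
    submission_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    timestamp: str = field(default_factory=lambda: datetime.datetime.now(
        datetime.timezone.utc).isoformat())
    # Audio (Tier 2+)
    has_audio: bool = False
    input_audio: Optional[str] = None        # path stand-in for Audio
    output_audio: Optional[str] = None
    # Auto-computed at ingestion
    input_snr_est: Optional[float] = None
    spectral_profile: Optional[dict] = None  # {bass, mid, presence, air} in dB
    noise_type_est: Optional[str] = None     # "room" | "street" | "wind" | "hum" | "clean"
    dnsmos_score: Optional[float] = None     # 1-5
```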

Real-Time Readiness Assessment

Component • Current Latency • Real-Time? • Blocker? • Path Forward
DeepFilterNet3 • ~3s / 2min (0.025x RT) • Yes • No • Already has streaming API + LADSPA/VST plugin mode
MossFormer2 • ~7s / 2min (0.058x RT) • No • Yes • Batch model with 4s sliding windows; needs a streaming alternative, or proof that DeepFilterNet + DSP alone is sufficient for live
Pedalboard DSP • ~0.1s / 2min (instant) • Yes • No • Built on JUCE, a real-time audio framework; sample-level processing
LUFS + Limiter • ~0.05s / 2min (instant) • Partial • No • Needs windowed approximation instead of integrated measurement
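
The windowed approximation flagged for LUFS could start from something like this: short-term RMS per 400 ms block with a one-pole smoothed makeup gain, instead of a single integrated measurement over the whole file. K-weighting is omitted for brevity, so this tracks plain RMS rather than true LUFS; it sketches the direction, not shipped code.

```python
import numpy as np

def streaming_normalize(audio, sr, target_db=-18.0, block_s=0.4, smooth=0.9):
    """Block-wise loudness tracking suitable for a streaming context."""
    out = np.copy(audio).astype(np.float32)
    n = int(sr * block_s)
    gain = 1.0
    for start in range(0, len(out), n):
        block = out[start:start + n]
        rms = np.sqrt(np.mean(block ** 2)) + 1e-12
        wanted = 10 ** (target_db / 20) / rms          # gain to hit target RMS
        gain = smooth * gain + (1 - smooth) * wanted   # one-pole smoothing
        out[start:start + n] = np.clip(block * gain, -1.0, 1.0)  # cheap ceiling
    return out
```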

Dream State

Now
280 lines on one laptop
One voice tested
No users
Phase 1
pip install phonepod
One command
It just works
Phase 2
HuggingFace demo
Real feedback
Pipeline hardens
Phase 3
Semantic controls
AI profiles
Better Audacity
End State
Real-time on-device
System audio filter
Your ears never suffer