Local, privacy-first audio restoration. Phone recording in, podcast-quality out. Version 6 — April 2026
Phase 1 — Now
CLI Tool
pip install, one command. Works on any audio format. Packaged as a Python library + CLI.
Phase 2
Validation Demo
HuggingFace Space. Strangers upload audio. Collect feedback. Learn what breaks across voices, mics, noise.
Phase 3
Consumer UI
Semantic controls: Clarity, Warmth, Brightness, Loudness, Smoothness. AI profile suggestions. Better Audacity for everyone.
End State
Real-Time On-Device
System-level audio filter. Every meeting, every video, podcast quality in real time. Your ears never suffer.
Processing Engine
PodcastEngine
5-stage pipeline. DeepFilterNet → MossFormer2 → Pedalboard → LUFS → Limiter. Pure tensor-in, tensor-out. No user-facing file I/O.
engine.py
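The five stages above can be sketched as a pure tensor-in, tensor-out chain. This is an illustrative skeleton, not the shipped engine.py: the placeholder stages stand in for DeepFilterNet, MossFormer2, the Pedalboard chain, LUFS normalization (e.g. via pyloudnorm), and the limiter.

```python
import numpy as np

class PodcastEngine:
    """Sketch of a tensor-in, tensor-out pipeline (illustrative API,
    not the actual engine.py interface). Each stage is a callable
    (audio, sr) -> audio; no file I/O happens here."""

    def __init__(self, stages):
        self.stages = stages  # ordered: denoise -> separate -> eq -> lufs -> limit

    def process(self, audio: np.ndarray, sr: int) -> np.ndarray:
        for stage in self.stages:
            audio = stage(audio, sr)
        return audio

# Placeholder stages; the real ones would wrap DeepFilterNet,
# MossFormer2, Pedalboard, and a true LUFS meter (e.g. pyloudnorm).
def denoise(audio, sr):
    return audio

def separate(audio, sr):
    return audio

def eq_and_dynamics(audio, sr):
    return audio

def normalize(audio, sr):
    peak = float(np.max(np.abs(audio))) or 1.0  # crude peak stand-in for LUFS
    return audio / peak * 0.5

def limit(audio, sr):
    return np.clip(audio, -1.0, 1.0)

engine = PodcastEngine([denoise, separate, eq_and_dynamics, normalize, limit])
```

Usage is a single call, e.g. `out = engine.process(audio, 48000)`; keeping stages as plain callables makes each one swappable and testable in isolation.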
File I/O
Processor
Loads audio, converts to mono, passes to engine, saves output. Owns the file boundary.
processor.py
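A minimal sketch of that file boundary, with load/save injected so the engine stays pure. The real processor.py would likely use soundfile or torchaudio for I/O (an assumption, not confirmed here).

```python
import numpy as np

class Processor:
    """Owns the file boundary: load -> mono -> engine -> save.
    load_fn/save_fn are injected for illustration; a real build
    might use soundfile.read / soundfile.write (assumption)."""

    def __init__(self, engine_fn, load_fn, save_fn):
        self.engine, self.load, self.save = engine_fn, load_fn, save_fn

    @staticmethod
    def to_mono(audio: np.ndarray) -> np.ndarray:
        if audio.ndim == 1:
            return audio
        # treat the smaller dimension as the channel axis
        axis = 0 if audio.shape[0] <= audio.shape[1] else 1
        return audio.mean(axis=axis)

    def run(self, in_path: str, out_path: str) -> np.ndarray:
        audio, sr = self.load(in_path)
        enhanced = self.engine(self.to_mono(audio), sr)
        self.save(out_path, enhanced, sr)
        return enhanced
```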
Compatibility
_compat.py
Isolates the torchaudio.backend monkey-patch. One place, documented, tested.
tech debt
Quality Metrics
metrics.py
DNSMOS blind quality estimation. PESQ when reference available. Spectral analysis.
phase 2
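The spectral-analysis piece can be as simple as per-band FFT energy. Band edges below are illustrative assumptions; DNSMOS and PESQ would come from their own packages rather than this sketch.

```python
import numpy as np

def spectral_profile(audio: np.ndarray, sr: int) -> dict:
    """Energy per perceptual band in dB. Band edges are assumptions,
    not the shipped metrics.py defaults."""
    power = np.abs(np.fft.rfft(audio)) ** 2
    freqs = np.fft.rfftfreq(len(audio), d=1.0 / sr)
    bands = {"bass": (20, 250), "mid": (250, 3000),
             "presence": (3000, 8000), "air": (8000, sr / 2)}
    profile = {}
    for name, (lo, hi) in bands.items():
        mask = (freqs >= lo) & (freqs < hi)
        profile[name] = 10.0 * np.log10(power[mask].sum() + 1e-12)
    return profile
```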
Feedback
feedback.py
Opt-in submission to HuggingFace Dataset. Three consent tiers. No telemetry.
phase 2
Human language for audio effects. What the user sees vs what the DSP does.
Clarity → DeepFilterNet attenuation + MossFormer2 strength
Warmth → Low-shelf EQ + tube saturation drive
Brightness → Presence boost (3 kHz) + air boost (10 kHz)
Loudness → LUFS target (-24 to -14) + compressor ratio
Smoothness → De-esser strength + compressor attack/release
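A plausible shape for that mapping: each semantic knob in [0, 1] fans out to its DSP parameters. Every parameter name and range here is an assumption for illustration; only the -24 to -14 LUFS span comes from the mapping itself.

```python
def resolve_controls(clarity=0.5, warmth=0.5, brightness=0.5,
                     loudness=0.5, smoothness=0.5):
    """Map 0..1 semantic knobs to DSP parameters (illustrative
    names and ranges, not shipped defaults)."""
    return {
        # Clarity
        "denoise_atten_db":    10 + clarity * 30,
        "separation_strength": clarity,
        # Warmth
        "low_shelf_gain_db":   warmth * 6,
        "tube_drive":          warmth,
        # Brightness
        "presence_gain_db":    brightness * 4,      # ~3 kHz
        "air_gain_db":         brightness * 4,      # ~10 kHz
        # Loudness
        "lufs_target":         -24 + loudness * 10,  # -24 .. -14 LUFS
        "comp_ratio":          1.5 + loudness * 2.5,
        # Smoothness
        "deesser_strength":    smoothness,
        "comp_attack_ms":      5 + (1 - smoothness) * 20,
        "comp_release_ms":     50 + smoothness * 150,
    }
```

Keeping the fan-out in one pure function makes AI profile suggestions trivial: a profile is just a dict of knob values.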
Tier 1: Metadata Only
No audio leaves your device.
Sends: rating, issue tags, device info, spectral stats, pipeline version, duration
Value: know WHAT fails
Tier 2: Metadata + Audio
Input and output .wav files included.
Sends: everything in Tier 1 + audio files
Value: hear WHY it fails
Consent: explicit per-submission, deletion available via submission ID
Tier 3: Full Research
Contact info for follow-up.
Sends: everything in Tier 2 + email/handle
Value: tune for specific setups
Rare: power users and beta testers only
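The three tiers reduce to a simple gate over what the payload carries. A sketch under stated assumptions (function and field names are chosen to match the schema, not a confirmed implementation):

```python
from enum import IntEnum

class ConsentTier(IntEnum):
    METADATA_ONLY = 1    # stats + tags, no audio
    METADATA_AUDIO = 2   # adds input/output .wav
    FULL_RESEARCH = 3    # adds contact info

def build_submission(metadata, tier, audio=None, contact=None):
    """Gate the payload by consent tier. Nothing is sent unless the
    user explicitly submits; there is no background telemetry."""
    payload = dict(metadata, has_audio=False)
    if tier >= ConsentTier.METADATA_AUDIO and audio is not None:
        payload.update(audio, has_audio=True)
    if tier >= ConsentTier.FULL_RESEARCH and contact is not None:
        payload["contact"] = contact
    return payload
```

Because lower tiers simply drop fields, the same submission path serves all three consent levels.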
Field             Type           Description
submission_id     UUID           Unique ID for deletion requests
timestamp         ISO 8601       When submitted
source            enum           "cli" | "space" | "app"
version           string         Pipeline version (e.g. "0.1.0")
rating            int 1-5        User quality rating
issues[]          string[]       "noisy", "robotic", "quiet", "loud", "thin", "muddy"
device            string | null  Recording device (optional)
input_sr          int            Input sample rate
input_duration    float          Duration in seconds
input_lufs        float          Measured input loudness
output_lufs       float          Measured output loudness
pipeline_params   object         Full DSP config used for this run
has_audio         bool           Whether audio files are included
input_audio       Audio | null   Raw input (Tier 2+ only)
output_audio      Audio | null   Enhanced output (Tier 2+ only)
input_snr_est     float          Auto-computed: estimated input SNR (dB)
spectral_profile  object         Auto-computed: {bass, mid, presence, air} energy in dB
noise_type_est    string         Auto-computed: "room", "street", "wind", "hum", "clean"
dnsmos_score      float          Auto-computed: blind quality estimate (1-5)
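The schema maps naturally onto a dataclass; a sketch with Python types standing in for the table's (audio becomes raw bytes here, an assumption):

```python
from dataclasses import dataclass, field
from typing import Optional
import uuid
import datetime

@dataclass
class FeedbackRecord:
    """One feedback submission, mirroring the schema table
    (a best-effort Python rendering, not the shipped feedback.py)."""
    source: str                  # "cli" | "space" | "app"
    version: str                 # e.g. "0.1.0"
    rating: int                  # 1-5
    input_sr: int
    input_duration: float
    input_lufs: float
    output_lufs: float
    pipeline_params: dict
    issues: list = field(default_factory=list)
    device: Optional[str] = None
    has_audio: bool = False
    input_audio: Optional[bytes] = None     # Tier 2+ only
    output_audio: Optional[bytes] = None    # Tier 2+ only
    input_snr_est: float = 0.0              # auto-computed
    spectral_profile: dict = field(default_factory=dict)
    noise_type_est: str = "clean"           # auto-computed
    dnsmos_score: float = 0.0               # auto-computed
    submission_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    timestamp: str = field(default_factory=lambda: datetime.datetime
                           .now(datetime.timezone.utc).isoformat())
```

Generating `submission_id` at construction time is what makes per-submission deletion requests possible without any account system.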
Now: 280 lines on one laptop · One voice tested · No users
→ Phase 1: pip install phonepod · One command · It just works
→ Phase 2: HuggingFace demo · Real feedback · Pipeline hardens
→ Phase 3: Semantic controls · AI profiles · Better Audacity
→ End State: Real-time on-device · System audio filter · Your ears never suffer