AI Speech Coaching
Speaking Studio
200+ of my own recordings. Transcribed with Whisper, analyzed with wav2vec2 emotion models, and roasted by three AI coaches with very different takes on what I need to fix.
200+
Videos analyzed
26,143
Words transcribed
2,253
Unique words
5.1
Fillers per minute
~120 WPM
Avg speaking rate
2021–2026
Years of recordings
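For anyone curious how figures like these can be derived, here is a minimal sketch, not the actual Speaking Studio code. It assumes a plain-text transcript plus the clip duration in seconds. The `FILLERS` tuple and the `speaking_metrics` helper are hypothetical names, and the filler inventory is an assumption built from the three examples this page mentions.

```python
import re

# Assumed filler inventory: the page only names 'um', 'like', and 'you know'.
FILLERS = ("you know", "um", "like")


def speaking_metrics(transcript: str, duration_seconds: float) -> dict:
    """Compute word count, unique words, WPM, and fillers per minute."""
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")

    text = transcript.lower()
    # Treat runs of letters and apostrophes as words.
    words = re.findall(r"[a-z']+", text)
    minutes = duration_seconds / 60.0

    # Naive whole-word matching: every 'like' counts as a filler here,
    # including legitimate uses, so this overestimates the true rate.
    filler_count = sum(
        len(re.findall(rf"\b{re.escape(filler)}\b", text)) for filler in FILLERS
    )

    return {
        "words": len(words),
        "unique_words": len(set(words)),
        "wpm": len(words) / minutes,
        "fillers_per_minute": filler_count / minutes,
    }


sample = speaking_metrics("Um, I think, like, you know, this is like a test", 30)
```

Aggregate values across many recordings would come from summing word and filler counts and dividing by total minutes, rather than averaging per-clip rates.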
Performance Dashboard
7-Dimension Speaker Score
Scores aggregated across 200+ recordings spanning 2021–2026. Personal transcript data kept private.
Pipeline
How It Works
01
Ingest
200+ .mov recordings go in. ffmpeg extracts the audio, and each filename encodes the recording date.
02
Transcribe
OpenAI Whisper transcribes with a custom prompt to preserve disfluencies — every 'um', 'like', 'you know' stays in.
03
Analyze
A wav2vec2 emotion model scores arousal, dominance, and valence per audio chunk. VADER scores text sentiment on the transcript. Together these feed metrics across 7 dimensions.
04
Coach
Three AI coach personas (Marcus, Riley, Ted), each run on a local Gemma 4 model, score every dimension and deliver targeted drills.
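The Ingest step can be sketched roughly as follows. This is an illustration under stated assumptions, not the real pipeline: the page says filenames encode the date but not in what format, so the `YYYY-MM-DD` pattern below is hypothetical, as are the helper names `recording_date` and `ffmpeg_audio_command`. The 16 kHz mono output is also an assumption, chosen because Whisper and wav2vec2 models work with 16 kHz audio.

```python
import re
from datetime import date
from pathlib import Path

# Hypothetical filename convention (e.g. "2023-04-17_standup.mov").
DATE_PATTERN = re.compile(r"(\d{4})-(\d{2})-(\d{2})")


def recording_date(path: Path) -> date:
    """Parse the recording date out of the filename stem."""
    match = DATE_PATTERN.search(path.stem)
    if match is None:
        raise ValueError(f"no date found in filename: {path.name}")
    year, month, day = map(int, match.groups())
    return date(year, month, day)


def ffmpeg_audio_command(src: Path, dst_dir: Path) -> list[str]:
    """Build an ffmpeg command that extracts mono 16 kHz WAV audio."""
    dst = dst_dir / (src.stem + ".wav")
    return [
        "ffmpeg",
        "-y",            # overwrite the output file if it exists
        "-i", str(src),  # input video
        "-vn",           # drop the video stream
        "-ac", "1",      # downmix to mono
        "-ar", "16000",  # resample to 16 kHz
        str(dst),
    ]


clip = Path("clips/2023-04-17_standup.mov")
cmd = ffmpeg_audio_command(clip, Path("audio"))
```

The command list can be passed to `subprocess.run(cmd, check=True)` once ffmpeg is installed. Building it as a list rather than a shell string avoids quoting problems with spaces in filenames.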
The Coaches
Three Coaches. Three Takes.
EXECUTIVE COACH
Marcus, MBA
LIFE & PERFORMANCE COACH
Coach Riley
SPEECH MECHANICS COACH
Ted "Tape" Vargas
Tech Stack
What's Under the Hood
Want to run it on your recordings?
Drop me a line. The pipeline runs locally — Whisper + Gemma 4 via LM Studio, no cloud costs once set up.
Get in touch