AI Speech Coaching

Speaking Studio

200+ of my own recordings. Transcribed with Whisper, analyzed with wav2vec2 emotion models, and roasted by three AI coaches with very different takes on what I need to fix.

200+

Videos analyzed

26,143

Words transcribed

2,253

Unique words

5.1

Fillers per minute

~120 WPM

Avg speaking rate

2021–2026

Years of recordings

Performance Dashboard

7-Dimension Speaker Score

255075100ClarityPacingVocabularyConfidenceToneDisfluencyControlStorytelling
87Clarity
69Pacing
67Vocabulary
62Confidence
56Tone
49Disfluency Control
45Storytelling

Scores aggregated across 200+ recordings spanning 2021–2026. Personal transcript data kept private.

Pipeline

How It Works

01

Ingest

200+ .mov recordings fed in. ffmpeg extracts audio, filenames encode the date.

02

Transcribe

OpenAI Whisper transcribes with a custom prompt to preserve disfluencies — every 'um', 'like', 'you know' stays in.

03

Analyze

wav2vec2 emotion model scores arousal, dominance, and valence per chunk. VADER handles NLP sentiment. Metrics computed across 7 dimensions.

04

Coach

Three AI coach personas (Marcus, Riley, Ted) each score every dimension and deliver targeted drills via local Gemma 4.

The Coaches

Three Coaches. Three Takes.

EXECUTIVE COACH

Marcus, MBA

LIFE & PERFORMANCE COACH

Coach Riley

SPEECH MECHANICS COACH

Ted "Tape" Vargas

Tech Stack

What's Under the Hood

Python 3.11OpenAI WhisperGemma 4 (local)wav2vec2HuggingFaceLM StudioffmpeglibrosaVADER SentimentChart.js

Want to run it on your recordings?

Drop me a line. The pipeline runs locally — Whisper + Gemma 4 via LM Studio, no cloud costs once set up.

Get in touch