
DATA ANALYSIS · MACHINE LEARNING

Actual Wrapped

2021 – 2025

Spotify gives you a cute annual recap. I downloaded five years of raw play-by-play data and actually looked at it: themes, energy, skip patterns, and a model that predicts whether I'll skip a song before it ends.

55,392 total plays
3+ years of data
9,594 songs enriched (2024)
#1 artist: yetep

Top artists, all time

  1. yetep, 2,070 plays
  2. Big Sean, 1,345 plays
  3. ILLENIUM, 1,189 plays
  4. Pritam, 1,181 plays
  5. Arijit Singh, 1,160 plays

What the data showed

Theme distribution histogram
Radar of theme share across 2024 plays. Romantic dominates at 28.9%, nearly double the next closest theme (Empowered at 17.4%).
Theme × energy chart
Themes ranked by average audio energy. Euphoric and Empowered sit at the top; Peaceful at the bottom.
Skip count by theme
Skip count by theme. Romantic songs are skipped the most in absolute numbers, but they are also played the most, so a high skip count does not by itself mean a high skip rate.
SHAP feature importance
SHAP values from the XGBoost skip predictor. Daily average energy and language (is_english) are the dominant drivers.
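
The gap between skip count and skip rate in the skip chart is easy to check by hand. Below is a minimal sketch, assuming each play has already been reduced to a (theme, skipped) pair. The function name and the toy numbers are made up for illustration and are not taken from the notebook.

```python
from collections import defaultdict


def skip_stats_by_theme(plays):
    """Return {theme: (plays, skips, skip_rate)} from (theme, skipped) pairs."""
    totals = defaultdict(int)
    skips = defaultdict(int)
    for theme, skipped in plays:
        totals[theme] += 1
        if skipped:
            skips[theme] += 1
    return {t: (totals[t], skips[t], skips[t] / totals[t]) for t in totals}


# Toy data, invented for illustration only (not the project's numbers).
toy_plays = (
    [("Romantic", True)] * 3 + [("Romantic", False)] * 7
    + [("Peaceful", True)] * 2 + [("Peaceful", False)] * 2
)

stats = skip_stats_by_theme(toy_plays)
for theme, (n_plays, n_skips, rate) in stats.items():
    print(f"{theme}: {n_skips} skips / {n_plays} plays = {rate:.0%}")
```

In the toy data Romantic has more skips in total but a lower skip rate than Peaceful, which is the pattern the caption warns about.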

MODEL RESULT

Can you predict a skip?

Short answer: kind of. An RF + XGBoost soft-voting ensemble lands around 70–75% accuracy without data leakage. The two features that matter most are daily average energy (your mood that day) and whether the song is in English. Genre alone is basically useless. Including ms_played inflates accuracy to 92%, but that's cheating since play-time is downstream of the skip decision itself.
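
For readers unfamiliar with soft voting, here is a rough sketch of the idea in plain Python. It is not the notebook's code: it assumes two already-fitted models that each output a skip probability per play, and the column name `daily_avg_energy`, the helper names, and the sample numbers are all hypothetical. Only `ms_played` and `is_english` appear in the write-up above.

```python
# Features that are only known after the skip decision has happened.
LEAKY_FEATURES = {"ms_played"}


def drop_leaky(row):
    """Remove leaky columns from one feature row (a plain dict)."""
    return {k: v for k, v in row.items() if k not in LEAKY_FEATURES}


def soft_vote(prob_lists, threshold=0.5):
    """Average each model's skip probability, then threshold the mean."""
    n_models = len(prob_lists)
    averaged = [sum(ps) / n_models for ps in zip(*prob_lists)]
    labels = [p >= threshold for p in averaged]
    return labels, averaged


# Hypothetical feature row; ms_played is stripped before training.
row = {"daily_avg_energy": 0.62, "is_english": 1, "ms_played": 41000}
clean_row = drop_leaky(row)

# Hypothetical per-play skip probabilities from the two models.
rf_probs = [0.75, 0.25, 0.25]
xgb_probs = [0.5, 0.5, 0.125]
labels, averaged = soft_vote([rf_probs, xgb_probs])
print(clean_row)
print(labels, averaged)
```

Soft voting averages probabilities rather than counting hard votes, so a confident model can outweigh an unsure one. Dropping `ms_played` before training is what keeps the accuracy figure honest.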

TRY IT ON YOUR OWN DATA

Get your Spotify history

  1. Go to spotify.com/account/privacy → request your Extended Streaming History (takes a few days).
  2. Drop the JSON files into the notebook and run it top to bottom.
  3. The 2024 enrichment (energy / theme / language) requires a separate audio-features API call. Details in the notebook.
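
If you want a quick sanity check before running the full notebook, the steps above can be approximated with a few lines of standard-library Python. This sketch assumes each JSON file holds a list of play records and that the artist is stored under `master_metadata_album_artist_name`. That field name reflects my understanding of the export format and should be verified against your own files. The folder path and function names are placeholders.

```python
import json
from collections import Counter
from pathlib import Path

# Assumed field name in the Extended Streaming History export; verify locally.
ARTIST_KEY = "master_metadata_album_artist_name"


def load_plays(folder):
    """Concatenate the play records from every .json file in a folder."""
    plays = []
    for path in sorted(Path(folder).glob("*.json")):
        with open(path, encoding="utf-8") as f:
            plays.extend(json.load(f))
    return plays


def top_artists(plays, n=5):
    """Count plays per artist, ignoring records with no artist (e.g. podcasts)."""
    counts = Counter(p[ARTIST_KEY] for p in plays if p.get(ARTIST_KEY))
    return counts.most_common(n)


# In-memory sample so the sketch runs without a real export.
sample = [
    {ARTIST_KEY: "yetep", "ms_played": 180000},
    {ARTIST_KEY: "yetep", "ms_played": 12000},
    {ARTIST_KEY: "ILLENIUM", "ms_played": 200000},
    {ARTIST_KEY: None, "ms_played": 60000},
]
print(top_artists(sample))
```

With a real export, `top_artists(load_plays("path/to/export"))` should give a ranking comparable to the top-artists list above, where the path is wherever you unzipped the files.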