Strictly hypothesis testing

Blogs home

I’ll get straight to it. I love Strictly. You may know it by “Dancing with the Stars”, “Bailando por un Sueño”, or “Danse avec les stars”. I resisted for a decade, but for the last two years I’ve been fully seduced by the big glitter ball and haven’t missed an episode.

Now as a fully qualified arm-chair critic, I think I’ve noticed that to get a good score on Strictly you just need to do the Charleston. It hides a multitude of sins. You sort of jump about a bit, do something called “swivel”, and fall over at the end. Then you get a minimum 35. Job done.

My family disagree, so I’ve set about to test my hypothesis. The great news is that this is really easy to do with R. In fact, this whole post uses no more R than you’d find in an introduction to R course. My family can’t wait to find out the results. Wherever they went.

So let’s get started and finally put this to rest. Base R has everything we need but we’ll get there quicker with the tidyverse. As most of my notebooks now start, let’s load the main tidyverse packages.

library(tidyverse) # I've set message=FALSE everywhere

The Data

We’re lucky that someone at www.ultimatestrictly.com has been keeping track of every dance that’s ever happened on strictly. And we’re even luckier that they have made it available as a CSV download. It only goes up to 2016 but that’s still 14 series so it should be enough.

With the URL all we need are a couple of tweaks to the default read_csv to account properly for an unusual NA string and some sparse columns that guess the data type incorrectly.

url <- "https://www.ultimatestrictly.com/s/SCD-Results-S14.csv"
raw_results <- read_csv(file = url, na = c("-"), guess_max = 10000)

head(raw_results) %>%
  knitr::kable() # pretty markdown tables
Couple Dance Song Series Week Order Craig Arlene Len Bruno Alesha Darcey Jennifer Donny Total
Natasha & Brendan Cha cha cha Chain Of Fools 1 1 1 5 7 8 7 NA NA NA NA 27
Lesley & Anton Waltz He Was Beautiful 1 1 2 6 8 8 7 NA NA NA NA 29
Chris & Hanna Cha cha cha Lady Marmalade 1 1 3 4 4 7 4 NA NA NA NA 19
Jason & Kylie Waltz Three Times A Lady 1 1 4 5 5 6 5 NA NA NA NA 21
Verona & Paul Cha cha cha R.E.S.P.E.C.T 1 1 5 7 6 7 7 NA NA NA NA 27
Claire & John Waltz Unchained Melody 1 1 6 7 7 8 5 NA NA NA NA 27

We get every dance, for every week in every series, and individual judges scores as well as the total score.

Let’s start with a little light cleaning. The formatting on the dances is a bit inconsistent. For example we have Cha cha chaand Cha Cha Cha. Most of it can be fixed by forcing the dance names to be a consistent case. I like the look of title case so we’ll use stringr’s str_to_title.

dances <- raw_results %>%
  mutate(Dance = str_to_title(Dance))

dances %>%
  select(Dance) %>%
  head() %>%
  knitr::kable()
Dance
Cha Cha Cha
Waltz
Cha Cha Cha
Waltz
Cha Cha Cha
Waltz

My next step would usually be to gather this into a tidy long data frame, maybe separate the couples into celebrity and professional. But in this instance we only want the Dance and the Total, so we’ll leave it like it is.

Top Dances

I’d like to know which dances get the best Total score. We can do this with a group_bysummarise combination.

dances %>%
  group_by(Dance) %>%
  summarise(MeanScore = mean(Total, na.rm = TRUE),
            Count = n()) %>%
  head(3)
## # A tibble: 3 x 3
##   Dance           MeanScore Count
##   <chr>               <dbl> <int>
## 1 American Smooth      31.6   102
## 2 Argentine Tango      35.7    43
## 3 Cha Cha              32       1

I’m removing anything with less than 10 scores because these tend to be hybrid dances and one-offs.

  filter(Count >= 10) %>%

The Show Dance only appears in the final episode and is multi-style. Out it goes.

 filter(Dance != "Show Dance") %>%

I don’t want any “hops” in there. The hops are those ones where everyone’s on the floor at the same time and you have no idea what’s going on but they all seem to be enjoying themselves. There’s only a Lindy Hop in this data but I don’t want any hops. Ever. No Hops!

  filter(!str_detect(Dance, " Hop")) %>%

Putting it all together we get our top dances.

top_dances <- dances %>%
  group_by(Dance) %>%
  summarise(MeanScore = mean(Total, na.rm = TRUE),
            Count = n()) %>%
  filter(Count >= 10)  %>%
  filter(Dance != "Show Dance") %>%
  filter(!str_detect(Dance, " Hop")) %>%
  arrange(desc(MeanScore))

knitr::kable(top_dances)
Dance MeanScore Count
Argentine Tango 35.67442 43
Viennese Waltz 32.10000 90
Charleston 31.74324 74
American Smooth 31.57843 102
Quickstep 30.71538 130
Foxtrot 30.05738 122
Samba 29.46296 108
Paso Doble 29.21552 116
Tango 28.89552 134
Salsa 28.72549 102
Jive 28.26016 123
Rumba 28.17273 110
Waltz 27.93233 133
Cha Cha Cha 25.44805 154

And there we have it. I’m sort of right. The Charleston is top 3. And who doesn’t like an Argentine Tango? But there’s something about that count… What if they tend to only do the Charleston in later weeks when they’re a bit better?

We know that the scores go up by week. In fact it’s pretty linear.

dances %>%
  ggplot(aes(x = Week, y = Total, group = Week)) + 
  geom_boxplot()

Let’s do the same again and add average week. This will give the mean week in which each dance appears. A bigger number will mean they tend to appear later.

top_dances_week <- dances %>%
  group_by(Dance) %>%
  summarise(MeanScore = mean(Total, na.rm = TRUE),
            MeanWeek = mean(Week, na.rm = TRUE),
            Count = n()) %>%
  filter(Count >= 10)  %>%
  filter(Dance != "Show Dance") %>%
  filter(!str_detect(Dance, " Hop")) %>%
  arrange(desc(MeanScore))

knitr::kable(head(top_dances_week, 5))
Dance MeanScore MeanWeek Count
Argentine Tango 35.67442 10.023256 43
Viennese Waltz 32.10000 6.911111 90
Charleston 31.74324 6.891892 74
American Smooth 31.57843 7.333333 102
Quickstep 30.71538 5.823077 130

Rats. My hypothesis is in trouble. The Charleston does indeed have a high MeanWeek which means it appears later in the series when the scores are higher. My children have long since gone to bed so let’s take it up a notch and build a statistical model that can account for Week and Dance at the same time.

Linear Model

First, we’ll take the individual dance level data and reuse the filtering for the top_dances.

main_dances <- dances %>%
  filter(Dance %in% top_dances$Dance)

We want a simple linear model that accounts for the increasing score by week and the effect we’re interested in – the dance.

fit <- lm(Total ~ Week + Dance, data = main_dances)
summary(fit)
## 
## Call:
## lm(formula = Total ~ Week + Dance, data = main_dances)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -29.9515  -3.4593   0.4912   3.5637  19.4965 
## 
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)    
## (Intercept)          23.37653    0.59660  39.183  < 2e-16 ***
## Week                  1.11844    0.04051  27.607  < 2e-16 ***
## DanceArgentine Tango  1.08747    0.95628   1.137  0.25564    
## DanceCha Cha Cha     -1.87934    0.68459  -2.745  0.00612 ** 
## DanceCharleston       0.65854    0.79808   0.825  0.40941    
## DanceFoxtrot          0.16272    0.70368   0.231  0.81716    
## DanceJive            -1.25416    0.70372  -1.782  0.07492 .  
## DancePaso Doble      -1.22840    0.71043  -1.729  0.08399 .  
## DanceQuickstep        0.82609    0.69385   1.191  0.23400    
## DanceRumba           -1.78227    0.72064  -2.473  0.01350 *  
## DanceSalsa           -1.36169    0.73365  -1.856  0.06364 .  
## DanceSamba           -1.83586    0.72150  -2.545  0.01104 *  
## DanceTango           -0.13999    0.69274  -0.202  0.83988    
## DanceViennese Waltz   0.99380    0.75585   1.315  0.18877    
## DanceWaltz           -0.03570    0.70003  -0.051  0.95934    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 5.225 on 1526 degrees of freedom
## Multiple R-squared:  0.3969, Adjusted R-squared:  0.3914 
## F-statistic: 71.75 on 14 and 1526 DF,  p-value: < 2.2e-16

After removing the strong effect of week, there’s not much left of significance. It can be tempting to cherry pick individual levels but the only one with much signal is the Cha Cha Cha. Ultimate Strictly describes it as “A cheeky, fun dance, but rarely a show stopper”.

Conclusion

Unfortunately it looks like the evidence doesn’t support my hypothesis that the Charleston is more favourably marked. Maybe all that swivelling is harder than it looks. Faye and Giovanni did undeniably do an excellent job with their, perfect scoring, Sound of Music Charleston. What we did find is that the Cha Cha Cha might score lower. Although that’s most likely because it’s seen as an easier dance for, well, People like Quentin.

I’m sorry Quentin. I would fare no better.

Well seeing as I’m wrong, and everyone else is asleep. Let’s just pretend this never happened.

Code for this blog post on GitHub.