Diving into the Statistical Analysis of Swimming


Introduction

For as long as I can remember, I have been a SwimScout. Heatsheets, record books, results, videos, you name it and I studied it. It never mattered when or where a time was swum because it was, and always will be, a race against the clock. And while some might say that makes it easy, I would say it makes it simple.

As all swimmers know, the clock does not lie. It has no emotion. It is not loyal to any team or country or conference, nor can it be influenced by the reaction of a coach or player. It is one of the most accurate measurements of success available: it strips away much of the noise that can affect the outcome of other sports and leaves everyone in agreement about what needs to be done in order to win. Simply touch the wall first.

But the story that the clock does not tell is the one about what comprised the swim in the first place, and I am not talking about splits. True, a best time is a sum of its splits, but the splits are the sum of strokes and kicks, and those strokes and kicks are performed by men and women whose heights can differ by as much as two feet and whose weights can differ by up to 100 lbs. Catch my drift? An efficient race is an effective race, and that is what statistical analysis of swimming can tell you: how to tailor your race to exactly the type of swimmer you are.

Heck, whether you saw (or read) Moneyball or not, you have likely heard of Brad Pitt and Jonah Hill, so the phrase “statistical analysis” is not foreign to you. But if you were anything like me when I was a swimmer, all you did was count, and never measure.

But even the counting had to start somewhere, and for me it was watching Nate Dusing tie the Male 100 SCY Butterfly National High School record at the Kentucky State High School Championships on February 22, 1997. I remember this race for two reasons:

  1. this was the first time I took notice of underwater kicking (no offense to Pankratov in '96)

  2. (more importantly) this was the first time I remember counting strokes in a race

Dusing, a Cincinnati Marlin who attended Covington Catholic in Northern Kentucky, was so quick underwater that he only needed one stroke on the first lap and two on the second to touch at the 50 in 21.69 seconds. I will say that again, 21.69! For those of you who do not remember, the Junior National qualifying time in the 50 SCY Freestyle back then was 21.69, and this guy still had two more laps to go for a 100 Butterfly. Admittedly, I stopped counting on the second 50 as I was still in shock from his first 50, but seeing 47.10 as the final time on the scoreboard certainly brought me back to reality.

From that day until my Senior Day at the University of Florida, I trained with an absolute stroke count in mind because I knew it was the right foundation upon which to build a race strategy.

However, it was not until after I retired and started to view the sport as a spectator that I began to pay attention to the relative stats of swimming, and how they too can factor into a successful race strategy.


2008 Olympics

One of the first races that caught my eye from a statistical analysis standpoint was the Male 4×100 LCM Freestyle Relay Final at the 2008 Beijing Olympics. A race I was fortunate enough to witness in person.

Now, like many others, I thought Lezak’s chances of winning were as low as Lloyd Christmas winning over Mary Samsonite, but I do remember thinking that Bernard rushed the first 50, and that left the door open for Lezak.

Now we all know what happened next, but I am going to tell you anyway. Bernard faded and Captain America soared to victory by running down a former world record holder, in what is arguably one of the greatest single Olympic performances ever.

But what most of us do not know is exactly how it happened and, more interestingly, how easily it could have gone the other way. Even though Lezak was technically within reach when he dove in, Bernard, with clear water on both sides, should have outsplit him by ~0.40 seconds.

*The following exhibits do not account for any differential in textile vs. non-textile suits


Exhibit 1
Male 4x100 LCM Freestyle Relay Final, Anchor Leg Splits, Beijing Olympics, 2008

Jason Lezak 100 LCM Freestyle: 1st 50 / 2nd 50 = Total Time (in ss.hh)

  • Splits: 21.50/24.56 = 46.06 (14% increase, 50-over-50)

  • Stroke Count: 29/34 = 63 (17% increase, 50-over-50)

Alain Bernard 100 LCM Freestyle: 1st 50 / 2nd 50 = Total Time (in ss.hh)

  • Splits: 21.27/25.46 = 46.73 (20% increase, 50-over-50)

  • Stroke Count: 34/42 = 76 (24% increase, 50-over-50)
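For anyone who wants to check the percentages, here is a minimal sketch of how the "50-over-50" increase can be computed. The function name `pct_increase` is my own label for illustration, and I am assuming the figures above are rounded to the nearest whole percent.

```python
def pct_increase(first_50, second_50):
    """Percent increase of the second 50 over the first 50."""
    return (second_50 - first_50) / first_50 * 100

# Anchor-leg numbers from Exhibit 1, as (first 50, second 50)
exhibit_1 = {
    "Lezak splits": (21.50, 24.56),
    "Lezak strokes": (29, 34),
    "Bernard splits": (21.27, 25.46),
    "Bernard strokes": (34, 42),
}

for label, (first, second) in exhibit_1.items():
    # Rounded to a whole percent (assumed to match the exhibit)
    print(f"{label}: {pct_increase(first, second):.0f}% increase, 50-over-50")
```

The same function works for splits (in seconds) and stroke counts alike, since it only compares the second 50 against the first.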

Cool, huh? Actually, not really. All this tells us is that Bernard is a splash and dash sprinter, whereas Lezak is a closer. It does not tell us anything about how else the race could have ended. In order to see that, we need to compare another race where the two swimmers were involved, and what better than the Male 100 LCM Freestyle Individual Final of the same Olympiad.


Exhibit 2
Male 100 LCM Freestyle Individual Final, Beijing Olympics, 2008

Jason Lezak (3rd Place): 1st 50 / 2nd 50 = Total Time (in ss.hh)

  • Splits: 22.86/24.81 = 47.67 (9% increase, 50-over-50)

  • Stroke Count: 29/37 = 66 (28% increase, 50-over-50)

Alain Bernard (1st Place): 1st 50 / 2nd 50 = Total Time (in ss.hh)

  • Splits: 22.53/24.68 = 47.21 (10% increase, 50-over-50)

  • Stroke Count: 34/41 = 75 (21% increase, 50-over-50)

This comparison shows us that, under neutral circumstances, Alain Bernard is 0.46 seconds faster than Jason Lezak, and he accomplishes that by swimming a controlled first 50 in 34 strokes so that he is able to bring home his second 50 within 10% of his first.

Bernard lost by 0.08 seconds. Had he maintained the same race strategy as his individual race, he could have afforded to go out an entire second (well, 0.99 seconds to be exact) slower, and still would have been able to hold off Lezak.
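One way to arrive at that 0.99 figure is sketched below. It rests on an assumption: that "same race strategy" means Bernard's second 50 slows by the same ratio as in his individual final (24.68 over 22.53). The variable names are mine, and this is a back-of-the-envelope check rather than a model of how he would actually have swum.

```python
import math

# Numbers from Exhibits 1 and 2 (seconds)
relay_first_50 = 21.27   # Bernard's actual relay first 50
relay_split = 46.73      # Bernard's actual relay split
losing_margin = 0.08     # how much he lost by

# ASSUMPTION: "same race strategy" means the same second-50 / first-50
# ratio he swam in the individual final (24.68 / 22.53, roughly 1.095).
pace_ratio = 24.68 / 22.53

# Anything under this split would have held off Lezak.
target_split = relay_split - losing_margin

# total = first_50 * (1 + pace_ratio), so solve for the slowest first 50
# that still lands under the target, then drop to whole hundredths.
slowest_first_50 = math.floor(target_split / (1 + pace_ratio) * 100) / 100
cushion = round(slowest_first_50 - relay_first_50, 2)

print(f"Slowest first 50: {slowest_first_50:.2f}")
print(f"Cushion: {cushion:.2f} seconds")
```

Under that assumption the slowest first 50 comes out to 22.26, which is 0.99 seconds slower than the 21.27 he actually swam. Note that using a flat 10% instead of the exact ratio would shrink the cushion, so the rounding matters here.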


Closing

And, just for fun, let’s look at how Lezak’s swim stacks up in comparison to others.

Exhibit 3
Establishing a Benchmark to Compare Time and Length

Lezak split 2.50% faster than the existing world record of 47.24, which was set by Eamon Sullivan from Australia with his lead-off leg about 100 seconds earlier.

The following are a few select events and their adjusted times after lowering the existing record by 2.50%:

  • 100 Butterfly – Male World Record = 48.58 (Existing = 49.82, Michael Phelps)

  • 100 Backstroke – Male World Record = 50.64 (Existing = 51.94, Aaron Peirsol)

  • 100 Backstroke – Female World Record = 56.67 (Existing = 58.12, Gemma Spofforth)

  • 800 Freestyle – Female World Record = 8:01.52 (Existing = 8:13.86, Katie Ledecky)

  • 50 Freestyle – NCAA Male Division 1 Record = 18.01 (Existing = 18.47, Cesar Cielo)

  • 200 Freestyle – NCAA Male Division 1 Record = 1:28.92 (Existing = 1:31.20, Simon Burnett)

  • 100 Butterfly – NCAA Female Division 1 Record = 48.76 (Existing = 50.01, Natalie Coughlin)

  • 200 Breaststroke – NCAA Female Division 1 Record = 2:01.37 (Existing = 2:04.48, Breeja Larson)
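The adjusted times above can be reproduced with a short sketch like the one below. I am assuming the benchmark is the unrounded ratio of Lezak's split to the existing record (46.06 / 47.24) rather than a flat 0.975, since that is what matches the listed times to the hundredth. The helper names `to_seconds`, `to_clock`, and `adjusted` are hypothetical.

```python
# ASSUMPTION: the benchmark is the exact ratio, about 0.97502 (a 2.50% drop)
BENCHMARK = 46.06 / 47.24

def to_seconds(clock):
    """Convert 'm:ss.hh' or 'ss.hh' to seconds."""
    if ":" in clock:
        minutes, seconds = clock.split(":")
        return int(minutes) * 60 + float(seconds)
    return float(clock)

def to_clock(seconds):
    """Convert seconds back to 'm:ss.hh' or 'ss.hh', rounded to hundredths."""
    hundredths = round(seconds * 100)
    minutes, rest = divmod(hundredths, 6000)
    if minutes:
        return f"{minutes}:{rest / 100:05.2f}"
    return f"{rest / 100:.2f}"

def adjusted(clock):
    """Lower an existing record by the benchmark ratio."""
    return to_clock(to_seconds(clock) * BENCHMARK)

# A few of the existing records listed above
for event, record in [
    ("100 Butterfly, Male World Record", "49.82"),
    ("800 Freestyle, Female World Record", "8:13.86"),
    ("200 Freestyle, NCAA Male Division 1 Record", "1:31.20"),
]:
    print(f"{event}: {adjusted(record)} (Existing = {record})")
```

Working in whole hundredths before formatting keeps the minutes-and-seconds conversion from picking up floating-point noise.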

Now I know you could argue his reaction time (0.06, if I recall) should be accounted for versus a flat start (call it 0.72 to be fair), but that in no way should negate the praise this swim deserves. It only argues for the need to dive deeper (pun intended) into the elements of the swimming equation.

Hey, we may never see another 46.06 relay split again. But the numbers suggest that we will witness someone dropping 2.50% off a record again. I just hope that if it comes under circumstances like those surrounding that race in Beijing, I can be there to witness it.


Footnotes

Author: Elliot Meena

Published: January 17, 2014

Sources: NBC Olympics

Notes:

  • SCY: Short-Course-Yards (i.e., a 25-yard pool)

  • LCM: Long-Course-Meters (i.e., a 50-meter pool)

  • Male 4×100 LCM Freestyle Relay Final at the 2008 Beijing Olympics: http://vimeo.com/7735706

  • Male 100 LCM Freestyle Individual Final: http://vimeo.com/17021260

  • Copyright 2022, all rights reserved