Current photo via Mike Lewis/Ola Vista Photography
The psych sheet is finally here, so we have essentially all the pre-Trials information we are going to get. I built a rough statistical model of Olympic Trials. It uses two variables: a swimmer’s long course times since last September and their seed time on the psych sheet. From these I calculated an expected time for each swimmer and a percentage chance of qualifying individually in each event.
-The strongest favorite to make the team is Katie Ledecky in the 800 Free at 96%. For the men it’s Michael Phelps in the 200 IM at 85%.
-The weakest favorite to make the team is Micah Lawrence in the 200 Breast at 39%. For the men it’s Connor Jaeger in the 400 Free at 45%.
-The lowest-ranked seed the model picks in the top 2 is the 5th seed: Lilly King in the 200 Breast.
-The lowest-ranked seed the model picks in the top 8 is the 19th seed: Clara Smiddy in the 200 Back.
-The model ranks every 1 seed in the top 2. The lowest it places a 2 seed is 5th: Missy Franklin in the 100 Free.
-Most of the predicted times are slower than seed. In 2012 only 19% of women and 32% of men beat their seed time. For example, the top 8 seeds in the 2012 women’s 100 free:
-The model was tuned using 2012 data. I would have preferred more years, but the older data isn’t as good. Trials performance from the super suit year in 2008 isn’t a good predictor of a year without super suits, and I don’t trust the completeness of in-season data from 2004 and earlier.
-This is a probabilistic forecast, so if it’s correctly calibrated it’s supposed to get some things wrong. It should get around 28 of the 52 individual Olympic qualifiers right (that is, about 28 of the swimmers it picks for 1st or 2nd should actually make the team).
-The model doesn’t account for athletes’ specific taper history. Some athletes have a well-established pattern of large or small tapers.
-The model only accounts for swimmers ranked 1-24. Swimmers outside that range are extreme long shots, but have a nonzero chance of making the team.
-The model doesn’t know which in-season times were rested and which weren’t. All in-season times are treated the same.
-The model doesn’t account for likely scratches. For example, its pick for 2nd in the 100 free, Michael Phelps, typically does enough to get on the 400 Free Relay and scratches before finals. If you want an adjusted percentage after a scratch, allocate the scratched swimmer’s chances proportionally among the remaining swimmers.
-There is some evidence that top 3 seeds are a bit more consistent than lower seeds. The model doesn’t account for this, which may explain some of the relatively low percentages for swimmers who are thought of as locks in their events.
-I only used in-season times ranked in the top 6000 since last September. If a top 24 swimmer put up a time outside the top 6000, it’s not accounted for by the model.
-There are 19 top 24 seeds without a time in the event in the last year. Some of them will probably scratch the event (Connor Jaeger, 24th in the 200 IM). Only 2 are ranked in the top 8 (Michael Klueh, 8th in the 1500; Jack Conger, 6th in the 100 Back).
-If you’re looking at this to fill out the SwimSwam time prediction contest, it’s worth noting that this model’s predicted 1st place times are faster than the predicted time of its top ranked swimmer. For example, in the women’s 400 IM, the model has Maya DiRado 1st in 4:33.84. It thinks she has a 50% chance of going faster than that time and a 50% chance of going slower. She only has a 34% chance to get 1st place. That means that in at least 16% of races (50% minus 34%) she beats her predicted time and it still isn’t good enough to win. This effect pushes the predicted winning time faster than the predicted time of the top ranked swimmer.
-To get a 95% confidence interval for the expected times, take the predicted time plus or minus approximately 2%.
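Three of the notes above come down to simple arithmetic: reallocating a scratched swimmer’s chances, the roughly +/-2% interval, and the winning time coming out faster than the favorite’s predicted time. The Python below is a minimal sketch of that arithmetic, not the model itself. The function names are hypothetical, the example field is made up, and the simulation assumes swims are normally distributed around the predicted time with a standard deviation of about 1% (chosen so that two standard deviations roughly match the +/-2% interval). None of those assumptions come from the article.

```python
import random
import statistics


def reallocate_after_scratch(chances, scratched):
    """Spread a scratched swimmer's chance over the rest, proportionally.

    `chances` maps swimmer name -> probability. Hypothetical helper.
    """
    remaining = {name: p for name, p in chances.items() if name != scratched}
    total = sum(remaining.values())
    if total == 0:
        raise ValueError("no remaining swimmer has a nonzero chance")
    freed = chances[scratched]
    # Each remaining swimmer receives a share of the freed probability
    # in proportion to the chance they already had.
    return {name: p + freed * p / total for name, p in remaining.items()}


def interval_95(predicted_seconds, spread=0.02):
    """Approximate 95% interval: predicted time plus or minus about 2%."""
    return predicted_seconds * (1 - spread), predicted_seconds * (1 + spread)


def median_winning_time(medians, sd_fraction=0.01, trials=20000, seed=0):
    """Monte Carlo estimate of the median winning time of a field.

    ASSUMPTION: each swim is drawn from a normal distribution centered on
    the swimmer's predicted (median) time, with sd = sd_fraction * median.
    """
    rng = random.Random(seed)
    winners = [
        min(rng.gauss(m, sd_fraction * m) for m in medians)
        for _ in range(trials)
    ]
    return statistics.median(winners)


if __name__ == "__main__":
    # Made-up field, not the model's output.
    field = {"Swimmer A": 0.50, "Swimmer B": 0.30, "Swimmer C": 0.20}
    print(reallocate_after_scratch(field, "Swimmer B"))
    print(interval_95(273.84))  # 4:33.84 expressed in seconds
    print(median_winning_time([273.84, 274.5, 275.0]))
```

Two things to keep in mind. Qualifying chances in an event add up to about 200% (two spots), so a purely proportional reallocation can push a heavy favorite past 100%; this sketch does not cap that. And the simulated winning time comes out faster than the favorite’s own predicted time simply because the winner is whoever has the best day, which is the effect described in the contest note.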
Dude this is really cool. I’ve always wanted to try something like this. Andrew, I have a few questions for you if you get the chance:
How did you go about compiling swimmers’ best times since September? I know that USA Swimming has a database where you can look up individuals’ times throughout the season, but I’m guessing you didn’t manually look up each swimmer’s best time. Were you able to get access to the unfiltered original database and extract swimmers matching the top 24 in all events? Also, what software did you use for the analysis?
In order to create the model, was the idea to throw all the relevant variables from the 2012 results sheet and…
FiveOs
8 years ago
Schmitty, 2nd place 200 free – 1:55. (Sorry Missy, you’ll make the team on the 1 & 2 back.)
Hannah Saiz 2:08
FiveOs
8 years ago
At finals, Ivy Martin, 2nd place 50 free.
ct swim fan
8 years ago
How is the 800 free, and to a bit lesser extent the 400 free, not 100% for Ledecky winning? She could spot the field almost half a pool length in the 800 and still win. How much of a cushion would she need to get 100%?
“nothing in this universe is 100% certain except for death” and all that.
Attila the Hunt
8 years ago
Oh look people, here’s another who completely failed to read the article and went straight to the comment section!
bobo gigi
8 years ago
By the way, why does everybody look so obsessed with the times? Times don’t matter this summer, at Trials or at the Olympic Games. In 40 years you will not remember the winning times. You will remember (if you have a very good memory) who qualified, and you will especially remember who won Olympic medals. Only the places count. And I have much more fun predicting who will represent USA in Rio than predicting the winning times. It’s so hard to predict times at a qualifying meet. We don’t know who is fully tapered. We don’t know how swimmers will react in a final. It’s common to see “slow” times in finals. Swimmers can look at…
It’s incredibly naive to expect a hardcore fan of elite competitive swimming not to be obsessed with times.
Because, you know, in case anyone has forgotten, swimming is a time-measured sport.
But miracles do happen. There might be a flock of flying pigs in Paris as we speak. Or the Devil Instagrammed himself inside his frozen abode.
Bobo’s point about OT and Olympic times being irrelevant in the long term is exactly correct. For the old boys on this site, I’m sure you remember the 1-2 by Dolan and Vendt in Sydney. Anybody remember their times off the top of their head? If that’s too obscure, I’d love to hear one of the “obsessed fans” list the 4 splits on the Beijing 4FRR.
In the short term, swimming is measured in times, but long-term, it’s measured in gold medals.
EVERYONE KNOWS that the most important thing is getting into the team and then win some hardware in Rio, preferably gold.
You seriously think we don’t know that?
Hint: That is NOT the point. And the most clueless one is perhaps the one who keeps bringing it up in every other comment.
Time matters for many of us – SORRY. If Adrian pops up a 47.7 in the final, I would definitely enjoy it fully and feel “Hey, he might seriously challenge McEvoy in 5 weeks’ time.”
If Phelps wins the gold in rio with a 50.8 for 100 fly or 1:53.6 in 200 fly I would be disappointed. And Mark Spitz went 1:52.9 in 200 free, we remember that. Not as much as the medal I agree. But the sport has been taken so much further.
Just drown yourself in the splendor of these magnificent times throughout the years :
2:05.96
2:06.62
4:03.85
And more recent ones such as 8:06.68 or 24.43
I don’t know any other hardcore swim fans who are not mesmerized watching a 2:04.06, instead of just shrugging it off with “oh, as long as she wins gold, I don’t care for times.”
I pity such swim fans who couldn’t enjoy the majestic 47.04 or 1:52.09
Joe
8 years ago
The predicted values seem to be median values – so 50% of the time the athlete will go faster and 50% of the time they’ll go slower. That means you’ll still see lots of PBs for trials since the predicted values are usually near PBs – roughly 50% of the time, which is probably about par for the course for the top swimmers at OTs (of course among all OT swimmers the PB rate is much lower)
Finfan
8 years ago
Best article evah! We might as well award the spots and while we’re at it run the models for Rio too! This is great fodder for all us swim geeks, but it’s a colossal waste of time. Scrubbing the psych sheet is still the best way to prognosticate meet results. Guess what? The computer loses in the end!
This is a MODEL. It was explained in detail. It is neither right nor wrong. It is a math exercise. It is not a person’s ‘guess’ and it was not described as such. I find it interesting, based on the parameters used to create the model. I’m just a swim geek waiting around excited to see how these trials turn out, and I’m grabbing anything I can to enjoy what I’m looking forward to. What I rarely find interesting is some bald prognosticator who “definitely knows” how something will turn out. These are HUMANS swimming races, with good days, bad days, missed tapers, missed turns, a surprise flu bug, a nagging shoulder injury. For all of you who are so…
Braden Keith is the Editor-in-Chief and a co-founder/co-owner of SwimSwam.com.
He first got his feet wet by building The Swimmers' Circle beginning in January 2010, and now comes to SwimSwam to use that experience and help build a new leader in the sport of swimming.
Ledecky could be the only swimmer entered in the event and still wouldn’t have a 100% chance of winning
Time of the year. If my prediction since 2 years ago will come true. How Cal/Terri ruined Franklin.
You are not obsessed with times?
I just saw a flock of pigs flying over Eiffel Tower.
If Adrian pops up a very fast 47.7, someone is not going to be very happy because it means Adrian is going to swim slower in Rio.
Couldn’t have said better myself. Actually I think I did just shorter.