Fastest 100m sprint times

data cleaning
EDA
split/apply/combine
Explore the 1,000 fastest times ever recorded for the 100m sprint, in men’s and women’s track, cleaning the data and conducting EDA.
Author

Will Townes

Published

August 30, 2024

Data files
Data year

2021

Motivation

This data comes from a compilation of official times for runners in 100m sprint events around the world.

Data

There are two files, one for men and one for women. Each contains the top 1,000 times (as of 2021) for men or women in the 100m sprint. Each row represents one time.

Both data files are in tab-separated format.

Data preview

sprint_m.txt

sprint_w.txt

Variable descriptions

Variable Description
Rank Ranking of the time
Time Number of seconds to complete the 100m sprint. Times have an “A” suffix if the event was at an altitude greater than 1,000 meters above sea level (e.g. “10.36A”).
Wind Windspeed in meters per second. Positive number is a tailwind.
Name Name of the sprinter.
Country Country the sprinter represented.
Birthdate When the sprinter was born (format DD.MM.YY)
City Where the race took place.
Date When the race took place (format DD.MM.YYYY)

The format is the same in both data files.

Questions

This dataset provides many opportunities for data cleaning, wrangling, and EDA. For instance, in data cleaning:

  1. The Time column contains a suffix to indicate if the event was at altitude. Create a new Altitude column that is true if the event was at altitude, and adjust Time to contain only the time as a number.
  2. Some of the text columns contain special characters encoded as HTML entities. For instance, ü is written ü. Use a package that can decode these to convert the text back to a readable format.
  3. Convert the Birthdate and Date columns to your programming language’s date objects (such as datetime.date in Python or Date objects in R). How do you handle two-digit years and determine which century they occurred in?

For data wrangling and split-apply-combine operations:

  1. Find the fastest time achieved by runners of each country.
  2. Find the fastest time recorded each year. Is this increasing or decreasing over time?
  3. Which sprinters have the most times in the top 1,000?

For EDA:

  1. Is the race time correlated with the wind speed? Do tailwinds make runners faster?
  2. Do certain cities have faster average times than others?

References

Data scraped from Peter Larsson’s Track and Field all-time Performances Homepage:

Copyright held by Peter Larsson, email: kl78vc at alltime-athletics.com. Approval to redistribute was granted in September 2024.