Fastest 100m sprint times
data cleaning
EDA
split/apply/combine
Explore the 1,000 fastest 100m sprint times ever recorded for men and for women, cleaning the data and conducting exploratory data analysis (EDA).
Motivation
This data comes from a compilation of official times for runners in 100m sprint events around the world.
Data
There are two files, one for men and one for women. Each contains the top 1,000 times (as of 2021) for men or women in the 100m sprint. Each row represents one time.
Both data files are in tab-separated format.
Data preview
sprint_m.txt
sprint_w.txt
Variable descriptions
Variable | Description |
---|---|
Rank | Position of the time in the top-1,000 list. |
Time | Number of seconds taken to complete the 100m sprint. Times have an “A” suffix if the event was held at an altitude greater than 1,000 meters above sea level (e.g. “10.36A”). |
Wind | Wind speed in meters per second. Positive values indicate a tailwind. |
Name | Name of the sprinter. |
Country | Country the sprinter represented. |
Birthdate | When the sprinter was born (format DD.MM.YY). |
City | Where the race took place. |
Date | When the race took place (format DD.MM.YYYY). |
The format is the same in both data files.
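Because both files share the same layout, a single reader can load either one. The sketch below uses only Python's standard `csv` module. It assumes each file either starts with a header row whose names match the table above or has no header at all (pass `has_header=False` in that case). The added `Sex` field, the file paths, and the UTF-8 encoding are assumptions for illustration, not details from the data notes.

```python
import csv
from typing import Iterable

# Column names from the variable table. We assume each file either has a
# header row with exactly these names or no header row at all.
COLUMNS = ["Rank", "Time", "Wind", "Name", "Country", "Birthdate", "City", "Date"]


def read_sprint_file(lines: Iterable[str], sex: str, has_header: bool = True) -> list[dict[str, str]]:
    """Read one tab-separated sprint file into a list of dicts of strings."""
    reader = csv.DictReader(
        lines,
        delimiter="\t",
        quoting=csv.QUOTE_NONE,  # treat quote characters as ordinary text
        fieldnames=None if has_header else COLUMNS,
    )
    rows = []
    for row in reader:
        # Missing trailing fields come back as None, so default them to "".
        record = {col: (row.get(col) or "").strip() for col in COLUMNS}
        record["Sex"] = sex  # hypothetical extra field, not in the files
        rows.append(record)
    return rows


# Usage (file paths and UTF-8 encoding are assumptions):
# with open("sprint_m.txt", encoding="utf-8", newline="") as f:
#     men = read_sprint_file(f, "M")
# with open("sprint_w.txt", encoding="utf-8", newline="") as f:
#     women = read_sprint_file(f, "W")
# everyone = men + women
```

Every value is kept as a string at this stage; type conversion belongs to the cleaning step.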
Questions
This dataset provides many opportunities for data cleaning, wrangling, and EDA. For instance, in data cleaning:
- The `Time` column contains a suffix to indicate whether the event was at altitude. Create a new `Altitude` column that is true if the event was at altitude, and adjust `Time` to contain only the time as a number.
- Some of the text columns contain special characters encoded as HTML entities. For instance, ü is written as an entity such as `&uuml;`. Use a package that can decode these entities to convert the text back to a readable format.
- Convert the `Birthdate` and `Date` columns to your programming language’s date objects (such as `datetime.date` in Python or `Date` objects in R). How do you handle two-digit years and determine which century they occurred in?
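One possible approach to these three cleaning steps is sketched below with Python's standard library: a regular expression for the altitude suffix, `html.unescape` for the entities, and an explicit century rule for two-digit birth years. A rule is needed because `datetime.strptime` with `%y` maps 00 to 68 onto 2000 to 2068, so a birth year written as "60" would land in 2060. The rule here is an assumption: take 20YY unless that would make the sprinter younger than a hypothetical `min_age` of 10 on race day, otherwise take 19YY. The function names are illustrative.

```python
import html
import re
from datetime import date

# A number such as "9.58", optionally followed by the altitude marker "A".
TIME_PATTERN = re.compile(r"\s*(\d+(?:\.\d+)?)\s*(A?)\s*")


def parse_time(raw: str) -> tuple[float, bool]:
    """Return (seconds, at_altitude) for values like "10.36A" or "9.58"."""
    match = TIME_PATTERN.fullmatch(raw)
    if match is None:
        # Other suffixes are not described in the data notes, so fail loudly.
        raise ValueError(f"unrecognised time value: {raw!r}")
    return float(match.group(1)), match.group(2) == "A"


def decode_text(raw: str) -> str:
    """Turn HTML entities such as "&uuml;" back into characters ("ü")."""
    return html.unescape(raw)


def parse_race_date(raw: str) -> date:
    """Parse a DD.MM.YYYY race date."""
    day, month, year = (int(part) for part in raw.strip().split("."))
    return date(year, month, day)


def parse_birthdate(raw: str, race_day: date, min_age: int = 10) -> date:
    """Parse a DD.MM.YY birthdate, inferring the century from the race date.

    Assumption: the sprinter was at least `min_age` years old on race day,
    so we take 20YY unless that would make them too young, then 19YY.
    """
    day, month, yy = (int(part) for part in raw.strip().split("."))
    year = 2000 + yy
    if race_day.year - year < min_age:
        year -= 100
    return date(year, month, day)
```

Using the race date as the reference, rather than a fixed pivot year, keeps the rule correct for older races and for younger sprinters born after 2000.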
For data wrangling and split-apply-combine operations:
- Find the fastest time achieved by runners of each country.
- Find the fastest time recorded each year. Is this increasing or decreasing over time?
- Which sprinters have the most times in the top 1,000?
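All three questions follow the same split-apply-combine pattern: group rows by a key, reduce each group, and collect the results. A minimal standard-library sketch follows; `Performance` is a hypothetical record type for rows that have already been cleaned (in pandas, the same steps map onto `groupby(...).min()` and `value_counts()`). Grouping sprinters by name alone assumes names are unique; pairing `Name` with `Birthdate` would be safer. When reading the yearly results, a decreasing minimum means the best times are getting faster.

```python
from collections import Counter, defaultdict
from datetime import date
from typing import Callable, Dict, Hashable, Iterable, List, NamedTuple, Tuple


class Performance(NamedTuple):
    """A cleaned row; this record type is hypothetical, not from the files."""
    name: str
    country: str
    time: float
    race_date: date


def split_apply_combine(rows: Iterable[Performance],
                        key: Callable[[Performance], Hashable],
                        apply: Callable[[List[Performance]], object]) -> Dict:
    """Group rows by key(row), then reduce each group with apply(group)."""
    groups: Dict[Hashable, List[Performance]] = defaultdict(list)
    for row in rows:  # split
        groups[key(row)].append(row)
    return {k: apply(group) for k, group in groups.items()}  # apply + combine


def fastest_time(group: List[Performance]) -> float:
    return min(r.time for r in group)


def fastest_by_country(rows: Iterable[Performance]) -> Dict[str, float]:
    return split_apply_combine(rows, lambda r: r.country, fastest_time)


def fastest_by_year(rows: Iterable[Performance]) -> Dict[int, float]:
    best = split_apply_combine(rows, lambda r: r.race_date.year, fastest_time)
    return dict(sorted(best.items()))  # chronological order to read the trend


def most_frequent_sprinters(rows: Iterable[Performance], n: int = 10) -> List[Tuple[str, int]]:
    # Counting is split-apply-combine where the "apply" step is a group size.
    return Counter(r.name for r in rows).most_common(n)
```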
For EDA:
- Is the race time correlated with the wind speed? Do tailwinds make runners faster?
- Do certain cities have faster average times than others?
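For the wind question, a Pearson correlation between `Wind` and `Time` is a reasonable first check. If tailwinds help, larger positive wind speeds should go with smaller times, giving a negative coefficient. The sketch below computes the coefficient by hand and also ranks cities by mean time. It assumes missing wind readings appear as `None` after cleaning and skips them. `min_races` is a hypothetical threshold that keeps a city with a single fast race from topping the list. The data contain only the fastest 1,000 times, and altitude also affects times, so treat any coefficient as descriptive rather than causal.

```python
from collections import defaultdict
from math import sqrt
from statistics import fmean
from typing import Dict, Iterable, List, Optional, Sequence, Tuple


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x, mean_y = fmean(xs), fmean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)  # ZeroDivisionError if a column is constant


def time_wind_correlation(pairs: Iterable[Tuple[Optional[float], float]]) -> float:
    """Correlate (wind, time) pairs, skipping rows with no wind reading."""
    usable = [(wind, t) for wind, t in pairs if wind is not None]
    winds = [wind for wind, _ in usable]
    times = [t for _, t in usable]
    return pearson(winds, times)


def mean_time_by_city(pairs: Iterable[Tuple[str, float]], min_races: int = 1) -> Dict[str, float]:
    """Mean time per city, fastest first, for cities with >= min_races results."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for city, t in pairs:
        groups[city].append(t)
    means = {city: fmean(ts) for city, ts in groups.items() if len(ts) >= min_races}
    return dict(sorted(means.items(), key=lambda item: item[1]))
```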
References
Data scraped from Peter Larsson’s Track and Field all-time Performances Homepage.
Copyright held by Peter Larsson, email: kl78vc at alltime-athletics.com. Approval to redistribute was granted in September 2024.