CMU S&DS Data Repository
The Data Repository curates interesting datasets for use in statistics and data science education. Each dataset is supported by a story describing its origin and application, and a set of interesting questions that can be answered using the data. This means:
- Every dataset has context in a scientific field, pop culture, or daily life.
- Beyond having context, the datasets are interesting in their own right. Rather than a dozen observations from an antiquated scientific study, many contain thousands of observations of dozens of variables and answer questions of interest to a wide audience.
- Just like in science, some datasets give null results.
- Instructors can easily build lessons and assignments from the suggested questions.
Datasets are organized by broad subject areas on the left, or you can browse a sortable list of all datasets.