Traffic stops in Connecticut
Motivation
In the United States, police officers routinely make traffic stops for a variety of reasons, like speeding, reckless driving, or use of cell phones while driving. They’re one of the main ways ordinary people interact with police, and while most traffic stops are completely routine, others are more serious, leading to searches, arrests, and sometimes violence.
Because traffic stops are so common, they’re also widely scrutinized. Some allege that officers are racially biased in whom they stop, which vehicles they choose to search, and which drivers they ticket rather than warn. Until recent years, detailed data was not available to evaluate these allegations, but new initiatives have begun to collect extensive data from police departments across many states.
Data
This dataset records all traffic stops made by the Connecticut State Police (not the police departments of individual towns and cities) from October 2013 to March 2015, amounting to nearly 320,000 traffic stops. Each row is one stop.
The data summarizes each stop, including driver demographics, the reason for the stop, and whether a search was conducted or contraband was found.
Data preview
connecticut-stops.csv.gz
Variable descriptions
Variable | Description |
---|---|
id | A unique ID assigned to each stop. |
state | The state in which each stop occurred (always CT, for Connecticut, in this data) |
stop_date | Date of the traffic stop (YYYY-MM-DD) |
stop_time | Time the stop occurred (24-hour time, HH:MM) |
location_raw | Location name from original records, used to identify the county |
county_name | County in which the stop occurred |
county_fips | The unique FIPS ID code for the county |
fine_grained_location | A more detailed location, not in a standardized format |
police_department | Name of the police department making the stop |
driver_gender | Gender of the driver, as recorded by the officer making the stop |
driver_age_raw | Driver’s age from the original dataset. Not always in a consistent format |
driver_age | Driver’s age; NA if under 15 or greater than 99 |
driver_race_raw | The driver’s race, as recorded by the officer. Not in a standard format. |
driver_race | The driver’s race, standardized to these options: White, Black, Hispanic, Asian, Other, and NA (unknown). Asian includes Asian, Pacific Islander, and Indian. Native Americans are listed as Other, and anyone with Hispanic ethnicity is listed as Hispanic. |
violation_raw | The violation for which the driver was stopped, from the original data |
violation | The violation broken into standard categories, like Speeding, Lights, or Moving violation. Some stops involve multiple violations, separated by commas. |
search_conducted | TRUE if the officer searched the vehicle; FALSE otherwise. |
search_type_raw | Justification for the search, if conducted, as written by the police agency |
search_type | Justification for the search broken down into consistent categories like Consent (driver gave consent). Many are listed as Other with no reason given |
contraband_found | Whether the search found contraband (like drugs): TRUE or FALSE. Always FALSE if no search was performed. |
stop_outcome | The result of the stop: arrest, ticket, warning, and so on. |
is_arrested | Whether an arrest was made (TRUE or FALSE). |
officer_id | A unique ID for the officer making the stop. |
stop_duration | Duration of the stop, broken into categories (1-5 min, 16-30 min, and so on). |
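The file can be read with Python's standard library alone. The following is a minimal sketch, assuming the file has a header row using the variable names above, stores logical values as the literal strings TRUE and FALSE, and is UTF-8 encoded. The helper names `parse_flag` and `read_stops` are illustrative and not part of the dataset's documentation.

```python
import csv
import gzip


def parse_flag(value):
    """Map the strings TRUE/FALSE to bool; anything else (NA, blank) becomes None."""
    text = (value or "").strip().upper()
    if text == "TRUE":
        return True
    if text == "FALSE":
        return False
    return None


def read_stops(path):
    """Read the gzipped CSV into a list of dicts, one dict per stop."""
    # Columns the variable table describes as TRUE/FALSE (assumed storage format).
    flag_columns = ("search_conducted", "contraband_found", "is_arrested")
    with gzip.open(path, mode="rt", newline="", encoding="utf-8") as handle:
        rows = list(csv.DictReader(handle))
    for row in rows:
        for column in flag_columns:
            if column in row:
                row[column] = parse_flag(row[column])
    return rows
```

Calling `read_stops("connecticut-stops.csv.gz")` would then return every stop as a dictionary keyed by variable name. All other columns are left as strings, so ages, dates, and times still need to be converted before analysis.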
Questions
Conduct an exploratory analysis of the data. What stop types are most common? Is there variation by race or gender? What types of stop are most likely to lead to a ticket?
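One possible starting point for the exploratory analysis is to tabulate violations and the share of each that ends in a ticket. This is a sketch, not a complete analysis. It assumes each stop is a dictionary keyed by the variable names above, and that tickets appear in `stop_outcome` under the label `"Ticket"`. That label is an assumption, so check the values actually present in the data. The function names are illustrative.

```python
from collections import Counter


def split_violations(row):
    """Return the list of violation categories for one stop (comma-separated)."""
    parts = (row.get("violation") or "").split(",")
    return [part.strip() for part in parts if part.strip()]


def violation_counts(rows):
    """Count how often each violation category appears across all stops."""
    counts = Counter()
    for row in rows:
        counts.update(split_violations(row))
    return counts


def ticket_share(rows, ticket_label="Ticket"):
    """For each violation, the fraction of stops whose outcome is ticket_label."""
    totals, tickets = Counter(), Counter()
    for row in rows:
        for violation in split_violations(row):
            totals[violation] += 1
            if row.get("stop_outcome") == ticket_label:
                tickets[violation] += 1
    return {violation: tickets[violation] / totals[violation] for violation in totals}
```

A stop with several violations is counted once under each of them, so the counts sum to more than the number of stops. The same pattern extends to breakdowns by `driver_race` or `driver_gender` by keying the counters on a pair such as `(violation, row["driver_race"])`.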
One way to look for racial discrimination in the decision to search a vehicle is called the “outcome test”. If officers do not discriminate, and instead use only objective evidence to decide whether a car should be searched for drugs, then the rate at which they find drugs should be the same across racial groups. But if officers unfairly target a racial group for searches, they may conduct more searches of drivers in that group without finding any drugs.
Build a model to predict whether contraband is found after a search is conducted, using race and other appropriate covariates. Does the outcome test appear to show discrimination? Carefully explain the caveats of your results, discussing confounding variables and mentioning other types of data you would like to have.
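Before fitting a model, it can help to compute the raw quantity the outcome test compares: the hit rate, meaning the fraction of searches in each racial group that found contraband. The sketch below assumes each stop is a dictionary keyed by the variable names above, with `search_conducted` and `contraband_found` stored either as booleans or as the strings TRUE/FALSE. The names `is_true` and `hit_rates` are illustrative. This is an unadjusted comparison, not the model the question asks for, so it does not account for any covariates.

```python
from collections import defaultdict


def is_true(value):
    """Accept either a bool True or the string 'TRUE' (assumed file encoding)."""
    return value is True or str(value).strip().upper() == "TRUE"


def hit_rates(rows):
    """Per driver_race: (searches, hits, hits / searches), among searched stops only."""
    searches = defaultdict(int)
    hits = defaultdict(int)
    for row in rows:
        if not is_true(row.get("search_conducted")):
            continue  # the outcome test conditions on a search having occurred
        race = row.get("driver_race") or "NA"
        searches[race] += 1
        if is_true(row.get("contraband_found")):
            hits[race] += 1
    return {race: (n, hits[race], hits[race] / n) for race, n in searches.items()}
```

Restricting to searched stops matters because `contraband_found` is FALSE whenever no search was performed. Including unsearched stops would mix the decision to search with the result of the search.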
Is there any evidence that traffic stops are more common at certain times of year or on certain days of the week? Before conducting your analysis, make some hypotheses about what you expect to find. (Do you think officers would be less likely to make traffic stops in the cold winter months, for example?)
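A simple way to start on the timing question is to count stops by calendar month and by day of the week. The sketch below assumes each stop is a dictionary with `stop_date` in the YYYY-MM-DD format given in the variable table. The function name is illustrative. Months are keyed by (year, month) rather than month alone because the data runs from October 2013 to March 2015, so October through March appear twice and April through September only once.

```python
from collections import Counter
from datetime import date

WEEKDAYS = ("Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday")


def stops_by_month_and_weekday(rows):
    """Return (Counter keyed by (year, month), Counter keyed by weekday name)."""
    by_month, by_weekday = Counter(), Counter()
    for row in rows:
        try:
            day = date.fromisoformat((row.get("stop_date") or "").strip())
        except ValueError:
            continue  # skip missing or malformed dates
        by_month[(day.year, day.month)] += 1
        by_weekday[WEEKDAYS[day.weekday()]] += 1  # weekday(): Monday is 0
    return by_month, by_weekday
```

Months differ in length, and the first and last months of the dataset may be only partly covered, so dividing each monthly count by the number of days observed gives a fairer comparison than the raw totals.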
References
Data collected and cleaned by the Stanford Open Policing Project, released under the Open Data Commons Attribution License. Code and data descriptions available on GitHub. The Open Policing Project has released similar data files for many other states.
For a rigorous analysis of this data, see E. Pierson, C. Simoiu, J. Overgoor, S. Corbett-Davies, D. Jenson, A. Shoemaker, V. Ramachandran, P. Barghouty, C. Phillips, R. Shroff, and S. Goel. (2020) “A large-scale analysis of racial disparities in police stops across the United States”. Nature Human Behaviour 4, 736-745. https://doi.org/10.1038/s41562-020-0858-1