Traffic stops in Connecticut

classification

logistic regression

Records for every traffic stop made by the Connecticut State Police over several years, including the reason for the stop and demographic details. Use classifiers to study whether stop and search decisions show signs of bias.

Author

Alex Reinhart

Published

February 9, 2019

Data files

• connecticut-stops.csv.gz

Data year

2015

Motivation

In the United States, police officers routinely make traffic stops for a variety of reasons, like speeding, reckless driving, or use of cell phones while driving. These stops are one of the main ways ordinary people interact with police, and while most are completely routine, others are more serious, leading to searches, arrests, and sometimes violence.

Because traffic stops are so common, they’re also widely scrutinized. Some allege that officers are racially biased in whom they stop, which vehicles they choose to search, or which drivers they ticket rather than warn. Until recent years, detailed data was not available to evaluate these allegations, but new initiatives have begun to collect extensive data from police departments across many states.

Data

This dataset records all traffic stops made by the Connecticut State Police (not the police departments of individual towns and cities) from October 2013 to March 2015, amounting to nearly 320,000 traffic stops. Each row is one stop.

The data summarizes each stop, including driver demographics, the reason for the stop, and whether a search was conducted or contraband was found.

Variable descriptions

Variable	Description
id	A unique ID assigned to each stop.
state	The state in which each stop occurred (always CT, for Connecticut, in this data)
stop_date	Date of the traffic stop (YYYY-MM-DD)
stop_time	Time the stop occurred (24-hour time, HH:MM)
location_raw	Location name from original records, used to identify the county
county_name	County in which the stop occurred
county_fips	The unique FIPS ID code for the county
fine_grained_location	A more detailed location, not in a standardized format
police_department	Name of the police department making the stop
driver_gender	Gender of the driver, as recorded by the officer making the stop
driver_age_raw	Driver’s age from the original dataset. Not always in a consistent format
driver_age	Driver’s age; NA if under 15 or greater than 99
driver_race_raw	The driver’s race, as recorded by the officer. Not in a standard format.
driver_race	The driver’s race, standardized to these options: White, Black, Hispanic, Asian, Other, and NA (unknown). Asian includes Asian, Pacific Islander, and Indian. Native Americans are listed as Other, and anyone with Hispanic ethnicity is listed as Hispanic.
violation_raw	The violation for which the driver was stopped, from the original data
violation	The violation broken into standard categories, like Speeding, Lights, or Moving violation. Some stops involve multiple violations, separated by commas.
search_conducted	TRUE if the officer searched the vehicle; FALSE otherwise.
search_type_raw	Justification for the search, if conducted, as written by the police agency
search_type	Justification for the search broken down into consistent categories like Consent (driver gave consent). Many are listed as Other with no reason given
contraband_found	Whether the search found contraband (like drugs). TRUE or FALSE, and FALSE if no search was performed.
stop_outcome	The result of the stop: arrest, ticket, warning, and so on.
is_arrested	Whether an arrest was made (TRUE or FALSE).
officer_id	A unique ID for the officer making the stop.
stop_duration	Duration of the stop, broken into categories (1-5 min, 16-30 min, and so on).
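
As a starting point, the file can be read with Python's standard library alone. The sketch below is a minimal example, not part of the original data release: the helper names `parse_stop` and `read_stops` are hypothetical, and it assumes that logical columns are stored as the strings TRUE and FALSE and that missing values are written as NA, as the descriptions above suggest.

```python
import csv
import gzip

# Columns described above as TRUE/FALSE.
BOOLEAN_COLUMNS = ("search_conducted", "contraband_found", "is_arrested")


def parse_stop(row):
    """Convert one raw CSV row (a dict of strings) into typed values."""
    stop = dict(row)
    for column in BOOLEAN_COLUMNS:
        # Anything other than TRUE or FALSE (for example NA) becomes None.
        stop[column] = {"TRUE": True, "FALSE": False}.get(stop.get(column))
    # Assumption: a missing age is written as NA or left empty.
    age = stop.get("driver_age")
    stop["driver_age"] = int(float(age)) if age not in (None, "", "NA") else None
    return stop


def read_stops(path):
    """Read the gzipped CSV at `path` into a list of dicts, one per stop."""
    with gzip.open(path, mode="rt", newline="", encoding="utf-8") as handle:
        return [parse_stop(row) for row in csv.DictReader(handle)]
```

Calling `read_stops("connecticut-stops.csv.gz")` would then return one dictionary per stop, keyed by the variable names in the table. The raw columns (`driver_age_raw`, `driver_race_raw`, and so on) are left as strings.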

Questions

Conduct an exploratory analysis of the data. What types of stops are most common? Is there variation by race or gender? What types of stops are most likely to lead to a ticket?
One way to look for racial discrimination in the decision to search a vehicle is called the “outcome test”. If officers do not discriminate, and instead use only objective evidence to decide whether a car should be searched for drugs, then the rate at which they find drugs should be the same across racial groups. But if officers unfairly target a racial group for searches, they may conduct more searches of drivers in that group without finding any drugs.

Build a model to predict whether contraband is found after a search is conducted, using race and other appropriate covariates. Does the outcome test appear to show discrimination? Carefully explain the caveats of your results, discussing confounding variables and mentioning other types of data you would like to have.
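
Before fitting a model, the raw comparison behind the outcome test can be sketched as a hit rate per group: among stops where a search was conducted, the fraction in which contraband was found. This is only a first look under stated assumptions. The function name `hit_rates` is hypothetical, the rows are assumed to carry `search_conducted` and `contraband_found` as booleans, and the example rows are made up for illustration rather than taken from the dataset.

```python
from collections import defaultdict


def hit_rates(stops):
    """Return {race: (searches, hits, hit_rate)} over stops with a search."""
    searches = defaultdict(int)
    hits = defaultdict(int)
    for stop in stops:
        if not stop["search_conducted"]:
            continue  # the outcome test only uses stops that led to a search
        race = stop["driver_race"]
        searches[race] += 1
        if stop["contraband_found"]:
            hits[race] += 1
    return {race: (n, hits[race], hits[race] / n) for race, n in searches.items()}


# Made-up rows with placeholder group labels; not values from the dataset.
example = [
    {"driver_race": "Group A", "search_conducted": True, "contraband_found": True},
    {"driver_race": "Group A", "search_conducted": True, "contraband_found": False},
    {"driver_race": "Group A", "search_conducted": False, "contraband_found": False},
    {"driver_race": "Group B", "search_conducted": True, "contraband_found": True},
    {"driver_race": "Group B", "search_conducted": True, "contraband_found": False},
    {"driver_race": "Group B", "search_conducted": True, "contraband_found": False},
    {"driver_race": "Group B", "search_conducted": True, "contraband_found": False},
]

for race, (n, found, rate) in sorted(hit_rates(example).items()):
    print(f"{race}: {found} hits in {n} searches (hit rate {rate:.2f})")
```

A lower hit rate for one group is the pattern the outcome test reads as a sign that searches of that group rest on weaker evidence. The raw rates do not adjust for anything else, which is why the question asks for a model with additional covariates.
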
Is there any evidence that traffic stops are more common at certain times of year or on certain days of the week? Before conducting your analysis, make some hypotheses about what you expect to find. (Do you think officers would be less likely to make traffic stops in the cold winter months, for example?)
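
One way to start on the timing question is to tabulate stops by day of the week and by calendar month. The sketch below assumes `stop_date` values are ISO strings (YYYY-MM-DD) and that missing dates appear as NA or an empty string. The function name is hypothetical, and the example dates are arbitrary.

```python
from collections import Counter
from datetime import date

WEEKDAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
                 "Friday", "Saturday", "Sunday"]


def stops_by_weekday_and_month(stop_dates):
    """Count stops per weekday name and per (year, month) pair."""
    weekdays = Counter()
    months = Counter()
    for text in stop_dates:
        if text in (None, "", "NA"):
            continue  # skip stops with no recorded date
        day = date.fromisoformat(text)
        weekdays[WEEKDAY_NAMES[day.weekday()]] += 1
        months[(day.year, day.month)] += 1
    return weekdays, months


# Arbitrary example dates, not taken from the dataset.
weekdays, months = stops_by_weekday_and_month(
    ["2014-01-06", "2014-01-13", "2014-07-04", "NA"]
)
print(weekdays.most_common())
print(sorted(months.items()))
```

Months are keyed by year and month rather than by month alone because the data runs from October 2013 to March 2015, so October through March each appear twice and April through September only once. Pooling by month name without accounting for this would overstate winter stops.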

References

Data collected and cleaned by the Stanford Open Policing Project, released under the Open Data Commons Attribution License. Code and data descriptions available on GitHub. The Open Policing Project has released similar data files for many other states.

For a rigorous analysis of this data, see E. Pierson, C. Simoiu, J. Overgoor, S. Corbett-Davies, D. Jenson, A. Shoemaker, V. Ramachandran, P. Barghouty, C. Phillips, R. Shroff, and S. Goel (2020). “A large-scale analysis of racial disparities in police stops across the United States”. Nature Human Behaviour 4, 736-745. https://doi.org/10.1038/s41562-020-0858-1