Traffic stops in Connecticut

classification
logistic regression
Records for every traffic stop made by the Connecticut State Police over several years, including the reason for the stop and demographic details. Use classifiers to study whether stop and search decisions show signs of bias.
Author

Alex Reinhart

Published

February 9, 2019

Data files
Data year

2015

Motivation

In the United States, police officers routinely make traffic stops for a variety of reasons, like speeding, reckless driving, or use of cell phones while driving. They’re one of the main ways ordinary people interact with police, and while most traffic stops are completely routine, others are more serious, leading to searches, arrests, and sometimes violence.

Because traffic stops are so common, they’re also widely scrutinized. Some allege that officers are racially biased in who they stop, which vehicles they choose to search, or who they decide to give tickets instead of warnings to. Until recent years, detailed data was not available to help answer these allegations, but new initiatives have begun to collect extensive data from police departments across many states.

Data

This dataset records all traffic stops made by the Connecticut State Police (not the police departments of individual towns and cities) from October 2013 to March 2015, amounting to nearly 320,000 traffic stops. Each row is one stop.

The data summarizes each stop, including driver demographics, the reason for the stop, and whether a search was conducted or contraband was found.

Data preview

connecticut-stops.csv.gz

Variable descriptions

Variable Description
id A unique ID assigned to each stop.
state The state in which each stop occurred (always CT, for Connecticut, in this data)
stop_date Date of the traffic stop (YYYY-MM-DD)
stop_time Time the stop occurred (24-hour time, HH:MM)
location_raw Location name from original records, used to identify the county
county_name County in which the stop occurred
county_fips The unique FIPS ID code for the county
fine_grained_location A more detailed location, not in a standardized format
police_department Name of the police department making the stop
driver_gender Gender of the driver, as recorded by the officer making the stop
driver_age_raw Driver’s age from the original dataset. Not always in a consistent format
driver_age Driver’s age; NA if under 15 or greater than 99
driver_race_raw The driver’s race, as recorded by the officer. Not in a standard format.
driver_race The driver’s race, standardized to these options: White, Black, Hispanic, Asian, Other, and NA (unknown). Asian includes Asian, Pacific Islander, and Indian. Native Americans are listed as Other, and anyone with Hispanic ethnicity is listed as Hispanic.
violation_raw The violation for which the driver was stopped, from the original data
violation The violation broken into standard categories, like Speeding, Lights, or Moving violation. Some stops involve multiple violations, separated by commas.
search_conducted TRUE if the officer searched the vehicle; FALSE otherwise.
search_type_raw Justification for the search, if conducted, as written by the police agency
search_type Justification for the search broken down into consistent categories like Consent (driver gave consent). Many are listed as Other with no reason given
contraband_found Whether the search found contraband (like drugs). TRUE or FALSE, and FALSE if no search was performed.
stop_outcome The result of the stop: arrest, ticket, warning, and so on.
is_arrested Whether an arrest was made (TRUE or FALSE).
officer_id A unique ID for the officer making the stop.
stop_duration Duration of the stop, broken into categories (1-5 min, 16-30 min, and so on).

Questions

  1. Conduct an exploratory analysis of the data. What stop types are most common? Is there variation by race or gender? What types of stop are most likely to lead to a ticket?

  2. One way to look for racial discrimination in the decision to search a vehicle is called the “outcome test”. If officers do not discriminate, and instead use only objective evidence to decide a car should be searched for drugs, then the rate at which they find drugs should be the same across racial groups. But if officers unfairly target a racial group for searches, they may conduct more searches of drivers in that group without finding any drugs.

    Build a model to predict whether contraband is found after a search is conducted, using race and other appropriate covariates. Does the outcome test appear to show discrimination? Carefully explain the caveats of your results, discussing confounding variables and mentioning other types of data you would like to have.

  3. Is there any evidence that traffic stops are most common certain times of year or on certain days of the week? Before conducting your analysis, make some hypotheses about what you expect to find. (Do you think officers would be less likely to make traffic stops in the cold winter months, for example?)

References

Data collected and cleaned by the Stanford Open Policing Project, released under the Open Data Commons Attribution License. Code and data descriptions available on GitHub. The Open Policing Project has released similar data files for many other states.

For a rigorous analysis of this data, see E. Pierson, C. Simoiu, J. Overgoor, S. Corbett-Davies, D. Jenson, A. Shoemaker, V. Ramachandran, P. Barghouty, C. Phillips, R. Shroff, and S. Goel. (2020) “A large-scale analysis of racial disparities in police stops across the United States”. Nature Human Behavior 4, 736-745. https://doi.org/10.1038/s41562-020-0858-1