Mapping Police Violence
Motivation
Race and policing are some of the most inescapable topics in today’s America. There are large political arguments over the proper role of police, the effect of their tactics on historically disadvantaged groups, and their perceived biases. These controversies are amplified by a number of police killings of unarmed people, particularly African Americans, which have led to massive protests over the excessive use of force and sometimes to the prosecution of the officers involved.
There is no central government database recording all police uses of force, or even all killings by police officers. Several journalistic and advocacy groups have attempted to fill the gaps by compiling government data and news reporting to build large datasets including police killings and the circumstances surrounding them.
This dataset, Mapping Police Violence, is compiled by the advocacy group Campaign Zero. It builds on other databases, such as one compiled by the Washington Post, to include information from news reporting and other sources to provide additional context about each death. It is the most comprehensive dataset detailing all kinds of police violence incidents and helps us understand the scale of police violence and the differences across race/ethnicity and different geographical locations.
We can also combine this with the 2020 Census data on the county level to understand the prevalence of police violence and racial/ethnic differences in prevalence in different counties. The methodology is last updated on Oct 3, 2022, and the dataset is downloaded on Aug 7, 2023.
Data
Each row represents one death in an incident of police violence. There are 11,861 deaths recorded from 2012 to 2023.
Mapping Police Violence include detailed descriptions of each incident, including date and victim’s age, race, gender, and an image. It also includes more details about the incident including the location to the street address, the cause of death (can be multiple), circumstances, agency responsible, initial reasons, officer names and history, and so on.
The Census data provides 2020 demographic information for each county. It can be joined with the police violence data using the FIPS
column, which uniquely identifies counties, though some incidents of violence do not have a county recorded or the county could not be matched to a FIPS code.
Not all variables are available for all incidents. For older incidents, some data around prosecution (and its result) for the officer could be available.
Data preview
mapping-police-violence.csv.gz
county-census-2020.csv.gz
Variable Descriptions
The dataset codebook from Mapping Police Violence and the U.S. Census Bureau documentation contain the full list of variables. This is an extract of a few useful variables from both tables. Note that many variables can have multiple values, separated by commas, in the Mapping Police Violence dataset.
Mapping Police Violence
Variable | Description |
---|---|
name | Name of victim killed by law enforcement |
age | Age of victim at time of death |
gender | Victim’s indicated gender by news/official reports |
race | Race of victim according to news/official reports |
date | Date of lethal action on victim, in MM/DD/YYYY format |
city | City where incident took place |
state | State where incident took place |
county | County where incident took place |
FIPS | FIPS code identifying the county where the incident took place. Not available when county is not given, if county is spelled incorrectly, or in most incidents in Connecticut and Alaska, which have unusual county systems |
agency_responsible | Agency of law enforcement officer who took lethal action and responsible for killing |
cause_of_death | Cause of death of the victim, such as gunshot, Taser, vehicle, or drowning. Multiple reasons can be separated by commas |
signs_of_mental_illness | Indicates whether official/news reports mentioned any signs of mental illness or mental health crises: Yes/No/Unknown/Drug or Alcohol Use |
allegedly_armed | Indicates whether victim was armed according to news/official sources: Allegedly Armed/Unarmed/Unclear/Vehicle |
off_duty_killing | Indicates whether officer was working in an official capacity at the time lethal action was taken |
encounter_type | Initial reported reason(s) for police to be on the scene prior to using deadly force |
officer_names | Name of officer responsible for highest level of force |
officer_races | Race of officer responsible for highest level of force |
officer_known_past_shootings | Indicates if officer has had a previous incident of police violence in the database |
Census County Resident Population Estimates by Sex, Race, and Hispanic Origin
Census estimates for April 1, 2020, for all age groups.
Variable | Description |
---|---|
STNAME | State Name |
CTYNAME | County Name |
FIPS | FIPS code identifying the county |
TOT_POP | Total population |
NHWA_MALE/NHWA_FEMALE | Not Hispanic, White alone male/female population |
NHBA_MALE/NHBA_FEMALE | Not Hispanic, Black or African American alone male/female population |
NHIA_MALE/NHIA_FEMALE | Not Hispanic, American Indian and Alaska Native alone male/female population |
NHAA_MALE/NHAA_FEMALE | Not Hispanic, Asian alone male/female population |
NHNA_MALE/NHNA_FEMALE | Not Hispanic, Native Hawaiian and Other Pacific Islander alone male/female population |
NHTOM_MALE/NHTOM_FEMALE | Not Hispanic, Two or More Races male/female population |
H_MALE/H_FEMALE | Hispanic male/female population |
Questions
- Do some EDA. What’s the age and gender distribution of victims? What are the most common encounter types in the dataset? What are the most common causes of death? Explore these questions, and others that seem interesting in the data, with charts and graphs.
- One main question we would like to explore is that whether the demographic composition of the locality affects the occurrences of police violence. We can use the Census data to provide information on 7 groups (non-Hispanic White, non-Hispanic Black, non-Hispanic Asian, non-Hispanic American Indian/Alaskan Native, non-Hispanic Native Hawaiian or other Pacific Islanders, non-Hispanic two or more races, and Hispanic) in each county. Build a model predicting the count of police violence in each county in the dataset based on their racial/ethnic minority percentage and other factors you think might be relevant. What does the result suggest?
- Another argument people often make is that officers responsible for police violence against minorities often bear no consequences for their actions. Based on this data, build a logistic model to predict the likelihood of the police officer(s) responsible being charged with all factors you think might be relevant. What does the result suggest?
- Some people suggest that signs of mental health issues and/or drug use might also affect how police respond to a person, say we categorize any mentioning of a gunshot in the cause of death as an indicator for a higher level of force used. Does the data seem to support this claim?
- Similarly, does the perception of the individual being armed appear to influence the extent of force employed by the police?
References
Police violence data aggregated by Mapping Police Violence. Data retrieved on August 7, 2023.
Census data from the US Census Bureau’s County Population by Characteristics data. Technical documentation.