Link to our website: https://icasas101.github.io/FinalDSTutorial/
By Josh Kellner and Isabella Casas
CMPS 3160 - Introduction to Data Science - Professor Mattei
As data science techniques and technology have developed, data and its interpretation have become increasingly important, both in making major decisions and in legitimizing ideas. With the strategies we have learned in this class and tools like Pandas, we can interpret data on a large enough scale that such decisions can be made confidently. As we looked through datasets and considered which ones we could use to create a project of value to New Orleans, we found that the NOPD's Stop and Search (Field Interviews) database had a lot of potential.
The issue of discrimination against minorities, particularly Black people, by police in our country has of course been discussed as frequently as ever over the last few years, due to specific incidents such as the cases of George Floyd, Breonna Taylor, and Rayshard Brooks, among a host of others, as well as the widespread protests against it all over the country. This is by no means a new conversation, but right now the country seems to be going through a period of racial reckoning, in part due to social media's ability to spread information and set trends so quickly. There have been multiple large protests throughout the city regarding discrimination by police, and plenty of relevant work done by organizations such as The New Orleans Peoples' Assembly. We believe that with the existing momentum, relevant data and a good interpretation of it are as important, valuable, and likely to spark change as ever. Any social or political movement is undoubtedly more effective with clear, hard evidence of the injustices that the movement hopes to bring to an end. Certain statistics, such as the U.S. Sentencing Commission's finding that Black male offenders receive sentences that are, on average, 19.1% longer than those of similarly situated White offenders, identify the manifestations of bias so clearly that they are undeniable.
Our main question is: are police more likely to question, search, and/or take more severe actions against people of color? Ideally, any unfair biases discovered by this project will clearly reveal the need for new policy to correct these discriminatory practices and will prompt city officials to put that policy in place.
Link to dataset: https://data.nola.gov/Public-Safety-and-Preparedness/Stop-and-Search-Field-Interviews-/kitu-f4uy/data
The Data Center data can be found in our repository.
For our Final Tutorial, we have partnered to analyze a dataset called “Stop and Search (Field Interviews).” It contains data on instances of people being questioned by the New Orleans Police Department. The information recorded for each interview includes when and where it happened; the officer conducting the questioning and potential search; a description of the individual being searched, including age, gender, race, height, and weight; the reason the interview was conducted; the actions taken; and so on. We plan to analyze this information so that one can use our analysis to learn about any biases that the NOPD has, or a lack thereof, and how these biases manifest themselves. We specifically expect to look at the relationships between the frequencies of interviews and searches and the descriptors of their subjects, as well as the relationships between the severity of the actions taken by police and those descriptors. The dataset also provides information about the car the subject was driving, if any, which is another variable that can shed light on biases.
In addition to that, we used this dataset: https://portal-nolagis.opendata.arcgis.com/datasets/140759858aa14bb6a5a2fe099ccf4c07_0/data?geometry=-90.926%2C29.813%2C-88.840%2C30.229 to create our map of New Orleans.
We also used datasets provided to us by The Data Center that contained data about the demographics of New Orleans. Since we had to request these datasets by email, there is no link, but they can be downloaded at the top of this page.
In terms of a collaboration plan, we have a GitHub repository set up to keep track of our most up-to-date work as well as each update. Every two weeks we plan to meet on Zoom to divide up specific chunks of work. In these meetings we will review the work done since the last meeting and work together through anything we could not complete individually.
Our first step was to import our necessary libraries and then download the data files.
import pandas as pd
import numpy as np
import geopandas as gpd
!head ../FinalDSTutorial/Stop_and_Search__Field_Interviews_.csv
# head on the binary .xlsx and .shp files only confirms they are present; the CSVs are human-readable
!head ../FinalDSTutorial/Neighborhood profiles data tables.xlsx
!head ../FinalDSTutorial/NOPD_Police_Zones.shp
!head ../FinalDSTutorial/NOPD_Police_Zones.csv
The first dataset we're loading comes from one of The Data Center's spreadsheets and contains information on the genders of people living in New Orleans in 2000 and in 2014-2018. The "Blank" column comes from a spacer column in the original spreadsheet.
gender_df = pd.read_excel("../FinalDSTutorial/Neighborhood profiles data tables.xlsx",
sheet_name='Table 2',
names=['Location', 'Female2000', 'Female2014-2018', 'FemaleMOE', 'Blank', 'Male2000', 'Male2014-2018', 'MaleMOE'],
header=None)
gender_df = gender_df.dropna(how='all')
# easier to drop these now rather than later
gender_df = gender_df.drop([6, 8, 10, 11])
gender_df = gender_df.set_index('Location')
# must standardize missing data
gender_df = gender_df.replace('NaN%', np.nan)
# these columns did not originally have the correct type
gender_df = gender_df.astype({'Female2000': 'float64', 'Female2014-2018': 'float64', 'FemaleMOE': 'float64'})
display(gender_df)
# checking to ensure correct types are there
display(gender_df.dtypes)
Also coming from The Data Center, this dataset contains information regarding the racial makeup of each neighborhood of New Orleans.
race_df = pd.read_excel("../FinalDSTutorial/Neighborhood profiles data tables.xlsx",
sheet_name='Table 4',
names=['Location', 'Black2000', 'Black2014-2018', 'BlackMOE', 'Blank1', 'White2000', 'White2014-2018', 'WhiteMOE', 'Blank2', 'Asian2000', 'Asian2014-2018', 'AsianMOE', 'Blank3', 'AmerIndian2000', 'AmerIndian2014-2018', 'AmerIndianMOE', 'Blank4', 'Biracial2000', 'Biracial2014-2018', 'BiracialMOE', 'Blank5', 'Hispanic2000', 'Hispanic 2013-2017', 'HispanicMOE', 'Blank6', 'Other2000', 'Other2013-2017', 'OtherMOE'],
header=None)
race_df = race_df.dropna(how='all')
race_df = race_df.drop([6, 8, 10, 11])
# it was easier to drop these columns now rather than later
race_df = race_df.drop(columns=['Blank1', 'Blank2', 'Blank3', 'Blank4', 'Blank5', 'Blank6'])
race_df = race_df.set_index('Location')
# standardize missing data
race_df = race_df.replace('NaN%', np.nan)
display(race_df)
# displaying types for quality control; these were all inferred correctly on load
display(race_df.dtypes)
This dataset is the "big one" from the NOPD, and includes information regarding all individuals who were stopped by them. As you can see, there is a ton of missing information and several unnecessary columns for this project, so we will have a lot of cleaning up to do.
df = pd.read_csv("../FinalDSTutorial/Stop_and_Search__Field_Interviews_.csv", dtype={'FieldInterviewID': int})
display(df.head())
display(df.dtypes)
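Since the claim above is about missing data, it is worth quantifying directly. Below is a minimal sketch; note that it counts only true NaNs, while the dataset's "-" placeholders are converted to NaN later in the cleaning step.
# share of missing values per column, worst offenders first
missing_share = df.isna().mean().sort_values(ascending=False)
display(missing_share.head(10))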
Finally, this dataset contains information about the NOPD zones. This is so we can analyze stops based on location later.
# set the filepath and load in shapefile
map_df = gpd.read_file("../FinalDSTutorial/NOPD_Police_Zones.shp")
# drop unnecessary column
map_df = map_df.drop(columns=['District'])
# check data types
display(map_df.head())
# check map display
map_df.plot()
Now we must also add the file that goes along with this map.
zonesDf = pd.read_csv("../FinalDSTutorial/NOPD_Police_Zones.csv")
display(zonesDf.head())
display(zonesDf.dtypes)
Our next step is to clean up our messy NOPD dataset. First, we will drop any columns that will not be necessary for our analysis. Although one of our original questions was about class, and we wanted to use vehicles to somehow designate a person's class, the information in those columns seems too inconsistent, so we will drop them for now.
dropped_df = df.drop(columns=['NOPD_Item', 'VehicleYear', 'VehicleMake', 'VehicleModel', 'VehicleStyle', 'VehicleColor', 'SubjectWeight', 'SubjectHeight', 'SubjectEyeColor', 'SubjectHairColor', 'SubjectAge', 'SubjectHasPhotoID', 'SubjectDriverLicState', 'CreatedDateTime', 'LastModifiedDateTime', 'Longitude', 'Latitude', 'Zip', 'BlockAddress'])
dropped_df.head()
Next, we will change our index to be the pair FieldInterviewID and SubjectID, so that each interview appears only once while we can still see how many individuals were involved in it. We are also going to make sure that missing values are replaced with NaN.
df = dropped_df.set_index(['FieldInterviewID', 'SubjectID'])
df = df.replace('-', np.nan)
df['SubjectGender'] = df['SubjectGender'].fillna('UNKNOWN')
df.head()
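As a quick check that the MultiIndex does what we want, we can count how many subjects each interview involved; a small sketch:
# number of subjects recorded per field interview
subjects_per_interview = df.groupby(level='FieldInterviewID').size()
display(subjects_per_interview.value_counts())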
Now we are ready for some analysis. First, we are going to see which actions are most commonly taken against people of each race. To begin, we create a list of unique actions to see what categories we will have to analyze.
# create list of unique actions
actionsLst = df.ActionsTaken.unique()
actionsLst
We can see that the ActionsTaken column actually packs multiple pieces of information into each row, so we will need to separate each category into a column of its own later. For now, we will go through the actions list to make sure the terms used are uniform throughout, since if they are, it will make our job easier later.
# making sure the terms they use are uniform throughout. It seems like they are.
count1 = 0
count2 = 0
count3 = 0
count4 = 0
for i in actionsLst:
if (type(i) is str):
if ('Stop Results: No action taken' in i):
count1 += 1
elif ('Stop Results: no action taken' in i):
count2 += 1
elif ('Stop Results: No Action Taken' in i):
count3 += 1
elif ('Stop Results: No Action taken' in i):
count4 += 1
print(count1)
print(count2)
print(count3)
print(count4)
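The same check can be written more compactly with pandas string methods. This is a sketch that counts over the raw column rather than the unique-actions list:
# count rows containing each capitalization variant (case-sensitive, literal match)
actions = df['ActionsTaken'].dropna()
for variant in ['Stop Results: No action taken', 'Stop Results: no action taken',
                'Stop Results: No Action Taken', 'Stop Results: No Action taken']:
    print(variant, actions.str.contains(variant, regex=False).sum())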
# counting the total number of people of each race and gender ever stopped in this dataset
race_counts = df['SubjectRace'].value_counts()
gender_counts = df['SubjectGender'].value_counts()
amerCount = race_counts["AMER. IND."]
asianCount = race_counts["ASIAN"]
blackCount = race_counts["BLACK"]
hispanicCount = race_counts["HISPANIC"]
whiteCount = race_counts["WHITE"]
maleCount = gender_counts["MALE"]
femaleCount = gender_counts["FEMALE"]
unknownCount = gender_counts["UNKNOWN"]
The following cells turn the ActionsTaken column into multiple columns that each represent a single action, so we can create visualizations more easily. The visualizations will also include actions taken per gender.
numRows = len(df.index)
actionCategories = []
temp = ""
semiBool = True
# scan each unique ActionsTaken string character by character, collecting the category
# labels: a label is the text before a ":", and a ";" marks the start of the next label
for i in actionsLst:
    if (type(i) is str):
        semiBool = True
        for j in i:
            if (semiBool == True):
                if (j != ":"):
                    temp += j
                else:
                    semiBool = False
                    if (temp not in actionCategories):
                        actionCategories.append(temp)
                    temp = ""
            elif (j == ";"):
                semiBool = True
stopResultsLst = []
subjectTypeLst = []
searchOccurredLst = []
searchTypesLst = []
legalBasisesLst = []
emptyLst = ["-", "-", "-", "-", "-"]
for index, row in df.iterrows():
    tempLst = emptyLst.copy()  # copy the template so one row's values don't leak into the next
    if (type(row["ActionsTaken"]) is str):
        # split into alternating label/value pieces, stripping a leading space where present
        list1 = row["ActionsTaken"].split(':')
        list2 = []
        for i in list1:
            temp = i.split(';')
            for j in temp:
                if (j[0] == ' '):
                    list2.append(j[1:])
                else:
                    list2.append(j)
        # each recognized label is immediately followed by its value in list2
        for i in (range(len(list2))):
            if (list2[i] in actionCategories):
                if (list2[i] == 'Stop Results'):
                    tempLst[0] = list2[i + 1]
                elif (list2[i] == 'Subject Type'):
                    tempLst[1] = list2[i + 1]
                elif (list2[i] == 'Search Occurred'):
                    tempLst[2] = list2[i + 1]
                elif (list2[i] == 'Search Types'):
                    tempLst[3] = list2[i + 1]
                elif (list2[i] == 'Legal Basises'):  # spelled this way in the data
                    tempLst[4] = list2[i + 1]
    stopResultsLst.append(tempLst[0])
    subjectTypeLst.append(tempLst[1])
    searchOccurredLst.append(tempLst[2])
    searchTypesLst.append(tempLst[3])
    legalBasisesLst.append(tempLst[4])
df['StopResults'] = stopResultsLst
df['SubjectType'] = subjectTypeLst
df['SearchOccurred'] = searchOccurredLst
df['SearchTypes'] = searchTypesLst
df['LegalBasises'] = legalBasisesLst
df.head()
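For comparison, the same columns could be produced with a regular expression instead of a hand-rolled parser. A sketch, assuming the category labels appear literally in ActionsTaken; the StopResults2 column is hypothetical and only serves to check against our parsed version:
# capture the text after "Stop Results:" up to the next semicolon
df['StopResults2'] = df['ActionsTaken'].str.extract(r'Stop Results: ([^;]*)', expand=False)
df[['StopResults', 'StopResults2']].head()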
Now for some visualization! The following bar graphs use a simple count to visualize how frequently each type of action occurs per race. This analysis does not take the racial makeup of New Orleans into account; it strictly counts occurrences in the dataset. In other words, given that a police interaction has already happened, it shows the odds of each stop result per race.
Here we see the rate at which no action was taken, by race and gender.
noAmer = noAsian = noBlack = noHispanic = noWhite = noFemale = noMale = noUnknown = 0
for index, row in df.iterrows():
if (row["StopResults"] == "No action taken"):
if (row['SubjectRace'] == "AMER. IND."):
noAmer += 1
elif (row['SubjectRace'] == "ASIAN"):
noAsian += 1
elif (row['SubjectRace'] == "BLACK"):
noBlack += 1
elif (row['SubjectRace'] == "HISPANIC"):
noHispanic += 1
elif (row['SubjectRace'] == "WHITE"):
noWhite += 1
if (row['SubjectGender'] == "FEMALE"):
noFemale += 1
if (row['SubjectGender'] == "MALE"):
noMale += 1
if ((row['SubjectGender'] == "UNKNOWN")):
noUnknown += 1
noAmerPer = (noAmer / amerCount) * 100
noAsianPer = (noAsian / asianCount) * 100
noBlackPer = (noBlack / blackCount) * 100
noHispanicPer = (noHispanic / hispanicCount) * 100
noWhitePer = (noWhite / whiteCount) * 100
noFemalePer = (noFemale / femaleCount) * 100
noMalePer = (noMale / maleCount) * 100
noUnknownPer = (noUnknown / unknownCount) * 100
noPerDict = {"Race": ["Asian", "Black", "Hispanic", "Indigenous", "White"], "Percentage of Interactions Resulting in No action taken": [noAsianPer, noBlackPer, noHispanicPer, noAmerPer, noWhitePer]}
noPerDictGender = {"Gender": ["Female", "Male", "Unknown"], "Percentage of Interactions Resulting in No action taken": [noFemalePer, noMalePer, noUnknownPer]}
noRaceDf = pd.DataFrame(noPerDict).set_index('Race')
noGenderDf = pd.DataFrame(noPerDictGender).set_index('Gender')
noRaceDf.plot.bar()
noGenderDf.plot.bar()
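Since the next several cells repeat this count-then-normalize pattern for each stop result, a small helper could do the same work generically. This is a sketch of the idea, not what we originally ran:
def result_rate_by(group_col, result, result_col='StopResults'):
    # percentage of each group's interactions whose result matches `result`
    matches = df[df[result_col] == result][group_col].value_counts()
    totals = df[group_col].value_counts()
    return (matches / totals * 100).fillna(0)

result_rate_by('SubjectRace', 'No action taken').plot.bar()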
This graph shows the percentage of interactions that did not have a specified result.
nanAmer = nanAsian = nanBlack = nanHispanic = nanWhite = nanFemale = nanMale = nanUnknown = 0
for index, row in df.iterrows():
if ((row["StopResults"] == "-")):
if (row['SubjectRace'] == "AMER. IND."):
nanAmer += 1
elif (row['SubjectRace'] == "ASIAN"):
nanAsian += 1
elif (row['SubjectRace'] == "BLACK"):
nanBlack += 1
elif (row['SubjectRace'] == "HISPANIC"):
nanHispanic += 1
elif (row['SubjectRace'] == "WHITE"):
nanWhite += 1
if (row['SubjectGender'] == "FEMALE"):
nanFemale += 1
if (row['SubjectGender'] == "MALE"):
nanMale += 1
if ((row['SubjectGender'] == "UNKNOWN")):
nanUnknown += 1
nanAmerPer = (nanAmer / amerCount) * 100
nanAsianPer = (nanAsian / asianCount) * 100
nanBlackPer = (nanBlack / blackCount) * 100
nanHispanicPer = (nanHispanic / hispanicCount) * 100
nanWhitePer = (nanWhite / whiteCount) * 100
nanFemalePer = (nanFemale / femaleCount) * 100
nanMalePer = (nanMale / maleCount) * 100
nanUnknownPer = (nanUnknown / unknownCount) * 100
nanPerDict = {"Race": ["Asian", "Black", "Hispanic", "Indigenous", "White"], "Percentage of Interactions with no specified result": [nanAsianPer, nanBlackPer, nanHispanicPer, nanAmerPer, nanWhitePer]}
nanPerDictGender = {"Gender": ["Female", "Male", "Unknown"], "Percentage of Interactions Resulting with no specified result": [nanFemalePer, nanMalePer, nanUnknownPer]}
nanRaceDf = pd.DataFrame(nanPerDict).set_index('Race')
nanGenderDf = pd.DataFrame(nanPerDictGender).set_index('Gender')
nanRaceDf.plot.bar()
nanGenderDf.plot.bar()
Here we can see who received verbal warnings.
verbalWarningAmer = verbalWarningAsian = verbalWarningBlack = verbalWarningHispanic = verbalWarningWhite = verbalWarningFemale = verbalWarningMale = verbalWarningUnknown = 0
for index, row in df.iterrows():
if (row["StopResults"] == "Verbal Warning"):
if (row['SubjectRace'] == "AMER. IND."):
verbalWarningAmer += 1
elif (row['SubjectRace'] == "ASIAN"):
verbalWarningAsian += 1
elif (row['SubjectRace'] == "BLACK"):
verbalWarningBlack += 1
elif (row['SubjectRace'] == "HISPANIC"):
verbalWarningHispanic += 1
elif (row['SubjectRace'] == "WHITE"):
verbalWarningWhite += 1
if (row['SubjectGender'] == "FEMALE"):
verbalWarningFemale += 1
if (row['SubjectGender'] == "MALE"):
verbalWarningMale += 1
if ((row['SubjectGender'] == "UNKNOWN")):
verbalWarningUnknown += 1
verbalWarningAmerPer = (verbalWarningAmer / amerCount) * 100
verbalWarningAsianPer = (verbalWarningAsian / asianCount) * 100
verbalWarningBlackPer = (verbalWarningBlack / blackCount) * 100
verbalWarningHispanicPer = (verbalWarningHispanic / hispanicCount) * 100
verbalWarningWhitePer = (verbalWarningWhite / whiteCount) * 100
verbalWarningFemalePer = (verbalWarningFemale / femaleCount) * 100
verbalWarningMalePer = (verbalWarningMale / maleCount) * 100
verbalWarningUnknownPer = (verbalWarningUnknown / unknownCount) * 100
verbalWarningPerDict = {"Race": ["Asian", "Black", "Hispanic", "Indigenous", "White"], "Percentage of Interactions Resulting in Verbal Warnings": [verbalWarningAsianPer, verbalWarningBlackPer, verbalWarningHispanicPer, verbalWarningAmerPer, verbalWarningWhitePer]}
verbalWarningPerDictGender = {"Gender": ["Female", "Male", "Unknown"], "Percentage of Interactions Resulting in Verbal Warnings": [verbalWarningFemalePer, verbalWarningMalePer, verbalWarningUnknownPer]}
verbalWarningRaceDf = pd.DataFrame(verbalWarningPerDict).set_index('Race')
verbalWarningGenderDf = pd.DataFrame(verbalWarningPerDictGender).set_index('Gender')
verbalWarningRaceDf.plot.bar()
verbalWarningGenderDf.plot.bar()
This graph shows who received a citation.
citationAmer = citationAsian = citationBlack = citationHispanic = citationWhite = citationFemale = citationMale = citationUnknown = 0
for index, row in df.iterrows():
if (row["StopResults"] == "Citation Issued"):
if (row['SubjectRace'] == "AMER. IND."):
citationAmer += 1
elif (row['SubjectRace'] == "ASIAN"):
citationAsian += 1
elif (row['SubjectRace'] == "BLACK"):
citationBlack += 1
elif (row['SubjectRace'] == "HISPANIC"):
citationHispanic += 1
elif (row['SubjectRace'] == "WHITE"):
citationWhite += 1
if (row['SubjectGender'] == "FEMALE"):
citationFemale += 1
if (row['SubjectGender'] == "MALE"):
citationMale += 1
if ((row['SubjectGender'] == "UNKNOWN")):
citationUnknown += 1
citationAmerPer = (citationAmer / amerCount) * 100
citationAsianPer = (citationAsian / asianCount) * 100
citationBlackPer = (citationBlack / blackCount) * 100
citationHispanicPer = (citationHispanic / hispanicCount) * 100
citationWhitePer = (citationWhite / whiteCount) * 100
citationFemalePer = (citationFemale / femaleCount) * 100
citationMalePer = (citationMale / maleCount) * 100
citationUnknownPer = (citationUnknown / unknownCount) * 100
citationPerDict = {"Race": ["Asian", "Black", "Hispanic", "Indigenous", "White"], "Percentage of Interactions Resulting in Citations Issued": [citationAsianPer, citationBlackPer, citationHispanicPer, citationAmerPer, citationWhitePer]}
citationPerDictGender = {"Gender": ["Female", "Male", "Unknown"], "Percentage of Interactions Resulting in Citations Issued": [citationFemalePer, citationMalePer, citationUnknownPer]}
citationRaceDf = pd.DataFrame(citationPerDict).set_index('Race')
citationGenderDf = pd.DataFrame(citationPerDictGender).set_index('Gender')
citationRaceDf.plot.bar()
citationGenderDf.plot.bar()
This chart shows who received a summons.
summonsAmer = summonsAsian = summonsBlack = summonsHispanic = summonsWhite = summonsFemale = summonsMale = summonsUnknown = 0
for index, row in df.iterrows():
if (row["StopResults"] == "Summons Issued"):
if (row['SubjectRace'] == "AMER. IND."):
summonsAmer += 1
elif (row['SubjectRace'] == "ASIAN"):
summonsAsian += 1
elif (row['SubjectRace'] == "BLACK"):
summonsBlack += 1
elif (row['SubjectRace'] == "HISPANIC"):
summonsHispanic += 1
elif (row['SubjectRace'] == "WHITE"):
summonsWhite += 1
if (row['SubjectGender'] == "FEMALE"):
summonsFemale += 1
if (row['SubjectGender'] == "MALE"):
summonsMale += 1
if ((row['SubjectGender'] == "UNKNOWN")):
summonsUnknown += 1
summonsAmerPer = (summonsAmer / amerCount) * 100
summonsAsianPer = (summonsAsian / asianCount) * 100
summonsBlackPer = (summonsBlack / blackCount) * 100
summonsHispanicPer = (summonsHispanic / hispanicCount) * 100
summonsWhitePer = (summonsWhite / whiteCount) * 100
summonsFemalePer = (summonsFemale / femaleCount) * 100
summonsMalePer = (summonsMale / maleCount) * 100
summonsUnknownPer = (summonsUnknown / unknownCount) * 100
summonsPerDict = {"Race": ["Asian", "Black", "Hispanic", "Indigenous", "White"], "Percentage of Interactions Resulting in Summons Issued": [summonsAsianPer, summonsBlackPer, summonsHispanicPer, summonsAmerPer, summonsWhitePer]}
summonsPerDictGender = {"Gender": ["Female", "Male", "Unknown"], "Percentage of Interactions Resulting in Summons Issued": [summonsFemalePer, summonsMalePer, summonsUnknownPer]}
summonsRaceDf = pd.DataFrame(summonsPerDict).set_index('Race')
summonsGenderDf = pd.DataFrame(summonsPerDictGender).set_index('Gender')
summonsRaceDf.plot.bar()
summonsGenderDf.plot.bar()
This graph shows stops that resulted in Law Enforcement Assisted Diversion, or L.E.A.D. This program allows NOPD officers to divert individuals who are about to be arrested for a "low-level, non-violent municipal offense to intensive case management when the alleged offense is believed to be a product of underlying mental illness, substance abuse, or social challenges." See https://nola.gov/health-department/behavioral-health/lead/#:~:text=LEAD%20provides%20NOPD%20officers%20in,illness%2C%20substance%20abuse%2C%20or%20social for more information.
This particular action is notable for how extremely rarely it occurs.
leadAmer = leadAsian = leadBlack = leadHispanic = leadWhite = leadFemale = leadMale = leadUnknown = 0
for index, row in df.iterrows():
if ((row["StopResults"] == "L.E.A.D.")):
if (row['SubjectRace'] == "AMER. IND."):
leadAmer += 1
elif (row['SubjectRace'] == "ASIAN"):
leadAsian += 1
elif (row['SubjectRace'] == "BLACK"):
leadBlack += 1
elif (row['SubjectRace'] == "HISPANIC"):
leadHispanic += 1
elif (row['SubjectRace'] == "WHITE"):
leadWhite += 1
if (row['SubjectGender'] == "FEMALE"):
leadFemale += 1
if (row['SubjectGender'] == "MALE"):
leadMale += 1
if ((row['SubjectGender'] == "UNKNOWN")):
leadUnknown += 1
leadAmerPer = (leadAmer / amerCount) * 100
leadAsianPer = (leadAsian / asianCount) * 100
leadBlackPer = (leadBlack / blackCount) * 100
leadHispanicPer = (leadHispanic / hispanicCount) * 100
leadWhitePer = (leadWhite / whiteCount) * 100
leadFemalePer = (leadFemale / femaleCount) * 100
leadMalePer = (leadMale / maleCount) * 100
leadUnknownPer = (leadUnknown / unknownCount) * 100
leadPerDict = {"Race": ["Asian", "Black", "Hispanic", "Indigenous", "White"], "Percentage of Interactions resulting in L.E.A.D.": [leadAsianPer, leadBlackPer, leadHispanicPer, leadAmerPer, leadWhitePer]}
leadPerDictGender = {"Gender": ["Female", "Male", "Unknown"], "Percentage of Interactions Resulting resulting in L.E.A.D.": [leadFemalePer, leadMalePer, leadUnknownPer]}
leadRaceDf = pd.DataFrame(leadPerDict).set_index('Race')
leadGenderDf = pd.DataFrame(leadPerDictGender).set_index('Gender')
leadRaceDf.plot.bar()
leadGenderDf.plot.bar()
This graph shows which stops resulted in searches. Note that SearchTypes == "-" marks the stops where no search was recorded, so we count those and subtract from 100% to get the search rate.
searchAmer = searchAsian = searchBlack = searchHispanic = searchWhite = searchFemale = searchMale = searchUnknown = 0
for index, row in df.iterrows():
if ((row["SearchTypes"] == "-")):
if (row['SubjectRace'] == "AMER. IND."):
searchAmer += 1
elif (row['SubjectRace'] == "ASIAN"):
searchAsian += 1
elif (row['SubjectRace'] == "BLACK"):
searchBlack += 1
elif (row['SubjectRace'] == "HISPANIC"):
searchHispanic += 1
elif (row['SubjectRace'] == "WHITE"):
searchWhite += 1
if (row['SubjectGender'] == "FEMALE"):
searchFemale += 1
if (row['SubjectGender'] == "MALE"):
searchMale += 1
if ((row['SubjectGender'] == "UNKNOWN")):
searchUnknown += 1
# SearchTypes == "-" means no search was recorded, so subtract from 100 to get the search rate
searchAmerPer = (100 - (searchAmer / amerCount) * 100)
searchAsianPer = (100 - (searchAsian / asianCount) * 100)
searchBlackPer = (100 - (searchBlack / blackCount) * 100)
searchHispanicPer = (100 - (searchHispanic / hispanicCount) * 100)
searchWhitePer = (100 - (searchWhite / whiteCount) * 100)
searchFemalePer = (100 - (searchFemale / femaleCount) * 100)
searchMalePer = (100 - (searchMale / maleCount) * 100)
searchUnknownPer = (100 - (searchUnknown / unknownCount) * 100)
searchPerDict = {"Race": ["Asian", "Black", "Hispanic", "Indigenous", "White"], "Percentage of Interactions resulting in some type of search": [searchAsianPer, searchBlackPer, searchHispanicPer, searchAmerPer, searchWhitePer]}
searchPerDictGender = {"Gender": ["Female", "Male", "Unknown"], "Percentage of Interactions Resulting resulting in some type of search": [searchFemalePer, searchMalePer, searchUnknownPer]}
searchRaceDf = pd.DataFrame(searchPerDict).set_index('Race')
searchGenderDf = pd.DataFrame(searchPerDictGender).set_index('Gender')
searchRaceDf.plot.bar()
searchGenderDf.plot.bar()
This chart shows who got arrested.
arrestAmer = arrestAsian = arrestBlack = arrestHispanic = arrestWhite = arrestFemale = arrestMale = arrestUnknown = 0
for index, row in df.iterrows():
if (row["StopResults"] == "Physical Arrest"):
if (row['SubjectRace'] == "AMER. IND."):
arrestAmer += 1
elif (row['SubjectRace'] == "ASIAN"):
arrestAsian += 1
elif (row['SubjectRace'] == "BLACK"):
arrestBlack += 1
elif (row['SubjectRace'] == "HISPANIC"):
arrestHispanic += 1
elif (row['SubjectRace'] == "WHITE"):
arrestWhite += 1
if (row['SubjectGender'] == "FEMALE"):
arrestFemale += 1
if (row['SubjectGender'] == "MALE"):
arrestMale += 1
if ((row['SubjectGender'] == "UNKNOWN")):
arrestUnknown += 1
arrestAmerPer = (arrestAmer / amerCount) * 100
arrestAsianPer = (arrestAsian / asianCount) * 100
arrestBlackPer = (arrestBlack / blackCount) * 100
arrestHispanicPer = (arrestHispanic / hispanicCount) * 100
arrestWhitePer = (arrestWhite / whiteCount) * 100
arrestFemalePer = (arrestFemale / femaleCount) * 100
arrestMalePer = (arrestMale / maleCount) * 100
arrestUnknownPer = (arrestUnknown / unknownCount) * 100
arrestPerDict = {"Race": ["Asian", "Black", "Hispanic", "Indigenous", "White"], "Percentage of Interactions Resulting in Physical Arrest": [arrestAsianPer, arrestBlackPer, arrestHispanicPer, arrestAmerPer, arrestWhitePer]}
arrestPerDictGender = {"Gender": ["Female", "Male", "Unknown"], "Percentage of Interactions Resulting in Physical Arrest": [arrestFemalePer, arrestMalePer, arrestUnknownPer]}
arrestRaceDf = pd.DataFrame(arrestPerDict).set_index('Race')
arrestGenderDf = pd.DataFrame(arrestPerDictGender).set_index('Gender')
arrestRaceDf.plot.bar()
arrestGenderDf.plot.bar()
This graph differs slightly from its predecessors: rather than a percentage of all interactions, it finds the percentage of traffic stops specifically that resulted in some sort of search.
trafficSearchAmer = trafficSearchAsian = trafficSearchBlack = trafficSearchHispanic = trafficSearchWhite = trafficSearchFemale = trafficSearchMale = trafficSearchUnknown = 0
trafficStopAmer = trafficStopAsian = trafficStopBlack = trafficStopHispanic = trafficStopWhite = trafficStopFemale = trafficStopMale = trafficStopUnknown = 0
for index, row in df.iterrows():
if (row["StopDescription"] == "TRAFFIC VIOLATION"):
if (row['SubjectRace'] == "AMER. IND."):
trafficStopAmer += 1
elif (row['SubjectRace'] == "ASIAN"):
trafficStopAsian += 1
elif (row['SubjectRace'] == "BLACK"):
trafficStopBlack += 1
elif (row['SubjectRace'] == "HISPANIC"):
trafficStopHispanic += 1
elif (row['SubjectRace'] == "WHITE"):
trafficStopWhite += 1
if (row['SubjectGender'] == "FEMALE"):
trafficStopFemale += 1
if (row['SubjectGender'] == "MALE"):
trafficStopMale += 1
if ((row['SubjectGender'] == "UNKNOWN")):
trafficStopUnknown += 1
if ((row["StopResults"] != "No action taken") and (row["StopResults"] == "-")):
if (row['SubjectRace'] == "AMER. IND."):
trafficSearchAmer += 1
elif (row['SubjectRace'] == "ASIAN"):
trafficSearchAsian += 1
elif (row['SubjectRace'] == "BLACK"):
trafficSearchBlack += 1
elif (row['SubjectRace'] == "HISPANIC"):
trafficSearchHispanic += 1
elif (row['SubjectRace'] == "WHITE"):
trafficSearchWhite += 1
if (row['SubjectGender'] == "FEMALE"):
trafficSearchFemale += 1
if (row['SubjectGender'] == "MALE"):
trafficSearchMale += 1
if ((row['SubjectGender'] == "UNKNOWN")):
trafficSearchUnknown += 1
trafficSearchAmerPer = ((trafficSearchAmer / trafficStopAmer) * 100)
trafficSearchAsianPer = ((trafficSearchAsian / trafficStopAsian) * 100)
trafficSearchBlackPer = ((trafficSearchBlack / trafficStopBlack) * 100)
trafficSearchHispanicPer = ((trafficSearchHispanic / trafficStopHispanic) * 100)
trafficSearchWhitePer = ((trafficSearchWhite / trafficStopWhite) * 100)
# computed the same way as the race percentages above, for consistency
trafficSearchFemalePer = ((trafficSearchFemale / trafficStopFemale) * 100)
trafficSearchMalePer = ((trafficSearchMale / trafficStopMale) * 100)
trafficSearchUnknownPer = ((trafficSearchUnknown / trafficStopUnknown) * 100)
trafficSearchPerDict = {"Race": ["Asian", "Black", "Hispanic", "Indigenous", "White"], "Percentage of traffic stops resulting in some type of search": [trafficSearchAsianPer, trafficSearchBlackPer, trafficSearchHispanicPer, trafficSearchAmerPer, trafficSearchWhitePer]}
trafficSearchPerDictGender = {"Gender": ["Female", "Male", "Unknown"], "Percentage of traffic stops resulting in some type of search": [trafficSearchFemalePer, trafficSearchMalePer, trafficSearchUnknownPer]}
trafficSearchRaceDf = pd.DataFrame(trafficSearchPerDict).set_index('Race')
trafficSearchGenderDf = pd.DataFrame(trafficSearchPerDictGender).set_index('Gender')
trafficSearchRaceDf.plot.bar()
trafficSearchGenderDf.plot.bar()
This analysis yields a number that is closer to what our partner at The Data Center told us to aim for. However, it relies on a variable (SearchOccurred) that we trust less, because we found many inconsistencies in it.
trafficSearchAmer2 = trafficSearchAsian2 = trafficSearchBlack2 = trafficSearchHispanic2 = trafficSearchWhite2 = 0
trafficStopAmer2 = trafficStopAsian2 = trafficStopBlack2 = trafficStopHispanic2 = trafficStopWhite2 = 0
trafficViolations = 0
for index, row in df.iterrows():
if (row["StopDescription"] == "TRAFFIC VIOLATION") and (row["EventDate"][6:10] == "2017"):
trafficViolations += 1
if (row['SubjectRace'] == "AMER. IND."):
trafficStopAmer2 += 1
elif (row['SubjectRace'] == "ASIAN"):
trafficStopAsian2 += 1
elif (row['SubjectRace'] == "BLACK"):
trafficStopBlack2 += 1
elif (row['SubjectRace'] == "HISPANIC"):
trafficStopHispanic2 += 1
elif (row['SubjectRace'] == "WHITE"):
trafficStopWhite2 += 1
if (row["SearchOccurred"] == "Yes"):
if (row['SubjectRace'] == "AMER. IND."):
trafficSearchAmer2 += 1
elif (row['SubjectRace'] == "ASIAN"):
trafficSearchAsian2 += 1
elif (row['SubjectRace'] == "BLACK"):
trafficSearchBlack2 += 1
elif (row['SubjectRace'] == "HISPANIC"):
trafficSearchHispanic2 += 1
elif (row['SubjectRace'] == "WHITE"):
trafficSearchWhite2 += 1
trafficSearchAmerPer2 = ((trafficSearchAmer2 / trafficStopAmer2) * 100)
trafficSearchAsianPer2 = ((trafficSearchAsian2 / trafficStopAsian2) * 100)
trafficSearchBlackPer2 = ((trafficSearchBlack2 / trafficStopBlack2) * 100)
trafficSearchHispanicPer2 = ((trafficSearchHispanic2 / trafficStopHispanic2) * 100)
trafficSearchWhitePer2 = ((trafficSearchWhite2 / trafficStopWhite2) * 100)
trafficSearchPerDict2 = {"Race": ["Asian", "Black", "Hispanic", "Indigenous", "White"], "Percentage of traffic stops resulting in some type of search": [trafficSearchAsianPer2, trafficSearchBlackPer2, trafficSearchHispanicPer2, trafficSearchAmerPer2, trafficSearchWhitePer2]}
trafficSearchRaceDf2 = pd.DataFrame(trafficSearchPerDict2).set_index('Race')
trafficSearchRaceDf2.plot.bar()
Unfortunately, for many of these analyses we do not have great data regarding gender, because a large share of stops have "unknown" recorded for gender. However, we can still use the demographic information we have to see whether the stops per race are proportional to the racial makeup of the city.
To prepare for this, we will begin with the gender dataset, since it has only three values at play: "male", "female", and "unknown". This will include cleaning it up by dropping unnecessary columns and changing the dtype of some columns.
gender_df = gender_df.dropna(how='all')
gender_df = gender_df.drop(columns=['Blank'])
display(gender_df)
display(gender_df.dtypes)
After cleaning up this dataset a little, we can move on to more data analysis. Our first analysis is a simple test to see whether we can extract any useful information based on a subject's gender. Below is a simple pie chart showing roughly how many men vs. how many women were stopped by police in total.
df['SubjectGender'].value_counts().plot.pie(colors=['blue', 'pink', 'gray'])
This pie chart compares the percentage of women and men located in the city of New Orleans.
NewOrleansGenders = gender_df[['Female2014-2018', 'Male2014-2018']]
NewOrleansGenders.loc['New Orleans'].plot.pie(title='Gender in New Orleans, 2014-2018', colors=['pink', 'blue'])
From this simple comparison, we can see that the number of males stopped by police exceeds what would be proportional. If the number of men stopped were proportional to the population of the city, then roughly 50% of these stops would have been of men.
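To put a number on that disproportion, we can divide each gender's share of stops by its share of the population. A quick sketch; dividing the population row by its own sum makes the result independent of whether the table stores fractions or percentages:
# ratio of stop share to population share; a value near 1 would mean proportional stops
stop_share = df['SubjectGender'].value_counts(normalize=True)
pop_row = NewOrleansGenders.loc['New Orleans']
pop_share = pop_row / pop_row.sum()
print(stop_share['MALE'] / pop_share['Male2014-2018'])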
Now we need to clean up the race dataframe and exclude data that came from years we don't need.
clean_race_df = race_df.drop(columns=['Black2000', 'BlackMOE', 'White2000', 'WhiteMOE', 'Asian2000', 'AsianMOE', 'AmerIndian2000', 'AmerIndianMOE', 'Biracial2000', 'BiracialMOE', 'Hispanic2000', 'HispanicMOE', 'Other2000', 'OtherMOE'])
Now we can create a bar graph showing the total percentages of each action taken by race.
comp1 = pd.merge(noRaceDf, nanRaceDf, left_index=True, right_index=True)
comp2 = pd.merge(comp1, verbalWarningRaceDf, left_index=True, right_index=True)
comp3 = pd.merge(comp2, citationRaceDf, left_index=True, right_index=True)
comp4 = pd.merge(comp3, summonsRaceDf, left_index=True, right_index=True)
comp5 = pd.merge(comp4, leadRaceDf, left_index=True, right_index=True)
comp6 = pd.merge(comp5, searchRaceDf, left_index=True, right_index=True)
comp7 = pd.merge(comp6, arrestRaceDf, left_index=True, right_index=True)
comp8 = pd.merge(comp7, trafficSearchRaceDf, left_index=True, right_index=True)
display(comp8)
comp8.plot.bar().legend(loc='center left',bbox_to_anchor=(1.0, 0.5))
The following shows the raw data from the New Orleans row of the demographic dataset. Below that, we will use this information to further compare the population of people in the traffic stops with the population of New Orleans.
blackPer = race_df.iloc[72]["Black2014-2018"]  # row 72 is the citywide New Orleans row
whitePer = race_df.iloc[72]["White2014-2018"]
asianPer = race_df.iloc[72]["Asian2014-2018"]
amerPer = race_df.iloc[72]["AmerIndian2014-2018"]
hispanicPer = race_df.iloc[72]["Hispanic 2013-2017"]
Here we compute how often each action occurred relative to each race's share of the population.
percentageLst = []
proportionLst = []
# note: this relies on value_counts() listing races in the same order for every stop result
# (Black, White, Hispanic, Asian, a skipped category at position 4, then American Indian)
for i in df.StopResults.unique():
theSum = 0
for j in df[df.StopResults == i]["SubjectRace"].value_counts():
theSum += j
count = 0
for k in df[df.StopResults == i]["SubjectRace"].value_counts():
percentage = k / theSum
if (count == 0):
curRace = "Black"
elif (count == 1):
curRace = "White"
elif (count == 2):
curRace = "Hispanic"
elif (count == 3):
curRace = "Asian"
elif (count == 5):
curRace = "American Indian"
if (count != 4):
percentageLst.append(percentage)
proportion = 0
if (count == 0):
proportion = percentage / blackPer
proportionLst.append(proportion)
elif (count == 1):
proportion = percentage / whitePer
proportionLst.append(proportion)
elif (count == 2):
proportion = percentage / hispanicPer
proportionLst.append(proportion)
elif (count == 3):
proportion = percentage / asianPer
proportionLst.append(proportion)
elif (count == 5):
proportion = percentage / amerPer
proportionLst.append(proportion)
count += 1
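The loop above leans on value_counts() returning races in a fixed order, which is fragile. Below is an order-independent sketch of the same two tables using pd.crosstab, assuming the race labels shown earlier:
# racial share of each stop result, computed without relying on value_counts ordering
shares = pd.crosstab(df['StopResults'], df['SubjectRace'], normalize='index')
# divide each race's share by its citywide population share to get the proportions
city_share = pd.Series({'BLACK': blackPer, 'WHITE': whitePer, 'HISPANIC': hispanicPer,
                        'ASIAN': asianPer, 'AMER. IND.': amerPer})
display(shares[city_share.index].div(city_share))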
If the rate at which certain races are stopped is proportional to the population of the city, then the following visualizations should show a 1:1 ratio between the two numbers.
import matplotlib.pyplot as plt
keys = ["Black", "White", "Hispanic", "Asian", "American Indian"]
# slice the flat percentage list into one group of five races per stop result
idkVals = percentageLst[0:5]
physArrestVals = percentageLst[5:10]
citationVals = percentageLst[10:15]
verbalVals = percentageLst[15:20]
noneVals = percentageLst[20:25]
summonsVals = percentageLst[25:30]
leadVals = percentageLst[30:32]
# making large list to use for dataframe
bigPerLst = [idkVals, physArrestVals, citationVals, verbalVals, noneVals, summonsVals, leadVals]
# creating and displaying dataframe plus bar chart
percentDf = pd.DataFrame(bigPerLst, index = ['Unspecified', 'Physical Arrest', 'Citations', 'Verbal Warning', 'No Action', 'Summons', 'L.E.A.D.'], columns=["Black", "White", "Hispanic", "Asian", "American Indian"])
display(percentDf)
percentDf.plot.bar(title="Percentage of races").legend(loc='center left',bbox_to_anchor=(1.0, 0.5))
# same slicing as above, but for the proportions
idkVals2 = proportionLst[0:5]
physArrestVals2 = proportionLst[5:10]
citationVals2 = proportionLst[10:15]
verbalVals2 = proportionLst[15:20]
noneVals2 = proportionLst[20:25]
summonsVals2 = proportionLst[25:30]
leadVals2 = proportionLst[30:32]
bigProLst = [idkVals2, physArrestVals2, citationVals2, verbalVals2, noneVals2, summonsVals2, leadVals2]
proDf = pd.DataFrame(bigProLst, index = ['Unspecified', 'Physical Arrest', 'Citations', 'Verbal Warning', 'No Action', 'Summons', 'L.E.A.D.'], columns=["Black", "White", "Hispanic", "Asian", "American Indian"])
display(proDf)
proDf.plot.bar(title="Proportion of races").hlines(1, -1, 7)
The plot above shows that, in most cases, Black people experience a greater rate of each action than any other race. One exception is the citations category, where American Indians are disproportionately issued more (though we should keep in mind that there are very few American Indians in New Orleans, and small samples can be misleading). The other exception is the L.E.A.D. program, into which White people are disproportionately more likely to be inducted. Again, there are fewer than 100 instances of L.E.A.D. in this dataset, so this could also be misleading.
We would now like to see in which areas of New Orleans most stops occur. To do this, we must first join our big dataframe with our zones dataframe. We followed this tutorial (https://towardsdatascience.com/lets-make-a-map-using-geopandas-pandas-and-matplotlib-to-make-a-chloropleth-map-dddc31c1983d) to accomplish this.
# combine District and Zone into a single district-zone code matching the shapefile's zone labels
df['Zone'] = df['Zone'].astype(str)
df['District'] = df['District'].astype(str)
df['District'] = df['District'].str.cat(df['Zone'])
df = df.drop(columns=['Zone'])
display(df.head())
map_df = map_df.rename(columns={"Zone": "District"})
map_df
After cleaning up our datasets a little, we must merge them in order to get the map plot to work.
mergedDf = df.set_index('District').join(map_df.set_index('District'))
mergedDf
The following map plot shows which stop result occurs most frequently in each district of the city. For example, we can see that in the Uptown area people mostly get citations, no actions, or verbal warnings.
from geopandas import GeoDataFrame
variable = 'StopResults'
fig, ax = plt.subplots(1, figsize=(10, 10))
mergedDf = GeoDataFrame(mergedDf)
mergedDf.plot(column=variable, cmap='Blues', linewidth=0.8, ax=ax, edgecolor='0.8', legend=True)
Now we create a new merged dataframe that will count the total number of events - regardless of what they were - that occurred in each district. This will show us where the highest rates of police activity are in the city.
resultsDf = df['District'].value_counts().to_frame()
resultsDf = resultsDf.rename(columns={'District':'Total'})
merged2 = resultsDf.join(map_df.set_index('District'))
merged2
variable = 'Total'
fig, ax = plt.subplots(1, figsize=(10, 6))
merged2 = merged2.replace(40995, 0)  # zero out the French Quarter's outlier count (explained below)
display(merged2.Total.max())
merged2 = GeoDataFrame(merged2)
merged2.plot(column=variable, cmap = 'Blues', linewidth=0.8, ax=ax, edgecolor='0.8', legend=True, legend_kwds={'shrink': 0.3})
To make this map readable, we had to exclude the count we got for the French Quarter, because the number of actions taken there was so high that it made every other district look insignificant.
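A less hard-coded way to get the same effect would be to mask whichever district has the maximum count, rather than matching the literal value. A sketch (an alternative to the replace call above, not something to run in addition to it):
# zero out the single largest count (the French Quarter) without hard-coding the number
merged2['Total'] = merged2['Total'].mask(merged2['Total'] == merged2['Total'].max(), 0)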
In conclusion, we have found that there do seem to be disproportionalities, the most obvious and general being the frequency of Black subjects in this dataset. Many attribute this to the increased police presence in Black neighborhoods. We looked at the data in two ways: one in which we compared the frequency of each stop result involving each race to the populations of those races in New Orleans, and another in which we compared that frequency to the occurrences of each race in the dataset. The former is the one that showed the disproportionality. Many people believe this problem can be addressed by being more careful with how we deal with crime; police force may not be the most effective solution in many cases.
We feel that we have found some statistics that we have not seen before, but there has certainly been plenty of other work related to this issue and, at a larger scope, to the criminal justice system as a whole. We encourage our readers to continue their research, perhaps by reading the literature and watching the documentary below:
https://www.nature.com/articles/d41586-020-01846-z An article entitled "What the data say about police brutality and racial bias — and which reforms might work" which contains some clear and shocking statistics as well as important anecdotal information.
"Are Prisons Obsolete?" is a book by the brilliant and prolific civil rights leader, author, and scholar, Angela Davis in which she argues in favor of a transformative change in the American criminal justice system in the form of what she refers to as "decarceration."
"The Black and the Blue" is a book by former police officer Matthew Horace and his writing partner, Ron Harris, in which they reveal the crimes and injustices that Horace experienced during his time as a police officer.
"13th" is a documentary by Ava DuVernay Spencer Averick which discusses the effect of the racial discrimination and the loophole of 13th amendment which abolished slavery throughout the United States and ended involuntary servitude except as a punishment for conviction of a crime on the criminal justice system.