Marc Ojalvo & Kyle Strougo
Website: https://mojalvo.github.io/
Since 2015, there have been almost 5,000 cases of police brutality in the United States. With the death of George Floyd this past summer, there has been a call to action to reform the policing system to prevent unnecessary deaths from occurring. (Further reading about these issues is attached at the end of the notebook.)
Our project hopes to find patterns in police brutality to inform and educate others about the groups most susceptible to becoming victims. We hope to use data to uncover patterns and valuable statistics.
Our project will look at the correlation between cities' incomes and where police shootings occur. We are attempting to answer the following questions: Are there more police shootings in low-income areas than in higher-income areas? Is there a relationship between income inequality and police shootings?
We hypothesize that there are more police shootings in low-income areas than in higher-income areas and that there is a correlation between income inequality and police shootings.
The primary data set we will be looking at contains the mean income of all U.S. cities and towns as of 2019. The data set includes more information than just the mean income, such as the county, type, longitude, latitude, median, and standard deviation. After cleaning up the data, we decided to keep the state, city, mean, median, and standard deviation. We kept both the median and the mean because a difference between the two can indicate skewed data. The standard deviation is a good indicator of income inequality within a city - another data point that we plan to investigate. We found this data set at https://www.kaggle.com/goldenoakresearch/us-household-income-stats-geo-locations/notebooks
Another data set we will be analyzing covers police shootings that occurred between 2015 and 2020. It contains the victim's name, the date, race, gender, city, and other information about the shooting. Since we are just looking at the correlation between a city's income and the number of police shootings, we decided to drop most of the columns, keeping only where the shooting occurred and the victims' names. We found this data set at https://www.kaggle.com/ahsen1330/us-police-shootings
Raw counts of police brutality cases are not directly comparable, since some cities have larger populations than others. To make our data comparable, we imported the population of every U.S. city and town and converted police brutality cases into a rate per 10,000 people. We retrieved the data from https://simplemaps.com/data/us-cities.
The mean, median, and standard deviation of the city income data set will be the basis for our project. We plan to compare the number of police shootings in a city with the city's income statistics to test our hypotheses. After mapping each shooting to a city, we will determine whether there is a correlation between the data points. Modifying the data to our liking will require the SQL-like operations made available in pandas, and we will use visualization techniques to graph any trends. A rough sketch of the core steps appears below.
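As a minimal sketch of the plan (the frames and values below are hypothetical stand-ins, not our real data; the actual cleaning and merging steps appear later in the notebook), the core operations are a SQL-like join of the income, shootings, and population tables followed by a per-10,000 rate calculation:
#hypothetical toy frames illustrating the planned join and rate calculation
import pandas as pd
income = pd.DataFrame({'City': ['Springfield'], 'State': ['Illinois'], 'Mean': [52000]})
shootings = pd.DataFrame({'City': ['Springfield'], 'State': ['Illinois'], 'counts': [3]})
population = pd.DataFrame({'City': ['Springfield'], 'State': ['Illinois'], 'population': [115000]})
#join on city and state, then standardize the raw count to a rate per 10,000 people
merged = income.merge(shootings, on=['City', 'State']).merge(population, on=['City', 'State'])
merged['rate'] = merged['counts'] / merged['population'] * 10000
print(merged)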
In terms of collaboration, we have been in contact for a few weeks now. After researching what information was available online, we decided that these two data sets would work well together: they overlap on the attributes necessary to perform the analysis and measure a correlation. We expect to find a general trend between the two and are both interested in the possible results. We will not need any more data sets to test our hypotheses.
We have been collaborating through Zoom, sharing our screens, and switching off coding while the other assists. Jupyter Notebook has been an excellent tool for debugging and working with the data sets. Furthermore, GitHub has served as an environment to hold and commit new changes to our files. We have also utilized our local editors and used Excel to preview and make changes to the data organization.
#installing packages for graphs
!pip3 install geopandas
!pip install descartes
#centers all graphs for organizational purposes
from IPython.core.display import HTML as Center
Center(""" <style>
.output_png {
display: table-cell;
text-align: center;
vertical-align: middle;
}
</style> """)
#importing pandas and other libraries
import pandas as pd
import numpy as np
import geopandas as gpd
from shapely.geometry import Point
from geopandas import GeoDataFrame
from scipy import ndimage
from scipy import stats
#imports used for heat map
import matplotlib.pylab as pylab
import matplotlib.pyplot as plt
#importing income data and turning it into data frame
income_df = pd.read_csv("./data/kaggle_income_edited.csv")
income_df.head()
#remove columns from the income data that are unnecessary for our analysis
income_df = income_df[['State_Name','City','Mean','Median','Stdev','Lat','Lon']]
#rename state column for consistency when merging data sets
income_df.rename(columns = {'State_Name':'State'}, inplace=True)
income_df
For our income data, we kept all the information we thought would be useful for the analysis. We considered the mean, median, and standard deviation to be the most important statistics, and kept latitude and longitude for mapping purposes.
#importing shootings data and turning it into data frame
shootings_df = pd.read_csv("./data/shootings.csv")
shootings_df
#creating a demographics data frame for future graph use
shootings_demographics = shootings_df[['race','gender','age']]
shootings_demographics
#remove columns from the shootings data that are unnecessary for our analysis
shootings_df = shootings_df[['city','state']]
#rename the city and state columns to match the income data
shootings_df.rename(columns = {'state':'State', 'city':'City'}, inplace=True)
shootings_df
For our police brutality data, we decided to separate it into two different frames: one containing the city and state of each case, and the other containing the demographics of the victims.
#importing population-per-city data and turning it into a data frame
population_df = pd.read_csv('./data/uscities.csv', skipinitialspace=True)
population_df.head()
#only keep columns that were needed
population_df = population_df[['state_name','city','population']]
#rename columns for merging purposes
population_df.rename(columns = {'state_name':'State', 'city':'City'}, inplace=True)
population_df
For our population data, we only kept the 2019 population, since that is all we needed to standardize our police brutality counts.
#converts all state codes to full state names (to match the income data)
shootings_df['State'] = shootings_df['State'].map({
'AL':'Alabama',
'AK':'Alaska',
'AZ':'Arizona',
'AR':'Arkansas',
'CA':'California',
'CO':'Colorado',
'CT':'Connecticut',
'DE':'Delaware',
'FL':'Florida',
'GA':'Georgia',
'HI':'Hawaii',
'ID':'Idaho',
'IL':'Illinois',
'IN':'Indiana',
'IA':'Iowa',
'KS':'Kansas',
'KY':'Kentucky',
'LA':'Louisiana',
'ME':'Maine',
'MD':'Maryland',
'MA':'Massachusetts',
'MI':'Michigan',
"MN":'Minnesota',
"MS":'Mississippi',
'MO': 'Missouri',
'MT':'Montana',
'NE':'Nebraska',
'NV':'Nevada',
'NH':'New Hampshire',
'NJ':'New Jersey',
'NM':'New Mexico',
'NY':'New York',
'NC':'North Carolina',
'ND':'North Dakota',
'OH':'Ohio',
'OK':'Oklahoma',
'OR':'Oregon',
'PA':'Pennsylvania',
'RI':'Rhode Island',
'SC':'South Carolina',
'SD':'South Dakota',
'TN':'Tennessee',
'TX':'Texas',
'UT':'Utah',
'VT':'Vermont',
'VA':'Virginia',
'WA':'Washington',
'WV':'West Virginia',
'WI':'Wisconsin',
'WY':'Wyoming'
})
shootings_df
We had to change the state codes in the shootings data to their full state names so that we could properly merge the two data sets. We want to merge both data sets on city and state, since some city names are used in multiple states. To ensure that these duplicate city names were not confused, we had to merge on state as well. The income data set lists each city with a full state name, but the shootings data set uses state codes.
#count the number of shootings in each city/state pair and create a data frame
shooting_counts = shootings_df.groupby(['City', 'State']).size().reset_index(name='counts')
#combines data frames to include the shooting count
shootings_df = shootings_df.merge(shooting_counts, on=['City', 'State'], how='inner')
#creates data frame that includes shooting count per city (only counting cities with shootings)
income_shooting_df_inner = income_df.merge(shootings_df, on=['City', 'State'], how='inner')
#drop duplicate city/state pairs that came from the income dataset
income_shooting_df_inner.drop_duplicates(subset=['City', 'State'], inplace=True)
#sorts by city alphabetically for organizational purposes
income_shooting_df_inner.sort_values('City', inplace=True)
income_shooting_df_inner
#creates data frame that includes the shooting count per city (including cities with no shootings)
income_shooting_df = income_df.merge(shootings_df, on=['City', 'State'], how='left')
#replaces 0 with NaN so the mean is not skewed
income_shooting_df.replace(0, np.nan, inplace=True)
#groups by city/state because there are multiple rows per city/state combination
income_shooting_df_all = income_shooting_df.groupby(['City','State']).agg({'Mean': 'mean', 'Median': 'mean', 'Stdev': 'mean', 'counts':'mean'})
income_shooting_df_all = income_shooting_df_all.reset_index()
#sorts by city alphabetically for organizational purposes
income_shooting_df_all.sort_values('City', inplace=True)
#replaces all NaNs with 0
income_shooting_df_all.replace(np.nan, 0, inplace=True)
income_shooting_df_all
Another issue with our data was that the income data was broken down by area, not just by city, so a single city could appear in multiple rows. We originally just dropped duplicate names and used the data from the first instance of each name. However, after further analysis, we realized that doing so made our data inaccurate. Instead, we grouped all instances of a city/state pair and took the mean of the Mean, Median, and Standard Deviation columns.
#adds population to each city/state combo
income_shooting_df_all = income_shooting_df_all.merge(population_df, on=['City', 'State'], how='left')
#creates a rate column to make sure data is comparable
income_shooting_df_all['rate'] = (income_shooting_df_all['counts']/ income_shooting_df_all['population']) * 10000
income_shooting_df_all
After merging our income, police brutality, and population data frames, we computed the rate of police brutality cases per 10,000 people. To do that, we divided the count of police brutality cases by the population of the city and then multiplied by 10,000. For example, a city with 5 cases and a population of 200,000 would have a rate of (5 / 200,000) * 10,000 = 0.25 cases per 10,000 people. With rates, the data is more comparable, since some cities have a small population but a high number of police brutality cases.
#plots the longitude and latitude of each shooting
geometry = [Point(xy) for xy in zip(income_shooting_df_inner['Lon'], income_shooting_df_inner['Lat'])]
gdf = GeoDataFrame(income_shooting_df_inner, geometry=geometry)
#this is a simple map that goes with geopandas
world = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))
ax = gdf.plot(ax=world.plot(figsize=(50, 10)), marker='o', color='red', markersize=15);
#zooms into the united states
minx, miny, maxx, maxy = gdf.total_bounds
ax.set_xlim(minx, maxx)
ax.set_ylim(miny, maxy)
#sets map title
ax.set_title("Police shootings in the United States", fontsize=25)
In the map above we plotted the shootings in the US. Each red dot represents the location of a shooting; clusters represent multiple shootings. Notice that the more populated cities tend to have more shootings, which makes sense. Furthermore, we do not see as many police shootings in the Midwest or the southwestern part of the country. Most shootings occur in the East, with the exception of California.
from scipy import ndimage
import matplotlib.pylab as pylab
import matplotlib.pyplot as plt
#heatmap function adapted from: https://nbviewer.jupyter.org/gist/perrygeo/c426355e40037c452434
def heatmap(d, bins=(100,100), smoothing=1.3, cmap='jet'):
    def getx(pt):
        return pt.coords[0][0]
    def gety(pt):
        return pt.coords[0][1]
    x = list(d.geometry.apply(getx))
    y = list(d.geometry.apply(gety))
    heatmap, xedges, yedges = np.histogram2d(y, x, bins=bins)
    extent = [yedges[0], yedges[-1], xedges[-1], xedges[0]]
    #log-scale the counts so a few dense cells do not wash out the rest of the map
    logheatmap = np.log(heatmap)
    logheatmap[np.isneginf(logheatmap)] = 0
    #smooth the log counts with a Gaussian filter
    logheatmap = ndimage.gaussian_filter(logheatmap, smoothing, mode='nearest')
    plt.imshow(logheatmap, cmap=cmap, extent=extent)
    plt.colorbar()
    plt.gca().invert_yaxis()
    plt.show()
heatmap(income_shooting_df_inner, bins=70, smoothing=1)
Since the previous map did not properly indicate the number of shootings per area, we created a heat map to show the number of shootings in a given area. The hotter the color (the closer to red), the more shootings there are in that location. As before, most shootings occur in the East, with the exception of California.
#to display charts on top of each other
plt.figure(0)
#counts race of each shooting victim
race_counts = shootings_demographics['race'].value_counts()
#plots counts into pie graph
race_counts.plot.pie(figsize=(7,7), colors=['darkgreen', 'crimson', 'pink', 'yellow', 'orange', 'brown'])
plt.title('Race of Victims ', fontsize=25)
#to display charts on top of each other
plt.figure(1)
#counts gender of each shooting victim
gender_counts = shootings_demographics['gender'].value_counts()
#plots counts into pie graph
gender_counts.plot.pie(figsize=(7,7), colors=['blue', 'fuchsia'])
plt.title('Gender of Victims', fontsize=25)
#to display charts on top of each other
plt.figure(2)
#plots age distribution of victims
shootings_demographics['age'].plot.hist(bins=25, figsize=(10,5), color=['black'])
plt.title('Age distribution of Victims', fontsize=25)
#shows all the plots
plt.show()
The first pie chart above shows the proportion of shootings by race. In raw counts, more white people are shot by police than black people. However, the argument that black people experience worse treatment by police than their counterparts comes from comparing each group's share of shootings to its share of the population. According to governing.com, African Americans comprise about 12.5% of the American population but, according to our statistics, make up approximately 25% of police brutality cases, whereas white people make up about 60% of the US population but account for only about half of police brutality cases.
The second pie chart shows the proportion of victims by gender. Clearly, the majority of police brutality victims are male.
The last chart is a histogram of police brutality victims by age. The majority of victims fall in the late-20s to early-30s age group, with the fewest older than 65.
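As a rough check on the population-share comparison above, here is a minimal sketch that computes each race's share of the shootings from the shootings_demographics frame and compares it to its share of the population. The race labels and population percentages below are our own assumed figures for illustration, not values taken from the data sets:
#share of shootings by race, as percentages
shooting_share = shootings_demographics['race'].value_counts(normalize=True) * 100
#approximate US population shares in percent (assumed figures, for illustration only)
population_share = {'White': 60.0, 'Black': 12.5, 'Hispanic': 18.5}
for race, pop_pct in population_share.items():
    if race in shooting_share:
        print(f"{race}: {shooting_share[race]:.1f}% of shootings vs. {pop_pct:.1f}% of the population")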
#counts shootings by gender and race
race_gender = (shootings_demographics.
groupby(["gender", "race"])['gender'].
count())
#turns the counts into a DataFrame
race_gender.to_frame()
#plots pivot table into bar graph
race_gender.plot.bar()
plt.title('Shooting Counts by Gender and Race', fontsize=25)
The graph above shows the counts of shootings by gender/race combination. We created this graph so you could compare shootings across those two factors at once. Notice that, across the board, white males are the most common victims of police brutality cases, with black males second and Hispanic males third.
#standardized mean and standard deviation to make graph more readable
income_shooting_df_all['Mean_std'] = (
(income_shooting_df_all['Mean'] - income_shooting_df_all['Mean'].mean()) /
income_shooting_df_all['Mean'].std())
income_shooting_df_all['Stdev_std'] = (
(income_shooting_df_all['Stdev'] - income_shooting_df_all['Stdev'].mean()) /
income_shooting_df_all['Stdev'].std())
#removes all cities without a population or rate to get more accurate results
income_shooting_df_all.dropna(axis=0, inplace=True)
income_shooting_df_all
Since the mean and standard deviation columns have a much larger range than the rate, we thought the graphs would be easier to read if we standardized them. We did this by converting each value to a z-score, so the graphs below show how the rates correlate on a common scale.
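In other words (restating the standardization the code above performs), each standardized value is a z-score:

$$z = \frac{x - \bar{x}}{s}$$

where $\bar{x}$ is the column mean and $s$ is the column standard deviation.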
#fills NaN with 0 for correlation purposes
income_shooting_df_all.fillna(0, inplace=True)
income_shooting_df_all.corr()
#creates correlation matrix between desired variables
variables = ['Mean', 'Stdev', 'rate']
correlation = income_shooting_df_all[variables].corr()
correlation['rate']
Here we are showing the correlation of the shooting rate with the mean income and the standard deviation. The first thing to note is the correlation between Mean and rate, which is -0.025369. Although the correlation is small, a negative correlation suggests that as mean income rises, the number of police shootings decreases. This makes sense, as we would assume places with more money have fewer police brutality cases. Another interesting point is that the rate and the standard deviation also have a negative correlation, which suggests that cities with lower income inequality tend to see more police brutality cases.
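To gauge how much weight these small correlations can bear, one option (not part of our original analysis) is to also look at a significance test. Here is a minimal sketch using scipy.stats.pearsonr on the income_shooting_df_all frame built above:
#stats was already imported above; repeated here so the cell stands alone
from scipy import stats
#Pearson correlation and p-value between mean income and shooting rate
r_mean, p_mean = stats.pearsonr(income_shooting_df_all['Mean'], income_shooting_df_all['rate'])
#Pearson correlation and p-value between income inequality (standard deviation) and shooting rate
r_std, p_std = stats.pearsonr(income_shooting_df_all['Stdev'], income_shooting_df_all['rate'])
print(f"Mean vs. rate:  r = {r_mean:.3f}, p = {p_mean:.3f}")
print(f"Stdev vs. rate: r = {r_std:.3f}, p = {p_std:.3f}")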
#graphing relationship between standardized mean income and shooting rate
plt.figure(0)
income_shooting_df_all.plot.scatter(x='Mean_std', y='rate', figsize=(10,5))
plt.title('Means vs. Rates', fontsize=25)
#graphing relationship between standardized income standard deviation and shooting rate
plt.figure(1)
income_shooting_df_all.plot.scatter(x='Stdev_std', y='rate', figsize=(10,5))
plt.title('Standard Deviation vs. Rates', fontsize=25)
plt.show()
The graphs above plot the shooting rate against the standardized mean income and standardized income standard deviation of each city. Notice that the rate vs. mean graph is skewed to the left, which makes sense given the negative correlation: the cities with the highest mean incomes tend to have rates near 0. On the other hand, the standard deviation vs. rate graph shows something a bit different. With a slight left skew but more centered data, we can see that there is a middle range of income inequality where we see the most police brutality cases.
Overall, our project does support our hypothesis that low-income areas tend to have more police shootings. This would make sense, as lower-income areas tend to have more crime and therefore stricter policing. Similarly, places with large income disparities also have stricter police forces, which results in more cases of police brutality.
# refine a data frame to cities with populations greater than 100,000
mask = income_shooting_df_all['population'] > 100000
large_cities_df = income_shooting_df_all.loc[mask].copy()
#graphing relationship between standardized mean income and shooting rate for cities with populations greater than 100,000
plt.figure(0)
large_cities_df.plot.scatter(x='Mean_std', y='rate', figsize=(10,5))
plt.title('Means vs. Rates (City Populations: >100,000)', fontsize=25)
#graphing relationship between standardized income standard deviation and shooting rate for cities with populations greater than 100,000
plt.figure(1)
large_cities_df.plot.scatter(x='Stdev_std', y='rate', figsize=(10,5))
plt.title('Standard Deviation vs. Rates (City Populations: >100,000)', fontsize=25)
plt.show()
#fills NaN with 0 for correlation purposes
large_cities_df.fillna(0, inplace=True)
large_cities_df.corr()
#creates correlation matrix between desired variables
variables = ['Mean', 'Stdev', 'rate']
correlation = large_cities_df[variables].corr()
correlation['rate']
In the above cell you can find the correlations for cities with populations over 100,000. We wanted to refine our data to large cities in order to analyze the income/shooting correlations for high-population areas. Compared to the data for the whole country, this more refined data provides a stronger correlation. A negative correlation here suggests that as mean income rises, the number of police shootings decreases. Again, this makes sense, as we would assume places with more money have fewer police brutality cases. Furthermore, cities with lower income inequality tend to see more police brutality cases.
You can observe trends similar to the original two graphs, but within a more refined and accurate range. The mean graph is mostly skewed to the left, which supports our original analysis that lower-income areas experience higher levels of police brutality. The standard deviation graph is again mostly centered; however, on this more refined scale we can observe a stronger skew to the left. These two observations support our original hypothesis that low-income areas are subjected to higher levels of police brutality.
Utilizing our analysis, we can locate the most vulnerable areas. We can share our findings with police departments to better educate and prepare them, and to help prevent high volumes of shootings in a given city. From this project we learned just how powerful the combination of a variety of Python data tools can be for performing an in-depth analytical study. We also learned that raw counts are not always effective for comparing certain statistics. Overall, data analysis can be used to solve problems that affect our everyday lives.
An article about the death of George Floyd is attached here: https://www.nytimes.com/2020/05/31/us/george-floyd-investigation.html
A link to the Marshall Project, a non-profit group that is spearheading the police reform movement, is attached here: https://www.themarshallproject.org/records/110-police-reform