I have a DataFrame containing crime statistics, including the date and time of each crime as well as its category.
0 5/13/2015 8:55 VEHICLE THEFT
1 5/13/2015 8:41 OTHER OFFENSES
2 5/13/2015 8:36 OTHER OFFENSES
3 5/13/2015 8:30 NON-CRIMINAL
4 5/13/2015 8:17 OTHER OFFENSES
5 5/13/2015 8:16 OTHER OFFENSES
6 5/13/2015 8:10 LARCENY/THEFT
7 5/13/2015 8:00 BURGLARY
8 5/13/2015 8:00 MISSING PERSON
9 5/13/2015 8:00 OTHER OFFENSES
10 5/13/2015 8:00 ASSAULT
---
So, for the example above, it should simply print "OTHER OFFENSES".
This is a large dataset with over 400,000 rows.
I need to write a function that takes any time interval (a "from" and a "to" value) and determines which crime category occurred most frequently within it. This is what I have so far, and it does not work:
import pandas as pd
from datetime import timedelta, date

df = pd.read_csv('timeData.csv')
df['Dates'] = pd.to_datetime(df['Dates'])

def daterange(start_date, end_date):
    # Yield each day from start_date up to, but not including, end_date.
    for n in range(int((end_date - start_date).days)):
        yield start_date + timedelta(n)

start_date = date(2015, 5, 1)
end_date = date(2015, 6, 2)

for single_date in daterange(start_date, end_date):
    # Problem: this counts the whole column on every pass and never filters by date.
    df['Category'].value_counts()
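For reference, here is a minimal sketch of the kind of function described above. It assumes the columns are named Dates and Category, as in the code, and that the dates look like the sample rows (month/day/year, 24-hour time). The function name most_common_category and the inline SAMPLE string are hypothetical and only there for illustration. Rather than looping over days, it filters the rows with a boolean mask and takes the top entry of value_counts().

```python
import io
import pandas as pd

# Sample rows copied from the question; the header names are an assumption.
SAMPLE = """Dates,Category
5/13/2015 8:55,VEHICLE THEFT
5/13/2015 8:41,OTHER OFFENSES
5/13/2015 8:36,OTHER OFFENSES
5/13/2015 8:30,NON-CRIMINAL
5/13/2015 8:17,OTHER OFFENSES
5/13/2015 8:16,OTHER OFFENSES
5/13/2015 8:10,LARCENY/THEFT
5/13/2015 8:00,BURGLARY
5/13/2015 8:00,MISSING PERSON
5/13/2015 8:00,OTHER OFFENSES
5/13/2015 8:00,ASSAULT
"""


def most_common_category(df, start, end):
    """Return the most frequent Category with start <= Dates <= end, or None."""
    start, end = pd.Timestamp(start), pd.Timestamp(end)
    # Boolean mask over the whole column; between() includes both endpoints.
    in_range = df.loc[df["Dates"].between(start, end), "Category"]
    if in_range.empty:
        return None
    # idxmax() returns the category label with the highest count.
    return in_range.value_counts().idxmax()


df = pd.read_csv(io.StringIO(SAMPLE))
# Assumed format, matching the sample rows above.
df["Dates"] = pd.to_datetime(df["Dates"], format="%m/%d/%Y %H:%M")

print(most_common_category(df, "2015-05-13 08:00", "2015-05-13 09:00"))
# prints: OTHER OFFENSES
```

With the real file, the pd.read_csv(io.StringIO(SAMPLE)) line would become pd.read_csv('timeData.csv'). If two categories tie within the interval, this sketch returns only one of them.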