Do I need to add the days of the year where the crime count was zero? (Crime analysis in R)

I am analyzing crime in the Baltimore area (5 years of data). I create line charts for specific types of crime in specific parts of the area. However, not every type of crime is reported every day in every district, so the data contain no zero-count days: only the days on which a crime was reported appear. Visually, this means the line in the chart never touches the x-axis at zero. Does this negatively affect the trend lines created by stat_smooth, which I use to identify increases or decreases in crime types?

Reproducible code for creating a line chart:

#Read crime data from the GitHub repo into an R data frame
df = read.csv("https://raw.githubusercontent.com/brianthomasbaker/Baltimore-Crime-Analysis/master/Baltimore_SE_Reported_Crime_2010_to_2014.csv", stringsAsFactors=FALSE, sep=",")

#Format CrimeDate column
df$CrimeDate = as.Date(df$CrimeDate, "%m/%d/%Y")

#Create new dataframe of only Larceny From Auto crimes by Day of the Year in Canton (2010-2014)
library(dplyr)
df_cantonlarcauto = df %>%
  filter(Neighborhood == "Canton", Description == "LARCENY FROM AUTO") %>%
  group_by(CrimeDate) %>%
  summarize(crimes = n())

#Create Line Chart using ggplot
library(ggplot2)
ggplot(df_cantonlarcauto, aes(x = CrimeDate, y = crimes, group=1)) +
  geom_line() +
  scale_size_area() +
  stat_smooth(method = "gam") +
  xlab("Year") +
  ylab("Number of Crimes") +
  ylim(0,13) +
  theme(plot.title = element_text(family = "Trebuchet MS", color="#666666", face="bold", size=32, hjust=0)) +
  theme(axis.title = element_text(family = "Trebuchet MS", color="#666666", face="bold", size=22)) +
  ggtitle("Larceny From Auto\nCanton (2010-2014)")

head(df_cantonlarcauto)

If the zero-count days are needed, how can they be added in R?

You can add the missing dates like this:

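library(dplyr)
# Build the full daily date range, then join the observed counts onto it;
# days without a report get crimes = NA
df_cantonlarcauto_missing = tibble(CrimeDate = seq(min(df_cantonlarcauto$CrimeDate), max(df_cantonlarcauto$CrimeDate), by = "day")) %>%
  left_join(df_cantonlarcauto, by = "CrimeDate")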

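If you plot df_cantonlarcauto_missing (ggplot(df_cantonlarcauto_missing, aes(x = CrimeDate, y = crimes, group = 1)) + ...), the days without a report show up as gaps in the line, because their crimes value is NA.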
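To get a feel for how much the missing days matter, here is a small sketch on made-up data. It is not part of the original code: the `reported` data frame is hypothetical, and it assumes the tidyr package is installed, using `complete()` as an alternative to the join above.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: rows exist only for days with at least one report
reported <- tibble(
  CrimeDate = as.Date(c("2014-01-01", "2014-01-04", "2014-01-05", "2014-01-10")),
  crimes    = c(2L, 1L, 3L, 2L)
)

# complete() adds a row for every date in the range; new rows get crimes = NA
daily_na <- reported %>%
  complete(CrimeDate = seq(min(CrimeDate), max(CrimeDate), by = "day"))

# With fill = list(crimes = 0L) the added days get 0 instead of NA
daily_zero <- reported %>%
  complete(CrimeDate = seq(min(CrimeDate), max(CrimeDate), by = "day"),
           fill = list(crimes = 0L))

sum(is.na(daily_na$crimes))  # 6 days had no report
mean(reported$crimes)        # 2: average over reported days only
mean(daily_zero$crimes)      # 0.8: average over all 10 calendar days
```

On this toy series the average over reported days alone is more than twice the true daily average, which is the kind of upward shift a trend line fitted to the original data would inherit.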

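Alternatively, replace the NA values with 0 and smooth the daily counts with a 7-day moving average (rollmean() from the zoo package):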

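library(zoo)  # provides rollmean()
df_cantonlarcauto_missing = tibble(CrimeDate = seq(min(df_cantonlarcauto$CrimeDate), max(df_cantonlarcauto$CrimeDate), by = "day")) %>%
  left_join(df_cantonlarcauto, by = "CrimeDate") %>%
  # Days without a report count as zero crimes
  mutate(crimes = ifelse(is.na(crimes), 0, crimes)) %>%
  # Trailing 7-day mean; the first 6 days have no full window, so pad with NA
  mutate(crimes = c(rep(NA, 6), rollmean(crimes, 7, align = "right")))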

ggplot(df_cantonlarcauto_missing, aes(x = CrimeDate, y = crimes, group=1)) +
  geom_line() +
  scale_size_area() +
  stat_smooth(method = "gam") +
  xlab("Year") +
  ylab("Number of Crimes") +
  # ylim(0,13) +
  theme(plot.title = element_text(family = "Trebuchet MS", color="#666666", face="bold", size=32, hjust=0)) +
  theme(axis.title = element_text(family = "Trebuchet MS", color="#666666", face="bold", size=22)) +
  ggtitle("Larceny From Auto\nCanton (2010-2014)")

[Figure: moving average plot]
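If it is not obvious what the padded rollmean() call computes, the same trailing 7-day average can be sketched in base R on a short made-up vector. The `crimes` vector here is hypothetical, and `stats::filter()` is swapped in for `zoo::rollmean()` only to keep the example free of extra packages.

```r
# Hypothetical toy series of daily counts (zeros already filled in)
crimes <- c(0, 2, 0, 0, 1, 3, 1, 0, 0, 7)

# Trailing 7-day mean: sides = 1 averages the current day and the 6 days
# before it, so the first 6 positions have no full window and come out as NA
weekly_avg <- as.numeric(stats::filter(crimes, rep(1 / 7, 7), sides = 1))

weekly_avg[1:6]  # all NA
weekly_avg[7]    # same as mean(crimes[1:7]), up to floating-point rounding
weekly_avg[10]   # same as mean(crimes[4:10])
```

Each value only looks backwards, so a single busy day (the 7 at the end) raises the average for that day and the six days after it rather than producing a one-day spike.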

Insert NA for the missing days, so that the line is interrupted there. For example:

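# Full sequence of days from the first to the last reported date
xy <- data.frame(CrimeDate = seq(min(df_cantonlarcauto$CrimeDate), max(df_cantonlarcauto$CrimeDate), by = "day"))
# Left join: days without a report get crimes = NA
xy <- merge(xy, df_cantonlarcauto, by = "CrimeDate", all.x = TRUE)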

ggplot(xy, aes(x = CrimeDate, y = crimes, group=1)) +
    geom_line() +
    scale_size_area() +
    stat_smooth(method = "gam") +
    xlab("Year") +
    ylab("Number of Crimes") +
    ylim(0,13) +
    theme(plot.title = element_text(family = "Trebuchet MS", color="#666666", face="bold", size=32, hjust=0)) +
    theme(axis.title = element_text(family = "Trebuchet MS", color="#666666", face="bold", size=22)) +
    ggtitle("Larceny From Auto\nCanton (2010-2014)")
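One thing to keep in mind with the NA approach: it changes how the line is drawn, but probably not the trend line, since stat_smooth() drops rows with a missing y before fitting (ggplot2 warns about removed non-finite values). A rough sketch on made-up data, using a plain lm() fit as a simple stand-in for the gam smoother; the `reported` data and the `trend()` helper are hypothetical:

```r
# Hypothetical toy data: only days with at least one report are present
reported <- data.frame(
  CrimeDate = as.Date(c("2014-01-01", "2014-01-04", "2014-01-05", "2014-01-10")),
  crimes    = c(2, 1, 3, 2)
)

# Every calendar day in the range; days without a report get crimes = NA
all_days <- data.frame(CrimeDate = seq(min(reported$CrimeDate), max(reported$CrimeDate), by = "day"))
with_na  <- merge(all_days, reported, by = "CrimeDate", all.x = TRUE)

# Same data, but days without a report count as zero crimes
with_zero <- with_na
with_zero$crimes[is.na(with_zero$crimes)] <- 0

# Hypothetical helper: linear trend of crimes over the day index
trend <- function(d) {
  d$day <- as.numeric(d$CrimeDate - min(d$CrimeDate))
  coef(lm(crimes ~ day, data = d))
}

trend(reported)   # fit on reported days only
trend(with_na)    # identical: lm() omits the NA rows by default
trend(with_zero)  # different fit once the zero days are included
```

So if the goal is a trend in crimes per calendar day, the zero-filled version is the one whose fit reflects the quiet days; the NA version only changes the drawn line.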

Source: https://habr.com/ru/post/1608854/

