R - plotting the frequency of observations over time with a small range of values

I would like to graphically display the frequency of observations over time. I have a dataset where hundreds of laws are coded 0-3. I would like to know whether scores 2-3 occur more often over time. Here is an example of the data layout:

Data <- data.frame( year = sample(1998:2004, 200, replace = TRUE), score = sample(1:4, 200, replace = TRUE) ) 

If I draw

 plot(Data$year, Data$score) 

I get a checkered matrix where every spot is filled, but I can't tell which combinations are more common. Is there a way to color or resize each point by the number of observations for a given score / year?

A few notes that may help in answering the question:

1) I do not know how to sample data so that certain numbers are more common than others. My sample draws every number with the same probability. If there is a better way to create my reproducible data so that it reflects more observations in later years, I would like to know how to do it.

2) A scatter plot seemed like the best way to visualize this, but I could be wrong. I am open to other visualizations.

Thanks!

5 answers

This is how I would approach it (hopefully this is what you need).

Create the data (note: when using sample in a question, always call set.seed so the example is reproducible):

    set.seed(123)
    Data <- data.frame(
      year = sample(1998:2004, 200, replace = TRUE),
      score = sample(1:4, 200, replace = TRUE)
    )

Find the frequency of each score per year with table:

    Data2 <- as.data.frame.matrix(table(Data))
    Data2$year <- row.names(Data2)

Use melt to convert it to long format:

    library(reshape2)
    Data2 <- melt(Data2, "year")

Plot the data, using a different color per score and a point size relative to the frequency:
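As a possible shortcut for the table and melt steps above, base R can return the counts in long format directly. This is only a sketch, not the method used in this answer; the object name `counts` and the column name `n` are my own choices.

```r
set.seed(123)
Data <- data.frame(
  year  = sample(1998:2004, 200, replace = TRUE),
  score = sample(1:4, 200, replace = TRUE)
)

# as.data.frame() on a two-way table gives one row per year/score pair;
# responseName sets the name of the count column (assumed name: n)
counts <- as.data.frame(table(Data), responseName = "n")
head(counts)
```

Note that `year` and `score` come back as factors here, so they may need converting if a numeric axis is wanted.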

 library(ggplot2) ggplot(Data2, aes(year, variable, size = value, color = variable)) + geom_point() 


Alternatively, you can use both fill and size to describe the frequency, something like

 ggplot(Data2, aes(year, variable, size = value, fill = value)) + geom_point(shape = 21) 


Here's a different approach:

 ggplot(Data, aes(year)) + geom_histogram(aes(fill = ..count..)) + facet_wrap(~ score) 

Each facet represents a single score value, as indicated in the title of each facet. You can easily get an idea of the counts by looking at the height of the bars and their color (lighter blue indicates more observations).

Of course, you could do this only for score %in% 2:3 if you do not want scores 1 and 4 to be included. In that case, you can do:

 ggplot(Data[Data$score %in% 2:3,], aes(year)) + geom_histogram(aes(fill = ..count..)) + facet_wrap(~ score) 

There are so many answers ... It seems you want to know whether the frequency of scores 2-3 increases over time, so why not plot that directly:

    set.seed(1)
    Data <- data.frame(
      year = sample(1998:2004, 200, replace = TRUE),
      score = sample(0:3, 200, replace = TRUE))
    library(ggplot2)
    # note: fun.y was renamed to fun in ggplot2 3.3.0
    ggplot(Data, aes(x=factor(year), y=score, group=(score>1))) +
      stat_summary(aes(color=(score>1)), fun.y=length, geom="line") +
      scale_color_discrete("score", labels=c("0 - 1","2 - 3")) +
      labs(x="", y="Frequency")
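With uniformly sampled scores the two lines in the plot above should stay roughly flat. To check that the plot would pick up a trend, one could simulate data in which scores of 2-3 become more likely in later years. The sketch below is one way to do that in base R; the weights and the names `p_high`, `high` and `trend` are hypothetical choices of mine, not something taken from the question.

```r
set.seed(1)
years <- 1998:2004
year  <- sample(years, 200, replace = TRUE)

# Hypothetical weights: the chance of a high score (2 or 3)
# rises from 1/8 in 1998 to 7/8 in 2004
p_high <- (year - min(years) + 1) / (length(years) + 1)
high   <- rbinom(length(year), size = 1, prob = p_high) == 1

# Draw 2-3 for the "high" observations and 0-1 for the rest
score <- ifelse(high,
                sample(2:3, length(year), replace = TRUE),
                sample(0:1, length(year), replace = TRUE))
Data <- data.frame(year = year, score = score)

# Counts of low (FALSE) and high (TRUE) scores per year
trend <- table(year = Data$year, high = Data$score > 1)
trend

# Share of high scores within each year
round(prop.table(trend, 1)[, "TRUE"], 2)
```

The same `Data` can then be passed to the ggplot call above to see the two lines diverge.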

    > with(Data, round( prop.table(table(year,score), 1), 3) )
          score
    year       1     2     3     4
      1998 0.308 0.231 0.231 0.231
      1999 0.136 0.273 0.227 0.364
      2000 0.281 0.250 0.219 0.250
      2001 0.129 0.290 0.226 0.355
      2002 0.217 0.174 0.261 0.348
      2003 0.286 0.286 0.200 0.229
      2004 0.387 0.129 0.194 0.290

    png(); plot(jitter(Data$year), jitter(Data$score) ); dev.off()


There are other methods you can use if the number of points is so large that jitter no longer lets you judge how many points there are. You can use a transparent color, which lets you see the density of the points. The last two hex digits of the eight-digit hex number preceded by an octothorpe (#) give the alpha transparency of the color. See ?rgb and ?col2rgb. Compare these two plots, made with new data that has differences in the proportions:

    Data <- data.frame(
      year = rep(1998:2004, length=49000),
      score = sample(1:7, 49000, prob=(1:7)/5, replace = TRUE)
    )
    png(); plot(jitter(Data$year), jitter(Data$score) ); dev.off()

The alpha-transparency example

  png(); plot(jitter(Data$year), jitter(Data$score) , col="#bbbbbb11" );dev.off() 
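If writing the eight-digit hex string by hand feels awkward, a transparent color can also be built from a color name. A small sketch using grDevices (attached by default in R); the name `faint_grey` and the particular color and alpha value are just illustrative choices.

```r
# Build a transparent colour from a name plus an alpha fraction (0 to 1)
faint_grey <- adjustcolor("grey40", alpha.f = 0.07)
faint_grey  # a "#RRGGBBAA" string

# Read back the channels of the colour used above;
# the alpha row is on a 0-255 scale
col2rgb("#bbbbbb11", alpha = TRUE)
```

The result can be passed on the same way as the hex string, for example `plot(jitter(Data$year), jitter(Data$score), col = faint_grey)`.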


Another alternative:

    df <- aggregate(Data$score, by = list(Data$year), table)
    matplot(df$Group.1, df[, 2])
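A variation on the same idea, in case it helps: the year-by-score counts can be taken straight from table, and matplot can draw one labelled line per score. This is a sketch assuming the question's sample data; the names `counts` and `years` are mine.

```r
set.seed(123)
Data <- data.frame(
  year  = sample(1998:2004, 200, replace = TRUE),
  score = sample(1:4, 200, replace = TRUE)
)

# Rows are years, columns are scores, cells are counts
counts <- unclass(table(Data$year, Data$score))
years  <- as.numeric(rownames(counts))

# One line per score; the legend maps line colours to score values
matplot(years, counts, type = "b", pch = 1, lty = 1,
        col = seq_len(ncol(counts)), xlab = "year", ylab = "frequency")
legend("topright", legend = colnames(counts), lty = 1,
       col = seq_len(ncol(counts)), title = "score")
```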

hope this helps


Source: https://habr.com/ru/post/980090/

