How to count the number of observations at given intervals in R?

I have data that includes variables for the hour, minute, and second for each observation. I want to count the number of observations until 3 in the morning, all observations until 6 in the morning, all observations until 9 in the morning, etc. Any help on this would be greatly appreciated.

Sample data:

    day hour minute second
    01  17   10     03
    01  17   14     20
    01  17   25     27
    01  17   32     39
    01  17   33     40
    01  17   34     10
    01  17   34     14
    01  17   34     16
    01  17   34     21
    01  17   34     23
    01  17   34     25
    01  17   34     31
    01  17   34     36

I have about 300,000 observations like this.

hour: int 17 17 17 17 17 17 17 17 17 17 17

minute: int 10 14 25 32 33 34 34 34 34 34

second: int 3 20 27 39 40 10 14 16 21 23

+4
3 answers

One approach is to create a new variable based on your binning criteria, and then tabulate that variable:

    set.seed(1)
    dat <- data.frame(hour   = sample(0:23, 100, TRUE, prob = runif(24)),
                      minute = sample(0:59, 100, TRUE, prob = runif(60)),
                      second = sample(0:59, 100, TRUE, prob = runif(60)))

    # Adjust bins accordingly
    dat <- transform(dat, bin = ifelse(hour < 3, "Before 3",
                                ifelse(hour < 6, "Before 6",
                                ifelse(hour < 9, "Before 9", "Later in day"))))

    as.data.frame(table(dat$bin))
              Var1 Freq
    1     Before 3    7
    2     Before 6   17
    3     Before 9   19
    4 Later in day   57

Depending on the number of bins required, nested ifelse() statements may become unwieldy, but this should give you a start. Please update your question with more detail if you get stuck.
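If the nested ifelse() calls get out of hand, one possible alternative is cut(), which assigns every value to an interval in a single call. The sketch below is not part of the answer above: the hour vector is made up, and the regular 3-hour bins and the "Before N" labels are assumptions. Taking cumsum() of the table gives the running totals the question asks about (everything before 3, everything before 6, and so on).

```r
# Hypothetical hours of day (0-23); replace with your own data$hour
hour <- c(0, 1, 2, 2, 4, 5, 7, 8, 8, 13, 17, 17, 23)

# Half-open 3-hour bins: [0,3), [3,6), ..., [21,24)
breaks <- seq(0, 24, by = 3)
bin <- cut(hour, breaks = breaks, right = FALSE,
           labels = paste("Before", breaks[-1]))

per_bin    <- table(bin)       # observations falling inside each 3-hour bin
cumulative <- cumsum(per_bin)  # observations before 3, before 6, before 9, ...

print(per_bin)
print(cumulative)
```

With right = FALSE an observation at exactly 03:00:00 lands in the 3-to-6 bin, so "Before 3" here means strictly before 3 o'clock.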

+6

How about length(which(data$hour <= 2))? I compared against hour 2 (rather than 3:00:00) to avoid having to deal with minutes and seconds in the first place. Then use a loop or an apply call over all the hours you want to count.

If you need to restart the counter every day, do the same within each value of data$day.
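A minimal sketch of that idea, assuming a data frame with day and hour columns. The data values, the cutoffs vector and the names overall and by_day are made up for illustration. It uses sum() on the logical comparison, which gives the same count as length(which(...)).

```r
# Hypothetical data; 'dat', its columns and its values are assumptions
dat <- data.frame(day  = c(1, 1, 1, 1, 2, 2, 2),
                  hour = c(0, 2, 5, 17, 1, 4, 8))

cutoffs <- c(3, 6, 9)  # "until 3", "until 6", "until 9"

# Overall counts: observations with hour strictly before each cutoff
overall <- sapply(cutoffs, function(h) sum(dat$hour < h))
names(overall) <- paste("Before", cutoffs)

# Same counts, restarted for each day: rows are days, columns are cutoffs
by_day <- sapply(cutoffs, function(h) tapply(dat$hour < h, dat$day, sum))
colnames(by_day) <- paste("Before", cutoffs)

print(overall)
print(by_day)
```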

+3

This approach gives you more flexibility if you decide that you need different cutoff times. You can find the number of observations at or before any point in time (not just whole hours). Because I'm lazy, I did it by treating everything as character strings.

    # 1. Create a fake data set as Chase did
    set.seed(1)
    dat <- data.frame(hour   = sample(0:23, 100, TRUE, prob = runif(24)),
                      minute = sample(0:59, 100, TRUE, prob = runif(60)),
                      second = sample(0:59, 100, TRUE, prob = runif(60)))

    # 2. Create a function to turn your single digits double and everything into character
    dig <- function(x) {
      ifelse(nchar(as.character(x)) < 2,
             paste("0", as.character(x), sep = ""),
             as.character(x))
    }

    # 3. Use the dig function to make a character data frame
    dat <- data.frame(sapply(dat, dig))

    # 4. Paste hour, minute and second together into a new variable (HHMMSS as a number)
    dat <- transform(dat, time = as.numeric(paste(hour, minute, second, sep = "")))

    # 5. Function to take that variable and compare it to the cutoff time
    n.obs <- function(var, hour = '0', min = '00', sec = '00', pm = FALSE) {
      hour <- if (pm) as.character(as.numeric(hour) + 12) else hour
      bench <- as.numeric(paste(hour, min, sec, sep = ""))
      length(var[var <= bench])
    }

    # Try it out
    n.obs(dat$time, '2')
    n.obs(dat$time, '2', pm = TRUE)
    n.obs(dat$time, '14', pm = FALSE)  # notice same as above because pm = FALSE
    n.obs(dat$time, hour = '14', min = '30', pm = FALSE)
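For comparison, the same kind of cutoff count can be sketched without any character handling by converting each observation to seconds since midnight. This is a different technique from the string-based code above, not a rewrite of it. The data values and the names secs and n_obs are assumptions, and the cutoff is given on a 24-hour clock rather than through a pm flag.

```r
# Hypothetical data; 'dat' and its values are assumptions
dat <- data.frame(hour   = c(0, 2, 2, 14, 14, 17),
                  minute = c(15, 59, 30, 29, 31, 10),
                  second = c(0, 59, 0, 0, 0, 3))

# Seconds elapsed since midnight for each observation
dat$secs <- dat$hour * 3600 + dat$minute * 60 + dat$second

# Count observations at or before a cutoff time (24-hour clock)
n_obs <- function(secs, hour = 0, min = 0, sec = 0) {
  sum(secs <= hour * 3600 + min * 60 + sec)
}

before_3    <- n_obs(dat$secs, hour = 3)             # up to 03:00:00
before_1430 <- n_obs(dat$secs, hour = 14, min = 30)  # up to 14:30:00
print(c(before_3, before_1430))
```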
+1

Source: https://habr.com/ru/post/1398101/

