Merging Records Over a Time Interval

Let me start by saying that this question pertains to R (the statistical programming language), but I am open to straightforward suggestions for other environments.

The goal is to merge outcomes from data frame (df) A into matching records in df B. This is a one-to-many relationship, but here is the twist: once records are matched on their keys, they must also fall within a specific time window given by a start time and a duration.

For example, several entries in df A:

    OBS ID StartTime Duration Outcome
    1   01 10:12:06  00:00:10 Normal
    2   02 10:12:30  00:00:30 Weird
    3   01 10:15:12  00:01:15 Normal
    4   02 10:45:00  00:00:02 Normal

And from df B:

    OBS ID Time
    1   01 10:12:10
    2   01 10:12:17
    3   02 10:12:45
    4   01 10:13:00

The desired result of the merge would be:

    OBS ID Time     Outcome
    1   01 10:12:10 Normal
    3   02 10:12:45 Weird

Desired result: data frame B with the outcomes merged in from A. Observations 2 and 4 were dropped because, although their IDs matched records in A, they did not fall within any of the given time intervals.

Question

Is it possible to perform this kind of operation in R, and if so, how would you get started? If not, can you suggest an alternative tool?

+4
3 answers

Data setup

First set up the input data frames. We create two versions of each data frame: A and B simply use character columns for the times, while At and Bt use the chron package's "times" class for the times (which has the advantage over the "character" class that values can be added and subtracted):

    LinesA <- "OBS ID StartTime Duration Outcome
    1 01 10:12:06 00:00:10 Normal
    2 02 10:12:30 00:00:30 Weird
    3 01 10:15:12 00:01:15 Normal
    4 02 10:45:00 00:00:02 Normal"

    LinesB <- "OBS ID Time
    1 01 10:12:10
    2 01 10:12:17
    3 02 10:12:45
    4 01 10:13:00"

    A <- At <- read.table(textConnection(LinesA), header = TRUE,
                          colClasses = c("numeric", rep("character", 4)))
    B <- Bt <- read.table(textConnection(LinesB), header = TRUE,
                          colClasses = c("numeric", rep("character", 2)))

    # in At and Bt convert times columns to "times" class
    library(chron)
    At$StartTime <- times(At$StartTime)
    At$Duration <- times(At$Duration)
    Bt$Time <- times(Bt$Time)

sqldf with the times class

Now we can perform the calculation using the sqldf package. We use method = "raw" (which does not assign classes to the output), so we must assign the "times" class to the output "Time" column ourselves:

    library(sqldf)
    out <- sqldf("select Bt.OBS, ID, Time, Outcome
                  from At join Bt using(ID)
                  where Time between StartTime and StartTime + Duration",
                 method = "raw")
    out$Time <- times(as.numeric(out$Time))

Result:

    > out
      OBS ID     Time Outcome
    1   1 01 10:12:10  Normal
    2   3 02 10:12:45   Weird

With the development version of sqldf, this can be done without method = "raw", and the "Time" column will automatically be set to the "times" class by sqldf's class assignment heuristic:

    library(sqldf)
    source("http://sqldf.googlecode.com/svn/trunk/R/sqldf.R") # grab devel ver
    sqldf("select Bt.OBS, ID, Time, Outcome
           from At join Bt using(ID)
           where Time between StartTime and StartTime + Duration")

sqldf with the character class

It is actually possible to avoid the "times" class altogether by performing all the time calculations in SQLite on the character strings, using SQLite's strftime function. Unfortunately, the SQL statement is a little more involved:

    sqldf("select B.OBS, ID, Time, Outcome
           from A join B using(ID)
           where strftime('%s', Time) - strftime('%s', StartTime)
             between 0 and strftime('%s', Duration) - strftime('%s', '00:00:00')")

EDIT:

A series of edits that fixed grammar, added additional approaches, and fixed/improved the read.table statements.

EDIT:

Simplified / improved final sqldf statement.

+4

Here is an example:

    # first, merge by ID
    z <- merge(A[, -1], B, by = "ID")

    # convert string to POSIX time
    z <- transform(z,
      s_t = as.numeric(strptime(as.character(z$StartTime), "%H:%M:%S")),
      dur = as.numeric(strptime(as.character(z$Duration), "%H:%M:%S")) -
            as.numeric(strptime("00:00:00", "%H:%M:%S")),
      tim = as.numeric(strptime(as.character(z$Time), "%H:%M:%S")))

    # subset by time range
    subset(z, s_t < tim & tim < s_t + dur)

Output:

      ID StartTime Duration Outcome OBS     Time        s_t dur        tim
    1  1  10:12:06 00:00:10  Normal   1 10:12:10 1321665126  10 1321665130
    2  1  10:12:06 00:00:10  Normal   2 10:12:15 1321665126  10 1321665135
    7  2  10:12:30 00:00:30   Weird   3 10:12:45 1321665150  30 1321665165

OBS #2 looks to be in range. Does that make sense?

+2

Combine the two data frames with merge(). Then subset() the resulting data.frame with the condition Time >= StartTime & Time <= StartTime + Duration, or whatever rule makes sense to you.
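A minimal base R sketch of this merge-then-subset idea, using the sample data from the question. The helper `to_seconds` is a hypothetical name introduced here for illustration (it is not part of any package), and the inclusive bounds on both ends of the interval are an assumption about what "within the time frame" should mean:

```r
# Sample data copied from the question; times kept as character strings.
A <- data.frame(
  OBS = 1:4,
  ID = c("01", "02", "01", "02"),
  StartTime = c("10:12:06", "10:12:30", "10:15:12", "10:45:00"),
  Duration = c("00:00:10", "00:00:30", "00:01:15", "00:00:02"),
  Outcome = c("Normal", "Weird", "Normal", "Normal"),
  stringsAsFactors = FALSE
)
B <- data.frame(
  OBS = 1:4,
  ID = c("01", "01", "02", "01"),
  Time = c("10:12:10", "10:12:17", "10:12:45", "10:13:00"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: convert "HH:MM:SS" strings to seconds since midnight.
to_seconds <- function(x) {
  parts <- strsplit(x, ":", fixed = TRUE)
  vapply(parts, function(p) sum(as.numeric(p) * c(3600, 60, 1)), numeric(1))
}

# One-to-many join on ID; drop A's OBS column so that B's OBS is kept as-is.
merged <- merge(B, A[, -1], by = "ID")

# Keep rows whose Time lies in [StartTime, StartTime + Duration] (inclusive).
start <- to_seconds(merged$StartTime)
end <- start + to_seconds(merged$Duration)
tm <- to_seconds(merged$Time)
result <- merged[tm >= start & tm <= end, c("OBS", "ID", "Time", "Outcome")]

# Restore B's original ordering and tidy the row names.
result <- result[order(result$OBS), ]
rownames(result) <- NULL
result
```

Note that merge() first builds every ID-matched pair before filtering, so on large inputs the intermediate data frame can be much bigger than the final result. Change `>=` and `<=` to `>` and `<` if the interval endpoints should be excluded.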

+1

Source: https://habr.com/ru/post/1381927/

