Finding the closest approval time for each patient

I have two data sets:

First set:

 patient<-c("A","A","B","B","C","C","C","C")
 arrival<-c("11:00","11:00","13:00","13:00","14:00","14:00","14:00","14:00")
 lastRow<-c("","Yes","","Yes","","","","Yes")

 data1<-data.frame(patient,arrival,lastRow)

Another dataset:

 patient<-c("A","A","A","A","B","B","B","C","C","C")
 availableSlot<-c("11:15","11:35","11:45","11:55","12:55","13:55","14:00","14:00","14:10","17:00")

 data2<-data.frame(patient, availableSlot)

I want to create a column to be added to the first data set so that for each last row of each patient it displays the available slot, which is closest to the time of arrival:

Results:

  patient arrival lastRow availableSlot
       A   11:00        
       A   11:00     Yes     11:15
       B   13:00        
       B   13:00     Yes     12:55
       C   14:00        
       C   14:00        
       C   14:00        
       C   14:00     Yes     14:00

Understand if someone tells me how I can implement this in R.

+4
source share
2 answers

I would use data.table, first clearing, converting to ITime and ignoring redundant rows:

library(data.table)
setDT(data1)[, arrival := as.ITime(as.character(arrival))]
setDT(data2)[, availableSlot := as.ITime(as.character(availableSlot))]
DT1 = unique(data1, by="patient", fromLast=TRUE)

Then you can make a “sliding connection”:

res = data2[DT1, on=.(patient, availableSlot = arrival), roll="nearest", 
  .(patient, availableSlot = x.availableSlot)]

#    patient availableSlot
# 1:       A      11:15:00
# 2:       B      12:55:00
# 3:       C      14:00:00

How it works

Syntax x[i, on=, roll=, j].

  • on= - merge columns.
  • : i x.
  • roll="nearest" on= "" .
  • on= x.* i.*.
  • j , .() - list() .

http://r-datatable.com/Getting-started ?data.table , .


res, ...

# a very nonstandard step:
data1[lastRow == "Yes", availableSlot := res$availableSlot ]

#    patient  arrival lastRow availableSlot
# 1:       A 11:00:00                  <NA>
# 2:       A 11:00:00     Yes      11:15:00
# 3:       B 13:00:00                  <NA>
# 4:       B 13:00:00     Yes      12:55:00
# 5:       C 14:00:00                  <NA>
# 6:       C 14:00:00                  <NA>
# 7:       C 14:00:00                  <NA>
# 8:       C 14:00:00     Yes      14:00:00

data1 availableSlot , , data1$col <- val.

+8

( joel.wilson answer ), R

#Convert dates to POSIXct format
data1$arrival = as.POSIXct(data1$arrival, format = "%H:%M")
data2$availableSlot = as.POSIXct(data2$availableSlot, format = "%H:%M")

#Lookup times from data2$availableSlot closest to data1$arrival
data1$availableSlot = sapply(data1$arrival, function(x)
                    data2$availableSlot[which.min(abs(x - data2$availableSlot))])

#Keep just hour and minutes
data1$availableSlot = strftime(as.POSIXct(data1$availableSlot, 
                                origin = "1970-01-01"), format = "%H:%M")
data1$arrival = strftime(as.POSIXct(data1$arrival), format = "%H:%M")

#Remove times when lastrow is empty
data1$availableSlot[which(data1$lastRow != "Yes")] = ""
+1

Source: https://habr.com/ru/post/1668531/


All Articles