How to order an R data frame based on a request identifier and a previous request identifier?

I have an R data frame that looks like this:

  User | request_id | previous_request_id
 -------------------------------------
 A | 9 | 5
 A | 3 | 1
 A | 5 | NA
 A | 1 | 9
 B | 2 | 8
 B | 8 | 7
 B | 7 | NA
 B | 4 | 2

Each row corresponds to a request made by a specific user. Each row has a user identifier, request identifier, and identifier of their previous request. If there is no previous request, the value NA is specified in the previous_request_id field.

For each user, I want to number the requests by following the previous request identifier, according to these rules:

  • The order is 1 if previous_request_id is NA
  • The order is 2 if previous_request_id is the request_id with order 1
  • The order is 3 if previous_request_id is the request_id with order 2
  • and so on.

The result of the above rules applied to the first table should look like this:

  User | request_id | previous_request_id | Order
 ---------------------------------------------
 A | 9 | 5 | 2
 A | 3 | 1 | 4
 A | 5 | NA | 1
 A | 1 | 9 | 3
 B | 2 | 8 | 3
 B | 8 | 7 | 2
 B | 7 | NA | 1
 B | 4 | 2 | 4

Is there any way to do this inside R? I suspect a graph database package may be one way to do it, but so far my research (which has focused on Cypher for Neo4j) has not turned up anything.

Any help here would be greatly appreciated!

4 answers

There are many ways to do this, but here is what I came up with ...

    df <- read.delim(text = paste(
      "User|request_id|previous_request_id",
      "A|9|5", "A|3|1", "A|5|NA", "A|1|9",
      "B|2|8", "B|8|7", "B|7|NA", "B|4|2",
      sep = "\n"), sep = "|")

    df$order <- rep(NA, nrow(df))
    df$order[is.na(df$previous_request_id)] <- 1
    df$order[df$order[match(df$previous_request_id, df$request_id)] == 1] <- 2
    df$order[df$order[match(df$previous_request_id, df$request_id)] == 2] <- 3
    df$order[df$order[match(df$previous_request_id, df$request_id)] == 3] <- 4

But note that we repeat the same code (almost) over and over. We can create a loop to shorten the code a bit ...

    max_user_len <- max(table(df$User))
    df$order <- rep(NA, nrow(df))
    df$order[is.na(df$previous_request_id)] <- 1
    sapply(1:max_user_len, function(x)
      df$order[df$order[match(df$previous_request_id, df$request_id)] == x] <<- x + 1)

    > df$order
    [1] 2 4 1 3 3 2 1 4
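The loop above matches on request_id alone, so it relies on ids being unique across users and on knowing the longest chain in advance. As a minimal sketch of the same idea, the variant below keys the match on User plus request_id and repeats until no row changes. The helper name assign_order is made up for this illustration, and the code assumes request ids are unique within each user.

```r
df <- data.frame(
  User = c("A", "A", "A", "A", "B", "B", "B", "B"),
  request_id = c(9, 3, 5, 1, 2, 8, 7, 4),
  previous_request_id = c(5, 1, NA, 9, 8, 7, NA, 2)
)

# Hypothetical helper (not from the answer above): repeat the step
# "previous request has order k -> this request has order k + 1"
# until no further rows can be filled in.
assign_order <- function(d) {
  # Row index of each row's previous request, looked up within the same user
  key  <- paste(d$User, d$request_id)
  prev <- match(paste(d$User, d$previous_request_id), key)

  # Requests with no previous request start a chain
  ord <- ifelse(is.na(d$previous_request_id), 1L, NA_integer_)

  repeat {
    # Rows still unnumbered whose previous request already has an order
    fill <- is.na(ord) & !is.na(ord[prev])
    if (!any(fill)) break
    ord[fill] <- ord[prev[fill]] + 1L
  }
  ord
}

df$order <- assign_order(df)
df$order
```

Rows that never connect back to a request with NA as its predecessor (for example, a dangling previous_request_id) simply keep NA in this sketch.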

There may be more efficient ways to do this, but here is how I would do it using only loops and recursion.

    str <- c("User|request_id|previous_request_id",
             "A|9|5", "A|3|1", "A|5|NA", "A|1|9",
             "B|2|8", "B|8|7", "B|7|NA", "B|4|2")
    tab <- read.table(textConnection(str), sep = "|", header = TRUE)
    tab$order <- NA

    # Recursively compute the order of the request with the given id,
    # storing the result in the global data frame `tab` along the way.
    getOrder <- function(id) {
      i <- which(tab$request_id == id)
      if (is.na(tab$previous_request_id[i])) {
        tab$order[i] <<- 1
      } else {
        tab$order[i] <<- getOrder(tab$previous_request_id[i]) + 1
      }
      tab$order[i]
    }

    # Fill in every row that has not been numbered yet
    for (i in seq_len(nrow(tab))) {
      if (is.na(tab$order[i])) {
        if (is.na(tab$previous_request_id[i])) {
          tab$order[i] <- 1
        } else {
          tab$order[i] <- getOrder(tab$previous_request_id[i]) + 1
        }
      }
    }

Output:

      User request_id previous_request_id order
    1    A          9                   5     2
    2    A          3                   1     4
    3    A          5                  NA     1
    4    A          1                   9     3
    5    B          2                   8     3
    6    B          8                   7     2
    7    B          7                  NA     1
    8    B          4                   2     4
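The same recursion can also be written without assigning into a global data frame. The following is only a sketch under a few assumptions: the function name order_by_recursion is invented here, request ids are unique within a user, and the function is applied to one user at a time so that ids shared between users cannot collide.

```r
tab <- data.frame(
  User = c("A", "A", "A", "A", "B", "B", "B", "B"),
  request_id = c(9, 3, 5, 1, 2, 8, 7, 4),
  previous_request_id = c(5, 1, NA, 9, 8, 7, NA, 2)
)

# Hypothetical helper: takes the two id vectors for ONE user and
# returns the order of each request.
order_by_recursion <- function(request_id, previous_request_id) {
  ord <- rep(NA_integer_, length(request_id))

  # depth(i) returns the order of row i, reusing cached results in `ord`
  depth <- function(i) {
    if (!is.na(ord[i])) return(ord[i])
    p <- previous_request_id[i]
    val <- if (is.na(p)) 1L else depth(match(p, request_id)) + 1L
    ord[i] <<- val  # cache in the enclosing function's `ord`, not globally
    val
  }

  for (i in seq_along(request_id)) depth(i)
  ord
}

# Apply the helper separately to the rows of each user
tab$order <- NA_integer_
for (u in unique(tab$User)) {
  rows <- tab$User == u
  tab$order[rows] <- order_by_recursion(tab$request_id[rows],
                                        tab$previous_request_id[rows])
}
tab$order
```

Because results are cached in ord, each row is resolved only once, however many later requests chain back to it.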

With igraph this can be done by calculating the shortest path from the first request. This may work:

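    library(igraph)

    # Vertex names must be character, so convert every column
    df[] <- lapply(df, as.character)

    # graph_from_edgelist() and distances() are the current igraph names for
    # graph.edgelist() and shortest.paths()
    unlist(
      lapply(split(df, df$User), function(x) {
        # One edge per request: previous_request_id -> request_id (NA rows dropped)
        graphtmp <- graph_from_edgelist(na.omit(as.matrix(x[, 3:2])))
        # Distance from the first request (no predecessor) to every request
        path <- as.vector(distances(graphtmp,
                                    x$request_id[is.na(x$previous_request_id)],
                                    x$request_id))
        path + 1
      }),
      use.names = FALSE)
    #[1] 2 4 1 3 3 2 1 4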

I'm not sure how this compares with the other solutions, since it uses a for loop, but data.table operations and plyr should help speed up some of the recursive components:

    ## DATA UPLOAD
    df <- read.delim(text = paste(
      "User|request_id|previous_request_id",
      "A|9|5", "A|3|1", "A|5|NA", "A|1|9",
      "B|2|8", "B|8|7", "B|7|NA", "B|4|2",
      sep = "\n"), sep = "|")

    ## PACKAGE LOAD
    require(data.table)
    require(plyr)

    ## GET DATA INTO RIGHT FORMAT
    df <- data.table(df)
    df[, User := as.character(User)]
    df[, request_id := as.character(request_id)]
    df[, previous_request_id := as.character(previous_request_id)]

    ## THE ACTUAL PROCESS
    # Create vector of user ids
    user.list <- unique(df$User)

    # Setkey to speed up filtering
    setkey(df, User)

    get_order <- function(user, df) {
      # Consider only one user at a time
      s.df <- df[user]
      # Create an empty ordering column
      s.df$ord <- as.numeric(NA)
      # Redefine NA as 0
      s.df[is.na(previous_request_id) == TRUE, ]$previous_request_id <- "0"
      # Set seed to 0
      seed <- "0"
      # Setkey to speed up filtering
      setkey(s.df, previous_request_id)
      for (i in 1:NROW(s.df)) {
        # Filter by seed and define ord as i
        s.df[seed]$ord <- i
        # Define new seed based on filtered request_id
        seed <- s.df[seed]$request_id
      }
      return(s.df)
    }

    # Loop through user vector and rbindlist to rebind the output
    rebuilt <- rbindlist(llply(.data = user.list, .fun = function(x) {get_order(x, df)}))
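For comparison, the "seed" idea (start from the request with no predecessor, then repeatedly look up the request that points back at the current one) can be sketched in base R without data.table or plyr. The helper name follow_chain is an assumption made for this example, and the sketch assumes request ids are unique within each user.

```r
df <- data.frame(
  User = c("A", "A", "A", "A", "B", "B", "B", "B"),
  request_id = c(9, 3, 5, 1, 2, 8, 7, 4),
  previous_request_id = c(5, 1, NA, 9, 8, 7, NA, 2)
)

# Hypothetical helper: number the requests of ONE user by walking
# forward from the request(s) that have no previous request.
follow_chain <- function(request_id, previous_request_id) {
  ord <- rep(NA_integer_, length(request_id))
  current <- which(is.na(previous_request_id))  # starting row(s)
  step <- 1L
  while (length(current) > 0) {
    ord[current] <- step
    # Next rows: those whose previous request is one of the current rows
    current <- which(previous_request_id %in% request_id[current])
    step <- step + 1L
  }
  ord
}

# Apply the helper to the row indices of each user in turn
df$order <- NA_integer_
for (rows in split(seq_len(nrow(df)), df$User)) {
  df$order[rows] <- follow_chain(df$request_id[rows],
                                 df$previous_request_id[rows])
}
df$order
```

Unlike the keyed lookup above, this keeps the rows in their original order, so the result can be assigned straight back as a column.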

Source: https://habr.com/ru/post/988441/

