How can I make this loop more efficient?

I have a data frame that looks like this:

    user1,product1,0
    user1,product2,2
    user1,product3,1
    user1,product4,2
    user2,product3,0
    user2,product2,2
    user3,product4,0
    user3,product5,3

The data frame has millions of rows. I need to go through each row: if the value in the last column is 0, I save that product name; otherwise, I prepend the most recent product whose value was 0 to the current product name. The result is written to a new data frame.

For example, the resulting matrix should be

    user1,product1
    user1,product1product2
    user1,product1product3
    user1,product1product4
    user2,product3
    user2,product3product2
    user3,product4
    user3,product4product5

I wrote a for loop that goes through each row, and it works, but it is very slow. How can I speed it up? I tried to vectorize it, but I'm not sure how, because each row depends on the value of a previous row.

1 answer

Note that you do not actually have a matrix. A matrix can contain only one atomic type (numeric, integer, character, etc.). What you have is a data.frame.

What you want is easy to do with ifelse and with na.locf from the zoo package.

    x <- structure(list(
      V1 = c("user1", "user1", "user1", "user1", "user2", "user2", "user3", "user3"),
      V2 = c("product1", "product2", "product3", "product4", "product3", "product2", "product4", "product5"),
      V3 = c("0", "2", "1", "2", "0", "2", "0", "3")),
      .Names = c("V1", "V2", "V3"), class = "data.frame", row.names = c(NA, 8L))

    library(zoo)

    # First, create a column that contains the value from the 2nd column
    # when the 3rd column is zero.
    x$V4 <- ifelse(x$V3 == 0, x$V2, NA)

    # Next, replace all the NA with the previous non-NA value
    x$V4 <- na.locf(x$V4)

    # Finally, create a column that contains the concatenated strings
    x$V5 <- ifelse(x$V2 == x$V4, x$V2, paste(x$V4, x$V2, sep = ""))

    # Desired output
    x[, c(1, 5)]
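If installing zoo is not an option, the same carry-forward step can probably be done in base R. The following is only a sketch of a different technique (indexing by cumsum), not the method used above. The names is_base, base, and result are made up for illustration, and it assumes the first row always has a 0 in the third column.

```r
# Same example data as in the question, with character (not factor) columns
x <- data.frame(
  V1 = c("user1", "user1", "user1", "user1", "user2", "user2", "user3", "user3"),
  V2 = c("product1", "product2", "product3", "product4",
         "product3", "product2", "product4", "product5"),
  V3 = c("0", "2", "1", "2", "0", "2", "0", "3"),
  stringsAsFactors = FALSE)

# TRUE on the rows that start a new block (3rd column is zero)
is_base <- x$V3 == 0

# cumsum() numbers the blocks 1, 2, 3, ...; indexing the base products
# by that number repeats each one over its whole block.
# Assumption: the first row is a base row, otherwise the index starts at 0.
base <- x$V2[is_base][cumsum(is_base)]

# Base rows keep their own name, other rows get the base product prepended
result <- data.frame(
  V1 = x$V1,
  V5 = ifelse(is_base, x$V2, paste0(base, x$V2)),
  stringsAsFactors = FALSE)
```

Every step here is vectorized, so it should scale to millions of rows much better than a row-by-row loop, although I have not benchmarked it against the zoo version.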

Since you are using a data.frame, you need to make sure that the "product" column is character and not factor (the zoo-based code above will give odd results if the "product" column is a factor).
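To illustrate the factor caveat, here is a minimal sketch on a small made-up data frame d (not the asker's data; the names d, bad, and good are hypothetical). It shows that ifelse returns the integer codes of a factor rather than its labels, and that converting with as.character first avoids the problem.

```r
# Hypothetical toy data; stringsAsFactors = TRUE forces V2 to be a factor,
# which was the default behavior of data.frame() before R 4.0.0
d <- data.frame(V2 = c("product1", "product2", "product3"),
                V3 = c(0, 2, 0),
                stringsAsFactors = TRUE)

# On a factor, ifelse() yields the underlying integer codes, not the labels
bad <- ifelse(d$V3 == 0, d$V2, NA)

# Convert the factor to character first; ifelse() then keeps the product names
d$V2 <- as.character(d$V2)
good <- ifelse(d$V3 == 0, d$V2, NA)
```

Checking is.factor(x$V2) before running the main code is a cheap way to find out whether this conversion is needed.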


Source: https://habr.com/ru/post/1382886/

