Conditional recoding based on a lookup vector

I need to conditionally recode my data frame d according to a lookup vector.

dput(lookup)
structure(c("Apple", "Apple", "Banana", "Carrot"), .Names = c("101", "102", "102", "103"))
dput(d)
structure(list(pat = c(101, 101, 101, 102, 102, 103), gene = structure(1:6, .Label = c("a", 
"b", "c", "d", "e", "f"), class = "factor"), Apple = c(0.1, 0.2, 
0.3, 0.4, NA, NA), Banana = c(NA, NA, NA, NA, 0.55, NA), Carrot = c(NA, 
NA, NA, NA, NA, 0.6)), .Names = c("pat", "gene", "Apple", "Banana", 
"Carrot"), row.names = c(NA, -6L), class = "data.frame")

d is a wide data frame that I got through reshape. I need to recode any NAs in the columns Apple, Banana and Carrot to 0 if pat matches that column according to the lookup table. In this case, d$Apple[5] and d$Banana[4] should be recoded to 0.

I have been experimenting with recode from dplyr, but I have no idea how to make it look up and recode, not to mention that it has to be done on several columns. There was another related post on recoding variables in R using a lookup table, but it does not seem to apply to my problem. Can anyone help me out? Thanks!

Edit

I tried the following:

library(reshape2)  # melt()
library(dplyr)     # %>% and mutate()
e <- melt(d, id.vars=c("pat", "gene"))
e %>% mutate(test=ifelse(lookup[as.character(pat)] == variable, replace(value, is.na(value), 0), value))

My code works partially. It recodes the NA in d$Apple[5], but not the one in d$Banana[4], because the lookup only returns the first matching value:

lookup["102"]
    102 
"Apple" 

whereas I need the lookup to return both "Apple" and "Banana", and to convert the NAs that fulfil each condition accordingly. Any ideas?

3 answers

No dplyr here, but a base R loop is fairly straightforward.

for(i in unique(lookup)){
    # rows where column i is NA and pat is mapped to column i in the lookup
    need_to_replace = is.na(d[[i]]) & (d$pat %in% names(lookup[lookup %in% i]))
    d[[i]][need_to_replace] = 0
}

d

  pat gene Apple Banana Carrot
1 101    a   0.1     NA     NA
2 101    b   0.2     NA     NA
3 101    c   0.3     NA     NA
4 102    d   0.4   0.00     NA
5 102    e   0.0   0.55     NA
6 103    f    NA     NA    0.6
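
The loop works because it filters the lookup by comparing names or values instead of indexing by a single name. A minimal sketch of the difference, using the lookup from the question (the variable names first_only and all_matches are only for illustration):

```r
# Same named vector as in the question; the name "102" appears twice.
lookup <- c("101" = "Apple", "102" = "Apple", "102" = "Banana", "103" = "Carrot")

# Indexing by name stops at the first element with that name.
first_only <- lookup["102"]

# Comparing the names keeps every element with that name.
all_matches <- lookup[names(lookup) == "102"]

print(unname(first_only))
print(unname(all_matches))
```

The second form is what allows a single pat to map to several columns.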

Alternatively, loop over the rows:

for (i in seq_len(nrow(d))) {
  mtch <- lookup[which(d$pat[i] == names(lookup))]        # lookup matches for row i
  colnum <- which(colnames(d) %in% mtch)                  # column numbers matching the lookup values
  newval <- ifelse(is.na(d[i, colnum]), 0, d[i, colnum])  # replace NA with 0, keep other values
  d[i, colnum] <- unlist(newval)                          # write the values back
}

  pat gene Apple Banana Carrot
1 101    a   0.1     NA     NA
2 101    b   0.2     NA     NA
3 101    c   0.3     NA     NA
4 102    d   0.4   0.00     NA
5 102    e   0.0   0.55     NA
6 103    f    NA     NA    0.6
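
If looping over rows gets slow on a larger frame, the same rule can probably be expressed without an explicit loop. This is only a sketch, assuming the d and lookup from the question; the names fruit_cols, allowed and vals are made up for illustration:

```r
lookup <- structure(c("Apple", "Apple", "Banana", "Carrot"),
                    .Names = c("101", "102", "102", "103"))
d <- data.frame(
  pat    = c(101, 101, 101, 102, 102, 103),
  gene   = factor(c("a", "b", "c", "d", "e", "f")),
  Apple  = c(0.1, 0.2, 0.3, 0.4, NA, NA),
  Banana = c(NA, NA, NA, NA, 0.55, NA),
  Carrot = c(NA, NA, NA, NA, NA, 0.6)
)

fruit_cols <- unique(lookup)  # "Apple" "Banana" "Carrot"

# allowed[i, j] is TRUE when the lookup maps the pat of row i to column j
allowed <- sapply(fruit_cols, function(f) {
  as.character(d$pat) %in% names(lookup)[lookup == f]
})

# Zero out only the NAs that sit in an allowed (row, column) position
vals <- as.matrix(d[fruit_cols])
vals[is.na(vals) & allowed] <- 0
d[fruit_cols] <- as.data.frame(vals)

print(d)
```

Only d$Apple[5] and d$Banana[4] are changed; NAs in columns that the lookup does not associate with the row's pat are left alone.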


I would work with the long format and use the joins from dplyr.

First, I would go back to the long format, for example:

library(tidyverse)
long_format <- d %>% 
  gather(fruit, value, -pat, -gene) 

Then I would create the lookup as a data frame (a tibble) so that we can use joins.

lookup <- tribble(~pat, ~fruit,
                  101, "Apple",
                  102, "Apple",
                  102, "Banana",
                  103, "Carrot")

Using right_join, we keep all the combinations from the lookup. Then we replace the missing values with 0 and spread back to the wide format if you need it.

long_format %>% 
  right_join(lookup) %>% 
  replace_na(replace = list(value = 0)) %>%
  spread(fruit, value)
#> Joining, by = c("pat", "fruit")
#>   pat gene Apple Banana Carrot
#> 1 101    a   0.1     NA     NA
#> 2 101    b   0.2     NA     NA
#> 3 101    c   0.3     NA     NA
#> 4 102    d   0.4   0.00     NA
#> 5 102    e   0.0   0.55     NA
#> 6 103    f    NA     NA    0.6
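
In current tidyr, gather() and spread() are superseded by pivot_longer() and pivot_wider(). A sketch of the same pipeline with those functions, assuming tidyr 1.0.0 or later and the d from the question; lookup_tbl and result are names chosen here for illustration:

```r
library(dplyr)
library(tidyr)

d <- data.frame(
  pat    = c(101, 101, 101, 102, 102, 103),
  gene   = factor(c("a", "b", "c", "d", "e", "f")),
  Apple  = c(0.1, 0.2, 0.3, 0.4, NA, NA),
  Banana = c(NA, NA, NA, NA, 0.55, NA),
  Carrot = c(NA, NA, NA, NA, NA, 0.6)
)

# The lookup as a two-column data frame, one row per allowed (pat, fruit) pair
lookup_tbl <- data.frame(
  pat   = c(101, 102, 102, 103),
  fruit = c("Apple", "Apple", "Banana", "Carrot"),
  stringsAsFactors = FALSE
)

result <- d %>%
  pivot_longer(c(Apple, Banana, Carrot),
               names_to = "fruit", values_to = "value") %>%
  right_join(lookup_tbl, by = c("pat", "fruit")) %>%  # keep only pairs listed in the lookup
  mutate(value = replace_na(value, 0)) %>%            # NA -> 0 for those pairs
  pivot_wider(names_from = fruit, values_from = value)

print(result)
```

Cells that the join dropped come back as NA when the data is widened again, which is why only the NAs named by the lookup end up as 0.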

Source: https://habr.com/ru/post/1688979/

