Return the existing value as-is for non-matching elements when using the match function in R

I have a much larger existing data frame. For this small example, I would like to replace some values of one variable (state in df1) with newstate from df2, matching on the first column. My problem is that values are returned as NA, since only some of the rows are matched in the new data frame (df2).

Existing data frame:

    state = c("CA","WA","OR","AZ")
    first = c("Jim","Mick","Paul","Ron")
    df1 <- data.frame(first, state)

      first state
    1   Jim    CA
    2  Mick    WA
    3  Paul    OR
    4   Ron    AZ

New data frame matching existing data frame

    state = c("CA","WA")
    newstate = c("TX", "LA")
    first = c("Jim","Mick")
    df2 <- data.frame(first, state, newstate)

      first state newstate
    1   Jim    CA       TX
    2  Mick    WA       LA

I tried to use match, but it returns NA for "state" wherever the "first" value in df1 has no corresponding "first" value in df2.

    df1$state <- df2$newstate[match(df1$first, df2$first)]

      first state
    1   Jim    TX
    2  Mick    LA
    3  Paul  <NA>
    4   Ron  <NA>

Is there a way to ignore the non-matches, or to have a non-match return the existing value as-is? This is an example of the desired result: the states for Jim and Mick are updated, but the states for Paul and Ron do not change.

      first state
    1   Jim    TX
    2  Mick    LA
    3  Paul    OR
    4   Ron    AZ
3 answers

Is this what you want? By the way, if you really don't want to work with factors, use stringsAsFactors = FALSE in your data.frame call. Note the use of nomatch = 0 in match: unmatched entries get index 0, and a 0 subscript selects nothing.

    > state = c("CA","WA","OR","AZ")
    > first = c("Jim","Mick","Paul","Ron")
    > df1 <- data.frame(first, state, stringsAsFactors = FALSE)
    > state = c("CA","WA")
    > newstate = c("TX", "LA")
    > first = c("Jim","Mick")
    > df2 <- data.frame(first, state, newstate, stringsAsFactors = FALSE)
    > df1
      first state
    1   Jim    CA
    2  Mick    WA
    3  Paul    OR
    4   Ron    AZ
    > df2
      first state newstate
    1   Jim    CA       TX
    2  Mick    WA       LA
    >
    > # create an index for the matches
    > indx <- match(df1$first, df2$first, nomatch = 0)
    > df1$state[indx != 0] <- df2$newstate[indx]
    > df1
      first state
    1   Jim    TX
    2  Mick    LA
    3  Paul    OR
    4   Ron    AZ
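The same idea can also be written with the default NA from match and an explicit fallback. This is only a sketch of an alternative, not a replacement for the nomatch = 0 approach; it rebuilds the data from the question so that it runs on its own, and it assumes character columns (stringsAsFactors = FALSE), since ifelse on factors would return integer codes.

```r
# Data from the question, with character columns rather than factors
df1 <- data.frame(first = c("Jim", "Mick", "Paul", "Ron"),
                  state = c("CA", "WA", "OR", "AZ"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(first = c("Jim", "Mick"),
                  state = c("CA", "WA"),
                  newstate = c("TX", "LA"),
                  stringsAsFactors = FALSE)

# Position of each df1$first in df2$first; NA where there is no match
indx <- match(df1$first, df2$first)

# Take the new state where a match exists, otherwise keep the old one
df1$state <- ifelse(is.na(indx), df1$state, df2$newstate[indx])
df1
```

Both versions leave the rows for Paul and Ron untouched; this one just makes the "keep the old value" branch visible in the code.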

I think you will get better behavior with character vectors than with factors.

    > df1 <- data.frame(first, state, stringsAsFactors=FALSE)
    > state = c("CA","WA")
    > newstate = c("TX", "LA")
    > first = c("Jim","Mick")
    > df2 <- data.frame(first, state, newstate, stringsAsFactors=FALSE)
    > df1[ match(df2$first, df1$first ), "state"] <- df2$newstate
    > df1
      first state
    1   Jim    TX
    2  Mick    LA
    3  Paul    OR
    4   Ron    AZ
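One caveat with matching in this direction: it relies on every df2$first being present in df1. If df2 had a name that df1 lacks, match would return NA for it and the row index would contain NA. A hedged sketch of a guard follows; the extra "Zed" row is hypothetical and is only there to trigger that case.

```r
df1 <- data.frame(first = c("Jim", "Mick", "Paul", "Ron"),
                  state = c("CA", "WA", "OR", "AZ"),
                  stringsAsFactors = FALSE)

# "Zed" is a made-up row that has no counterpart in df1
df2 <- data.frame(first = c("Jim", "Mick", "Zed"),
                  state = c("CA", "WA", "NY"),
                  newstate = c("TX", "LA", "FL"),
                  stringsAsFactors = FALSE)

# Row of df1 that each df2 entry refers to; NA when df1 has no such name
pos <- match(df2$first, df1$first)
found <- !is.na(pos)

# Assign only the entries that were actually found in df1
df1$state[pos[found]] <- df2$newstate[found]
df1
```

With the original df2 from the question, found is TRUE everywhere and this reduces to the one-line assignment above.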
A data.table alternative, joining on keyed columns:

    library(data.table)
    DT1 <- as.data.table(df1)
    DT2 <- as.data.table(df2)
    setkey(DT1, first, state)
    setkey(DT2, first, state)

    DT1[DT2]
    #    first state newstate
    # 1:   Jim    CA       TX
    # 2:  Mick    WA       LA

Note that [.data.table also has a nomatch argument, for example:

    DT2[DT1, nomatch=0]
    #    first state newstate
    # 1:   Jim    CA       TX
    # 2:  Mick    WA       LA

    DT2[DT1, nomatch=NA]
    #    first state newstate
    # 1:   Jim    CA       TX
    # 2:  Mick    WA       LA
    # 3:  Paul    OR       NA
    # 4:   Ron    AZ       NA
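The joins above return a joined table but do not yet give the result asked for in the question. One way to get there with data.table might be an update join, sketched below. It assumes a data.table version that supports the on= argument, and it rebuilds the tables directly so the example is self-contained.

```r
library(data.table)

DT1 <- data.table(first = c("Jim", "Mick", "Paul", "Ron"),
                  state = c("CA", "WA", "OR", "AZ"))
DT2 <- data.table(first = c("Jim", "Mick"),
                  state = c("CA", "WA"),
                  newstate = c("TX", "LA"))

# Update join: for rows of DT1 whose `first` matches a row of DT2,
# overwrite state with that row's newstate (the i. prefix refers to DT2).
# Rows of DT1 without a match are left as they are.
DT1[DT2, on = "first", state := i.newstate]
DT1
```

Because := modifies DT1 by reference, no reassignment is needed, and the rows for Paul and Ron keep their original states.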


Source: https://habr.com/ru/post/1203984/

