Creating a new variable based on a categorical variable already in the dataset

Suppose I have a dataset with a categorical variable X that takes the values A, B, or C.

I want to create a new variable Y that is:

  • 1 if X = A;
  • 2 if X = B;
  • 3 if X = C.

This is what I have tried so far, and I know it is not correct:

    if (X == A) {
      (Y = 1)
    } else if (X == B) {
      (Y = 2)
    } else {
      (Y = 3)
    }

I keep getting the error:

Object 'Y' not found

How can I create a variable Y that takes these new values based on the values of X?

3 answers

The votes on this question puzzle me quite a bit... but in case you still need an answer:

A loop-based method, as the OP intended, looks like this:

    # Initialize a numeric vector `Y` of the same length as `X`
    Y <- numeric(length(X))

    # Loop through all elements of `X`, using if-else to assign a value to `Y`
    for (i in seq_along(X)) {
      if (X[i] == "A") Y[i] <- 1
      else if (X[i] == "B") Y[i] <- 2
      else if (X[i] == "C") Y[i] <- 3
    }
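The same if/else chain can also be written without a loop. In R, `if` looks at a single TRUE/FALSE value, whereas base R's `ifelse()` works element-wise on a whole vector. The following is a minimal sketch, assuming `X` is a character vector of "A", "B", and "C" (the example values are made up for illustration):

```r
# Example input: a character vector of categories (assumed for illustration)
X <- c("B", "C", "A", "C", "A")

# Nested ifelse() applies the same A/B/C rule element-wise, no loop needed;
# any value other than "A" or "B" falls through to 3, like the OP's final else
Y <- ifelse(X == "A", 1, ifelse(X == "B", 2, 3))

print(Y)
# [1] 2 3 1 3 1
```

Nesting gets hard to read beyond a few categories, which is where the lookup-based approaches below are more convenient.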

A fully vectorized method:

    Y <- match(X, LETTERS[1:3])

Here, LETTERS is a built-in R constant holding the 26 upper-case letters. R has several such constants; you can see them all in the ?Constants documentation.
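As a small usage sketch: `match()` returns the position of each element of `X` in the lookup table, and `NA` for any value that is not in it. Indexing another vector with those positions maps the categories to arbitrary values; `codes` below is a hypothetical set of target values, not something from the question.

```r
X <- c("B", "C", "A", "D")

# Position of each value of X in the lookup table c("A", "B", "C");
# "D" is not in the table, so it becomes NA
Y <- match(X, LETTERS[1:3])
print(Y)
# [1]  2  3  1 NA

# Hypothetical target values for A, B, C, picked by those positions
codes <- c(10, 20, 30)
Z <- codes[match(X, c("A", "B", "C"))]
print(Z)
# [1] 20 30 10 NA
```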


Option 1: take the integer codes of a factor.

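    X
    # [1] "B" "C" "A" "C" "A" "C" "B" "B" "A" "A"
    as.integer(factor(X))
    # [1] 2 3 1 3 1 3 2 2 1 1

as.integer() returns the factor's underlying integer codes. Older versions of R also allowed c(factor(X)), because c() used to drop attributes, but since R 4.1.0 c() applied to a factor returns a factor, so as.integer() (or as.numeric()) is the reliable choice and is also more readable.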
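One caveat worth a quick sketch: `factor()` sorts the levels by default, which happens to give A = 1, B = 2, C = 3 here. If a different coding is wanted, the `levels` argument sets the order explicitly. The reversed order below is only an illustration.

```r
X <- c("B", "C", "A", "C", "A")

# Default levels are sorted, so A = 1, B = 2, C = 3
Y1 <- as.integer(factor(X))
print(Y1)
# [1] 2 3 1 3 1

# Explicit levels control the coding, e.g. C = 1, B = 2, A = 3
Y2 <- as.integer(factor(X, levels = c("C", "B", "A")))
print(Y2)
# [1] 2 1 3 1 3
```

Values of `X` that are not listed in `levels` become `NA`, just as with `match()`.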

Option 2: use a named lookup vector.

    c(A = 1, B = 2, C = 3)[X]
    # B C A C A C B B A A
    # 2 3 1 3 1 3 2 2 1 1

Data:

    set.seed(25)
    X <- sample(LETTERS[1:3], 10, TRUE)
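Note that indexing a named vector by name keeps those names in the result, as the output of Option 2 shows. A minimal sketch, assuming a plain numeric vector is wanted for the new column, wraps the lookup in `unname()`:

```r
X <- c("B", "C", "A")
lookup <- c(A = 1, B = 2, C = 3)

# Indexing by character picks elements by name; the result keeps the names
named <- lookup[X]

# unname() drops the names, leaving a plain numeric vector
Y <- unname(lookup[X])
print(Y)
# [1] 2 3 1
```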

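In this case, consider dplyr::recode from the tidyverse. It is essentially a vectorized switch, which seems to be what you need (in dplyr 1.1.0 and later, recode() is superseded by case_match(), but it still works). Alternatively, you can put the mapping into a second data frame and join it with dplyr::left_join or base::merge.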

    library(tidyverse)

    data <- tribble(
      ~x, ~y,
       1, "A",
       2, "A",
       4, "B",
       5, "C",
       7, "Z"
    )

    data %>%
      mutate(new_var = recode(y, "A" = "first", "B" = "second", "C" = "third", "Z" = "last"))
    #> # A tibble: 5 × 3
    #>       x y     new_var
    #>   <dbl> <chr> <chr>
    #> 1     1 A     first
    #> 2     2 A     first
    #> 3     4 B     second
    #> 4     5 C     third
    #> 5     7 Z     last
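The join-based alternative mentioned above is not shown in the answer, so here is a minimal base R sketch with `merge()`, using the same values as the tribble. The `lookup` data frame is a hypothetical mapping table with one row per category; `all.x = TRUE` makes it a left join.

```r
# Main data, with the same values as the tribble above
data <- data.frame(x = c(1, 2, 4, 5, 7), y = c("A", "A", "B", "C", "Z"))

# Hypothetical lookup table: one row per category and its new label
lookup <- data.frame(y = c("A", "B", "C", "Z"),
                     new_var = c("first", "second", "third", "last"))

# all.x = TRUE keeps every row of `data` (a left join);
# a value of y with no match in `lookup` would get NA in new_var
result <- merge(data, lookup, by = "y", all.x = TRUE)
print(result)
```

Unlike `mutate()`, `merge()` places the key column first and sorts the rows by it by default, so reorder the result (for example with `order(result$x)`) if the original row order matters.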

Source: https://habr.com/ru/post/1262898/

