Case statement equivalent in R

Question

I have a variable in a data frame that usually takes 7-8 distinct values. I want to collapse those values into 3 or 4 new categories in a new variable within the same data frame. What is the best approach?

I would use a CASE statement in a SQL-like tool, but I am not sure how to approach this in R.

Any help you can provide would be greatly appreciated!

+48

r

Btibert3 Jan 07 2018-11-11T00:

13 answers

Henrico · Answer 1 · 2011-01-07 18:15

Take a look at the cases function from the memisc package. It implements case functionality with two different ways to use it. From the examples in the package:

    z1 = cases(
        "Condition 1" = x < 0,
        "Condition 2" = y < 0,  # only applies if x >= 0
        "Condition 3" = TRUE
    )

where x and y are two vectors.
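For comparison, the same first-match-wins logic can be sketched in base R with nested ifelse() calls. The vectors x and y below are made-up sample data for illustration, not taken from the memisc examples.

```r
# Hypothetical sample vectors, for illustration only
x <- c(-1, 2, 3)
y <- c(5, -4, 6)

# Conditions are checked in order; the first one that is TRUE wins,
# so "Condition 2" is only reached when x >= 0
z1 <- ifelse(x < 0, "Condition 1",
      ifelse(y < 0, "Condition 2",
                    "Condition 3"))
z1
```

Nesting gets hard to read beyond three or four conditions, which is where a dedicated case-style function pays off.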

Marek · Answer 2 · 2011-09-12 15:57

If you have a factor, you can change its levels using the standard method:

    df <- data.frame(name = c('cow','pig','eagle','pigeon'),
                     stringsAsFactors = FALSE)
    df$type <- factor(df$name)  # First step: copy vector and make it factor
    # Change levels:
    levels(df$type) <- list(
        animal = c("cow", "pig"),
        bird = c("eagle", "pigeon")
    )
    df
    #     name   type
    # 1    cow animal
    # 2    pig animal
    # 3  eagle   bird
    # 4 pigeon   bird

You can write a simple function as a wrapper:

    changelevels <- function(f, ...) {
        f <- as.factor(f)
        levels(f) <- list(...)
        f
    }

    df <- data.frame(name = c('cow','pig','eagle','pigeon'),
                     stringsAsFactors = TRUE)
    df$type <- changelevels(df$name, animal = c("cow", "pig"), bird = c("eagle", "pigeon"))

Prasad Chalasani · Answer 3 · 2011-01-07 13:49

Here is one way using switch:

    df <- data.frame(name = c('cow','pig','eagle','pigeon'),
                     stringsAsFactors = FALSE)
    df$type <- sapply(df$name, switch,
                      cow = 'animal',
                      pig = 'animal',
                      eagle = 'bird',
                      pigeon = 'bird')

    > df
        name   type
    1    cow animal
    2    pig animal
    3  eagle   bird
    4 pigeon   bird

The only drawback is that you have to keep writing the category name (animal, etc.) for each item. It is syntactically more convenient to define the categories as shown below (see the very similar question "How to add a column to a data frame in R"):

 myMap <- list(animal = c('cow', 'pig'), bird = c('eagle', 'pigeon'))

and then somehow "invert" this mapping. I wrote my own invMap function:

    invMap <- function(map) {
        items <- as.character(unlist(map))
        nams <- unlist(Map(rep, names(map), sapply(map, length)))
        names(nams) <- items
        nams
    }

and then invert the above mapping as follows:

    > invMap(myMap)
         cow      pig    eagle   pigeon
    "animal" "animal"   "bird"   "bird"

And then it's easy to use this to add a type column to the data frame:

    df <- transform(df, type = invMap(myMap)[name])

    > df
        name   type
    1    cow animal
    2    pig animal
    3  eagle   bird
    4 pigeon   bird
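As an alternative sketch of the same inversion, base R's stack() can flatten the named list into a two-column data frame, from which a lookup vector is easy to build. The names long and lookup are only illustrative.

```r
myMap <- list(animal = c("cow", "pig"), bird = c("eagle", "pigeon"))

# stack() turns the named list into a data frame with columns
# `values` (the items) and `ind` (the list names, as a factor)
long <- stack(myMap)

# Named character vector: item -> category
lookup <- setNames(as.character(long$ind), long$values)

df <- data.frame(name = c("cow", "pig", "eagle", "pigeon"),
                 stringsAsFactors = FALSE)
df$type <- unname(lookup[df$name])
df
```

Indexing the lookup vector by name is the same trick that invMap relies on; stack() just replaces the hand-written inversion step.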

Gregory Demin · Answer 4 · 2011-01-07 09:34

Imho, the simplest and most universal code:

    dft = data.frame(x = sample(letters[1:8], 20, replace = TRUE))
    dft = within(dft, {
        y = NA
        y[x %in% c('a','b','c')] = 'abc'
        y[x %in% c('d','e','f')] = 'def'
        y[x %in% 'g'] = 'g'
        y[x %in% 'h'] = 'h'
    })

adamsss6 · Answer 5 · 2016-07-11 12:57

I don't see any suggestions for plain switch. Code example (run it):

    x <- "three"
    y <- 0
    switch(x,
           one = {y <- 5},
           two = {y <- 12},
           three = {y <- 432})
    y

Ian Fellows · Answer 6 · 2011-01-07 03:16

You can use recode from the car package:

    library(ggplot2)  # provides the diamonds data
    library(car)
    diamonds$new_var <- recode(diamonds$clarity, "'I1' = 'low';'SI2' = 'low';else = 'high';")
    head(diamonds$new_var, 10)

42 - · Answer 7 · 2011-01-07 03:56

There is a switch statement, but I can never seem to get it to work the way I think it should. Since you did not provide an example, I will make one using a factor variable:

    dft <- data.frame(x = sample(letters[1:8], 20, replace = TRUE),
                      stringsAsFactors = TRUE)  # needed in R >= 4.0 to get a factor
    levels(dft$x)
    [1] "a" "b" "c" "d" "e" "f" "g" "h"

If you specify the categories you want in an order that matches the reassignment, you can use the factor or a numeric variable as an index:

    c("abc", "abc", "abc", "def", "def", "def", "g", "h")[dft$x]
     [1] "def" "h"   "g"   "def" "def" "abc" "h"   "h"   "def" "abc" "abc" "abc" "h"   "h"   "abc"
    [16] "def" "abc" "abc" "def" "def"

    dft$y <- c("abc", "abc", "abc", "def", "def", "def", "g", "h")[dft$x]
    str(dft)
    'data.frame':   20 obs. of  2 variables:
     $ x: Factor w/ 8 levels "a","b","c","d",..: 4 8 7 4 6 1 8 8 5 2 ...
     $ y: chr  "def" "h" "g" "def" ...

I later learned that there are really two different switch behaviors. switch is not a generic function, but you should think of it as either switch.numeric or switch.character. If your first argument is an R factor, you get the switch.numeric behavior, which is likely to cause problems, since most people see factors displayed as character and wrongly assume that all functions will treat them as such.
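A small sketch of that pitfall, using a made-up factor whose level order differs from the order of the switch alternatives. With a factor, switch picks the alternative by integer code (and R emits a warning, suppressed here); with a character string it matches by name.

```r
# Hypothetical factor: the value "a" has integer code 3 because of the level order
g <- factor("a", levels = c("c", "b", "a"))

# Character EXPR: matched against the argument names
by_name <- switch(as.character(g), a = "first", b = "second", c = "third")

# Factor EXPR: treated as its integer code, so the 3rd alternative is chosen
by_position <- suppressWarnings(
  switch(g, a = "first", b = "second", c = "third")
)

by_name      # "first"
by_position  # "third"
```

Wrapping the factor in as.character() before calling switch avoids the surprise.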

jamesM · Answer 8 · 2011-09-09 20:28

I don't like any of these; they are not clear to the reader or the potential user. I just use an anonymous function. The syntax is not as slick as a case statement, but the evaluation is similar to a case statement and it is not that painful. It also assumes that you evaluate it where your variables are defined.

    result <- (function() {
        if (x == 10 | y < 5) return('foo')
        if (x == 11 & y == 5) return('bar')
    })()

All of those parentheses are needed to enclose and then call the anonymous function.
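One thing to watch with this pattern: if no condition matches, the function returns NULL. A minimal sketch with a default branch follows; the values of x and y are hypothetical, and the scalar operators || and && are used because the conditions test single values.

```r
# Hypothetical scalar inputs, for illustration only
x <- 3
y <- 7

result <- (function() {
  if (x == 10 || y < 5) return("foo")
  if (x == 11 && y == 5) return("bar")
  "other"  # default, plays the role of ELSE in a SQL CASE
})()
result
```

This works for single values only; for whole columns a vectorized approach such as ifelse() is needed instead.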

Evan Cortens · Answer 9 · 2017-01-26 03:51

case_when(), which was added to dplyr in May 2016, solves this problem in a manner similar to memisc::cases().

For example:

    library(dplyr)
    mtcars %>%
        mutate(category = case_when(
            .$cyl == 4 & .$disp < median(.$disp) ~ "4 cylinders, small displacement",
            .$cyl == 8 & .$disp > median(.$disp) ~ "8 cylinders, large displacement",
            TRUE ~ "other"
        ))
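In later dplyr releases (from 0.7.0 onward, as far as I know) the .$ prefix is no longer required inside mutate(), so the same example can be sketched with bare column names. The variable name result is only illustrative.

```r
library(dplyr)

result <- mtcars %>%
  mutate(category = case_when(
    # Conditions are evaluated in order; the first match wins
    cyl == 4 & disp < median(disp) ~ "4 cylinders, small displacement",
    cyl == 8 & disp > median(disp) ~ "8 cylinders, large displacement",
    TRUE ~ "other"  # catch-all, like ELSE in SQL
  ))

table(result$category)
```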

Aaron · Answer 10 · 2011-09-10 20:03

A concrete example would help. If this is a factor, which is likely, simply set the factor levels accordingly.

Say you have a factor with letters A through E like this.

    > a <- factor(rep(LETTERS[1:5], 2))
    > a
     [1] A B C D E A B C D E
    Levels: A B C D E

To join levels B and C and call it BC, simply change the names of these levels to BC.

    > levels(a) <- c("A", "BC", "BC", "D", "E")
    > a
     [1] A  BC BC D  E  A  BC BC D  E
    Levels: A BC D E

The result is as desired.

kuba · Answer 11 · 2013-11-17 11:58

If you want SQL-like syntax, you can simply use the sqldf package. The function to use is also named sqldf, and the syntax is as follows:

 sqldf(<your query in quotation marks>)
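As a rough sketch of what such a query could look like for the recoding problem in the question, here is a CASE expression run against a small made-up data frame. The data and the column name type are assumptions for illustration; sqldf uses SQLite by default.

```r
library(sqldf)

# Hypothetical example data
df <- data.frame(name = c("cow", "pig", "eagle", "pigeon"),
                 stringsAsFactors = FALSE)

# sqldf looks up `df` in the calling environment and queries it as a table
res <- sqldf("
  SELECT name,
         CASE WHEN name IN ('cow', 'pig')      THEN 'animal'
              WHEN name IN ('eagle', 'pigeon') THEN 'bird'
              ELSE 'other'
         END AS type
  FROM df
")
res
```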

patrickmdnet · Answer 12 · 2017-04-15 21:28

You can use the base merge function for case-style remapping tasks:

    df <- data.frame(name = c('cow','pig','eagle','pigeon','cow','eagle'),
                     stringsAsFactors = FALSE)
    mapping <- data.frame(
        name = c('cow','pig','eagle','pigeon'),
        category = c('animal','animal','bird','bird')
    )
    merge(df, mapping)
    #     name category
    # 1    cow   animal
    # 2    cow   animal
    # 3  eagle     bird
    # 4  eagle     bird
    # 5    pig   animal
    # 6 pigeon     bird
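Note that merge sorts the result by the key, so the original row order is lost. If the order matters, a lookup with match() against the same mapping table is one way to keep it; this is a sketch using the same example data.

```r
df <- data.frame(name = c("cow", "pig", "eagle", "pigeon", "cow", "eagle"),
                 stringsAsFactors = FALSE)
mapping <- data.frame(name = c("cow", "pig", "eagle", "pigeon"),
                      category = c("animal", "animal", "bird", "bird"),
                      stringsAsFactors = FALSE)

# match() gives, for each df$name, the row position in mapping;
# names missing from the mapping would yield NA
df$category <- mapping$category[match(df$name, mapping$name)]
df
```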

בנימן הגלילי · Answer 13 · 2017-08-03 07:59

Mixing plyr::mutate and dplyr::case_when works for me and is readable.

    iris %>%
        plyr::mutate(coolness = dplyr::case_when(
            Species == "setosa"     ~ "not cool",
            Species == "versicolor" ~ "not cool",
            Species == "virginica"  ~ "super awesome",
            TRUE                    ~ "undetermined"
        )) -> testIris
    head(testIris)
    levels(testIris$coolness)  ## NULL
    testIris$coolness <- as.factor(testIris$coolness)
    levels(testIris$coolness)  ## ok now
    testIris[97:103, 4:6]

Bonus points if the column can come out of the mutate as a factor instead of char! The last line of the case_when statement, which catches all unmatched rows, is very important.

        Petal.Width    Species      coolness
    97          1.3 versicolor      not cool
    98          1.3 versicolor      not cool
    99          1.1 versicolor      not cool
    100         1.3 versicolor      not cool
    101         2.5  virginica super awesome
    102         1.9  virginica super awesome
    103         2.1  virginica super awesome
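On the bonus question: one possible way to get a factor straight out of the mutate is to wrap the case_when() call in factor(). This sketch uses dplyr::mutate rather than plyr::mutate, and the explicit levels argument is an optional choice to fix the level order.

```r
library(dplyr)

testIris <- iris %>%
  mutate(coolness = factor(
    case_when(
      Species %in% c("setosa", "versicolor") ~ "not cool",
      Species == "virginica"                 ~ "super awesome",
      TRUE                                   ~ "undetermined"
    ),
    # Declaring levels up front keeps "undetermined" as a level even if unused
    levels = c("not cool", "super awesome", "undetermined")
  ))

levels(testIris$coolness)
```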
