How to ignore case when using subset in R

Question

How to ignore case when using subset function in R?

eos91corr.data <- subset(test.data,select=c(c(X,Y,Z,W,T)))

I would like to select the columns named x, y, z, w, t, regardless of case. What should I do?

Thanks.

+4

r subset

Autumn Nov 13 '12 at 21:00

3 answers

You can use a regular expression with the grep function to ignore case when determining which column names to select. Once you have the names of the columns you need, you can pass them to subset().

If your data is

 dat <- data.frame(xy = 1:5, x = 1:5, mm = 1:5, y = 1:5, z = 1:5, w = 1:5, t = 1:5, r = 1:5)
 #   xy x mm y z w t r
 # 1  1 1  1 1 1 1 1 1
 # 2  2 2  2 2 2 2 2 2
 # 3  3 3  3 3 3 3 3 3
 # 4  4 4  4 4 4 4 4 4
 # 5  5 5  5 5 5 5 5 5

Then

 (selNames <- grep("^[XYZWT]$", names(dat), ignore.case = TRUE, value = TRUE))
 # [1] "x" "y" "z" "w" "t"
 subset(dat, select = selNames)
 #   x y z w t
 # 1 1 1 1 1 1
 # 2 2 2 2 2 2
 # 3 3 3 3 3 3
 # 4 4 4 4 4 4
 # 5 5 5 5 5 5
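The ^ and $ anchors in the pattern matter here. As a small illustration (the reduced data frame and the variable names unanchored and anchored are made up for this sketch), compare the matches with and without them:

```r
# Smaller example data, only for illustration.
dat <- data.frame(xy = 1:5, x = 1:5, mm = 1:5, y = 1:5)

# Without anchors, any name containing x or y matches, so "xy" slips in.
unanchored <- grep("[XY]", names(dat), ignore.case = TRUE, value = TRUE)

# With ^ and $, only names that consist of exactly one of the letters match.
anchored <- grep("^[XY]$", names(dat), ignore.case = TRUE, value = TRUE)

print(unanchored)
print(anchored)
```

The unanchored pattern also picks up the xy column, which is usually not what you want when selecting by exact name.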

EDIT: If the column names are longer than one letter, the approach above will not work well, because a character class such as [XYZWT] matches only a single character. Assuming you have the desired column names in a vector, you can use the following:

 upperNames <- c("XY", "Y", "Z", "W", "T")
 (grepPattern <- paste0("^", upperNames, "$", collapse = "|"))
 # [1] "^XY$|^Y$|^Z$|^W$|^T$"
 (selNames2 <- grep(grepPattern, names(dat), ignore.case = TRUE, value = TRUE))
 # [1] "xy" "y" "z" "w" "t"
 subset(dat, select = selNames2)
 #   xy y z w t
 # 1  1 1 1 1 1
 # 2  2 2 2 2 2
 # 3  3 3 3 3 3
 # 4  4 4 4 4 4
 # 5  5 5 5 5 5

+2

Benbarnes Nov 13 '12 at 21:18

The stringr package is a very neat wrapper for all of these functions. It has an "ignore.case" option for case-insensitive matching.

Also, you may want to consider using match() rather than subset().
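A minimal sketch of the match() idea in base R, using tolower() for the case-insensitive comparison. The example data and the variable names wanted, idx and res are made up for illustration:

```r
# Example data with mixed-case column names (illustrative only).
dat <- data.frame(XY = 1:5, x = 1:5, mm = 1:5, y = 1:5)

# Target names in lower case, in the order the columns should appear.
wanted <- c("x", "xy")

# match() returns the position of each target name among the lower-cased
# column names, or NA if a target name is not present.
idx <- match(wanted, tolower(names(dat)))
idx <- idx[!is.na(idx)]  # drop targets that were not found

# drop = FALSE keeps a data.frame even if only one column is selected.
res <- dat[, idx, drop = FALSE]
print(names(res))
```

Because match() works through the target vector, the columns come back in the order of wanted, not in the order of dat. Note that match() only finds the first matching column for each target name.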

0

Ricardo saporta Nov 13 '12 at 23:32

Stephan kolassa · Accepted Answer · 2012-11-13T22:09:36+0000

If you can live without the subset() function, the tolower() function may work:

 dat <- data.frame(XY = 1:5, x = 1:5, mm = 1:5, y = 1:5, z = 1:5, w = 1:5, t = 1:5, r = 1:5)
 dat[,tolower(names(dat)) %in% c("xy","x")]

However, this returns a data.frame with the columns in the order in which they appear in the original dat, not in the order of the target names. Both

 dat[,tolower(names(dat)) %in% c("xy","x")]

and

 dat[,tolower(names(dat)) %in% c("x","xy")]

will give the same result, although the order of target names has been reversed.

If you want the columns in the result to be in the order of the target vector, you need to get a little fancier. The following two commands return a data.frame with the columns in the order of the target vector (i.e., the two results differ in their column order):

 dat[,sapply(c("x","xy"),FUN=function(foo)which(foo==tolower(names(dat))))]
 dat[,sapply(c("xy","x"),FUN=function(foo)which(foo==tolower(names(dat))))]
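As a side note, the tolower() comparison can probably be combined with subset() itself. This is only a sketch: it relies on subset() using the evaluated value of select to index the columns, so a logical vector should work there, and the variable name keep is hypothetical (it must not clash with a column name in dat):

```r
dat <- data.frame(XY = 1:5, x = 1:5, mm = 1:5, y = 1:5,
                  z = 1:5, w = 1:5, t = 1:5, r = 1:5)

# Logical vector: TRUE for every column whose lower-cased name is a target.
keep <- tolower(names(dat)) %in% c("xy", "x")

# subset() evaluates select and uses the result to index the columns.
res <- subset(dat, select = keep)
print(names(res))
```

As with the %in% version, the columns stay in the order of the original dat.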
