Highlighting factor variable levels in R

Question

Say my dataset contains three columns: id (identifier), case (character), and value (numeric). This is my dataset:

    tdata <- data.frame(
      id    = c(1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4),
      case  = c("a","b","c","c","a","b","c","c","a","b","c","c","a","b","c","c"),
      value = c(1,34,56,23,546,34,67,23,65,23,65,23,87,34,321,56)
    )
    tdata
       id case value
    1   1    a     1
    2   1    b    34
    3   1    c    56
    4   1    c    23
    5   2    a   546
    6   2    b    34
    7   2    c    67
    8   2    c    23
    9   3    a    65
    10  3    b    23
    11  3    c    65
    12  3    c    23
    13  4    a    87
    14  4    b    34
    15  4    c   321
    16  4    c    56

Notice that for each id there are two rows with case c. How can I rename them c1 and c2? (I need to distinguish them for further analysis.)

+5

r dataset

user9292 Nov 26 '14 at 16:27


3 answers

I would suggest that instead of replacing the values in the "case" column, you simply add a secondary "ID" column. This is easy to do using getanID from my splitstackshape package.

    library(splitstackshape)
    getanID(tdata, c("id", "case"))[]
    #     id case value .id
    #  1:  1    a     1   1
    #  2:  1    b    34   1
    #  3:  1    c    56   1
    #  4:  1    c    23   2
    #  5:  2    a   546   1
    #  6:  2    b    34   1
    #  7:  2    c    67   1
    #  8:  2    c    23   2
    #  9:  3    a    65   1
    # 10:  3    b    23   1
    # 11:  3    c    65   1
    # 12:  3    c    23   2
    # 13:  4    a    87   1
    # 14:  4    b    34   1
    # 15:  4    c   321   1
    # 16:  4    c    56   2

The trailing [] only forces the result to print; it may or may not be required, depending on the version of "data.table" you have installed.

If you really want to collapse these columns, you can also do:

    getanID(tdata, c("id", "case"))[, case := paste0(case, .id)][, .id := NULL][]
    #     id case value
    #  1:  1   a1     1
    #  2:  1   b1    34
    #  3:  1   c1    56
    #  4:  1   c2    23
    #  5:  2   a1   546
    #  6:  2   b1    34
    #  7:  2   c1    67
    #  8:  2   c2    23
    #  9:  3   a1    65
    # 10:  3   b1    23
    # 11:  3   c1    65
    # 12:  3   c2    23
    # 13:  4   a1    87
    # 14:  4   b1    34
    # 15:  4   c1   321
    # 16:  4   c2    56

+2

A5C1D2H2I1M1N2O1R2T1 Nov 26 '14 at 16:38


How about this slightly modified approach:

    library(dplyr)
    tdata %>%
      group_by(id, case) %>%
      mutate(caseNo = paste0(case, row_number())) %>%
      ungroup() %>%
      select(-case)
    #Source: local data frame [16 x 3]
    #
    #   id value caseNo
    #1   1     1     a1
    #2   1    34     b1
    #3   1    56     c1
    #4   1    23     c2
    #5   2   546     a1
    #6   2    34     b1
    #7   2    67     c1
    #8   2    23     c2
    #9   3    65     a1
    #10  3    23     b1
    #11  3    65     c1
    #12  3    23     c2
    #13  4    87     a1
    #14  4    34     b1
    #15  4   321     c1
    #16  4    56     c2

+2

docendo discimus Nov 26 '14 at 16:40


Matthew plourde · Accepted Answer · 2014-11-26T16:36:24+0000

What about:

 within(tdata, case <- ave(as.character(case), id, FUN=make.unique))
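For reference, here is a minimal base R sketch of what that one-liner produces on the question's data. Note that make.unique() labels the second duplicate c.1 rather than c2. The second half is a variant of my own (not part of the answer above) that numbers rows within each id/case pair to get the c1/c2 labels the question asks for; the names res, idx and res2 are just illustrative, and the data frame is rebuilt compactly with rep().

```r
# Compact rebuild of the example data from the question.
tdata <- data.frame(
  id    = rep(1:4, each = 4),
  case  = rep(c("a", "b", "c", "c"), times = 4),
  value = c(1, 34, 56, 23, 546, 34, 67, 23, 65, 23, 65, 23, 87, 34, 321, 56)
)

# make.unique() leaves the first occurrence alone and appends ".1", ".2", ...
# to later duplicates, applied here separately within each id.
res <- within(tdata, case <- ave(as.character(case), id, FUN = make.unique))
head(res$case, 4)
# [1] "a"   "b"   "c"   "c.1"

# Variant (assumption about the desired output): a running counter within
# each id/case pair, pasted onto the label, giving a1, b1, c1, c2.
idx  <- ave(seq_along(tdata$case), tdata$id, tdata$case, FUN = seq_along)
res2 <- transform(tdata, case = paste0(case, idx))
head(res2$case, 4)
# [1] "a1" "b1" "c1" "c2"
```

If only the duplicated labels should be numbered (a, b, c1, c2), the counter could be pasted on conditionally, but that goes beyond what the answers here show.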
