Split CamelCase Column Names

Question

I tried to figure this out for a while, and thought I'd ask here.

Let's say I have a data frame as shown below:

    df <- data.frame(
      participant  = 1:6,
      group        = c("adult", "adult", "child", "child", "NSS", "NSS"),
      RegProto     = c(2, 3, 4, 2, 4, 3),
      RegInt       = c(2, 3, 4, 6, 6, 5),
      RegDistant   = c(3, 3, 4, 5, 4, 5),
      IrregProto   = c(4, 5, 3, 4, 3, 1),
      IrregInt     = c(4, 4, 4, 4, 4, 4),
      IrregDistant = c(4, 5, 6, 8, 9, 1)
    )

The problem with this data frame is that each of the last six column names encodes two variables: one whose values are either Reg or Irreg, and another whose values are Proto, Int, or Distant. What I would like to do is split these column names and make the table long, preferably using tidyr. I thought I could do it like this:

    library("dplyr")  # select() comes from dplyr
    library("tidyr")

    df_long <- df %>%
      gather(index, n, -group, -participant) %>%
      select(participant, group, index, n) %>%
      separate(index, into = c("verb", "similarity"), sep = "\\.?=\\p{Upper}")

This does what I want up to separate(). At that step I get an error stating that the values were not split, with no further hint as to why. I'm new to regex, so I suspect the problem is in the pattern, but I can't figure out what the correct syntax might be.

regex r dplyr tidyr

Joef Jan 19 '15 at 15:04

1 answer

Sven Hohenstein · Accepted Answer · 2015-01-19T15:13:19+0000

You can use this regex:

    (?<=.)(?=[A-Z])

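This matches a zero-length position that is preceded by any character (the lookbehind `(?<=.)`) and followed by an uppercase letter (the lookahead `(?=[A-Z])`). Because the match consumes no characters, nothing is removed from the name when it is split there, and the lookbehind keeps the pattern from matching at the very start of the string.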
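To see what the pattern `(?<=.)(?=[A-Z])` does on its own, here is a minimal base R sketch that applies it to the column names from the question. It uses `strsplit()` with `perl = TRUE` rather than tidyr, so it only illustrates where the split happens; the variable names `cols` and `parts` are made up for this example.

```r
# Column names from the question's data frame
cols <- c("RegProto", "RegInt", "RegDistant",
          "IrregProto", "IrregInt", "IrregDistant")

# Split at the zero-length position that has a character before it
# and an uppercase letter after it. Lookarounds need perl = TRUE.
parts <- strsplit(cols, "(?<=.)(?=[A-Z])", perl = TRUE)

# Give each result the original name for easier reading
names(parts) <- cols

print(parts[["RegProto"]])      # "Reg" "Proto"
print(parts[["IrregDistant"]])  # "Irreg" "Distant"
```

Every name yields exactly two pieces, which is what `into = c("verb", "similarity")` expects.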

Command:

    library(dplyr)
    library(tidyr)  # gather() and separate() come from tidyr

    df %>%
      gather(index, n, -group, -participant) %>%
      select(participant, group, index, n) %>%
      separate(index, into = c("verb", "similarity"), sep = "(?<=.)(?=[A-Z])")
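As a dependency-free cross-check of the shape the command above should produce, the same long table can be sketched in base R. This is not the method from the answer, just an assumed equivalent for verification; `value_cols`, `parts`, and `long` are names invented here.

```r
df <- data.frame(
  participant  = 1:6,
  group        = c("adult", "adult", "child", "child", "NSS", "NSS"),
  RegProto     = c(2, 3, 4, 2, 4, 3),
  RegInt       = c(2, 3, 4, 6, 6, 5),
  RegDistant   = c(3, 3, 4, 5, 4, 5),
  IrregProto   = c(4, 5, 3, 4, 3, 1),
  IrregInt     = c(4, 4, 4, 4, 4, 4),
  IrregDistant = c(4, 5, 6, 8, 9, 1)
)

# Columns whose names encode verb type and similarity
value_cols <- setdiff(names(df), c("participant", "group"))

# One row per column name: first piece is the verb, second the similarity
parts <- do.call(rbind, strsplit(value_cols, "(?<=.)(?=[A-Z])", perl = TRUE))

# Stack the value columns one after another
long <- data.frame(
  participant = rep(df$participant, times = length(value_cols)),
  group       = rep(df$group, times = length(value_cols)),
  verb        = rep(parts[, 1], each = nrow(df)),
  similarity  = rep(parts[, 2], each = nrow(df)),
  n           = unlist(df[value_cols], use.names = FALSE)
)

print(dim(long))  # 36 rows, 5 columns
print(head(long))
```

The result has one row per participant and original column (6 x 6 = 36 rows), with `verb` and `similarity` as separate columns.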
