Split CamelCase Column Names

I tried to figure this out for a while, and thought I'd ask here.

Let's say I have a data frame as shown below:

df <- data.frame(
  participant  = 1:6,
  group        = c("adult", "adult", "child", "child", "NSS", "NSS"),
  RegProto     = c(2, 3, 4, 2, 4, 3),
  RegInt       = c(2, 3, 4, 6, 6, 5),
  RegDistant   = c(3, 3, 4, 5, 4, 5),
  IrregProto   = c(4, 5, 3, 4, 3, 1),
  IrregInt     = c(4, 4, 4, 4, 4, 4),
  IrregDistant = c(4, 5, 6, 8, 9, 1)
)

The problem with this data frame is that each of the last six column names encodes two variables: one whose values are either Reg or Irreg, and another whose values are Proto, Int, or Distant. What I would like to do is split these column names and make the table long, preferably using tidyr. I thought I could do it like this:

 library("dplyr")  # needed for select()
 library("tidyr")

 df_long <- df %>%
   gather(index, n, -group, -participant) %>%
   select(participant, group, index, n) %>%
   separate(index, into = c("verb", "similarity"), sep = "\\.?=\\p{Upper}")

This does what I want up to the separate() step. There I get an error saying that the values were not split, with no further hint as to why. I'm new to regex, so I suspect the problem is in the pattern, but I can't figure out what the correct syntax should be.

1 answer

You can use this regex:

 (?<=.)(?=[A-Z]) 

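This matches a zero-length position that is preceded by any character and followed by an uppercase letter. The lookbehind (?<=.) keeps the pattern from matching at the very start of the string, so a name such as RegProto is split only between Reg and Proto.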
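
To see what the pattern does on its own, here is a minimal sketch in base R that applies it to the column names from the question. It uses strsplit() rather than separate(), so it only illustrates where the split position falls; the variable names (pattern, cols, parts, split_names) are made up for this example.

```r
# The pattern from the answer: a zero-width position preceded by any
# character and followed by an uppercase ASCII letter.
pattern <- "(?<=.)(?=[A-Z])"

# Column names taken from the question's data frame.
cols <- c("RegProto", "RegInt", "RegDistant",
          "IrregProto", "IrregInt", "IrregDistant")

# perl = TRUE is required because lookarounds are PCRE syntax.
parts <- strsplit(cols, pattern, perl = TRUE)

# Collect the pieces into a two-column character matrix
# (illustrative column labels matching the question's `into` argument).
split_names <- do.call(rbind, parts)
colnames(split_names) <- c("verb", "similarity")
print(split_names)
```

Each name should come back as exactly two pieces, for example "Reg" and "Proto", which is what separate() needs in order to fill the verb and similarity columns.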

Command:

 library(dplyr)
 library(tidyr)  # gather() and separate() come from tidyr

 df %>%
   gather(index, n, -group, -participant) %>%
   select(participant, group, index, n) %>%
   separate(index, into = c("verb", "similarity"), sep = "(?<=.)(?=[A-Z])")
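
As a possible alternative (not part of the answer above), newer tidyr versions provide pivot_longer(), which can reshape the table and split the names in one call. The sketch below assumes tidyr 1.0.0 or later is installed and reuses the same pattern as names_sep; the column names verb, similarity, and n follow the question.

```r
library(tidyr)  # assumes tidyr >= 1.0.0 for pivot_longer()

# Data frame from the question.
df <- data.frame(
  participant  = 1:6,
  group        = c("adult", "adult", "child", "child", "NSS", "NSS"),
  RegProto     = c(2, 3, 4, 2, 4, 3),
  RegInt       = c(2, 3, 4, 6, 6, 5),
  RegDistant   = c(3, 3, 4, 5, 4, 5),
  IrregProto   = c(4, 5, 3, 4, 3, 1),
  IrregInt     = c(4, 4, 4, 4, 4, 4),
  IrregDistant = c(4, 5, 6, 8, 9, 1)
)

# Reshape to long format and split each CamelCase column name at the
# position before its second capital letter, in a single step.
df_long <- pivot_longer(
  df,
  cols      = -c(participant, group),
  names_to  = c("verb", "similarity"),
  names_sep = "(?<=.)(?=[A-Z])",
  values_to = "n"
)

print(head(df_long))
```

With six participants and six measurement columns, the result should have 36 rows and the columns participant, group, verb, similarity, and n.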

Source: https://habr.com/ru/post/981223/