Splitting a column on two conditions

I have a dataframe like this:

I would like to split the second column into multiple columns on "?". This is not straightforward, because some of the strings contain other question marks that should not be split on. The only additional condition is that each resulting substring contains "http".

How can I split them? The number of columns in the sample output is only illustrative; I don't know in advance how many will be generated.

Sample input:

 df_in <- data.frame(x = c('x1','x2','x3','x4'),
                     y = c('http://example1.com?https://example2.com',
                           'NA',
                           'http://example3.com?id=1234?https://example4/com?http://example6.com',
                           'http://example5.com'))

console-printed dataframe:

  df_in
   x  y
   x1 http://example1.com?https://example2.com
   x2 NA
   x3 http://example3.com?id=1234?https://example4/com?http://example6.com
   x4 http://example5.com

An example of the expected result:

 df_out <- data.frame(x = c('x1','x2','x3','x4'),
                      col1 = c('http://example1.com', 'NA', 'http://example3.com?id=1234', 'http://example5.com'),
                      col2 = c('https://example2.com', 'NA', 'https://example4/com', 'NA'),
                      col3 = c('NA', 'NA', 'http://example6.com', 'NA'))

Output printed on console:

   x  col1                        col2                 col3
   x1 http://example1.com         https://example2.com NA
   x2 NA                          NA                   NA
   x3 http://example3.com?id=1234 https://example4/com http://example6.com
   x4 http://example5.com         NA                   NA
2 answers

We can use separate from tidyr to split column "y" into multiple columns, splitting on each ? that is immediately followed by http.

 library(tidyr)
 df_in %>%
   separate(y, into = paste0("col", 1:3), sep = "[?](?=http)")
 #    x  col1                        col2                 col3
 # 1  x1 http://example1.com         https://example2.com <NA>
 # 2  x2 NA                          <NA>                 <NA>
 # 3  x3 http://example3.com?id=1234 https://example4/com http://example6.com
 # 4  x4 http://example5.com         <NA>                 <NA>
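The answer above hard-codes three columns, while the question says the number is not known in advance. As a rough sketch in base R only (the names pieces, n_cols, padded and df_out are made up for illustration, and stringsAsFactors = FALSE is an added assumption so that y stays a character vector), the same lookahead pattern can be used to count the pieces first:

```r
# Sample data from the question; stringsAsFactors = FALSE is an assumption
# added here so that y is character regardless of the R version
df_in <- data.frame(
  x = c('x1', 'x2', 'x3', 'x4'),
  y = c('http://example1.com?https://example2.com',
        'NA',
        'http://example3.com?id=1234?https://example4/com?http://example6.com',
        'http://example5.com'),
  stringsAsFactors = FALSE
)

# Split only on a "?" that is immediately followed by "http"
# (the lookahead needs perl = TRUE in base R)
pieces <- strsplit(df_in$y, "[?](?=http)", perl = TRUE)

# The row with the most pieces decides how many columns are needed
n_cols <- max(lengths(pieces))

# Pad the shorter rows with NA and stack them into a character matrix
padded <- do.call(rbind, lapply(pieces, function(p) {
  c(p, rep(NA_character_, n_cols - length(p)))
}))
colnames(padded) <- paste0("col", seq_len(n_cols))

# Reattach the key column
df_out <- cbind(df_in["x"], as.data.frame(padded, stringsAsFactors = FALSE))
df_out
```

The value of n_cols could presumably also be passed to separate as into = paste0("col", seq_len(n_cols)) instead of the fixed 1:3. Note that the 'NA' entry in the sample data is the literal string "NA", so it stays a string in col1, whereas the padded cells are real missing values.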

If there is an arbitrary number of URLs to split, so that you do not know in advance how many columns to create, you can use the cSplit function from the splitstackshape package. Before that, however, we need to insert a delimiter right before each ?http, i.e.

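 library(splitstackshape)
 # Insert "_" before every "?h" that follows a word character
 df_in$y <- gsub('(\\w)(\\?h)', '\\1_\\2', df_in$y)
 # Split on the two-character delimiter "_?"
 cSplit(df_in, 'y', '_?')

 # Or all in one line,
 cSplit(transform(df_in, y = gsub('(\\w)(\\?h)', '\\1_\\2', y)), 'y', '_?')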

which gives,

     x  y_1                         y_2                  y_3
  1: x1 http://example1.com         https://example2.com NA
  2: x2 NA                          NA                   NA
  3: x3 http://example3.com?id=1234 https://example4/com http://example6.com
  4: x4 http://example5.com         NA                   NA
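To see what the gsub step in this answer does before cSplit runs, it may help to apply it to a single string. This is only an illustration in base R; the variable names url, marked and parts are made up, and strsplit with fixed = TRUE stands in for the literal "_?" split:

```r
# One of the sample strings from the question
url <- 'http://example3.com?id=1234?https://example4/com?http://example6.com'

# Put "_" between a word character and a following "?h"
marked <- gsub('(\\w)(\\?h)', '\\1_\\2', url)
marked

# Split on the literal two-character delimiter "_?"
parts <- strsplit(marked, '_?', fixed = TRUE)[[1]]
parts
```

The "?id=1234" part is left alone because its "?" is not followed by "h". Two caveats follow from the pattern itself: it matches any "?" followed by "h", not only "?http", so a query string such as "?host=..." would also be split, and the data must not already contain the sequence "_?".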

Source: https://habr.com/ru/post/1275304/

