R: split a comma-separated column into multiple columns, one per unique value

I have a simple question about cleaning up dirty data. A dataset was emailed to me that contains several columns, each of which holds a comma-separated list of numbers. Normally each of these numbers would be its own variable, but that is not how the data was given to me. Here is an example:

indication  treatment
1,2         3
2           2,1
1,3         2,3

Imagine that such a dataset has about 100 of these columns and thousands of rows, with a different number of values in each cell. My goal is to import the dataset and then split each column so that every unique value found in it gets a column of its own. Like this:

indication_1  indication_2  indication_3  treatment_1  treatment_2  treatment_3
1             1             0             0            0            1
0             1             0             1            1            0
1             0             1             0            1            1

Note that the column headings have changed, and each value is now coded as 0 or 1, where 1 indicates that the value is present in that row.



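You can split each column on the commas with strsplit and then tabulate the pieces with mtabulate from the qdapTools package: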

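library(qdapTools)  # provides mtabulate()

# Split every column on commas, then count how often each value occurs in each row.
# as.character() guards against columns that were read in as factors.
do.call(cbind, lapply(df, function(x) mtabulate(strsplit(as.character(x), ","))))
#    indication.1 indication.2 indication.3 treatment.1 treatment.2 treatment.3
#1            1            1            0           0           0           1
#2            0            1            0           1           1           0
#3            1            0            1           0           1           1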
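
If installing qdapTools is not an option, the same idea can be sketched in base R. This is only a minimal sketch under a few assumptions: the data frame `df` is rebuilt here from the example in the question, the helper name `split_to_indicators` is made up for illustration, and the column names use the `indication_1` style the question asks for rather than the `indication.1` style shown above.

```r
# Example data from the question; stringsAsFactors = FALSE keeps the columns as character
df <- data.frame(indication = c("1,2", "2", "1,3"),
                 treatment  = c("3", "2,1", "2,3"),
                 stringsAsFactors = FALSE)

# Hypothetical helper (not part of any package): build one 0/1 column
# per unique value found in a comma-separated column
split_to_indicators <- function(x, prefix) {
  # Split every cell on commas and strip stray whitespace around the pieces
  parts <- lapply(strsplit(as.character(x), ",", fixed = TRUE), trimws)
  # Unique values across the whole column; note they are sorted as strings
  values <- sort(unique(unlist(parts)))
  out <- matrix(0L, nrow = length(parts), ncol = length(values),
                dimnames = list(NULL, paste(prefix, values, sep = "_")))
  # Mark with 1 every value that is present in the given row
  for (i in seq_along(parts)) {
    out[i, ] <- as.integer(values %in% parts[[i]])
  }
  as.data.frame(out)
}

# Apply the helper to every column; unname() stops cbind() from
# prepending the list names to the column names a second time
result <- do.call(cbind, unname(Map(split_to_indicators, df, names(df))))
print(result)
```

For the example data, `result` has the columns `indication_1` to `indication_3` and `treatment_1` to `treatment_3` with the same 0/1 values as the table in the question. Unlike a tabulation, this sketch records only presence or absence, so a value repeated within one cell still yields 1.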

Source: https://habr.com/ru/post/1689910/

