R creating an adjacency matrix from columns in a data frame

I am interested in testing some methods of network visualization, but before trying these functions, I want to build an adjacency matrix (from, to) using a data frame that looks like this.

    Id  Gender  Col_Cold_1  Col_Cold_2  Col_Cold_3  Col_Hot_1  Col_Hot_2   Col_Hot_3
    10  F       pain        sleep       NA          infection  medication  walking
    14  F       Bump        NA          muscle      NA         twitching   flutter
    17  M                   pain        hemaloma    Callus     infection
    18  F       muscle                  pain                   twitching   medication

My goal is to create an adjacency matrix as follows:

 1) All values in columns with the keyword Cold will contribute to the rows
 2) All values in columns with the keyword Hot will contribute to the columns

For example, pain, sleep, Bump, muscle, and hemaloma are cell values under the columns with the Cold keyword, so they form the rows. Cell values such as infection, medication, Callus, walking, twitching, and flutter are under the columns with the Hot keyword, so they form the columns of the association matrix.

The final desired result should look like this:

                infection medication walking twitching flutter Callus
    pain        2 2 1 1 1
    sleep       1 1 1
    Bump        1 1
    muscle      1 1
    hemaloma    1 1

  • [pain, infection] = 2, because the connection between pain and infection occurs twice in the original data frame: once in row 1 and again in row 3.

  • [pain, medication] = 2, because the connection between pain and medication occurs twice: once in row 1 and again in row 4.

Any suggestions or recommendations for creating such a matrix of associations are highly appreciated.

Reproducible dataset

    df = structure(list(
      id = c(10, 14, 17, 18),
      Gender = structure(c(1L, 1L, 2L, 1L), .Label = c("F", "M"), class = "factor"),
      Col_Cold_1 = structure(c(4L, 2L, 1L, 3L), .Label = c("", "Bump", "muscle", "pain"), class = "factor"),
      Col_Cold_2 = structure(c(4L, 2L, 3L, 1L), .Label = c("", "NA", "pain", "sleep"), class = "factor"),
      Col_Cold_3 = structure(c(1L, 3L, 2L, 4L), .Label = c("NA", "hemaloma", "muscle", "pain"), class = "factor"),
      Col_Hot_1 = structure(c(4L, 3L, 2L, 1L), .Label = c("", "Callus", "NA", "infection"), class = "factor"),
      Col_Hot_2 = structure(c(2L, 3L, 1L, 3L), .Label = c("infection", "medication", "twitching"), class = "factor"),
      Col_Hot_3 = structure(c(4L, 2L, 1L, 3L), .Label = c("", "flutter", "medication", "walking"), class = "factor")),
      .Names = c("id", "Gender", "Col_Cold_1", "Col_Cold_2", "Col_Cold_3", "Col_Hot_1", "Col_Hot_2", "Col_Hot_3"),
      row.names = c(NA, -4L), class = "data.frame")
1 answer

One way is to make the data set "tidy" and then use xtabs. First, some cleanup:

    df[] <- lapply(df, as.character)             # Convert factors to characters
    df[df == "NA" | df == "" | is.na(df)] <- NA  # Make all blanks NAs

Now, tidy the data set:

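    library(tidyr)
    library(dplyr)

    # Note: gather_() is deprecated in current tidyr releases
    out <- do.call(rbind, sapply(grep("^Col_Cold", names(df), value = TRUE), function(x) {
      # Pair the current cold column with all of the hot columns
      vars <- c(x, grep("^Col_Hot", names(df), value = TRUE))
      # Stack the hot columns into one, then keep only the cold value
      # (column 1) and the hot value (column 3)
      setNames(
        gather_(select(df, one_of(vars)), key_col = x, value_col = "value",
                gather_cols = vars[-1])[, c(1, 3)],
        c("cold", "hot"))
    }, simplify = FALSE))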

The idea is to "pair" each of the "cold" columns with each of the "hot" columns to create a long data set. out looks like this:

    out
    #     cold        hot
    # 1   pain  infection
    # 2   Bump       <NA>
    # 3   <NA>     Callus
    # 4 muscle       <NA>
    # 5   pain medication
    # ...
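The same pairing can also be sketched in base R, without tidyr. This is an alternative under stated assumptions, not the code used above: the data frame is a hand-typed copy of the cleaned example data (blanks already NA, Gender omitted), and `combos` and `long_pairs` are names introduced here purely for illustration.

```r
# Hand-typed copy of the cleaned example data (assumption: blanks are already NA)
df <- data.frame(
  id         = c(10, 14, 17, 18),
  Col_Cold_1 = c("pain", "Bump", NA, "muscle"),
  Col_Cold_2 = c("sleep", NA, "pain", NA),
  Col_Cold_3 = c(NA, "muscle", "hemaloma", "pain"),
  Col_Hot_1  = c("infection", NA, "Callus", NA),
  Col_Hot_2  = c("medication", "twitching", "infection", "twitching"),
  Col_Hot_3  = c("walking", "flutter", NA, "medication"),
  stringsAsFactors = FALSE
)

cold_cols <- grep("^Col_Cold", names(df), value = TRUE)
hot_cols  <- grep("^Col_Hot",  names(df), value = TRUE)

# Every (cold column, hot column) combination: 3 x 3 = 9 combinations
combos <- expand.grid(cold = cold_cols, hot = hot_cols, stringsAsFactors = FALSE)

# For each combination, line the two columns up row by row and stack the results
long_pairs <- do.call(rbind, lapply(seq_len(nrow(combos)), function(i) {
  data.frame(cold = df[[combos$cold[i]]],
             hot  = df[[combos$hot[i]]],
             stringsAsFactors = FALSE)
}))

# Drop pairs where either side is missing
long_pairs <- na.omit(long_pairs)
```

Each row of `long_pairs` is one cold/hot co-occurrence within a single row of the original data, which is exactly what a cross-tabulation needs as input.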

Finally, use xtabs to produce the desired output:

    xtabs(~ cold + hot, na.omit(out))
    #           hot
    # cold       Callus flutter infection medication twitching walking
    #   Bump          0       1         0          0         1       0
    #   hemaloma      1       0         1          0         0       0
    #   muscle        0       1         0          1         2       0
    #   pain          1       0         2          2         1       1
    #   sleep         0       0         1          1         0       1
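To check the table against the two examples in the question, and to get the rows and columns in the order the question lists them, the result can be turned into a plain matrix and indexed by name. The following is a minimal, self-contained sketch: `cold_by_row` and `hot_by_row` are hypothetical hand-typed lists of the non-missing values in each row of the example data, not objects from the code above.

```r
# Non-missing cold and hot values for ids 10, 14, 17, 18 (typed in by hand)
cold_by_row <- list(c("pain", "sleep"), c("Bump", "muscle"),
                    c("pain", "hemaloma"), c("muscle", "pain"))
hot_by_row  <- list(c("infection", "medication", "walking"),
                    c("twitching", "flutter"),
                    c("Callus", "infection"),
                    c("twitching", "medication"))

# Within each row, pair every cold value with every hot value
out <- do.call(rbind, Map(function(cl, h) {
  expand.grid(cold = cl, hot = h, stringsAsFactors = FALSE)
}, cold_by_row, hot_by_row))

tab <- xtabs(~ cold + hot, data = out)

# Convert to a plain integer matrix, keeping the row and column names
m <- matrix(as.vector(tab), nrow = nrow(tab), dimnames = dimnames(tab))

# Reorder rows and columns to match the layout in the question
m <- m[c("pain", "sleep", "Bump", "muscle", "hemaloma"),
       c("infection", "medication", "walking", "twitching", "flutter", "Callus")]

m["pain", "infection"]   # 2
m["pain", "medication"]  # 2
```

A plain matrix like `m` is also a convenient form to hand on to network visualization functions that expect an adjacency or incidence matrix.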

Source: https://habr.com/ru/post/1013271/

