Co-occurrence matrix and heat map

I have a large dataset with more than 20 columns and more than 2000 rows. I would like to know how often different variables occur together. It would also be nice to make a heat map from this (a co-occurrence heat map or a correlation heat map). However, I'm not sure this can be done with dummy / binary variables. Any tips?

I need to convert this example data set (X)

    A   B   C   D   E   F
1   0   1   1   1   1   0
2   0   1   1   0   0   1
3   1   0   0   0   1   0
4   0   0   1   1   1   1
5   0   0   1   1   0   0

into something like this:

    A   B   C   D   E   F
A   0   0   0   0   1   0
B   0   0   2   1   1   1
C   0   2   0   3   2   2
D   0   1   3   0   2   1
E   1   1   2   2   0   1
F   0   1   2   1   1   0
2 answers

Given the matrix X, we have

(A <- t(X) %*% X)
#   A B C D E F
# A 1 0 0 0 1 0
# B 0 2 2 1 1 1
# C 0 2 4 3 2 2
# D 0 1 3 3 2 1
# E 1 1 2 2 3 1
# F 0 1 2 1 1 2

If you want the diagonal to contain zeros, add diag(A) <- 0. A heat map can then be obtained with, for example,

heatmap(A, Rowv = NA, Colv = NA)
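The steps above can be put together as a minimal, self-contained sketch. It assumes the data is a numeric 0/1 matrix; the name `co` is only illustrative. `crossprod(X)` is base R's shorthand for `t(X) %*% X`. `scale = "none"` is added because `heatmap()` rescales rows by default, which would distort the raw counts.

```r
# Example data from the question, stored as a numeric 0/1 matrix.
X <- matrix(
  c(0, 1, 1, 1, 1, 0,
    0, 1, 1, 0, 0, 1,
    1, 0, 0, 0, 1, 0,
    0, 0, 1, 1, 1, 1,
    0, 0, 1, 1, 0, 0),
  nrow = 5, byrow = TRUE,
  dimnames = list(as.character(1:5), LETTERS[1:6])
)

# Entry [i, j] counts the rows in which columns i and j are both 1.
co <- crossprod(X)   # same result as t(X) %*% X

# The diagonal holds each column's own total; zero it if it is not wanted.
diag(co) <- 0

# Plot the raw counts without reordering or row scaling.
heatmap(co, Rowv = NA, Colv = NA, scale = "none")
```

If the data is a data frame rather than a matrix, convert it first with `as.matrix()`, since `%*%` and `crossprod()` expect matrices.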
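# A is the data frame listed under DATA below.
# For each pair of columns, count the rows in which both are 1.
temp = sapply(colnames(A), function(x)
    sapply(colnames(A), function(y)
        sum(rowSums(A[,c(x, y)]) == 2)))
diag(temp) = 0  # ignore each column's co-occurrence with itself
temp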
#  A B C D E F
#A 0 0 0 0 1 0
#B 0 0 2 1 1 1
#C 0 2 0 3 2 2
#D 0 1 3 0 2 1
#E 1 1 2 2 0 1
#F 0 1 2 1 1 0
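As a quick sanity check, the pairwise counting above should agree with the matrix-product approach. The sketch below rebuilds the example data and compares the two; the names `loop_counts` and `prod_counts` are illustrative, and the inner test is written as `A[[x]] == 1 & A[[y]] == 1`, which is assumed to be equivalent to the `rowSums(...) == 2` test for 0/1 data.

```r
# Example data as a data frame of 0/1 columns.
A <- data.frame(
  A = c(0, 0, 1, 0, 0),
  B = c(1, 1, 0, 0, 0),
  C = c(1, 1, 0, 1, 1),
  D = c(1, 0, 0, 1, 1),
  E = c(1, 0, 1, 1, 0),
  F = c(0, 1, 0, 1, 0)
)

# Pairwise counting: rows where both columns are 1.
loop_counts <- sapply(colnames(A), function(x)
  sapply(colnames(A), function(y)
    sum(A[[x]] == 1 & A[[y]] == 1)))
diag(loop_counts) <- 0

# Matrix-product version; crossprod() needs a matrix, not a data frame.
prod_counts <- crossprod(as.matrix(A))
diag(prod_counts) <- 0

# Compare values only, ignoring attribute differences.
all.equal(loop_counts, prod_counts, check.attributes = FALSE)
```

With 20 columns and 2000 rows either version should be fast enough; the matrix product simply avoids the nested loop.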

library(reshape2)
library(ggplot2)

df1 = melt(temp)

graphics.off()
ggplot(df1, aes(x = Var1, y = Var2, fill = value)) +
    geom_tile() +
    theme_classic()


DATA

A = structure(list(A = c(0L, 0L, 1L, 0L, 0L), B = c(1L, 1L, 0L, 0L, 
0L), C = c(1L, 1L, 0L, 1L, 1L), D = c(1L, 0L, 0L, 1L, 1L), E = c(1L, 
0L, 1L, 1L, 0L), F = c(0L, 1L, 0L, 1L, 0L)), .Names = c("A", 
"B", "C", "D", "E", "F"), class = "data.frame", row.names = c("1", 
"2", "3", "4", "5"))
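On the question of whether a correlation heat map makes sense for dummy / binary variables: the Pearson correlation of two 0/1 columns is the phi coefficient, so `cor()` can be applied directly. A minimal sketch on the same example data follows; the name `phi` is illustrative.

```r
# Example data as a data frame of 0/1 columns.
A <- data.frame(
  A = c(0, 0, 1, 0, 0),
  B = c(1, 1, 0, 0, 0),
  C = c(1, 1, 0, 1, 1),
  D = c(1, 0, 0, 1, 1),
  E = c(1, 0, 1, 1, 0),
  F = c(0, 1, 0, 1, 0)
)

# Pearson correlation between 0/1 columns equals the phi coefficient.
phi <- cor(A)

# Note: a constant column (all 0 or all 1) has zero variance and gives NA.
heatmap(phi, Rowv = NA, Colv = NA, scale = "none")
```

In this small sample, column C is the exact complement of column A, so their correlation is -1. Unlike raw co-occurrence counts, the correlation also reflects how often each variable occurs on its own.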

Source: https://habr.com/ru/post/1693206/

