I have two inputs in the following formats:
```r
domains = list(
  O60925 = "PF01920",
  P01130 = c("PF07645", "PF00057", "PF00058"),
  Q14764 = c("PF11978", "PF01505"),
  Q9BX68 = "PF01230",
  P46777 = "PF14204")
interactions = structure(c(1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0,
  0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0,
  0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0,
  0, 0, 0, 0, 0), .Dim = c(8L, 8L), .Dimnames = list(c("PF01920",
  "PF07645", "PF00057", "PF00058", "PF11978", "PF01505", "PF01230",
  "PF14204"), c("PF01920", "PF07645", "PF00057", "PF00058", "PF11978",
  "PF01505", "PF01230", "PF14204")))
```
```
        PF01920 PF07645 PF00057 PF00058 PF11978 PF01505 PF01230 PF14204
PF01920       1       0       0       0       0       0       1       0
PF07645       0       1       0       1       0       0       0       0
PF00057       0       0       1       1       0       0       0       0
PF00058       0       1       1       1       0       0       0       0
PF11978       0       0       0       0       1       0       0       0
PF01505       0       0       0       0       0       1       0       0
PF01230       1       0       0       0       0       0       1       0
PF14204       0       0       0       0       0       0       0       0
```
I would like to calculate the following output, where each cell holds the sum of the cells of `interactions` over every combination of the two proteins' domains in `domains` (rows taken from the first protein's domains, columns from the second's):
```
       O60925 P01130 Q14764 Q9BX68 P46777
O60925      1      0      0      1      0
P01130      0      7      0      0      0
Q14764      0      0      2      0      0
Q9BX68      1      0      0      1      0
P46777      0      0      0      0      0
```
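In symbols (notation introduced here only for illustration): write $I$ for `interactions`, $D(i)$ for the domains of protein $i$ in `domains`, and $R$ for the desired output. Each cell sums $I$ over every pair of domains of the two proteins, so for `P01130` with itself the rows `PF07645`, `PF00057` and `PF00058` of the $3 \times 3$ block contribute 2, 2 and 3:

$$
R_{ij} = \sum_{a \in D(i)} \sum_{b \in D(j)} I_{ab},
\qquad
R_{\text{P01130},\,\text{P01130}} = (1 + 0 + 1) + (0 + 1 + 1) + (1 + 1 + 1) = 7.
$$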
For context: the names of `domains` are proteins, each element holds that protein's Pfam domains, and `interactions` is a matrix of known Pfam domain-domain interactions. I would like to summarize the total number of known domain interactions for each pair of proteins.
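The same summary can be phrased as a matrix product. A minimal sketch, assuming every domain listed in `domains` also appears in the dimnames of `interactions`: build a 0/1 protein-by-domain incidence matrix `M`, so that `M %*% interactions %*% t(M)` sums `interactions` over all domain pairs for every protein pair at once. `M` and `pf` are names introduced here, not part of the question, and the inputs are re-created so the block runs on its own:

```r
# Inputs from the question (re-created so this block is self-contained)
domains = list(
  O60925 = "PF01920",
  P01130 = c("PF07645", "PF00057", "PF00058"),
  Q14764 = c("PF11978", "PF01505"),
  Q9BX68 = "PF01230",
  P46777 = "PF14204")
pf = c("PF01920", "PF07645", "PF00057", "PF00058",
       "PF11978", "PF01505", "PF01230", "PF14204")
interactions = matrix(c(
  1, 0, 0, 0, 0, 0, 1, 0,
  0, 1, 0, 1, 0, 0, 0, 0,
  0, 0, 1, 1, 0, 0, 0, 0,
  0, 1, 1, 1, 0, 0, 0, 0,
  0, 0, 0, 0, 1, 0, 0, 0,
  0, 0, 0, 0, 0, 1, 0, 0,
  1, 0, 0, 0, 0, 0, 1, 0,
  0, 0, 0, 0, 0, 0, 0, 0),
  nrow = 8, byrow = TRUE, dimnames = list(pf, pf))

proteins = names(domains)
# Hypothetical incidence matrix: M[p, d] = 1 if protein p has Pfam domain d
M = matrix(0, nrow = length(proteins), ncol = ncol(interactions),
           dimnames = list(proteins, colnames(interactions)))
# One (protein, domain) index row per listed domain; assumes every domain
# is a column name of `interactions` (otherwise this indexing fails)
M[cbind(rep(proteins, lengths(domains)), unlist(domains))] = 1

# (M %*% I %*% t(M))[i, j] sums I[a, b] over domains a of i and b of j
result = M %*% interactions %*% t(M)
result
```

This replaces the per-pair loop with two matrix multiplications. `M` is dense here; for much larger inputs a sparse matrix (for example from the Matrix package) may be worth considering.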
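My current approach builds every pair of proteins with `tidyr::crossing()` and uses `apply` to sum the relevant cells of `interactions` for each pair: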
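```r
proteins = names(domains)
result = matrix(0, nrow = length(proteins), ncol = length(proteins),
                dimnames = list(proteins, proteins))
# All ordered pairs of proteins; the two columns get distinct names so
# crossing() does not reject them as duplicates
combinations = tidyr::crossing(p1 = proteins, p2 = proteins)
n_interactions = apply(combinations, 1, function(row) {
  domains1 = domains[[row[1]]]
  domains2 = domains[[row[2]]]
  # Look up every (domain1, domain2) cell by name and add them up
  sum(interactions[as.matrix(tidyr::crossing(d1 = domains1, d2 = domains2))])
})
# Fill the result by (protein, protein) name pairs
result[as.matrix(combinations)] = n_interactions
```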
Is there a simpler or more efficient way to do this?
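A base-R sketch of the same pairwise loop without `tidyr`: because `interactions` has dimnames, `interactions[d1, d2]` already returns the block for every combination of two domain vectors, so the inner `crossing()` is not needed, and `outer()` can enumerate the protein pairs. `pair_sum` and `pf` are hypothetical names introduced here, and the inputs are re-created so the block runs on its own:

```r
# Inputs from the question (re-created so this block is self-contained)
domains = list(
  O60925 = "PF01920",
  P01130 = c("PF07645", "PF00057", "PF00058"),
  Q14764 = c("PF11978", "PF01505"),
  Q9BX68 = "PF01230",
  P46777 = "PF14204")
pf = c("PF01920", "PF07645", "PF00057", "PF00058",
       "PF11978", "PF01505", "PF01230", "PF14204")
interactions = matrix(c(
  1, 0, 0, 0, 0, 0, 1, 0,
  0, 1, 0, 1, 0, 0, 0, 0,
  0, 0, 1, 1, 0, 0, 0, 0,
  0, 1, 1, 1, 0, 0, 0, 0,
  0, 0, 0, 0, 1, 0, 0, 0,
  0, 0, 0, 0, 0, 1, 0, 0,
  1, 0, 0, 0, 0, 0, 1, 0,
  0, 0, 0, 0, 0, 0, 0, 0),
  nrow = 8, byrow = TRUE, dimnames = list(pf, pf))

# Hypothetical helper: sum the block of `interactions` whose rows are the
# domains of protein p1 and whose columns are the domains of protein p2
pair_sum = function(p1, p2) {
  sum(interactions[domains[[p1]], domains[[p2]], drop = FALSE])
}

proteins = names(domains)
# outer() calls FUN once with vectors holding all pairs, so Vectorize()
# is used to apply pair_sum to each (p1, p2) pair separately
result = outer(proteins, proteins, Vectorize(pair_sum))
dimnames(result) = list(proteins, proteins)
result
```

This still visits every protein pair, so it mainly removes the `tidyr` dependency and the intermediate index tables rather than changing the overall cost.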