I am trying to build a square adjacency matrix from a data.table. Here is a reproducible example of what I already have:
    require(data.table)
    require(plyr)
    require(reshape2)

    # Build a mock data.table
    dt <- data.table(Source=as.character(rep(letters[1:3],2)),Target=as.character(rep(letters[4:2],2)))
    dt
    #    Source Target
    # 1:      a      d
    # 2:      b      c
    # 3:      c      b
    # 4:      a      d
    # 5:      b      c
    # 6:      c      b

    sry <- ddply(dt, .(Source,Target), summarize, Frequency=length(Source))
    sry
    #   Source Target Frequency
    # 1      a      d         2
    # 2      b      c         2
    # 3      c      b         2

    mtx <- as.matrix(dcast(sry, Source ~ Target, value.var="Frequency", fill=0))
    rownames(mtx) <- mtx[,1]
    mtx <- mtx[,2:ncol(mtx)]
    mtx
    #   b   c   d
    # a "0" "0" "2"
    # b "0" "2" "0"
    # c "2" "0" "0"
Now this is very close to what I want to get, except that I would like to have all the nodes represented in both dimensions, for example:
      a b c d
    a 0 0 0 2
    b 0 0 2 0
    c 0 2 0 0
    d 0 0 0 0
Please note that I am working with fairly large data, so I am looking for an efficient solution.
Thank you for your help.
SOLUTIONS (EDIT):
Given the quality of the proposed solutions and the size of my data set, I compared all the solutions.
    # The benchmark was run on a 1-million-row sample from my original dataset
    library(data.table)
    aa <- fread("small2.csv",sep="^")
    dt <- aa[,c(8,9),with=F]
    colnames(dt) <- c("Source","Target")
    dim(dt)
Given this data, the desired result is a 2222 x 2222 matrix. Solutions that return a 2222 x 2223 result, where the first column holds the row names, are obviously acceptable too.
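As a quick sanity check on the expected size, the number of rows and columns should equal the number of distinct nodes appearing in either column. Here is a minimal sketch on the mock data from the question, not on the real dataset; `nodes` and `n` are names I made up for illustration, and a plain data.frame stands in for the data.table:

```r
# Mock data from the question, as a plain data.frame
dt <- data.frame(Source = rep(letters[1:3], 2),
                 Target = rep(letters[4:2], 2),
                 stringsAsFactors = FALSE)

# Every node that occurs as a source or as a target
nodes <- unique(c(dt$Source, dt$Target))

# The square adjacency matrix should be n x n
n <- length(nodes)
n
```

On the mock data this counts the four nodes a, b, c and d, which matches the 4 x 4 layout I am after.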
    # Ananda Mahto first solution
    am1 <- function() {
      table(dt[, lapply(.SD, factor, levs)])
    }
    dim(am1())
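For readers who want to try the idea behind this first solution on the mock data, here is a small self-contained sketch. It assumes that `levs` is the set of all node names found in either column (its definition is not shown in the snippet above, so this is my guess), and it uses a plain data.frame so that it runs without extra packages:

```r
# Mock data from the question, as a plain data.frame
dt <- data.frame(Source = rep(letters[1:3], 2),
                 Target = rep(letters[4:2], 2),
                 stringsAsFactors = FALSE)

# Assumption: levs holds every node seen in either column, sorted
levs <- sort(unique(unlist(dt, use.names = FALSE)))

# Giving both columns the same factor levels forces table() to
# produce a square result, including nodes that never appear on one side
adj <- table(lapply(dt, factor, levels = levs))
adj
dim(adj)
```

The result is a contingency table with integer counts rather than the character matrix produced by the `dcast` attempt, and node d gets an all-zero row because it never appears as a source.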
And the test result ...
    library(rbenchmark)
    benchmark(am1(), am2(), akr(), cc(), replications=75)