This is very similar to how you would subset a regular R matrix. For example, to create a document-term matrix from the example Reuters dataset with only the rows where the term "would" appears more than once:
    library(tm)

    # Load the example Reuters corpus that ships with tm
    reut21578 <- system.file("texts", "crude", package = "tm")
    reuters <- VCorpus(DirSource(reut21578),
                       readerControl = list(reader = readReut21578XMLasPlain))

    dtm <- DocumentTermMatrix(reuters)

    # Logical vector: TRUE for documents where "would" occurs more than once
    v <- as.vector(dtm[, "would"] > 1)
    dtm2 <- dtm[v, ]

    > inspect(dtm2[, "would"])
    A document-term matrix (3 documents, 1 terms)

    Non-/sparse entries: 3/0
    Sparsity           : 0%
    Maximal term length: 5
    Weighting          : term frequency (tf)

         Terms
    Docs  would
      246     2
      489     2
      502     2
A tm document-term matrix is a simple triplet matrix from the slam package, so the slam documentation helps in working out how to manipulate DTMs.
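To make the triplet structure concrete, here is a minimal sketch that uses slam on its own. The matrix, document names, and counts are made-up toy data, not the Reuters corpus; the point is only that the same row filter can be written against the `i`, `j`, and `v` components that a triplet matrix stores.

```r
library(slam)

# Toy 3-document x 2-term count matrix in triplet form (hypothetical data).
m <- simple_triplet_matrix(
  i = c(1, 2, 2, 3, 3),   # row (document) index of each non-zero entry
  j = c(1, 1, 2, 2, 1),   # column (term) index of each non-zero entry
  v = c(2, 1, 3, 2, 3),   # the counts themselves
  dimnames = list(Docs = c("d1", "d2", "d3"), Terms = c("would", "oil"))
)

# Find the column for "would", then the rows whose count there exceeds 1.
would_col <- match("would", dimnames(m)[[2]])
rows <- m$i[m$j == would_col & m$v > 1]

# Subset by row index, just as with an ordinary matrix.
m2 <- m[rows, ]

# Per-document and per-term totals, computed without densifying the matrix.
row_sums(m2)
col_sums(m2)
```

Because a DocumentTermMatrix inherits from this class, the same components and the `row_sums`/`col_sums` helpers should be usable on `dtm` directly, though that is worth checking against your installed tm version.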