How do you silently save the inspect object in the R tm package?

When I save the result of inspect() in the R tm package, it also prints to the screen. It does save the data I want in the data.frame, but I have thousands of documents to analyze, and the printing to the screen is eating up my memory.

 library(tm)
 data("crude")
 matrix <- TermDocumentMatrix(crude, control=list(removePunctuation = TRUE, stopwords=TRUE))
 out= data.frame(inspect(matrix))

I have tried every trick I can think of. capture.output() changes the object (not the desired effect), as does sink(). dev.off() does not work. invisible() does nothing. suppressWarnings(), suppressMessages(), and try() unsurprisingly do nothing. There is no silent or quiet parameter in the inspect command.

The closest I can get is

 out= capture.output(inspect(matrix))
 out= data.frame(out)

which, notably, does not give the same data frame, but it could be massaged into one fairly easily if I have to go down this route. Any other (less hacky) suggestions would be helpful. Thanks.

Windows 7 64-bit, R 3.0.1, tm package at the latest version (0.5-9.1).

+4
2 answers

Do the assignment inside the capture.output() call, then:

 capture.output(out <- data.frame(inspect(matrix))) -> .null # discarding this 

But really, inspect is for visual inspection, so maybe try

 as.data.frame(as.matrix(matrix)) 

instead (by the way, matrix is a very unfortunate name for a variable, since it is also the name of a base function).
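The assign-inside-capture idea can be tried without tm at all. The following is a minimal base-R sketch; `noisy_inspect` is a hypothetical stand-in that, like inspect() as described in the question, prints its argument and also returns it.

```r
# Hypothetical stand-in for inspect(): prints its argument and
# returns it invisibly. It is not part of tm.
noisy_inspect <- function(x) {
  print(x)
  invisible(x)
}

m <- matrix(1:6, nrow = 2,
            dimnames = list(c("a", "b"), c("d1", "d2", "d3")))

# Capturing only the output keeps the printed text, not the object:
# 'printed' is a character vector of console lines.
printed <- capture.output(noisy_inspect(m))

# Assigning inside the call keeps the real object in 'out';
# the captured text goes to 'discard' and can be ignored.
discard <- capture.output(out <- data.frame(noisy_inspect(m)))
```

Here `out` is an ordinary data frame with the same values as `m`, while nothing reaches the console. The conversion with as.matrix() suggested above still looks like the simpler route, since it never prints in the first place.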

+7

Using this input (I changed the variable name from your question, since a variable called "matrix" may be confusing):

 library(tm)
 data("crude")
 tdm <- TermDocumentMatrix(crude, control=list(removePunctuation = TRUE, stopwords=TRUE))

Then this avoids printing to the screen:

 m <- as.matrix(tdm) 

and then I would personally do something like:

 require(data.table)
 data.table(m, keep.rownames=TRUE)
 #               rn 127 144 191 194 211 236 237 242 246 248 273 349 352 353 368 489 502 543 704 708
 #    1:     100000   0   0   0   0   0   0   0   0   0   0   1   0   0   0   0   0   0   0   0   0
 #    2:        108   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   1
 #    3:        111   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   1
 #    4:        115   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   1
 #    5:      12217   0   0   0   0   0   0   0   0   1   0   0   0   0   0   0   0   0   0   0   0
 #   ---
 #  996:  yesterday   0   0   0   0   0   0   0   3   0   0   1   0   0   0   0   0   0   0   0   0
 #  997: yesterdays   0   0   0   0   0   0   0   1   0   0   0   0   0   0   0   0   0   0   0   0
 #  998:       york   0   1   0   0   0   1   0   0   0   0   0   0   0   0   0   0   0   0   1   0
 #  999:       zero   0   0   0   0   0   0   1   0   0   0   1   0   0   0   0   0   0   0   0   0
 # 1000:       zone   0   0   0   0   0   0   0   0   0   0   2   0   0   0   0   0   0   0   0   0
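With thousands of documents, the dense matrix from as.matrix() may itself become large. The sketch below shows the dense conversion next to a long-format alternative that keeps only the non-zero counts. It assumes the term-document matrix stores its entries as a sparse triplet with fields `i`, `j` and `v` (tm builds on the slam package for this); the names `df` and `long` are just illustrative.

```r
library(tm)
data("crude")
tdm <- TermDocumentMatrix(crude,
                          control = list(removePunctuation = TRUE,
                                         stopwords = TRUE))

# Dense route: one row per term, one column per document.
# Plain assignment prints nothing.
df <- as.data.frame(as.matrix(tdm))

# Sparse route (assumption: tdm exposes the triplet fields i, j, v):
# one row per non-zero (term, document) pair.
long <- data.frame(term  = Terms(tdm)[tdm$i],
                   doc   = Docs(tdm)[tdm$j],
                   count = tdm$v,
                   stringsAsFactors = FALSE)
```

The long table carries the same counts as the dense one, but its size grows with the number of non-zero entries rather than with terms times documents, which may matter for a large corpus.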
0

Source: https://habr.com/ru/post/1501723/

