RTextTools create_matrix returns a non-character argument error

I am new to text processing in R. I tried the simple code below:

    library(RTextTools)
    texts <- c("This is the first document.", "This is the second file.", "This is the third text.")
    matrix <- create_matrix(texts, ngramLength = 3)

which comes from one of the answers to the question "Search for 2 and 3 words Phrases using the R TM package".

However, instead I get the error "Error in FUN(X[[2L]], ...) : non-character argument".

I can generate a document-term matrix when I drop the ngramLength parameter, but I need to search for phrases of a specific word length. Any suggestions for an alternative or a fix?

3 answers

ngramLength doesn't seem to work. Here is a workaround:

    library(RTextTools)
    library(tm)
    library(RWeka) # needed for NGramTokenizer

    texts <- c("This is the first document.", "Is this a text?", "This is the second file.",
               "This is the third text.", "File is not this.")

    # Tokenizer that emits only word trigrams (min = max = 3 words)
    TrigramTokenizer <- function(x) NGramTokenizer(x, Weka_control(min = 3, max = 3))

    # VCorpus instead of Corpus: in newer tm versions Corpus() returns a
    # SimpleCorpus, which ignores custom tokenizers
    dtm <- DocumentTermMatrix(VCorpus(VectorSource(texts)),
                              control = list(weighting = weightTf,
                                             tokenize = TrigramTokenizer))
    as.matrix(dtm)

This uses RWeka's NGramTokenizer instead of the tokenizer that create_matrix calls internally. Now you can use dtm in other RTextTools functions, for example to train the classification models below:

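    isText <- c(TRUE, FALSE, TRUE, TRUE, FALSE)  # one class label per document
    # Train on documents 1-3, test on documents 4-5
    container <- create_container(dtm, isText, trainSize = 1:3, testSize = 4:5, virgin = FALSE)
    models <- train_models(container, algorithms = c("SVM", "BOOSTING"))
    classify_models(container, models)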
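If installing RWeka (which requires Java) is a problem, a plain R function can produce word trigrams too. This is a swapped-in technique, not something from the answers here: word_ngrams is a hypothetical helper written for illustration. It lowercases the text and strips punctuation, so its output may not match NGramTokenizer exactly, and using it as tokenize = word_ngrams in the DocumentTermMatrix control list of a VCorpus is an untested assumption.

```r
# Hypothetical helper (not part of RTextTools, tm or RWeka): returns every run
# of n consecutive words in x, using base R only.
word_ngrams <- function(x, n = 3) {
  # as.character() is assumed to unwrap a tm PlainTextDocument; a plain string
  # passes through unchanged. Multi-element content is joined with spaces.
  text <- paste(as.character(x), collapse = " ")
  # Lowercase, strip punctuation, split on runs of whitespace
  words <- strsplit(tolower(gsub("[[:punct:]]", "", text)), "[[:space:]]+")[[1]]
  words <- words[nzchar(words)]
  # Too few words to form a single n-gram
  if (length(words) < n) return(character(0))
  starts <- seq_len(length(words) - n + 1)
  # Paste each window of n words into one space-separated phrase
  vapply(starts, function(i) paste(words[i:(i + n - 1)], collapse = " "),
         character(1))
}

word_ngrams("This is the first document.")
# returns c("this is the", "is the first", "the first document")
```

Calling it with n = 2 returns bigrams, so the same helper covers both 2- and 3-word phrases.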

I ran into the same error. I found a fix in this pull request: https://github.com/timjurka/RTextTools/pull/5/files. I applied the change with trace(create_matrix, edit = TRUE), which lets you edit the function's source for the current session. Now it works :)


I don't think this is a problem with the input type (character). I get the same error when I use the NYTimes dataset that ships with the package and run the same code as in the reference manual.


Source: https://habr.com/ru/post/973148/

