RTextTools create_matrix returns a non-character argument error

Question

I am new to text processing in R. I tried the simple code below:

    library(RTextTools)
    texts <- c("This is the first document.", "This is the second file.", "This is the third text.")
    matrix <- create_matrix(texts, ngramLength = 3)

which comes from one of the answers to the question "Search for 2 and 3 words Phrases using the R TM package".

However, it fails with the error: Error in FUN(X[[2L]], ...) : non-character argument

I can generate a document term matrix when I drop the ngramLength parameter, but I need to search for phrases of a specific word length. Any suggestions for an alternative or fix?

r text-mining

Ricky Jul 31 '14 at 8:51

3 answers

user3631991 · Answer 1 · 2014-12-18T12:30:13+0000

ngramLength doesn't seem to work. Here is a workaround:

    library(RTextTools)
    library(tm)
    library(RWeka)  # this library is needed for NGramTokenizer

    texts <- c("This is the first document.", "Is this a text?", "This is the second file.", "This is the third text.", "File is not this.")

    TrigramTokenizer <- function(x) NGramTokenizer(x, Weka_control(min = 3, max = 3))

    dtm <- DocumentTermMatrix(Corpus(VectorSource(texts)),
                              control = list(weighting = weightTf,
                                             tokenize = TrigramTokenizer))
    as.matrix(dtm)

This uses RWeka's NGramTokenizer instead of the tokenizer called by create_matrix. You can now use dtm in other RTextTools functions, for example to train the classification models below:

    isText <- c(TRUE, FALSE, TRUE, TRUE, FALSE)
    container <- create_container(dtm, isText, virgin = FALSE, trainSize = 1:3, testSize = 4:5)
    models <- train_models(container, algorithm = c("SVM", "BOOSTING"))
    classify_models(container, models)
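
To see what kind of tokens a trigram tokenizer produces without installing RWeka (which needs Java), here is a minimal base-R sketch. ngram_tokenizer is a hypothetical helper, not part of RTextTools, tm, or RWeka. Lowercasing and stripping punctuation are assumptions made for this illustration, so its output may not match NGramTokenizer exactly.

```r
# Hypothetical helper (not from any package): split ONE document into
# word n-grams using base R only.
ngram_tokenizer <- function(x, n = 3) {
  stopifnot(is.character(x), length(x) == 1)
  # Assumption: drop punctuation and lowercase before splitting on whitespace
  cleaned <- tolower(gsub("[[:punct:]]+", "", x))
  words <- unlist(strsplit(cleaned, "[[:space:]]+"))
  words <- words[nzchar(words)]
  # Documents shorter than n words yield no n-grams
  if (length(words) < n) return(character(0))
  vapply(seq_len(length(words) - n + 1),
         function(i) paste(words[i:(i + n - 1)], collapse = " "),
         character(1))
}

texts <- c("This is the first document.", "Is this a text?",
           "This is the second file.", "This is the third text.",
           "File is not this.")

# One character vector of trigrams per document
trigrams <- lapply(texts, ngram_tokenizer)

# Trigram counts across all documents
trigram_counts <- table(unlist(trigrams))
```

Inspecting trigrams[[1]] shows the three-word phrases taken from the first document; trigram_counts gives a rough idea of which phrases repeat across documents.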

user131476 · Answer 2 · 2016-03-31T13:12:45+0000

I encountered the same error. I found a fix in this pull request: https://github.com/timjurka/RTextTools/pull/5/files . I applied the change with trace(create_matrix, edit = TRUE). Now it works :)

Ashish m · Answer 3 · 2014-08-22T19:16:58+0000

I do not think this is a problem with the input type (character). I get the same error when I use the NYTimes dataset that comes with the package and run the same code as in the reference manual.
