I've been racking my brain over this for the past few days. I searched the SO archives and tried the suggested solutions, but I just couldn't get it to work. I have sets of txt documents in folders such as 2000 -06, 1995 -99, etc., and I want to run some basic text mining operations on them, such as creating a document-term matrix and a term-document matrix and doing some operations based on word co-occurrence. My script works on a smaller corpus; however, when I try it on a larger corpus, it fails. I've pasted below the code for one run on one such folder.
    library(tm)           # Framework for text mining.
    library(SnowballC)    # Provides wordStem() for stemming.
    library(RColorBrewer) # Generate palette of colours for plots.
    library(ggplot2)      # Plot word frequencies.
    library(magrittr)
    library(Rgraphviz)
    library(directlabels)

    setwd("/ConvertedText")
    txt <- file.path("2000 -06")

    docs <- VCorpus(DirSource(txt, encoding = "UTF-8"), readerControl = list(language = "UTF-8"))

    docs <- tm_map(docs, content_transformer(tolower), mc.cores = 1)
    docs <- tm_map(docs, removeNumbers, mc.cores = 1)
    docs <- tm_map(docs, removePunctuation, mc.cores = 1)
    docs <- tm_map(docs, stripWhitespace, mc.cores = 1)
    docs <- tm_map(docs, removeWords, stopwords("SMART"), mc.cores = 1)
    docs <- tm_map(docs, removeWords, stopwords("en"), mc.cores = 1)
    # corpus creation complete

    setwd("/ConvertedText/output")
    dtm <- DocumentTermMatrix(docs)
    tdm <- TermDocumentMatrix(docs)

    m <- as.matrix(dtm)
    write.csv(m, file = "dtm.csv")

    dtms <- removeSparseTerms(dtm, 0.2)
    m1 <- as.matrix(dtms)
    write.csv(m1, file = "dtms.csv")
    # matrix creation/storage complete

    freq <- sort(colSums(as.matrix(dtm)), decreasing = TRUE)
    wf <- data.frame(word = names(freq), freq = freq)
    freq[1:50]

    # adjust freq score in next line
    p <- ggplot(subset(wf, freq > 100), aes(word, freq)) +
      geom_bar(stat = "identity") +
      theme(axis.text.x = element_text(angle = 45, hjust = 1))
    ggsave("frequency2000-06.png", height = 12, width = 17, dpi = 72)
    # frequency graph generated

    x <- as.matrix(findFreqTerms(dtm, lowfreq = 1000))
    write.csv(x, file = "freqterms00-06.csv")

    png("correlation2000-06.png", width = 12, height = 12, units = "in", res = 900)
    graph.par(list(edges = list(col = "lightblue", lty = "solid", lwd = 0.3)))
    graph.par(list(nodes = list(col = "darkgreen", lty = "dotted", lwd = 2, fontsize = 50)))
    plot(dtm, terms = findFreqTerms(dtm, lowfreq = 1000)[1:50], corThreshold = 0.7)
    dev.off()
When I use the mc.cores = 1 argument in tm_map, the operation runs indefinitely. However, if I use the lazy = TRUE argument in tm_map instead, it appears to work, but subsequent operations fail with this error:
    Error in UseMethod("meta", x) :
      no applicable method for 'meta' applied to an object of class "try-error"
    In addition: Warning messages:
    1: In mclapply(x$content[i], function(d) tm_reduce(d, x$lazy$maps)) :
      all scheduled cores encountered errors in user code
    2: In mclapply(unname(content(x)), termFreq, control) :
      all scheduled cores encountered errors in user code
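Since the warnings say the errors come from "user code" running on individual documents, one way to narrow this down might be to apply a single transformation to one document at a time and record which ones fail. Below is a minimal sketch of that idea in base R. `find_failing_docs` and `strict_lower` are hypothetical names made up for illustration and are not part of tm, and the toy list only stands in for a real corpus.

```r
# Hypothetical helper (not part of tm): apply `fun` to each element of
# `corpus` separately and return the positions of the elements that error.
find_failing_docs <- function(corpus, fun) {
  failed <- vapply(seq_along(corpus), function(i) {
    # try() catches the error instead of aborting the whole loop
    result <- try(fun(corpus[[i]]), silent = TRUE)
    inherits(result, "try-error")
  }, logical(1))
  which(failed)
}

# Toy stand-in for a transformation: errors on anything that is not text.
strict_lower <- function(d) {
  if (!is.character(d)) stop("not a character document")
  tolower(d)
}

# Toy stand-in for a corpus: the second element is deliberately bad.
toy_docs <- list("First Document", 42, "Third Document")
print(find_failing_docs(toy_docs, strict_lower))
```

With the corpus from the script above, the call would presumably look like `find_failing_docs(docs, content_transformer(tolower))`, repeated for each transformation in turn. This is only an assumption about how the check could be wired up and has not been run on the actual data.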
I have searched everywhere for a solution, but nothing has worked so far. Any help would be greatly appreciated!
Best!