Error converting text to lowercase using tm_map (..., tolower)

I tried using tm_map to convert my text to lowercase, but it gave the following error. How can I get around this?

  require(tm)
  byword <- tm_map(byword, tolower)
  Error in UseMethod("tm_map", x) :
    no applicable method for 'tm_map' applied to an object of class "character"
+46
r lowercase tm term-document-matrix
Nov 30 '12 at 6:35
4 answers

Use the base R function tolower():

 tolower(c("THE quick BROWN fox"))
 # [1] "the quick brown fox"
+101
Nov 30

If you are working with a VCorpus, you should wrap tolower inside content_transformer so as not to corrupt the VCorpus object, something like:

 > library(tm)
 > data('crude')
 > crude[[1]]$content
 [1] "Diamond Shamrock Corp said that\neffective today it had cut its contract prices for crude oil by\n1.50 dlrs a barrel.\n The reduction brings its posted price for West Texas\nIntermediate to 16.00 dlrs a barrel, the copany said.\n \"The price reduction today was made in the light of falling\noil product prices and a weak crude oil market,\" a company\nspokeswoman said.\n Diamond is the latest in a line of US oil companies that\nhave cut its contract, or posted, prices over the last two days\nciting weak oil markets.\n Reuter"
 > tm_map(crude, content_transformer(tolower))[[1]]$content
 [1] "diamond shamrock corp said that\neffective today it had cut its contract prices for crude oil by\n1.50 dlrs a barrel.\n the reduction brings its posted price for west texas\nintermediate to 16.00 dlrs a barrel, the copany said.\n \"the price reduction today was made in the light of falling\noil product prices and a weak crude oil market,\" a company\nspokeswoman said.\n diamond is the latest in a line of us oil companies that\nhave cut its contract, or posted, prices over the last two days\nciting weak oil markets.\n reuter"
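
The error in the question says that byword is a plain character vector, so it has to become a corpus before tm_map can be applied. A minimal sketch of that, assuming a made-up two-element vector in place of the asker's real data (the sample strings and the name corpus_lower are hypothetical):

```r
library(tm)

# Hypothetical sample data standing in for the question's `byword` vector
byword <- c("THE quick BROWN fox", "Jumps OVER the LAZY dog")

# tm_map() works on a corpus, not on a character vector, so build one first
corpus <- VCorpus(VectorSource(byword))

# content_transformer() applies tolower to each document's content
# and returns a text document instead of a bare character string
corpus_lower <- tm_map(corpus, content_transformer(tolower))

corpus_lower[[1]]$content
# [1] "the quick brown fox"

# Each element is still a text document object
class(corpus_lower[[1]])
```

If all you need is the lowercased strings and not a corpus, calling tolower(byword) directly is simpler.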
+6
Jul 20 '16 at 8:07
 myCorpus <- Corpus(VectorSource(byword))
 myCorpus <- tm_map(myCorpus, tolower)
 print(myCorpus[[1]])
+3
Jul 25 '13 at 17:10

Using tolower in this way has an undesirable side effect: if you later try to create a term-document matrix from the corpus, it will fail. This is due to a recent change in tm, which can no longer handle the return type of tolower. After applying tolower, use:

 myCorpus <- tm_map(myCorpus, PlainTextDocument) 
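
For comparison, here is a rough sketch of building a document-term matrix after lowercasing. Note that it swaps in the content_transformer wrapper from the earlier answer instead of the PlainTextDocument call above, and that the sample documents and the name docs are invented for illustration:

```r
library(tm)

# Hypothetical sample documents (not from the original question)
docs <- c("Crude OIL prices", "OIL markets are WEAK")

corpus <- VCorpus(VectorSource(docs))

# Lowercase while keeping every element a text document
corpus <- tm_map(corpus, content_transformer(tolower))

# Building the matrix works because the corpus still holds text documents
dtm <- DocumentTermMatrix(corpus)

nDocs(dtm)   # number of documents in the matrix
Terms(dtm)   # the extracted terms
```

Which of the two fixes works best may depend on the installed tm version, so it is worth checking the result with inspect(dtm) before relying on it.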
+1
Jun 25 '15 at 19:48
