gsub vectorization issue

Purpose: I am new to R, but I am trying to get familiar with programming in it. In the current task, I want to replace a few words found in a corpus while keeping the corpus structure intact.

gsub does not accept vectors for the patterns and their corresponding replacements, so I decided to write a modified gsub function. (I know about the gsubfn function, but I would also like to develop some programming skills.)

Data generation

    library(tm)  # provides Corpus() and VectorSource()

    a <- c("this is a testOne", "this is testTwo", "this is testThree", "this is testFour")
    corpus <- Corpus(VectorSource(a))
    pattern1 <- c("testOne", "testTwo", "testThree")
    replacement1 <- c("gameOne", "gameTwo", "gameThree")

Modified gsub

    gsub2 <- function(myPattern, myReplacement, myCorpus, fixed = FALSE, ignore.case = FALSE) {
      for (i in 1:length(myCorpus)) {
        for (j in 1:length(myPattern)) {
          myCorpus[[i]] <- gsub(myPattern[j], myReplacement[j], myCorpus[[i]], fixed = TRUE)
        }
      }
    }

Code execution

 gsub2(pattern1,replacement1,corpus,fixed=TRUE) 

However, the actual corpus is not changed. I think this is because all changes are made inside the function and are therefore confined to it. I then tried to return the corpus, but R could not find the corpus object.

Can someone point me in the right direction please? Thanks.

2 answers

What if, as you said, you return the corpus object?

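    gsub2 <- function(myPattern, myReplacement, myCorpus, fixed = FALSE, ignore.case = FALSE) {
      for (i in seq_along(myCorpus)) {
        for (j in seq_along(myPattern)) {
          # pass the caller's options through instead of hard-coding fixed = TRUE
          myCorpus[[i]] <- gsub(myPattern[j], myReplacement[j], myCorpus[[i]],
                                fixed = fixed, ignore.case = ignore.case)
        }
      }
      return(myCorpus)  # hand the modified copy back to the caller
    }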

and then

    > a <- gsub2(pattern1, replacement1, corpus, fixed = TRUE)
    > class(a)
    [1] "VCorpus" "Corpus"  "list"
    > for (i in 1:length(a)) { print(a[[i]]) }
    this is a gameOne
    this is gameTwo
    this is gameThree
    this is testFour

Isn't that what you want?
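
As for why the return is needed: R gives a function its own copy of an argument as soon as the function modifies it, so the caller's object stays as it was unless the result is returned and assigned. Here is a minimal sketch in base R, with no tm involved; the names `replace_no_return`, `replace_with_return` and `words` are made up for illustration and are not part of the question's code.

```r
# Hypothetical helpers, for illustration only.

# Modifies its argument but returns nothing useful,
# like the original gsub2 from the question.
replace_no_return <- function(x) {
  x[1] <- "changed"   # only the function's local copy is modified
  invisible(NULL)
}

# Same modification, but the modified copy is returned.
replace_with_return <- function(x) {
  x[1] <- "changed"
  x
}

words <- c("original", "untouched")

replace_no_return(words)
unchanged <- identical(words, c("original", "untouched"))
print(unchanged)   # TRUE: the caller's vector was not touched

words <- replace_with_return(words)
print(words)       # the first element is now "changed"
```

The same applies to the corpus: the result of `gsub2` has to be assigned to something, as in the call above.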


Try using mapply:

    # original data
    corpus <- c("this is a testOne", "this is testTwo", "this is testThree", "this is testFour")

    # make a copy to gsub into
    corpus2 <- corpus

    # set pattern/replacement
    pattern1 <- c("testOne", "testTwo", "testThree")
    replacement1 <- c("gameOne", "gameTwo", "gameThree")

    corpus2 # before gsub

    # run gsub on all of the patterns/replacements
    x <- mapply(FUN = function(...) {
      corpus2 <<- gsub(..., x = corpus2)
    }, pattern = pattern1, replacement = replacement1)

    rm(x) # discard x; only the side effect on corpus2 is needed
    corpus2 # after gsub
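
If you would rather avoid the global assignment with `<<-`, something along these lines might work as well. It is only a sketch under my own assumptions: `gsub_all` is a name I made up, and it uses base R `Reduce` to feed the result of each `gsub` call into the next one, on a plain character vector rather than a tm corpus.

```r
# Hypothetical helper: apply each pattern/replacement pair in turn.
# Extra arguments (e.g. fixed = TRUE) are passed on to gsub().
gsub_all <- function(patterns, replacements, x, ...) {
  stopifnot(length(patterns) == length(replacements))
  Reduce(
    function(text, i) gsub(patterns[i], replacements[i], text, ...),
    seq_along(patterns),  # indices of the pattern/replacement pairs
    x                     # initial value: the untouched input
  )
}

corpus <- c("this is a testOne", "this is testTwo",
            "this is testThree", "this is testFour")
pattern1 <- c("testOne", "testTwo", "testThree")
replacement1 <- c("gameOne", "gameTwo", "gameThree")

corpus2 <- gsub_all(pattern1, replacement1, corpus, fixed = TRUE)
print(corpus2)
```

Because the function returns a new vector instead of writing into a global variable, `corpus` itself is left unchanged and no leftover object has to be removed afterwards.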

Source: https://habr.com/ru/post/1485590/