Sentiment analysis of comments using qdap is slow

We use the qdap package to determine the sentiment of each review comment for a particular application. I read the review comments from a CSV file and pass them to qdap's polarity function. Everything works fine, and I get the polarity for all comments, but it takes 7-8 seconds to calculate the polarity of all sentences (the CSV file contains 779 sentences in total). My code is below.

    temp_csv <- filePath()
    attach(temp_csv)
    text_data <- temp_csv[,c('Content')]
    print(Sys.time())
    polterms <- list(neg=c('wtf'))
    POLKEY <- sentiment_frame(positives=c(positive.words),negatives=c(polterms[[1]],negative.words))
    polarity <- polarity(sentences, polarity.frame = POLKEY)
    print(Sys.time())

The printed timestamps are as follows:

[1] "2016-04-12 16:43:01 IST"

[1] "2016-04-12 16:43:09 IST"

Can someone tell me if I am doing something wrong? How can I improve the performance?

1 answer

I am the author of qdap. The polarity function was designed for much smaller data sets. As my role shifted, I began working with larger data sets. I needed something both fast and accurate (two goals that pull against each other), and have since developed a spin-off package, sentimentr. Its algorithm is optimized to be faster and more accurate than qdap's polarity.
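As a rough illustration of what switching might look like, here is a minimal sketch that scores a few comments with sentimentr. The `comments` vector is hypothetical and stands in for the 'Content' column of the CSV file in the question. The sketch uses sentimentr's default lexicon, so the custom negative term ('wtf') from the question is not reproduced.

```r
library(sentimentr)

# Hypothetical review comments standing in for the 'Content' column of the CSV
comments <- c(
  "I love this app. It works great!",
  "The update is terrible and it crashes all the time.",
  "It is okay."
)

# Split each comment into sentences once, then reuse the result for both calls
sentences <- get_sentences(comments)

# One row per sentence, with a 'sentiment' score
per_sentence <- sentiment(sentences)

# One row per comment, with the sentence scores averaged into 'ave_sentiment'
per_comment <- sentiment_by(sentences)

print(per_comment)
```

Splitting into sentences once up front avoids repeating that step when both the per-sentence and per-comment views are needed.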

As it stands, there are 5 dictionary-based (or trained-algorithm-based) approaches to sentiment detection. Each has drawbacks (-) and pluses (+) and is useful in certain circumstances.

Below I show timing tests on sample data for the first 4 of these options.

Install packages and create timing functions

I use pacman because it allows the reader to simply run the code, although you can replace it with calls to install.packages and library.

    if (!require("pacman")) install.packages("pacman")
    pacman::p_load(qdap, syuzhet, dplyr)
    pacman::p_load_current_gh(c("trinker/stansent", "trinker/sentimentr"))

    pres_debates2012 #nrow = 2912

    tic <- function (pos = 1, envir = as.environment(pos)){
        assign(".tic", Sys.time(), pos = pos, envir = envir)
        Sys.time()
    }

    toc <- function (pos = 1, envir = as.environment(pos)) {
        difftime(Sys.time(), get(".tic", , pos = pos, envir = envir))
    }

    id <- 1:2912
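As an aside, base R's system.time() is one possible alternative to the tic()/toc() helpers when only the elapsed time of a single expression is needed. The sketch below is an illustration only: `slow_task` is a hypothetical stand-in workload, not part of any of the packages discussed here.

```r
# Hypothetical stand-in workload; replace with the call you want to time
slow_task <- function(n) {
  total <- 0
  for (i in seq_len(n)) total <- total + sqrt(i)
  total
}

# system.time() evaluates the expression and returns user, system,
# and elapsed times in seconds
timing <- system.time(result <- slow_task(1e5))

# Extract the wall-clock time
elapsed <- timing[["elapsed"]]
print(elapsed)
```

Unlike printing Sys.time() before and after, this returns the duration directly as a number, which is easier to store and compare.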

Timings

    ## qdap
    tic()
    qdap_sent <- pres_debates2012 %>%
        with(qdap::polarity(dialogue, id))
    toc() # Time difference of 18.14443 secs

    ## sentimentr
    tic()
    sentimentr_sent <- pres_debates2012 %>%
        with(sentiment(dialogue, id))
    toc() # Time difference of 1.705685 secs

    ## syuzhet
    tic()
    syuzhet_sent <- pres_debates2012 %>%
        with(get_sentiment(dialogue, method="bing"))
    toc() # Time difference of 1.183647 secs

    ## stanford
    tic()
    stanford_sent <- pres_debates2012 %>%
        with(sentiment_stanford(dialogue))
    toc() # Time difference of 6.724482 mins
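To put the timings above on a common scale, the following sketch converts them all to seconds and expresses each relative to sentimentr. The numbers are copied from the toc() comments above; the variable names are only illustrative.

```r
# Timings reported above, all converted to seconds
# (the Stanford run was reported in minutes)
timings_secs <- c(
  qdap       = 18.14443,
  sentimentr = 1.705685,
  syuzhet    = 1.183647,
  stanford   = 6.724482 * 60
)

# How many times slower (or faster, if below 1) each option was than sentimentr
relative_to_sentimentr <- timings_secs / timings_secs[["sentimentr"]]
print(round(relative_to_sentimentr, 1))

# Average time per row for the 2912-row pres_debates2012 data, in milliseconds
ms_per_row <- timings_secs / 2912 * 1000
print(round(ms_per_row, 2))
```

On these figures, qdap took roughly ten times as long as sentimentr, and the Stanford-based run took over two hundred times as long.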

For more details on timings and accuracy, see my sentimentr README.md, and please star the repo if you find it useful. The image below shows one of the tests from the README:

[image: one of the tests from the sentimentr README]


Source: https://habr.com/ru/post/1246943/

