Sentiment analysis of comments using qdap is slow

Question


We use the qdap package to determine the sentiment of each review comment for a particular application. I read the review comments from a CSV file and pass them to qdap's polarity function. Everything works and I get the polarity for all comments, but it takes 7-8 seconds to calculate the polarity of all the sentences (the CSV file contains 779 sentences in total). My code is below.

    temp_csv <- filePath()
    attach(temp_csv)
    text_data <- temp_csv[,c('Content')]
    print(Sys.time())
    polterms <- list(neg=c('wtf'))
    POLKEY <- sentiment_frame(positives=c(positive.words),negatives=c(polterms[[1]],negative.words))
    polarity <- polarity(sentences, polarity.frame = POLKEY)
    print(Sys.time())

The recorded times are as follows:

[1] "2016-04-12 16:43:01 IST"

[1] "2016-04-12 16:43:09 IST"

Can someone tell me if I am doing something wrong? How can I improve the performance?


r shiny sentiment-analysis qdap

VenuSathya20 Apr 12 '16 at 12:01


1 answer

Tyler Rinker · Accepted Answer · 2016-04-12T14:38:55+0000

I am the author of qdap. The polarity function was designed for much smaller data sets. As my role shifted, I began working with larger data sets. I needed something both fast and accurate (these two things are in opposition to each other), and have since developed a separate package, sentimentr. Its algorithm is optimized to be faster and more accurate than qdap's polarity.

Currently there are 5 dictionary-based (or trained-algorithm-based) approaches to sentiment detection. Each has drawbacks (-) and pluses (+) and is useful in certain circumstances.

qdap: + on CRAN; - slow
syuzhet: + on CRAN; + fast; + great plotting; - less accurate on non-literary use
sentimentr: + fast; + higher accuracy; - GitHub only
stansent (port of Stanford): + most accurate; - slower
tm.plugin.sentiment: - archived on CRAN; - I could not get it working easily

Below I show timing tests on sample data for the first 4 options above.

Install packages and make timing functions

I use pacman because it allows the reader to just run the code, though you can replace it with install.packages and library calls.

    if (!require("pacman")) install.packages("pacman")
    pacman::p_load(qdap, syuzhet, dplyr)
    pacman::p_load_current_gh(c("trinker/stansent", "trinker/sentimentr"))

    pres_debates2012 #nrow = 2912

    tic <- function (pos = 1, envir = as.environment(pos)){
        assign(".tic", Sys.time(), pos = pos, envir = envir)
        Sys.time()
    }

    toc <- function (pos = 1, envir = as.environment(pos)) {
        difftime(Sys.time(), get(".tic", , pos = pos, envir = envir))
    }

    id <- 1:2912

Timings

    ## qdap
    tic()
    qdap_sent <- pres_debates2012 %>%
        with(qdap::polarity(dialogue, id))
    toc() # Time difference of 18.14443 secs

    ## sentimentr
    tic()
    sentimentr_sent <- pres_debates2012 %>%
        with(sentiment(dialogue, id))
    toc() # Time difference of 1.705685 secs

    ## syuzhet
    tic()
    syuzhet_sent <- pres_debates2012 %>%
        with(get_sentiment(dialogue, method="bing"))
    toc() # Time difference of 1.183647 secs

    ## stanford
    tic()
    stanford_sent <- pres_debates2012 %>%
        with(sentiment_stanford(dialogue))
    toc() # Time difference of 6.724482 mins
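As an aside, the tic()/toc() helpers above could probably be swapped for base R's system.time(), which does not need to store a start time in the global environment. The sketch below is only an illustration: slow_score() is a hypothetical stand-in (not part of qdap, sentimentr, or any other package) so that the example runs with base R alone, and the comments vector is made-up data.

```r
# Hypothetical stand-in for a sentiment function; it only scales the
# character count so that the example runs without extra packages.
slow_score <- function(x) {
  vapply(x, function(s) nchar(s) / 10, numeric(1), USE.NAMES = FALSE)
}

# Made-up sample data: 2000 short comments.
comments <- rep(c("I love this app", "wtf, it crashes"), times = 1000)

# system.time() evaluates the expression and returns the user, system and
# elapsed times in seconds; the assignment to `scores` made inside the
# call remains available afterwards.
timing <- system.time(scores <- slow_score(comments))

print(timing[["elapsed"]])
print(length(scores))
```

To time a real call, slow_score(comments) would be replaced with the function being measured, for example the polarity call from the question.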

For more details on timings and accuracy, see my sentimentr README.md, and please star the repo if it is useful.

[Image: one of the tests from the sentimentr README]
