I cannot give you a definite answer without data that reproduces your problem, but I suspect the bottleneck is the following line in the source code of stemCompletion:
    possibleCompletions <- lapply(x, function(w) grep(sprintf("^%s", w), dictionary, value = TRUE))
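To make the cost concrete, here is a minimal illustration of what that line does for a single element of x, with an assumed toy dictionary; each stem is matched as a prefix against the entire dictionary, once per element of x:

    dictionary <- c("analysis", "analyze", "analyst", "banana")  # assumed toy data
    grep(sprintf("^%s", "analy"), dictionary, value = TRUE)
    ## [1] "analysis" "analyze"  "analyst"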
After that grep, given that you kept the default completion heuristic "prevalent", this happens:
    possibleCompletions <- lapply(possibleCompletions, function(x) sort(table(x), decreasing = TRUE))
    structure(names(sapply(possibleCompletions, "[", 1)), names = x)
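To see why this implements "prevalent": sorting the frequency table in decreasing order puts the most common match first, and sapply(..., "[", 1) then takes it. A small sketch with an assumed grep result:

    matches <- c("analysis", "analyze", "analysis")     # assumed result of one grep call
    sort(table(matches), decreasing = TRUE)
    ## matches
    ## analysis  analyze
    ##        2        1
    names(sort(table(matches), decreasing = TRUE))[1]   # "analysis" is the prevalent completion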
That first lapply line runs grep once for every word in your corpus to find its possible completions. I assume that many words appear many times in your corpus, so grep is called over and over just to return the same answer. A faster version (how much faster depends on how many words are repeated, and how often) would look something like this:
    y <- unique(x)
    possibleCompletions <- lapply(y, function(w) grep(sprintf("^%s", w), dictionary, value = TRUE))
    possibleCompletions <- lapply(possibleCompletions, function(x) sort(table(x), decreasing = TRUE))
    z <- structure(names(sapply(possibleCompletions, "[", 1)), names = y)
    z[match(x, names(z))]
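A quick usage sketch with assumed toy inputs; the final match() step maps the answers computed once per unique stem back onto the full, repetitive input vector:

    x <- c("analy", "analy", "analy", "banan")                    # assumed stems
    dictionary <- c("analysis", "analyze", "analysis", "banana")  # assumed dictionary
    # running the revised snippet above then returns:
    ##      analy      analy      analy      banan
    ## "analysis" "analysis" "analysis"   "banana"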
This way, the expensive grep runs only over the unique values of x rather than over every value. To use this revised version, you will need to download the package source from CRAN and change the function (I found it in the completion.R file in the R folder).
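If you would rather not rebuild the package, note that the snippet above uses only base R, so you can also define it as a standalone function in your session. A minimal sketch: the name stemCompletion2 is made up, it handles only the "prevalent" heuristic, and it assumes dictionary is a plain character vector, whereas tm's real stemCompletion also accepts other completion types:

    # Hypothetical standalone variant; covers only the "prevalent" case
    stemCompletion2 <- function(x, dictionary) {
        y <- unique(x)
        completions <- lapply(y, function(w) grep(sprintf("^%s", w), dictionary, value = TRUE))
        completions <- lapply(completions, function(m) sort(table(m), decreasing = TRUE))
        z <- structure(names(sapply(completions, "[", 1)), names = y)
        z[match(x, names(z))]
    }

    stemCompletion2(c("analy", "analy", "banan"),
                    c("analysis", "analyze", "analysis", "banana"))
    ##      analy      analy      banan
    ## "analysis" "analysis"   "banana"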
Or you can just use Python for this.