How can I get the perplexity and log likelihood with Spark LDA?

Question

I am trying to get the perplexity and log likelihood of a Spark LDA model (with Spark 2.1). The code below does not work (the methods logLikelihood and logPerplexity are not found), even though I can save the model.

from pyspark.mllib.clustering import LDA
from pyspark.mllib.linalg import Vectors

# construct corpus
# run LDA
ldaModel = LDA.train(corpus, k=10, maxIterations=10)
logll = ldaModel.logLikelihood(corpus)
perplexity = ldaModel.logPerplexity(corpus)

Please note that these methods do not show up in dir(LDA).

What would be a working example?

machine-learning cluster-analysis apache-spark pyspark lda

zzzmk Jan 22 '18 at 14:09

1 answer

desertnaut · Accepted Answer · 2018-01-22T14:40:58+0000

I can train but not fit: 'LDA' object has no attribute 'fit'

That is because you are working with the old, RDD-based API (MLlib), i.e.

from pyspark.mllib.clustering import LDA # WRONG import

whose LDA class indeed does not include the fit, logLikelihood, or logPerplexity methods.

To get these methods, switch to the newer, DataFrame-based API (ML):

from pyspark.ml.clustering import LDA  # NOTE: different import

# Loads data.
dataset = (spark.read.format("libsvm")
    .load("data/mllib/sample_lda_libsvm_data.txt"))

# Trains a LDA model.
lda = LDA(k=10, maxIter=10)
model = lda.fit(dataset)

ll = model.logLikelihood(dataset)
lp = model.logPerplexity(dataset)
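
To interpret the two numbers returned above, here is a minimal pure-Python sketch of how they relate. It assumes, as I understand Spark's implementation, that logPerplexity is the negated log-likelihood bound divided by the total token count of the dataset; verify this against your Spark version. The function name log_perplexity and the numbers are hypothetical and only illustrate the arithmetic.

```python
import math


def log_perplexity(log_likelihood, token_counts):
    """Per-token log perplexity: -log_likelihood / total number of tokens.

    Assumption: this mirrors how Spark derives logPerplexity from
    logLikelihood; it is an illustration, not Spark's own code.
    """
    total_tokens = sum(token_counts)
    if total_tokens <= 0:
        raise ValueError("the corpus must contain at least one token")
    return -log_likelihood / total_tokens


# Hypothetical values: a log-likelihood bound and the token count per document.
ll_example = -800.0
tokens_per_doc = [120, 80, 200]

lp_example = log_perplexity(ll_example, tokens_per_doc)
perplexity = math.exp(lp_example)  # perplexity itself; lower is better

print(lp_example, perplexity)
```

Under that assumption, a higher log likelihood and a lower log perplexity both indicate a better fit on the same dataset.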
