Incremental training of topic models in MALLET

According to the MALLET documentation, it is possible to train topic models incrementally:

"--output-model [FILENAME] This option specifies a file to write a serialized MALLET topic trainer object. This type of output is appropriate for pausing and restarting training."

I would like to train topics on one dataset and then extend the model with a second dataset. After both training stages, I would like to write out the state for both datasets (with --output-state). Here is how I am trying to do this:

# training on the first dataset
../mallet-2.0.7/bin/mallet import-dir --input input/ --keep-sequence --output input.mallet
../mallet-2.0.7/bin/mallet train-topics --input input.mallet --num-topics 3 --output-state topic-state.gz --output-model model

# training on the second dataset
../mallet-2.0.7/bin/mallet import-dir --input input2/ --keep-sequence --output input2.mallet --use-pipe-from input.mallet
../mallet-2.0.7/bin/mallet train-topics --input input2.mallet --num-topics 3 --num-iterations 100 --output-state topic-state2.gz --input-model model

In the last command, if I add "--input-model model", the data from the second dataset does not appear in the output state file. If I omit it, the data from the first dataset does not appear in the output state file.

If I try to add additional instances to the model in code:

model.addInstances(instances);
model.setNumThreads(2);
model.setNumIterations(50);
model.estimate();

[...]

model.addInstances(instances2);
model.setNumThreads(2);
model.setNumIterations(50);
model.estimate();

I get an error message:

Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 30
    at cc.mallet.topics.ParallelTopicModel.buildInitialTypeTopicCounts(ParallelTopicModel.java:364)
    at cc.mallet.topics.ParallelTopicModel.addInstances(ParallelTopicModel.java:276)
    at cc.mallet.examples.TopicModel2.main(TopicModel2.java:66)
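
My guess, which I have not verified against the MALLET source, is that the count tables are rebuilt for the most recently added batch only, while the recount still walks every document added so far. The toy sketch below is not MALLET code, and every name in it is hypothetical. It only shows how that pattern produces an ArrayIndexOutOfBoundsException, assuming the second batch was imported with its own, smaller vocabulary.

```java
import java.util.ArrayList;
import java.util.List;

// Toy illustration only: NOT MALLET code. All names are hypothetical.
class IncrementalCountsSketch {
    private final int numTopics;
    // Each document is an array of word type IDs.
    private final List<int[]> documents = new ArrayList<>();
    private int[][] typeTopicCounts = new int[0][];

    IncrementalCountsSketch(int numTopics) {
        this.numTopics = numTopics;
    }

    // Fragile variant: sizes the table from the new batch's vocabulary only,
    // then recounts over every document added so far.
    void addBatchFragile(int batchVocabularySize, int[]... batch) {
        typeTopicCounts = new int[batchVocabularySize][numTopics];
        for (int[] doc : batch) {
            documents.add(doc);
        }
        recount();
    }

    // Safer variant: sizes the table from the largest type ID in any document.
    void addBatchSafe(int[]... batch) {
        for (int[] doc : batch) {
            documents.add(doc);
        }
        int maxType = -1;
        for (int[] doc : documents) {
            for (int type : doc) {
                maxType = Math.max(maxType, type);
            }
        }
        typeTopicCounts = new int[maxType + 1][numTopics];
        recount();
    }

    private void recount() {
        for (int[] doc : documents) {
            for (int position = 0; position < doc.length; position++) {
                // Deterministic stand-in for a sampled topic assignment.
                int topic = position % numTopics;
                typeTopicCounts[doc[position]][topic]++;
            }
        }
    }

    // Total number of tokens currently recorded in the table.
    int totalCount() {
        int total = 0;
        for (int[] row : typeTopicCounts) {
            for (int count : row) {
                total += count;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        IncrementalCountsSketch model = new IncrementalCountsSketch(3);
        model.addBatchFragile(31, new int[]{0, 5, 30}, new int[]{1, 2});
        try {
            // The table now has 30 rows, but an earlier document uses type ID 30.
            model.addBatchFragile(30, new int[]{0, 1, 2});
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("Recount failed: " + e.getMessage());
        }
    }
}
```

If this is what happens, both batches would at least need to share one alphabet, which is what --use-pipe-from does on the command line. Whether that alone is enough for ParallelTopicModel is exactly what I am unsure about.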

Similar questions have been asked on the MALLET mailing list before: http://permalink.gmane.org/gmane.comp.ai.mallet.devel/924 , http://permalink.gmane.org/gmane.comp.ai.mallet.devel/2139

So, is it possible to train topic models incrementally?

1 answer

I think you took part in this conversation, which may be useful to you now:

http://comments.gmane.org/gmane.comp.ai.mallet.devel/2153

Source: https://habr.com/ru/post/1535104/

