There is no easy way to combine the end results of individual training sessions.
Even on the same data, slight randomization from the initial seeding or from thread-scheduling jitter will lead to different final states, so vectors are only directly comparable with others from the same session.
This is because each session finds a useful configuration of vectors, but there are many equally useful configurations rather than a single best one.
For example, any final state you reach has many possible rotations/reflections that can be exactly as good at the training task of predicting words, or perform exactly the same on some other task (such as solving analogies). But most of these possible alternatives will not have coordinates that can be mixed and matched for useful comparisons with each other.
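A small numpy sketch (my own illustration, not from the original answer) makes this concrete: rotating a whole vector space preserves every pairwise cosine similarity, so the rotated space is just as good at similarity tasks, yet its raw coordinates no longer line up with the original's.

```python
# Two vector sets related by a random orthogonal rotation have identical
# pairwise cosine similarities, yet their coordinates are not comparable.
# All names and sizes here are illustrative.
import numpy as np

rng = np.random.default_rng(0)
vectors_a = rng.normal(size=(5, 8))      # 5 toy "word vectors", 8 dimensions

# Build a random orthogonal matrix via QR decomposition.
q, _ = np.linalg.qr(rng.normal(size=(8, 8)))
vectors_b = vectors_a @ q                # same geometry, rotated coordinates

def cosine_matrix(v):
    normed = v / np.linalg.norm(v, axis=1, keepdims=True)
    return normed @ normed.T

# Pairwise similarities are preserved under rotation...
assert np.allclose(cosine_matrix(vectors_a), cosine_matrix(vectors_b))
# ...but the coordinates themselves have diverged.
assert not np.allclose(vectors_a, vectors_b)
print("rotated space: same similarities, incompatible coordinates")
```

Two independent training runs behave like such rotations (plus noise): equally useful internally, mutually incompatible externally.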
Pre-seeding your model with vectors from earlier training runs can sometimes improve results after further training on new data, but I'm not aware of rigorous testing of this possibility. The effect likely depends on your specific goals, your parameter choices, and how much the new and old data resemble each other, or resemble the eventual data against which the vectors will be used.
For example, if the Google News corpus is unlike your own training data, or the text you'll be using the word vectors to understand, using it as a starting point may just slow or bias your training. On the other hand, if you train on your new data long enough, eventually any influence of the pre-loaded values may be diluted to nothing. (If you really wanted a "blended" result, you might have to train on the new data with an interleaved goal of nudging the vectors back toward the values from the prior dataset.)
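The dilution effect can be seen in a toy sketch (my own illustration; the learning rate and vectors are arbitrary): a pre-loaded value that is repeatedly nudged toward what the new data prefers eventually retains almost nothing of its starting point.

```python
# Toy illustration of pre-loaded values being diluted by continued training.
import numpy as np

preloaded = np.array([1.0, 0.0])     # value inherited from an earlier run
new_target = np.array([0.0, 1.0])    # what the new data "wants"

vec = preloaded.copy()
for _ in range(200):                 # many small SGD-style updates
    vec += 0.05 * (new_target - vec)

print(np.round(vec, 3))              # essentially the new target now
```

Real word2vec updates are noisier and word-interdependent, but the qualitative point holds: without a goal that anchors vectors to their old values, long-enough training on new data overwrites the pre-load.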
Ways to combine the results of independent sessions could make a good research project. Perhaps the method used in word2vec language-translation work, learning a projection between the vector spaces of two vocabularies, could also "translate" between the different coordinates of different runs. Perhaps locking some vectors in place, or training against the dual goals of "predict the new text" and "stay close to the old vectors", would yield meaningfully improved combined results.
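The projection idea above can be sketched with a least-squares fit (a synthetic toy, not a tested recipe: in practice the anchor pairs would be words shared by the vocabularies of two real runs).

```python
# Learn a linear map from run A's coordinates to run B's using anchor words,
# then check it on held-out words. Data here is synthetic: space_b is a noisy
# rotation of space_a, standing in for a second independent training run.
import numpy as np

rng = np.random.default_rng(1)
dim = 8
space_a = rng.normal(size=(20, dim))                    # vectors from "run A"
rotation = np.linalg.qr(rng.normal(size=(dim, dim)))[0]
space_b = space_a @ rotation + rng.normal(scale=0.01, size=(20, dim))

# Fit W minimizing ||space_a[anchors] @ W - space_b[anchors]||.
anchors = slice(0, 15)                                  # first 15 words as anchors
w, *_ = np.linalg.lstsq(space_a[anchors], space_b[anchors], rcond=None)

# Held-out words from run A now land near their run-B counterparts.
projected = space_a[15:] @ w
errors = np.linalg.norm(projected - space_b[15:], axis=1)
print("mean held-out projection error:", errors.mean())
```

Whether such a mapping holds up between genuinely independent word2vec runs, where the relationship is only approximately linear, is exactly the open question; this sketch only shows the mechanics of fitting and applying the projection.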