How to use a pre-trained Word2Vec model in TensorFlow

I have a Word2Vec model trained in Gensim. How can I use it for word embeddings in TensorFlow? I do not want to train the embeddings from scratch in TensorFlow. Can someone show me how to do this with some sample code?

1 answer

Suppose you have a dictionary vocab and a list inv_dict, where each list index corresponds to one of the most common words:

vocab = {'hello': 0, 'world': 2, 'neural':1, 'networks':3}
inv_dict = ['hello', 'neural', 'world', 'networks']

Notice how each index into inv_dict matches the corresponding value in vocab. Now declare the embedding matrix and fill it with the pretrained values:

import numpy as np
from gensim.models.keyedvectors import KeyedVectors

vocab_size = len(inv_dict)
emb_size = 300 # or whatever the size of your embeddings is
embeddings = np.zeros((vocab_size, emb_size)) # np.zeros, not np.zeroes

# Load the pretrained vectors, then copy each word's vector
# into the row given by its integer id.
model = KeyedVectors.load_word2vec_format('embeddings_file', binary=True)

for k, v in vocab.items():
  embeddings[v] = model[k]
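A caveat the answer glosses over: model[k] raises a KeyError for any vocab word that is absent from the pretrained file. A minimal variant of the loop that leaves such words as all-zero rows (the zero-vector fallback is an assumption here, not part of the answer above):

for k, v in vocab.items():
  if k in model: # KeyedVectors supports membership tests
    embeddings[v] = model[k]
  # else: the row stays all-zero and acts as an "unknown word" vector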

You've now got your embedding matrix. Suppose you want to train on the sample x = ['hello', 'world']. That won't work for the network as-is; the words first need to be integerized:

x_train = []
for word in x:
  x_train.append(vocab[word]) # integerize
x_train = np.array([x_train]) # shape (1, 2): a batch with one sample
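One practical wrinkle not covered in the answer: the placeholder below fixes the sample length via input_size, so on real data every integerized sentence must be padded or truncated to that length first. A hypothetical helper, assuming you reserve a dedicated pad id in your vocabulary:

def pad(ids, length, pad_id):
  # Pad with pad_id (or truncate) so every sample has exactly `length` ids.
  return (ids + [pad_id] * length)[:length]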

import tensorflow as tf

input_size = 2 # length of each integerized sample
x_model = tf.placeholder(tf.int32, shape=[None, input_size])

# Keep the (potentially large) embedding lookup on the CPU.
with tf.device("/cpu:0"):
  embedded_x = tf.nn.embedding_lookup(embeddings, x_model)

Now embedded_x is what goes into your convolution or whatever the rest of your network is. This assumes you are not retraining the embeddings, just using them as they are.
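For completeness, a minimal sketch of running the lookup in a TF1 session; the mean-pooling layer here is an illustrative assumption, not part of the answer:

avg = tf.reduce_mean(embedded_x, axis=1) # crude mean-pooled sentence vector

with tf.Session() as sess:
  result = sess.run(avg, feed_dict={x_model: x_train})
  print(result.shape) # (1, 300): one pooled vector for the one sample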


Source: https://habr.com/ru/post/1673412/

