Loading pre-trained word2vec to initialize embedding_lookup in the Estimator model_fn

I am working on a text classification problem. I defined my classifier with the Estimator class and my own model_fn. I would like to use Google's pre-trained word2vec embeddings as the initial values, and then optimize them further for this task.

I saw this post: Using pre-trained word embedding (word2vec or Glove) in TensorFlow
which explains how to do this in raw TensorFlow code. However, I would really like to use the Estimator class.

As an extension, I would then like to train this code on Cloud ML Engine; is there a good way to pass in a fairly large file with the initial values?

Say we have something like:

def build_model_fn():
    def _model_fn(features, labels, mode, params):
        input_layer = features['feat']  # shape=[-1, params["sequence_length"]]
        # ... what goes here to initialize W?

        embedded = tf.nn.embedding_lookup(W, input_layer)
        ...
        return predictions
    return _model_fn

estimator = tf.contrib.learn.Estimator(
    model_fn=build_model_fn(),
    model_dir=MODEL_DIR,
    params=params)
estimator.fit(input_fn=read_data, max_steps=2500)
1 answer

Embeddings are usually large enough that the only viable approach is to use them to initialize a tf.Variable in your graph. This also lets you take advantage of parameter servers in distributed training, etc.

For this (and anything else), I would recommend using the new "core" estimator, tf.estimator.Estimator, as it will make things a lot easier.

From the answer in the question you linked, and knowing that we want a variable rather than a constant, there are two viable approaches:

(2) initialize the variable from a feed dict, or (3) load the variable from a checkpoint.


I'll cover option (3) first, since it is easier and generally better:

In your model_fn, simply initialize a variable with the Tensor returned by tf.contrib.framework.load_variable. This requires:

  • that you have a valid TF checkpoint containing your embeddings, and
  • that you know the fully qualified name of the embeddings variable within that checkpoint (see the sketch below if you need to discover it).
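
If you are not sure of that fully qualified name, you can list everything a checkpoint contains. A minimal sketch (the checkpoint path here is a placeholder):

import tensorflow as tf

# Print every variable name (and shape) stored in the checkpoint so you can
# find the fully qualified name of the embeddings variable.
for name, shape in tf.contrib.framework.list_variables(
        'gs://my-bucket/word2vec_checkpoints/'):
    print(name, shape)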

The code for option (3) itself is fairly simple:

def model_fn(features, labels, mode, params):
  # Load the pre-trained embedding matrix from the checkpoint and use it
  # to initialize a trainable variable in this graph.
  embeddings = tf.Variable(tf.contrib.framework.load_variable(
      'gs://my-bucket/word2vec_checkpoints/',
      'a/fully/qualified/scope/embeddings'
  ))
  ...
  return tf.estimator.EstimatorSpec(...)
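
If you only have the raw word2vec vectors as a numpy array, one way to produce such a checkpoint in the first place is a one-off script like the following sketch (the file name word2vec.npy, the scope name, and the output path are all placeholders):

import numpy as np
import tensorflow as tf

# Pre-trained matrix, assumed shape [vocab_size, embedding_size].
embedding_matrix = np.load('word2vec.npy')

with tf.Graph().as_default():
    with tf.variable_scope('embedding_scope'):  # arbitrary, for illustration
        embeddings = tf.get_variable(
            'embeddings',
            initializer=tf.constant(embedding_matrix, dtype=tf.float32))
    saver = tf.train.Saver()
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        # load_variable can later read 'embedding_scope/embeddings'
        # back from this directory.
        saver.save(sess, 'word2vec_checkpoints/embeddings.ckpt')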

If you don't have a checkpoint, but you do have the embeddings as a numpy array, option (2) works as follows.


Option (2) uses tf.train.Scaffold, which is essentially a configuration object holding all of the options for starting a tf.Session (which Estimator intentionally hides, for a number of reasons).

You specify a Scaffold in the tf.train.EstimatorSpec that you return from your model_fn.

We create a placeholder in our model_fn, make it the initializer for our embedding variable, and then pass an init_feed_dict via the Scaffold. For example:

def model_fn(features, labels, mode, params):
  # Placeholder that receives the pre-trained embedding matrix at
  # initialization time via the Scaffold's init_feed_dict.
  embed_ph = tf.placeholder(
      shape=[params['vocab_size'], params['embedding_size']],
      dtype=tf.float32)
  embeddings = tf.Variable(embed_ph)
  # Define your model
  return tf.estimator.EstimatorSpec(
      ...,  # normal EstimatorSpec args
      scaffold=tf.train.Scaffold(
          init_feed_dict={embed_ph: my_embedding_numpy_array})
  )

What happens here is that the init_feed_dict populates the embed_ph placeholder at initialization time, which in turn allows the initializer op of embeddings (the assignment from the placeholder) to run.
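
To connect this back to the question's setup, here is a minimal end-to-end training sketch under stated assumptions: the feature key 'feat', read_data, and MODEL_DIR come from the question; the params entries and the averaging classifier head are deliberately simple stand-ins:

import numpy as np
import tensorflow as tf

def model_fn(features, labels, mode, params):
    embed_ph = tf.placeholder(
        shape=[params['vocab_size'], params['embedding_size']],
        dtype=tf.float32)
    embeddings = tf.Variable(embed_ph)

    # [batch_size, sequence_length] ids -> [batch, seq, embedding_size].
    embedded = tf.nn.embedding_lookup(embeddings, features['feat'])

    # Toy head: average the word vectors, then a linear classifier.
    logits = tf.layers.dense(tf.reduce_mean(embedded, axis=1),
                             params['num_classes'])
    loss = tf.losses.sparse_softmax_cross_entropy(labels=labels, logits=logits)
    train_op = tf.train.AdamOptimizer().minimize(
        loss, global_step=tf.train.get_global_step())

    # Sketch handles train/eval only; add predictions for PREDICT mode.
    return tf.estimator.EstimatorSpec(
        mode=mode, loss=loss, train_op=train_op,
        scaffold=tf.train.Scaffold(
            init_feed_dict={embed_ph: params['embedding_matrix']}))

estimator = tf.estimator.Estimator(
    model_fn=model_fn,
    model_dir=MODEL_DIR,
    params={'vocab_size': 10000,
            'embedding_size': 300,
            'num_classes': 2,
            'embedding_matrix': np.load('word2vec.npy')})
estimator.train(input_fn=read_data, max_steps=2500)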



Source: https://habr.com/ru/post/1679824/

