How to share one embedding lookup across multiple feature columns

I work with datasets and feature_columns in TensorFlow ( https://developers.googleblog.com/2017/11/introducing-tensorflow-feature-columns.html ). I see that there are categorical columns and a way to create embedding columns from categorical columns. But when working on NLP tasks, how do I create a single embedding lookup?

For example, consider a text classification task. Each data point has several text columns, but they are not separate categories. How do I create and use one embedding lookup for all of these columns?

The following is an example of my current implementation. I create a categorical column for each text column and use it to create an embedding column. The problem is that the embeddings for the same word may differ across columns.

import tensorflow as tf

def create_embedding_features(key, vocab_list=None, embedding_size=20):
    cat_feature = tf.feature_column.categorical_column_with_vocabulary_list(
        key=key,
        vocabulary_list=vocab_list)
    embedding_feature = tf.feature_column.embedding_column(
        categorical_column=cat_feature,
        dimension=embedding_size)
    return embedding_feature

le_features_embd = [create_embedding_features(f, vocab_list=vocab_list)
                    for f in feature_keys]
1 answer

First, you do not need a separate embedding for each column. You can treat the text from all of your columns as a single var-length feature (a list of words) and run it through one shared embedding lookup (say, a 256-dimensional embedding).

Start by creating a _CategoricalColumn for the text:

cat_column_with_vocab = tf.feature_column.categorical_column_with_vocabulary_list(
    key='my-text',
    vocabulary_list=vocab_list)

If your vocabulary is large, use categorical_column_with_vocabulary_file instead.
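For reference, the vocabulary file expected by categorical_column_with_vocabulary_file is a plain text file with one token per line, where the line number determines the integer id. A minimal sketch of writing one from a Python list (the file name `vocab.txt` is just an example):

```python
# Write a vocabulary file: one token per line, in a fixed order so that
# the line index becomes the integer id assigned by the categorical column.
vocab_list = ["this", "product", "is", "for sale", "within", "us"]

with open("vocab.txt", "w") as f:
    for token in vocab_list:
        f.write(token + "\n")

# The file would then be passed as, e.g.:
# tf.feature_column.categorical_column_with_vocabulary_file(
#     key='my-text', vocabulary_file='vocab.txt',
#     vocabulary_size=len(vocab_list))
```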

Then create an embedding column, optionally initialized from pre-trained embeddings:

embedding_initializer = None
if has_pretrained_embedding:
  embedding_initializer = tf.contrib.framework.load_embedding_initializer(
      ckpt_path=xxxx)
else:
  embedding_initializer = tf.random_uniform_initializer(-1.0, 1.0)

embed_column = tf.feature_column.embedding_column(
    categorical_column=cat_column_with_vocab,
    dimension=256,  # this is your pre-trained embedding dimension
    initializer=embedding_initializer,
    trainable=False)
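To make the mechanics concrete, here is a plain-NumPy sketch of what the embedding column computes for one example: each token is mapped to an integer id via the vocabulary, the matching rows of the embedding matrix are gathered, and the rows are reduced with the default 'mean' combiner. All names and the small dimension below are illustrative, not part of the API:

```python
import numpy as np

vocab_list = ["this", "product", "is", "for sale", "within", "us"]
word_to_id = {w: i for i, w in enumerate(vocab_list)}

dim = 4  # small stand-in for the 256-dim embedding above
rng = np.random.default_rng(0)
embedding_matrix = rng.uniform(-1.0, 1.0, size=(len(vocab_list), dim))

def embed_tokens(tokens):
    """Gather each token's embedding row and mean-combine them,
    mimicking embedding_column's default combiner='mean'."""
    ids = [word_to_id[t] for t in tokens]
    return embedding_matrix[ids].mean(axis=0)

vec = embed_tokens(["this", "product", "is"])
print(vec.shape)  # (4,)
```

The output has a fixed size regardless of how many tokens went in, which is what lets it feed a dense layer.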

Suppose you also have a numeric feature, say price:

price_column = tf.feature_column.numeric_column('price')

columns = [embed_column, price_column]

Now build the model:

features = tf.parse_example(...,
    features=tf.feature_column.make_parse_example_spec(columns))
dense_tensor = tf.feature_column.input_layer(features, columns)
for units in [128, 64, 32]:
  dense_tensor = tf.layers.dense(dense_tensor, units, tf.nn.relu)
prediction = tf.layers.dense(dense_tensor, 1)
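The model above pushes the concatenated feature tensor through three ReLU layers and a final linear unit. A shape-only NumPy sketch of that forward pass, with random stand-in weights (the input width 257 = 256 embedding dims + 1 price feature is an assumption for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
batch, in_dim = 8, 257  # assumed: 256-dim embedding + 1 numeric price column

x = rng.normal(size=(batch, in_dim))
for units in [128, 64, 32]:
    w = rng.normal(size=(x.shape[1], units))
    b = np.zeros(units)
    x = np.maximum(x @ w + b, 0.0)  # dense layer + ReLU, like tf.layers.dense
w_out = rng.normal(size=(x.shape[1], 1))
prediction = x @ w_out              # final linear unit, one score per example
print(prediction.shape)  # (8, 1)
```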

Note that tf.parse_example expects serialized tf.Example protos like this one (shown in protobuf text format):

features {
  feature {
    key: "price"
    value { float_list {
      value: 29.0
    }}
  }
  feature {
    key: "my-text"
    value { bytes_list {
      value: "this"
      value: "product"
      value: "is"
      value: "for sale"
      value: "within"
      value: "us"
    }}
  }
}

Note that the bytes_list allows a variable number of tokens per example: some examples contain many words, others only a few. The 'my-text' feature above is parsed into the token list

["this", "product", "is", "for sale", "within", "us"].
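The variable length is absorbed by the combiner: however many tokens an example has, the mean-combined embedding has a fixed width, so the downstream dense layers always see the same input shape. A small NumPy check (illustrative sizes):

```python
import numpy as np

rng = np.random.default_rng(1)
embedding_matrix = rng.uniform(-1.0, 1.0, size=(6, 4))  # 6 tokens, dim 4

short_ids = [0, 2]             # a two-token example
long_ids = [0, 1, 2, 3, 4, 5]  # a six-token example

short_vec = embedding_matrix[short_ids].mean(axis=0)
long_vec = embedding_matrix[long_ids].mean(axis=0)
print(short_vec.shape, long_vec.shape)  # (4,) (4,)
```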

Source: https://habr.com/ru/post/1689932/

