Unicode in TensorFlow standard format

Following the documentation here, I am trying to create features from unicode strings. Here is what the feature creation method looks like:

import tensorflow as tf

def _bytes_feature(value):
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))

Passing a unicode string to it raises an exception:

  File "/home/rklopfer/.virtualenvs/tf/local/lib/python2.7/site-packages/google/protobuf/internal/python_message.py", line 512, in init
    copy.extend(field_value)
  File "/home/rklopfer/.virtualenvs/tf/local/lib/python2.7/site-packages/google/protobuf/internal/containers.py", line 275, in extend
    new_values = [self._type_checker.CheckValue(elem) for elem in elem_seq_iter]
  File "/home/rklopfer/.virtualenvs/tf/local/lib/python2.7/site-packages/google/protobuf/internal/type_checkers.py", line 108, in CheckValue
    raise TypeError(message)
TypeError: u'Gross' has type <type 'unicode'>, but expected one of: (<type 'str'>,)

Naturally, if I wrap value in str, it fails on the first actual non-ASCII character it encounters.


BytesList is defined in feature.proto with a single field of type repeated bytes, which means you need to pass it something that can be converted to a list of byte sequences.

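A unicode string can be converted to bytes in more than one way, so the conversion is ambiguous and you have to do it explicitly. For example, to use UTF-8: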

value.encode("utf-8")
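As a minimal sketch of that idea, the helper below (the name to_utf8_bytes is hypothetical, not part of TensorFlow or protobuf) encodes text to UTF-8 and passes byte strings through unchanged, so the same call works whether or not the input has already been encoded:

```python
# -*- coding: utf-8 -*-

def to_utf8_bytes(value):
    """Return value as a UTF-8 byte string (hypothetical helper)."""
    if isinstance(value, bytes):
        # Already a byte string (str on Python 2, bytes on Python 3).
        return value
    # Text (unicode on Python 2, str on Python 3) must be encoded explicitly.
    return value.encode("utf-8")


encoded = to_utf8_bytes(u"Groß")

# Decoding with the same codec recovers the original text.
decoded = encoded.decode("utf-8")
```

In the question's _bytes_feature, this would presumably be used as value=[to_utf8_bytes(value)]. Whatever reads the records back then has to decode with the same codec, since the stored bytes carry no encoding information.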

Source: https://habr.com/ru/post/1651321/