I have some text that I process to find the offsets of certain words. These offsets will be used by another application, which works with the text as a sequence of bytes, so str indexes will be wrong for it.
Example:
>>> text = "“Hello there!” He said"
>>> text[7:12]
'there'
>>> text.encode('utf-8')[7:12]
b'o the'
So how can I convert indices into a str into the corresponding indices into its encoded bytes?
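To make the goal concrete, here is a minimal sketch of the kind of conversion I am after. It assumes UTF-8 and simply encodes the prefix of the string up to the index and measures its length. The helper name `char_to_byte_offset` is made up for illustration, and this may not be the most efficient way to do it for long texts or many offsets.

```python
def char_to_byte_offset(text: str, index: int, encoding: str = "utf-8") -> int:
    """Return the byte offset that corresponds to character index `index`.

    Assumption: the encoding adds no prefix such as a BOM (true for UTF-8).
    """
    # Encode only the characters before `index`; the byte length of that
    # prefix is where character `index` starts in the encoded data.
    return len(text[:index].encode(encoding))


# The curly quotes are written as escapes so the example survives copy/paste.
text = "\u201cHello there!\u201d He said"
data = text.encode("utf-8")

start = char_to_byte_offset(text, 7)
end = char_to_byte_offset(text, 12)

print(text[7:12])       # the slice by character index
print(data[start:end])  # the same word, sliced by byte offset
```

Each call re-encodes the prefix, so converting many offsets this way costs time proportional to the text length per offset.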