The number of bytes required to represent a Unicode string depends on the encoding you use.
>>> s = u'μ μ '
>>> len(s)
2
>>> len(s.encode('UTF-8'))
6
>>> len(s.encode('UTF-16'))
6
>>> len(s.encode('UTF-32'))
12
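To make the counts easy to reproduce, here is a minimal sketch, assuming Python 3 and a hypothetical two-character sample written with escapes (it is not necessarily the string used above). The helper name `byte_lengths` is introduced here purely for illustration. It also shows that Python's `UTF-16` and `UTF-32` codecs prepend a byte order mark (2 and 4 bytes respectively), while the `-LE` variants do not.

```python
def byte_lengths(text, encodings=("UTF-8", "UTF-16", "UTF-16-LE", "UTF-32", "UTF-32-LE")):
    """Return a dict mapping each encoding name to the encoded size in bytes."""
    return {name: len(text.encode(name)) for name in encodings}

# Assumed sample: two CJK characters, each of which takes 3 bytes in UTF-8.
sample = "\u4e2d\u6587"
sizes = byte_lengths(sample)

print(len(sample))  # number of code points, not bytes
for name, size in sizes.items():
    print(name, size)
```

For this sample the BOM accounts for the difference between `UTF-16` (6 bytes) and `UTF-16-LE` (4 bytes), and between `UTF-32` (12 bytes) and `UTF-32-LE` (8 bytes).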
If you intend to reuse the encoded result, I recommend encoding the string once, taking the len of that result, and reusing the same encoded bytes later instead of encoding again.
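A minimal sketch of that advice, assuming Python 3. The helper `encode_once` and the length-prefixed message layout are hypothetical, chosen only to show the encoded bytes being reused after their length is taken.

```python
def encode_once(text, encoding="utf-8"):
    """Encode text a single time and return (encoded_bytes, byte_count)."""
    data = text.encode(encoding)
    return data, len(data)

# Two Greek mu characters; each takes 2 bytes in UTF-8.
payload, size = encode_once("\u03bc\u03bc")

# Reuse the already encoded bytes: the length goes into a 4-byte
# big-endian header, followed by the payload itself.
header = size.to_bytes(4, "big")
message = header + payload
```

Because `payload` is kept around, the string is encoded only once, and the length in the header is guaranteed to match the bytes that follow it.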