To embed non-ASCII characters such as ½ in a Python 2 script, you need a special comment at the top of the script that tells the interpreter which encoding the source file uses. If you choose UTF-8, you also need to tell your editor to save the file as UTF-8. And if you want to print Unicode text, make sure your terminal is configured to use UTF-8 as well.
Here is a short example, tested on Python 2.6.6:

    # -*- coding: utf-8 -*-
    value = "a string with fractions like 2½ in it"
    value = value.replace("½", ".5")
    print(value)

Output

    a string with fractions like 2.5 in it
Note that I use ".5" as the replacement string; using "0.5" would convert "2½" to "20.5", which is wrong.
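The difference between the two replacement strings can be checked directly. This is only a small sketch; it writes ½ as the escape `\u00bd` so that the snippet does not depend on how the file is saved, and the variable names are mine.

```python
text = u"2\u00bd"  # "2" followed by the one-half character (U+00BD)

# Replacing with "0.5" glues the leading zero onto the preceding digit.
wrong = text.replace(u"\u00bd", u"0.5")
# Replacing with ".5" keeps the integer part intact.
right = text.replace(u"\u00bd", u".5")

print(wrong)  # 20.5
print(right)  # 2.5
```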
Strictly speaking, these strings should be marked as Unicode strings, for example:

    # -*- coding: utf-8 -*-
    value = u"a string with fractions like 2½ in it"
    value = value.replace(u"½", u".5")
    print(value)
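To see why the `u` prefix matters, it may help to compare the character with its UTF-8 encoded form. The sketch below assumes nothing beyond the standard string methods; the names `half` and `half_bytes` are just for illustration.

```python
half = u"\u00bd"                   # the one-half character as a Unicode string
half_bytes = half.encode("utf-8")  # the same character as UTF-8 encoded bytes

print(len(half))        # 1: a single character
print(len(half_bytes))  # 2: UTF-8 needs two bytes for U+00BD
```

In Python 2, a plain `"..."` literal in a UTF-8 source file holds those bytes rather than characters, so operations such as `len()` or slicing work on bytes and can split a character in the middle.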
For more information about using Unicode in Python, see Pragmatic Unicode, written by SO veteran Ned Batchelder.
I should also mention that you will need to change your regular expression pattern so that it allows a decimal point in numbers. For instance:

    # -*- coding: utf-8 -*-
    from __future__ import print_function
    import re

    pat = re.compile(r'[-+]?(?:\d*?[.])?\d+', re.U)

    data = u"+2½ -105 -2½ -115 +2½ -105 -2½ -115 +2½ -102 -2½ -114"
    print(data)
    print(pat.findall(data.replace(u"½", u".5")))
Output

    +2½ -105 -2½ -115 +2½ -105 -2½ -115 +2½ -102 -2½ -114
    [u'+2.5', u'-105', u'-2.5', u'-115', u'+2.5', u'-105', u'-2.5', u'-115', u'+2.5', u'-102', u'-2.5', u'-114']
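If you need actual numbers rather than strings, one possible next step is to pass each match to `float()`. This is a sketch under my own assumptions: `parse_numbers` is a hypothetical helper name, and it only handles ½, as in the data above.

```python
import re

# Same pattern as above: optional sign, optional "digits + dot", then digits.
PAT = re.compile(r'[-+]?(?:\d*?[.])?\d+', re.U)


def parse_numbers(text):
    """Return the numbers found in text as floats (hypothetical helper)."""
    # U+00BD is the one-half character; turn it into a decimal fraction first.
    cleaned = text.replace(u"\u00bd", u".5")
    return [float(match) for match in PAT.findall(cleaned)]


print(parse_numbers(u"+2\u00bd -105 -2\u00bd -115"))  # [2.5, -105.0, -2.5, -115.0]
```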