To solve the problem, you can use the following snippet:
#!/usr/bin/python # -*- coding: utf-8 -*- import re str = u'[DMSM-8433] 加護亜依 Kago Ai – 加護亜依 vs. FRIDAY' regex = u'[\u3000-\u303f\u3040-\u309f\u30a0-\u30ff\uff00-\uff9f\u4e00-\u9faf\u3400-\u4dbf]+ (?=[A-Za-z ]+–)' p = re.compile(regex, re.U) match = p.sub("", str) print match.encode("UTF-8")
See the IDEONE demo
Besides the declaration # -*- coding: utf-8 -*- , I added the @nhahtdh character class to detect Japanese characters .
Note that match needs to be encoded as a UTF-8 string “manually”, since Python 2 needs to be “reminded” that we work with Unicode all the time.
source share