This thread may be old, but I thought I'd add my 2 cents. Here is a regular expression that can be used to match English alphanumeric characters, Japanese kanji, katakana, and hiragana, multi-byte alphanumeric characters (hankaku and zenkaku, i.e. half-width and full-width), and the long-vowel dash (ー):
/[一-龠]+|[ぁ-ゔ]+|[ァ-ヴー]+|[a-zA-Z0-9]+|[ａ-ｚＡ-Ｚ０-９]+|[々〆〤]+/u
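You can edit it to suit your needs, but pay attention to the “u” flag at the end. In PHP's PCRE functions, for example, it makes the pattern be treated as UTF-8, so the multi-byte ranges are read as characters rather than as individual bytes.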
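For anyone who wants to try the pattern out, here is a minimal sketch in Python (my choice of language for illustration, not necessarily the one the question is about). It assumes the intended pattern has a full-width class ａ-ｚＡ-Ｚ０-９ as its own alternative and the marks 々〆〤 as another. The names JAPANESE_OR_ALNUM and find_runs are made up for this example. Python 3 string patterns are already Unicode-aware, so there is no “u” flag to set.

```python
import re

# Hypothetical name; each alternative matches a run of one script/class.
# Python 3 str patterns are Unicode by default, so no "u" flag is needed.
JAPANESE_OR_ALNUM = re.compile(
    r"[一-龠]+"             # kanji in the range U+4E00..U+9FA0
    r"|[ぁ-ゔ]+"            # hiragana
    r"|[ァ-ヴー]+"          # katakana plus the long-vowel dash
    r"|[a-zA-Z0-9]+"        # half-width (ASCII) alphanumerics
    r"|[ａ-ｚＡ-Ｚ０-９]+"  # full-width alphanumerics (assumed intent)
    r"|[々〆〤]+"           # iteration and abbreviation marks
)


def find_runs(text: str) -> list[str]:
    """Return every run of characters matched by the pattern, in order."""
    return JAPANESE_OR_ALNUM.findall(text)


if __name__ == "__main__":
    print(find_runs("日本語のテキストABC１２３"))
    # ['日本語', 'の', 'テキスト', 'ABC', '１２３']
```

Note that each alternative only matches a run of a single class, so mixed text comes back as separate pieces. If you want one match spanning all of them, you could merge the classes into a single bracket expression instead.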
Hope this helps!