How do you use Unicode characters in a regular expression in Ruby?

Question

I am trying to write a line of code that will take a line of Japanese text and delete a specific character set. However, I am having problems using Unicode characters inside a regex.

I am currently using text.gsub(/《.*?》/u, ''), but I get this error:

 'gsub': invalid byte sequence in Windows-31J (ArgumentError)

Can someone tell me what I am doing wrong?

Sample text: その仕草《しぐさ》があまりに無造作《むぞうさ》だったので

Expected result: その仕草があまりに無造作だったので

thanks

edit: # encoding: utf-8 present at the top of the script.

ruby regex unicode

Somberclock Mar 05 '12 at 1:55

1 answer

Limbo peng · Accepted Answer · 2012-03-05T02:19:58+0000

Try the following:

 text.encode('utf-8', 'utf-8').gsub(/《.*?》/u, '')
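
The error message suggests that the string is tagged as Windows-31J while its bytes are actually UTF-8, so the fix is to get the string treated as UTF-8 before matching. Here is a minimal sketch of that idea using force_encoding, which relabels the bytes without converting them. This is an assumption about the cause, not something confirmed in the question, and strip_ruby_annotations is a hypothetical helper name used only for illustration.

```ruby
# encoding: utf-8

# Hypothetical helper (not part of the original answer): treat the bytes as
# UTF-8 without transcoding them, then remove every 《...》 annotation.
def strip_ruby_annotations(text)
  utf8 = text.dup.force_encoding(Encoding::UTF_8)
  # Fail early if the bytes are not actually UTF-8.
  raise ArgumentError, 'input is not valid UTF-8' unless utf8.valid_encoding?
  utf8.gsub(/《.*?》/, '')
end

# Simulate the assumed situation: UTF-8 bytes that are tagged as Windows-31J.
mislabeled = 'その仕草《しぐさ》があまりに無造作《むぞうさ》だったので'.dup.force_encoding('Windows-31J')

cleaned = strip_ruby_annotations(mislabeled)
puts cleaned
```

If the text comes from a file, passing the encoding when reading it, for example File.read(path, encoding: 'UTF-8'), should avoid the mislabeled string in the first place.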
