Does Rails detect foreign characters?

Question

I am wondering if there is a way to detect foreign characters in Rails.

I read that Rails uses Unicode by default, and that foreign characters such as Chinese and Japanese occupy specific ranges in Unicode. Is there an easy way to detect these characters in Rails, or to just specify the range of characters that I expect?

Is there a plugin for this? Thanks in advance!

+4

ruby-on-rails unicode character-encoding

gerky Aug 26 '11 at 5:23

2 answers

This is pretty easy with Ruby 1.9.2, since its regular expressions work on characters rather than bytes, and 1.9.2 knows the difference between bytes and characters from top to bottom. You are in Rails, so you should be getting everything in UTF-8. Fortunately, UTF-8 and ASCII overlap for the entire ASCII range, so once your text is UTF-8 encoded you can simply strip anything that is not between ' ' and '~':

    >> "Wheré is µ~pancakes ho元use?".gsub(/[^ -~]/, '')
    => "Wher is ~pancakes house?"
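If the goal is to detect particular scripts rather than strip everything outside ASCII, Ruby's regular expressions (1.9 and later) also accept Unicode script properties such as \p{Han}. The following is a minimal sketch, not part of the original answer; the constant and method names are made up for illustration, and the set of scripts is an assumption you would adjust to your data:

```ruby
# encoding: utf-8

# Hypothetical names. The \p{...} script properties themselves are built into
# Ruby's regular expression engine from 1.9 onward.
CJK_PATTERN = /[\p{Han}\p{Hiragana}\p{Katakana}\p{Hangul}]/

# True if the text contains at least one character from the scripts above.
def contains_cjk?(text)
  !!(text =~ CJK_PATTERN)
end

# Returns every character outside the ASCII range, in order of appearance.
def non_ascii_chars(text)
  text.scan(/[^[:ascii:]]/)
end

contains_cjk?("pancakes ho元use")  # true, 元 is a Han character
contains_cjk?("pancakes house")    # false
```

Specifying "the range of characters that I expect" then amounts to writing the character class you want to allow and rejecting input that matches its negation.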

In fact, there is no reason to go to all this trouble. Ruby 1.9 works great with Unicode, as do Rails and pretty much everything else. Working with non-ASCII text was a nightmare 15 years ago; now it is commonplace and fairly straightforward.

If you do end up with text data that is not UTF-8, you have some options. If the encoding is ASCII-8BIT or BINARY, you can probably get away with s.force_encoding('utf-8'). If you end up with something other than UTF-8 or ASCII-8BIT, you can use Iconv to re-encode it.
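A rough sketch of that fallback logic follows. It swaps in String#encode, which has been built into Ruby since 1.9, in place of Iconv (Iconv was later dropped from the standard library in Ruby 2.0). The helper name and the ISO-8859-1 fallback are assumptions for illustration; you need to know, or guess, the real source encoding:

```ruby
# encoding: utf-8

# Hypothetical helper: returns a UTF-8 copy of +raw+.
# +fallback+ is an assumed source encoding, used only when the bytes
# are not already valid UTF-8.
def to_utf8(raw, fallback = Encoding::ISO_8859_1)
  # Relabel the bytes as UTF-8 without changing them, then check validity.
  candidate = raw.dup.force_encoding(Encoding::UTF_8)
  return candidate if candidate.valid_encoding?

  # Otherwise treat the bytes as the fallback encoding and transcode.
  raw.dup.force_encoding(fallback).encode(Encoding::UTF_8)
end

latin1_bytes = [0x63, 0x61, 0x66, 0xE9].pack('C*')  # "café" in ISO-8859-1
to_utf8(latin1_bytes).encoding                       # Encoding::UTF_8
```

Note that force_encoding only relabels the bytes, while encode actually converts them, so checking valid_encoding? after relabelling is what keeps broken data from slipping through.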

+1

mu is too short Aug 26 '11 at 8:21

edgerunner · Accepted Answer · 2011-08-26T08:32:05+0000

Encodings for ideographic languages all use several bytes per character, and Ruby 1.9+ knows the difference between bytes and characters (Ruby 1.8 does not).

You can compare a string's character length with its byte length as a quick and dirty detector. This is probably not reliable, since in UTF-8 every non-ASCII character is multibyte, not just ideographs.

    class String
      def multibyte?
        chars.count < bytes.count
      end
    end

    "可口可樂".multibyte? #=> true
    "qwerty".multibyte?   #=> false
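To see where this check over-reports, here is a small sketch that uses a standalone method (a hypothetical name, chosen to avoid reopening String) and compares it with the built-in String#ascii_only?:

```ruby
# encoding: utf-8

# Same comparison as above, written as a plain method instead of a String patch.
def multibyte_chars?(text)
  text.chars.count < text.bytes.count
end

multibyte_chars?("可口可樂")     # true
multibyte_chars?("na\u00EFve")   # true as well: ï is Latin, but takes two bytes in UTF-8
multibyte_chars?("qwerty")       # false

# For UTF-8 strings the built-in gives the opposite answer without any patching.
"qwerty".ascii_only?             # true
```

So the byte-count test really answers "is there anything outside ASCII?", which may or may not be what "foreign" means for your application.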
