Smart character replacement using Ruby gsub and regexp

I am trying to create permalink-like behavior for some article titles, and I do not want to add a new database field for the permalink. So I decided to write a helper that converts an article title from:

"O "focoasă" a pornit cruciada, împotriva bărbaţilor zgârciţi" to "O-focoasa-a-pornit-cruciada-impotriva-barbatilor-zgarciti".

So far I have figured out how to replace spaces with a hyphen and remove other special characters (except -) using:

title.gsub(/\s/, "-").gsub(/[^\w-]/, '').downcase 

I am wondering whether there is a way to replace several characters, each with its own counterpart, in a single call to .gsub, so that I don't need to chain title.gsub("ă", "a") calls for every special UTF-8 character in my locale.

I was thinking of creating a hash with all the special characters and their counterparts, but I still haven't figured out how to use variables with regular expressions.

I was looking for something like:

 title.gsub(/\s/, "-").gsub(*replace character goes here*).gsub(/[^\w-]/, '').downcase 

Thanks!

2 answers

I solved this in my application using the Unidecoder gem:

 require 'unidecode'

 def uninternationalize(str)
   Unidecoder.decode(str).gsub("[?]", "").gsub(/`/, "'").strip
 end

If you only want to transliterate single characters to other single characters, you can use the String#tr method, which works like the Unix tr command: it replaces each character in the first list with the character at the same position in the second list:

 'Ünicöde'.tr('ÄäÖöÜüß', 'AaOoUus') # => "Unicode" 
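Applied to the title from the question, the same idea might look like the sketch below. The helper name `permalink` and the two character lists are assumptions made for illustration; the lists cover only a handful of Romanian letters (both the cedilla and comma-below forms of s and t), not a complete alphabet.

```ruby
# Hypothetical helper, not part of any library.
# The two lists must have the same length: the character at position i in
# ROMANIAN_FROM is replaced by the character at position i in ROMANIAN_TO.
ROMANIAN_FROM = 'ăâîşţșțĂÂÎŞŢȘȚ'
ROMANIAN_TO   = 'aaiststAAISTST'

def permalink(title)
  title
    .tr(ROMANIAN_FROM, ROMANIAN_TO) # one-to-one transliteration
    .gsub(/[^\w\s\-]/, '')          # drop punctuation, keep word chars, spaces, hyphens
    .strip
    .gsub(/\s+/, '-')               # collapse runs of whitespace into one hyphen
end

puts permalink('O "focoasă" a pornit cruciada, împotriva bărbaţilor zgârciţi')
# prints: O-focoasa-a-pornit-cruciada-impotriva-barbatilor-zgarciti
```

Punctuation is removed before spaces become hyphens so that a stray quote or comma cannot leave a doubled hyphen behind. Append `.downcase` if the permalink should be lowercase, as in the question's own pipeline.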

However, I agree with @Daniel Vandersluis: it would be better to use a more specialized library. Things like this can become really tedious, really fast. In addition, many of these characters have standardized transliterations (ä → ae, ö → oe, ..., ß → ss), and users may expect the correct transliteration. (I certainly don't like being called Jorg. If you really must, you can call me Joerg, but I really prefer Jörg.) If a library provides these transliterations for you, why not use it? Note that many transliterations are not single characters and therefore cannot be done with String#tr anyway.
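For mappings that are not one-to-one, the hash the question asks about can be handed to a single gsub call: String#gsub accepts a Hash as its second argument and looks up each match in it. The following is a minimal sketch, assuming a small illustrative table; the names `TRANSLITERATIONS`, `TRANSLITERATION_PATTERN`, and `transliterate` are hypothetical, and a real table would need every character of the target language.

```ruby
# Illustrative mapping only, limited to the German examples mentioned above.
TRANSLITERATIONS = {
  'ä' => 'ae', 'ö' => 'oe', 'ü' => 'ue',
  'Ä' => 'Ae', 'Ö' => 'Oe', 'Ü' => 'Ue',
  'ß' => 'ss'
}.freeze

# Regexp.union builds one pattern that matches any of the hash keys,
# which is one way to get "variables" into a regular expression.
TRANSLITERATION_PATTERN = Regexp.union(TRANSLITERATIONS.keys)

def transliterate(str)
  # With a Hash as the second argument, gsub replaces each match
  # with the value stored under the matched text.
  str.gsub(TRANSLITERATION_PATTERN, TRANSLITERATIONS)
end

puts transliterate('Jörg')   # prints: Joerg
puts transliterate('Straße') # prints: Strasse
```

This keeps the whole table in one place, but it is still a hand-maintained list, so the point about preferring a dedicated library stands.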


Source: https://habr.com/ru/post/1307486/

