Clean special characters from text

I get text from a feed that has a lot of characters like this:

Insignia™ 2.0 Stereo Computer Speaker System (2-Piece) - Black 4th-Generation Apple® iPod® touch 

Is there an easy way to get rid of them, or do I have to work out in advance which characters I want to remove and strip them with the delete method? Also, when I try to remove

 &amp; 

with

 str.delete("&") 

it leaves an amp; behind. Is there a better way to remove this kind of sequence? Do I need to decode the text?

3 answers

String#delete is, of course, not what you want here, because it works on individual characters, not on the string as a whole.
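To illustrate the difference, here is a small comparison on a made-up sample string: String#delete treats its argument as a set of characters and removes each of them everywhere, while gsub with a string pattern only removes the exact substring.

```ruby
text = "Salt &amp; pepper"

# String#delete removes every occurrence of each character in the set
# {'&', 'a', 'm', 'p', ';'}, anywhere in the string.
by_chars = text.delete("&amp;")

# gsub with a string pattern matches only the whole substring "&amp;".
by_substring = text.gsub("&amp;", "&")

puts by_chars.inspect      # => "Slt  eer"
puts by_substring.inspect  # => "Salt & pepper"
```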

Try

 str.gsub(/&amp;/, "") 

You can also replace &amp; with a literal ampersand, for example:

 str.gsub(/&amp;/, "&") 

If that is closer to what you actually want, you may get the best results by unescaping the HTML in the string. If so, try the following:

 require 'cgi'
 CGI.unescapeHTML(str) 

See the documentation of CGI.unescapeHTML for details.
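A short sketch of what CGI.unescapeHTML does with a title like the one in the question, assuming the feed encodes the symbols as numeric character references (the sample strings are hypothetical). As far as I know, it decodes only &amp;, &quot;, &apos;, &lt;, &gt; and numeric references, so other named entities such as &trade; are left as they are.

```ruby
require 'cgi'

title = "Insignia&#8482; 2.0 Stereo Speakers &amp; Apple&#174; iPod&#174;"

# Decodes &amp; and numeric references (&#8482; is U+2122, the trademark sign;
# &#174; is U+00AE, the registered sign).
decoded = CGI.unescapeHTML(title)
puts decoded  # => Insignia™ 2.0 Stereo Speakers & Apple® iPod®

# Named entities other than amp, quot, apos, lt and gt are not decoded.
untouched = CGI.unescapeHTML("Insignia&trade;")
puts untouched  # => Insignia&trade;
```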


If you are getting the data from a feed such as RSS (that is, XML), you should process it with an XML parser like Nokogiri. The parser expands the entity references automatically and lets you read the plain string value directly.
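A minimal sketch of that approach. It swaps in REXML, which ships with Ruby as a bundled gem, instead of Nokogiri so that it needs no extra dependency; with Nokogiri the equivalent would be Nokogiri::XML(xml).xpath("//item/title").map(&:text). The feed snippet is hypothetical.

```ruby
require 'rexml/document'

# Hypothetical feed snippet; inside XML a literal & must be written as &amp;.
xml = <<~XML
  <rss version="2.0">
    <channel>
      <item>
        <title>Insignia&#8482; 2.0 Speakers &amp; Apple&#174; iPod&#174;</title>
      </item>
    </channel>
  </rss>
XML

doc = REXML::Document.new(xml)

# Element#text returns the text content with entity references already expanded.
titles = REXML::XPath.match(doc, "//item/title").map(&:text)
puts titles.first  # => Insignia™ 2.0 Speakers & Apple® iPod®
```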


To remove it, try the gsub method, for example:

 text = "foo&amp;bar"
 text.gsub(/\b&amp;\b/, "") #=> "foobar"
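Note that the \b anchors in that pattern only match &amp; when it sits between two word characters, so "foo &amp; bar" would be left unchanged. If the goal is to drop every entity still left in the text, a broader pattern can be used. The pattern and the strip_entities helper below are illustrative assumptions, not taken from the answers, and they delete entities rather than decoding them.

```ruby
# Hypothetical pattern: a named entity (&amp;, &trade;), a decimal reference
# (&#174;) or a hex reference (&#x2122;), always terminated by a semicolon.
ENTITY = /&(?:[a-z][a-z0-9]*|#\d+|#x\h+);/i

# Hypothetical helper: removes every entity matched by ENTITY.
def strip_entities(text)
  text.gsub(ENTITY, "")
end

puts strip_entities("foo &amp; bar").inspect            # => "foo  bar"
puts strip_entities("Insignia&trade; 2.0").inspect      # => "Insignia 2.0"
puts strip_entities("Apple&#174; iPod&#x2122;").inspect # => "Apple iPod"
```

If decoding is preferable to deleting, CGI.unescapeHTML or an XML parser is the better fit.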

Source: https://habr.com/ru/post/899530/

