
Remove whitespace from an HTML document with Ruby

So I have a string in Ruby, something like

str = "<html>\n<head>\n\n  <title>My Page</title>\n\n\n</head>\n\n<body>" +
      "  <h1>My Page</h1>\n\n<div id=\"pageContent\">\n  <p>Here is a para" +
      "graph. It can contain  spaces that should not be removed.\n\nBut\n" +
      "line breaks that should be removed.</p></body></html>"

How can I remove all whitespace (spaces, tabs and line breaks) that is outside tags, i.e. not inside an element with text content such as <p>, using only native Ruby?

(I would like to avoid XSLT or anything similar for such a simple task.)

+3
4 answers
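str.gsub!(/[\n\t]/, " ")   # character class: any single line break or tab
str.gsub!(/>\s*</, "><")   # separate statement, since gsub! returns nil when nothing matches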

The first gsub! replaces all line breaks and tabs with spaces; the second removes whitespace between tags.

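You will end up with multiple spaces inside the tags, but if you simply deleted every \n and \t you would get something like "removed.Butline breaks", with the words run together. The leftover runs of spaces can be collapsed with .squeeze(" ").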
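To make the effect of this approach concrete, here is a minimal sketch on a shortened, made-up version of the question's string. It uses non-destructive gsub so the calls can be chained, and the variable names are only illustrative. Note that squeeze(" ") also collapses double spaces inside the paragraph, which the question wanted to keep, so that step is shown separately.

```ruby
# Hypothetical sample input, shortened from the string in the question.
html = "<html>\n<head>\n\n  <title>My Page</title>\n\n\n</head>\n\n<body>" \
       "  <h1>My Page</h1>\n<p>It can contain  spaces.\n\nBut\nline breaks.</p></body></html>"

# gsub (no bang) always returns a String, so chaining is safe.
compact = html.gsub(/[\n\t]/, " ")   # line breaks and tabs become spaces
              .gsub(/>\s*</, "><")   # drop whitespace between adjacent tags

# Optional extra step: collapse every run of spaces into a single space.
squeezed = compact.squeeze(" ")

puts compact
puts squeezed
```

The first line printed still has the double spaces in the paragraph; the second has every run reduced to one space.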

+9

Using regexen, this should do it:

str.gsub(/>\s*/, ">").gsub(/\s*</, "<")

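In general /\s/ is the safer choice, since it matches every whitespace character. That includes "\r", which shows up in Windows line endings.

Note that this turns <p> foo bar </p> into <p>foo bar</p>, so whitespace at the start and end of an element's text is removed too.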
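A small sketch of how these two substitutions behave, wrapped in a helper whose name (strip_around_tags) is made up for this example. It shows the trimming inside <p>, the handling of "\r\n", and the fact that a line break in the middle of text is left alone, so this approach on its own does not cover the "But\nline breaks" part of the question.

```ruby
# Hypothetical helper name; it wraps the two gsub calls from the answer.
def strip_around_tags(html)
  html.gsub(/>\s*/, ">")   # whitespace directly after any tag
      .gsub(/\s*</, "<")   # whitespace directly before any tag
end

trimmed   = strip_around_tags("<p> foo bar </p>\r\n<p>baz</p>")
untouched = strip_around_tags("<p>a\nb</p>")

puts trimmed            # <p>foo bar</p><p>baz</p>
puts untouched.inspect  # "<p>a\nb</p>"
```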

+5

You can condense every run of spaces into a single space (e.g. "hello     world" into "hello world") using String#squeeze:

"hello     world".squeeze(" ")  # => "hello world"

The argument is the character that should be squeezed.
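A few more calls, as a quick sketch of how the argument changes what gets squeezed. The variable names are only for illustration.

```ruby
spaces_only = "hello     world".squeeze(" ")   # only runs of spaces
every_char  = "aaa   bbb".squeeze              # no argument: any repeated character
two_kinds   = "a\n\n\nb  c".squeeze(" \n")     # runs of spaces and runs of newlines

p spaces_only  # "hello world"
p every_char   # "a b"
p two_kinds    # "a\nb c"
```

Calling squeeze without an argument is rarely what you want for HTML, since it also shortens doubled letters in ordinary words.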

EDIT: I misunderstood your question, sorry.

This will:

  • remove spaces inside tags
  • leave single spaces outside tags

I will work on a solution now.

+1
xml.squish.gsub /(> <)/, '><'

Even shorter than the above. Note that squish comes from ActiveSupport (Rails), not core Ruby.
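For anyone without ActiveSupport loaded, a rough core-Ruby stand-in might look like the sketch below. The method name squish_plain and the sample xml string are made up here, and the real squish treats a wider set of Unicode whitespace, so take this as an approximation only.

```ruby
# Approximation of ActiveSupport's String#squish using only core Ruby:
# trim both ends, then collapse every run of whitespace into one space.
def squish_plain(text)
  text.strip.gsub(/\s+/, " ")
end

xml = "<a>\n  <b> x  y </b>\n</a>\n"   # hypothetical sample input
result = squish_plain(xml).gsub("> <", "><")

puts result   # <a><b> x y </b></a>
```

As with squeeze, this also collapses double spaces inside text content.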

PS I like funny faces.

0

Source: https://habr.com/ru/post/1750831/

