Unable to remove node in Nokogiri

I have a slightly strange problem with Nokogiri in Rails. I am trying to remove the "p" tag with the class "why", but the following code does not work:

def test_grab
  f = File.open("public/test.html")
  @doc = Nokogiri::HTML.parse(f)
  f.close
  @doc = @doc.css("p")
  @doc.each do |p|
    if p["class"] == "why"
      logger.info p.values
      p.remove
    end
  end
end

test.html:

<html>
<head>
    <title>Test</title>
</head>
<body>
    <p>Test data</p>
    <p>More <a href="http://stackoverflow.com">Test Data</a></p>
    <p class="why">Why is this still here?</p>
</body>
</html>

HTML output:

<p>Test data</p>
<p>More <a href="http://stackoverflow.com">Test Data</a></p>
<p class="why">Why is this still here?</p>

I know the code inside the if block runs, because the logger.info output appears in the server terminal.

Any ideas?

1 answer

Is there a reason you are reusing the instance variable @doc?

When debugging issues like this, I find it best to run the same code outside Rails. For instance:

require 'nokogiri'

doc = Nokogiri::HTML(DATA)
doc.css("p").each do |p|
  p.remove if p["class"] == "why"
end
puts doc.to_html

__END__
<html>
<head>
    <title>Test</title>
</head>
<body>
    <p>Test data</p>
    <p>More <a href="http://stackoverflow.com">Test Data</a></p>
    <p class="why">Why is this still here?</p>
</body>
</html>

Which outputs:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html>
<head><title>Test</title></head>
<body>
    <p>Test data</p>
    <p>More <a href="http://stackoverflow.com">Test Data</a></p>

</body>
</html>

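So don't overwrite @doc with the node set. Store it in a separate variable, e.g. paragraphs = @doc.css("p"), loop over it with paragraphs.each, and keep rendering @doc. The removed node is detached from the document, but a NodeSet captured before the removal still references it, so rendering that NodeSet still shows the "why" paragraph.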


Source: https://habr.com/ru/post/1778272/
