Ruby - reading and editing an XML file

I am writing a Ruby (1.9.3) script that reads XML files from a folder and then edits them if necessary.

My problem is that I was given XML files converted by Tidy, but its output is a bit strange, for example:

    <?xml version="1.0" encoding="utf-8"?>
    <XML>
    <item>
    <ID>000001</ID>
    <YEAR>2013</YEAR>
    <SUPPLIER>Supplier name test,
    Corporation</SUPPLIER>
    ...

As you can see, there is an extra CRLF. I don't know why Tidy behaves this way, but I am processing these files with a Ruby script and running into trouble: I need to check whether the last character of a line is ">" or the first is "<", so that I can tell when something is wrong with the markup.

I tried:

    Dir.glob("C:/testing/corrected/*.xml").each do |file|
      puts file
      # block form closes the file when done
      File.open(file, 'r+') do |f|
        f.each_with_index do |line, index|
          first_char = line[0, 1]
          if first_char != "<"
            # copy this line to the previous line and delete this one?
          end
        end
      end
    end
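One possible way to do the "copy this line to the previous line and delete this one" step is sketched below. It assumes, as in the sample, that every correct line starts with a tag, so any line that does not start with "<" (after stripping whitespace) is a continuation of the previous one. The helper name `merge_continuation_lines` is hypothetical, and it works on an array of lines in memory; writing the result back is a separate step.

```ruby
# Hypothetical helper: joins every line that does not start with "<"
# (after stripping whitespace) onto the end of the previous line.
# Assumes each line of correct output begins with a tag, as in the sample.
def merge_continuation_lines(lines)
  merged = []
  lines.each do |line|
    stripped = line.strip
    next if stripped.empty?               # drop blank lines
    if merged.empty? || stripped.start_with?("<")
      merged << line                      # normal line: keep as-is
    else
      # Continuation: drop the previous line's CRLF/LF and trailing spaces,
      # join with a single space, and end the merged line with "\n".
      merged[-1] = merged[-1].rstrip + " " + stripped + "\n"
    end
  end
  merged
end

lines = ["<SUPPLIER>Supplier name test,\r\n", "Corporation</SUPPLIER>\r\n"]
print merge_continuation_lines(lines).join
```

This only works if no element legitimately contains text that starts on its own line (mixed content); for anything less regular, an XML parser is the safer tool.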

It also seems that I should copy the file's original content to a temporary file as I read it, and then overwrite the original. Is this the best approach? Any advice is appreciated, as I don't have much experience modifying file contents.
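Writing the new content to a temporary file and then replacing the original is a common pattern, because the original is never left half-written. A minimal sketch, assuming the files are small enough to read into memory; `rewrite_file` is a hypothetical helper, and the regex in the usage part is an assumption that text never legitimately begins a line without a tag.

```ruby
# Hypothetical helper: rewrites the file at `path` with whatever the block
# returns. The block runs before anything is written, the new content goes
# to "<path>.tmp", and the original is replaced only after that write has
# finished, so an exception leaves the original file untouched.
def rewrite_file(path)
  content = File.read(path)
  new_content = yield(content)
  tmp_path = "#{path}.tmp"
  File.open(tmp_path, "w") { |out| out.write(new_content) }
  # Same directory, so this is a rename rather than a copy.
  File.rename(tmp_path, path)
end

# Example usage: replace any line break that is not followed by "<" (or by
# the end of the file), plus the whitespace around it, with a single space.
Dir.glob("C:/testing/corrected/*.xml").each do |file|
  puts file
  rewrite_file(file) { |xml| xml.gsub(/\s*\n(?!\s*(?:<|\z))\s*/, " ") }
end
```

If the files were large, the same idea works line by line with `File.foreach` writing into the temporary file, at the cost of more code.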

1 answer

Does this extra \n always appear in the <SUPPLIER> node? As others have said, Nokogiri is a great choice for parsing XML (or HTML). You can iterate over each <SUPPLIER> node, remove the \n character, and then save the XML as a new file.

    require 'nokogiri'

    # read and parse the old file
    file = File.read("old.xml")
    xml = Nokogiri::XML(file)

    # replace each line break, plus any whitespace around it, with a single space
    xml.xpath("//SUPPLIER").each do |node|
      node.content = node.content.gsub(/\s*\n\s*/, " ")
    end

    # save the output into a new file
    File.open("new.xml", "w") do |f|
      f.write xml.to_xml
    end
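A caveat on the whitespace regex: `/\n\s+/` only matches when at least one whitespace character follows the newline, so an unindented break like the one in the sample would be left alone. The stdlib-only check below compares it with `/\s*\n\s*/` on plain strings, assuming the supplier text looks like the sample. (XML 1.0 end-of-line handling has a parser normalize CRLF to LF, so parsed node content normally contains only `\n`; `\s` also matches `\r` in case a raw string still has one.)

```ruby
# Supplier text with a line break and no indentation after it.
supplier = "Supplier name test,\nCorporation"

# /\n\s+/ needs at least one whitespace character after the newline,
# so it leaves this string unchanged.
p supplier.gsub(/\n\s+/, " ")   # => "Supplier name test,\nCorporation"

# /\s*\n\s*/ also matches a bare newline (and a preceding "\r"),
# collapsing the break and the whitespace around it to one space.
p supplier.gsub(/\s*\n\s*/, " ") # => "Supplier name test, Corporation"
p "Supplier name test,\r\n    Corporation".gsub(/\s*\n\s*/, " ")
# => "Supplier name test, Corporation"
```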

Source: https://habr.com/ru/post/1483957/

