I am writing a Ruby (1.9.3) script that reads XML files from a folder and then edits them if necessary.
My problem is that the XML files I was given were converted by Tidy, and its output is a bit strange, for example:
```xml
<?xml version="1.0" encoding="utf-8"?>
<XML>
<item>
<ID>000001</ID>
<YEAR>2013</YEAR>
<SUPPLIER>Supplier name test, Coproration</SUPPLIER>
...
```
As you can see, there is an extra CRLF. I don't know why Tidy behaves this way, but I am processing the files with a Ruby script and I am running into trouble: I need to check whether the last character of each line is `>` or whether the first is `<`, so that I can tell where the markup is broken.
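One detail that may matter for this check: a line read from the file normally still ends with its line terminator, so its last character is a newline, not `>`. A minimal sketch, assuming leading indentation and trailing whitespace can be ignored, strips the line before looking at its first and last characters. The helper name `suspicious_line?` and the sample lines (with the stray break placed after the comma) are hypothetical.

```ruby
# Hypothetical helper (not from the question): flags a line whose first
# non-blank character is not "<" or whose last non-blank character is not ">".
def suspicious_line?(line)
  text = line.strip            # removes indentation and the trailing "\r\n"
  return false if text.empty?  # blank lines are skipped rather than flagged
  !(text.start_with?("<") && text.end_with?(">"))
end

# Hypothetical sample lines; the stray line break is placed after the comma.
sample = [
  "<ID>000001</ID>\r\n",
  "<SUPPLIER>Supplier name test,\r\n",
  "Coproration</SUPPLIER>\r\n"
]
sample.each_with_index do |line, index|
  puts "line #{index + 1} looks broken: #{line.strip}" if suspicious_line?(line)
end
```

Both halves of the split element are flagged: the first because it does not end with `>`, the second because it does not start with `<`.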
I tried:
```ruby
Dir.glob("C:/testing/corrected/*.xml").each do |file|
  puts file
  # Block form closes the file when the loop is done
  File.open(file, 'r+') do |f|
    f.each_with_index do |line, index|
      first_char = line[0, 1]
      if first_char != "<"
        # copy this line to the previous line and delete this one?
      end
    end
  end
end
```
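Rather than moving lines around inside a file opened with `'r+'`, which is awkward because you would be reading and writing the same handle, one option is to build the corrected lines in an array and write them out afterwards. A minimal sketch follows; the helper name `merge_continuation_lines` is hypothetical, and it assumes the stray break replaced a single space and that blank lines can be dropped.

```ruby
# Hypothetical helper (not from the question): joins every line that does not
# start with "<" onto the line before it, collecting the result in an array.
def merge_continuation_lines(lines)
  merged = []
  lines.each do |line|
    text = line.chomp                  # drops "\n" or "\r\n"
    next if text.strip.empty?          # blank lines are dropped entirely
    if merged.empty? || text.lstrip.start_with?("<")
      merged << text
    else
      # Assumption: the stray line break replaced a single space.
      merged[-1] = merged[-1].rstrip + " " + text.strip
    end
  end
  merged
end

input = [
  "<item>\r\n",
  "<SUPPLIER>Supplier name test,\r\n",
  "Coproration</SUPPLIER>\r\n",
  "</item>\r\n"
]
puts merge_continuation_lines(input)
```

This only catches breaks where the next line starts with text. A break right before a closing tag (`...test, Coproration` followed by `</SUPPLIER>` on the next line) would not be joined, because that next line starts with `<`; handling it would also need the "previous line does not end with `>`" check.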
It also seems to me that, while reading the file, I should copy its contents to a temporary file and then overwrite the original with it. Is this the best approach? Any advice is appreciated, as I do not have much experience modifying file contents.
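For files this small, one reasonable pattern is to read the whole file into memory, fix the lines, write the result to a temporary file next to the original, and only then move it over the original, so a failure halfway through does not leave a half-written XML file. This is a sketch, not the only way; the helper name `rewrite_file` and the `.tmp` suffix are hypothetical, and the blank-line removal in the usage loop is just a placeholder for the real correction.

```ruby
require "fileutils"

# Hypothetical helper: read all lines, let the caller transform them, write the
# result to a sibling temporary file, then move it over the original.
def rewrite_file(path)
  lines = File.readlines(path)
  new_lines = yield(lines)              # caller returns the corrected lines
  tmp_path = path + ".tmp"              # assumes this name is free in the folder
  File.open(tmp_path, "w") { |out| out.puts(new_lines) }
  FileUtils.mv(tmp_path, path)          # overwrite the original with the finished copy
ensure
  # Only leaves something to delete if an error happened before the move.
  FileUtils.rm_f(tmp_path) if tmp_path
end

# Usage over the question's folder; dropping blank lines is only a placeholder.
Dir.glob("C:/testing/corrected/*.xml").each do |file|
  rewrite_file(file) { |lines| lines.reject { |line| line.strip.empty? } }
end
```

`IO#puts` adds a newline only to elements that do not already end with one, so the block can return lines either with or without their terminators.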