Python - How to insert file read loops?

Question

I started learning Python (and programming in general) two days ago, and today I am stuck. I have spent several hours looking for the answer to what I suspect is a problem so trivial that nobody else gets stuck on it :)

My boss wants me to manually clean up some HUGE .xml files into something more human-readable. I am trying to write a script to do this for me. Below is an example .xml file, followed by my desired result.

Input (File.xml):

    <IssueTracking>
    <Issue>
      <SequenceNum>123</SequenceNum>
      <Subject>Subject of Ticket 123</Subject>
      <Description>Line 1 in Description field of Ticket 123.
    Line 2 in Description field of Ticket 123.
    Line 3 in Description field of Ticket 123.</Description>
    </Issue>
    <Issue>
      <SequenceNum>124</SequenceNum>
      <Subject>Subject of Ticket 124</Subject>
      <Description>Line 1 in Description field of Ticket 124.
    Line 2 in Description field of Ticket 124.
    Line 3 in Description field of Ticket 124.</Description>
    </Issue>
    </IssueTracking>

Output Required:

    123    Subject of Ticket 123
    Line 1 in Description field of Ticket 123.
    Line 2 in Description field of Ticket 123.
    Line 3 in Description field of Ticket 123.

    124    Subject of Ticket 124
    Line 1 in Description field of Ticket 124.
    Line 2 in Description field of Ticket 124.
    Line 3 in Description field of Ticket 124.

Here is what I have so far:

    with open("File.xml", "r") as SourceFile:          # Opens the file
        while 1:                                        # Keep going through the file to the end
            SourceFileLine = SourceFile.readline()      # Saves lines of the source file
            if not SourceFileLine:                      # An empty string means end of file
                break
            SourceFileLine = SourceFileLine.strip()     # Strips the whitespace
            if "<SequenceNum>" in SourceFileLine:
                SequenceNum = SourceFileLine[13:-14]    # Trims the tags, saves the field.
                continue
            if "<Subject>" in SourceFileLine:
                Subject = SourceFileLine[9:-10]
                continue
            #if "<Description>" in SourceFileLine:
            #    last_pos = SourceFile.tell()
            #    while "</Description>" not in SourceFileLine:
            #        SourceFile.seek(last_pos)
            #        ?????
            #
            #    Description = Description[22:]
            #    continue
            if "</Issue>" in SourceFileLine:
                print(SequenceNum, end = "\t")
                print(Subject)
            #    print(Description)
                print("\n")

I am stuck on capturing the three lines between the <Description> tags and saving them as a single value that I can print before continuing through the rest of the file. Having looked at dozens of other examples of line-reading loops, I suspect that I need to note the point where I reach the Description field and nest a second read loop there. But I have not found an example of this, so I assume that I am missing something or that there is a better way. Thank you in advance!
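For illustration, the nested read loop described above could be sketched roughly as follows. This is only a minimal sketch, not a general XML reader: `SAMPLE` and `read_issues` are hypothetical names introduced here, and the code assumes every tag starts on its own line and every `<Description>` has a closing tag, as in the example file.

```python
import io

# Assumed stand-in for File.xml so the sketch runs on its own.
SAMPLE = """<IssueTracking>
<Issue>
  <SequenceNum>123</SequenceNum>
  <Subject>Subject of Ticket 123</Subject>
  <Description>Line 1 in Description field of Ticket 123.
Line 2 in Description field of Ticket 123.
Line 3 in Description field of Ticket 123.</Description>
</Issue>
</IssueTracking>
"""


def read_issues(source):
    """Yield (sequence_num, subject, description_lines) for each <Issue>."""
    lines = iter(source)  # one shared iterator for the outer and inner loops
    sequence_num = subject = None
    description = []
    for line in lines:
        line = line.strip()
        if line.startswith("<SequenceNum>"):
            sequence_num = line[len("<SequenceNum>"):-len("</SequenceNum>")]
        elif line.startswith("<Subject>"):
            subject = line[len("<Subject>"):-len("</Subject>")]
        elif line.startswith("<Description>"):
            text = line[len("<Description>"):]
            # Inner loop: keep pulling lines from the same iterator
            # until the closing tag shows up.
            while "</Description>" not in text:
                description.append(text)
                text = next(lines).strip()
            description.append(text.split("</Description>")[0])
        elif line == "</Issue>":
            yield sequence_num, subject, description
            sequence_num = subject = None
            description = []


for number, title, body in read_issues(io.StringIO(SAMPLE)):
    print(number, title, sep="\t")
    print("\n".join(body))
```

Because the inner `while` loop advances the same iterator as the outer `for` loop, the outer loop simply resumes on the line after `</Description>`; no `tell()` or `seek()` is needed. With a real file, `open("File.xml")` could be passed in place of the `io.StringIO` object.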

+6

python loops readline

phlogiston Jul 20 '12 at 19:20


2 answers

Please do not read XML files like this. Python has several libraries that will help you read XML files.

Take a look at the Python lxml library. It provides a very easy way to read and parse XML files, which will greatly improve your code.

I would explain how to use the library itself, but their documentation is much better than I can squeeze into this text area: http://lxml.de/tutorial.html

+6

sean Jul 20 '12 at 19:24


Jon clements · Accepted Answer · 2012-07-20T19:52:25+0000

Here is an example using lxml, which I highly recommend for processing your data. (NB: written for Python 2.x, but easy to adapt for Python 3.x.)

    from lxml import etree

    xml = """<IssueTracking>
    <Issue>
      <SequenceNum>123</SequenceNum>
      <Subject>Subject of Ticket 123</Subject>
      <Description>Line 1 in Description field of Ticket 123.
    Line 2 in Description field of Ticket 123.
    Line 3 in Description field of Ticket 123.</Description>
    </Issue>
    <Issue>
      <SequenceNum>124</SequenceNum>
      <Subject>Subject of Ticket 124</Subject>
      <Description>Line 1 in Description field of Ticket 124.
    Line 2 in Description field of Ticket 124.
    Line 3 in Description field of Ticket 124.</Description>
    </Issue>
    </IssueTracking>
    """

    root = etree.fromstring(xml)
    for issue in root.findall('Issue'):
        as_list = [issue.find(n).text for n in ('SequenceNum', 'Subject', 'Description')]
        as_list[2] = as_list[2].split('\n')
        print as_list

This prints:

    ['123', 'Subject of Ticket 123', ['Line 1 in Description field of Ticket 123.', 'Line 2 in Description field of Ticket 123.', 'Line 3 in Description field of Ticket 123.']]
    ['124', 'Subject of Ticket 124', ['Line 1 in Description field of Ticket 124.', 'Line 2 in Description field of Ticket 124.', 'Line 3 in Description field of Ticket 124.']]
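As a possible Python 3 adaptation, the same approach can be sketched with the standard library's `xml.etree.ElementTree`, which offers the same `fromstring`, `findall` and `find` calls used above. This is a sketch under assumptions, not the answer's original code: `SAMPLE` and `format_issues` are hypothetical names, and the output layout (number, tab, subject, then the description lines) is one reading of the format requested in the question.

```python
import xml.etree.ElementTree as ET

# Assumed sample data mirroring the File.xml example from the question.
SAMPLE = """<IssueTracking>
<Issue>
  <SequenceNum>123</SequenceNum>
  <Subject>Subject of Ticket 123</Subject>
  <Description>Line 1 in Description field of Ticket 123.
Line 2 in Description field of Ticket 123.
Line 3 in Description field of Ticket 123.</Description>
</Issue>
<Issue>
  <SequenceNum>124</SequenceNum>
  <Subject>Subject of Ticket 124</Subject>
  <Description>Line 1 in Description field of Ticket 124.
Line 2 in Description field of Ticket 124.
Line 3 in Description field of Ticket 124.</Description>
</Issue>
</IssueTracking>
"""


def format_issues(xml_text):
    """Return one text block per <Issue>, separated by blank lines."""
    root = ET.fromstring(xml_text)
    blocks = []
    for issue in root.findall("Issue"):
        # findtext returns the element's text, or the default if the tag is missing.
        number = issue.findtext("SequenceNum", default="")
        subject = issue.findtext("Subject", default="")
        description = issue.findtext("Description", default="")
        lines = [line.strip() for line in description.splitlines()]
        blocks.append(number + "\t" + subject + "\n" + "\n".join(lines))
    return "\n\n".join(blocks)


print(format_issues(SAMPLE))
```

To read from disk instead of a string, `ET.parse("File.xml").getroot()` can replace the `ET.fromstring` call.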
