os.walk() in Python: XML representation of a directory structure, recursion

So I'm trying to use os.walk() to create an XML representation of a directory structure, but I get a lot of duplicates. For the first part of the XML file it correctly nests the directories inside each other and puts the files in the right place; after that, however, it keeps going and adds incorrect entries. I'm not quite sure why...

Here is my code:

```python
def dirToXML(self, directory):
    curdir = os.getcwd()
    os.chdir(directory)
    xmlOutput = ""

    tree = os.walk(directory)
    for root, dirs, files in tree:
        pathName = string.split(directory, os.sep)
        xmlOutput += "<dir><name><![CDATA[" + pathName.pop() + "]]></name>"
        if len(files) > 0:
            xmlOutput += self.fileToXML(files)
        for subdir in dirs:
            xmlOutput += self.dirToXML(os.path.join(root, subdir))
        xmlOutput += "</dir>"

    os.chdir(curdir)
    return xmlOutput
```

The fileToXML method simply converts the list of file names to XML, so there is no need to worry about it.
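The fileToXML helper itself isn't shown. For anyone who wants to run the code above, here is a hypothetical stand-in that matches the `<files>`/`<file>`/`<name>` layout in the output below. It is written as a plain function, so it has no `self`, and it uses CDATA the same way dirToXML does. This is a sketch, not the asker's actual helper.

```python
def fileToXML(files):
    """Hypothetical stand-in for the question's fileToXML helper.

    Wraps each file name in <file><name>...</name></file> inside one <files>
    element, using CDATA the same way dirToXML does for directory names.
    """
    xmlOutput = "<files>"
    for name in sorted(files):  # sorted only so the output is deterministic
        xmlOutput += "<file><name><![CDATA[" + name + "]]></name></file>"
    xmlOutput += "</files>"
    return xmlOutput


print(fileToXML(["testing.xml", "structure.xml"]))
# <files><file><name><![CDATA[structure.xml]]></name></file><file><name><![CDATA[testing.xml]]></name></file></files>
```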

The directory structure is simple:

```
images/
images/testing.xml
images/structure.xml
images/Hellos
images/Goodbyes
images/Goodbyes/foo
images/Goodbyes/bar
images/Goodbyes/square
```

and the resulting XML file is:

```xml
<structure>
  <dir>
    <name>images</name>
    <files>
      <file>
        <name>structure.xml</name>
      </file>
      <file>
        <name>testing.xml</name>
      </file>
    </files>
    <dir>
      <name>Hellos</name>
    </dir>
    <dir>
      <name>Goodbyes</name>
      <dir>
        <name>foo</name>
      </dir>
      <dir>
        <name>bar</name>
      </dir>
      <dir>
        <name>square</name>
      </dir>
    </dir>
    <dir>
      <name>foo</name>
    </dir>
    <dir>
      <name>bar</name>
    </dir>
    <dir>
      <name>square</name>
    </dir>
  </dir>
  <dir>
    <name>Hellos</name>
  </dir>
  <dir>
    <name>Goodbyes</name>
    <dir>
      <name>foo</name>
    </dir>
    <dir>
      <name>bar</name>
    </dir>
    <dir>
      <name>square</name>
    </dir>
  </dir>
  <dir>
    <name>foo</name>
  </dir>
  <dir>
    <name>bar</name>
  </dir>
  <dir>
    <name>square</name>
  </dir>
</structure>
```

Any help would be greatly appreciated!

3 answers

I would recommend not using os.walk(), since you need to do so much to massage its output. Instead, just use a recursive function that uses os.listdir(), os.path.join(), os.path.isdir(), etc.

```python
import os
from xml.sax.saxutils import escape as xml_escape


def DirAsXML(path):
    result = '<dir>\n  <name>%s</name>\n' % xml_escape(os.path.basename(path))
    dirs = []
    files = []
    for item in os.listdir(path):
        itempath = os.path.join(path, item)
        if os.path.isdir(itempath):
            dirs.append(item)
        elif os.path.isfile(itempath):
            files.append(item)
    if files:
        result += '  <files>\n'
        result += '\n'.join('    <file>\n      <name>%s</name>\n    </file>'
                            % xml_escape(f) for f in files)
        result += '\n  </files>\n'
    for d in dirs:
        x = DirAsXML(os.path.join(path, d))
        # indent the subdirectory's lines and end it with a newline
        result += '\n'.join('  ' + line for line in x.split('\n')) + '\n'
    result += '</dir>'
    return result


if __name__ == '__main__':
    print('<structure>\n' + DirAsXML(os.getcwd()) + '\n</structure>')
```

Personally, I would recommend a much simpler XML schema: put the names in attributes and get rid of the <files> group:

```python
import os
from xml.sax.saxutils import quoteattr as xml_quoteattr


def DirAsLessXML(path):
    result = '<dir name=%s>\n' % xml_quoteattr(os.path.basename(path))
    for item in os.listdir(path):
        itempath = os.path.join(path, item)
        if os.path.isdir(itempath):
            # indent the subdirectory's lines and end it with a newline
            sub = DirAsLessXML(itempath)
            result += '\n'.join('  ' + line for line in sub.split('\n')) + '\n'
        elif os.path.isfile(itempath):
            result += '  <file name=%s />\n' % xml_quoteattr(item)
    result += '</dir>'
    return result


if __name__ == '__main__':
    print('<structure>\n' + DirAsLessXML(os.getcwd()) + '\n</structure>')
```

This gives output like the following:

```xml
<structure>
  <dir name="local">
    <dir name=".hg">
      <file name="00changelog.i" />
      <file name="branch" />
      <file name="branch.cache" />
      <file name="dirstate" />
      <file name="hgrc" />
      <file name="requires" />
      <dir name="store">
        <file name="00changelog.i" />
```

and so on.
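Another option is to build the same attribute-based schema with the standard library's xml.etree.ElementTree instead of string concatenation. ElementTree escapes attribute values itself and can pretty-print the result. Here is a sketch under a few assumptions: the function names are hypothetical, `ET.indent` needs Python 3.9+, and entries are sorted only to make the output stable.

```python
import os
import xml.etree.ElementTree as ET


def dir_as_element(path):
    """Return a <dir name="..."> element for path, with <file> and <dir> children.

    ElementTree escapes attribute values itself, so quoteattr() is not needed.
    """
    elem = ET.Element("dir", name=os.path.basename(os.path.normpath(path)))
    for item in sorted(os.listdir(path)):  # sorted only for a stable order
        itempath = os.path.join(path, item)
        if os.path.isdir(itempath):
            elem.append(dir_as_element(itempath))
        elif os.path.isfile(itempath):
            ET.SubElement(elem, "file", name=item)
    return elem


def structure_xml(path):
    """Wrap the tree for path in <structure> and return it as a string."""
    structure = ET.Element("structure")
    structure.append(dir_as_element(path))
    ET.indent(structure)  # pretty-print in place (Python 3.9+)
    return ET.tostring(structure, encoding="unicode")
```

For example, `print(structure_xml(os.getcwd()))` prints the current directory in the same `<dir name=...>` / `<file name=... />` layout as DirAsLessXML.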

If os.walk() worked more like expat, which reports both the start and the end of each element, this would be easier for you.
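os.walk() can be wrapped so that it behaves a little more like expat. Top-down, os.walk visits directories in depth-first pre-order but never says when it has finished one. A stack of currently open directories is enough to recover those missing "end" events. Below is a sketch under that assumption. The names walk_events and events_to_xml are hypothetical, and symlinked directories are not descended into, matching os.walk's default.

```python
import os
from xml.sax.saxutils import quoteattr


def walk_events(top):
    """Yield expat-style events for the tree under top.

    Events are ('start', dirpath), ('file', filename) and ('end', dirpath).
    os.walk (top-down) visits directories in depth-first pre-order but never
    says when it has finished one, so a stack of open directories is used to
    recover the 'end' events.
    """
    top = os.path.normpath(top)  # drop trailing separators so dirname() matches
    open_dirs = []
    for root, dirs, files in os.walk(top):
        dirs.sort()  # sorting in place makes os.walk descend in name order
        # Close every open directory that is not this directory's parent.
        while open_dirs and open_dirs[-1] != os.path.dirname(root):
            yield ("end", open_dirs.pop())
        yield ("start", root)
        for name in sorted(files):
            yield ("file", name)
        open_dirs.append(root)
    while open_dirs:  # close whatever is still open when the walk ends
        yield ("end", open_dirs.pop())


def events_to_xml(top):
    """Turn the events into indented <dir name=...>/<file name=... /> XML."""
    lines = []
    depth = 0
    for kind, value in walk_events(top):
        if kind == "start":
            name = quoteattr(os.path.basename(value))
            lines.append("  " * depth + "<dir name=%s>" % name)
            depth += 1
        elif kind == "file":
            lines.append("  " * depth + "<file name=%s />" % quoteattr(value))
        else:  # "end"
            depth -= 1
            lines.append("  " * depth + "</dir>")
    return "\n".join(lines)
```

Because every "start" is paired with an "end", the consumer only has to open and close tags as the events arrive, much like an expat handler.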


Delete these two lines:

```python
for subdir in dirs:
    xmlOutput += self.dirToXML(os.path.join(root, subdir))
```

You recurse into the subdirectories yourself, but this is redundant because os.walk already does the recursion.
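Removing those two lines stops the duplication. However, the loop body alone cannot tell when os.walk has finished a directory, so it has no way to nest the children inside their parent's `<dir>`. One way to keep the nesting while still letting os.walk do the recursion is to remember each directory's element by path and attach every new directory to its parent's element. Here is a sketch using xml.etree.ElementTree instead of string concatenation. The name walk_to_element is hypothetical.

```python
import os
import xml.etree.ElementTree as ET


def walk_to_element(top):
    """Build nested <dir>/<file> elements while os.walk does the recursion.

    Every directory's element is remembered by path; since os.walk (top-down)
    visits a parent before its children, each new directory can be attached
    to its parent's element right away, with no manual recursion.
    """
    top = os.path.normpath(top)  # so os.path.dirname(child) == top exactly
    elements = {}
    for root, dirs, files in os.walk(top):
        dirs.sort()  # optional: in-place sort gives a deterministic order
        elem = ET.Element("dir", name=os.path.basename(root))
        if root != top:
            elements[os.path.dirname(root)].append(elem)
        elements[root] = elem
        for name in sorted(files):
            ET.SubElement(elem, "file", name=name)
    return elements[top]
```

`ET.tostring(walk_to_element("images"), encoding="unicode")` then returns the nested XML as a string.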


I tried to use os.walk, but it turned out not to fit the recursive tree structure I wanted to produce in XML. I changed my code as follows, and it now gives the result I need:

```python
import os


def dirToXML(self, directory):
    # No os.chdir() needed: every path below is built with os.path.join,
    # and changing into a *relative* directory would break those paths.
    name = os.path.basename(os.path.normpath(directory))
    xmlOutput = "<dir><name><![CDATA[" + name + "]]></name>"

    for item in os.listdir(directory):
        itempath = os.path.join(directory, item)
        if os.path.isfile(itempath):
            xmlOutput += "<file><name><![CDATA[" + item + "]]></name></file>"
        elif os.path.isdir(itempath):
            xmlOutput += self.dirToXML(itempath)

    xmlOutput += "</dir>"
    return xmlOutput
```

Source: https://habr.com/ru/post/1298931/

