I need to parse ~2000 XML documents, extract certain nodes from each one, add them to a single document, and save it. I parse to internal C-level nodes so that I can use XPath. The problem is that as I iterate over the documents I cannot release the internal C objects from memory, and eventually usage grows past 4 GB. I know the problem is not the loaded tree itself (I ran a test loop that only parsed and freed each document's tree, and that stayed under control), but rather the filtered nodes or the root node.
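For what it's worth, the test loop that ruled out the parsed trees was essentially just this (parse and free, nothing else):

library(XML)

# Sanity check: only parse and immediately free each document.
# Memory stays flat here, so the leak seems to come from the node
# handling in the full loop rather than from the parsed trees themselves.
for (i in seq_along(all.docs)) {
  temp.xml <- xmlParse(all.docs[i])
  free(temp.xml)
  rm(temp.xml)
}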
Here is the full code I'm using. What am I missing so that I can free the memory at the end of each iteration?
xmlDoc <- xmlHashTree()
rootNode <- newXMLNode("root")

for (i in seq_along(all.docs)) {
  # Read in the doc, filter out the nodes I want, then remove the temp doc.
  temp.xml <- xmlParse(all.docs[i])
  filteredNodes <- newXMLNode(all.docs[i],
                              xpathApply(temp.xml, "//my.node[@my.attr='my.value']"))
  free(temp.xml)
  rm(temp.xml)

  # Add the filtered nodes to the root node and get rid of them.
  addChildren(rootNode, filteredNodes)
  removeNodes(filteredNodes, free = TRUE)
  rm(filteredNodes)
}
# Add the root node to the doc and save the merged document.
xmlDoc <- addChildren(xmlDoc, rootNode)
saveXML(xmlDoc, "MergedDocs.xml")
Thanks for the help.