R XML - unable to remove internal C nodes from memory

I need to parse ~2000 XML documents, extract certain nodes from each, add them to a single document, and save it. I use internal C nodes so that I can use XPath. The problem is that as I iterate over the documents, I cannot delete the internal C objects from memory, and I eventually end up using more than 4 GB. I know the problem is not the loaded tree itself (I ran the loop doing nothing but parsing and freeing each document's tree), but the filtered nodes or the root node.
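
For reference, the isolation test mentioned above looks roughly like this (a minimal sketch; all.docs is my vector of file paths):

library(XML)

# Only parse and free each document; no node extraction at all.
# Memory stays stable in this version, so the parsed trees themselves are not the leak.
for (i in seq_along(all.docs)){
  temp.xml <- xmlParse(all.docs[i])
  free(temp.xml)
  rm(temp.xml)
}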

Here is the full code I'm using. What am I missing that would let me clear the memory at the end of each iteration?

library(XML)

xmlDoc <- xmlHashTree()
rootNode <- newXMLNode("root")

for (i in seq_along(all.docs)){

  # Read in the doc, filter out nodes, remove temp doc
  temp.xml <- xmlParse(all.docs[i])
  filteredNodes <- newXMLNode(all.docs[i],
                   xpathApply(temp.xml, "//my.node[@my.attr='my.value']"))
  free(temp.xml)
  rm(temp.xml)

  # Add filtered nodes to root node and get rid of them.
  addChildren(rootNode, filteredNodes)
  removeNodes(filteredNodes, free = TRUE)
  rm(filteredNodes)

}
# Add root node to doc and save that new log.
xmlDoc <- addChildren(xmlDoc, rootNode)
saveXML(xmlDoc, "MergedDocs.xml") 

Thanks for the help.


"XML" . , "xml2" . , , "xml2". - "XML", .

library(xml2)
library(magrittr)  # for the %>% pipe

xmlDoc <- xml_new_document() %>% xml_add_child("root")

for (i in seq_along(all.docs)){
 # Read in the log.
 rawXML <- read_xml(all.docs[i])

 # Filter relevant nodes and cast them to a list of children.
 tempNodes   <- xml_find_all(rawXML, "//my.node[@my.attr='my.value']")
 theChildren <- xml_children(tempNodes)

 # Get rid of the temp doc.
 rm(rawXML)

 # Add the filtered nodes to the log under a node named after the file name
 xmlDoc %>%
  xml_add_child(all.docs[i]) %>%
  xml_add_child(theChildren[[1]]) %>%
  invisible()

 # Remove the temp objects
 rm(tempNodes); rm(theChildren)
}
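
Two details worth noting with the "xml2" version. The underlying libxml2 memory is released by R's garbage collector once nothing references the objects, so if memory still climbs after rm(), an explicit gc() forces the finalizers to run. And the loop above never writes the merged document to disk; a minimal sketch of the final step, assuming write_xml() accepts the document returned by xml_root():

# Force a collection so the finalizers release the libxml2 memory,
# then save the merged document (mirrors saveXML() in the question).
gc()
write_xml(xml_root(xmlDoc), "MergedDocs.xml")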

Source: https://habr.com/ru/post/1658387/

