Extract CDATA text from a .kml file in R

I would like to extract the text of the description element from a .kml file using R.

Here is the file:

 <?xml version="1.0" encoding="UTF-8"?>
 <kml xmlns="http://www.opengis.net/kml/2.2" xmlns:gx="http://www.google.com/kml/ext/2.2" xmlns:atom="http://www.w3.org/2005/Atom">
 <Document>
 <open>1</open>
 <visibility>1</visibility>
 <name><![CDATA[2013-07-06 4:18pm]]></name>
 ...
 <Placemark>
 <name><![CDATA[2013-07-06 4:18pm (Start)]]></name>
 <description><![CDATA[]]></description>
 <TimeStamp><when>2013-07-06T20:18:56.000Z</when></TimeStamp>
 <styleUrl>#start</styleUrl>
 <Point>
 <coordinates>-78.353348,45.020615,340.29998779296875</coordinates>
 </Point>
 </Placemark>
 <Placemark id="tour">
 <name><![CDATA[2013-07-06 4:18pm]]></name>
 <description><![CDATA[]]></description>
 ...
 <gx:Track>
 <when>2013-07-06T20:18:56.000Z</when>
 <gx:coord>-78.353348 45.020615 340.29998779296875</gx:coord>
 <when>2013-07-06T20:19:12.000Z</when>
 <gx:coord>-78.353315 45.020644 340.29998779296875</gx:coord>
 <when>2013-07-06T22:12:23.000Z</when>
 <gx:coord>-78.353108 45.020736 342.29998779296875</gx:coord>
 <ExtendedData>
 ...
 <Placemark>
 <name><![CDATA[2013-07-06 4:18pm (End)]]></name>
 <description><![CDATA[Created by Google My Tracks on Android.

 Name: 2013-07-06 4:18pm
 Activity type: cycling
 Description: -
 Total distance: 49.62 km (30.8 mi)
 Total time: 1:53:28
 Moving time: 1:50:17
 Average speed: 26.24 km/h (16.3 mi/h)
 Average moving speed: 27.00 km/h (16.8 mi/h)
 Max speed: 61.20 km/h (38.0 mi/h)
 Average pace: 2.29 min/km (3.7 min/mi)
 Average moving pace: 2.22 min/km (3.6 min/mi)
 Fastest pace: 0.98 min/km (1.6 min/mi)
 Max elevation: 406 m (1333 ft)
 Min elevation: 265 m (868 ft)
 Elevation gain: 690 m (2263 ft)
 Max grade: 12 %
 Min grade: -11 %
 Recorded: 2013-07-06 4:18pm
 ]]></description>
 ...
 </Placemark>
 </Document>
 </kml>

And here is what I want to extract, the text contained in

  <description><![CDATA[Created by Google My Tracks on Android.: ]]></description> 

i.e.:

  Name: 2013-07-06 4:18pm Activity type: cycling Description: - Total distance: 49.62 km (30.8 mi) Total time: 1:53:28 Moving time: 1:50:17 Average speed: 26.24 km/h (16.3 mi/h) Average moving speed: 27.00 km/h (16.8 mi/h) Max speed: 61.20 km/h (38.0 mi/h) Average pace: 2.29 min/km (3.7 min/mi) Average moving pace: 2.22 min/km (3.6 min/mi) Fastest pace: 0.98 min/km (1.6 min/mi) Max elevation: 406 m (1333 ft) Min elevation: 265 m (868 ft) Elevation gain: 690 m (2263 ft) Max grade: 12 % Min grade: -11 % Recorded: 2013-07-06 4:18p 

xmlToList gives me NULL, I think because the CDATA section means that the text inside it is not parsed:

 library(XML)
 xml <- xmlTreeParse("test1.kml", useInternalNodes=TRUE)
 xmllist <- xmlToList(xml)
 xmllist$Document$Placemark$description
 [[1]]
 NULL

I think this is what the following means: "The term CDATA is used for text data that should not be parsed by the XML parser ... A CDATA section begins with "<![CDATA[" and ends with "]]>"."

The following does not work for me either, possibly for the same CDATA reason:

 z1 <- xpathApply(xml, "//description", xmlValue)
 z1
 list()

Can someone help me extract the text to a file?

Here is the link to the file: https://docs.google.com/file/d/0B__iOdFGJbXYOHJGbWJVNW0tS3M/edit?usp=sharing

2 answers

Jake Burkhead answered this in the comments, and his solution works. I am very grateful. This is how the text is extracted from the .kml file:

 > xml1 <- xmlTreeParse("2013-07-06 4-18pm.kml", useInternalNodes=TRUE)
 > root <- xmlRoot(xml1)
 > names(root[["Document"]])
         open   visibility         name       author        Style        Style        Style        Style
       "open" "visibility"       "name"     "author"      "Style"      "Style"      "Style"      "Style"
        Style       Schema    Placemark    Placemark    Placemark
      "Style"     "Schema"  "Placemark"  "Placemark"  "Placemark"
 > # note that I want the text in the third "Placemark", which is in position [13], so:
 > xmlValue(root[["Document"]][[13]][["description"]])
 [1] "Created by Google My Tracks on Android.\n\nName: 2013-07-06 4:18pm\nActivity type: cycling\nDescription: -\nTotal distance: 49.62 km (30.8 mi)\nTotal time: 1:53:28\nMoving time: 1:50:17\nAverage speed: 26.24 km/h (16.3 mi/h)\nAverage moving speed: 27.00 km/h (16.8 mi/h)\nMax speed: 61.20 km/h (38.0 mi/h)\nAverage pace: 2.29 min/km (3.7 min/mi)\nAverage moving pace: 2.22 min/km (3.6 min/mi)\nFastest pace: 0.98 min/km (1.6 min/mi)\nMax elevation: 406 m (1333 ft)\nMin elevation: 265 m (868 ft)\nElevation gain: 690 m (2263 ft)\nMax grade: 12 %\nMin grade: -11 %\nRecorded: 2013-07-06 4:18pm\n"
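The position [13] depends on how many Style and Schema nodes a particular file has, and the question also asked for the text to end up in a file. A minimal sketch of both steps, assuming the description of interest is always in the last Placemark; the inline kml_text and the temporary out_file are made up for illustration and stand in for the real .kml file and output path:

```r
library(XML)

# Hypothetical miniature KML, for illustration only; it mimics the layout
# of the file in the question (default namespace, CDATA descriptions).
kml_text <- '<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2">
<Document>
<name><![CDATA[2013-07-06 4:18pm]]></name>
<Placemark><name><![CDATA[Start]]></name></Placemark>
<Placemark><description><![CDATA[Created by Google My Tracks on Android.

Name: 2013-07-06 4:18pm
Activity type: cycling
]]></description></Placemark>
</Document>
</kml>'

doc <- xmlParse(kml_text, asText = TRUE)
document <- xmlRoot(doc)[["Document"]]

# Find the positions of all children named "Placemark" and take the last,
# instead of hard-coding an index such as 13.
idx <- which(names(document) == "Placemark")
last_pm <- document[[idx[length(idx)]]]
txt <- xmlValue(last_pm[["description"]])

# Write the extracted text to a file (hypothetical temporary path).
out_file <- tempfile(fileext = ".txt")
writeLines(txt, out_file)
```

For the real file, xmlParse("2013-07-06 4-18pm.kml") would replace the inline text, and out_file would be whatever path the text should be saved to.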

I accepted the answer, but thought I would put the complete solution here in case it helps others.

Thanks so much for your persistence Jake. Thanks also to Ricardo and agstudy.

 doc <- xmlTreeParse("test1.kml", useInternalNodes = TRUE)
 root <- xmlRoot(doc)
 xmlValue(root[["Document"]][["name"]])
 R> xmlValue(root[["Document"]][["name"]])
 [1] "2013-07-06 4:18pm"

Also, xmlToDataFrame(root) and xmlToDataFrame(doc) return this value in the name column. Using xmlToList on root or doc returns NULL for the value of any CDATA. I am looking at the name node because the example copied and pasted from your question does not parse with xmlParse. From my own little tests, it seems like this should work on any CDATA.
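As for the empty list() that xpathApply returned in the question, a likely cause is the default namespace declared on the kml element rather than the CDATA sections: an unprefixed XPath name such as //description only matches elements that are in no namespace. A small sketch of this, assuming a made-up inline document (kml_text) and an arbitrarily chosen prefix k bound to the KML namespace URI:

```r
library(XML)

# Hypothetical miniature KML, for illustration only.
kml_text <- '<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2">
<Document>
<Placemark>
<description><![CDATA[Created by Google My Tracks on Android.]]></description>
</Placemark>
</Document>
</kml>'

doc <- xmlParse(kml_text, asText = TRUE)

# Unprefixed name: nothing matches, because the elements live in the
# KML default namespace.
no_ns <- xpathSApply(doc, "//description", xmlValue)

# Bind a prefix (the name "k" is arbitrary) to the namespace URI and
# use it in the XPath expression.
ns <- c(k = "http://www.opengis.net/kml/2.2")
with_ns <- xpathSApply(doc, "//k:description", xmlValue, namespaces = ns)
```

Here no_ns comes back empty, as in the question, while with_ns holds the CDATA text, which suggests that the parser reads CDATA content without trouble once the node is actually matched.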


Source: https://habr.com/ru/post/1490419/