I am currently trying to parse a very large KML (XML) file with Ruby (Nokogiri), and I'm running into a few problems.
The parsing code itself works. I'll share it for reference, although it doesn't have much to do with my problem:
    require "nokogiri"
    require "rgeo"

    geofactory = RGeo::Geographic.projected_factory(
      :projection_proj4 => "+proj=lcc +lat_1=34.83333333333334 +lat_2=32.5 +lat_0=31.83333333333333 +lon_0=-81 +x_0=609600 +y_0=0 +ellps=GRS80 +to_meter=0.3048 +no_defs",
      :projection_srid => 3361
    )

    # Nokogiri::XML builds a DOM of the entire file before the loop below starts
    kmldoc = File.open("horry_parcels.kml") { |f| Nokogiri::XML(f) }

    kmldoc.css("Placemark").each_with_index do |placemark, i|
      puts i
      # Search relative to this placemark; "//description" would search from the document root
      tds = Nokogiri::HTML(placemark.at_css("description").children[0].to_html).search("tr > td")
      h = HorryParcel.new
      h.owner_name = tds.shift.text
      tds.shift
      tds.each_slice(2) do |k, v|
        col = k.text.downcase
        h.public_send("#{col}=", v.text) # same assignment as eval("h.#{col} = v.text"), without eval
      end
      # This placemark's own MultiGeometry, instead of re-scanning the whole document
      # with kmldoc.search("//MultiGeometry")[i] on every iteration.
      # "lon,lat,alt" tuples are whitespace-separated; keep only lon and lat.
      coords = placemark.at_css("MultiGeometry").text.split.map { |x| x.split(",").first(2) }
      points = coords.map { |lon, lat| geofactory.point(lon.to_f, lat.to_f) }
      geo_shape = geofactory.polygon(geofactory.linear_ring(points))
      h.geo_shape = geo_shape
      h.proj_shape = geo_shape.projection
      h.save
    end
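The coordinate step in the loop originally stripped newlines and tabs and then split on `",0 "`, which depends on every altitude being exactly `0` and on the exact whitespace layout of the file. A small standalone helper that splits on any whitespace first is a bit more robust and can be tested on its own. `parse_kml_coordinates` is a hypothetical name; its `[lon, lat]` pairs would feed `geofactory.point(lon, lat)` (or the original `parse_wkt` call) as before.

```ruby
# Hypothetical helper: turns the text of a KML <coordinates> element
# ("lon,lat,alt lon,lat,alt ...", separated by any whitespace) into
# [lon, lat] pairs of Floats, dropping the altitude.
def parse_kml_coordinates(text)
  text.split.map do |tuple|
    lon, lat = tuple.split(",")
    [Float(lon), Float(lat)]
  end
end

sample = "\n\t-78.941896,33.867893,0 -78.942514,33.868632,0\n\t-78.941896,33.867893,0\n"
p parse_kml_coordinates(sample)
# => [[-78.941896, 33.867893], [-78.942514, 33.868632], [-78.941896, 33.867893]]
```

`Float()` raises on malformed input instead of silently returning `0.0` like `to_f`, which makes bad records easier to spot in a long import.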
Anyway, I tested this code on a much smaller KML sample and it works.
However, when I load the real file, Ruby just sits there as if it is processing something, and this “processing” goes on for several hours while I do other things. As you can see, I have a counter (each_with_index) on the array of Placemarks, and during all those hours not a single value of i was printed to the command line. Oddly enough, it hasn't timed out yet, but even if it eventually finishes, there should be a better way to do this.
I know I can open a KML file in Google Earth (Google Earth Pro here) and save the data as smaller, more manageable KML files, but that feels hacky: it would be a very manual, unprofessional process.
Here is a sample KML file (with only one Placemark), if that helps:
    <?xml version="1.0" encoding="UTF-8"?>
    <kml xmlns="http://www.opengis.net/kml/2.2" xmlns:gx="http://www.google.com/kml/ext/2.2" xmlns:kml="http://www.opengis.net/kml/2.2" xmlns:atom="http://www.w3.org/2005/Atom">
      <Document>
        <name>justone.kml</name>
        <Style id="PolyStyle00">
          <LabelStyle>
            <color>00000000</color>
            <scale>0</scale>
          </LabelStyle>
          <LineStyle>
            <color>ff0000ff</color>
          </LineStyle>
          <PolyStyle>
            <color>00f0f0f0</color>
          </PolyStyle>
        </Style>
        <Folder>
          <name>justone</name>
          <open>1</open>
          <Placemark id="ID_010161">
            <name>STUART CHARLES A JR</name>
            <Snippet maxLines="0"></Snippet>
            <description>""</description>
            <styleUrl>#PolyStyle00</styleUrl>
            <MultiGeometry>
              <Polygon>
                <outerBoundaryIs>
                  <LinearRing>
                    <coordinates>
                      -78.941896,33.867893,0 -78.942514,33.868632,0 -78.94342899999999,33.869705,0 -78.943708,33.870083,0 -78.94466799999999,33.871142,0 -78.94511900000001,33.871639,0 -78.94541099999999,33.871776,0 -78.94635,33.872216,0 -78.94637899999999,33.872229,0 -78.94691400000001,33.87248,0 -78.94708300000001,33.87256,0 -78.94783700000001,33.872918,0 -78.947889,33.872942,0 -78.948655,33.873309,0 -78.949589,33.873756,0 -78.950164,33.87403,0 -78.9507,33.873432,0 -78.95077000000001,33.873384,0 -78.950867,33.873354,0 -78.95093199999999,33.873334,0 -78.952518,33.871631,0 -78.95400600000001,33.869583,0 -78.955254,33.867865,0 -78.954606,33.867499,0 -78.953833,33.867172,0 -78.952994,33.866809,0 -78.95272799999999,33.867129,0 -78.952139,33.866803,0 -78.95152299999999,33.86645,0 -78.95134299999999,33.866649,0 -78.95116400000001,33.866847,0 -78.949281,33.867363,0 -78.948936,33.866599,0 -78.94721699999999,33.866927,0 -78.941896,33.867893,0
                    </coordinates>
                  </LinearRing>
                </outerBoundaryIs>
              </Polygon>
            </MultiGeometry>
          </Placemark>
        </Folder>
      </Document>
    </kml>
EDIT: 99.9% of the data I work with is in *.shp format, so I set this problem aside last week. But I'm going to run this process on my desktop computer (rather than my laptop) and let it run until it either times out or finishes.
    class ClassName
      attr_reader :before, :after

      def go
        @before = Time.now
        run_actual_code
        @after = Time.now
        puts "process took #{@after - @before} seconds to complete"
      end

      def run_actual_code
        # ...
      end
    end
The code above should tell me how much time elapsed. From that (if it ever finishes), we should be able to work out a crude rule of thumb for how long you should expect your (aka PERFECT) code to run without SAX parsing or splitting the document into smaller pieces.
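For a run that may last hours, a monotonic clock is a safer timer than `Time.now`, which can jump if the system clock is adjusted mid-run; Ruby's `Process.clock_gettime(Process::CLOCK_MONOTONIC)` provides one, and the standard library's `Benchmark.realtime { ... }` does much the same. The rule of thumb itself is a linear extrapolation from a timed sample. Both helpers below (`time_it`, `estimate_total_seconds`) are hypothetical names, and the estimate assumes a roughly constant cost per Placemark, which only holds once per-item work dominates (for example with a streaming parser, not while a DOM is still being built).

```ruby
# Measures a block's duration with a monotonic clock, which (unlike Time.now)
# cannot jump if the system clock is adjusted during a long run.
def time_it
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  result = yield
  [result, Process.clock_gettime(Process::CLOCK_MONOTONIC) - start]
end

# Crude linear extrapolation: if `done` items took `elapsed` seconds,
# estimate the time for `total` items at the same per-item rate.
def estimate_total_seconds(elapsed, done, total)
  raise ArgumentError, "done must be positive" unless done.positive?
  elapsed * total / done
end

_, seconds = time_it { sleep 0.1 }
puts "process took #{seconds.round(2)} seconds to complete"
puts estimate_total_seconds(120.0, 1_000, 50_000) # prints 6000.0
```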