Select multiple nodes with Nokogiri, scoped to an ancestor node identified by a variable

For the past few days I have been looking for a way to select multiple nodes with Nokogiri, scoped to an ancestor node that is identified by a variable.

What I need: I already collect the "Id" of every "Segment" node. For each of those segments I then want to collect all the "Resource" nodes below it, using the segment "Id" as a variable to select them.

    <CPL>
      <SegmL>
        <Segment>
          <Id>UUID</Id> <!-- UUID as a variable -->
          <Name>name_01</Name>
          <SequL>
            <ImageSequence>
              <Id>UUID</Id>
              <Track>UUID</Track>
              <ResourceList>
                <Resource> <!-- depending on SegmentId -->
                  <A>aaa</A>
                  <B>bbb</B>
                  <C>ccc</C>
                  <D>ddd</D>
                </Resource>
              </ResourceList>
            </ImageSequence>
            <AudioSequence>
              <Id>UUID</Id>
              <Track>UUID</Track>
              <ResourceList>
                <Resource>
                  <A>aaa</A>
                  <B>bbb</B>
                  <C>ccc</C>
                  <D>ddd</D>
                </Resource>
              </ResourceList>
            </AudioSequence>
          </SequL>
        </Segment>
        <Segment>
          <Id>UUIDa</Id>
          <Name>name_02</Name>
          <SequL>
            <ImageSequence>
              <Id>UUID</Id>
              <Track>UUID</Track>
              <ResourceList>
                <Resource>
                  <A>aaa</A>
                  <B>bbb</B>
                  <C>ccc</C>
                  <D>ddd</D>
                </Resource>
              </ResourceList>
            </ImageSequence>
            <AudioSequence>
              <Id>UUID</Id>
              <Track>UUID</Track>
              <ResourceList>
                <Resource>
                  <A>aaa</A>
                  <B>bbb</B>
                  <C>ccc</C>
                  <D>ddd</D>
                </Resource>
              </ResourceList>
            </AudioSequence>
          </SequL>
        </Segment>
      </SegmL>
    </CPL>

All resource data is collected with expressions like A = Resource.css("A").text.gsub(/\n/,"")

    #first each do
    cpls.each_with_index do |(cpl_uuid, mycpl), index|
      cpl_filename = mycpl
      cpl_file = File.open("#{resource_uri}/#{cpl_filename}")
      cpl = Nokogiri::XML( cpl_file ).remove_namespaces!

      #get UUID for UUID checks
      cpl_uuid = cpl.css("Id").first.text.gsub(/\n/,"")
      cpl_root_edit_rate = cpl.css("EditRate").first.text.gsub(/\s+/, "\/")

      #second each do
      cpl.css("Segment").each do |s| # loop segment
        cpl_segment_list_uuid = s.css("Id").first.text.gsub(/\n/,"") #uuid of segment list

        #third each do
        cpl.css("Resource").each do |f| #loop resources
          cpl_A = f.css("A").text.gsub(/\n/,"") # uuid of A
          cpl_B = f.css("B").text.gsub(/\n/,"") # uuid of B
        end #third
      end #second
    end #first

My expression gives me this information stored in an array:

    A = 48000.0
    B = 240000.0
    C = 0.0
    D = 240000.0

    Some functions to calculate an average on the resources.

    puts all_arry
    A = 5.0
    B = 5.0
    C = 5.0
    D = 5.0
    A = 5.0
    B = 5.0
    C = 5.0
    D = 5.0

    = 8 values -> only 4 values exist for the exact loop (2 average values per Segment)

Currently, every segment "Id" collects all the "Resource" nodes in the document.

How can I select only the resources that belong to each segment, using the segment identifier as a variable?

I tried the following code, but the loop body never runs. I think this is because there are several nodes between the "Id" of the "Segment" and the "A", "B", ... children of each "Resource":

    if cpl.at("Segment/Id:contains(\"#{cpl_segment_list_uuid}\")")
      cpl.css("Resource").each do |f|
        #collecting resources here for each segment
      end
    end

None of the nodes have attributes, ids, classes, etc.

Can you help me with my problem? Thank you in advance for your support!

UPDATE 10/07/16

I also tried the following expressions for the "each do" loop over the resources:

    expression = "/SegmetList/Segment[Id>cpl_segment_list_uuid]"
    cpl.xpath(expression).each do |f|

The "each" loop starts, but I do not get the inner nodes.

 cpl.css("Segment:contains(\"#{cpl_segment_list_uuid}\") > Resource").each do |f| 

Same result as the previous attempt.

The "if" condition has the same problem:

    if cpl.at("Segment/Id:contains(\"#{cpl_segment_list_uuid}\")").each do |f|
      #some code
    end

UPDATE 2016/18/10

I now get the right number of resources (4), but they are still not separated per segment: every segment contains the same four resources.

The reason I no longer get double the number of resources is that I now create the array inside the "Segment" loop.

This is the real code:

    #first each do
    cpls.each_with_index do |(cpl_uuid, mycpl), index|
      cpl_filename = mycpl
      cpl_file = File.open("#{resource_uri}/#{cpl_filename}")
      cpl = Nokogiri::XML( cpl_file ).remove_namespaces!

      #get UUID for UUID checks
      cpl_uuid = cpl.css("Id").first.text.gsub(/\n/,"")
      cpl_root_edit_rate = cpl.css("EditRate").first.text.gsub(/\s+/, "\/")

      #second each do
      cpl.css("Segment").each do |s| # loop segment
        cpl_segment_list_uuid = s.css("Id").first.text.gsub(/\n/,"") #uuid of segment list
        array_for_resource_data = Array.new

        #third each do
        s.css("Resource").each do |f| #loop resources #all resources
        s.search('//A | //B').each do |f| #selecting only resources "A" and "B"
          cpl_A = f.css("A").text.gsub(/\n/,"") # uuid of A
          cpl_B = f.css("B").text.gsub(/\n/,"") # uuid of B
        end #third
      end #second
    end #first

I hope my update provides you with more details. Thanks so much for your help and answer!

UPDATE 2016/31/10

The problem with the duplicated segment output is fixed. I now have an additional loop over each sequence under the segments:

    cpl.css("Segment").each do |u|
      segment_list_uuid = u.css("Id").first.text.gsub(/\n/,"")
      sequence_list_uuid_arr = Array.new

      u.xpath("//SequenceList[//*[starts-with(name(),'Sequence')]]").each do |s|
        sequence_list_uuid = s.css("TrackId").first.text#.gsub(/\n/,"")
        sequence_list_uuid_arr.push(sequence_list_uuid)

        #following some resource nodes
        s.css("Resource").each do |f|
          asset_uuid = f.css("TrackFileId").text.gsub(/\n/,"")
          resource_uuid = f.css("Id").text.gsub(/\n/,"")
          edit_rate = f.css("EditRate").text.gsub(/\s+/, "\/")
          #some more code
        end #resource
      end #sequence list
    end #segment

Now I want to get all the different "Resource" nodes under each unique sequence. I have to list all the different resources and sum up some of the collected values.

Is there a way to collect each resource, with its different values (sub-sub nodes), under the same sequence identifier? At the moment I have no idea how to approach this, so there is no code I could show you that works even in part.

each_with_index for the Resource loop is not working.

Do you have any ideas or any approach to help me with my new problem?

1 answer

Try

 resource.search('.//A | .//B') 

The leading .// anchors the XPath query to the current element instead of searching the entire document.

Example

    elem = doc.search('ImageSequence').first
    elem.search('//A')  # returns all A in the whole document
    elem.search('.//A') # returns all A inside element

Source: https://habr.com/ru/post/1257597/

