Select multiple nodes using Nokogiri and the top ancestor of the node inside the variable

Question

For the past few days I have been looking for a way to select multiple nodes with Nokogiri based on a variable that refers to an ancestor of those nodes.

What I need: I collect the "Id" of every "Segment" node. Then I want to collect all the "Resource" nodes nested under that "Segment" node. To collect the resources, I want to use the "Id" as a variable.

    <CPL>
      <SegmL>
        <Segment>
          <Id>UUID</Id> <!-- UUID as a variable -->
          <Name>name_01</Name>
          <SequL>
            <ImageSequence>
              <Id>UUID</Id>
              <Track>UUID</Track>
              <ResourceList>
                <Resource> <!-- depending on SegmentId -->
                  <A>aaa</A>
                  <B>bbb</B>
                  <C>ccc</C>
                  <D>ddd</D>
                </Resource>
              </ResourceList>
            </ImageSequence>
            <AudioSequence>
              <Id>UUID</Id>
              <Track>UUID</Track>
              <ResourceList>
                <Resource>
                  <A>aaa</A>
                  <B>bbb</B>
                  <C>ccc</C>
                  <D>ddd</D>
                </Resource>
              </ResourceList>
            </AudioSequence>
          </SequL>
        </Segment>
        <Segment>
          <Id>UUIDa</Id>
          <Name>name_02</Name>
          <SequL>
            <ImageSequence>
              <Id>UUID</Id>
              <Track>UUID</Track>
              <ResourceList>
                <Resource>
                  <A>aaa</A>
                  <B>bbb</B>
                  <C>ccc</C>
                  <D>ddd</D>
                </Resource>
              </ResourceList>
            </ImageSequence>
            <AudioSequence>
              <Id>UUID</Id>
              <Track>UUID</Track>
              <ResourceList>
                <Resource>
                  <A>aaa</A>
                  <B>bbb</B>
                  <C>ccc</C>
                  <D>ddd</D>
                </Resource>
              </ResourceList>
            </AudioSequence>
          </SequL>
        </Segment>
      </SegmL>
    </CPL>

All resource data is collected with expressions like A = Resource.css("A").text.gsub(/\n/,"")

    #first each do
    cpls.each_with_index do |(cpl_uuid, mycpl), index|
      cpl_filename = mycpl
      cpl_file = File.open("#{resource_uri}/#{cpl_filename}")
      cpl = Nokogiri::XML( cpl_file ).remove_namespaces!
      #get UUID for UUID checks
      cpl_uuid = cpl.css("Id").first.text.gsub(/\n/,"")
      cpl_root_edit_rate = cpl.css("EditRate").first.text.gsub(/\s+/, "\/")
      #second each do
      cpl.css("Segment").each do |s| # loop segment
        cpl_segment_list_uuid = s.css("Id").first.text.gsub(/\n/,"") #uuid of segment list
        #third each do
        cpl.css("Resource").each do |f| #loop resources
          cpl_A = f.css("A").text.gsub(/\n/,"") # uuid of A
          cpl_B = f.css("B").text.gsub(/\n/,"") # uuid of B
        end #third
      end #second
    end #first

My expression gives me this information stored in an array:

    A = 48000.0
    B = 240000.0
    C = 0.0
    D = 240000.0
    Some functions to calculate an average on the resources.
    puts all_arry
    A = 5.0
    B = 5.0
    C = 5.0
    D = 5.0
    A = 5.0
    B = 5.0
    C = 5.0
    D = 5.0
    = 8 values -> only 4 values existing for the exact loop (2 average values per Segment)

Currently, every "Segment" "Id" collects all "Resource" nodes in the document.

How can I select only the resources that belong to each segment, using the segment "Id" as a variable?

I tried the following code, but the loop yields nothing, I think because there are several nodes between the "Id" of the "Segment" and the "A", "B", ... of each "Resource":

    if cpl.at("Segment/Id:contains(\"#{cpl_segment_list_uuid}\")")
      cpl.css("Resource").each do |f|
        #collecting resources here for each segment
      end
    end

None of the nodes have attributes, ids, classes, etc.

Can you help me with my problem? Thank you in advance for your support!

UPDATE 10/07/16

I also tried the following expressions for the "each do" over the resources:

    expression = "/SegmetList/Segment[Id>cpl_segment_list_uuid]"
    cpl.xpath(expression).each do |f|

The "each do" runs, but I do not get the inner nodes.

 cpl.css("Segment:contains(\"#{cpl_segment_list_uuid}\") > Resource").each do |f|

Same result as before.

The if condition has the same problem:

    if cpl.at("Segment/Id:contains(\"#{cpl_segment_list_uuid}\")").each do|f|
      #some code
    end

UPDATE 2016/18/10

I now get the right number of resources (4), but they are still not separated per segment: every segment contains the same four resources.

The reason I no longer get double the number of resources is that I now create the array inside the "Segment" loop.

This is the real code:

    #first each do
    cpls.each_with_index do |(cpl_uuid, mycpl), index|
      cpl_filename = mycpl
      cpl_file = File.open("#{resource_uri}/#{cpl_filename}")
      cpl = Nokogiri::XML( cpl_file ).remove_namespaces!
      #get UUID for UUID checks
      cpl_uuid = cpl.css("Id").first.text.gsub(/\n/,"")
      cpl_root_edit_rate = cpl.css("EditRate").first.text.gsub(/\s+/, "\/")
      #second each do
      cpl.css("Segment").each do |s| # loop segment
        cpl_segment_list_uuid = s.css("Id").first.text.gsub(/\n/,"") #uuid of segment list
        array_for_resource_data = Array.new
        #third each do
        s.css("Resource").each do |f| #loop resources
        #all resources
        s.search('//A | //B').each do |f| #selecting only resources "A" and "B"
          cpl_A = f.css("A").text.gsub(/\n/,"") # uuid of A
          cpl_B = f.css("B").text.gsub(/\n/,"") # uuid of B
        end #third
      end #second
    end #first

I hope my update provides you with more details. Thanks so much for your help and answer!

UPDATE 2016/31/10

The problem with the duplicated output per segment is fixed. Now I have another loop for each sequence under the segments:

    cpl.css("Segment").each do |u|
      segment_list_uuid = u.css("Id").first.text.gsub(/\n/,"")
      sequence_list_uuid_arr = Array.new
      u.xpath("//SequenceList[//*[starts-with(name(),'Sequence')]]").each do |s|
        sequence_list_uuid = s.css("TrackId").first.text#.gsub(/\n/,"")
        sequence_list_uuid_arr.push(cpl_sequence_list_uuid)
        #following some resource nodes
        s.css("Resource").each do |f|
          asset_uuid = f.css("TrackFileId").text.gsub(/\n/,"")
          resource_uuid = f.css("Id").text.gsub(/\n/,"")
          edit_rate = f.css("EditRate").text.gsub(/\s+/, "\/")
          #some more code
        end #resource
      end #sequence list
    end #segment

Now I want to get all the different "Resource" nodes under each unique sequence. I have to list the various resources and sum up some of the collected values.

Is there a way to collect each resource, with its different values (sub-sub nodes), under the same sequence id? At the moment I have no idea how to approach this, so I have no code to show that works even partially.

each_with_index for the Resource loop is not working.

Do you have any ideas or any approach to help me with my new problem?

variables css ruby nodes nokogiri

Daniel S. 01 Oct '16 at 23:26

1 answer

akuhn · Answer 1 · 2016-12-22T00:22:45+0000

Try

 resource.search('.//A | .//B')

A leading .// anchors the XPath query to the current element instead of searching the entire document.

Example

    elem = doc.search('ImageSequence').first
    elem.search('//A')  # returns all A in the whole document
    elem.search('.//A') # returns all A inside element
