Unable to retrieve values from map in Apache Pig

I have a simple relation, v, in Apache Pig:

 dump v;
 (151364,[ 'ref'#'R813','highway'#'secondary', 'name:ga'#'Lána Chairdif', 'name'#'Cardiff Lane'],(31015271, 31053762))
 (151368,[ 'ref'#'N1', 'oneway'#'yes','designation'#'Buses Only', 'highway'#'trunk', 'motor_vehicle'#'designated', 'name:ga'#'Cearnóg Pharnell Thoir', 'maxspeed'#'30', 'name'#'Parnell Square East'],(389365, 540403072))
 (151596,[ 'name:en'#'Liffey', 'boundary'#'administrative', 'name:ga'#'An Life','admin_level'#'8', 'name'#'Liffey', 'waterway'#'river'],(1347749, 1426049020, 1347745, 1426049019, 1347742, 900075612))
 (367947,[ 'maxspeed'#'80', 'ref'#'L2223','highway'#'tertiary'],(13259933, 2384217, 335978958))
 (367952,['created_by'#'YahooApplet 1.0', 'name'#'Charnwood Avenue', 'highway'#'residential'],(2384386, 25963471, 14949594, 2384385, 6146344, 2384254))
 (508603,[ 'ref'#'L3018','highway'#'tertiary', 'maxspeed'#'50', 'name'#'Shelerin Road'],(2854184, 2854168, 335978984, 2853307, 2384254, 335978978, 335978975, 2655735, 2655703, 392675957, 11676198, 920037194, 244531387, 2655952, 11675077))
 (727153,[ 'ref'#'N8','highway'#'trunk', 'name'#'Merchants' Quay'],(354153, 453344873))
 (727157,['highway'#'unclassified', 'oneway'#'yes', 'maxspeed'#'30', 'name'#'Kyle Street'],(354168, 354167))
 (727159,['highway'#'unclassified', 'oneway'#'yes', 'maxspeed'#'30', 'name'#'North Main Street'],(354178, 465226768, 354167, 413995429, 72219131, 685537307, 1232381779, 354164))
 (727161,[ 'maxspeed'#'30','highway'#'pedestrian', 'name'#'Maylor Street'],(1486492976, 1515360721, 1515360722, 1515345383, 1515344226, 1515344227, 1515344228, 1515344231))

Following @orangeoctopus's tip, I reloaded my data without any ' in the key names, and now I have this data:

 (151364,[ ref#'R813', name:ga#'Lána Chairdif', name#'Cardiff Lane',highway#'secondary'],(31015271, 31053762))
 (151368,[ motor_vehicle#'designated', name#'Parnell Square East', highway#'trunk', oneway#'yes',designation#'Buses Only', maxspeed#'30', name:ga#'Cearnóg Pharnell Thoir', ref#'N1'],(389365, 540403072))
 (151596,[ name:en#'Liffey', boundary#'administrative', waterway#'river', name:ga#'An Life',admin_level#'8', name#'Liffey'],(1347749, 1426049020, 1347745, 1426049019, 1347742, 900075612))
 (367947,[highway#'tertiary', maxspeed#'80', ref#'L2223'],(13259933, 2384217, 335978958))
 (367952,[ name#'Charnwood Avenue',created_by#'YahooApplet 1.0', highway#'residential'],(2384386, 25963471, 14949594, 2384385, 6146344, 2384254))
 (508603,[ maxspeed#'50', ref#'L3018', name#'Shelerin Road',highway#'tertiary'],(2854184, 2854168, 335978984, 2853307, 2384254, 335978978, 335978975, 2655735, 2655703, 392675957, 11676198, 920037194, 244531387, 2655952, 11675077))
 (727153,[highway#'trunk', name#'Merchants' Quay', ref#'N8'],(354153, 453344873))
 (727157,[ oneway#'yes', maxspeed#'30', name#'Kyle Street',highway#'unclassified'],(354168, 354167))
 (727159,[ oneway#'yes', maxspeed#'30', name#'North Main Street',highway#'unclassified' (354178, 465226768, 354167, 413995429, 72219131, 685537307, 1232381779, 354164))
 (727161,[highway#'pedestrian', name#'Maylor Street', maxspeed#'30'],(1486492976, 1515360721, 1515360722, 1515345383, 1515344226, 1515344227, 1515344228, 1515344231))

In both cases, v has the same schema:

 grunt> describe v;
 2012-01-09 22:55:34,271 [main] WARN org.apache.pig.PigServer - Encountered Warning IMPLICIT_CAST_TO_CHARARRAY 1 time(s).
 v: {id: int,tags: map[ ],nodes: (null)}

Then I try to extract only one value from the tags map:

 grunt> w = foreach v generate tags#'ref'; dump w; 

But that only gives me empty tuples, even though some rows do have a value for that key:

 ()
 ()
 ()
 ()
 ()
 ()
 ()
 ()
 ()
 ()

With the old "quoted" keys, I tried (following @orangeoctopus's solution):

 w = foreach v generate tags#'\'ref\''; 

That gave me the same “empty” data, so it didn’t work either. (I also tried other combinations of ' and " , for example "'ref'" / '"ref"' / etc., but all except '\'ref\'' were invalid Pig Latin syntax.)

What's happening? If I try to filter on a tag value (for example, filter v by tags#'highway' != '' ), I get nothing, which is consistent with not being able to extract data from the map. Am I doing something wrong?

+4
2 answers

Very tricky!

Your problem is that your literal data includes single quotes. Your key is not ref (3 characters long), it is 'ref' (5 characters long). I figured this out because a dump of a map containing strings usually does not show quotes around them.

Therefore, you need to write the key with these quotes included (you need to escape them with \ ):

 grunt> w = foreach v generate tags#'\'ref\''; 

Another option is to change the way the data is loaded so that the single quotes are stripped out instead of ending up inside the strings themselves. PigStorage does not do this for free, but you can use something like REPLACE or your own UDF for this.
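
A rough sketch of the REPLACE route, done as a two-pass clean-up. Everything named here is an assumption for illustration: the input file data.txt, the output path data_clean, the aliases raw and clean, and a tab-separated layout of id, tags, nodes. It removes every single quote in the line, so an apostrophe inside a value such as Merchants' Quay is lost as well, and it only deals with the quotes, not with any other oddities in the input:

```pig
-- Pass 1: read each line as plain text and remove every single quote.
-- (data.txt and data_clean are hypothetical paths.)
raw = LOAD 'data.txt' USING TextLoader() AS (line:chararray);
clean = FOREACH raw GENERATE REPLACE(line, '\'', '');
STORE clean INTO 'data_clean';

-- Pass 2 (run after pass 1 has finished): reload the cleaned text so the
-- map is parsed with unquoted keys. nodes is left untyped (bytearray).
v = LOAD 'data_clean' AS (id:int, tags:map[], nodes);
w = FOREACH v GENERATE tags#'ref';
DUMP w;
```

If the apostrophes inside values matter, a UDF that only strips the quotes wrapping each key and value would be the safer choice.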

+4

Are you loading the data correctly? It is strange that there is a space after [ and before ] when you dump the map.

It is also easier to drop all quotation marks from the keys and values in the input. For instance:

Input file

 151364 [ref#R813,highway#secondary] 

Pig script

 a = LOAD 'data.txt' AS (id:INT, m:MAP[]);
 DUMP a;
 b = FOREACH a GENERATE m#'ref';
 DUMP b;

Output

 (151364,[highway#secondary,ref#R813])
 (R813)
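
The filter from the question can be sketched the same way once the keys are unquoted. This assumes the same data.txt as in this answer; the alias c is made up for illustration. A lookup on a key that is missing from the map yields null, so testing for null keeps only the rows that carry the tag:

```pig
-- Assumes the unquoted, tab-separated data.txt from the example above.
a = LOAD 'data.txt' AS (id:INT, m:MAP[]);
-- m#'highway' is null when the key is absent, so this keeps tagged rows only.
c = FILTER a BY m#'highway' IS NOT NULL;
DUMP c;
```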
+3

Source: https://habr.com/ru/post/1390136/

