How can I stream JSON from a file?

I will probably have a very large JSON file, and I want to stream it rather than load it all into memory. Based on the following statement from the JSON::XS documentation (emphasis mine), I believe it will not suit my needs. Is there a Perl 5 JSON module that will stream results from disk?

In some cases, there is the need for incremental parsing of JSON texts. While this module always has to keep both the JSON text and the resulting Perl data structure in memory at one time, it does allow you to parse a JSON stream incrementally. It does so by accumulating text until it has a full JSON object, which it then can decode. This process is similar to using decode_prefix to see if a full JSON object is available, but is much more efficient (and can be implemented with a minimum of method calls).

To clarify, the JSON will contain an array of objects. I want to read one object at a time from the file.

+5
5 answers

Have you looked at JSON::Streaming::Reader, which shows up first when searching for "JSON Stream" on search.cpan.org?

As an alternative, JSON::SL turns up when searching for "JSON SAX" (not the most obvious search terms, but what you are describing sounds a lot like SAX for XML).

+3

In terms of ease of use and speed, JSON::SL seems to be the winner:

    #!/usr/bin/perl

    use strict;
    use warnings;

    use JSON::SL;

    my $p = JSON::SL->new;

    #look for everything past the first level (i.e. everything in the array)
    $p->set_jsonpointer(["/^"]);

    local $/ = \5; #read only 5 bytes at a time
    while (my $buf = <DATA>) {
        $p->feed($buf); #parse what you can
        #fetch anything that completed the parse and matches the JSON Pointer
        while (my $obj = $p->fetch) {
            print "$obj->{Value}{n}: $obj->{Value}{s}\n";
        }
    }

    __DATA__
    [
        { "n": 0, "s": "zero" },
        { "n": 1, "s": "one"  },
        { "n": 2, "s": "two"  }
    ]

JSON::Streaming::Reader was okay, but it is slower and suffers from an overly verbose interface (all of these coderefs are required, even though many do nothing):

    #!/usr/bin/perl

    use strict;
    use warnings;

    use JSON::Streaming::Reader;

    my $p = JSON::Streaming::Reader->for_stream(\*DATA);

    my $obj;
    my $attr;
    $p->process_tokens(
        start_array    => sub {}, #who cares?
        end_array      => sub {}, #who cares?
        end_property   => sub {}, #who cares?
        start_object   => sub { $obj = {}; },      #clear the current object
        start_property => sub { $attr = shift; },  #get the name of the attribute
        #add the value of the attribute to the object
        add_string => sub { $obj->{$attr} = shift; },
        add_number => sub { $obj->{$attr} = shift; },
        #object has finished parsing, it can be used now
        end_object => sub { print "$obj->{n}: $obj->{s}\n"; },
    );

    __DATA__
    [
        { "n": 0, "s": "zero" },
        { "n": 1, "s": "one"  },
        { "n": 2, "s": "two"  }
    ]

It took JSON::SL 2 seconds and JSON::Streaming::Reader 3.6 seconds to parse 1,000 records (note: JSON::SL was being fed 4k at a time; I had no control over the buffer size of JSON::Streaming::Reader).

+11

It does so by accumulating text until it has a full JSON object, which it then can decode.

This is what is tripping you up. A JSON document is one object.

You need to define more clearly what you want from incremental parsing. Are you looking for one element of a large mapping? What are you trying to do with the information you read/write?


I do not know of any library that will incrementally parse JSON data by reading one element of an array at a time. However, it is fairly simple to implement yourself using a state machine (basically the file is in the format \s*\[\s*([^,]+,)*([^,]+)?\s*\]\s*, except that you need to handle commas inside strings correctly).
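To illustrate, here is a minimal sketch of that state-machine approach (my own illustration, not from any module; the `for_each_element` helper name is made up). It scans character by character, tracks string/escape state and bracket depth, and hands each top-level array element to an ordinary JSON parser:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use JSON::PP qw(decode_json); # core module

# Call $cb with each decoded top-level element of the JSON array on $fh.
sub for_each_element {
    my ($fh, $cb) = @_;
    my ($buf, $depth, $in_str, $esc, $started) = ('', 0, 0, 0, 0);
    while (defined(my $c = getc($fh))) {
        if (!$started) {                 # skip everything up to the opening '['
            $started = 1 if $c eq '[';
            next;
        }
        if ($in_str) {                   # inside a string, only \ and " matter
            $buf .= $c;
            if    ($esc)       { $esc = 0 }
            elsif ($c eq '\\') { $esc = 1 }
            elsif ($c eq '"')  { $in_str = 0 }
            next;
        }
        if ($c eq '"') { $in_str = 1; $buf .= $c; next; }
        if ($c eq '{' or $c eq '[') { $depth++; $buf .= $c; next; }
        if ($c eq '}') { $depth--; $buf .= $c; next; }
        if ($c eq ']') {
            if ($depth == 0) {           # closing ']' of the outer array
                $cb->(decode_json($buf)) if $buf =~ /\S/;
                return;
            }
            $depth--; $buf .= $c; next;
        }
        if ($c eq ',' and $depth == 0) { # element separator at the top level
            $cb->(decode_json($buf));
            $buf = '';
            next;
        }
        $buf .= $c;
    }
}

# usage: iterate over an in-memory "file"
my $json = '[ {"n":0,"s":"zero"}, {"n":1,"s":"one"}, {"n":2,"s":"two"} ]';
open my $fh, '<', \$json or die $!;
for_each_element($fh, sub { my $obj = shift; print "$obj->{n}: $obj->{s}\n" });
```

Only one element is ever held in memory at a time; commas inside strings are handled by the string/escape state.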

+2

Have you tried skipping first the opening bracket [ and then the commas , :

    $json->incr_text =~ s/^ \s* \[ //x;
    ...
    $json->incr_text =~ s/^ \s* , //x;
    ...
    $json->incr_text =~ s/^ \s* \] //x;

as in the third example here: http://search.cpan.org/dist/JSON-XS/XS.pm#EXAMPLES
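A condensed, runnable sketch of that pattern, reading in tiny chunks from an in-memory handle as a stand-in for a big file (the sample data and chunk size are made up for illustration). It uses JSON::PP, which is core and mirrors the JSON::XS incremental API, so you can swap in JSON::XS for speed:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use JSON::PP; # core; API-compatible with JSON::XS for incremental parsing

my $data = '[ {"n":0,"s":"zero"}, {"n":1,"s":"one"}, {"n":2,"s":"two"} ]';
open my $fh, '<', \$data or die $!;   # stand-in for a big file on disk

my $json = JSON::PP->new;
local $/ = \8;                        # tiny reads, to show it really streams

# accumulate text until the opening '[' is found, then strip it
until ($json->incr_text =~ s/^ \s* \[ //x) {
    defined(my $buf = <$fh>) or die "unexpected EOF";
    $json->incr_parse($buf);          # void context: only accumulates
}

my @out;
OUTER: while (1) {
    # read until one complete array element has been parsed
    my $obj;
    until (defined($obj = $json->incr_parse)) {
        defined(my $buf = <$fh>) or die "unexpected EOF";
        $json->incr_parse($buf);
    }
    push @out, $obj;
    print "$obj->{n}: $obj->{s}\n";   # ...do something with the element

    # consume the separator: ',' continues, ']' ends the array
    while (1) {
        $json->incr_text =~ s/^\s*//;
        last OUTER if $json->incr_text =~ s/^\]//;
        last       if $json->incr_text =~ s/^,//;
        die "parse error near " . $json->incr_text
            if length $json->incr_text;
        defined(my $buf = <$fh>) or last OUTER;
        $json->incr_parse($buf);
    }
}
```

Note that incr_text may only be modified when no partial parse is pending, which is why the punctuation is stripped only after a complete element has been returned.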

+2

If you have control over how the JSON is generated, then I suggest you turn pretty formatting off and print one object per line. This makes parsing simple:

    use Data::Dumper;
    use JSON::Parse 'json_to_perl';
    use JSON;
    use JSON::SL;
    my $json_sl = JSON::SL->new();
    use JSON::XS;
    my $json_xs = JSON::XS->new();
    $json_xs = $json_xs->pretty(0);
    #$json_xs = $json_xs->utf8(1);
    #$json_xs = $json_xs->ascii(0);
    #$json_xs = $json_xs->allow_unknown(1);

    my ($file) = @ARGV;

    unless( defined $file && -f $file ) {
        print STDERR "usage: $0 FILE\n";
        exit 1;
    }

    my @cmd = ( qw( CMD ARGS ), $file );
    open my $JSON, '-|', @cmd or die "Failed to exec @cmd: $!";

    # local $/ = \4096; #read 4k at a time

    while( my $line = <$JSON> ) {
        if( my $obj = json($line) ) {
            print Dumper($obj);
        }
        else {
            die "error: failed to parse line - $line";
        }
        exit if( $. == 5 );
    }

    exit 0;

    sub json {
        my ($data) = @_;
        return decode_json($data);
    }

    sub json_parse {
        my ($data) = @_;
        return json_to_perl($data);
    }

    sub json_xs {
        my ($data) = @_;
        return $json_xs->decode($data);
    }

    sub json_xs_incremental {
        my ($data) = @_;
        my $result = [];
        $json_xs->incr_parse($data); # void context, so no parsing
        push( @$result, $_ ) for( $json_xs->incr_parse );
        return $result;
    }

    sub json_sl_incremental {
        my ($data) = @_;
        my $result = [];
        $json_sl->feed($data);
        push( @$result, $_ ) for( $json_sl->fetch );
        # ? error: JSON::SL - Got error CANT_INSERT at position 552
        #   at json_to_perl.pl line 82, <$JSON> line 2.
        return $result;
    }
0

Source: https://habr.com/ru/post/1434609/
