A common problem
Whether I'm diagnosing the root cause of an incident, determining how many users were affected by it, or extracting timing logs to assess the impact of a recent code change on performance and throughput, my tools stay the same: grep, awk, sed, tr, uniq, sort, zcat, tail, head, join and split. To glue them all together, Unix gives us pipes, and for fancier filtering we have xargs. If these fail me, there is always perl -e.
These tools are ideal for processing CSV files, tab-delimited files, log files with a predictable line format, or files of comma-separated key-value pairs. In other words, files where each line has no context.
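A typical investigation boils down to a pipeline. Just as a sketch (the access.log.gz file name and the position of the user field are made-up examples, not from any real system):

# count lines per user (field 3), most frequent first
zcat access.log.gz | awk '{ print $3 }' | sort | uniq -c | sort -rn | head

Everything composes: each tool reads lines, writes lines, and knows nothing about what came before or after.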
XML Analogs
Recently I needed to sift through gigabytes of XML to build a histogram of usage by user. This was easy enough with the tools I had, but for more complicated queries the normal approaches break down. Say I have files with elements like this:
<foo user="me">
    <baz key="zoidberg" value="squid" />
    <baz key="leela" value="cyclops" />
    <baz key="fry" value="rube" />
</foo>
And say I want to find the average number of <baz> elements per <foo>, per user. Processing line by line is no longer an option: I need to keep track of which user's <foo> I am currently inside in order to know whose average to update. Any Unix one-liner that performs this task is likely to be incomprehensible.
Fortunately, in the XML world we have wonderful technologies like XPath, XQuery, and XSLT to help us.
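For instance, the per-user average above collapses into a single XPath 2.0 / XQuery expression (just a sketch; the user name "me" is taken from the sample element above):

avg(for $f in //foo[@user = "me"] return count($f/baz))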
I used to use the wonderful Perl XML::XPath module to execute queries like the one above, but after finding a TextMate plugin that can run an XPath expression against my current window, I stopped writing one-off Perl scripts to query XML. And I just found out about XMLStarlet, which is installing as I type this and which I look forward to using in the future.
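To give a rough idea of the kind of command line I have in mind (assuming an XMLStarlet install and a hypothetical data.xml containing the <foo> elements above):

xmlstarlet sel -t -v 'count(//foo[@user="me"]/baz) div count(//foo[@user="me"])' data.xml

which should print the average number of <baz> per <foo> for one user, no throwaway script required.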
JSON solutions?
So this brings me to my question: are there any tools like this for JSON? It's only a matter of time before some investigation requires me to run similar queries against JSON files, and without tools like XPath and XSLT such a task would be much harder. If I had a bunch of JSON that looked like this:
{ "firstName": "Bender", "lastName": "Robot", "age": 200, "address": { "streetAddress": "123", "city": "New York", "state": "NY", "postalCode": "1729" }, "phoneNumber": [ { "type": "home", "number": "666 555-1234" }, { "type": "fax", "number": "666 555-4567" } ] }
and I wanted to find the average number of phone numbers each person had, I could do something like this with XPath:
fn:avg(/fn:count(phoneNumber))
Questions
- Are there any command-line tools that can "query" JSON files in this way?
- If you need to process a bunch of JSON files on a Unix command line, what tools do you use?
- Heck, is there even any work being done on a query language like this for JSON?
- If you use such tools in your daily work, what do you like/dislike about them? Are there any gotchas?
I'm noticing that more and more data serialization is being done with JSON, so processing tools like this will be crucial when analyzing large data dumps in the future. Language libraries for JSON are very strong, and it's easy enough to write scripts to do this kind of processing, but for people to really be able to play with the data, shell tools are needed.
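For instance, with nothing but a language library, the phone-number average above ends up as a one-off one-liner (a sketch assuming one JSON object per line in a hypothetical people.json file):

python -c 'import json, sys; docs = [json.loads(l) for l in sys.stdin]; print(sum(len(d.get("phoneNumber", [])) for d in docs) / float(len(docs)))' < people.json

That works, but it is neither as terse nor as composable with pipes as the grep/awk/sort workflow described above.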
Related questions
- Equivalent to Grep and Sed for XML Command Line Processing
- Is there a query language for JSON?
- JSONPath or another XPath-like utility for JSON / Javascript; or jquery json