Monday, July 01, 2013

Querying JSON using JSONPath

Keywords:
JSON query filter evaluate JSONPath expression

Problem:
Dealing with code that's doing a fair bit of JSON processing (traversing object structures and collating the values) it leads to the question - is there a standard way to lookup or filter data from the JSON structure using a query or expression?

The same question has been already been asked on StackOverflow: Is there a query language for JSON? and the answer is essentially that there are many approaches. There's another list of approaches with examples in 8 ways to query json structures.

So if there's no 'standard' approach (for example, backed by an RFC as is the case with JSON), which approach should I pick?


Solution:
JSONPath proposed by Stefan Goessner allows for XPath-like expressions to be evaluated against JSON. What's possible with the expression syntax seems to cover most of the traversal and collating scenarios I was interested in and the JavaScript implementation is remarkably lightweight (implemented as just one function jsonPath(<object>, <expression>, <optional arguments object>)).

Here some extra expression examples (beyond those shown in the JSONPath articles and jsonpath on Google Code):

Expression Result
$ The root element (the same JSON back out - i.e. an identity transform).
$.* All child elements of the root.
$.store.book All book elements.
$.store.book[1] The second book element.
$..price The price of every element - result will be Numbers.
$.store.book[?(@.isbn=='0-553-21311-3')] The book element where isbn is '0-553-21311-3'.
$.store.book[?(@.category=='fiction' && @.price > 10)] The book element where category is 'fiction' and price is more than 10.

It seems like there's still some discussion around what's to be supported and there isn't exactly high activity on the project. So if using it, treat is as beta and evaluate if it does what you need. It's likely to be one of those things that once it gets past v0 there will be less (if any) changes to expect.

Some things to look out for:
  • Where's the latest version?
    Although the downloads page says the latest version is 0.8.0, this few years old (2007). You can access later versions (of the JavaScript source) either via the SVN path to the trunk or via the Google Code Browser. The javascript implementation lists the version as 0.8.5 - I'm not sure what the cycle is for getting this to 'release'.
  • No results returned as false
    I'd expect this to be simply an empty array [].
  • Results as Arrays of Arrays
    If querying a nested JSON structure of objects and the matching results match different parts of the tree, what you get back is a nested array structure indicating the relative location of the matches. I'd prefer to get this back as a flattened array. I guess the logic is that have the context of where the items were found and if you want, you can post-process (and flatten) the Arrays of Arrays - some useful approaches discussed on StackOverflow:Merge/flatten an Array of Arrays in JavaScript?.
  • the Object and it's properties in the results?
    In using the latest code mentioned above, it seems that - in some scenarios - where an object matches the expression the result includes the object as well as the properties as additional items in the result. For example:
    Instead of: You get:
    [
        {
            "category": "fiction",
            "author": "Herman Melville",
            "title": "Moby Dick",
            "isbn": "0-553-21311-3",
            "price": 8.99
        }
    ]
    
    [
        {
            "category": "fiction",
            "author": "Herman Melville",
            "title": "Moby Dick",
            "isbn": "0-553-21311-3",
            "price": 8.99
        },
        "fiction",
        "Herman Melville",
        "Moby Dick",
        "0-553-21311-3",
        8.99
    ]
    
    This wasn't the case with 0.8.0 and may change in future releases?