Processing JSON with jq

March 28, 2018

The command of this week is jq, a flexible tool to process and manipulate JSON input.

Before we start, let me say that a blog post can't do justice to jq without turning into a full manual. If you are eager to learn more, head to the resources section.

The basics

jq [options...] filter [files...]

While jq accepts a wide variety of options, they mostly concern input and output formatting. The real magic happens in the filter argument.

The key to understanding how jq works is to think of it as a processor: it receives an input, processes it according to the given filter, and generates an output.
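
As a minimal sketch of that input, filter, output flow, the snippet below runs a small hand-written JSON document (a hypothetical sample, not the NASA response) through the same filter twice. The filter is `.`, which passes the input through unchanged. The only difference between the two runs is the `-c` (`--compact-output`) option, which affects formatting and nothing else.

```shell
# Hypothetical sample input, made up for illustration.
input='{"name": "jq", "tags": ["json", "cli"]}'

# Same filter both times; only the output option differs.
pretty=$(printf '%s' "$input" | jq '.')
compact=$(printf '%s' "$input" | jq -c '.')

echo "$pretty"    # indented, one value per line
echo "$compact"   # the whole document on a single line
```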


Most of the examples below will use the /feed endpoint of the NASA NeoWs API, which retrieves a list of asteroids near Earth in a date range.

For the sake of simplicity, let's assume that in every example, the following curl command is used to retrieve and pipe JSON into jq:

$ curl "\
start_date=$(date +%Y-%m-%d)\

The endpoint returns a response that roughly looks like:

{
  "links" : { /* pagination info here */ },
  "element_count" : 70,
  "near_earth_objects" : {
    "2018-03-30" : [/* Asteroids around this date */],
    "2018-04-01" : [/* Asteroids around this date */]
  }
}

If you want to follow along and don't have a terminal at hand, there's an online playground that you can use.

Basic filter

$ jq '.'

The basic filter produces the input unchanged. You can think of the dot as the "current context" which in this case is the whole JSON input.

tip: since jq formats its output by default, you can use this to pretty print JSON in the console.
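
A small sketch of the tip, assuming a throwaway file created with `mktemp` (the file and its contents are made up for illustration). It also uses the `[files...]` form from the synopsis, so jq reads the file directly instead of standard input.

```shell
# Write a compact, hypothetical JSON document to a temporary file.
tmpfile=$(mktemp)
printf '%s\n' '{"a":1,"b":[1,2]}' > "$tmpfile"

# The identity filter re-emits the document, indented by two spaces.
formatted=$(jq '.' "$tmpfile")
echo "$formatted"

rm -f "$tmpfile"
```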

Accessing fields

$ jq '.element_count'

Fields can be accessed in the JSON data with a dot (.) followed by the field name. You can think of this as accessing a property from the current context.

In this case, we are retrieving the element_count field to find out how many asteroids are near Earth. At the time I ran it, there were 70!
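
Field access also works on nested objects by chaining names with dots. The sketch below assumes a trimmed-down, hand-written response; the `self` field and its URL are placeholders, not values taken from the real API.

```shell
# Hypothetical, trimmed-down response; the "self" URL is a placeholder.
sample='{"links": {"self": "https://example.invalid/feed"}, "element_count": 70}'

# Top-level field.
count=$(printf '%s' "$sample" | jq '.element_count')

# Nested field: chain the names with dots.
self_link=$(printf '%s' "$sample" | jq '.links.self')

# -r (--raw-output) prints strings without the JSON quotes.
self_raw=$(printf '%s' "$sample" | jq -r '.links.self')

echo "$count"
echo "$self_link"
echo "$self_raw"
```

Without `-r`, strings come out as JSON, quotes included, which is handy when the output is piped into another jq call and less so when it feeds a plain shell variable.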


Iteration

$ jq '.[]'

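The family of .[] filters is very versatile: .[] iterates over every element of an array (or every value of an object), .[n] picks a single element by index, and .[n:m] takes a slice.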
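
As a sketch of what that family covers, the snippet below tries the iterator, an index, and a slice on hand-written data (the arrays and objects are made up for illustration).

```shell
# Hypothetical inputs.
letters='["a", "b", "c", "d"]'
point='{"x": 1, "y": 2}'

# .[] emits every element of an array as a separate output...
each=$(printf '%s' "$letters" | jq '.[]')

# ...and every value of an object.
values=$(printf '%s' "$point" | jq '.[]')

# .[n] picks one element (indexes start at 0).
first=$(printf '%s' "$letters" | jq '.[0]')

# .[n:m] takes a slice: from index n up to, but not including, m.
middle=$(printf '%s' "$letters" | jq -c '.[1:3]')

echo "$each"
echo "$values"
echo "$first"
echo "$middle"
```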


Pipes

$ jq '.near_earth_objects | .[] | .[].name'

Filters can be chained with pipes, just as you would chain commands in the terminal. In this example, we are iterating over the values of near_earth_objects, which are arrays (one per date), and retrieving the name of every asteroid.

Result:

```bash
"(2015 FN120)"
"(1999 FP19)"
"(2017 FW90)"
"(2018 FH5)"
"(2018 FM4)"
"(2018 FF1)"
"348306 (2005 AY28)"
"(2008 GB110)"
"(2012 QG42)"
"(2016 CC194)"
"(2017 GO4)"
"(2018 CZ13)"
"(2018 FV3)"
"(2005 GR33)"
"(2007 GS3)"
"(2007 RX8)"
"(2011 HN5)"
"(2014 OE338)"
"(2016 GA3)"
"489486 (2007 GS3)"
"(2018 FV1)"
"(2018 EM4)"
"(2007 DB61)"
"17511 (1992 QN)"
"(2001 OT)"
"(2004 FG29)"
"(2018 ER1)"
"(2004 FG1)"
"(2008 GH110)"
"(2015 XE261)"
"474574 (2004 FG1)"
"(2017 RO17)"
"498548 (2008 GH110)"
"(2018 FW4)"
"(2011 EC7)"
"85953 (1999 FK21)"
"204232 (2004 DG2)"
"225586 (2000 WS67)"
"(2009 FF)"
"(2015 TC25)"
"(2017 GX6)"
"(2017 UD1)"
"(2018 FW2)"
"(2007 WB)"
"(2008 GE)"
"(2011 HJ)"
"(2013 HM11)"
"(2016 AG9)"
"(2015 XT168)"
"(2008 VR4)"
"(2013 OW2)"
"509520 (2007 WB)"
"(2013 GZ7)"
"(2011 UH20)"
"441304 (2008 AU26)"
"(2004 FJ29)"
"(2015 EL7)"
"(2016 FF1)"
"(2016 WB10)"
"(2018 EB)"
```
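
The same chain can be tried offline. The sketch below assumes a tiny hand-written stand-in for the response (the asteroid names A, B, and C are made up) and also shows that the piped stages can be collapsed into a single path expression.

```shell
# Hypothetical stand-in for the API response.
sample='{"near_earth_objects": {"2018-03-30": [{"name": "A"}, {"name": "B"}], "2018-04-01": [{"name": "C"}]}}'

# Step by step: pick the object, iterate over its values (arrays),
# then iterate over each array and read the name.
piped=$(printf '%s' "$sample" | jq '.near_earth_objects | .[] | .[].name')

# Equivalent shorthand: the same steps written as one path.
compact_path=$(printf '%s' "$sample" | jq '.near_earth_objects[][].name')

echo "$piped"
echo "$compact_path"
```

Both forms produce one output per asteroid. The piped version is longer, but it makes each stage easier to inspect on its own while building a filter.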

Object and Array construction

$ jq '{count: .element_count}'
$ jq '.near_earth_objects | [.[] | .[]]'

jq also provides granular control over the shape of the output by allowing you to explicitly define what it will look like.

In the first example, we are returning an object with element_count as count.

The second example is a bit more involved: we are collecting the asteroids of all dates in near_earth_objects into a single array. This can be done in a nicer way with the help of operators and functions (see below).

Result 1:

```js
{
  "count": 70
}
```

Result 2:

```js
[
  {/* Asteroid data */},
  {/* Asteroid data */},
  {/* Asteroid data */},
  {/* Asteroid data */}
]
```
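
Here is a hedged, self-contained version of both constructions, run on a made-up response with three asteroids (names and count are illustrative only). The `-c` option is used just to keep the output on one line.

```shell
# Hypothetical stand-in for the API response.
sample='{"element_count": 3, "near_earth_objects": {"2018-03-30": [{"name": "A"}, {"name": "B"}], "2018-04-01": [{"name": "C"}]}}'

# Object construction: build a new object with a renamed key.
renamed=$(printf '%s' "$sample" | jq -c '{count: .element_count}')

# Array construction: [ ... ] collects every output of the inner
# filter, so the per-date arrays end up flattened into one array.
flattened=$(printf '%s' "$sample" | jq -c '.near_earth_objects | [.[] | .[]]')

echo "$renamed"
echo "$flattened"
```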

Operators and functions

$ jq '.near_earth_objects | map(.[])'

jq also comes with a group of built-in operators and functions: +, -, length, map, add, range, and the list goes on. In this case we are revisiting our previous example to arrange all near_earth_objects of all dates in a single array using map.

Result:

```js
[
  {/* Asteroid data */},
  {/* Asteroid data */},
  {/* Asteroid data */},
  {/* Asteroid data */}
]
```
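
A few of the built-ins named above can be combined in the same way. The sketch below assumes a small hand-written response (made-up names) and uses map, length, add, and + to count asteroids per date and in total.

```shell
# Hypothetical stand-in for the API response.
sample='{"near_earth_objects": {"2018-03-30": [{"name": "A"}, {"name": "B"}], "2018-04-01": [{"name": "C"}]}}'

# map(f) applies f to every value and collects the results in an array.
flattened=$(printf '%s' "$sample" | jq -c '.near_earth_objects | map(.[])')

# length of each per-date array.
per_date=$(printf '%s' "$sample" | jq -c '.near_earth_objects | map(length)')

# add sums the elements of an array.
total=$(printf '%s' "$sample" | jq '.near_earth_objects | map(length) | add')

# Arithmetic operators work on the numbers a filter produces.
plus_one=$(printf '%s' "$sample" | jq '(.near_earth_objects | map(length) | add) + 1')

echo "$flattened"
echo "$per_date"
echo "$total"
echo "$plus_one"
```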

All together now ♫

$ jq '.near_earth_objects | map(.[] | {name, is_potentially_hazardous_asteroid})'

To wrap up, let's build a filter to freak out our family and friends. Every asteroid has a name and a field called is_potentially_hazardous_asteroid which, as the name implies, flags asteroids whose measured parameters give them the potential to make threatening close approaches to the Earth (source).

The filter above retrieves a list of all asteroid names, indicating whether each one is hazardous:

Result:

```js
[
  {
    "name": "(2018 FF1)",
    "is_potentially_hazardous_asteroid": false
  },
  {
    "name": "348306 (2005 AY28)",
    "is_potentially_hazardous_asteroid": true
  },
  {
    "name": "(2008 GB110)",
    "is_potentially_hazardous_asteroid": false
  },
  {
    "name": "(2012 QG42)",
    "is_potentially_hazardous_asteroid": true
  }
]
```
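
The `{name, is_potentially_hazardous_asteroid}` part relies on a shorthand: inside an object construction, a bare key such as `name` stands for `name: .name`. A minimal sketch on made-up data (the names and the extra `id` field are hypothetical) shows that both spellings give the same result and that fields not listed are dropped.

```shell
# Hypothetical stand-in for the API response.
sample='{"near_earth_objects": {"2018-03-30": [{"id": "1", "name": "A", "is_potentially_hazardous_asteroid": false}], "2018-04-01": [{"id": "2", "name": "B", "is_potentially_hazardous_asteroid": true}]}}'

# Shorthand keys, as used in the article's filter.
short=$(printf '%s' "$sample" | jq -c '.near_earth_objects | map(.[] | {name, is_potentially_hazardous_asteroid})')

# The same filter with every key spelled out.
long=$(printf '%s' "$sample" | jq -c '.near_earth_objects | map(.[] | {name: .name, is_potentially_hazardous_asteroid: .is_potentially_hazardous_asteroid})')

echo "$short"
echo "$long"
```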