Awesome jq and GeoJSON

cli
code
geojson
json
spatial data
broadband
Author

Olivier Leroy

Published

March 10, 2024

If you are manipulating a lot of GeoJSON features/objects and want a quick CLI tool to filter and slice them, you should give jq a try! Since there are not many tutorials that exist on using jq to manage objects in the GeoJSON family, we hope that these few tricks will help you on your learning journey.

Quick disclaimer:

We are using a UNIX shell to run commands (zsh). If you are using powershell, you will need to adapt the following examples accordingly.

In these examples, we are using a GeoJSON file of Vermont census blocks with attributes related to our work on broadband data. While it is not a deeply nested JSON, it is perfect to illustrate some common use cases.

A quick check lets us know that it is 94 MB. Not “that” big but still decent.

First, let’s see how many features it has. Here’s how we can approximate that:

wc -l data/vt-bb.geojson
# 24618 data/vt-bb.geojson

This is a decent estimate, but we are counting some rows at the top and bottom of the file that are not features (try head -n 5 and tail on it if you are curious).

We can also use jq:

jq '.features | length' data/vt-bb.geojson
# 24611

This is the correct number of blocks! How did that magic work? Let’s decompose our one-liner:

jq and small examples

It is always a good idea to start experimenting with smaller data so let’s start there:

jq '.features[0:5]'  data/vt-bb.geojson > data/not_perfect_sample.geojson

Here we asked for the [0 to 5[ (yes: [inclusive:exclusive]) features (i.e. the first 5) and jq produces a valid JSON but if you inspect it you will see that we moved from GeoJSON to a JSON array.

jq '.' data/not_perfect_sample.geojson | head -n 4
# [
#   {
#     "type": "Feature",
#     "properties": {
# to compare with :
jq '.' data/vt-bb.geojson | head -n 12/
# { 
#   "type": "FeatureCollection",
#   "name": "sql_statement",
#   "crs": {
#     "type": "name",
#     "properties": {
#       "name": "urn:ogc:def:crs:EPSG::4269"
#     }
#   },
#   "features": [
#     {
#       "type": "Feature",

We used .features so jq returned the following value (here an array with all the features) but we lost type, name, and crs.

You probably have noticed that . is used to return all the input as output but by default jq will prettify the JSON.

If we want to keep them we will need to be slighly more verbose:

jq '{type: .type , crs: .crs ,features: .features[0:10]}' data/vt-bb.geojson > data/better_sample.geojson 

Here we introduced {} allowing you to build a JSON object. We then “stick them” together and send them to a new JSON with a proper type and crs (grabbed from our original file).

Extracting geometries!

If we just want the geometries of our census blocks:

jq '{type: .type , crs: .crs ,features: [.features[] | del(.properties)]}' better_sample.geojson > sample_only_geom.geojson  

Here we are streaming a filter on .features[] into a function that will delete all properties (del(.properties)) and this will be used as an array for features.

We will need to adjust that code a bit for data/vt-bb.geojson:

jq --compact-output  '{type: .type , crs: .crs ,features: [.features[] | del(.properties)]}'  data/vt-bb.geojson > data/geom.geojson

--compact-output will convert to a single line JSON (and saved space!). Now data/geom.geojson is 72MB.

jq , please give me a data frame:

But wait what if we just want the properties?

First let’s get their keys:

At the top level if we do ..

jq `keys` data/better_sample.geojson
#[
#  "crs",
#  "features",
#  "type"
]

.. we get the keys for the first array. We need to go in the features object to get properties and pass it to the keys function. We are a bit lazy and just ask for the first feature.

jq '.features[0].properties | keys' data/better_sample.geojson

Second make them into a csv

Here we will need to buckle up a bit as our code is becoming quite a big line:

jq -r '(.features[0].properties | keys_unsorted), (.features[].properties | to_entries | map(.value))| @csv' data/better_sample.geojson > data/sample.csv
  • (.features[0].properties | keys_unsorted) here nothing new we added parentheses to enforce precedence. We are getting the header of our csv

  • (.features[].properties | to_entries | map(.value)) :

    • we are starting from all our properties (not the first one)
    • passing it to to_entries convert our object to multiple objects with “key” / “value” (see margin)
    • finally, map(.value) gets all “value” for every selected features
{
"key": "state_abbr",
"value": "VT"
},
{
"key": "geoid_st",
"value": "50"
},
{
"key": "geoid_co",
"value": "50005"
}
  • Finally @csv convert to a csv and we redirect the output later in data/sample.csv

We have just explored the surface! jq can help to filter some specific features:

  • every geometries “served” in our file?
  • the first node in every geometries)?
  • etc!

jq is a generic tool for filtering json and lot of people are following the JSON spec in GeoJSON, so we can build on top of all their monumental work!