I have some JSON entries and I would like to keep only those with the greatest score. If several entries share that score, I want to print them all.

jq

To parse JSON data from within bash, chances are you’ll want to use jq.

scores.json

{ 
  "rounds": [
    [
      {"name": "alice", "score": 3},
      {"name": "bob",   "score": 2},
      {"name": "ted",   "score": 3}
    ],
    [
      {"name": "alice", "score": 2},
      {"name": "bob",   "score": 3},
      {"name": "ted",   "score": 1}
    ]
  ]
}

sort_by()

sort_by(.field) can be used to sort by a particular field.

$ jq -c '.rounds[] | sort_by(.score)' scores.json
[{"name":"bob","score":2},{"name":"alice","score":3},{"name":"ted","score":3}]
[{"name":"ted","score":1},{"name":"alice","score":2},{"name":"bob","score":3}]

The -c option makes jq print each result on a single line instead of pretty-printing it.

With sort_by(), though, we can’t tell where the run of “max” scores begins, so let’s try max_by():

$ jq -c '.rounds[] | max_by(.score)' scores.json
{"name":"ted","score":3}
{"name":"bob","score":3}

max_by() only gives a single result even if there is a “tie”.
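As an aside, one way to keep every tied entry is to compute the top score first and then select the entries that match it. This is only a minimal sketch: the names json, ties and $top are made up for illustration, and the scores.json sample is inlined so the snippet stands alone.

```shell
# Inlined copy of the scores.json sample data (numeric scores).
json='{"rounds":[[{"name":"alice","score":3},{"name":"bob","score":2},{"name":"ted","score":3}],[{"name":"alice","score":2},{"name":"bob","score":3},{"name":"ted","score":1}]]}'

# For each round: find the highest score, bind it to $top,
# then keep every entry whose score equals $top.
ties=$(printf '%s' "$json" | jq -c '.rounds[] | (map(.score) | max) as $top | map(select(.score == $top))')

printf '%s\n' "$ties"
```

This prints one array per round, and each array holds all of the entries tied for first place in that round.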

group_by()

Another of the _by() functions is group_by(). Given a field, it collects the entries that share the same value for that field into their own arrays (“groups”). The groups are sorted by the value of the given field in ascending order.

$ jq '.rounds[0] | group_by(.score)' scores.json
[
  [
    {
      "name": "bob",
      "score": 2
    }
  ],
  [
    {
      "name": "alice",
      "score": 3
    },
    {
      "name": "ted",
      "score": 3
    }
  ]
]

There are 2 arrays inside the result: one for the entries with a score of 2 and another for the entries with a score of 3. The groups are sorted in ascending order, meaning that the last one will be the “max” group. It can be accessed using negative indexing, i.e. [-1].

$ jq -c '.rounds[] | group_by(.score)[-1]' scores.json
[{"name":"alice","score":3},{"name":"ted","score":3}]
[{"name":"bob","score":3}]

strings sort differently

It turns out that in the actual data the score values were strings, not numbers as in the sample data. Strings sort differently to numbers, and because of this the current approach is “broken”.

We’ll need to change the score values in order to generate a failing test case.

scores-strings.json

{ 
  "rounds": [
    [
      {"name": "alice", "score": "10"},
      {"name": "bob",   "score": "10"},
      {"name": "ted",   "score": "2"}
    ],
    [
      {"name": "alice", "score": "1"},
      {"name": "bob",   "score": "3"},
      {"name": "ted",   "score": "10"}
    ]
  ]
}

Do we have a failing test case?

$ jq -c '.rounds[] | group_by(.score)[-1]' scores-strings.json 
[{"name":"ted","score":"2"}]
[{"name":"bob","score":"3"}]

We do. The problem is that the string "10" sorts before the string "2" (and "3").

$ echo '["1", "2", "10"]' | jq -c sort
["1","10","2"]
$ echo '[1, 2, 10]' | jq -c sort
[1,2,10]

tonumber

We’ll need to turn our strings into numbers. This can be done in jq with tonumber:

$ echo '["1", "2", "10"]' | jq -c '[ .[] | tonumber ] | sort'
[1,2,10]
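A small variation, offered as a sketch rather than as part of the original walkthrough: tonumber leaves values that are already numbers untouched, so the same conversion also copes with input where only some scores are quoted. map(f) is shorthand for [ .[] | f ]; the variable name mixed is made up for illustration.

```shell
# Mixed input: "1" and "10" are strings, 2 is already a number.
# tonumber converts the strings and passes the number through as-is.
mixed=$(echo '["1", 2, "10"]' | jq -c 'map(tonumber) | sort')

printf '%s\n' "$mixed"
```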

Return of the sort_by()

Instead of just passing a field to sort_by(), we can pass an “expression” that includes our newly found tonumber, which gives us numerical sorting as opposed to alphabetical sorting:

$ jq -c '.rounds[] | group_by(.score) | sort_by(.[].score | tonumber)[-1]' scores-strings.json 
[{"name":"alice","score":"10"},{"name":"bob","score":"10"}]
[{"name":"ted","score":"10"}]

You may have noticed that we can replace sort_by()[-1] with max_by():

$ jq -c '.rounds[] | group_by(.score) | max_by(.[].score | tonumber)' scores-strings.json 
[{"name":"alice","score":"10"},{"name":"bob","score":"10"}]
[{"name":"ted","score":"10"}]
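Since group_by() also accepts an expression and sorts its groups by the resulting key, it may be possible to drop the second step entirely and convert inside group_by() itself. The following is a sketch under that assumption, with the scores-strings.json sample inlined so it runs on its own; the names json and top are made up for illustration.

```shell
# Inlined copy of the scores-strings.json sample data (string scores).
json='{"rounds":[[{"name":"alice","score":"10"},{"name":"bob","score":"10"},{"name":"ted","score":"2"}],[{"name":"alice","score":"1"},{"name":"bob","score":"3"},{"name":"ted","score":"10"}]]}'

# Group on the numeric value of each score. The groups come back
# in ascending numeric order, so [-1] is the highest-scoring group.
top=$(printf '%s' "$json" | jq -c '.rounds[] | group_by(.score | tonumber)[-1]')

printf '%s\n' "$top"
```

The score values in the output are still strings, because tonumber is only used to build the grouping key and does not modify the entries themselves.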

That’s it!

For more information on jq check out the manual and Cookbook.