Aggregating geodata using geohashes

What we want to do

The situation is the following. We have a set of points with GPS coordinates and want to visualize them in the browser. With large datasets (e.g. millions of points), displaying every single point is impractical. Instead, we want to aggregate them so that multiple points lying close to each other are displayed as one single point.

The following example uses MongoDB and a geocoding system called geohash.

What is Geohash

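Geohash is a representation of latitude/longitude coordinates as a short string of letters and digits. Each geohash identifies a rectangular cell on the map, and every additional character narrows that cell down further.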

An example: The geohash of 57.64911,10.40744 would be u4pruydqqvj.

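The geohash has a useful characteristic: points that are close to each other often have geohashes that start with the same characters, and the more leading characters two geohashes share, the closer the points tend to be. This does not always hold, because two neighbouring points on opposite sides of a cell boundary can have quite different geohashes.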
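
To make the shared-prefix property concrete, here is a minimal sketch of a geohash encoder in plain JavaScript. The function name encodeGeohash is our own and not part of any library; in practice you would probably use an existing geohash package instead.

```javascript
// Geohash alphabet: digits and lowercase letters without a, i, l and o.
const BASE32 = '0123456789bcdefghjkmnpqrstuvwxyz';

// Hypothetical helper (not a library function): encodes a latitude/longitude
// pair as a geohash with the given number of characters.
function encodeGeohash(lat, lon, precision) {
  let latMin = -90, latMax = 90;
  let lonMin = -180, lonMax = 180;
  let hash = '';
  let bits = 0;       // bits collected so far for the current character
  let bitCount = 0;
  let useLon = true;  // bits alternate between longitude and latitude

  while (hash.length < precision) {
    if (useLon) {
      // Halve the longitude interval and record which half contains the point.
      const mid = (lonMin + lonMax) / 2;
      if (lon >= mid) { bits = bits * 2 + 1; lonMin = mid; }
      else { bits = bits * 2; lonMax = mid; }
    } else {
      // Same for the latitude interval.
      const mid = (latMin + latMax) / 2;
      if (lat >= mid) { bits = bits * 2 + 1; latMin = mid; }
      else { bits = bits * 2; latMax = mid; }
    }
    useLon = !useLon;
    bitCount += 1;
    if (bitCount === 5) {
      // Every 5 bits become one base-32 character.
      hash += BASE32[bits];
      bits = 0;
      bitCount = 0;
    }
  }
  return hash;
}

const exact = encodeGeohash(57.64911, 10.40744, 9);
const nearby = encodeGeohash(57.65, 10.41, 9);
console.log(exact);                        // u4pruydqq
console.log(nearby.startsWith('u4pru'));   // true: the nearby point shares the first 5 characters
```

Because every character only refines the cell described by the characters before it, cutting a geohash down to its first n characters yields the larger cell that contains the point.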

How we use it

Because nearby points often share the leading characters of their geohashes, we can write a database query that aggregates multiple points into one.
To aggregate our data points we will be “adding” a field that holds the geohash shortened to a certain number of characters. The number of characters depends on how large an area we want to aggregate into one point.
For example, a precision of 5 characters represents an area of about 4.9 km x 4.9 km, whereas 9 characters represent about 4.8 m x 4.8 m (according to elastic.co). These figures apply near the equator; cells become narrower at higher latitudes.
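
As a rough check of these numbers, the cell size can be estimated from the number of bits a geohash encodes. The sketch below assumes 5 bits per character, split alternately between longitude and latitude, and about 40,075 km for the Earth's circumference. The helper name geohashCellSizeKm is made up for this illustration, and the result is only an approximation valid at the equator.

```javascript
// Approximate length of one degree at the equator in kilometres
// (assumption: circumference of about 40,075 km divided by 360 degrees).
const KM_PER_DEGREE = 40075 / 360;

// Hypothetical helper: approximate size of a geohash cell at the equator.
function geohashCellSizeKm(precision) {
  const totalBits = precision * 5;            // each character encodes 5 bits
  const lonBits = Math.ceil(totalBits / 2);   // longitude gets the first bit, so the extra one if odd
  const latBits = Math.floor(totalBits / 2);
  const widthDeg = 360 / Math.pow(2, lonBits);
  const heightDeg = 180 / Math.pow(2, latBits);
  return {
    widthKm: widthDeg * KM_PER_DEGREE,
    heightKm: heightDeg * KM_PER_DEGREE,
  };
}

const five = geohashCellSizeKm(5);
const nine = geohashCellSizeKm(9);
console.log(five.widthKm.toFixed(2), five.heightKm.toFixed(2));  // roughly 4.89 km each
console.log((nine.widthKm * 1000).toFixed(2));                    // roughly 4.78 m
```

For even precisions the cells are not square, since longitude and latitude then receive the same number of bits while longitude covers twice the range.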

  • Let’s say our stored data has the following structure and is stored in a collection called ‘point’:
      {
          '_id' : 'somemongodbobjectID',
          'gps' : [57.64911,10.40744],
          'geohash' : 'u4pruydqqvj'
      }
    
  • To aggregate all points we use MongoDB’s aggregation pipeline.
    1. The first step is to add the shortened geohash as a field, which we will call shortGeohash. This is done using MongoDB’s $project pipeline stage and the $substr operator, which creates a substring of an existing field. $substr takes 3 arguments: the string to shorten, the zero-based start index, and the number of characters to take. Because $project only passes on _id and the fields it lists, the resulting documents contain just _id and shortGeohash. Note that since MongoDB 3.4, $substr is deprecated and is an alias for $substrBytes.
       db.point.aggregate([
           { $project : {shortGeohash: {$substr: ["$geohash", 0, 9]}}},
       ])
      
    2. The second step is to group the points that have the same shortened geohash, meaning they are located in the same area of about 4.8 m x 4.8 m (in this example). To know which documents were grouped together, we also $push each document into an array inside the $group stage, using the $$ROOT system variable.
       db.point.aggregate([
           { $project : {shortGeohash: {$substr: ["$geohash", 0, 9]}}},
           { $group: {_id: "$shortGeohash", count: {$sum:1}, originalDoc:{$push: "$$ROOT" }}}
       ])
      
    3. This will result in an array of documents grouped by their shortened geohash. We now have the points aggregated by location, and each group carries an array with the IDs of the original documents, which allows us to do further work on the aggregated data.
       [{ _id: "u4pruydqq",
           count: 2,
           originalDoc: [{
               "_id": "5579b75416b8101ca37d9ab0",
               "shortGeohash": "u4pruydqq"
           }, {
               "_id": "5579b75416b8101ca37d9ab1",
               "shortGeohash": "u4pruydqq"
           }] 
       }, { _id: "u4pruydqr",
           count: 5,
           originalDoc:
           [{
               "_id": "5579b75416b8101ca37d9ab2",
               "shortGeohash": "u4pruydqr"
           }, {
               "_id": "5579b75416b8101ca37d9ab3",
               "shortGeohash": "u4pruydqr"
           }, {
               "_id": "5579b75416b8101ca37d9ab4",
               "shortGeohash": "u4pruydqr"
           }, {
               "_id": "5579b75416b8101ca37d9ab5",
               "shortGeohash": "u4pruydqr"
           }, {
               "_id": "5579b75416b8101ca37d9ab",
               "shortGeohash": "u4pruydqr" 
           }]
       }]
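
The grouping performed by the two pipeline stages above can be mimicked in plain JavaScript, which may help to check the expected result without a running database. This is only a sketch of the same logic, not a replacement for the aggregation: the helper name groupByShortGeohash and the sample documents are made up for illustration.

```javascript
// Hypothetical helper mirroring the $project + $group stages in memory.
function groupByShortGeohash(points, precision) {
  const groups = new Map();
  for (const point of points) {
    // Equivalent of {$substr: ["$geohash", 0, precision]}
    const shortGeohash = point.geohash.substring(0, precision);
    if (!groups.has(shortGeohash)) {
      groups.set(shortGeohash, { _id: shortGeohash, count: 0, originalDoc: [] });
    }
    const group = groups.get(shortGeohash);
    group.count += 1;                                          // {$sum: 1}
    group.originalDoc.push({ _id: point._id, shortGeohash });  // {$push: "$$ROOT"}
  }
  return Array.from(groups.values());
}

// Made-up sample data; only the fields used by the pipeline are included.
const samplePoints = [
  { _id: 'a', geohash: 'u4pruydqqvj' },
  { _id: 'b', geohash: 'u4pruydqqvm' },
  { _id: 'c', geohash: 'u4pruydqrzz' },
];

const grouped = groupByShortGeohash(samplePoints, 9);
console.log(JSON.stringify(grouped, null, 2));
// Two groups: "u4pruydqq" with count 2 and "u4pruydqr" with count 1
```

The Map keeps groups in the order in which they were first seen. MongoDB’s $group stage makes no such promise, so add a $sort stage if the order matters.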
      