Uploading measured data using Socket.IO

Task

The task is to upload a .csv file to the web server using Socket.IO. Socket.IO is a realtime framework for web applications, but it can also be used for communication between two NodeJS instances. It offers more convenient communication mechanisms than plain WebSockets, along with some fallback options.

What we want to implement consists of a client, which reads data from a .csv file and transmits it row by row to the web server, and a server, which receives the data, parses and transforms it into JSON, and then saves it in the database.

The Client

The client consists of only a few lines of code. First, a connection to the server is established. Then the file is read and its content is split into rows (by splitting at each newline character, ‘\n’). Each row is then sent to the server over the socket. After all rows have been sent, a ‘done’ marker is sent as a simple workaround so the server can tell that the whole file was transmitted.

var io = require('socket.io-client'),
    fs = require('fs'),
    _ = require('lodash');

var socket = io.connect('http://localhost:3002');

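fs.readFile('test_data/newFormat.csv', function(err, data) {
    if (err) {
        console.error(err);
        return;
    }
    // Convert the buffer to a string and split it into rows
    data = data.toString('utf-8');
    data = data.split('\n');
    console.log('Scanned file with ' + data.length + ' rows');
    _.each(data, function(row) {
        socket.emit('upload', row);
    });
    // Marker telling the server that the whole file was sent
    socket.emit('upload', '####done####');
});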
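
Splitting on ‘\n’ alone leaves a trailing ‘\r’ on every row if the file uses Windows line endings, and a final newline produces an empty last row. A minimal sketch of a more forgiving split, assuming those two cases matter for your data (the helper name splitRows is hypothetical and not part of the original client):

```javascript
// Hypothetical helper: split CSV text into rows, tolerating \r\n line
// endings and dropping empty rows (e.g. the one after a trailing newline).
function splitRows(text) {
    return text.split(/\r?\n/).filter(function (row) {
        return row.trim() !== '';
    });
}

var rows = splitRows('a;b\r\nc;d\n');
console.log('Scanned text with ' + rows.length + ' rows');
```

In the client above, this would replace the data.split('\n') call.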

The Server

The server consists of a simple HTTP server on which the Socket.IO framework listens for incoming connections. For each connection, a new socket is opened, over which the server receives the data row by row. Each row is parsed into JSON and stored in an array that serves as a buffer. Only once the buffer holds 1000 elements is its content saved to the database. This is because MongoDB's maximum bulk insert size is 1000 objects, and these settings resulted in the fastest execution of both receiving and storing the objects.

'use strict';
var app = require('express')();
var server = require('http').Server(app);
var io = require('socket.io')(server);
var _ = require('lodash');
var Promise = require('bluebird');
var mongoose = Promise.promisifyAll(require('mongoose')),
    Schema = mongoose.Schema,
    Measurement = Promise.promisifyAll(require('./model/Measurement'));

mongoose.connectAsync('mongodb://localhost/roadstar_csv')
.then(function() {
    // Start listening only after the database connection is established
    server.listen(3002);
});

io.on('connection', function (socket) {
    socket.on('upload', receiveData);
});

var buffer = [];
var ops = [];
var firstTime = true;

function receiveData(chunk) {
    var op;

    if (firstTime) {
        // The first chunk of an upload only starts the timers; it is not stored
        console.time('receiving rows');
        console.time('writing to db');
        firstTime = false;
    } else if (chunk === '####done####') {
        // Flush whatever is left in the buffer (skip if it is empty)
        if (buffer.length > 0) {
            op = Measurement.collection.insertAsync(buffer);
            ops.push(op);
            buffer = [];
        }

        console.timeEnd('receiving rows');
        // Wait until all pending inserts have finished
        Promise.all(ops)
        .then(function() {
            console.timeEnd('writing to db');
            firstTime = true;
        });
    } else {
        try {
            chunk = dataToJSON(chunk);
            if (buffer.length < 1000) {
                buffer.push(chunk);
            } else {
                // Buffer is full: write it to the database and start a new one
                op = Measurement.collection.insertAsync(buffer);
                ops.push(op);
                buffer = [];
                buffer.push(chunk);
            }
        } catch (err) {
            console.log(err);
        }
    }
}
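
The dataToJSON function is not shown above. A rough sketch of what it could look like follows. The delimiter and the column order ("longitude,latitude,speed,measurement,quality") are assumptions made for illustration only, since the real layout of the .csv file is not part of this post. The function throws on malformed rows, which fits the try/catch in receiveData.

```javascript
// Sketch only. Assumed row layout (not taken from the real file):
// "longitude,latitude,speed,measurement,quality"
function dataToJSON(row) {
    var cols = row.split(',');
    if (cols.length < 5) {
        throw new Error('Unexpected row format: ' + row);
    }
    var lng = parseFloat(cols[0]);
    var lat = parseFloat(cols[1]);
    if (isNaN(lng) || isNaN(lat)) {
        throw new Error('Invalid coordinates in row: ' + row);
    }
    // Build a GeoJSON Feature with a Point geometry (longitude first)
    return {
        type: 'Feature',
        geometry: { type: 'Point', coordinates: [lng, lat] },
        properties: {
            speed: parseFloat(cols[2]),
            measurement: parseFloat(cols[3]),
            quality: cols[4].trim()
        }
    };
}

console.log(JSON.stringify(dataToJSON('9.7,50.2,10,9.06,very bad')));
```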

Working with MongoDB and GeoJSON

What is MongoDB

“MongoDB is an open-source document database that provides high performance, high availability, and automatic scaling.

A record in MongoDB is a document, which is a data structure composed of field and value pairs. MongoDB documents are similar to JSON objects. The values of fields may include other documents, arrays, and arrays of documents.”

http://docs.mongodb.org/manual/core/introduction/

What is GeoJSON

GeoJSON is a format for encoding a variety of geographic data structures.

– http://geojson.org

GeoJSON allows storing geographic data as Point, LineString, Polygon and several other geometry types. Each geometry can be enriched with properties and is then called a Feature:

{ 
    "type": "Feature",
    "geometry": {"type": "Point", "coordinates": [102.0, 0.5]},
    "properties": {"someproperty": "somevalue"}
}

These features can in turn be grouped into collections, called FeatureCollection.
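
For illustration, a FeatureCollection wrapping two features could be built in JavaScript like this (the coordinates and property values are made up):

```javascript
// A FeatureCollection is an object of type 'FeatureCollection' whose
// 'features' array holds individual Feature objects.
var featureCollection = {
    type: 'FeatureCollection',
    features: [
        {
            type: 'Feature',
            geometry: { type: 'Point', coordinates: [102.0, 0.5] },
            properties: { someproperty: 'somevalue' }
        },
        {
            type: 'Feature',
            geometry: { type: 'Point', coordinates: [103.0, 1.5] },
            properties: { someproperty: 'othervalue' }
        }
    ]
};

console.log(JSON.stringify(featureCollection, null, 2));
```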

Combining the best of both worlds

Working with GeoJSON and MongoDB in NodeJS is very simple, because MongoDB's JSON-like documents allow us to store GeoJSON as is.

A short example using mongoose for MongoDB looks like the following.
(Attention: this example is shortened and omits some boilerplate code.)

var GeojsonfeatureSchema = new Schema({
    type: {type: String},
    'geometry' : {
        type: {type: String},
        'coordinates' : {
            'type' : [Number],
            'index' : '2dsphere',
            'required' : true
        }
    },
    'properties' : {
        'speed' : Number,
        'measurement' : Number,
        'quality' : String
    }
});

mongoose.model('GeojsonFeature', GeojsonfeatureSchema);
var GeojsonFeature = mongoose.model('GeojsonFeature');

new GeojsonFeature({
    'type' : 'Feature',
    'geometry' : {
        'type' : 'Point',
        'coordinates' : [50.2, 9.7]
    },
    'properties' : {
        'speed' : 10,
        'measurement' : 9.06,
        'quality' : 'very bad'
    }
}).save(function(err, doc) {
    //...
});

Create an index and query geo data

Now that the data is stored in our database, it is time to think about how to get it out again. Because we are working with geo data, it would be nice to retrieve data in a “show me all entries near a certain coordinate” way. So let’s find a way to do this.
An index over the coordinates in our document collection was created automatically because we added the property 'index' : '2dsphere' to the schema. See the MongoDB documentation on 2dsphere indexes for details.

Because MongoDB is awesome, it now lets us query our data in a very intuitive way, with queries like these:

Find all data near a coordinate

Note: GeoJSON defines the first coordinate to be the longitude!

var query = {
    'geometry.coordinates' : {
        $near: {
            $geometry: {
                type: 'Point',
                coordinates: [ lng, lat ]
            },
            $maxDistance: distance,
            $minDistance: 0
        }
    }
};
GeojsonFeature.find(query, '-__v -_id', function(err, docs) {
    //hooray we've got our documents
});
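
Because the longitude-first order is easy to get wrong, the query can be wrapped in a small helper that checks the value ranges. This is only a sketch: buildNearQuery is a hypothetical name, and the range check catches a swap only when the longitude happens to lie outside the valid latitude range.

```javascript
// Hypothetical helper: builds a $near query on 'geometry.coordinates'
// and rejects coordinates that are out of range.
function buildNearQuery(lng, lat, maxDistance) {
    if (lng < -180 || lng > 180) {
        throw new RangeError('Longitude out of range: ' + lng);
    }
    if (lat < -90 || lat > 90) {
        throw new RangeError('Latitude out of range: ' + lat);
    }
    return {
        'geometry.coordinates': {
            $near: {
                // GeoJSON order: longitude first, then latitude
                $geometry: { type: 'Point', coordinates: [lng, lat] },
                // For a GeoJSON point, the distance is given in meters
                $maxDistance: maxDistance
            }
        }
    };
}

var nearQuery = buildNearQuery(9.7, 50.2, 500);
console.log(JSON.stringify(nearQuery));
```

The resulting object can be passed to find in place of the hand-written query.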

Find all data in a given bounding box

var query = {
    'geometry.coordinates': {
        $geoWithin: {
            $box: [
                [ swlng, swlat ],
                [ nelng , nelat ]
            ]
        }
    }
};
GeojsonFeature.find(query, '-__v -_id', function(err, docs) {
    //hooray we've got our documents
});
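
The bounding box query can be built the same way from its two corners. Again just a sketch under assumptions: buildBoxQuery is a hypothetical helper, both corners are [lng, lat] pairs, and boxes that cross the antimeridian are deliberately not supported here.

```javascript
// Hypothetical helper: builds a $geoWithin/$box query from the south-west
// and north-east corners, each given as a [lng, lat] pair.
function buildBoxQuery(southWest, northEast) {
    // Assumption: the box does not cross the antimeridian, so the
    // south-west corner must lie below and to the left of the north-east one.
    if (southWest[0] > northEast[0] || southWest[1] > northEast[1]) {
        throw new RangeError('southWest must be below and left of northEast');
    }
    return {
        'geometry.coordinates': {
            $geoWithin: { $box: [southWest, northEast] }
        }
    };
}

var boxQuery = buildBoxQuery([9.0, 50.0], [10.0, 51.0]);
console.log(JSON.stringify(boxQuery));
```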

That’s all about that. No complex calculations, just some simple queries 🙂