Saturday, September 23, 2017

Exposing Data in a Captive Portal

So far, BME-280 sensor data has been exposed as a JSON API and as a web page. There are two problems with the second approach:

  • There must be a wifi router to which the ESP8266 can connect
  • The IP address the router assigns must somehow be displayed before we can visit the page

In an earlier blog post, the IP address was displayed in the Arduino IDE's serial monitor. Of course, we could connect a display and output the IP address there. Wouldn't it be nice if our browser were automatically redirected to that page, sort of like what happens when you connect to a public network in your favorite coffee shop?

That's exactly what we'll do.

We establish our own access point such that whenever a user connects to that network, their browser is automatically directed to a specific landing page. This landing page is called a "captive portal", and it is typically used for user authentication, gathering information, planting tracking cookies, and so on.

Captive portals are not just for coffee shops any more - ours will display temperature, humidity, and air pressure from the BME-280 sensor!

Here's the code:

/**
 * BME280-Captive-Portal
 * 
 * By: Mike Klepper
 * Date: 23 September 2017
 * 
 * This program exposes BME-280 data as a captive portal.
 * 
 * See patriot-geek.blogspot.com
 * for explanation.
 */

#include "ESP8266WiFi.h"
#include "DNSServer.h"
#include "ESP8266WebServer.h"
#include "Adafruit_Sensor.h"
#include "Adafruit_BME280.h"

const int STARTUP_DELAY = 500;

const char* MESSAGE_503 = "503 Service Unavailable";

const char* AP_NAME = "BME-280 Data";
const byte DNS_PORT = 53;
const IPAddress SUBNET_MASK(255, 255, 255, 0);
const IPAddress AP_IP(192, 168, 1, 1);
const byte WEB_SERVER_PORT = 80;

boolean sensorAvailable;

DNSServer dnsServer;
ESP8266WebServer webServer(WEB_SERVER_PORT);

Adafruit_BME280 bme;

void setup(void)
{
  // Start the BME280 sensor
  if(!bme.begin())
  {
    sensorAvailable = false;
  }
  else
  {
    sensorAvailable = true;
    delay(STARTUP_DELAY);
  }

  WiFi.mode(WIFI_AP);
  WiFi.softAPConfig(AP_IP, AP_IP, SUBNET_MASK);
  WiFi.softAP(AP_NAME);

  dnsServer.start(DNS_PORT, "*", AP_IP);

  webServer.onNotFound([]() 
  {
    returnHtml();
  });
  
  webServer.begin();
}

void loop(void)
{
  dnsServer.processNextRequest();
  webServer.handleClient();
  yield();
}

void returnHtml()
{
  if(sensorAvailable)
  {
    // Get values
    float tempC = bme.readTemperature();
    float humidity = bme.readHumidity();
    float pressurePascals = bme.readPressure();
  
    // Convert to British units
    float tempF = 9.0/5.0 * tempC + 32.0;
    float pressureInchesOfMercury = 0.000295299830714 * pressurePascals;
  
    // Build HTML response
    String responseHtml = "";
    responseHtml += "<!DOCTYPE html>";
    responseHtml += "<html>";
    responseHtml += "    <head>";
    responseHtml += "        <meta charset=\"UTF-8\">";
    responseHtml += "        <meta name=\"viewport\" content=\"width=device-width, initial-scale=1\">";
    responseHtml += "        <title>BME-280 Sensor Data</title>";
    responseHtml += "        <style>";
    responseHtml += "            body {font-family: sans-serif}";
    responseHtml += "            h1 {font-size: 1.0cm}";
    responseHtml += "            p {font-size: 0.50cm}";
    responseHtml += "            button {font-size: 1.0cm}";
    responseHtml += "        </style>";
    responseHtml += "        <script type=\"text/javascript\">";
    responseHtml += "        function refreshPage()";
    responseHtml += "        {";
    responseHtml += "            location.reload(true);";
    responseHtml += "        }";
    responseHtml += "        </script>";
    responseHtml += "    </head>";
    responseHtml += "    <body>";
    responseHtml += "        <h1>BME-280 Sensor Data</h1>";
    responseHtml += "        <p>Temperature: " + String(tempF) + "&deg; F</p>";
    responseHtml += "        <p>Humidity: " + String(humidity) + "%</p>";
    responseHtml += "        <p>Pressure: " + String(pressureInchesOfMercury) + " inHg</p>";
    responseHtml += "        <button type=\"button\" onclick=\"refreshPage()\">Refresh</button>";
    responseHtml += "    </body>";
    responseHtml += "</html>";
  
    // Return HTML with correct MIME type
    webServer.send(200, "text/html; charset=utf-8", responseHtml);
  }
  else
  {
    webServer.send(503, "text/plain", MESSAGE_503);
  }
}

Once it is running, you will see a new wireless network, called "BME-280 Data".

Connect to it, and you're immediately presented with the sensor data!

The only new parts of the code are in setup():

  • WiFi.mode(WIFI_AP) puts the ESP8266 into access point mode
  • WiFi.softAPConfig(...) sets the AP's local IP address, gateway IP, and subnet mask
  • WiFi.softAP(AP_NAME) specifies the AP's name
  • dnsServer.start(DNS_PORT, "*", AP_IP) starts a DNS server that resolves every hostname to the AP's own address
  • webServer.onNotFound(...) returns the same HTML regardless of the address the user tries to visit

This technique - creating a captive portal - has many applications beyond displaying sensor data. In particular, it can be used to view and edit configuration values. This will be explored in a future post.

Why bother having the user connect to the AP at all - why not just put the sensor data in the AP name? One problem with this is that the AP name will usually be truncated by the network manager of the computer or smartphone. Further, most computers and smartphones cache the names of the networks they find.

Tuesday, September 5, 2017

MongoDB: Connecting with Node.js

This tutorial demonstrates how to connect to MongoDB from Node.js with the native driver and perform basic create, retrieve, update, and delete operations.

 

Introduction

There are two popular libraries for connecting to MongoDB from Node.js:
  • Native Mongo Driver
  • Mongoose

Native Mongo Driver, also called node-mongodb-native, has official support from the creators of MongoDB, and the commands it uses are similar to those used in the MongoDB shell.

Mongoose provides an ODM (Object-Document Mapping) layer. This allows the imposition of schemas on our schemaless MongoDB! It also provides field-level validations, which prevent the insertion of unexpected values into a document's fields. Mongoose is built atop the Native Mongo Driver.
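
For illustration only, here is a minimal sketch of what a Mongoose schema with field-level validation might look like (this assumes Mongoose has been installed with npm; the CastMember model is hypothetical and is not used elsewhere in this post):

var mongoose = require("mongoose");

// Impose a schema on a collection: name is required, age must be non-negative
var castMemberSchema = new mongoose.Schema({
  name: {type: String, required: true},
  age: {type: Number, min: 0},
  specialties: [String]
});

// The model is what the application code would use to create and query documents
var CastMember = mongoose.model("CastMember", castMemberSchema);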

We will be using the first one in this tutorial. We will create several JS files that reproduce our "Queer Eye for the Straight Guy" database from a previous post.

First, start the MongoDB daemon in a Terminal window, then start the MongoDB Shell in another Terminal window. If the qeftsg database is still present from that post, drop its cast collection using the following commands:

use qeftsg
db.cast.drop();

Now we are ready to recreate the database using Node.js!

 

Creating a Node.js Project

Assuming that Node.js and NPM are installed, open a new Terminal window and create a simple project:

mkdir testproject
cd testproject
npm init

After answering several questions, the following package.json file will be created:

{
  "name": "testproject",
  "version": "1.0.0",
  "description": "Demonstrates how to connect to MongoDB with Node.js",
  "main": "index.js",
  "scripts": {
    "test": "echo \"Error: no test specified\" && exit 1"
  },
  "keywords": [
    "MongoDB",
    "Node.js"
  ],
  "author": "Mike Klepper",
  "license": "ISC"
}

Now we install the Native Mongo Driver:

npm install mongodb --save

Once this command has finished running, there are two changes to the testproject folder:

  1. There is a new folder called node_modules
  2. The package.json file has been modified: a "dependencies" section has been added:
{
  "name": "testproject",
  "version": "1.0.0",
  "description": "Demonstrates how to connect to MongoDB with Node.js",
  "main": "index.js",
  "scripts": {
    "test": "echo \"Error: no test specified\" && exit 1"
  },
  "keywords": [
    "MongoDB",
    "Node.js"
  ],
  "author": "Mike Klepper",
  "license": "ISC",
  "dependencies": {
    "mongodb": "^2.2.31"
  }
}

 

Connecting to MongoDB

First, create a file called connect.js in the same folder as package.json, and edit it to read:

/**
 * connect.js
 *
 * By: Mike Klepper
 * Date: 3 September 2017
 *
 * This program demonstrates how to connect to MongoDB
 *
 * See blog post on patriot-geek.blogspot.com
 * for instructions.
 */

var MongoClient = require("mongodb").MongoClient;

const DATABASE_NAME = "qeftsg";
const URL = "mongodb://localhost:27017/" + DATABASE_NAME;

MongoClient.connect(URL, onConnected);

function onConnected(err, db)
{
  if(!err)
  {
    console.log("Connected to MongoDB!");
    closeConnection(err, db);
  }
  else
  {
    console.log("Problems connecting to the DB: " + err);
  }
}

function closeConnection(err, db)
{
  console.log("Closing connection");
  db.close();
}

Run the program as follows:

node connect.js

The output indicates that we successfully connected:

Connected to MongoDB!
Closing connection

Great, we've connected to the database!
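
As a quick aside, the 2.x native driver can also be used without a callback, in which case connect() returns a promise. Here is a minimal sketch of the same connection done that way (same URL as above; this style is not used in the rest of the post):

var MongoClient = require("mongodb").MongoClient;

MongoClient.connect("mongodb://localhost:27017/qeftsg")
  .then(function(db)
  {
    console.log("Connected to MongoDB!");
    return db.close();   // close the connection once we are done
  })
  .catch(function(err)
  {
    console.log("Problems connecting to the DB: " + err);
  });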

 

Create

Now let's create the "cast" collection and populate it with the Fab Five. Create a new program called create.js:

/**
 * create.js
 * 
 * By: Mike Klepper
 * Date: 3 September 2017
 * 
 * This program creates five records in MongoDB
 * 
 * See blog post on patriot-geek.blogspot.com
 * for instructions.
 */

var MongoClient = require("mongodb").MongoClient;

const DATABASE_NAME = "qeftsg";
const URL = "mongodb://localhost:27017/" + DATABASE_NAME;

var ted = {
 name: "Ted Allen",
 description: "Food and Wine Connoisseur",
 specialties: ["Alcohol", "Beverages", "Food Preparation", "Food Presentation"],
 age: 52,
 contactInfo:
 {
  website: "tedallen.net",
  email: "info@tedallen.net"
 }
};

var carson = {
 name: "Carson Kressley",
 description: "Fashion Savant",
 specialties: ["Clothing", "Fashion", "Personal Styling"],
 age: 47,
 contactInfo:
 {
  website: "carsonkressley.com",
  facebook: "carsonkressley"
 }
};

var kyan = {
 name: "Kyan Douglas",
 description: "Grooming Guru",
 specialties: ["Hair", "Grooming", "Personal Hygiene", "Makeup"],
 age: 47,
 contactInfo:
 {
  instagram: "kyandouglas",
  facebook: "kyandouglasactual"
 }
};

var thom = {
 name: "Thom Filicia",
 description: "Design Doctor",
 specialties: ["Interior Design", "Home Organization"],
 age: 48,
 contactInfo:
 {
  facebook: "thomfiliciainc",
  email: "info@thomfilicia.com",
  website: "thomfilicia.com"
 }
};

var jai = {
 name: "Jai Rodriguez",
 description: "Culture Vulture",
 specialties: ["Popular Culture", "Relationships", "Social Interaction"],
 age: 38,
 contactInfo:
 {
  website: "myspace.com/jairodriguezmusic",
  facebook: "JaiRodriguezfanpage"
 }
};

MongoClient.connect(URL, onConnected);

function onConnected(err, db)
{
  if(!err)
  {
    console.log("Connected to MongoDB!");
    insertCast(db, closeConnection);
  }
  else
  {
    console.log("Problems connecting to the DB: " + err);
  }
}

function insertCast(db, callback)
{
  var cast = db.collection("cast");
  cast.insertMany([ted, carson, kyan, thom, jai], onInsertManyComplete);
  callback(null, db);
}

function onInsertManyComplete(err, result)
{
  if(!err)
  {
    console.log("Inserted " + result.result.n + " documents into the document collection");
    console.dir(result);
  }
  else
  {
    console.log("Problems inserting records: " + err);
  }
}

function closeConnection(err, db)
{
  console.log("Closing connection");
  db.close();
}

We run the program by entering node create.js and it outputs the following:

Connected to MongoDB!
Closing connection
Inserted 5 documents into the document collection
{ result: { ok: 1, n: 5 },
  ops: 
   [ { name: 'Ted Allen',
       description: 'Food and Wine Connoisseur',
       specialties: [Object],
       age: 52,
       contactInfo: [Object],
       _id: [Object] },
     { name: 'Carson Kressley',
       description: 'Fashion Savant',
       specialties: [Object],
       age: 47,
       contactInfo: [Object],
       _id: [Object] },
     { name: 'Kyan Douglas',
       description: 'Grooming Guru',
       specialties: [Object],
       age: 47,
       contactInfo: [Object],
       _id: [Object] },
     { name: 'Thom Filicia',
       description: 'Design Doctor',
       specialties: [Object],
       age: 48,
       contactInfo: [Object],
       _id: [Object] },
     { name: 'Jai Rodriguez',
       description: 'Culture Vulture',
       specialties: [Object],
       age: 38,
       contactInfo: [Object],
       _id: [Object] } ],
  insertedCount: 5,
  insertedIds: 
   [ ObjectID { _bsontype: 'ObjectID', id: [Object] },
     ObjectID { _bsontype: 'ObjectID', id: [Object] },
     ObjectID { _bsontype: 'ObjectID', id: [Object] },
     ObjectID { _bsontype: 'ObjectID', id: [Object] },
     ObjectID { _bsontype: 'ObjectID', id: [Object] } ] }

 

Retrieve

Now let's retrieve the records. Create a new file called retrieve.js and add the following:

/**
 * retrieve.js
 *
 * By: Mike Klepper
 * Date: 3 September 2017
 *
 * This program retrieves all records in the cast collection
 *
 * See blog post on patriot-geek.blogspot.com
 * for instructions.
 */

var MongoClient = require("mongodb").MongoClient;

const DATABASE_NAME = "qeftsg";
const URL = "mongodb://localhost:27017/" + DATABASE_NAME;

var database = null;
MongoClient.connect(URL, onConnected);

function onConnected(err, db)
{
  if(!err)
  {
    console.log("Connected to MongoDB!");
    database = db;
    findEverybody(db, closeConnection);
  }
  else
  {
    console.log("Problems connecting to the DB: " + err);
  }
}

function findEverybody(db, callback)
{
  var cast = db.collection("cast");
  cast.find({}, {name: 1, age: 1, _id: 0}).toArray(onResultsFound);
}

function onResultsFound(err, docs)
{
  if(!err)
  {
    console.log("Found the following docs:");
    console.dir(docs);
  }
  else
  {
    console.log("Problems retrieving documents: " + err);
  }

  closeConnection(null, database);
}

function closeConnection(err, db)
{
  console.log("Closing connection");
  database.close();
}

Run the program with node retrieve.js and the output is as expected:

Connected to MongoDB!
Found the following docs:
[ { name: 'Ted Allen', age: 52 },
  { name: 'Carson Kressley', age: 47 },
  { name: 'Kyan Douglas', age: 47 },
  { name: 'Thom Filicia', age: 48 },
  { name: 'Jai Rodriguez', age: 38 } ]
Closing connection

 

Update

To demonstrate how to update a record, we will update Ted's age. Create a file called update.js and add the following to it:

/**
 * update.js
 *
 * By: Mike Klepper
 * Date: 3 September 2017
 *
 * This program updates a record in the cast collection
 *
 * See blog post on patriot-geek.blogspot.com
 * for instructions.
 */

var MongoClient = require("mongodb").MongoClient;

const DATABASE_NAME = "qeftsg";
const URL = "mongodb://localhost:27017/" + DATABASE_NAME;

var database = null;
MongoClient.connect(URL, onConnected);

function onConnected(err, db)
{
  if(!err)
  {
    console.log("Connected to MongoDB!");
    database = db;
    updateTed(db, closeConnection);
  }
  else
  {
    console.log("Problems connecting to the DB: " + err);
  }
}

function updateTed(db, callback)
{
  var cast = db.collection("cast");
  cast.updateOne({name: "Ted Allen"}, {$set: {age: 53}}, onUpdate);
}

function onUpdate(err, result)
{
  if(!err)
  {
    console.log("Updated Ted");
    console.log(JSON.stringify(result, null, 5));
  }
  else
  {
    console.log("Problems updating the DB: " + err);
  }

  closeConnection(null, database);
}

function closeConnection(err, db)
{
  console.log("Closing connection");
  database.close();
}

Run the program with node update.js and the output is:

Connected to MongoDB!
Updated Ted
{
     "n": 1,
     "nModified": 1,
     "ok": 1
}
Closing connection

Then rerun retrieve.js and we see that Ted's age has indeed been set to 53:

Connected to MongoDB!
Found the following docs:
[ { name: 'Ted Allen', age: 53 },
  { name: 'Carson Kressley', age: 47 },
  { name: 'Kyan Douglas', age: 47 },
  { name: 'Thom Filicia', age: 48 },
  { name: 'Jai Rodriguez', age: 38 } ]
Closing connection

 

Delete

Finally, let's delete Jai from the database. Create delete.js and add the following lines:

/**
 * delete.js
 *
 * By: Mike Klepper
 * Date: 3 September 2017
 *
 * This program deletes a record from the cast collection
 *
 * See blog post on patriot-geek.blogspot.com
 * for instructions.
 */

var MongoClient = require("mongodb").MongoClient;

const DATABASE_NAME = "qeftsg";
const URL = "mongodb://localhost:27017/" + DATABASE_NAME;

var database = null;
MongoClient.connect(URL, onConnected);

function onConnected(err, db)
{
  if(!err)
  {
    console.log("Connected to MongoDB!");
    database = db;
    deleteJai(db, closeConnection);
  }
  else
  {
    console.log("Problems connecting to the DB: " + err);
  }
}

function deleteJai(db, callback)
{
  var cast = db.collection("cast");
  cast.deleteOne({name: "Jai Rodriguez"}, onDeleted);
}

function onDeleted(err, result)
{
  if(!err)
  {
    console.log("Deleted Jai");
    console.log(JSON.stringify(result, null, 5));
  }
  else
  {
    console.log("Problems deleting document: " + err);
  }

  closeConnection(err, database);
}

function closeConnection(err, db)
{
  console.log("Closing connection");
  database.close();
}

Run it, and the output should be:

Connected to MongoDB!
Deleted Jai
{
     "n": 1,
     "ok": 1
}
Closing connection

 

Conclusion

Those are the basics of using the Native Mongo Driver. There is much more to it, including:

  • Batch operations
  • Cursor management (see the sketch below)
  • Aggregation pipeline
  • Application performance management

A great source for information and tutorials is the official documentation.
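
As an example of cursor management, here is a minimal sketch of iterating over the cast collection one document at a time instead of loading everything into an array with toArray(). It assumes a db handle obtained from MongoClient.connect(), as in the programs above:

// Fetch documents from the server in small batches
var cursor = db.collection("cast").find({}).batchSize(2);

cursor.forEach(
  function(doc)
  {
    console.log(doc.name);   // handle one document at a time
  },
  function(err)
  {
    // Called after the last document has been processed (or on error)
    db.close();
  }
);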

MongoDB: Aggregation Pipeline

In this post we will set up two new demo databases: a bookstore inventory and a set of class scores.

Then we will discuss how to use the following stage operators: $project, $sort, $match, $count, $group, and $unwind, along with the accumulator operators.

Note: if you're getting no results when running these pipelines, be sure that you switched into the right database!

 

Introduction

Sometimes it is necessary to perform calculations on documents stored in MongoDB, such as for data analysis. There are several ways of doing this, one of which is to use "aggregation pipelines". These work similarly to piping with Unix commands: documents enter a pipeline that transforms them into aggregated results. Each step in an aggregation pipeline is called a "stage", and the results of each stage are passed to the next stage. The stages are described using an array of expressions (JSON objects), each of which starts with a stage operator. Depending on the stage operator, a stage may filter out incoming documents, transform them, or generate new documents based on them.

Almost all of these expressions are stateless, meaning they operate only on the current document. The exceptions are the accumulator operators.

Accumulator operators maintain running calculations as documents pass through the pipeline. For example, the $sum operator keeps a running total. The same operator behaves differently depending on the type of stage it appears in: when $sum is applied to an array field in a $project stage, for instance, it adds together all elements of that array. Examples of these two behaviors are provided below.

The stage operators we will cover are:

$project
includes or excludes fields, renames fields, or adds calculated fields. For each incoming document, one reshaped document is output.
$sort
reorders the documents according to a sort key.
$match
filters the documents, passing through only those that match the specified criteria. Each incoming document either passes through unmodified or is dropped.
$count
counts the documents that reach this stage of the pipeline.
$group
groups the input documents and then applies accumulator expressions (if any) to each group.
$unwind
creates one output document for each element of an array field.

We have already seen similar functionality at work in .find() and .sort().
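
For instance, against the qeftsg database from the previous post, the following two shell commands return essentially the same documents, first using find()/sort() and then as an aggregation pipeline (a pipeline we will build up again step by step later in this post):

db.cast.find({age: {$gt: 40}}, {name: 1, age: 1, _id: 0}).sort({name: 1});

db.cast.aggregate([
 {$match: {age: {$gt: 40}}},
 {$project: {name: 1, age: 1, _id: 0}},
 {$sort: {name: 1}}
]);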

The syntax of aggregation, if there is only one stage, is:

db.<collection>.aggregate(<A_Single_Stage>, <Options>);

or

db.<collection>.aggregate([<A_Single_Stage>], <Options>);

If there is more than one stage, the syntax is:

db.<collection>.aggregate([<stage_1>, <stage_2>, ...], <Options>);

We will not be using the Options argument in this post. It is optional and will be omitted.

 

Two New Demo Databases

For this tutorial we will use not only the "Queer Eye for the Straight Guy" database, but also two new databases. These are based closely on the sample databases found in the MongoDB documentation.

The bookstore inventory database

Consider how to represent the inventory of a bookstore in MongoDB. In a relational database, we would create at least two tables, one for Books and another for Authors. Authors would have a primary key called authorId, and Books would use that as a foreign key.

In a NoSQL database, we want to denormalize this data, so we will store the author's name in the book document, like this:

{title:"Atlas Shrugged",  author:"Rand",   quantity:3, format:"soft",  unitPrice:9.99 }
{title:"The Odyssey",   author:"Homer",  quantity:8, format:"soft",  unitPrice:8.01 }
{title:"Ringworld",   author:"Niven",  quantity:3, format:"audio",  unitPrice:23.99 }
{title:"Iliad",   author:"Homer",  quantity:7, format:"soft",  unitPrice:7.50 }
{title:"The World in Shadow",  author:"Beale",  quantity:2, format:"soft",  unitPrice:6.99 }
{title:"SJWs Always Lie",  author:"Day",   quantity:2, format:"soft",  unitPrice:6.99 }
{title:"Dangerous",   author:"Yiannopoulos",  quantity:2, format:"hard",  unitPrice:18.00 }
{title:"The Fountainhead",  author:"Rand",   quantity:2, format:"soft",  unitPrice:8.99 }
{title:"Tales of Known Space",  author:"Niven",  quantity:6, format:"soft",  unitPrice:6.99 }

Using the MongoDB Shell, we create a new database called "bookstore", and inside that we create a collection called "inventory" that contains the above records:

use bookstore
db.inventory.insertMany([
 {title:"Atlas Shrugged", author:"Rand", quantity:3, format:"soft", unitPrice:9.99 },
 {title:"The Odyssey", author:"Homer", quantity:8, format:"soft", unitPrice:8.01 },
 {title:"Ringworld", author:"Niven", quantity:3, format:"audio", unitPrice:23.99 },
 {title:"Iliad", author:"Homer", quantity:7, format:"soft", unitPrice:7.50 },
 {title:"The World in Shadow", author:"Beale", quantity:2, format:"soft", unitPrice:6.99 },
 {title:"SJWs Always Lie", author:"Day", quantity:2, format:"soft", unitPrice:6.99 },
 {title:"Dangerous", author:"Yiannopoulos", quantity:2, format:"hard", unitPrice:18.00 },
 {title:"The Fountainhead", author:"Rand", quantity:2, format:"soft", unitPrice:8.99 },
 {title:"Tales of Known Space", author:"Niven", quantity:6, format:"soft", unitPrice:6.99 }
]);

The class scores database

We will also need another database that represents student scores:

{name: "Rob", quizzes: [10, 10, 8], labs: [10, 7, 9], midterm:95, final:93 }
{name: "Sam", quizzes: [9, 10, 10], labs: [8, 8, 10], midterm:87, final:90 }
{name: "Zoe", quizzes: [5, 4, 5],   labs: [6, 5, 6],  midterm:75, final:79 }
{name: "Rae", quizzes: [7, 10, 9],  labs: [8, 8, 9],  midterm:89, final:85 }

So, switch to a new database called "class" and populate a collection called "scores" with that data:

use class
db.scores.insertMany([
 {name: "Rob", quizzes: [10, 10, 8], labs: [10, 7, 9], midterm:95, final:93 },
 {name: "Sam", quizzes: [9, 10, 10], labs: [8, 8, 10], midterm:87, final:90 },
 {name: "Zoe", quizzes: [5, 4, 5], labs: [6, 5, 6], midterm:75, final:79 },
 {name: "Rae", quizzes: [7, 10, 9], labs: [8, 8, 9], midterm:89, final:85 }
]);

 

Using the $project Operator to Create Calculated Fields

The $project stage operator allows us to either include or exclude fields, to rename fields, or to create calculated fields. It "reshapes" the document. For each incoming document there is one outgoing document.

To demonstrate this, let's get the total value (quantity * unitPrice) for each book in the bookstore database.

First, switch into the bookstore database:

use bookstore

Then run the following:

db.inventory.aggregate(
 {
  $project: 
  {
   title: 1, 
   author: 1,
   _id: 0,
   totalValue: {$multiply: ["$unitPrice", "$quantity"]}
  }
 }
);

This pipeline has exactly one stage, a $project stage, which includes the title and author fields, excludes the _id field, and adds a calculated field. It is not necessary to indent the query like that; the indentation is only for readability.

The results are as expected:

{ "author" : "Rand", "title" : "Atlas Shrugged", "totalValue" : 29.97 }
{ "author" : "Homer", "title" : "The Odyssey", "totalValue" : 64.08 }
{ "author" : "Niven", "title" : "Ringworld", "totalValue" : 71.97 }
{ "author" : "Homer", "title" : "Iliad", "totalValue" : 52.5 }
{ "author" : "Beale", "title" : "The World in Shadow", "totalValue" : 13.98 }
{ "author" : "Day", "title" : "SJWs Always Lie", "totalValue" : 13.98 }
{ "author" : "Yiannopoulos", "title" : "Dangerous", "totalValue" : 36 }
{ "author" : "Rand", "title" : "The Fountainhead", "totalValue" : 17.98 }
{ "author" : "Niven", "title" : "Tales of Known Space", "totalValue" : 41.94 }

Now let's create another field, a boolean field called "inStock". This isn't too difficult: all we have to do is check that quantity > 0:

db.inventory.aggregate(
 {
  $project: 
  {
   title: 1, 
   author: 1,
   _id: 0,
   totalValue: {$multiply: ["$unitPrice", "$quantity"]},
   inStock: {$gt: ["$quantity", 0]}
  }
 }
);

The results are:

{ "title" : "Atlas Shrugged", "author" : "Rand", "totalValue" : 29.97, "inStock" : true }
{ "title" : "The Odyssey", "author" : "Homer", "totalValue" : 64.08, "inStock" : true }
{ "title" : "Ringworld", "author" : "Niven", "totalValue" : 71.97, "inStock" : true }
{ "title" : "Iliad", "author" : "Homer", "totalValue" : 52.5, "inStock" : true }
{ "title" : "The World in Shadow", "author" : "Beale", "totalValue" : 13.98, "inStock" : true }
{ "title" : "SJWs Always Lie", "author" : "Day", "totalValue" : 13.98, "inStock" : true }
{ "title" : "Dangerous", "author" : "Yiannopoulos", "totalValue" : 36, "inStock" : true }
{ "title" : "The Fountainhead", "author" : "Rand", "totalValue" : 17.98, "inStock" : true }
{ "title" : "Tales of Known Space", "author" : "Niven", "totalValue" : 41.94, "inStock" : true }

Now suppose that "SJWs Always Lie" sells out...

db.inventory.update({title : "SJWs Always Lie"}, {$set: {quantity: 0}});

We rerun the query and we see that "SJWs Always Lie" is indeed out of stock:

{ "title" : "Atlas Shrugged", "author" : "Rand", "totalValue" : 29.97, "inStock" : true }
{ "title" : "The Odyssey", "author" : "Homer", "totalValue" : 64.08, "inStock" : true }
{ "title" : "Ringworld", "author" : "Niven", "totalValue" : 71.97, "inStock" : true }
{ "title" : "Iliad", "author" : "Homer", "totalValue" : 52.5, "inStock" : true }
{ "title" : "The World in Shadow", "author" : "Beale", "totalValue" : 13.98, "inStock" : true }
{ "title" : "SJWs Always Lie", "author" : "Day", "totalValue" : 0, "inStock" : false }
{ "title" : "Dangerous", "author" : "Yiannopoulos", "totalValue" : 36, "inStock" : true }
{ "title" : "The Fountainhead", "author" : "Rand", "totalValue" : 17.98, "inStock" : true }
{ "title" : "Tales of Known Space", "author" : "Niven", "totalValue" : 41.94, "inStock" : true }

 

The $project Operator and Accumulators

Now let us see how the accumulators work when they are part of a $project stage. For this, we will switch to the class database:

use class

We can use the accumulation operators to get some statistics:

db.scores.aggregate([
 {
  $project: 
  {
   name:1, 
   _id: 0,
   quizTotal: {$sum: "$quizzes"}, 
   quizHigh: {$max: "$quizzes"},
   quizLow: {$min: "$quizzes"},
   quizAverage: {$avg: "$quizzes"},
   examTotal: {$sum: ["$midterm", "$final"]}
  }
 }
]);

This shows how $sum, $max, $min, and $avg work on arrays, as well as how to use $sum to add two fields. The results are:

{ "name" : "Rob", "quizTotal" : 28, "quizHigh" : 10, "quizLow" : 8, "quizAverage" : 9.333333333333334, "examTotal" : 95 }
{ "name" : "Sam", "quizTotal" : 29, "quizHigh" : 10, "quizLow" : 9, "quizAverage" : 9.666666666666666, "examTotal" : 87 }
{ "name" : "Zoe", "quizTotal" : 14, "quizHigh" : 5, "quizLow" : 4, "quizAverage" : 4.666666666666667, "examTotal" : 75 }
{ "name" : "Rae", "quizTotal" : 26, "quizHigh" : 10, "quizLow" : 7, "quizAverage" : 8.666666666666666, "examTotal" : 89 }

 

The $sort Operator

The $sort operator works as expected. Let's try it with the previous query to see our first example of a 2-stage pipeline:

db.scores.aggregate([
 {
  $project: 
  {
   name:1, 
   _id: 0,
   quizTotal: {$sum: "$quizzes"}, 
   quizHigh: {$max: "$quizzes"},
   quizLow: {$min: "$quizzes"},
   quizAverage: {$avg: "$quizzes"},
   examTotal: {$sum: ["$midterm", "$final"]}
  }
 }, 
 {
  $sort: 
  {
   name: 1
  }
 }
]);

This pipeline has two stages, the $project stage and the $sort stage. The results are as follows:

{ "name" : "Rae", "quizTotal" : 26, "quizHigh" : 10, "quizLow" : 7, "quizAverage" : 8.666666666666666, "examTotal" : 89 }
{ "name" : "Rob", "quizTotal" : 28, "quizHigh" : 10, "quizLow" : 8, "quizAverage" : 9.333333333333334, "examTotal" : 95 }
{ "name" : "Sam", "quizTotal" : 29, "quizHigh" : 10, "quizLow" : 9, "quizAverage" : 9.666666666666666, "examTotal" : 87 }
{ "name" : "Zoe", "quizTotal" : 14, "quizHigh" : 5, "quizLow" : 4, "quizAverage" : 4.666666666666667, "examTotal" : 75 }

We will see how to use $project to rename fields in a later example.

 

The $match Operator

The $match operator does what it says: if a document matches the given condition(s), it is passed along; otherwise it is not. So the $match operator outputs either zero or one document for each incoming document.

For the first couple of examples, we switch to the "Queer Eye for the Straight Guy" database:

use qeftsg

Say we want to get the names and ages of all cast members who are over 40. We do it like this:

db.cast.aggregate([
 {$match: {age: {$gt: 40}}}, 
 {$project: {name:1, age:1, _id:0}}
]);

This two-stage pipeline consists of a $match stage and a $project stage. The results are:

{ "name" : "Ted Allen", "age" : 52 }
{ "name" : "Carson Kressley", "age" : 47 }
{ "name" : "Kyan Douglas", "age" : 47 }
{ "name" : "Thom Filicia", "age" : 48 }

Let's make that into a 3-stage pipeline by adding a $sort stage:

db.cast.aggregate([
 {$match: {age: {$gt: 40}}}, 
 {$project: {name:1, age:1, _id:0}}, 
 {$sort: {name: 1}}
]);

This returns:

{ "name" : "Carson Kressley", "age" : 47 }
{ "name" : "Kyan Douglas", "age" : 47 }
{ "name" : "Ted Allen", "age" : 52 }
{ "name" : "Thom Filicia", "age" : 48 }

 

The $count Operator

The $count operator does exactly what it says: it returns the number of incoming docs in the current stage.

For example, to get the number of "Queer Eye" cast members that are over 40, add a fourth stage to the previous pipeline:

db.cast.aggregate([
 {$match: {age: {$gt: 40}}}, 
 {$project: {name:1, age:1, _id:0}}, 
 {$sort: {name: 1}}, 
 {$count: "docCount"}
]);
This returns:
{ "docCount" : 4 }

Of course, this pipeline is more complicated than necessary: the $project and $sort stages do not alter the number of documents coming from the $match stage! So the pipeline can be simplified into a two-stage pipeline:

db.cast.aggregate([
 {$match: {age: {$gt: 40}}}, 
 {$count: "docCount"}
]);

And it returns the same thing:

{ "docCount" : 4 }

 

The $unwind Operator

The $unwind operator creates one output document for each element in an array field in the incoming document. For example, let's make one document for each of Kyan Douglas' specialties, and only show his name and that specialty:

db.cast.aggregate([
 {$match: {name: "Kyan Douglas"}}, 
 {$unwind: "$specialties"}, 
 {$project: {name:1, specialties:1, _id:0}}
]);

The result is:

{ "name" : "Kyan Douglas", "specialties" : "Hair" }
{ "name" : "Kyan Douglas", "specialties" : "Grooming" }
{ "name" : "Kyan Douglas", "specialties" : "Personal Hygiene" }
{ "name" : "Kyan Douglas", "specialties" : "Makeup" }

 

The $group Operator

The $group operator takes the incoming documents and groups them by creating a new _id field; the value we provide for _id determines the groups. It can also apply accumulator operators to each group. The output is one document per group.

The next examples will use the bookstore database, so we switch to it:

use bookstore

Here is how to get a list of distinct authors, sorted alphabetically:

db.inventory.aggregate([
 {$group: {_id: "$author"}}, 
 {$sort: {_id: 1}}
]);

This returns:

{ "_id" : "Beale" }
{ "_id" : "Day" }
{ "_id" : "Homer" }
{ "_id" : "Niven" }
{ "_id" : "Rand" }
{ "_id" : "Yiannopoulos" }

The grouping was done in the first stage, and the group names are stored in a new _id field.

Now, let's use $project to produce a new field called "author":

db.inventory.aggregate([
 {$group: {_id: "$author"}}, 
 {$project: {author: "$_id"}}
]);
This returns
{ "_id" : "Yiannopoulos", "author" : "Yiannopoulos" }
{ "_id" : "Rand", "author" : "Rand" }
{ "_id" : "Day", "author" : "Day" }
{ "_id" : "Beale", "author" : "Beale" }
{ "_id" : "Niven", "author" : "Niven" }
{ "_id" : "Homer", "author" : "Homer" }

Then, we drop the _id field:

db.inventory.aggregate([
 {$group: {_id: "$author"}}, 
 {$project: {author: "$_id", _id: 0}}
]);

and we get:

{ "author" : "Yiannopoulos" }
{ "author" : "Rand" }
{ "author" : "Day" }
{ "author" : "Beale" }
{ "author" : "Niven" }
{ "author" : "Homer" }

Thus we have essentially renamed the _id field.
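
The same trick works for any field, not just _id. For example, here is a quick sketch that renames title to bookTitle while passing author through unchanged (the bookTitle name is ours, purely for illustration):

db.inventory.aggregate([
 {$project: {bookTitle: "$title", author: 1, _id: 0}}
]);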

Finally, we sort by author:

db.inventory.aggregate([
 {$group: {_id: "$author"}}, 
 {$project: {author: "$_id", _id: 0}}, 
 {$sort:{author: 1}}
]);
Results:
{ "author" : "Beale" }
{ "author" : "Day" }
{ "author" : "Homer" }
{ "author" : "Niven" }
{ "author" : "Rand" }
{ "author" : "Yiannopoulos" }

 

The $group Operator with Accumulators

Within a $group stage, we can apply accumulators, and they work differently there than in a $project stage. One accumulator is $push, which creates an array field in the output document and pushes a specified value from each incoming document onto that array. Let us get the titles of the books written by each author:

db.inventory.aggregate([
 {$group: {_id: "$author", books: {$push:"$title"}}}, 
 {$sort: {_id: 1}}
]);
This returns:
{ "_id" : "Beale", "books" : [ "The World in Shadow" ] }
{ "_id" : "Day", "books" : [ "SJWs Always Lie" ] }
{ "_id" : "Homer", "books" : [ "The Odyssey", "Iliad" ] }
{ "_id" : "Niven", "books" : [ "Ringworld", "Tales of Known Space" ] }
{ "_id" : "Rand", "books" : [ "Atlas Shrugged", "The Fountainhead" ] }
{ "_id" : "Yiannopoulos", "books" : [ "Dangerous" ] }

As mentioned earlier, the $sum operator behaves differently in a $group stage than in a $project stage: in a $group stage it maintains a running total. We will use $sum to count the titles by each author by adding 1 to a field named numTitles for each document in the group:

db.inventory.aggregate([
 {$group: {_id:"$author", numTitles: {$sum: 1}}}, 
 {$project: {author: "$_id", _id: 0, numTitles: 1}}, 
 {$sort: {author: 1}}
]);
Result:
{ "numTitles" : 1, "author" : "Beale" }
{ "numTitles" : 1, "author" : "Day" }
{ "numTitles" : 2, "author" : "Homer" }
{ "numTitles" : 2, "author" : "Niven" }
{ "numTitles" : 2, "author" : "Rand" }
{ "numTitles" : 1, "author" : "Yiannopoulos" }

Now let's put it all together...

We want to find the total price of all books in stock, plus the titles, grouped by author. We approach this as follows:

  1. Find the totalValue for each title (unitPrice * quantity)
  2. Group the docs by author, keeping a running total of totalValue and building a books array by pushing each title
  3. Sort by author ascending
  4. Return only the author, books, and runningTotal fields

db.inventory.aggregate([
 {$project: {title: 1, author: 1, _id: 0, totalValue: {$multiply: ["$unitPrice", "$quantity"]}}},
 {$group: {_id:"$author", runningTotal: {$sum: "$totalValue"}, books: {$push:"$title"}}},
 {$project: {author: "$_id", _id: 0, runningTotal: 1, books: 1}},
 {$sort: {author: 1}}
]);
Results:
{ "runningTotal" : 13.98, "books" : [ "The World in Shadow" ], "author" : "Beale" }
{ "runningTotal" : 0, "books" : [ "SJWs Always Lie" ], "author" : "Day" }
{ "runningTotal" : 116.58, "books" : [ "The Odyssey", "Iliad" ], "author" : "Homer" }
{ "runningTotal" : 113.91, "books" : [ "Ringworld", "Tales of Known Space" ], "author" : "Niven" }
{ "runningTotal" : 47.95, "books" : [ "Atlas Shrugged", "The Fountainhead" ], "author" : "Rand" }
{ "runningTotal" : 36, "books" : [ "Dangerous" ], "author" : "Yiannopoulos" }

Note that we don't need that "totalValue" field in the final result. This allows us to drop the first projection and fold the multiplication directly into the $group stage:

db.inventory.aggregate([
 {$group: {_id: "$author", runningTotal: {$sum: {$multiply: ["$unitPrice", "$quantity"]}}, books: {$push: "$title"}}},
 {$project: {author: "$_id", _id:0, runningTotal: 1, books: 1}},
 {$sort: {author: 1}}
]);
The result is the same.

 

Conclusion

So that is MongoDB's data aggregation pipeline. As you can imagine, pipelines can be very memory-intensive when applied to large datasets. There are various ways to mitigate this, such as performing $match operations as early as possible in the pipeline. Ultimately, the allowDiskUse option tells MongoDB to write intermediate results of the pipeline to temporary files, avoiding memory limitations.
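
To illustrate, here is a sketch of how a bookstore pipeline might be run with allowDiskUse enabled; the option goes in the second argument to aggregate(), which we have otherwise omitted in this post:

db.inventory.aggregate(
 [
  {$match: {quantity: {$gt: 0}}},   // filter early, so fewer documents flow to later stages
  {$group: {_id: "$author", runningTotal: {$sum: {$multiply: ["$unitPrice", "$quantity"]}}}},
  {$sort: {runningTotal: -1}}
 ],
 {allowDiskUse: true}               // let MongoDB spill intermediate results to temporary files
);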