By Nick Gambino
The code for this project is located here. It’s a public repository, so feel free to open a PR to contribute to the codebase if you feel like anything can be improved!
Over the past several months, our team has been working on a series of articles that describe our overall approach to tackling IoT-related software projects, and the technologies we typically leverage. These articles are conceptual: technical enough to be useful for engineering teams, and detailed enough to serve as a reference point for our internal team. This series inspired us to create a step-by-step developer guide that demonstrates exactly how other engineers can get started in the IoT space.
You can refer to some of our past articles for more information about some of these tools, but the tech stack we will be using to build out this simple IoT system is:
One issue we found while learning these technologies is that while there is great documentation on each of them individually, there aren’t many resources on how to string them together end-to-end. So, in true Sudokrew fashion, we went ahead and built our own resource!
In this tutorial we will be building an end-to-end IoT system running on a local network. This IoT system will:
Docker installed on your local machine. A simple way to get this running is by downloading Docker Desktop here.
An IoT system is essentially data that gets collected from a series of processes, typically emitted from “things”, or devices. In order to demonstrate how this works, we will be building several services:
Initially, we were planning on programming a bunch of Raspberry Pi devices to serve as our IoT fleet. However, as we got started, we felt like this was a bit out of scope for the article, and was distracting from the tools we featured in our previous articles. So, instead of programming the devices themselves, we will be “virtualizing” them in a Node.js runtime environment. The code used within this article can be downloaded and run on separate Raspberry Pis and will work the same way.
We have been working with Influx time-series database services for several years now. As an open-source time-series database service, it is ideally suited to powering the majority of our backend IoT stack. Better yet, Influx offers a free managed service that we will be using during this demonstration (no credit card required!).
Influx provides both a fully managed service, as well as a more tailored database solution that a team can easily maintain on their own. As we mentioned in this article, we like to use the managed service in order to reduce overhead related to database maintenance and servicing.
IoT systems are ideally suited to take advantage of messaging queues. We will be using the open-source RabbitMQ for this demonstration, although most major cloud providers offer their own version (AWS SQS, Azure Service Bus) that serves the same purpose. We go into more detail about the benefits of messaging queues here. Essentially, they are great for consuming massive amounts of streaming data and prioritizing that data while reducing race conditions from the emitted device data.
Using containers for deployment is great when you need to scale services in the cloud. Docker is something that we use heavily in all of our development workflows, and we highly recommend it. While we won’t be doing any stress testing in this article, we will be using some basic Docker orchestration to illustrate how many microservices can be orchestrated within a single "docker-compose.yml" file.
To get a better understanding of what we will be building, here is a system diagram of our IoT project:
We will first be building each microservice independently of one another, before orchestrating the stack with Docker.
As we referenced in previous articles, we like to use InfluxDB to handle the majority of our backend IoT services. There are other options out there, but time-series databases in general are perfectly suited to being used in IoT projects. You can read more about our deep dive into time-series databases here.
For this demo, we will be using a fully managed instance of InfluxDB. This is in part to keep things simple and to highlight the system architecture, rather than focusing on database configuration files. InfluxDB provides a free tier that is perfect for this, so this is where we will begin setup:
Navigate to https://www.influxdata.com/get-influxdb/ and click “Use it for Free”. Fill out the sign up form to create a free InfluxDB account. No credit card required!
InfluxDB refers to databases as “buckets”, so we will need to create a new bucket for us to collect our streaming IoT data:
In order to execute reads and write to the newly created bucket, we will need to generate a token to use within our applications:
1. In the left hand sidebar, press Data > Tokens > Generate New Token > All-access-token
2. In the dialog, create a new token, and name it "iot_capstone_token". After clicking "save", the "Tokens" screen should resemble:
Once these steps are done, we will have a fully managed instance of InfluxDB running in the cloud!
In a production system, we would have multiple devices out in the field that emit data to our hosted endpoint, typically pushing data via "POST" request to our consumer server. These devices could be anything from a smart home system to a Raspberry Pi with a series of sensors attached. For the purposes of this demo, we will be focusing on the software portion of this part of the system, rather than illustrating how to program the hardware.
To do this, we will be "virtualizing" a fleet of devices that emit data to an endpoint that we will specify later on.
Let’s first create a directory for our fleet, and add a couple of libraries that will help us with mocking and sending out data for our system:
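The setup commands might look like the following. We are assuming a plain npm project; the only helper library shown here is "dotenv" (for reading our endpoint configuration), since Node 18+ ships with a built-in "fetch" for sending the data:

```shell
# Create the fleet directory and initialize a Node.js project.
mkdir fleet && cd fleet
npm init -y

# dotenv lets the fleet read its target endpoint from a local .env file.
npm install dotenv
```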
Then we will write some logic to create a "Device" class, that will represent an instance of an IoT device. This could theoretically be installed on a device, but for our purposes, we will just instantiate each device, and run the "sendStatus" function in an interval.
Each device will emit an integer that represents its status (0, 1, 2). These are arbitrary values, but they could map to various states like "healthy", "offline", or "overheated".
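A minimal sketch of this device logic is below. It assumes Node 18+ (for the built-in "fetch"), and the names used here ("Device", "sendStatus", "ENDPOINT_URL") are illustrative choices of ours, not prescribed:

```javascript
// fleet/index.js — a sketch of one virtualized IoT device.
// Load .env if dotenv is installed; otherwise fall back to shell env vars.
try { require('dotenv').config(); } catch {}

class Device {
  constructor(id) {
    this.id = id;
  }

  // Each status is an arbitrary integer: 0, 1, or 2.
  getStatus() {
    return Math.floor(Math.random() * 3);
  }

  // POST this device's current status to the producer's endpoint.
  async sendStatus() {
    const payload = { deviceId: this.id, status: this.getStatus() };
    try {
      await fetch(process.env.ENDPOINT_URL, {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify(payload),
      });
      console.log(`Device ${this.id} sent:`, payload);
    } catch (err) {
      console.error(`Device ${this.id} failed to send:`, err.message);
    }
  }
}

// Instantiate a small fleet and emit a status from each device every 3 seconds.
const fleet = [new Device(1), new Device(2), new Device(3)];
if (process.env.ENDPOINT_URL) {
  fleet.forEach((device) => setInterval(() => device.sendStatus(), 3000));
} else {
  console.log('Set ENDPOINT_URL in .env before starting the fleet.');
}

module.exports = { Device };
```

The interval length and fleet size are arbitrary; on a real Raspberry Pi, "getStatus" would instead read from attached sensors.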
We’ll also need to create a ".env" file within the "fleet" directory, containing a variable that our code can read privately:
This will ensure that our virtualized fleet will be sending http POST data to the correct endpoint that we will be using while we build our "producer" service.
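A minimal ".env" for the fleet might look like the following, assuming the producer will listen locally on port 3000 (the variable name and port are our assumptions):

```
ENDPOINT_URL=http://localhost:3000/status
```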
Once we have our devices emitting data, we will need to intercept that data and drop it onto a message queue for processing. In order to do this, we will be setting up a simple Express server in Node.js that listens on a specified port and drops the data onto our RabbitMQ messaging queue. Remember, as we detailed in this article, these services are meant to be simple by design.
First, let’s create a separate directory, and install a few dependencies we’ll need:
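The producer setup might look like this (package choices follow the stack described above):

```shell
# Create the producer directory and initialize a Node.js project.
mkdir producer && cd producer
npm init -y

# express receives the device POSTs; amqplib talks to RabbitMQ;
# dotenv loads connection settings from .env.
npm install express amqplib dotenv
```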
Then, in producer/index.js:
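A minimal sketch of this service follows. The queue name ("device_status"), the "AMQP_URL" and "PORT" variables, and the payload shape are our assumptions:

```javascript
// producer/index.js — a sketch of the producer microservice.
require('dotenv').config();
const express = require('express');
const amqp = require('amqplib');

const QUEUE = 'device_status';
const app = express();
app.use(express.json());

async function start() {
  // Connect to RabbitMQ and open a channel once, at startup.
  const connection = await amqp.connect(process.env.AMQP_URL);
  const channel = await connection.createChannel();
  await channel.assertQueue(QUEUE, { durable: false });

  // Devices POST their status here; each payload is dropped onto the queue.
  app.post('/status', (req, res) => {
    const { deviceId, status } = req.body;
    const message = JSON.stringify({ deviceId, status, receivedAt: Date.now() });
    // RabbitMQ expects message bodies as Buffers.
    channel.sendToQueue(QUEUE, Buffer.from(message));
    console.log('Queued:', message);
    res.sendStatus(200);
  });

  const port = process.env.PORT || 3000;
  app.listen(port, () => console.log(`Producer listening on port ${port}`));
}

start().catch(console.error);
```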
The server listens on the "/status" endpoint, which our IoT devices emit data to via HTTP "POST" requests. Our web server then picks up this data and drops it onto the RabbitMQ messaging queue that we will be spinning up later.
Having this microservice handle a single, simple task helps to eliminate failures related to code bloat, and also keeps things simple to maintain. First, the producer intercepts data packets being sent to our endpoint. Then it connects to a RabbitMQ messaging queue and creates a channel to push to. On a successful connection, we destructure the JSON payload and stringify it before turning it into the Buffer data type that RabbitMQ requires.
Let’s also log a message to the console to provide some visibility into the process as we monitor the data getting passed through.
Lastly, in a similar way to the fleet, the "producer" directory also needs to be configured with an ".env" file containing:
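Assuming RabbitMQ will run locally on its default port, the producer’s ".env" might contain (variable names are our choices):

```
AMQP_URL=amqp://localhost:5672
PORT=3000
```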
Now that our data is in our messaging queue, we will need to extract it from the queue and process it accordingly. In production systems, we can do several things with this data. As we mentioned previously in our data pipeline article, raw data payloads can be backed up to a robust storage service like Amazon S3, fed into a standalone alerting service, or written to a database for analytics. In this demonstration, we will be writing this data to the InfluxDB instance that we set up earlier in this article, and plotting the transactions on a line graph.
In the same way as the other services, let’s create a new directory to house our consumer service:
Then install the required dependencies that will help us ingest data from RabbitMQ and transform it into a format that InfluxDB can understand:
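These two steps might look like:

```shell
# Create the consumer directory and initialize a Node.js project.
mkdir consumer && cd consumer
npm init -y

# amqplib consumes from RabbitMQ; the official InfluxDB client writes
# points to our bucket; dotenv loads connection settings from .env.
npm install amqplib @influxdata/influxdb-client dotenv
```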
Here is the code for "consumer/index.js":
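A minimal sketch of the consumer follows. The queue name, measurement name ("device_status"), and the "INFLUX_*" / "AMQP_URL" variables are our assumptions:

```javascript
// consumer/index.js — a sketch of the consumer microservice.
require('dotenv').config();
const amqp = require('amqplib');
const { InfluxDB, Point } = require('@influxdata/influxdb-client');

const QUEUE = 'device_status';

// Build a write API from the Influx connection details in .env.
const influx = new InfluxDB({
  url: process.env.INFLUX_URL,
  token: process.env.INFLUX_TOKEN,
});
const writeApi = influx.getWriteApi(
  process.env.INFLUX_ORG,
  process.env.INFLUX_BUCKET
);

async function start() {
  const connection = await amqp.connect(process.env.AMQP_URL);
  const channel = await connection.createChannel();
  await channel.assertQueue(QUEUE, { durable: false });

  channel.consume(QUEUE, (msg) => {
    if (!msg) return;
    // Parse the queued payload, timestamp it, and write it to Influx.
    const { deviceId, status } = JSON.parse(msg.content.toString());
    const point = new Point('device_status')
      .tag('deviceId', String(deviceId))
      .intField('status', status)
      .timestamp(new Date());
    writeApi.writePoint(point);
    console.log('Wrote point for device', deviceId);
    channel.ack(msg);
  });
}

start().catch(console.error);
```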
You’ll notice a similar structure to the "Producer" in that it connects to the same message queue. In addition, you’ll also see how we instantiate an instance of "InfluxDB" in order to provide a connection to the database.
We’ll need to "JSON.parse" the "msg" that gets ingested from the message queue and provide a timestamp, before finally writing the data to Influx with the "writePoint" method.
You can get the url, token, org, and bucket from the url address bar of the InfluxDB instance. For instance:
Once you get these variables from Influx, we’ll need to set them as configuration variables in an ".env" file within the "consumer" directory:
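The consumer’s ".env" might look like the following; the variable names are our choices, and the placeholder values must be replaced with the ones from your own Influx account:

```
AMQP_URL=amqp://localhost:5672
INFLUX_URL=<your InfluxDB cloud url>
INFLUX_TOKEN=<your iot_capstone_token value>
INFLUX_ORG=<your org id>
INFLUX_BUCKET=<your bucket name>
```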
RabbitMQ provides a really simple way to get the service running via Docker: https://www.rabbitmq.com/download.html. Make sure you have Docker installed on your machine, open a new terminal tab and run the following command:
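The command below follows RabbitMQ’s Docker instructions; the "3-management" tag also exposes the management web UI on port 15672:

```shell
# Run RabbitMQ in a throwaway container, exposing the AMQP port (5672)
# and the management UI (15672).
docker run -it --rm --name rabbitmq -p 5672:5672 -p 15672:15672 rabbitmq:3-management
```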
This will get the message queue server running within a Docker container, without having to fully install the service onto your local machine. This is great for development purposes, as the container can simply be stopped when it’s not being used, freeing up lots of space on your machine.
You should see a similar output in your terminal after running that command:
We will eventually orchestrate these services together with Docker, but for now, let’s just start the node process for each in separate terminals:
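Assuming the directory layout sketched above, that might look like the following. Start the producer before the fleet, so the devices have an endpoint to emit to:

```shell
# Terminal 1: the producer, which receives device POSTs.
cd producer && node index.js

# Terminal 2: the consumer, which pulls from the queue and writes to Influx.
cd consumer && node index.js

# Terminal 3: the virtualized device fleet.
cd fleet && node index.js
```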
If everything was done correctly, you should see a similar output in your various terminal tabs to the following:
Now that we have our separate IoT microservices running independently, we will also want to orchestrate the entire network with Docker. Stay tuned, as this article will be released soon!