Getting Started with InfluxDB
Let’s learn how to start using InfluxDB, an open source time series database.
At the end of this post you will have a graph generated from your data, showing how air quality in Berlin changes over time.
What’s a time series database and what is it good for?
Time series databases are ✨specially designed✨ databases for storing data points that change over time. For example:
- weather temperature
- air quality measurements
- stock prices
- and sensor metrics coming from IoT devices
What makes them better suited for storing this type of data than relational databases? That’s a very good question, and answering it requires some knowledge of how databases actually store data on disk.
I’ll try to keep it short and simple.
Relational databases store each group of objects (tables) in separate places on disk. This makes finding related data expensive. You literally need to go all over the disk to find what you need. Indexes are used to speed up this process.
But, if you have a type of data that changes regularly, it kinda makes sense to keep those values close together on disk. And later, when you want to query them, because the data points are adjacent to each other, it would be very fast and cheap to read them! Voila, you have a time-series database. It’s an oversimplification, but it’ll do for this post 😅
There are more features that make up TSDBs, such as data compression, down-sampling, and support for time-series-aware queries. If you want to learn more about TSDBs, InfluxData has a nice explanation.
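To make down-sampling a bit more concrete, here is a minimal sketch in plain Python. It is not how InfluxDB does it internally; it only shows the idea of replacing many raw points with one average per time window. The function name `downsample` and the sample readings are made up for illustration.

```python
from collections import defaultdict
from statistics import mean


def downsample(points, window_seconds):
    """Average (unix_seconds, value) pairs into fixed-size time windows.

    Returns a list of (window_start, mean_value) pairs sorted by window start.
    """
    windows = defaultdict(list)
    for ts, value in points:
        # Round the timestamp down to the start of its window.
        window_start = ts - (ts % window_seconds)
        windows[window_start].append(value)
    return [(start, mean(values)) for start, values in sorted(windows.items())]


# Hypothetical readings taken every 20 minutes, reduced to hourly averages.
readings = [(0, 4.0), (1200, 6.0), (2400, 8.0), (3600, 10.0), (4800, 12.0)]
hourly = downsample(readings, 3600)
print(hourly)
```

Five raw points become two hourly points here, which is the whole trade: less detail, much less data to store and scan.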
InfluxDB 2.0
Starting from version 2.0, InfluxDB is becoming more than just a database. It now includes Kapacitor for processing and Chronograf for the user interface. Add Telegraf to that, and you’ll get the TICK stack.
InfluxData, the company behind the TICK stack, offers InfluxDB Cloud 2.0 as a fully managed service. We’ll be using their free plan to get started with InfluxDB quickly.
I found the free plan only useful for trying out the product. If you want to get serious, you may want to spin up your own database (using the OSS version) – or pay for the upgrades.
Register on InfluxDB Cloud 2.0 and create a bucket
Go ahead and sign up for a free InfluxDB Cloud account.
After signing up, go to the Load Data > Buckets page (visible in the sidebar) and create your first bucket.
We’ll use this bucket later to write data into.
Ways to write data to InfluxDB
There are lots of ways to write data to InfluxDB:
- Telegraf - gathering data from various sources, such as Docker!
- InfluxDB 2.0 API
- Influx CLI
- InfluxDB UI
- Client libraries
- And other third-party tools
So that’s a lot of options to interact with InfluxDB.
Many of these tools use line protocol to describe a data point.
Line Protocol 📏
InfluxDB uses a line protocol to represent data point entries. Let’s see an example first, then break down its elements.
Here’s a single air quality measurement containing a pollutant value and some metadata:
aq_measurement,location=DEBE010,city=Berlin,country=DE,parameter=pm10,unit=µg/m³,latitude=52.543041,longitude=13.349326 value=5.91 1591398000000000000
Line protocol is a text-based format that describes:
- The measurement name
- Tag set (optional key=value pairs)
- Field set (key=value pairs, value can be one of float, integer, UInteger, string, boolean)
- And timestamp
  - Optional: if omitted, the data point is stored with the current time
  - Nanosecond precision by default
measurementName,tagKey=tagValue fieldKey="fieldValue" 1465839830100400200
--------------- --------------- --------------------- -------------------
       |               |                  |                    |
  Measurement       Tag set           Field set            Timestamp
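If you’d rather generate lines like this from code, the structure above can be sketched in a few lines of Python. This is a simplified illustration, not an official client: the names `to_line` and `to_nanoseconds` are my own, unsigned integers are left out, and the escaping follows my reading of the line protocol rules (commas and spaces in measurement names; commas, equals signs, and spaces in tag keys, tag values, and field keys).

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_nanoseconds(moment):
    """Convert a timezone-aware datetime to nanoseconds since the Unix epoch."""
    return ((moment - EPOCH) // timedelta(microseconds=1)) * 1000


def _escape(text, chars):
    """Backslash-escape every character of `chars` that appears in `text`."""
    for ch in chars:
        text = text.replace(ch, "\\" + ch)
    return text


def _format_field(value):
    # bool must be checked before int, because bool is a subclass of int.
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"  # integers carry an "i" suffix
    if isinstance(value, float):
        return repr(value)
    # Anything else is written as a quoted string.
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_line(measurement, tags, fields, moment=None):
    """Build one line: measurement, tag set, field set, optional timestamp."""
    if not fields:
        raise ValueError("line protocol needs at least one field")
    head = _escape(measurement, ", ")
    for key, value in tags.items():
        head += f",{_escape(key, ',= ')}={_escape(str(value), ',= ')}"
    field_set = ",".join(
        f"{_escape(key, ',= ')}={_format_field(value)}"
        for key, value in fields.items()
    )
    line = f"{head} {field_set}"
    if moment is not None:
        line += f" {to_nanoseconds(moment)}"
    return line


# A shortened version of the air quality example from this post.
line = to_line(
    "aq_measurement",
    {"location": "DEBE010", "city": "Berlin", "parameter": "pm10"},
    {"value": 5.91},
    datetime(2020, 6, 5, 23, 0, tzinfo=timezone.utc),
)
print(line)
```

The timestamp in the example corresponds to 2020-06-05 23:00 UTC. Leaving out `moment` drops the timestamp, so the database would use the current time instead.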
Write some data 💾
I’ve prepared 3 months of air quality measurements as line protocol for you. Let’s write them to our empty bucket.
From your InfluxDB Cloud dashboard, go to Load Data > Buckets, and find your bucket.
Click the Add Data button, and choose Line Protocol.
You have the option to upload a file or enter data manually. Let’s choose Enter Manually and paste in the prepared data.
Click Write Data. You should see a “Data Written Successfully” message.
That’s it! Now your data is ready to be queried, further processed, and visualized!
Let’s see how we can explore and visualize this data.
Using the Data Explorer 📈
In a few steps you’ll have a chart showing the change in Berlin air quality over time.
- Go to the Data Explorer section. Make sure your bucket is selected in the
- In the first Filter pane, select your measurement name
- Adjust the time window to be between March 1, 2020 and June 1, 2020.
- Now click Submit. This will update the graph view.
- You should have a graph that looks like the one below!
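Behind the scenes, the Data Explorer turns these clicks into a Flux query. As a rough sketch of what such a query looks like, here is a small Python helper that builds one as text. The helper name `build_flux_query` and the bucket name `my-bucket` are placeholders of mine, and the query the UI actually generates will likely contain more steps (such as aggregation) than this one.

```python
def build_flux_query(bucket, measurement, start, stop):
    """Return a Flux query selecting one measurement within a time range.

    `start` and `stop` are RFC 3339 timestamps, e.g. "2020-03-01T00:00:00Z".
    """
    return (
        f'from(bucket: "{bucket}")\n'
        f"  |> range(start: {start}, stop: {stop})\n"
        f'  |> filter(fn: (r) => r._measurement == "{measurement}")'
    )


# "my-bucket" is a placeholder; use the name of the bucket you created.
query = build_flux_query(
    "my-bucket",
    "aq_measurement",
    "2020-03-01T00:00:00Z",
    "2020-06-01T00:00:00Z",
)
print(query)
```

You can paste the printed query into the Data Explorer’s script editor to compare it with what the query builder produces.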
There are several lines, each representing data from one air quality station.
This query and visualization will not be preserved. If you want to keep it, click the Save As button in the top right corner.
The graph will be added as a cell in the dashboard. Go to the Boards page and check it out.
What’s next
That was a taste of what you can do with InfluxDB.
Now you might be saying, “But I only entered some prepared data. I want to keep feeding the database with fresh data!” I hear you.
As I mentioned earlier in the post, there are numerous ways to feed the database. For example:
- You can configure a Telegraf agent to collect data from external sources such as Docker, PostgreSQL, and Minecraft. There are over 200 Telegraf plugins available!
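Another option from the list earlier in the post is the InfluxDB 2.0 API, which accepts line protocol over HTTP. Below is a hedged sketch that only builds the write request using Python’s standard library; it does not send anything. The base URL, organization, bucket, and token are all placeholders, and you should check the API documentation for the exact URL of your Cloud region.

```python
import urllib.parse
import urllib.request


def build_write_request(base_url, org, bucket, token, lines):
    """Build (but do not send) a POST request for the /api/v2/write endpoint."""
    query = urllib.parse.urlencode(
        {"org": org, "bucket": bucket, "precision": "ns"}
    )
    return urllib.request.Request(
        url=f"{base_url}/api/v2/write?{query}",
        data="\n".join(lines).encode("utf-8"),  # one data point per line
        headers={
            "Authorization": f"Token {token}",
            "Content-Type": "text/plain; charset=utf-8",
        },
        method="POST",
    )


# Every value below is a placeholder, not a real endpoint or credential.
request = build_write_request(
    base_url="https://influx.example.com",
    org="my-org",
    bucket="my-bucket",
    token="MY_API_TOKEN",
    lines=["aq_measurement,city=Berlin value=5.91 1591398000000000000"],
)
print(request.get_method(), request.full_url)

# To actually write the data, you would send it, for example:
# urllib.request.urlopen(request)
```

Keeping the request construction separate from sending it makes the sketch easy to inspect before any real token is involved.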
We didn’t cover the Flux query language, how to process data with InfluxDB tasks, or how to use client libraries to write data. Why not keep using your InfluxDB Cloud account and try these out?
This post was born from my research notes while I was making an app that feeds InfluxDB from AWS Lambda. I’ll talk more about it in a later post.