IoT use cases commonly share some patterns in how they ingest, process, and store data. First, these systems need to ingest bursts of data from device sensors of various locales. Next, these systems process and analyze streaming data to derive real-time insights. The data is stored in some persistent storage for further analysis.
In the following picture you can see IoT pipeline where IoT devices send data to Azure Iot Hub, and finally IoT data lands into Azure SQL Database:
Here you can find more information about one approach to store IoT data from IoT Hub to Azure SQL Database. In IoT pipelines we can notice two major paths from the source of data to the storage:
- Hot path where the most recent data that is coming from sensors need to be placed in storage layer as soon as possible. In this example, events are sent via Azure Event Hub and Azure Functions into Azure SQL Database.
- Cold path where IoT data is stored on some low-price storage layer, such as Azure Blob Storage. Cold data can be loaded later in some system for analysis to query the data.
Microsoft Azure offers various services that can be applied in IoT processing pipeline including Azure Event Hubs, Azure Stream Analytics, Azure Notification Hub, Azure Machine Learning, Azure HDInsight, Azure SQL Db, Azure Cosmos DB, and PowerBI. There are a variety of options for storing IoT data in Azure cloud such as:
- Azure Blob storage if you need to store a large amount of cold data with a low-prices. Stored IoT data can be on-demand loaded into some SQL Database or SQL DW to run analytic using standard queries or analyzed using some Azure Machine Learning service.
- Azure SQL Database or Azure SQL DW if you can parse incoming data and store it in the relational format.
- Azure SQL Database if you need to store semi-structured data formatted as JSON and you need to correlate IoT information with some existing relational data.
- Azure SQL Database or Azure Cosmos DB if you need to store semi-structured data formatted as JSON.
If you store IoT data in Azure SQL Database, you can use built-in native functions that enable you to parse and analyze IoT data collected from the devices. In this post we will see how you can store and analyze IoT data in Azure SQL Database.
In this post we will see how you can store and analyze IoT data stored in Azure SQL Database.
Storing semi-structured IoT data formatted as JSON in a SQL table
If your sensors send semi-structured data formatted as JSON, you don’t need complex relational model to store IoT data in SQL tables. The only thing that you need create is simple table that represents collection of JSON documents.
CREATE TABLE Telemetry ( _id bigint, sensorData nvarchar(max) )
This table is optimized for fast ingestion of large amount of data because it doesn’t have any index that will slow down ingestion.
CLUSTERED COLUMNSTORE INDEX is recommended option if you need to store a large amount of data and leverage fast batch mode analytic to analyze IoT data, because it provides high compression of data and fast analytic.
CLUSTERED COLUMNSTORE INDEX can compress sensor data up to 25x, meaning that you can store 25TB of sensor data in 1TB table. This might be perfect solution for storing a large amount of data in a single storage unit.
A table with CLUSTERED COLUMNSTORE index that stores IoT data is shown in the following sample:
CREATE TABLE Telemetry ( _id bigint IDENTITY, sensorData nvarchar(max), INDEX cci CLUSTERED COLUMNSTORE )
Once you store your IoT data in sensorData column, you can analyze data using standard SQL language.
Analyzing IoT Data
Azure SQL Database provides native SQL functions that you can use to parse sensor data stored as JSON in Azure SQL Db table and use the values in standard SQL query as any relational column. The following query shows how you can analyze sensor data from Telemetry table:
SELECT TOP 10 JSON_VALUE(data, '$.ip'), MAX(CAST(JSON_VALUE(data, '$.temperature') as float)) FROM Telemetry WHERE CAST(JSON_VALUE(data, '$.date') AS date) BETWEEN @start AND @end GROUP BY JSON_VALUE(data, '$.ip'), JSON_VALUE(data, '$.severity') HAVING AVG(CAST(JSON_VALUE(data, '$.temperature') as float) > 20 ORDER BY MAX(CAST(JSON_VALUE(data, '$.temperature') as float))
Azure SQL Database provides rich query language that you can use to analyze IoT data. This might be even considered as a simple query from the SQL point of view; however, if you store data in plain storage or NoSQL database, it might be difficult to do the same analysis without workarounds, and complex map/reduce transformations.
Although traditionally people would put IoT data in some NoSQL database due to the semi-structured nature of sensor data, they might find a lack of rich query language for data analysis as serious issue.
Azure SQL Database enables you to combine best features both from NoSQL and relational database worlds, store IoT data like in NoSQL database and still use all rich SQL language elements such as grouping, sorting, aggregation, window functions, that are either not supported or partially supported in NoSQL database.
Source: Microsoft Blog – SQL Server Storage Engine – Working with Azure IoT data in Azure SQL Database