by Modio Computing P.C.
Summary
StreamingQualityAnalyser introduces data quality assessment of IoT data streams through probabilistic approaches that, according to the literature, exhibit strong accuracy.
Our strategic goal is for the new service to complement Qiqbus, our commercial streaming analytics platform for IoT, enabling end-users to improve the quality of their data-driven IoT applications and services. To this end, the Qiqbus data quality assessment service flags sensors that produce low-quality data, which might degrade the performance of end-users’ applications or services. To further contribute to this goal, we complement the new service with intuitive map-based User Interfaces (UIs), which help end-users identify possible flaws within their sensor substrate (based on the service’s low-quality data indications) and take corrective actions accordingly (e.g. replacing specific sensors with new ones, or updating the sensor topology when data are lost or delayed due to networking issues), thereby improving the overall performance of their data-driven applications and services.
In terms of our experiment’s industrial impact, Modio’s founding team, which has a strong background in cloud, machine learning and data analytics technologies, has decided to pursue the business opportunity in the emerging IoT market, including manufacturing, healthcare, smart cities, homes and cars. Our strategic goal is to innovate in the IoT domain with quality assessment techniques for streaming data, as well as with novel privacy technologies that keep sensitive streaming data confidential at all times. Neither of these features is currently supported by existing commercial IoT analytics packages.
During the experiment, we validate the performance of the following innovative approaches to outlier detection, specifically targeting time series:
- The ‘Generalized Autoregressive Conditional Heteroscedasticity’ (GARCH) algorithm that is known to operate with data streams that exhibit temporal locality, i.e. data whose range of uncertainty varies over time.
- A validated outlier detection approach implemented in R’s forecast package, tsoutliers [1], which identifies residuals by fitting a loess curve for non-seasonal data and via a periodic Seasonal and Trend decomposition using Loess (STL) for seasonal data.
- An outlier detection approach based on a Long Short-Term Memory (LSTM) implementation of Recurrent Neural Networks (RNNs).
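The residual-based idea behind the second approach above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the implementation used in the experiment or in the R package: a centered rolling median stands in for the loess fit, and the window size, the 3 x IQR threshold and the function names `rolling_median` and `residual_outliers` are hypothetical choices made for this example.

```python
from statistics import median, quantiles


def rolling_median(values, window=5):
    """Centered rolling median; a crude stand-in for a loess trend fit (assumption)."""
    half = window // 2
    trend = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        trend.append(median(values[lo:hi]))
    return trend


def residual_outliers(values, window=5, k=3.0):
    """Return indices whose residual lies more than k IQRs outside the quartiles."""
    trend = rolling_median(values, window)
    residuals = [v - t for v, t in zip(values, trend)]
    q1, _, q3 = quantiles(residuals, n=4)  # quartiles of the residuals
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [i for i, r in enumerate(residuals) if r < lower or r > upper]


# Illustrative temperature-like readings with one obvious spike at index 4.
readings = [20.1, 20.3, 20.2, 20.4, 35.0, 20.5, 20.3, 20.6, 20.4, 20.5]
outlier_indices = residual_outliers(readings)
print(outlier_indices)
```

For seasonal data, the trend estimate would be replaced by a seasonal decomposition before the residuals are examined; that step is omitted here to keep the sketch short.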
To validate the performance of the aforementioned algorithms, we leverage the FIESTA-IoT semantics for the following two purposes:
- For training our machine learning models, we use historical data, which we gather via testbed-agnostic queries of datasets and data streams.
- For acquiring real-time data, we invoke semantic-enabled discovery of resources and observations.
The StreamingQualityAnalyser continuously retrieves data from the FIESTA-IoT platform, specifically from sensors located in the ADREAM, KETI and NITOS testbeds. The data are stored and then analyzed on demand to identify outliers using one of the aforementioned approaches.
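As an illustration of what such an on-demand analysis of stored readings might look like for the GARCH approach, the sketch below applies the standard GARCH(1,1) variance recursion and flags readings that deviate from a running mean by more than a multiple of the conditional standard deviation. It is a simplified example, not the service's code: the parameters `omega`, `alpha`, `beta` and the threshold `k` are fixed illustrative values (in practice they would be estimated from historical data), and the running-mean model and the function name `garch_outliers` are assumptions.

```python
import math


def garch_outliers(values, omega=0.05, alpha=0.1, beta=0.8, k=3.0):
    """Flag indices whose error exceeds k conditional standard deviations."""
    mean = values[0]
    # Start from the unconditional variance implied by the parameters.
    variance = omega / (1.0 - alpha - beta)
    flagged = []
    for i, value in enumerate(values[1:], start=1):
        error = value - mean
        if abs(error) > k * math.sqrt(variance):
            flagged.append(i)
        # GARCH(1,1): next variance depends on the last squared error
        # and the last variance, so the uncertainty band varies over time.
        variance = omega + alpha * error ** 2 + beta * variance
        # Simple running mean as the level estimate (assumption).
        mean += (value - mean) / (i + 1)
    return flagged


# Illustrative stored readings with one spike at index 5.
stored = [10.0, 10.2, 9.9, 10.1, 10.0, 14.0, 10.1, 9.8]
flagged_indices = garch_outliers(stored)
print(flagged_indices)
```

Note that the variance widens right after the spike, so the readings that follow it are judged against a broader uncertainty band.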
The results of the data quality analysis are rendered through a single-page web application. The web application helps end-users identify possible flaws within their sensor substrate (based on the service’s low-quality data indications) and take corrective actions accordingly (e.g. replacing specific sensors with new ones, or updating the sensor topology when data are lost or delayed due to networking issues), thereby improving the overall performance of their data-driven applications and services. The application is accessible on the public Internet and available for testing.
Finally, the implementation of our methods for sensor data quality assessment is committed to our Git repository and is available to the FIESTA-IoT consortium only.
[1] https://github.com/robjhyndman/forecast