By the Institute of Communications and Computer Systems (ICCS)
Summary
Fully utilizing the big sensory data produced by smart-city/building sensor networks requires discovering hidden correlations in the corresponding datasets. To this end, the CREDIT experiment proposed using enhanced community detection algorithms to cluster datasets obtained from very large smart-city/building infrastructures. Our scientific approach builds on a recently developed framework for big network data analytics, Hyperbolic Data Analytics. This framework embeds network graphs in hyperbolic space and computes distances between node pairs as distances between their hyperbolic coordinates, which allows more efficient computation of network metrics such as Edge-Betweenness Centrality (EBC). CREDIT took the framework one step further by modifying a well-known community detection algorithm, Girvan-Newman (GN), to compute EBC in hyperbolic space, speeding up the computation without significantly sacrificing accuracy. By first building a data dependency graph over the collected sensory data, we mapped the data clustering problem to a community detection problem over a graph embedded in hyperbolic space. We demonstrated the efficacy of this approach on benchmark datasets as well as on real multi-dimensional data collected by the FIESTA-IoT platform. CREDIT verified that the resulting Hyperbolic GN (HGN) algorithm can cope with large volumes of diverse sensory data obtained from real, operational smart-city/building topologies at realistic scales, demonstrating its feasibility and quantifying its performance potential.
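This summary does not spell out the HGN algorithm itself, so the following is only a minimal sketch of the general idea, resting on several assumptions. Node coordinates are taken as given in native polar form (r, θ) on the hyperbolic plane of curvature -1; the embedding step is not shown. Greedy routing over hyperbolic distances stands in for shortest-path enumeration when counting edge betweenness. Routing pairs that get stuck are simply skipped. The function names, graph, and coordinates are all hypothetical and illustrative.

```python
import math
from collections import defaultdict


def hyperbolic_distance(p, q):
    """Distance between points (r, theta), given in native polar coordinates,
    on the hyperbolic plane of curvature -1."""
    if p == q:
        return 0.0
    (r1, t1), (r2, t2) = p, q
    arg = (math.cosh(r1) * math.cosh(r2)
           - math.sinh(r1) * math.sinh(r2) * math.cos(t1 - t2))
    return math.acosh(max(arg, 1.0))  # clamp rounding error below 1


def greedy_path(adj, coords, src, dst):
    """Greedy routing (assumed stand-in for shortest paths): hop to the
    neighbour closest to dst. Returns the node list, or None if stuck."""
    path, current = [src], src
    while current != dst:
        if dst in adj[current]:
            nxt = dst
        else:
            nxt = min(adj[current], default=None,
                      key=lambda v: hyperbolic_distance(coords[v], coords[dst]))
            if nxt is None or (hyperbolic_distance(coords[nxt], coords[dst])
                               >= hyperbolic_distance(coords[current], coords[dst])):
                return None  # no neighbour is closer: local minimum
        path.append(nxt)
        current = nxt
    return path


def greedy_edge_betweenness(adj, coords):
    """Count how many greedy src->dst routes cross each edge."""
    ebc = defaultdict(int)
    for s in adj:
        for t in adj:
            if s == t:
                continue
            path = greedy_path(adj, coords, s, t)
            if path is None:
                continue  # unroutable pairs are skipped in this sketch
            for u, v in zip(path, path[1:]):
                ebc[frozenset((u, v))] += 1
    return ebc


def components(adj):
    """Connected components via iterative depth-first search."""
    seen, comps = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], set()
        while stack:
            u = stack.pop()
            comp.add(u)
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        comps.append(comp)
    return comps


def hyperbolic_girvan_newman_split(adj, coords):
    """One Girvan-Newman split: repeatedly remove the edge with the highest
    greedy betweenness until the number of components grows by one."""
    adj = {u: set(vs) for u, vs in adj.items()}  # work on a copy
    target = len(components(adj)) + 1
    while len(components(adj)) < target:
        ebc = greedy_edge_betweenness(adj, coords)
        if not ebc:
            break
        u, v = tuple(max(ebc, key=ebc.get))
        adj[u].discard(v)
        adj[v].discard(u)
    return components(adj)


# Hypothetical example: two triangles joined by the bridge edge 2-3.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
       3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
coords = {0: (2.0, -0.3), 1: (2.0, 0.3), 2: (0.5, 0.0),
          3: (0.5, math.pi), 4: (2.0, math.pi - 0.3), 5: (2.0, math.pi + 0.3)}
print(hyperbolic_girvan_newman_split(adj, coords))
```

In this toy graph every cross-triangle route passes through the bridge, so it carries the highest betweenness and is removed first, separating the two triangles. The appeal of computing routes from coordinates is that each hop needs only local distance comparisons rather than a full shortest-path search. A real implementation would also have to handle how the embedding behaves as edges are removed, which this sketch ignores.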
Additionally, CREDIT applied the developed analytics methodology to reduce the energy cost associated with the sensing nodes, using data from real scenarios obtained from FIESTA-IoT. Analysis of the obtained datasets made this possible in two ways. First, it efficiently determines which sampling instances can be omitted from a set of measurements taken at a given sampling rate, conserving the associated energy for all employed sensors. Second, it identifies sensors that exhibit practically identical behavior within the data clusters and uses them either to load-balance the monitoring task or for measurement prediction. In both cases, energy savings come from creating additional idle periods for the sensors.
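The two energy-saving mechanisms are described only at a high level here. The sketch below illustrates one possible realization under explicit assumptions and is not the experiment's actual method. A sample is treated as omittable when the midpoint of its two kept neighbours reproduces it within a tolerance. Pearson correlation above a threshold stands in for "practically identical behavior within the same data cluster." Round-robin duty cycling is one possible form of load balancing. All names, thresholds, and data are hypothetical.

```python
import math


def omittable_samples(series, tol):
    """Hypothetical rule: skip sample i if the midpoint of its neighbours
    predicts it within tol. Two consecutive samples are never skipped, so
    every skipped value can be rebuilt from kept samples."""
    skipped = []
    for i in range(1, len(series) - 1):
        if skipped and skipped[-1] == i - 1:
            continue  # neighbour i-1 was skipped, so sample i must be kept
        predicted = (series[i - 1] + series[i + 1]) / 2
        if abs(series[i] - predicted) <= tol:
            skipped.append(i)
    return skipped


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either series is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def redundant_groups(readings, threshold=0.99):
    """Greedily group sensors whose series correlate with a group's first
    member at or above threshold (stand-in for same-cluster membership)."""
    groups = []
    for sensor, series in readings.items():
        for group in groups:
            if pearson(readings[group[0]], series) >= threshold:
                group.append(sensor)
                break
        else:
            groups.append([sensor])
    return groups


def round_robin_schedule(groups, n_slots):
    """In each time slot only one member of each group samples; the rest idle."""
    active = {s: [] for g in groups for s in g}
    for slot in range(n_slots):
        for group in groups:
            active[group[slot % len(group)]].append(slot)
    return active


readings = {"a": [1, 2, 3, 4], "b": [2, 4, 6, 8], "c": [4, 3, 2, 1]}
groups = redundant_groups(readings)
print(groups, round_robin_schedule(groups, 4))
```

Here sensors "a" and "b" move in lockstep, so each needs to sample only every other slot. The anti-correlated sensor "c" forms its own group and stays fully active. The midpoint rule was chosen because it keeps every omitted value reconstructible from samples that were actually taken.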
Access to FIESTA-IoT allowed us to validate the operation of HGN and quantify its performance potential with real data within a short time, contributing to the rapid progress of our research. The datasets obtained from FIESTA-IoT played a key role and helped strengthen our position relative to the state of the art. At the same time, we were able to provide multi-faceted feedback on the operation of FIESTA-IoT and on potential improvements and extensions, hopefully helping to make FIESTA-IoT an attractive and promising venue for experimenting with multi-dimensional big data networking applications.