Abstract
Sensor networks are now deployed in many application areas, ranging from environmental, industrial, and agricultural monitoring to military battlefield sensing. The accuracy of sensor readings is one of the most important measures of the quality of a sensor and its network. This work therefore proposes approaches that detect and repair erroneous (i.e., dirty) data caused by the inevitable system problems that arise in the hardware and software components of sensor networks. Because information about a single event of interest in a sensor network is usually reflected at multiple measurement points, inconsistency among multiple sensor measurements serves as an indicator of data quality problems. The focus of this paper is thus to study methods that can effectively detect and identify erroneous data among inconsistent observations, based on the inherent structure of the measurement series from a group of sensors. In particular, we present three models that characterize the inherent data structures among sensor measurement traces, and we apply each model individually to guide error detection in a sensor network. First, we propose a multivariate Gaussian model, which exploits the correlated data changes of a group of sensors. Second, we present a Principal Component Analysis (PCA) model, which captures the sparse geometric relationship among sensors in a network. The PCA model is motivated by the fact that not all sensor networks have a clustered sensor deployment and a clear data correlation structure. Further, if the sensor data show nonlinear characteristics, a traditional PCA model cannot capture the data attributes properly. Therefore, we propose a third model, which uses kernel functions to map the original data into a high-dimensional feature space and then applies a PCA model to the mapped, linearized data.
All three models capture the underlying phenomenon observed by a sensor network from a global view and then guide error detection to discover anomalous observations. We conducted simulations for each of the proposed models and evaluated their performance by deriving Receiver Operating Characteristic (ROC) curves.
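
As an illustration of the first idea only, the following minimal sketch fits a multivariate Gaussian to error-free readings from a group of correlated sensors and flags an observation whose squared Mahalanobis distance exceeds a chi-square threshold. It is not the model evaluated in this paper: the synthetic data, the function names (`fit_gaussian`, `mahalanobis_sq`, `flag_dirty`), and the 0.99 threshold are all assumptions made for the example.

```python
import numpy as np

def fit_gaussian(train):
    """Estimate the mean and inverse covariance from clean readings.

    train has shape (samples, sensors).
    """
    mean = train.mean(axis=0)
    cov = np.cov(train, rowvar=False)  # columns are sensors
    return mean, np.linalg.inv(cov)

def mahalanobis_sq(x, mean, cov_inv):
    """Squared Mahalanobis distance of each row of x from the fitted mean."""
    d = x - mean
    return np.einsum("ij,jk,ik->i", d, cov_inv, d)

# Hypothetical data: three sensors observing one shared signal plus small noise.
rng = np.random.default_rng(0)
signal = rng.normal(20.0, 2.0, size=500)
train = np.column_stack([signal + rng.normal(0.0, 0.1, 500) for _ in range(3)])
mean, cov_inv = fit_gaussian(train)

# Assumed threshold: approximate 0.99 quantile of chi-square with 3 degrees of freedom.
THRESHOLD = 11.345

def flag_dirty(x):
    """Return True for rows that are inconsistent with the fitted Gaussian."""
    return mahalanobis_sq(x, mean, cov_inv) > THRESHOLD

consistent = np.array([[21.0, 21.05, 20.95]])   # sensors agree
inconsistent = np.array([[21.0, 21.05, 24.0]])  # third sensor disagrees
print(flag_dirty(consistent), flag_dirty(inconsistent))
```

Sweeping `THRESHOLD` over a range of values and recording the resulting true-positive and false-positive rates would trace out an ROC curve of the kind used for evaluation here.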