College of Computer Science and Technology, Qingdao University, Qingdao266071, China.
School of Computer Science and Technology, Shandong University, Qingdao266237, China.
Department of Computer Science, the University of Hong Kong, Hong Kong999077, China.
School of Computer Science and Technology, Qilu University of Technology (Shandong Academy of Sciences), Jinan250353, China
Shandong Computer Science Center (National Supercomputer Center in Jinan), Jinan250014, China.
Show Author Information
Hide Author Information
Abstract
In the era of big data, sensor networks have been pervasively deployed, producing a large amount of data for various applications. However, because sensor networks are usually placed in hostile environments, managing the huge volume of data is a very challenging issue. In this study, we mainly focus on the data storage reliability problem in heterogeneous wireless sensor networks where robust storage nodes are deployed in sensor networks and data redundancy is utilized through coding techniques. To minimize data delivery and data storage costs, we design an algorithm to jointly optimize data routing and storage node deployment. The problem can be formulated as a binary nonlinear combinatorial optimization problem, and due to its NP-hardness, designing approximation algorithms is highly nontrivial. By leveraging the Markov approximation framework, we elaborately design an efficient algorithm driven by a continuous-time Markov chain to schedule the deployment of the storage node and corresponding routing strategy. We also perform extensive simulations to verify the efficacy of our algorithm.
C. Y.Ai, F. H.Li, and K. J.Zhang, Detecting isolate safe areas in wireless sensor monitoring systems, Tsinghua Science and Technology, vol. 22, no. 4, pp. 427-436, 2017.
R.Arcucci, C.Pain, and Y. K.Guo, Effective variational data assimilation in air-pollution prediction, Big Data Mining and Analytics, vol. 1, no. 4, pp. 297-307, 2018.
F.Li, J.Luo, S. Q.Xin, and Y.He, Autonomous deployment of wireless sensor networks for optimal coverage with directional sensing model, Computer Networks, vol. 108, pp. 120-132, 2016.
F.Li, J.Luo, W. P.Wang, and Y.He, Autonomous deployment for load balancing -surface coverage in sensor networks, IEEE Transactions on Wireless Communications, vol. 14, no. 1, pp. 279-293, 2015.
L.Xiang, J.Luo, C. W.Deng, A. V.Vasilakos, and W. S.Lin, DECA: Recovering fields of physical quantities from incomplete sensory data, in Proc. 9th Annu. IEEE Communications Society Conf. Sensor, Mesh and Ad Hoc Communications and Networks, Seoul, South Korea, 2012, pp. 182-190.
S. Y.Cheng, Z. P.Cai, J. Z.Li, and H.Gao, Extracting kernel dataset from big sensory data in wireless sensor networks, IEEE Transactions on Knowledge and Data Engineering, vol. 29, no. 4, pp. 813-827, 2017.
J.Li, S. Y.Cheng, Z. P.Cai, J. G.Yu, C. K.Wang, and D. M.Li, Approximate holistic aggregation in wireless sensor networks, ACM Transactions on Sensor Networks, vol. 13, no. 2, pp. 1-24, 2017.
Z. B.He, Z. P.Cai, S. Y.Cheng, and X. M.Wang, Approximate aggregation for tracking quantiles and range countings in wireless sensor networks, Theoretical Computer Science, vol. 607, pp. 381-390, 2015.
S. Y.Cheng, Z. P.Cai, and J. Z.Li, Curve query processing in wireless sensor networks, IEEE Transactions on Vehicular Technology, vol. 64, no. 11, pp. 5198-5209, 2015.
Q. Y.Meng, K.Wang, X. M.He, and M. Y.Guo, QoE-driven big data management in pervasive edge computing environment, Big Data Mining and Analytics, vol. 1, no. 3, pp. 222-233, 2018.
M.Albano and S.Chessa, Distributed erasure coding in data centric storage for wireless sensor networks, in Proc. 2009 IEEE Symp. Computers and Communications, Sousse, Tunisia, 2009, pp. 22-27.
A. G.Dimakis, V.Prabhakaran, and K.Ramchandran, Decentralized erasure codes for distributed networked storage, IEEE Transactions on Information Theory, vol. 52, no. 6, pp. 2809-2816, 2006.
F.Li, Y. B.Yang, Z. C.Chi, L. Y.Zhao, Y. W.Yang, and J.Luo, Trinity: Enabling self-sustaining WSNs indoors with energy-free sensing and networking, ACM Transactions on Embedded Computing Systems, vol. 17, no. 2, pp. 1-27, 2018.
S.Holzer and R.Wattenhofer, Optimal distributed all pairs shortest paths and applications, in Proc. 2012 ACM Symp. Principles of Distributed Computing, Madeira, Portugal, 2012, pp. 355-364.
M.Ghaffari and J.Li, Improved distributed algorithms for exact shortest paths, in Proc. 50th Annu. ACM SIGACT Symp. Theory of Computing, Los Angeles, CA, USA, 2018, pp. 431-444.
G.Maia, D. L.Guidoni, A. C.Viana, A. L. L.Aquino, R. A. F.Mini, and A. A. F.Loureiro, A distributed data storage protocol for heterogeneous wireless sensor networks with mobile sinks, Ad Hoc Networks, vol. 11, no. 5, pp. 1588-1602, 2013.
Q.Liu, K.Zhang, X. D.Liu, and N.Linge, Grid routing: An energy-efficient routing protocol for WSNs with single mobile sink, International Journal of Sensor Networks, vol. 25, no. 2, pp. 93-103, 2017.
M. H.Chen, S. C.Liew, Z. Y.Shao, and C. H.Kai, Markov approximation for combinatorial network optimization, IEEE Transactions on Information Theory, vol. 59, no. 10, pp. 6301-6327, 2013.
X.Zheng, Z. P.Cai, J. Z.Li, and H.Gao, A study on application-aware scheduling in wireless networks, IEEE Transactions on Mobile Computing, vol. 16, no. 7, pp. 1787-1801, 2017.
Z. P.Cai, G. H.Lin, and G. L.Xue, Improved approximation algorithms for the capacitated multicast routing problem, in Proc. of the 11th COCOON, Berlin, Germany, 2005, pp. 136-145.
Z. P.Cai, Z. Z.Chen, and G. H.Lin, A 3.4713-approximation algorithm for the capacitated multicast tree routing problem, Theoretical Computer Science, vol. 410, no. 52, pp. 5415-5424, 2009.
R. L.Deng, S. B.He, and J. M.Chen, An online algorithm for data collection by multiple sinks in wireless-sensor networks, IEEE Transactions on Control of Network Systems, vol. 5, no. 1, pp. 93-104, 2018.
J.Wang, J. Y.Cao, R. S.Sherratt, and J. H.Park, An improved ant colony optimization-based approach with mobile sink for wireless sensor networks, The Journal of Supercomputing, vol. 74, no. 12, pp. 6633-6645, 2018.
G. R.Li, S. C.Peng, C.Wang, J. W.Niu, and Y.Yuan, An energy-efficient data collection scheme using denoising autoencoder in wireless sensor networks, Tsinghua Science and Technology, vol. 24, no. 1, pp. 86-96, 2019.
F.Li, J.Luo, G. T.Shi, and Y.He, ART: Adaptive frequency-temporal co-existing of ZigBee and WiFi, IEEE Transactions on Mobile Computing, vol. 16, no. 3, pp. 662-674, 2017.
B.Sheng, Q.Li, and W. Z.Mao, Data storage placement in sensor networks, in Proc. 7th ACM Int. Symp. Mobile Ad Hoc Networking and Computing, Florence, Italy, 2006, pp. 344-355.
X. J.Yang, X. F.Tao, E.Dutkiewicz, X. J.Huang, Y. J.Guo, and Q. M.Cui, Energy-efficient distributed data storage for wireless sensor networks based on compressed sensing and network coding, IEEE Transactions on Wireless Communications, vol. 12, no. 10, pp. 5087-5099, 2013.
J.Luo, F.Li, and Y.He, 3DQS: Distributed data access in 3D wireless sensor networks, in Proc. 2011 IEEE Int. Conf. Communications, Kyoto, Japan, 2011, pp. 1-5.
C.Zhang, J.Luo, L.Xiang, F.Li, J. C.Lin, and Y.He, Harmonic quorum systems: Data management in 2D/3D wireless sensor networks with holes, in Proc. 2012 9th Annu. IEEE Communications Society Conf. Sensor, Mesh and Ad Hoc Communications and Networks, Seoul, South Korea, 2012, pp. 1-9.
R. F.Zeng, Y. X.Jiang, C.Lin, Y. F.Fan, and X. M.Shen, A distributed fault/intrusion-tolerant sensor data storage scheme based on network coding and homomorphic fingerprinting, IEEE Transactions on Parallel and Distributed Systems, vol. 23, no. 10, pp. 1819-1830, 2012.
J.Tian, T.Yan, and G. L.Wang, A network coding based energy efficient data backup in survivability-heterogeneous sensor networks, IEEE Transactions on Mobile Computing, vol. 14, no. 10, pp. 1992-2006, 2015.
G.D’Angelo, D.Diodati, A.Navarra, and C. M.Pinotti, The minimum -storage problem: Complexity, approximation, and experimental analysis, IEEE Transactions on Mobile Computing, vol. 15, no. 7, pp. 1797-1811, 2016.
Yang H, Li F, Yu D, et al. Reliable Data Storage in Heterogeneous Wireless Sensor Networks by Jointly Optimizing Routing and Storage Node Deployment. Tsinghua Science and Technology, 2021, 26(2): 230-238. https://doi.org/10.26599/TST.2019.9010061
The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).
3.3 Algorithm
To resolve the JUST-Approx problem (and thus to address the JUST problem with approximation optimality), we design a CTMC to drive the adaptive deployment of the storage nodes and the corresponding routing scheme, where denotes time and denotes the storage node deployment adopted at time . In the CTMC, state transitions (i.e., storage node redeployments) continuously happen at any point in time and are characterized by transition rates. We assume that, any two of the states have a transition link if and only if the two states have only one storage node deployed at different sites. Note that this assumption does not compromise the reachability between the states and thus the irreducibility of the Markov chain.
The CTMC has a unique stationary distribution, as it has finite states and is of irreducibility. In the Markov approximation framework, we design a time-reversible CTMC , such that the following detailed balance equation holds for the stationary distribution, i.e.,
where and denote the transition rates from to and the one from to , respectively. In this paper, we define the transition rates as
where is a constant. Recalling the definition of (for ) in Eq. (
17
), the above definition of apparently satisfies the balance in Eq. (
18
).
The CTMC can be implemented based on the two propositions in Ref. [
24
], as shown in the following:
Proposition 1 For each state , the sojourn time is a random variable obeying an exponential distribution parameterized by
where denotes the set of the states adjacent to .
Proposition 2 The transition probability from to is defined as
if considering the discrete-time counterpart of the CTMC.
According to the above propositions, the optimality of the CTMC-driven method can be explained from the following two folds: On one hand, we adopt the storage node deployments that have lower costs for a longer time; on the other hand, it is more likely to move to the lower-cost storage node deployment in each state transition.
Motivated by the two propositions, we can implement a CTMC to drive the adaptive deployment of the storage nodes by repeating the following steps: (1) Supposing denotes the current storage node deployment, we set up a countdown timer initialized by an exponential random value ; (2) once the timer expires, we transit to a new storage node deployment according to the transition probabilities defined in Eq. (
19
). Although we can achieve the optimal distribution by repeating the above two steps in a long run, this approach entails some central infrastructures.
We propose an algorithm (see Algorithm 2) to implement the CTMC in a distributed manner. By default, we initially deploy the storage nodes in a randomized fashion (as shown in Line 2). The lifetime of the sensor network can be divided into a sequence of epochs, and a new epoch begins when some storage nodes are redeployed. In each epoch, we first compute a routing strategy according to the current storage node deployment by the -NN policy (see Line 2). In every epoch, each storage node generates an exponential random variable , where
with representing the subset of feasible storage node deployments where only is placed at different sites from the one in (see Line 2). Each storage node then sets up a countdown timer initialized by (see Line 2). We assume , and storage node is the one whose timer expires first. When the timer expires, broadcasts a notification message, and the other storage nodes that received the message (before their timers expire) stop counting down. As shown in Line 8, we redeploy according to the probability distribution:
The complexity of the above algorithm mainly stems from computing , i.e., calculating the routing strategy and corresponding cost. The shortest paths from the sensor nodes to the sites and the induced cost can be computed by state-of-the-art polynomial-time algorithms (e.g., Dijkstra’s method with a time complexity of ). As the sensor nodes and the sites where we deploy the storage nodes are fixed, the data delivery paths and their costs can be preloaded in the storage nodes, especially as the storage nodes usually have sufficient memory space. When a storage node is redeployed, it notifies the other ones of its new position by broadcasting -length notification messages, such that all the other storage nodes can locally compute . Moreover, the storage nodes can forward the notifications to the sensor nodes by piggybacking the received new storage node deployment in the acknowledgments to the data deliveries. For each sensor node, once the notification is received from the storage nodes, the sensor node can quickly compute the cost of delivering a data unit in the redeployed storage node and then update the routing strategy according to the -NN policy.
In the above implementation, each of the storage nodes only needs to set up a local timer. One question is, does such a distributed implementation realize CTMC by respecting Propositions 1 and 21? To answer this question, we present the following theorem, which implies that relying on a set of local timers in a distributed manner instead of a centralized one does not compromise the optimality of in solving our JUST-Approx problem.
Theorem 2 In Algorithm 1, the sojourn time for each state is a random variable following an exponential distribution parameterized by rate in Eq. (
20
), and the transition probability from to () is (see Eq. (
21
)).
Proof For any storage node deployment , the sojourn time can be defined as
as shown in Algorithm 1. As for , we have for . In another word, is an exponentially distributed random variable with the rate defined by . Also, since and , we have . Therefore, for ,
For , the probability that is the redeployed storage node with can be calculated as . Let denote the set of the storage node deployments with deployed at different sites from the one in . Then, for , the probability that we have the transition (given that has to be redeployed) can be defined by (see Eq. (
22
)). Therefore, holds for .
10.26599/TST.2019.9010061.F1
Total cost of our algorithm under different values of .
10.26599/TST.2019.9010061.F2
Optimality gaps of our algorithm under different values of .