Open Access

Optimized Consensus for Blockchain in Internet of Things Networks via Reinforcement Learning

Institute of Intelligent Computing, School of Computer Science and Technology, Shandong University, Qingdao 266237, China
Department of Electrical and Computer Engineering, George Washington University, Washington, DC 20052, USA

Abstract

Most blockchain systems currently reach consensus among miners through resource-intensive protocols such as Proof-of-Work (PoW) and Practical Byzantine Fault Tolerance (PBFT), which consume considerable computing and communication resources and usually require reliable communication with bounded delay. However, these protocols may be unsuitable for Internet of Things (IoT) networks, because IoT devices are typically lightweight, battery-operated, and deployed in unreliable wireless environments. This paper therefore studies an efficient consensus protocol for blockchain in IoT networks via reinforcement learning. Specifically, the consensus protocol is designed on the basis of the Proof-of-Communication (PoC) scheme and operates directly in a single-hop wireless network with unreliable communications. A distributed Multi-Agent Reinforcement Learning (MARL) algorithm is proposed to improve the efficiency and fairness of consensus among miners in the blockchain system. In this algorithm, each agent maintains a matrix that captures the efficiency and fairness of recent consensus rounds and carefully tunes its actions and rewards within an actor-critic framework to achieve good performance. Simulation results show that the proposed algorithm guarantees the fairness of consensus and achieves efficiency close to that of a centralized optimal solution.
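To make the high-level description above concrete, the following is a minimal Python sketch of one such agent: an actor-critic learner that keeps a small matrix summarizing the efficiency and fairness of recent consensus rounds and folds both signals into its reward. All names (FairnessAwareAgent, the history window, the reward shaping) and the toy single-hop channel model are illustrative assumptions, not the authors' implementation of the PoC-based protocol.

import numpy as np

class FairnessAwareAgent:
    """One miner's learning agent (hypothetical names and structure)."""
    def __init__(self, n_actions=2, fair_share=0.2, history=50,
                 alpha=0.05, beta=0.1):
        self.theta = np.zeros(n_actions)   # actor: action preferences (softmax policy)
        self.value = 0.0                   # critic: baseline value estimate
        self.fair_share = fair_share       # target fraction of rounds led by this agent
        self.alpha, self.beta = alpha, beta
        # 2 x history matrix: row 0 = round success (efficiency),
        # row 1 = whether this agent led the round (fairness)
        self.stats = np.zeros((2, history))
        self.t = 0

    def policy(self):
        z = np.exp(self.theta - self.theta.max())
        return z / z.sum()

    def act(self, rng):
        return rng.choice(len(self.theta), p=self.policy())

    def shaped_reward(self, success, was_leader):
        # Record the round, then mix efficiency (share of successful rounds)
        # with fairness (penalty for drifting from the fair leadership share).
        col = self.t % self.stats.shape[1]
        self.stats[:, col] = (float(success), float(was_leader))
        self.t += 1
        window = min(self.t, self.stats.shape[1])
        efficiency = self.stats[0, :window].mean()
        fairness = -abs(self.stats[1, :window].mean() - self.fair_share)
        return efficiency + fairness

    def update(self, action, reward):
        # One-step actor-critic update with the critic as a learned baseline.
        td_error = reward - self.value
        self.value += self.beta * td_error
        grad = -self.policy()
        grad[action] += 1.0                # d log pi(action) / d theta for softmax
        self.theta += self.alpha * td_error * grad

# Toy usage: N agents contend on a single-hop channel; a consensus round
# succeeds when exactly one agent transmits, and that agent leads the round.
rng = np.random.default_rng(0)
N = 5
agents = [FairnessAwareAgent(fair_share=1.0 / N) for _ in range(N)]
for _ in range(5000):
    actions = [a.act(rng) for a in agents]          # 1 = transmit, 0 = stay idle
    transmitters = [i for i, x in enumerate(actions) if x == 1]
    success = len(transmitters) == 1
    for i, agent in enumerate(agents):
        r = agent.shaped_reward(success, success and i == transmitters[0])
        agent.update(actions[i], r)

The paper's actual action space, reward design, and matrix contents are those of the PoC-based consensus under unreliable wireless communication; the toy contention model above only illustrates the actor-critic structure with an efficiency/fairness-shaped reward.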

Tsinghua Science and Technology
Pages 1009-1022
Cite this article:
Zou Y, Jin Z, Zheng Y, et al. Optimized Consensus for Blockchain in Internet of Things Networks via Reinforcement Learning. Tsinghua Science and Technology, 2023, 28(6): 1009-1022. https://doi.org/10.26599/TST.2022.9010045

487 Views | 58 Downloads | Crossref: 2 | Web of Science: 1 | Scopus: 2 | CSCD: 0

Received: 22 July 2022
Revised: 29 August 2022
Accepted: 04 October 2022
Published: 28 July 2023
© The author(s) 2023.

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).
