Institute of Intelligent Computing, School of Computer Science and Technology, Shandong University, Qingdao 266237, China
Department of Electrical and Computer Engineering, George Washington University, Washington, DC 20052, USA
Abstract
Most blockchain systems currently adopt resource-consuming protocols to achieve consensus among miners, such as Proof-of-Work (PoW) and Practical Byzantine Fault Tolerance (PBFT), which consume substantial computing/communication resources and usually require reliable communications with bounded delay. However, these protocols may be unsuitable for Internet of Things (IoT) networks because IoT devices are usually lightweight, battery-operated, and deployed in unreliable wireless environments. Therefore, this paper studies an efficient consensus protocol for blockchain in IoT networks via reinforcement learning. Specifically, the consensus protocol is designed on the basis of the Proof-of-Communication (PoC) scheme directly in a single-hop wireless network with unreliable communications. A distributed Multi-Agent Reinforcement Learning (MARL) algorithm is proposed to improve the efficiency and fairness of consensus for miners in the blockchain system. In this algorithm, each agent uses a matrix to depict the efficiency and fairness of the recent consensus and carefully tunes its actions and rewards in an actor-critic framework to seek effective performance. Empirical simulation results show that the fairness of consensus in the proposed algorithm is guaranteed, and the efficiency nearly reaches that of a centralized optimal solution.
References
[1] W. P. Wang, Z. R. Wang, Z. F. Zhou, H. X. Deng, W. L. Zhao, C. Y. Wang, and Y. Z. Guo, Anomaly detection of industrial control systems based on transfer learning, Tsinghua Science and Technology, vol. 26, no. 6, pp. 821–832, 2021.
[2] Z. N. Mohammad, F. Farha, A. O. M. Abuassba, S. K. Yang, and F. Zhou, Access control and authorization in smart homes: A survey, Tsinghua Science and Technology, vol. 26, no. 6, pp. 906–917, 2021.
[3] X. L. Xu, H. Y. Li, W. J. Xu, Z. J. Liu, L. Yao, and F. Dai, Artificial intelligence for edge service optimization in Internet of Vehicles: A survey, Tsinghua Science and Technology, vol. 27, no. 2, pp. 270–287, 2022.
[4] M. S. Ali, M. Vecchio, M. Pincheira, K. Dolui, F. Antonelli, and M. H. Rehmani, Applications of blockchains in the internet of things: A comprehensive survey, IEEE Commun. Surv. Tutorials, vol. 21, no. 2, pp. 1676–1717, 2019.
[5] K. Biswas and V. Muthukkumarasamy, Securing smart cities using blockchain technology, in Proc. 18th Int. Conf. on High Performance Computing and Communications; IEEE 14th Int. Conf. on Smart City; IEEE 2nd Int. Conf. on Data Science and Systems, Sydney, Australia, 2016, pp. 1392–1393.
[6] P. T. S. Liu, Medical record system using blockchain, big data and tokenization, in Proc. 18th Int. Conf. on Information and Communications Security, Singapore, 2016, pp. 254–261.
[7] X. Yue, H. J. Wang, D. W. Jin, M. Q. Li, and W. Jiang, Healthcare data gateways: Found healthcare intelligence on blockchain with novel privacy risk control, J. Med. Syst., vol. 40, no. 10, p. 218, 2016.
[8] I. Bentov, C. Lee, A. Mizrahi, and M. Rosenfeld, Proof of activity: Extending bitcoin's proof of work via proof of stake, SIGMETRICS Perform. Eval. Rev., vol. 42, no. 3, pp. 34–37, 2014.
[9] M. Castro and B. Liskov, Practical Byzantine fault tolerance, in Proc. 3rd Symp. on Operating Systems Design and Implementation, New Orleans, LA, USA, 1999, pp. 173–186.
[10] M. F. Yin, D. Malkhi, M. K. Reiter, G. G. Gueta, and I. Abraham, HotStuff: BFT consensus with linearity and responsiveness, in Proc. 2019 ACM Symp. on Principles of Distributed Computing, Toronto, Canada, 2019, pp. 347–356.
[11] M. H. Xu, C. C. Liu, Y. F. Zou, F. Zhao, J. G. Yu, and X. Z. Cheng, wChain: A fast fault-tolerant blockchain protocol for multihop wireless networks, IEEE Trans. Wirel. Commun., vol. 20, no. 10, pp. 6915–6926, 2021.
[12] L. Yang, Y. F. Zou, M. H. Xu, Y. C. Xu, D. X. Yu, and X. Z. Cheng, Distributed consensus for blockchains in internet-of-things networks, Tsinghua Science and Technology, vol. 27, no. 5, pp. 817–831, 2022.
[13] M. H. Xu, F. Zhao, Y. F. Zou, C. C. Liu, X. Z. Cheng, and F. Dressler, BLOWN: A blockchain protocol for single-hop wireless networks under adversarial SINR, IEEE Trans. Mob. Comput.
[17] R. S. Sutton, Temporal credit assignment in reinforcement learning, PhD dissertation, Univ. Mass. Amherst, Amherst, MA, USA, 1984.
[18] R. S. Sutton, D. McAllester, S. Singh, and Y. Mansour, Policy gradient methods for reinforcement learning with function approximation, in Proc. 12th Int. Conf. on Neural Information Processing Systems, Denver, CO, USA, 2000, pp. 1057–1063.
[19] A. G. Barto, R. S. Sutton, and C. W. Anderson, Neuronlike adaptive elements that can solve difficult learning control problems, IEEE Trans. Syst. Man Cybern., vol. SMC-13, no. 5, pp. 834–846, 1983.
[20] S. Dziembowski, S. Faust, V. Kolmogorov, and K. Pietrzak, Proofs of space, in Proc. 35th Annu. Cryptology Conf., Santa Barbara, CA, USA, 2015, pp. 585–605.
[21] A. Miller, A. Juels, E. Shi, B. Parno, and J. Katz, Permacoin: Repurposing bitcoin work for data preservation, in Proc. 2014 IEEE Symp. on Security and Privacy, Berkeley, CA, USA, 2014, pp. 475–490.
[22] M. Tan, Multi-agent reinforcement learning: Independent vs. cooperative agents, in Machine Learning Proceedings 1993. Amsterdam, the Netherlands: Elsevier, 1993, pp. 330–337.
[23] P. Sunehag, G. Lever, A. Gruslys, W. M. Czarnecki, V. Zambaldi, M. Jaderberg, M. Lanctot, N. Sonnerat, J. Z. Leibo, K. Tuyls, et al., Value-decomposition networks for cooperative multi-agent learning based on team reward, in Proc. 17th Int. Conf. on Autonomous Agents and MultiAgent Systems, Stockholm, Sweden, 2018, pp. 2085–2087.
[24] T. Rashid, M. Samvelyan, C. S. de Witt, G. Farquhar, J. N. Foerster, and S. Whiteson, QMIX: Monotonic value function factorisation for deep multi-agent reinforcement learning, in Proc. 35th Int. Conf. on Machine Learning, Stockholmsmässan, Stockholm, Sweden, 2018, pp. 4292–4301.
[25] K. Q. Zhang, Z. R. Yang, and T. Başar, Multi-agent reinforcement learning: A selective overview of theories and algorithms, in Handbook of Reinforcement Learning and Control, K. G. Vamvoudakis, Y. Wan, F. L. Lewis, and D. Cansever, eds. Cham, Switzerland: Springer, 2021, pp. 321–384.
[26] J. Foerster, G. Farquhar, T. Afouras, N. Nardelli, and S. Whiteson, Counterfactual multi-agent policy gradients, in Proc. 32nd AAAI Conf. on Artificial Intelligence, Palo Alto, CA, USA, 2018, pp. 2974–2982.
[27] R. Lowe, Y. Wu, A. Tamar, J. Harb, P. Abbeel, and I. Mordatch, Multi-agent actor-critic for mixed cooperative-competitive environments, in Proc. 31st Int. Conf. on Neural Information Processing Systems, Long Beach, CA, USA, 2017, pp. 6382–6393.
[28] J. Foerster, N. Nardelli, G. Farquhar, T. Afouras, P. H. S. Torr, P. Kohli, and S. Whiteson, Stabilising experience replay for deep multi-agent reinforcement learning, in Proc. 34th Int. Conf. on Machine Learning, Sydney, Australia, 2017, pp. 1146–1155.
[29] I. Mordatch and P. Abbeel, Emergence of grounded compositional language in multi-agent populations, in Proc. 32nd AAAI Conf. on Artificial Intelligence and 30th Innovative Applications of Artificial Intelligence Conf. and 8th AAAI Symp. on Educational Advances in Artificial Intelligence, New Orleans, LA, USA, 2018, p. 183.
[30] T. Bansal, J. Pachocki, S. Sidor, I. Sutskever, and I. Mordatch, Emergent complexity via multi-agent competition, presented at Proc. 6th Int. Conf. on Learning Representations, Vancouver, Canada, 2018.
[31] M. Raghu, A. Irpan, J. Andreas, R. Kleinberg, Q. V. Le, and J. M. Kleinberg, Can deep reinforcement learning solve Erdos-Selfridge-Spencer games? in Proc. 35th Int. Conf. on Machine Learning, Stockholmsmässan, Stockholm, Sweden, 2018, pp. 4235–4243.
[32] J. Z. Leibo, V. Zambaldi, M. Lanctot, and J. Marecki, Multi-agent reinforcement learning in sequential social dilemmas, in Proc. 16th Conf. on Autonomous Agents and MultiAgent Systems, São Paulo, Brazil, 2017, pp. 464–473.
[33] A. Lerer and A. Peysakhovich, Maintaining cooperation in complex social dilemmas using deep reinforcement learning, arXiv preprint arXiv: 1707.01068, 2018.
[34] J. Z. Leibo, J. Perolat, E. Hughes, S. Wheelwright, A. H. Marblestone, E. Duéñez-Guzmán, P. Sunehag, I. Dunning, and T. Graepel, Malthusian reinforcement learning, in Proc. 18th Int. Conf. on Autonomous Agents and MultiAgent Systems, Montreal, Canada, 2019, pp. 1099–1107.
[35] X. F. Wang and T. Sandholm, Reinforcement learning to play an optimal Nash equilibrium in team Markov games, in Proc. 15th Int. Conf. on Neural Information Processing Systems, Cambridge, MA, USA, 2002, pp. 1603–1610.
[36] M. L. Littman, Markov games as a framework for multi-agent reinforcement learning, in Machine Learning Proceedings 1994, W. W. Cohen and H. Hirsh, eds. Amsterdam, the Netherlands: Elsevier, 1994, pp. 157–163.
[37] M. G. Lagoudakis and R. Parr, Learning in zero-sum team Markov games using factored value functions, in Proc. 15th Int. Conf. on Neural Information Processing Systems, Cambridge, MA, USA, 2002, pp. 1659–1666.
[38] C. Dwork, N. Lynch, and L. Stockmeyer, Consensus in the presence of partial synchrony (Preliminary version), in Proc. 3rd Annu. ACM Symp. on Principles of Distributed Computing, Vancouver, British Columbia, Canada, 1984, pp. 103–118.
[39] P. Marbach and J. N. Tsitsiklis, Simulation-based optimization of Markov reward processes: Implementation issues, in Proc. 38th IEEE Conf. on Decision and Control, Phoenix, AZ, USA, 1999, pp. 1769–1774.
[40] R. S. Sutton, Generalization in reinforcement learning: Successful examples using sparse coarse coding, in Proc. 8th Int. Conf. on Neural Information Processing Systems, Denver, CO, USA, 1995, pp. 1038–1044.
Zou Y, Jin Z, Zheng Y, et al. Optimized Consensus for Blockchain in Internet of Things Networks via Reinforcement Learning. Tsinghua Science and Technology, 2023, 28(6): 1009-1022. https://doi.org/10.26599/TST.2022.9010045
The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).
Fig. 1 Example of how the index changes.
Fig. 2 Actor-critic framework of our MAAC algorithm.
Simulation Results
This section presents the experimental results of the simulation, which evaluate how efficiently the proposed algorithm achieves consensus. As mentioned in the framework of the consensus protocol, the Leader Election (LE) process directly determines the efficiency of consensus: once a leader is elected, completing the BP, BC, and CU stages takes only three additional rounds. Thus, in the following, the efficiency of the LE algorithm is observed in each episode, which contains main and auxiliary rounds.
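The round accounting above can be sketched as follows. The helper name is illustrative only; the constant of three post-election rounds comes from the statement that the BP, BC, and CU stages each take one round once a leader is elected.

```python
def rounds_per_episode(le_rounds: int) -> int:
    """Total rounds in one consensus episode: leader election (LE) takes a
    variable number of rounds, after which the three remaining stages
    (BP, BC, and CU) each take one additional round."""
    POST_ELECTION_STAGES = 3  # BP, BC, and CU, one round each
    return le_rounds + POST_ELECTION_STAGES

print(rounds_per_episode(1))  # best case: leader elected in one round
```

This makes explicit why LE is the bottleneck: the post-election cost is constant, so only the LE stage is worth optimizing.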
Parameter setting. In this simulation, agents are randomly and uniformly distributed in a circular area with a radius of m, and the minimum distance between agents is 1 m. Ambient noise is normalized to 1 dB, the transmission range of each device is set to 200 m, and the maximum transmission power is fixed. Unless otherwise specified, the agent parameters are set as follows: the policy network learning rate is 0.0003 and the value network learning rate is 0.0005. All results are produced with TensorFlow and Python. All simulations run on a computer equipped with an Intel Core i7-8565U @ 1.80 GHz.
Simulated results. First, as shown in Fig. 3, the convergence of the Deep Reinforcement Learning (DRL) algorithm with different numbers of agents is evaluated through the total reward obtained by a single agent in each episode during training. The results reveal that the agent always gradually attains the maximum total reward and converges after sufficiently many episodes. As the number of agents increases, more training episodes (i.e., more training time) are needed for the total reward to converge to its maximum. This scalability allows the proposed algorithm to be deployed in large-scale distributed network scenarios without degrading performance.
Figure 3 demonstrates that the proposed algorithm converges well, which directly supports the "termination" property. Combined with the four aforementioned consensus stages that guarantee "validity" and "agreement", the proposed algorithm thus satisfies all three properties of consensus algorithms.
Fig. 3 Total reward obtained from a single agent in each episode.
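The actor-critic training loop behind these reward curves can be illustrated with a minimal single-agent sketch. This is not the paper's MAAC setup: it is a hypothetical one-state, two-action task (action 0 always pays reward 1, action 1 pays 0) with a tabular softmax actor and a TD critic, chosen only to show how the actor's policy is nudged by the critic's TD error.

```python
import math
import random

random.seed(0)

# Hypothetical toy task: one state, two actions; action 0 yields reward 1,
# action 1 yields reward 0. The actor should learn to prefer action 0.
theta = [0.0, 0.0]   # actor: action preferences (softmax policy)
value = 0.0          # critic: estimated value of the single state
ALPHA_ACTOR, ALPHA_CRITIC = 0.05, 0.1

def policy():
    """Softmax over the action preferences."""
    exps = [math.exp(t) for t in theta]
    total = sum(exps)
    return [e / total for e in exps]

for episode in range(2000):
    probs = policy()
    action = 0 if random.random() < probs[0] else 1
    reward = 1.0 if action == 0 else 0.0
    td_error = reward - value          # critic's surprise (single state, no bootstrap)
    value += ALPHA_CRITIC * td_error   # critic update
    # Actor update: policy-gradient step scaled by the TD error.
    for a in range(2):
        grad = (1.0 if a == action else 0.0) - probs[a]
        theta[a] += ALPHA_ACTOR * td_error * grad

print(round(policy()[0], 2))  # learned probability of the rewarding action
```

In the paper's multi-agent setting each agent runs its own actor and critic with neural networks instead of tables, but the update structure is the same.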
Second, as shown in Fig. 4, the success rate (the ratio of successful rounds to total rounds) of the proposed learning algorithm in each episode is compared with that of the previous randomized algorithm [13]. The results show that although the randomized algorithm maintains a stable success rate, the learning algorithm outperforms it after sufficient training. Notably, once the algorithm has sufficiently converged, its success rate reaches a near-maximal value, markedly higher than that of the randomized algorithm. Moreover, the success rate of the randomized algorithm decreases as the number of agents increases; it has been proven in Ref. [13] to decrease to the lower bound of 1/e as the number of agents tends to infinity. By contrast, the performance of the proposed algorithm is insensitive to the number of agents: when it increases, only the number of training episodes must be increased to reach a comparable success rate. Therefore, the consensus efficiency of the proposed algorithm is optimized.
Fig. 4 Comparison of the success rate in each episode between the learning and randomized algorithms.
Finally, as shown in Fig. 5, the number of times each agent is elected leader once the proposed learning algorithm reaches the maximum reward is compared with the corresponding counts of the randomized algorithm in Ref. [13]. The results show that the randomized algorithm only guarantees fairness in probability; due to randomization, considerable uncertainty always remains in practice, as illustrated by the only approximately equal blue bars in the figure. After convergence, by contrast, the proposed algorithm ensures that each agent is elected leader exactly the same number of times, as shown by the exactly equal red bars in the figure, thus guaranteeing fairness in fact.
Fig. 5 Comparison of the number of times that each agent is elected between the learning algorithm and the randomized algorithm.
Overall, the experimental results show that once our algorithm is trained to convergence, the agents are elected leader in turn, i.e., each agent is deterministically re-elected as leader after a fixed interval of successful rounds since its last election. Thus, the consensus efficiency of the proposed algorithm is optimized.
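The fairness gap between the two schemes can be sketched by comparing election-count spreads. Modeling the randomized baseline as a uniformly random leader choice per round is a simplification assumed here (Ref. [13]'s protocol is more involved); the round-robin function mirrors the turn-taking behavior the converged learning policy exhibits.

```python
import random

random.seed(1)

def election_counts_random(n_agents: int, rounds: int) -> list:
    """Uniformly random leader per round: fair in expectation only."""
    counts = [0] * n_agents
    for _ in range(rounds):
        counts[random.randrange(n_agents)] += 1
    return counts

def election_counts_round_robin(n_agents: int, rounds: int) -> list:
    """Turn-taking election, as the converged learning policy behaves."""
    counts = [0] * n_agents
    for r in range(rounds):
        counts[r % n_agents] += 1
    return counts

def spread(counts: list) -> int:
    """Max-minus-min election count: 0 means exact fairness."""
    return max(counts) - min(counts)

n, rounds = 10, 1000
print("random spread:", spread(election_counts_random(n, rounds)))
print("round-robin spread:", spread(election_counts_round_robin(n, rounds)))
```

The round-robin spread is always exactly zero when the round count divides evenly, while the random spread fluctuates from run to run, matching the uneven blue bars versus the flat red bars in Fig. 5.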