Open Access

Reinforcement Learning-Based Dynamic Order Recommendation for On-Demand Food Delivery

Xing Wang¹, Ling Wang¹, Chenxin Dong², Hao Ren³, Ke Xing³

¹ Department of Automation, Tsinghua University, Beijing 100080, China

² School of Mechanical and Automotive Engineering, Qingdao Hengxing University of Science and Technology, Qingdao 266100, China

³ Meituan, Beijing 100015, China

Abstract

On-demand food delivery (OFD) is becoming increasingly popular in modern society. As a core order assignment mode in OFD scenarios, order recommendation directly influences the delivery efficiency of the platform and the delivery experience of riders. This paper addresses the dynamism of the order recommendation problem and proposes a reinforcement learning solution method. An actor-critic network based on long short-term memory (LSTM) units is designed to handle order-grabbing conflicts between different riders. In addition, three rider sequencing rules are proposed to match different time steps of the LSTM unit with different riders. To test the performance of the proposed method, extensive experiments are conducted on real data from the Meituan delivery platform. The results demonstrate that the proposed reinforcement learning-based order recommendation method can significantly increase the number of grabbed orders and reduce the number of order-grabbing conflicts, resulting in better delivery efficiency for the platform and a better delivery experience for riders.
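
The idea of matching LSTM time steps with riders can be illustrated with a minimal sketch. It is not the authors' network: the abstract does not specify the architecture, the features, or the three sequencing rules, so the cell weights (random and untrained), the dot-product scoring, the feature vectors, and the sorting rule below are all assumptions chosen for illustration. The critic is omitted. The sketch only shows the data flow: riders are placed in a sequence, each rider occupies one time step, and the recurrent state carries information about earlier riders into later recommendations.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class TinyLSTMCell:
    """Minimal LSTM cell with random, untrained weights (illustrative only)."""

    def __init__(self, input_size, hidden_size, seed=0):
        rng = random.Random(seed)
        self.hidden_size = hidden_size
        n = input_size + hidden_size
        # One weight matrix and bias per gate:
        # i = input, f = forget, o = output, g = candidate cell state.
        self.W = {gate: [[rng.uniform(-0.5, 0.5) for _ in range(n)]
                         for _ in range(hidden_size)] for gate in "ifog"}
        self.b = {gate: [0.0] * hidden_size for gate in "ifog"}

    def _affine(self, gate, z):
        return [sum(w * v for w, v in zip(row, z)) + bias
                for row, bias in zip(self.W[gate], self.b[gate])]

    def step(self, x, h, c):
        z = list(x) + list(h)  # concatenate input and previous hidden state
        i = [sigmoid(v) for v in self._affine("i", z)]
        f = [sigmoid(v) for v in self._affine("f", z)]
        o = [sigmoid(v) for v in self._affine("o", z)]
        g = [math.tanh(v) for v in self._affine("g", z)]
        c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
        h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
        return h_new, c_new


def recommend_in_sequence(riders, orders, cell, k=2, sort_key=None):
    """Assign one LSTM time step per rider and return a top-k order list for each.

    riders: dict rider_id -> feature list (length == cell input size)
    orders: dict order_id -> feature list (length == cell.hidden_size)
    sort_key: hypothetical sequencing rule applied to rider ids
    """
    sequence = sorted(riders, key=sort_key) if sort_key else list(riders)
    h = [0.0] * cell.hidden_size
    c = [0.0] * cell.hidden_size
    recommendations = {}
    for rider_id in sequence:
        # The state (h, c) depends on all riders processed at earlier time steps.
        h, c = cell.step(riders[rider_id], h, c)
        scores = {oid: sum(a * b for a, b in zip(h, feat))
                  for oid, feat in orders.items()}
        ranked = sorted(scores, key=scores.get, reverse=True)
        recommendations[rider_id] = ranked[:k]
    return sequence, recommendations


# Toy data: all feature values are made up.
riders = {"r1": [0.9, 0.1], "r2": [0.2, 0.7], "r3": [0.5, 0.5]}
orders = {"o1": [1.0, 0.0, 0.0], "o2": [0.0, 1.0, 0.0], "o3": [0.0, 0.0, 1.0]}
cell = TinyLSTMCell(input_size=2, hidden_size=3)

# Hypothetical sequencing rule: ascending order of the first rider feature.
sequence, recs = recommend_in_sequence(
    riders, orders, cell, k=2, sort_key=lambda r: riders[r][0])
print(sequence)
print(recs)
```

Changing `sort_key` changes which rider is processed at which time step, and therefore which earlier riders the recurrent state has already seen when a given rider's list is produced. In a trained model this ordering would plausibly affect how well conflicts are avoided, which is presumably why the paper treats the sequencing rule as a design choice.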

Keywords

on-demand food delivery; order recommendation; reinforcement learning; actor-critic network; long short-term memory

Tsinghua Science and Technology

Volume 29, Issue 2, April 2024

Pages 356-367

DOI: 10.26599/TST.2023.9010041

Cite this article:

Wang X, Wang L, Dong C, et al. Reinforcement Learning-Based Dynamic Order Recommendation for On-Demand Food Delivery. Tsinghua Science and Technology, 2024, 29(2): 356-367. https://doi.org/10.26599/TST.2023.9010041
