Regular Paper

Leveraging Document-Level and Query-Level Passage Cumulative Gain for Document Ranking

Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China
Beijing National Research Center for Information Science and Technology, Tsinghua University, Beijing 100084, China
Gaoling School of Artificial Intelligence, Renmin University of China, Beijing 100084, China

A preliminary version of the paper was published in the Proceedings of WWW 2020.

Abstract

Document ranking is one of the most studied yet challenging problems in information retrieval (IR). A growing number of studies address it through fine-grained document modeling, but most of them rely on context-independent passage-level relevance signals and ignore contextual information. In this paper, we investigate how information gain accumulates across passages and propose the context-aware Passage Cumulative Gain (PCG). Being fine-grained, PCG avoids the need to split documents into independent passages. We examine PCG patterns at the document level (DPCG) and the query level (QPCG). Based on these patterns, we propose a BERT-based sequential model, the Passage-level Cumulative Gain Model (PCGM), and show that it can effectively predict PCG sequences. Finally, we apply PCGM to the document ranking task in two ways. The first leverages DPCG sequences to estimate the gain of an individual document; experimental results on two public ad hoc retrieval datasets show that PCGM outperforms most existing ranking models. The second considers cross-document effects and leverages QPCG sequences to estimate marginal relevance; experimental results show that the predictions are highly consistent with users' preferences. We believe this work contributes to improving ranking performance and to making document ranking more explainable.
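The abstract describes PCGM as a BERT-based sequential model that predicts a cumulative gain level after each passage, with the document-level estimate taken from the accumulated gain. The sketch below is a minimal, hypothetical illustration of that idea, not the authors' implementation: it replaces the BERT passage encoder with a toy embedding-plus-pooling encoder for self-containment, assumes a 4-level graded gain scale, and uses an LSTM so each passage's prediction is conditioned on the preceding passages. All class and parameter names are illustrative.

```python
# Hypothetical sketch of the passage-cumulative-gain idea: encode each
# (query, passage) pair, run a sequential model over the passage
# representations, predict a cumulative gain level at every passage, and
# use the expected gain after the final passage as the document score.
import torch
import torch.nn as nn

NUM_GAIN_LEVELS = 4  # assumed 4-level graded gain scale (0..3); the actual scale is defined in the paper


class PCGMSketch(nn.Module):
    def __init__(self, vocab_size=30522, emb_dim=64, hidden_dim=128):
        super().__init__()
        # Toy passage encoder: embedding + mean pooling (stand-in for the BERT encoder).
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # Sequential model over passages, so passage k sees the context of passages 1..k-1.
        self.lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        # Predict a distribution over cumulative gain levels at every passage position.
        self.head = nn.Linear(hidden_dim, NUM_GAIN_LEVELS)

    def forward(self, passage_token_ids):
        # passage_token_ids: (batch, num_passages, passage_len) token ids of query-passage pairs
        passage_repr = self.embed(passage_token_ids).mean(dim=2)   # (batch, num_passages, emb_dim)
        states, _ = self.lstm(passage_repr)                        # (batch, num_passages, hidden_dim)
        return self.head(states)                                   # (batch, num_passages, num_levels)


def document_score(gain_logits):
    # Expected cumulative gain after the last passage, used as the document-level gain estimate.
    probs = torch.softmax(gain_logits[:, -1, :], dim=-1)
    levels = torch.arange(NUM_GAIN_LEVELS, dtype=probs.dtype)
    return (probs * levels).sum(dim=-1)


if __name__ == "__main__":
    model = PCGMSketch()
    fake_docs = torch.randint(0, 30522, (2, 5, 32))  # 2 documents, 5 passages each, 32 tokens per passage
    logits = model(fake_docs)
    print(document_score(logits))  # one scalar ranking score per document
```

Ranking documents by such a score corresponds to the first (DPCG-based) approach in the abstract; the second approach would instead track gain accumulation across documents in a ranked list to estimate marginal relevance.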

Electronic Supplementary Material

2031_ESM.pdf (314.7 KB)

Journal of Computer Science and Technology
Pages 814-838
Cite this article:
Wu Z-J, Liu Y-Q, Mao J-X, et al. Leveraging Document-Level and Query-Level Passage Cumulative Gain for Document Ranking. Journal of Computer Science and Technology, 2022, 37(4): 814-838. https://doi.org/10.1007/s11390-022-2031-y

Received: 19 November 2021
Revised: 18 June 2022
Accepted: 29 June 2022
Published: 25 July 2022
©Institute of Computing Technology, Chinese Academy of Sciences 2022