Regular Paper

Unsupervised Domain Adaptation on Sentence Matching Through Self-Supervision

National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049, China

Abstract

Although neural approaches have achieved state-of-the-art results on sentence matching, their performance drops dramatically when they are applied to unseen domains. To tackle this cross-domain challenge, we address unsupervised domain adaptation for sentence matching, where the goal is to perform well on a target domain given only unlabeled target-domain data and labeled source-domain data. Specifically, we propose to achieve this through self-supervised tasks. Unlike previous unsupervised domain adaptation methods, self-supervision can be designed to suit the characteristics of sentence matching and is also much easier to optimize. During training, each self-supervised task is performed on both domains simultaneously in an easy-to-hard curriculum, which gradually brings the two domains closer together along the direction relevant to the task. As a result, the classifier trained on the source domain is able to generalize to the unlabeled target domain. In total, we present three types of self-supervised tasks, and the results demonstrate their superiority. In addition, we study how different ways of using the self-supervised tasks affect performance, which may inform how to use self-supervision effectively in cross-domain scenarios.
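
The training scheme summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the three self-supervised tasks, the difficulty measure, or the loss weighting, so `length_difficulty`, `curriculum_stages`, `joint_loss`, the stage count, and the `weight` parameter are all hypothetical choices made only for illustration. The sketch shows two ideas: each curriculum stage draws the easiest share of sentence pairs from both domains, and the supervised source loss is combined with a self-supervised loss computed on both domains.

```python
from typing import Callable, List, Sequence, Tuple

Pair = Tuple[str, str]  # a sentence pair, e.g. (question, candidate answer)


def length_difficulty(pair: Pair) -> int:
    """Hypothetical difficulty proxy: total whitespace token count of the pair."""
    return len(pair[0].split()) + len(pair[1].split())


def curriculum_stages(
    source: Sequence[Pair],
    target: Sequence[Pair],
    difficulty: Callable[[Pair], int] = length_difficulty,
    num_stages: int = 3,
) -> List[List[Pair]]:
    """Build one example pool per stage, easy to hard.

    Stage k holds the easiest k/num_stages share (rounded up) of the source
    pairs AND of the target pairs, so the self-supervised task is always
    trained on both domains at once. This schedule is an assumption.
    """
    src_sorted = sorted(source, key=difficulty)
    tgt_sorted = sorted(target, key=difficulty)
    stages = []
    for stage in range(1, num_stages + 1):
        # Integer ceiling of stage * len / num_stages.
        n_src = (stage * len(src_sorted) + num_stages - 1) // num_stages
        n_tgt = (stage * len(tgt_sorted) + num_stages - 1) // num_stages
        stages.append(src_sorted[:n_src] + tgt_sorted[:n_tgt])
    return stages


def joint_loss(
    supervised_src: float,
    selfsup_src: float,
    selfsup_tgt: float,
    weight: float = 1.0,
) -> float:
    """Assumed objective: labeled source loss plus weighted self-supervised
    losses from both domains (the target domain contributes no label loss)."""
    return supervised_src + weight * (selfsup_src + selfsup_tgt)
```

In a real system the difficulty score would more plausibly come from the self-supervised task itself, for example how strongly a sentence is corrupted, and the loss terms would be tensors from a shared encoder rather than plain floats.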

Electronic Supplementary Material

JCST-2103-11479-Highlights.pdf (298.1 KB)

Journal of Computer Science and Technology
Pages 1237-1249
Cite this article:
Bai G-R, Liu Q-B, He S-Z, et al. Unsupervised Domain Adaptation on Sentence Matching Through Self-Supervision. Journal of Computer Science and Technology, 2023, 38(6): 1237-1249. https://doi.org/10.1007/s11390-022-1479-0

Received: 30 March 2021
Accepted: 28 February 2022
Published: 15 November 2023
© Institute of Computing Technology, Chinese Academy of Sciences 2023