3
J. Devlin, M. W. Chang, K. Lee, and K. Toutanova, BERT: Pre-training of deep bidirectional transformers for language understanding, in Proc. 2019 Conf. of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Minneapolis, MN, USA, 2019, pp. 4171–4186.
4
A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, Attention is all you need, in Proc. 31st Int. Conf. on Neural Information Processing Systems, Long Beach, CA, USA, 2017, pp. 6000–6010.
6
T. N. Kipf and M. Welling, Semi-supervised classification with graph convolutional networks, presented at the 5th Int. Conf. on Learning Representations, Toulon, France, 2017.
8
Z. G. Wang, H. T. Mi, and A. Ittycheriah, Sentence similarity learning by lexical decomposition and composition, in Proc. COLING 2016, the 26th Int. Conf. on Computational Linguistics, Osaka, Japan, 2016, pp. 1340–1349.
9
M. Heilman and N. A. Smith, Tree edit models for recognizing textual entailments, paraphrases, and answers to questions, in Proc. Human Language Technologies: The 2010 Annu. Conf. of the North American Chapter of the Association for Computational Linguistics, Los Angeles, CA, USA, 2010, pp. 1011–1019.
11
Y. Q. Le, Z. J. Wang, Z. Quan, J. W. He, and B. Yao, ACV-tree: A new method for sentence similarity modeling, in Proc. Twenty-Seventh Int. Joint Conf. on Artificial Intelligence, Stockholm, Sweden, 2018, pp. 4137–4143.
19
S. Chopra, R. Hadsell, and Y. LeCun, Learning a similarity metric discriminatively, with application to face verification, in Proc. 2005 IEEE Computer Society Conf. on Computer Vision and Pattern Recognition, San Diego, CA, USA, 2005, pp. 539–546.
20
W. T. Yih, K. Toutanova, J. C. Platt, and C. Meek, Learning discriminative projections for text similarity measures, in Proc. Fifteenth Conf. on Computational Natural Language Learning, Portland, OR, USA, 2011, pp. 247–256.
21
R. Hadsell, S. Chopra, and Y. LeCun, Dimensionality reduction by learning an invariant mapping, in Proc. 2006 IEEE Computer Society Conf. on Computer Vision and Pattern Recognition, New York, NY, USA, 2006, pp. 1735–1742.