Open Access

Robust Unsupervised Discriminative Dependency Parsing

School of Information Science and Technology, ShanghaiTech University, Shanghai 201210, China.
Shanghai Institute of Microsystem and Information Technology, Chinese Academy of Sciences, Shanghai 200050, China.
University of Chinese Academy of Sciences, Beijing 100049, China.

Abstract

Discriminative approaches have proven effective for unsupervised dependency parsing. However, because of their strong representational power, they tend to converge quickly to poor local optima during unsupervised training. In this paper, we tackle this problem by drawing inspiration from robust deep learning techniques. Specifically, we propose robust unsupervised discriminative dependency parsing, a framework that integrates the concepts of denoising autoencoders and conditional random field autoencoders. Within this framework, we propose two types of sentence corruption mechanisms as well as a posterior regularization method for robust training. We evaluated our methods on eight languages, and the results show significant improvements over previous work.
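The abstract names two sentence-corruption mechanisms without describing them. As a minimal, hypothetical sketch of what denoising-style corruption could look like in this setting (token dropout and token substitution here are generic stand-ins, not necessarily the paper's actual mechanisms), consider:

```python
import random

# Hypothetical illustration of denoising-style sentence corruption.
# The paper's two actual corruption mechanisms are not specified in the
# abstract; dropout and substitution are generic stand-ins.

def corrupt_by_dropout(tokens, p=0.1):
    """Drop each token independently with probability p, keeping at least one."""
    kept = [t for t in tokens if random.random() >= p]
    return kept if kept else [random.choice(tokens)]

def corrupt_by_substitution(tokens, vocab, p=0.1):
    """Replace each token independently with a random vocabulary item."""
    return [random.choice(vocab) if random.random() < p else t for t in tokens]

if __name__ == "__main__":
    # Unsupervised dependency parsers are commonly trained on POS-tag sequences.
    sentence = ["DT", "NN", "VBZ", "DT", "JJ", "NN"]
    tag_vocab = ["DT", "NN", "VBZ", "JJ", "IN", "PRP"]
    print(corrupt_by_dropout(sentence, p=0.2))
    print(corrupt_by_substitution(sentence, tag_vocab, p=0.2))
```

In denoising-autoencoder-style training, the parser would take the corrupted sequence as input while being trained to stay consistent with the original sentence, which discourages the model from latching onto spurious local optima.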

Tsinghua Science and Technology
Pages 192–202
Cite this article:
Jiang Y, Cai J, Tu K. Robust Unsupervised Discriminative Dependency Parsing. Tsinghua Science and Technology, 2020, 25(2): 192-202. https://doi.org/10.26599/TST.2018.9010145


Received: 12 August 2018
Accepted: 24 December 2018
Published: 02 September 2019
© The author(s) 2020

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).
