Open Access

Robust Unsupervised Discriminative Dependency Parsing

School of Information Science and Technology, ShanghaiTech University, Shanghai 201210, China.
Shanghai Institute of Microsystem and Information Technology, Chinese Academy of Sciences, Shanghai 200050, China.
University of Chinese Academy of Sciences, Beijing 100049, China.

Abstract

Discriminative approaches have proven effective for unsupervised dependency parsing. However, because of their strong representational power, they tend to converge quickly to poor local optima during unsupervised training. In this paper, we tackle this problem by drawing inspiration from robust deep learning techniques. Specifically, we propose robust unsupervised discriminative dependency parsing, a framework that integrates the concepts of denoising autoencoders and conditional random field autoencoders. Within this framework, we propose two types of sentence corruption mechanisms as well as a posterior regularization method for robust training. We evaluated our methods on eight languages, and the results show significant improvements over previous work.
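The abstract names two sentence-corruption mechanisms without describing them. As a minimal, hypothetical sketch of what denoising-style corruption could look like in this setting (token dropout and token substitution here are generic stand-ins, not necessarily the paper's actual mechanisms), consider:

```python
import random

# Hypothetical illustration of denoising-style sentence corruption.
# The paper's two actual corruption mechanisms are not specified in the
# abstract; dropout and substitution are generic stand-ins.

def corrupt_by_dropout(tokens, p=0.1):
    """Drop each token independently with probability p, keeping at least one."""
    kept = [t for t in tokens if random.random() >= p]
    return kept if kept else [random.choice(tokens)]

def corrupt_by_substitution(tokens, vocab, p=0.1):
    """Replace each token independently with a random vocabulary item."""
    return [random.choice(vocab) if random.random() < p else t for t in tokens]

if __name__ == "__main__":
    # Unsupervised dependency parsers are commonly trained on POS-tag sequences.
    sentence = ["DT", "NN", "VBZ", "DT", "JJ", "NN"]
    tag_vocab = ["DT", "NN", "VBZ", "JJ", "IN", "PRP"]
    print(corrupt_by_dropout(sentence, p=0.2))
    print(corrupt_by_substitution(sentence, tag_vocab, p=0.2))
```

In denoising-autoencoder-style training, the parser would take the corrupted sequence as input while being trained to stay consistent with the original sentence, which discourages the model from latching onto spurious local optima.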

Tsinghua Science and Technology
Pages 192–202
Cite this article:
Jiang Y, Cai J, Tu K. Robust Unsupervised Discriminative Dependency Parsing. Tsinghua Science and Technology, 2020, 25(2): 192-202. https://doi.org/10.26599/TST.2018.9010145


Received: 12 August 2018
Accepted: 24 December 2018
Published: 02 September 2019
© The author(s) 2020

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).
