This paper focuses on end-to-end task-oriented dialogue systems, which jointly handle dialogue state tracking (DST) and response generation. Traditional methods usually adopt a supervised paradigm and learn DST from a manually labeled corpus. However, annotating such a corpus is costly and time-consuming, and the resulting data cannot cover the wide range of domains encountered in the real world. To solve this problem, we propose a multi-span prediction network (MSPN) that performs unsupervised DST for end-to-end task-oriented dialogue. Specifically, MSPN contains a novel split-merge copy mechanism that captures long-term dependencies in dialogues to automatically extract multiple text spans as keywords. Based on these keywords, MSPN uses a semantic-distance-based clustering approach to obtain the values of each slot. In addition, we propose an ontology-based reinforcement learning approach that uses the values of each slot to train MSPN to generate relevant values. Experimental results on single-domain and multi-domain task-oriented dialogue datasets show that MSPN achieves state-of-the-art performance with significant improvements. Furthermore, we construct MeDial, a new Chinese dialogue dataset in the low-resource medical domain, which further demonstrates the adaptability of MSPN.
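To make the clustering step concrete, here is a minimal sketch, assuming extracted keyword spans are embedded and then grouped by average-link agglomerative clustering under a cosine-distance threshold, with each resulting cluster read as the value set of one slot. The span list, the toy trigram-hashing embedder, and the 0.5 threshold are illustrative stand-ins, not the paper's actual components.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

def embed(span: str, dim: int = 64) -> np.ndarray:
    """Toy embedder: hashed character trigrams standing in for a real
    sentence encoder (hypothetical, for illustration only)."""
    v = np.zeros(dim)
    for i in range(max(len(span) - 2, 1)):
        v[hash(span[i:i + 3]) % dim] += 1.0
    return v / (np.linalg.norm(v) + 1e-8)

# Keyword spans as the copy mechanism might extract them (made up).
spans = ["cheap", "moderately priced", "expensive",
         "north part of town", "city centre", "south area"]
X = np.stack([embed(s) for s in spans])

# Average-link clustering on pairwise cosine distances; spans whose
# semantic distance falls below the threshold share a cluster.
Z = linkage(pdist(X, metric="cosine"), method="average")
labels = fcluster(Z, t=0.5, criterion="distance")

for cid in sorted(set(labels)):
    print(f"slot-value cluster {cid}:",
          [s for s, l in zip(spans, labels) if l == cid])
```

With a real sentence encoder in place of the toy embedder, price-related spans and location-related spans would fall into separate clusters, each of which can then be assigned to a slot.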
Although neural approaches have yielded state-of-the-art results on the sentence matching task, their performance inevitably drops dramatically when they are applied to unseen domains. To tackle this cross-domain challenge, we address unsupervised domain adaptation for sentence matching, where the goal is to achieve good performance on a target domain using only unlabeled target-domain data together with labeled source-domain data. Specifically, we propose to perform self-supervised tasks to achieve this goal. Unlike previous unsupervised domain adaptation methods, self-supervision can not only be specially designed to suit the characteristics of sentence matching but is also much easier to optimize. During training, each self-supervised task is performed on both domains simultaneously in an easy-to-hard curriculum, which gradually brings the two domains closer together along the direction relevant to the task. As a result, a classifier trained on the source domain is able to generalize to the unlabeled target domain. In total, we present three types of self-supervised tasks, and the results demonstrate their superiority. In addition, we study how different usages of the self-supervised tasks affect performance, which offers guidance on how to utilize self-supervision effectively in cross-domain scenarios.
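As a sketch of the curriculum described above, the loop below trains one self-supervised task on source and target examples simultaneously, expanding the training pool from easy to hard over epochs. The two-layer encoder, the random stand-in features with pseudo-labels, the norm-based difficulty proxy, and the linear pacing schedule are all illustrative assumptions rather than the paper's actual tasks or scheduler.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 2))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

def make_domain(n: int):
    x = torch.randn(n, 32)               # stand-in sentence-pair features
    y = torch.randint(0, 2, (n,))        # pseudo-labels the task supplies
    difficulty = x.norm(dim=1)           # hypothetical difficulty proxy
    return x, y, difficulty

source, target = make_domain(256), make_domain(256)

epochs = 5
for epoch in range(epochs):
    # Easy-to-hard pacing: train on the easiest fraction, growing each epoch.
    frac = (epoch + 1) / epochs
    running = 0.0
    for x, y, d in (source, target):     # the task runs on both domains
        k = max(1, int(frac * len(x)))
        idx = d.argsort()[:k]            # keep only the currently easy part
        loss = loss_fn(model(x[idx]), y[idx])
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        running += loss.item()
    print(f"epoch {epoch}: pool={frac:.0%}, task loss={running / 2:.3f}")
```

Because both domains contribute to every update of the shared model, the representation induced by the task is common to source and target, which is what allows the source-trained classifier to transfer.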
Dialogue state tracking (DST) leverages dialogue information to predict dialogue states, which are generally represented as slot-value pairs. However, previous work often struggles to predict values efficiently because it lacks a unified strategy for generating values from both the dialogue history and the predefined value set. Because they predict values only from the predefined value set, previous discriminative DST methods have difficulty handling unknown values. Previous generative DST methods determine values from mentions in the dialogue history, which makes it difficult for them to handle uncovered and non-pointable mentions. Moreover, existing generative DST methods usually ignore unlabeled instances and suffer from label noise, which limits mention generation and eventually hurts performance. In this paper, we propose a unified shared-private network (USPN) that generates values from both the dialogue history and the predefined values through a unified strategy. Specifically, USPN uses an encoder to construct a complete generative space for each slot and to discern shared information between slots through a shared-private architecture. Our model then predicts values from this generative space with a shared-private decoder. We further employ reinforcement learning to alleviate the label noise problem by learning indirect supervision from semantic relations between conversational words and predefined slot-value pairs. Experimental results on three public datasets show the effectiveness of USPN, which outperforms state-of-the-art baselines on both supervised and unsupervised DST tasks.
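The shared-private encoder can be pictured with a short sketch: one shared recurrent encoder captures information common to all slots, a private encoder per slot captures slot-specific information, and the two outputs are concatenated into that slot's representation. The GRU layers, layer sizes, slot names, and concatenation scheme here are illustrative assumptions, not USPN's exact configuration.

```python
import torch
import torch.nn as nn

class SharedPrivateEncoder(nn.Module):
    def __init__(self, slots, in_dim: int = 128, hidden: int = 64):
        super().__init__()
        # One encoder shared by every slot plus one private encoder per slot.
        self.shared = nn.GRU(in_dim, hidden, batch_first=True)
        self.private = nn.ModuleDict(
            {slot: nn.GRU(in_dim, hidden, batch_first=True) for slot in slots}
        )

    def forward(self, history: torch.Tensor, slot: str) -> torch.Tensor:
        # history: (batch, seq_len, in_dim) embedded dialogue history
        shared_out, _ = self.shared(history)
        private_out, _ = self.private[slot](history)
        # Each slot's representation mixes shared and slot-specific features.
        return torch.cat([shared_out, private_out], dim=-1)

slots = ["restaurant-food", "restaurant-area", "restaurant-pricerange"]
encoder = SharedPrivateEncoder(slots)
history = torch.randn(2, 10, 128)          # toy batch of embedded histories
rep = encoder(history, "restaurant-food")  # input to that slot's decoder
print(rep.shape)                           # torch.Size([2, 10, 128])
```

A decoder with the same shared-private split would then read from `rep` to either copy a mention from the history or emit a predefined value.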