Dialog
Dialogue is notoriously hard to evaluate. Past approaches have used human evaluation.
Dialog state tracking
Dialogue state tracking consists of determining, at each turn of a dialog, the full representation of what the user wants at that point. This state contains a goal constraint, a set of requested slots, and the user’s dialog act.
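As a rough illustration of the three-part state described above, the following minimal sketch models it as a small Python data class and updates it turn by turn. The names `DialogState` and `update_state`, the slot names, and the dialog-act labels are assumptions chosen for the example; they are not taken from any particular tracker or from the challenge's data format.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Set


@dataclass
class DialogState:
    """Hypothetical container for the state tracked at one turn."""
    # Goal constraint: slot -> value the user currently wants.
    goal_constraints: Dict[str, str] = field(default_factory=dict)
    # Slots the user asked about in this turn.
    requested_slots: Set[str] = field(default_factory=set)
    # The user's dialog act for this turn (illustrative label).
    dialog_act: str = ""


def update_state(prev: DialogState,
                 new_constraints: Dict[str, str],
                 requested: Iterable[str],
                 act: str) -> DialogState:
    """Carry goal constraints forward; requests and act are per-turn."""
    merged = {**prev.goal_constraints, **new_constraints}
    return DialogState(merged, set(requested), act)


# Two illustrative user turns (made-up values).
state = DialogState()
state = update_state(state, {"food": "italian"}, [], "inform")
state = update_state(state, {"area": "north"}, ["phone"], "request")
print(state)
```

In this sketch the goal constraints accumulate across turns while the requested slots and dialog act are reset each turn, which is one common way to read "the full representation of what the user wants at that point in the dialog".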
Second dialog state tracking challenge
For goal-oriented dialogue, the dataset of the second dialog state tracking challenge (DSTC2) is a common evaluation dataset. DSTC2 focuses on the restaurant search domain. Models are evaluated on accuracy, both for each individual slot (area, food, price) and for joint slot tracking.
Model | Area | Food | Price | Joint | Paper / Source | Code |
---|---|---|---|---|---|---|
Liu et al. (2018) | 90 | 84 | 92 | 72 | Dialogue Learning with Human Teaching and Feedback in End-to-End Trainable Task-Oriented Dialogue Systems | |
Neural belief tracker by Mrkšić et al. (2017) | 90 | 84 | 94 | 72 | Neural Belief Tracker: Data-Driven Dialogue State Tracking | |
RNN by Henderson et al. (2014) | 92 | 86 | 86 | 69 | Robust dialog state tracking using delexicalised recurrent neural networks and unsupervised adaptation | |
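
To make the distinction between individual and joint slot accuracy concrete, here is a minimal sketch that computes both from per-turn predictions. It assumes that joint accuracy counts a turn as correct only when every slot matches; the function name `slot_accuracies` and the toy data are hypothetical, and this is not the official DSTC2 scoring script.

```python
from typing import Dict, List, Sequence, Tuple

Turn = Dict[str, str]  # slot name -> predicted or reference value


def slot_accuracies(predictions: List[Turn],
                    references: List[Turn],
                    slots: Sequence[str]) -> Tuple[Dict[str, float], float]:
    """Return (per-slot accuracy, joint accuracy) over all turns."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references differ in length")
    n = len(references)
    if n == 0:
        raise ValueError("no turns to evaluate")

    pairs = list(zip(predictions, references))
    # Individual accuracy: fraction of turns where that slot is correct.
    per_slot = {
        s: sum(p.get(s) == r.get(s) for p, r in pairs) / n
        for s in slots
    }
    # Joint accuracy (assumed definition): all slots correct in a turn.
    joint = sum(
        all(p.get(s) == r.get(s) for s in slots) for p, r in pairs
    ) / n
    return per_slot, joint


# Toy data: four turns with made-up values.
references = [
    {"area": "north", "food": "italian", "price": "cheap"},
    {"area": "south", "food": "chinese", "price": "cheap"},
    {"area": "north", "food": "indian", "price": "expensive"},
    {"area": "east", "food": "thai", "price": "moderate"},
]
predictions = [
    {"area": "north", "food": "italian", "price": "cheap"},
    {"area": "south", "food": "chinese", "price": "moderate"},   # price wrong
    {"area": "north", "food": "italian", "price": "expensive"},  # food wrong
    {"area": "east", "food": "thai", "price": "moderate"},
]

per_slot, joint = slot_accuracies(predictions, references,
                                  ["area", "food", "price"])
print(per_slot)  # {'area': 1.0, 'food': 0.75, 'price': 0.75}
print(joint)     # 0.5
```

Because a single wrong slot fails the whole turn, the joint figure in this toy example is lower than every individual slot accuracy, which matches the pattern in the Joint column of the table.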