Dialog System Technology Challenges

The development phase has now started. Please register using the Registration Form to take part in the challenge.

DSTC7 has the following three tracks (please click the links to get more detailed information about each track):

Sentence Selection

Sentence Generation

Audio Visual Scene-aware dialog (AVSD)

Sentence Selection: Task Description, Data distribution Website

Sentence Generation: Task Description, Data distribution Website

Audio Visual Scene-aware dialog (AVSD): Task Description, Data distribution Website *

The Sentence Selection track consists of two sub-tasks. In both, the goal of the system is, given a partial conversation, to select the correct next utterances from a set of candidates, or to indicate that none of the proposed utterances is a good candidate. The objective is to push utterance classification towards real-world problems.
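The selection task described above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names, the word-overlap scorer, and the threshold value are hypothetical and are not part of the challenge data, baseline, or evaluation. The sketch only shows the shape of the problem: score each candidate against the partial conversation, return the best one, and return nothing when no candidate is good enough.

```python
from typing import Callable, List, Optional


def overlap_score(context: List[str], candidate: str) -> float:
    """Toy scorer (an assumption, not a challenge baseline):
    Jaccard word overlap between the conversation so far and a candidate."""
    context_words = set(" ".join(context).lower().split())
    candidate_words = set(candidate.lower().split())
    if not context_words or not candidate_words:
        return 0.0
    shared = context_words & candidate_words
    combined = context_words | candidate_words
    return len(shared) / len(combined)


def select_next_utterance(
    context: List[str],
    candidates: List[str],
    score: Callable[[List[str], str], float] = overlap_score,
    threshold: float = 0.1,  # hypothetical cut-off for "none is a good candidate"
) -> Optional[int]:
    """Return the index of the best-scoring candidate, or None when no
    candidate reaches the threshold."""
    if not candidates:
        return None
    scores = [score(context, candidate) for candidate in candidates]
    best = max(range(len(candidates)), key=scores.__getitem__)
    return best if scores[best] >= threshold else None


if __name__ == "__main__":
    partial_conversation = ["how do i install python on ubuntu"]
    options = [
        "use apt to install python on ubuntu",
        "the weather is nice today",
    ]
    print(select_next_utterance(partial_conversation, options))
    print(select_next_utterance(partial_conversation, options[1:]))
```

A real system would replace the toy scorer with a learned model; the explicit `None` return is one simple way to represent the "none of the proposed utterances" outcome.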

The Sentence Generation track proposes an end-to-end conversational modeling task in which the goal is to generate conversational responses that go beyond chitchat by injecting informational responses that are grounded in external knowledge (e.g., Foursquare, or possibly also Wikipedia, Goodreads, or TripAdvisor). There is no specific or predefined goal (e.g., booking a flight or reserving a table at a restaurant), so this task is not what is commonly called goal-oriented, task-oriented, or task-completion dialog. Instead, it targets human-human dialogs in which the underlying goal is often ill-defined or not known in advance, even at work and in other productive environments (e.g., brainstorming meetings).

The AVSD track aims to generate system responses in a dialog about an input video. Dialog systems need to understand scenes in order to have conversations with users about the objects and events around them. Audio Visual Scene-aware dialog systems could be developed by integrating state-of-the-art technologies from multiple research areas, including end-to-end dialog technologies, which generate system responses using models trained from dialog data; visual question answering (VQA) technologies, which answer questions about images using learned image features; and video description technologies, which describe or narrate videos using multimodal information.

* The data on Google Drive will be shared with the e-mail address you registered with DSTC7.