We have started the test phase. Registration is now closed.
DSTC7 has the following three tracks (please click the links to get more detailed information about each track):
Sentence Selection
Sentence Generation
Audio Visual Scene-Aware Dialog (AVSD)
This track consists of two sub-tasks. In both, the system's goal is, given a partial conversation, to select the correct next utterance from a set of candidates, or to indicate that none of the proposed utterances is a good candidate. The objective is to push utterance classification toward real-world problems.
This track proposes an end-to-end conversational modeling task in which the goal is to generate conversational responses that go beyond chitchat by injecting informational responses grounded in external knowledge (e.g., Foursquare, or possibly also Wikipedia, Goodreads, or TripAdvisor). There is no specific or predefined goal (e.g., booking a flight or reserving a table at a restaurant), so this task is not what is commonly called goal-oriented, task-oriented, or task-completion dialog; it instead targets human-human dialogs in which the underlying goal is often ill-defined or not known in advance, even in work and other productive environments (e.g., brainstorming meetings).
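The grounding idea can be sketched minimally. This is an assumption-laden illustration, not the track's system: the in-memory `KNOWLEDGE` dict stands in for an external source such as Foursquare tips, and a real system would use a trained encoder-decoder rather than templates.

```python
# Hypothetical sketch of knowledge-grounded response generation: retrieve a
# relevant fact from an external knowledge source (here a toy in-memory
# stand-in) and inject it into the response, falling back to chitchat when
# nothing relevant is found.

KNOWLEDGE = {
    "coffee": "The espresso bar on 5th Ave is praised for its single-origin beans.",
    "museum": "The modern art museum is free on Thursday evenings.",
}

def retrieve_fact(message):
    """Return the first fact whose key appears in the user message, if any."""
    words = message.lower().split()
    for key, fact in KNOWLEDGE.items():
        if key in words:
            return fact
    return None

def respond(message):
    fact = retrieve_fact(message)
    if fact is None:
        return "Tell me more about what you are looking for."  # chitchat fallback
    return f"Here is something that might help: {fact}"

print(respond("any good coffee nearby?"))
```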
This task aims to generate system responses in a dialog about an input video. Dialog systems need to understand scenes in order to converse with users about the objects and events around them. Audio Visual Scene-Aware Dialog systems could be developed by integrating state-of-the-art technologies from multiple research areas: end-to-end dialog technologies, which generate system responses using models trained on dialog data; visual question answering (VQA) technologies, which answer questions about images using learned image features; and video description technologies, in which videos are described or narrated using multimodal information.
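The integration of those research areas can be sketched as multimodal fusion. This is a hypothetical sketch, not the AVSD baseline: the random vectors and dimensions are placeholders for real pretrained dialog, video, and audio encoders, and the concatenated feature would feed a response decoder.

```python
# Hypothetical sketch of multimodal fusion in an AVSD system: features from the
# dialog history, the video frames, and the audio track are L2-normalized and
# concatenated into one joint representation for a downstream response decoder.

import numpy as np

rng = np.random.default_rng(0)

dialog_feat = rng.standard_normal(128)  # placeholder: dialog-history encoder output
video_feat = rng.standard_normal(512)   # placeholder: pooled video-frame features
audio_feat = rng.standard_normal(64)    # placeholder: pooled audio features

def fuse(*features):
    """Normalize each modality and concatenate into one joint feature vector."""
    normalized = [f / (np.linalg.norm(f) + 1e-8) for f in features]
    return np.concatenate(normalized)

joint = fuse(dialog_feat, video_feat, audio_feat)
print(joint.shape)  # (704,)
```

Per-modality normalization before concatenation is one simple way to keep a high-dimensional modality (here, video) from dominating the joint representation.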