CoQA is a large-scale dataset for building Conversational Question Answering systems. The goal of the CoQA challenge is to measure the ability of machines to understand a text passage and answer a series of interconnected questions that appear in a conversation. CoQA is pronounced "coca". See the CoQA paper for details.
CoQA contains 127,000+ questions with answers collected from 8,000+ conversations. Each conversation is collected by pairing two crowdworkers to chat about a passage in the form of questions and answers. The unique features of CoQA include: 1) the questions are conversational; 2) the answers can be free-form text; 3) each answer also comes with an evidence subsequence highlighted in the passage; and 4) the passages are collected from seven diverse domains. CoQA exhibits many challenging phenomena absent from existing reading comprehension datasets, e.g., coreference and pragmatic reasoning.
Browse the examples in CoQA:
Download a copy of the dataset in JSON format:
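Once downloaded, the dataset can be read with any JSON library. The sketch below iterates over a conversation's question/answer turns; the field names (`story`, `questions`, `answers`, `turn_id`, `input_text`, `span_start`/`span_end`) reflect the v1.0 release schema as we understand it, and the sample conversation is a made-up stand-in for the real file:

```python
import json

# A tiny conversation in the (assumed) CoQA v1.0 schema: each entry in "data"
# holds a passage ("story") plus parallel "questions" and "answers" lists
# linked by "turn_id".
sample = {
    "version": "1.0",
    "data": [
        {
            "id": "example-1",
            "source": "wikipedia",
            "story": "CoQA is a conversational question answering dataset.",
            "questions": [
                {"turn_id": 1, "input_text": "What kind of dataset is CoQA?"}
            ],
            "answers": [
                {
                    "turn_id": 1,
                    "input_text": "conversational question answering",
                    "span_start": 10,
                    "span_end": 51,
                    "span_text": "conversational question answering dataset",
                }
            ],
        }
    ],
}

# In practice you would load the real file instead:
# with open("coqa-dev-v1.0.json") as f:
#     sample = json.load(f)

for passage in sample["data"]:
    # Index answers by turn so each question can be paired with its answer.
    answers = {a["turn_id"]: a for a in passage["answers"]}
    for q in passage["questions"]:
        a = answers[q["turn_id"]]
        print(f"Q{q['turn_id']}: {q['input_text']}")
        print(f"A{q['turn_id']}: {a['input_text']} (evidence: {a['span_text']!r})")
```

Note that the free-form answer (`input_text`) need not match the highlighted evidence span (`span_text`) verbatim; the span is the rationale, the answer is what the annotator typed.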
To evaluate your models, use the official evaluation script. To run the evaluation, use
python evaluate-v1.0.py --data-file <path_to_dev-v1.0.json> --pred-file <path_to_predictions>.
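The predictions file is a JSON file with one entry per question turn. The exact format is defined by the official script, so check `evaluate-v1.0.py` itself; the sketch below assumes the commonly described shape, a list of objects with `id`, `turn_id`, and `answer` fields (the ids here are hypothetical):

```python
import json

# One predicted answer string per (conversation id, turn_id) pair.
# The field names are an assumption; verify against evaluate-v1.0.py.
predictions = [
    {"id": "example-1", "turn_id": 1, "answer": "conversational question answering"},
    {"id": "example-1", "turn_id": 2, "answer": "unknown"},
]

with open("predictions.json", "w") as f:
    json.dump(predictions, f)

# Then score against the dev set:
#   python evaluate-v1.0.py --data-file dev-v1.0.json --pred-file predictions.json
print(f"wrote {len(predictions)} predictions")
```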
Once you are satisfied with your model's performance on the dev set, you can submit it to get official scores on the test sets. We have two test sets: an in-domain set, which covers the domains present in the training and dev sets, and an out-of-domain set, which covers unseen domains (see the paper for more details). To preserve the integrity of the test results, we do not release the test set to the public. Follow this tutorial on how to submit your model for an official evaluation: Submission Tutorial
CoQA contains passages from seven domains. We make five of these public under the following licenses:
We thank the SQuAD team for allowing us to use their code and templates for generating this website.
Rank | Date | Model | Institution
1 | Oct 06, 2018 | FlowQA (single model) | Allen Institute for Artificial Intelligence (https://arxiv.org/abs/1810.06683)
2 | Sep 27, 2018 | BiDAF++ (single model) | Allen Institute for Artificial Intelligence (https://arxiv.org/abs/1809.10735)
3 | Aug 21, 2018 | DrQA + seq2seq with copy attention (single model) | Stanford University (Reddy et al. '18)
4 | Aug 21, 2018 | Vanilla DrQA (single model) | Stanford University (Reddy et al. '18)