CoQA

A Conversational Question Answering Challenge

What is CoQA?

CoQA is a large-scale dataset for building Conversational Question Answering systems. The goal of the CoQA challenge is to measure the ability of machines to understand a text passage and answer a series of interconnected questions that appear in a conversation. CoQA is pronounced as coca.

CoQA paper

CoQA contains 127,000+ questions with answers collected from 8,000+ conversations. Each conversation is collected by pairing two crowdworkers to chat about a passage in the form of questions and answers. The unique features of CoQA are: 1) the questions are conversational; 2) the answers can be free-form text; 3) each answer also comes with an evidence subsequence highlighted in the passage; and 4) the passages are collected from seven diverse domains. CoQA exhibits many challenging phenomena not present in existing reading comprehension datasets, e.g., coreference and pragmatic reasoning.
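
As a rough illustration of the first three features, the sketch below builds one hypothetical conversation record in Python. The passage, questions, and answers are invented for illustration, and the field names (story, questions, answers, input_text, span_text, span_start, span_end, turn_id) are assumptions about the layout of the released JSON, so check them against the downloaded file.

```python
# Hypothetical example: the passage and turns are invented, and the field
# names are assumptions about the released JSON, not taken from this page.
story = "Anna has a small dog named Rex. She walks him every morning."


def make_answer(turn_id, free_form, evidence):
    """Pair a free-form answer with its evidence span in the passage."""
    start = story.index(evidence)
    return {
        "turn_id": turn_id,
        "input_text": free_form,   # free-form answer text
        "span_text": evidence,     # evidence highlighted in the passage
        "span_start": start,
        "span_end": start + len(evidence),
    }


conversation = {
    "id": "example-0001",  # made-up identifier
    "story": story,
    "questions": [
        {"turn_id": 1, "input_text": "Who has a dog?"},
        # "she" and "him" can only be resolved using the previous turn.
        {"turn_id": 2, "input_text": "When does she walk him?"},
    ],
    "answers": [
        make_answer(1, "Anna", "Anna has a small dog named Rex"),
        make_answer(2, "Every morning", "She walks him every morning"),
    ],
}

# Walk the conversation turn by turn and show the evidence for each answer.
for question, answer in zip(conversation["questions"], conversation["answers"]):
    span = story[answer["span_start"]:answer["span_end"]]
    print(question["turn_id"], question["input_text"], "->",
          answer["input_text"], "| evidence:", span)
```

The second question shows why the turns are interconnected: it cannot be answered without resolving "she" and "him" against the first turn.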

Download

Browse the examples in CoQA:

Download a copy of the dataset in JSON format:


Evaluation

To evaluate your models, use the official evaluation script. Run it as follows:

    python evaluate-v1.0.py --data-file <path_to_dev-v1.0.json> --pred-file <path_to_predictions>
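
A minimal sketch of producing a predictions file for the command above. It assumes the dev file holds a top-level "data" list whose items carry "id", "story", and "questions" (each with "turn_id" and "input_text"), and that the evaluation script expects a JSON list of objects with "id", "turn_id", and "answer". None of these formats is stated on this page, so verify them against the official script before relying on this. The names write_predictions and answer_fn are hypothetical.

```python
import json


def write_predictions(dev_path, pred_path, answer_fn):
    """Run answer_fn over every question and save the answers as JSON.

    answer_fn(story, history, question) -> str is a hypothetical model hook;
    history is the list of (question, predicted answer) pairs so far.
    The input and output layouts are assumptions, not documented here.
    """
    with open(dev_path, encoding="utf-8") as f:
        dataset = json.load(f)

    predictions = []
    for conv in dataset["data"]:
        history = []  # reset at the start of each conversation
        for q in conv["questions"]:
            answer = answer_fn(conv["story"], history, q["input_text"])
            predictions.append(
                {"id": conv["id"], "turn_id": q["turn_id"], "answer": answer}
            )
            history.append((q["input_text"], answer))

    with open(pred_path, "w", encoding="utf-8") as f:
        json.dump(predictions, f)
    return len(predictions)
```

If the assumed formats hold, the resulting file can be passed as --pred-file to the evaluation command.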

Once you are satisfied with your model's performance on the dev set, you can submit it to get official scores on the test sets. We have two test sets: an in-domain set, which covers the domains present in the training and dev sets, and an out-of-domain set, which covers unseen domains (see the paper for more details). To preserve the integrity of the test results, we do not release the test sets to the public. Follow this tutorial on how to submit your model for an official evaluation:

Submission Tutorial

License

CoQA contains passages from seven domains. We make five of these public under the following licenses:

  • Literature and Wikipedia passages are shared under the CC BY-SA 4.0 license.
  • Children's stories are collected from MCTest, which comes with the MSR-LA license.
  • Middle/High school exam passages are collected from RACE, which comes with its own license.
  • News passages are collected from the DeepMind CNN dataset, which comes with the Apache license.

Questions?

Ask us questions at our Google group or at siva.reddy@mila.quebec or danqic@cs.princeton.edu.

Acknowledgements

We thank the SQuAD team for allowing us to use their code and templates for generating this website.

Leaderboard

| Rank | Date | Model | Institution | Reference | In-domain | Out-of-domain | Overall |
|------|------|-------|-------------|-----------|-----------|---------------|---------|
| | | Human Performance | Stanford University | (Reddy & Chen et al. TACL '19) | 89.4 | 87.4 | 88.8 |
| 1 | Sep 05, 2019 | RoBERTa + AT + KD (ensemble) | Zhuiyi Technology | https://arxiv.org/abs/1909.10772 | 91.4 | 89.2 | 90.7 |
| 1 | Apr 22, 2020 | TR-MT (ensemble) | WeChatAI | | 91.5 | 88.8 | 90.7 |
| 2 | Sep 05, 2019 | RoBERTa + AT + KD (single model) | Zhuiyi Technology | https://arxiv.org/abs/1909.10772 | 90.9 | 89.2 | 90.4 |
| 3 | Jan 01, 2020 | TR-MT (ensemble) | WeChatAI | | 91.1 | 87.9 | 90.2 |
| 4 | Mar 29, 2019 | Google SQuAD 2.0 + MMFT (ensemble) | MSRA + SDRG | | 89.9 | 88.0 | 89.4 |
| 5 | Dec 18, 2019 | TR-MT (single model) | WeChatAI | | 90.4 | 86.8 | 89.3 |
| 6 | Sep 13, 2019 | XLNet + Augmentation (single model) | Xiaoming | https://github.com/stevezheng23/xlnet_extension_tf | 89.9 | 86.9 | 89.0 |
| 7 | Mar 29, 2019 | Google SQuAD 2.0 + MMFT (single model) | MSRA + SDRG | | 88.5 | 86.0 | 87.8 |
| 7 | Mar 29, 2019 | ConvBERT (ensemble) | Joint Laboratory of HIT and iFLYTEK Research | | 88.7 | 85.4 | 87.8 |
| 8 | Jan 25, 2019 | BERT + MMFT + ADA (ensemble) | Microsoft Research Asia | | 87.5 | 85.3 | 86.8 |
| 8 | Mar 28, 2019 | ConvBERT (single model) | Joint Laboratory of HIT and iFLYTEK Research | | 87.7 | 84.6 | 86.8 |
| 9 | Jan 21, 2019 | BERT + MMFT + ADA (single model) | Microsoft Research Asia | | 86.4 | 81.9 | 85.0 |
| 10 | Apr 28, 2020 | XLNet + MMFT + ADA (single model) | NEUKG | | 85.7 | 81.7 | 84.6 |
| 11 | Aug 26, 2019 | BERT + AttentionFusionNet (single model) | Beijing Kingsoft AI Lab | | 85.4 | 77.3 | 83.0 |
| 12 | Jan 03, 2019 | BERT + Answer Verification (single model) | Sogou Search AI Group | https://github.com/sogou/SMRCToolkit | 83.8 | 80.2 | 82.8 |
| 13 | Jan 06, 2019 | BERT with History Augmented Query (single model) | Fudan University NLP Lab | | 82.7 | 78.6 | 81.5 |
| 14 | Feb 01, 2019 | BERT Large Finetuned Baseline (single model) | Anonymous | | 82.6 | 78.4 | 81.4 |
| 15 | Jan 21, 2019 | BERT Large Augmented (single model) | Microsoft Dynamics 365 AI Research | | 82.5 | 77.6 | 81.1 |
| 16 | Dec 13, 2018 | D-AoA + BERT (single model) | Joint Laboratory of HIT and iFLYTEK Research | | 81.4 | 77.3 | 80.2 |
| 17 | Aug 01, 2019 | BERT Augmented + AoA (single model) | Netease Games AI Lab | | 81.1 | 77.4 | 80.0 |
| 18 | Mar 10, 2019 | CNet (single model) | Anonymous | | 80.9 | 77.1 | 79.8 |
| 19 | Nov 29, 2018 | SDNet (ensemble) | Microsoft Speech and Dialogue Research Group | https://github.com/Microsoft/SDNet | 80.7 | 75.9 | 79.3 |
| 20 | Feb 22, 2019 | CQANet (single model) | Nanjing University | | 80.2 | 76.5 | 79.1 |
| 21 | May 09, 2019 | CANet (single model) | Northwestern Polytechnical University | | 80.1 | 75.7 | 78.9 |
| 22 | Apr 14, 2019 | BERT w/ 2-context (single model) | NTT Media Intelligence Laboratories | https://arxiv.org/pdf/1905.12848 | 79.8 | 75.9 | 78.7 |
| 22 | Jul 14, 2019 | BERT Finetuned Baseline (single model) | | | 79.7 | 76.3 | 78.7 |
| 23 | May 06, 2020 | Bert-MultiChannelFlow (single model) | SIAT-NLP | | 79.4 | 75.3 | 78.2 |
| 24 | Dec 30, 2018 | BERT-base finetune (single model) | Tsinghua University CoAI Lab | | 79.8 | 74.1 | 78.1 |
| 25 | Apr 19, 2019 | Bert-FlowDelta (single model) | National Taiwan University, MiuLab | https://arxiv.org/abs/1908.05117 | 79.2 | 74.1 | 77.7 |
| 26 | Feb 28, 2019 | GraphFlow (single model) | RPI and IBM Research | https://arxiv.org/pdf/1908.00059.pdf | 78.4 | 74.5 | 77.3 |
| 27 | Nov 26, 2018 | SDNet (single model) | Microsoft Speech and Dialogue Research Group | https://github.com/Microsoft/SDNet | 78.0 | 73.1 | 76.6 |
| 28 | Aug 29, 2019 | Flow Framework (single model) | SIAT NLP Group | | 77.0 | 73.1 | 75.8 |
| 29 | Oct 06, 2018 | FlowQA (single model) | Allen Institute for Artificial Intelligence | https://arxiv.org/abs/1810.06683 | 76.3 | 71.8 | 75.0 |
| 30 | Jul 17, 2019 | HisFurC + BERT (single model) | | | 76.0 | 70.4 | 74.4 |
| 31 | Jan 14, 2019 | RNet + PGNet + BERT (single model) | Nanjing University | | 74.7 | 70.0 | 73.3 |
| 32 | Feb 01, 2019 | XyzNet (single model) | Beijing Normal University | | 74.3 | 68.8 | 72.7 |
| 33 | Dec 30, 2018 | DrQA + marker features (single model) | Stanford University | | 71.6 | 65.1 | 69.7 |
| 34 | Dec 10, 2018 | BiDAF++ (single model) | Beijing University of Posts and Telecommunications | | 71.1 | 65.5 | 69.5 |
| 35 | Sep 27, 2018 | BiDAF++ (single model) | Allen Institute for Artificial Intelligence | https://arxiv.org/abs/1809.10735 | 69.4 | 63.8 | 67.8 |
| 36 | Nov 22, 2018 | Bert Base Augmented (single model) | Fudan University NLP Lab | | 68.4 | 61.8 | 66.5 |
| 37 | Dec 18, 2018 | RNet_DotAtt + seq2seq with copy attention (single model) | University of Science and Technology of China | | 68.1 | 62.3 | 66.4 |
| 38 | Dec 30, 2018 | Simplified BiDAF++ (single model) | Peking University | | 68.7 | 60.5 | 66.3 |
| 39 | Aug 21, 2018 | DrQA + seq2seq with copy attention (single model) | Stanford University | https://arxiv.org/abs/1808.07042 | 67.0 | 60.4 | 65.1 |
| 40 | Aug 21, 2018 | Vanilla DrQA (single model) | Stanford University | https://arxiv.org/abs/1808.07042 | 54.5 | 47.9 | 52.6 |