Coreference Resolution



The CorefAnnotator finds mentions of the same entity in a text, such as when “Theresa May” and “she” refer to the same person. The annotator implements both pronominal and nominal coreference resolution. The entire coreference graph (with head words of mentions as nodes) is saved as a CorefChainAnnotation.
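Conceptually, each coreference chain is a connected component of that graph. The following plain-Java sketch does not use the CoreNLP API. It shows one way pairwise links between mentions could be grouped into chains with a union-find structure. The class `CorefChainSketch`, the integer mention indices, and the hand-picked links in `main` are all hypothetical. In CoreNLP itself you would read the finished chains from `CorefChainAnnotation`.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: group pairwise coreference links into chains (connected components). */
final class CorefChainSketch {
  private final int[] parent;

  CorefChainSketch(int numMentions) {
    parent = new int[numMentions];
    for (int i = 0; i < numMentions; i++) {
      parent[i] = i; // every mention starts in its own chain
    }
  }

  /** Returns the representative mention of m's chain, halving the path as it walks up. */
  int find(int m) {
    while (parent[m] != m) {
      parent[m] = parent[parent[m]];
      m = parent[m];
    }
    return m;
  }

  /** Records a coreference link (a graph edge) between mentions a and b. */
  void link(int a, int b) {
    parent[find(a)] = find(b);
  }

  /** Groups mention strings by chain, ordered by each chain's first mention. */
  List<List<String>> chains(List<String> mentions) {
    Map<Integer, List<String>> byRoot = new LinkedHashMap<>();
    for (int i = 0; i < mentions.size(); i++) {
      byRoot.computeIfAbsent(find(i), k -> new ArrayList<>()).add(mentions.get(i));
    }
    return new ArrayList<>(byRoot.values());
  }

  public static void main(String[] args) {
    List<String> mentions = List.of("Barack Obama", "Hawaii", "He", "the president", "Obama");
    CorefChainSketch graph = new CorefChainSketch(mentions.size());
    graph.link(2, 0); // "He" -> "Barack Obama"
    graph.link(3, 2); // "the president" -> "He"
    graph.link(4, 0); // "Obama" -> "Barack Obama"
    System.out.println(graph.chains(mentions));
    // prints [[Barack Obama, He, the president, Obama], [Hawaii]]
  }
}
```

Union-find keeps each merge close to constant time, which suits a system that adds links one at a time.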


There are three different coreference systems available in CoreNLP.

  • Deterministic: Fast rule-based coreference resolution for English and Chinese.

  • Statistical: Machine-learning-based coreference resolution for English. Unlike the other systems, this one only requires dependency parses, which are faster to produce than constituency parses.

  • Neural: Most accurate but slow neural-network-based coreference resolution for English and Chinese.

(We also briefly had a fourth system, the hybrid or hcoref system, but it is no longer supported and its models are no longer provided in current releases.)

The following table gives an overview of each system's performance.

  • The F1 scores are on the CoNLL 2012 evaluation data. They are lower than those reported in the associated papers because these models are designed for general-purpose use rather than for maximizing the CoNLL score (see Running on CoNLL 2012).

  • The speed measurements show the average time for processing a document in the CoNLL 2012 test set using a 2013 Macbook Pro with a 2.4 GHz Intel Core i7 processor. Preprocessing speed measures the time required for POS tagging, syntax parsing, mention detection, etc., while coref speed refers to the time spent by the coreference system.

System | Language | Preprocessing Time | Coref Time | Total Time | F1 Score
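For reference, the CoNLL 2012 F1 used in coreference evaluation is conventionally the unweighted average of the MUC, B³, and CEAF_e F1 scores. The sketch below (hypothetical class `ConllScore`) only shows that arithmetic. The precision and recall inputs in `main` are made-up numbers, not measurements of any CoreNLP system, and computing the underlying metrics is left to the official scorer.

```java
/** Sketch: the CoNLL F1 is the mean of the MUC, B-cubed, and CEAF-e F1 scores. */
final class ConllScore {
  /** Harmonic mean of precision and recall (both in [0, 1]); 0 when both are 0. */
  static double f1(double precision, double recall) {
    return (precision + recall == 0.0) ? 0.0 : 2.0 * precision * recall / (precision + recall);
  }

  /** Unweighted average of the three metric-level F1 scores. */
  static double conllF1(double mucF1, double bCubedF1, double ceafEF1) {
    return (mucF1 + bCubedF1 + ceafEF1) / 3.0;
  }

  public static void main(String[] args) {
    // Made-up precision/recall values for illustration only (not CoreNLP results)
    double muc = f1(0.70, 0.60);
    double bCubed = f1(0.60, 0.50);
    double ceafE = f1(0.50, 0.50);
    System.out.printf("CoNLL F1 = %.4f%n", conllF1(muc, bCubed, ceafE));
  }
}
```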

Command Line Usage

There are example properties files for the coreference systems in edu/stanford/nlp/coref/properties. The files are named [system]-[language].properties. For example, to run the deterministic system on Chinese:

java -cp stanford-corenlp-4.0.0.jar:stanford-chinese-corenlp-models-4.0.0.jar:* edu.stanford.nlp.pipeline.StanfordCoreNLP -props edu/stanford/nlp/coref/properties/ -file example_file.txt

Alternatively, the properties can be set manually. For example, to run the neural system on English:

java -cp stanford-corenlp-4.0.0.jar:stanford-corenlp-4.0.0-models.jar:* edu.stanford.nlp.pipeline.StanfordCoreNLP -annotators tokenize,pos,lemma,ner,parse,coref -coref.algorithm neural -file example_file.txt
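When building a pipeline from Java, each `-key value` flag on the command line corresponds to a property with the same key. Here is a minimal sketch, using only `java.util.Properties`, of the settings in the neural command above. The class name `NeuralCorefProps` is hypothetical.

```java
import java.util.Properties;

/** Sketch: the flags of the neural command, expressed as java.util.Properties. */
final class NeuralCorefProps {
  static Properties englishNeural() {
    Properties props = new Properties();
    // Same as "-annotators tokenize,pos,lemma,ner,parse,coref" on the command line
    props.setProperty("annotators", "tokenize,pos,lemma,ner,parse,coref");
    // Same as "-coref.algorithm neural"
    props.setProperty("coref.algorithm", "neural");
    return props;
  }

  public static void main(String[] args) {
    Properties props = englishNeural();
    // With CoreNLP on the classpath, these props would be passed to new StanfordCoreNLP(props)
    props.stringPropertyNames().stream().sorted()
        .forEach(k -> System.out.println(k + " = " + props.getProperty(k)));
  }
}
```

Passing the resulting object to `new StanfordCoreNLP(props)`, as the API example in this section does, should give the same configuration as the command line.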

See below for further options.


The following example shows how to access coref and mention information from an Annotation:

import java.util.Properties;

import edu.stanford.nlp.coref.CorefCoreAnnotations;
import edu.stanford.nlp.coref.data.CorefChain;
import edu.stanford.nlp.coref.data.Mention;
import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.util.CoreMap;

public class CorefExample {
  public static void main(String[] args) throws Exception {
    Annotation document = new Annotation("Barack Obama was born in Hawaii.  He is the president. Obama was elected in 2008.");
    Properties props = new Properties();
    props.setProperty("annotators", "tokenize,pos,lemma,ner,parse,coref");
    StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
    // Run all annotators; without this call the document has no coref annotations
    pipeline.annotate(document);
    System.out.println("coref chains");
    for (CorefChain cc : document.get(CorefCoreAnnotations.CorefChainAnnotation.class).values()) {
      System.out.println("\t" + cc);
    }
    // Mentions found by the coref annotator, listed sentence by sentence
    System.out.println("mentions");
    for (CoreMap sentence : document.get(CoreAnnotations.SentencesAnnotation.class)) {
      for (Mention m : sentence.get(CorefCoreAnnotations.CorefMentionsAnnotation.class)) {
        System.out.println("\t" + m);
      }
    }
  }
}

More Details

Deterministic System

This is a multi-pass sieve rule-based coreference system. See the Stanford Deterministic Coreference Resolution System page for usage and more details.
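To illustrate the multi-pass sieve idea (apply high-precision rules first, then progressively looser ones, with each pass merging clusters), here is a toy sketch. Its two sieves, exact string match and a crude last-token "head" match, are hypothetical simplifications and not CoreNLP's actual rules. The class name `ToySieve` is also made up.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.function.BiPredicate;

/** Toy multi-pass sieve with hypothetical rules; NOT CoreNLP's actual sieves. */
final class ToySieve {
  /** Pass 1 (high precision): case-insensitive exact string match. */
  static boolean exactMatch(String antecedent, String mention) {
    return antecedent.equalsIgnoreCase(mention);
  }

  /** Pass 2 (lower precision): crude head match on the last token of each mention. */
  static boolean headMatch(String antecedent, String mention) {
    return lastToken(antecedent).equals(lastToken(mention));
  }

  /** Last whitespace-separated token, lowercased; assumes a non-blank mention string. */
  private static String lastToken(String s) {
    String[] tokens = s.toLowerCase(Locale.ROOT).split("\\s+");
    return tokens[tokens.length - 1];
  }

  /** Applies the sieves in order and returns a cluster id for each mention. */
  static int[] resolve(List<String> mentions, List<BiPredicate<String, String>> sieves) {
    int[] cluster = new int[mentions.size()];
    for (int i = 0; i < cluster.length; i++) {
      cluster[i] = i; // every mention starts as a singleton cluster
    }
    for (BiPredicate<String, String> sieve : sieves) { // one full pass per sieve
      for (int i = 1; i < mentions.size(); i++) {
        for (int j = i - 1; j >= 0; j--) { // closest preceding mention first
          if (sieve.test(mentions.get(j), mentions.get(i))) {
            merge(cluster, cluster[i], cluster[j]);
            break; // take the first antecedent this sieve accepts
          }
        }
      }
    }
    return cluster;
  }

  /** Relabels every mention in cluster {@code from} as cluster {@code to}. */
  private static void merge(int[] cluster, int from, int to) {
    for (int k = 0; k < cluster.length; k++) {
      if (cluster[k] == from) {
        cluster[k] = to;
      }
    }
  }

  public static void main(String[] args) {
    List<String> mentions = List.of("Barack Obama", "Hawaii", "Obama", "barack obama");
    List<BiPredicate<String, String>> sieves = List.of(ToySieve::exactMatch, ToySieve::headMatch);
    System.out.println(Arrays.toString(resolve(mentions, sieves))); // prints [0, 1, 0, 0]
  }
}
```

Ordering the passes by precision means the riskier head-match rule only sees mentions that the safer rule left unresolved or that it can attach to an already-trusted cluster.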

Statistical System

This is a mention-ranking model using a large set of features. It operates by iterating through each mention in the document, possibly adding a coreference link between the current one and a preceding mention at each step. Some relevant options:

  • coref.maxMentionDistance: How many mentions back to look when considering possible antecedents of the current mention. Decreasing the value will cause the system to run faster but less accurately. The default value is 50.

  • coref.maxMentionDistanceWithStringMatch: The system will also consider linking the current mention to a preceding mention more than coref.maxMentionDistance mentions away if the two share a noun or proper noun. In that case, it looks back up to coref.maxMentionDistanceWithStringMatch mentions instead. The default value is 500.

  • coref.statistical.pairwiseScoreThresholds: A number between 0 and 1 determining how greedy the model is about making coreference decisions. A value of 0 causes the system to add no coreference links, and a value of 1 causes the system to link every pair of mentions, combining them all into a single coreference cluster. The default value is 0.35. The value can also be a comma-separated list of 4 numbers, in which case there are separate thresholds for when both mentions are pronouns, only the first mention is a pronoun, only the last mention is a pronoun, and neither mention is a pronoun.
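The two distance options and the four-way threshold list can be sketched as follows. This is a hypothetical illustration, not CoreNLP code. The class `AntecedentWindow`, the representation of each mention as a set of its noun and proper-noun strings, and the reading of "first" as the earlier (antecedent) mention and "last" as the current mention are all assumptions. The sketch does not reproduce the scoring model or how a score is compared against its threshold.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of the antecedent window and threshold choice; not CoreNLP code. */
final class AntecedentWindow {
  /**
   * Returns candidate antecedent indices for mention i, nearest first.
   * nouns.get(k) holds the noun/proper-noun strings of mention k (assumed representation).
   */
  static List<Integer> candidates(List<Set<String>> nouns, int i,
                                  int maxMentionDistance, int maxMentionDistanceWithStringMatch) {
    List<Integer> result = new ArrayList<>();
    int farthest = Math.max(maxMentionDistance, maxMentionDistanceWithStringMatch);
    for (int j = i - 1; j >= 0 && i - j <= farthest; j--) {
      boolean near = i - j <= maxMentionDistance;
      // Beyond maxMentionDistance, only mentions sharing a noun are considered
      boolean sharesNoun = !Collections.disjoint(nouns.get(i), nouns.get(j));
      if (near || sharesNoun) {
        result.add(j);
      }
    }
    return result;
  }

  /**
   * Index into a 4-value pairwiseScoreThresholds list, in the documented order:
   * both pronouns, only the first (earlier) mention, only the last (current) mention, neither.
   */
  static int thresholdIndex(boolean antecedentIsPronoun, boolean mentionIsPronoun) {
    if (antecedentIsPronoun && mentionIsPronoun) return 0;
    if (antecedentIsPronoun) return 1;
    if (mentionIsPronoun) return 2;
    return 3;
  }

  public static void main(String[] args) {
    List<Set<String>> nouns = List.of(Set.of("obama"), Set.of("hawaii"), Set.<String>of(), Set.of("obama"));
    // Window of 1 mention, extended to 3 mentions when a noun is shared
    System.out.println(candidates(nouns, 3, 1, 3)); // prints [2, 0]
  }
}
```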

Neural System

This is a neural-network-based mention-ranking model. Some relevant options:

  • coref.maxMentionDistance and coref.maxMentionDistanceWithStringMatch: See above.

  • coref.neural.greedyness: A number between 0 and 1 determining how greedy the model is about making coreference decisions (more greedy means more coreference links). The default value is 0.5.
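A mention-ranking model typically scores every candidate antecedent of the current mention, plus an option to start a new entity, and picks the best. The sketch below shows that decision with a greediness knob. The class `MentionRanking` is hypothetical, and so is the linear bias (0.5 - greediness) added to the new-entity score: it is chosen only so that larger values favor linking, as the option description says, and it is not CoreNLP's actual formula.

```java
/** Toy mention-ranking decision; the greediness bias is hypothetical, not CoreNLP's formula. */
final class MentionRanking {
  /**
   * Returns the index of the best-scoring antecedent, or -1 to start a new entity.
   * antecedentScores[j] scores linking the current mention to candidate j;
   * newEntityScore scores leaving the mention without an antecedent.
   */
  static int chooseAntecedent(double[] antecedentScores, double newEntityScore, double greediness) {
    // Hypothetical bias: greediness above 0.5 lowers the bar for linking, below 0.5 raises it
    double best = newEntityScore + (0.5 - greediness);
    int bestIndex = -1;
    for (int j = 0; j < antecedentScores.length; j++) {
      if (antecedentScores[j] > best) {
        best = antecedentScores[j];
        bestIndex = j;
      }
    }
    return bestIndex;
  }

  public static void main(String[] args) {
    double[] scores = {0.2, 0.6};
    for (double g : new double[] {0.0, 0.5, 1.0}) {
      System.out.println("greediness " + g + " -> " + chooseAntecedent(scores, 0.5, g));
    }
  }
}
```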

Running on CoNLL 2012

Deterministic System

If you’d like to benchmark our deterministic system on the CoNLL 2011/2012 shared task data, see the Usage section of the Stanford Deterministic Coreference Resolution System page.

Usage Example

To use the English deterministic system, you need to use the dcoref annotator.

java edu.stanford.nlp.pipeline.StanfordCoreNLP -annotators tokenize,pos,lemma,ner,parse,dcoref -file example.txt

Statistical and Neural Systems

If you would like to run our statistical or neural systems on the CoNLL 2012 eval data:

  1. Get the CoNLL scoring script from here
  2. Get the CoNLL 2012 eval data from here
  3. Run the CorefSystem main method. For example, for the English neural system:
java -cp stanford-corenlp-4.0.0.jar:stanford-corenlp-4.0.0-models.jar:* edu.stanford.nlp.coref.CorefSystem -props edu/stanford/nlp/coref/properties/ <path-to-conll-data> -coref.conllOutputPath <where-to-save-system-output> -coref.scorer <path-to-scoring-script>

The CoNLL 2012 coreference data differs from the normal coreference use case in a few ways:

  • POS tags, named entities, parse trees, etc. are provided in the data rather than produced by CoreNLP.

  • There are speaker annotations indicating who is saying which quote.

  • There are document genre annotations.

Because of this, we train models with a few extra features for running on this dataset. We configure these models for accuracy over speed (e.g., by not having a maximum mention distance for the mention-ranking models). These models can be run using the -conll properties files. Note that the CoNLL-specific models for English are in the English models jar, not the default CoreNLP models jar.
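In the CoNLL 2012 *_conll files, the coreference annotation sits in the last column of each token line: `-` means no mention boundary, `(n` opens a mention of entity n, `n)` closes one, `(n)` marks a single-token mention, and several annotations on one token are joined with `|`. Below is a minimal parser for that column, assuming well-formed input. The class name `ConllCorefColumn` is hypothetical, and this is not CoreNLP's own CoNLL reader.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Sketch: read mention spans from the CoNLL 2012 coreference column (not CoreNLP's reader). */
final class ConllCorefColumn {
  /** A mention of entity chainId covering tokens start..end (inclusive). */
  static final class Span {
    final int chainId;
    final int start;
    final int end;

    Span(int chainId, int start, int end) {
      this.chainId = chainId;
      this.start = start;
      this.end = end;
    }

    @Override
    public String toString() {
      return chainId + ":" + start + "-" + end;
    }
  }

  /** Parses one coref-column value per token of a document, e.g. "(0", "-", "0)", "(1|(2)". */
  static List<Span> parse(List<String> column) {
    List<Span> spans = new ArrayList<>();
    Map<Integer, Deque<Integer>> open = new HashMap<>(); // entity id -> start tokens of open mentions
    for (int tok = 0; tok < column.size(); tok++) {
      String cell = column.get(tok).trim();
      if (cell.equals("-")) {
        continue; // no mention starts or ends on this token
      }
      for (String part : cell.split("\\|")) {
        boolean opens = part.startsWith("(");
        boolean closes = part.endsWith(")");
        int id = Integer.parseInt(part.replace("(", "").replace(")", ""));
        if (opens && closes) {
          spans.add(new Span(id, tok, tok)); // single-token mention
        } else if (opens) {
          open.computeIfAbsent(id, k -> new ArrayDeque<>()).push(tok);
        } else if (closes) {
          // Close the most recently opened mention of this entity (handles nesting)
          spans.add(new Span(id, open.get(id).pop(), tok));
        }
      }
    }
    return spans;
  }

  public static void main(String[] args) {
    List<String> column = List.of("(0", "0)", "-", "(1|(2)", "1)");
    System.out.println(parse(column)); // prints [0:0-1, 2:3-3, 1:3-4]
  }
}
```

Spans are emitted in the order they close, and a per-entity stack is used because mentions of the same entity can nest.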

Training New Models

Deterministic System

Because the deterministic system is rule-based, there is nothing to train, but its data files for demonyms and for noun gender, animacy, and plurality can be edited. See the Stanford Deterministic Coreference Resolution System page.

Statistical System

Training a statistical model on the CoNLL data can be done with the following command:

java -cp stanford-corenlp-4.0.0.jar:stanford-corenlp-4.0.0-models.jar:* edu.stanford.nlp.coref.statistical.StatisticalCorefTrainer -props <properties-file>

See here for an example properties file. Training over the full CoNLL 2012 training set requires a large amount of memory. To reduce the memory footprint and runtime of training, the following options can be added to the properties file:

  • coref.statistical.minClassImbalance: Use this to downsample negative examples from each document. A value less than 0.05 is recommended.

  • coref.statistical.maxTrainExamplesPerDocument: Use this to downsample examples from larger documents. A value larger than 1000 is recommended.
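Here is a minimal sketch of adding the two options above to a training configuration from Java, using only `java.util.Properties`. The class name `StatisticalTrainingProps` and the values 0.04 and 5000 are illustrative assumptions chosen inside the recommended ranges, not tuned settings. The rest of the training configuration (data paths and so on) is assumed to be in `base` already.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.util.Properties;

/** Sketch: add the two memory-saving options to a statistical-coref training configuration. */
final class StatisticalTrainingProps {
  static Properties withDownsampling(Properties base, double minClassImbalance,
                                     int maxTrainExamplesPerDocument) {
    Properties props = new Properties();
    props.putAll(base); // copy the rest of the training configuration; base is not modified
    props.setProperty("coref.statistical.minClassImbalance", Double.toString(minClassImbalance));
    props.setProperty("coref.statistical.maxTrainExamplesPerDocument",
        Integer.toString(maxTrainExamplesPerDocument));
    return props;
  }

  public static void main(String[] args) throws IOException {
    // Illustrative values inside the recommended ranges (< 0.05 and > 1000), not tuned settings
    Properties props = withDownsampling(new Properties(), 0.04, 5000);
    StringWriter out = new StringWriter();
    props.store(out, "statistical coref training options"); // properties-file text
    System.out.print(out);
  }
}
```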

Neural System

The code for training the neural coreference system is written in Python and is available on GitHub here.

Citing Stanford Coreference

The deterministic coreference system for English

Marta Recasens, Marie-Catherine de Marneffe, and Christopher Potts. 2013. The Life and Death of Discourse Entities: Identifying Singleton Mentions. In Proceedings of NAACL.

Heeyoung Lee, Yves Peirsman, Angel Chang, Nathanael Chambers, Mihai Surdeanu, and Dan Jurafsky. 2011. Stanford’s Multi-Pass Sieve Coreference Resolution System at the CoNLL-2011 Shared Task. In Proceedings of the CoNLL-2011 Shared Task.

Karthik Raghunathan, Heeyoung Lee, Sudarshan Rangarajan, Nathanael Chambers, Mihai Surdeanu, Dan Jurafsky, and Christopher Manning. 2010. A Multi-Pass Sieve for Coreference Resolution. In Proceedings of EMNLP.

The deterministic coreference system for Chinese and English

Heeyoung Lee, Angel Chang, Yves Peirsman, Nathanael Chambers, Mihai Surdeanu, and Dan Jurafsky. 2013. Deterministic Coreference Resolution Based on Entity-Centric, Precision-Ranked Rules. Computational Linguistics 39(4).

The statistical coreference system

Kevin Clark and Christopher D. Manning. 2015. Entity-Centric Coreference Resolution with Model Stacking. In Proceedings of ACL.

The neural coreference system

Kevin Clark and Christopher D. Manning. 2016. Deep Reinforcement Learning for Mention-Ranking Coreference Models. In Proceedings of EMNLP.

Kevin Clark and Christopher D. Manning. 2016. Improving Coreference Resolution by Learning Entity-Level Distributed Representations. In Proceedings of ACL.