Full List Of Annotators

Annotator Descriptions
Annotator Dependencies
Sub-Annotators

Annotator Descriptions

Name	Annotator class name	Generated Annotation	Description
tokenize	TokenizerAnnotator	TokensAnnotation (list of tokens); CharacterOffsetBeginAnnotation, CharacterOffsetEndAnnotation, TextAnnotation (for each token)	Tokenizes the text. This splits the text into roughly “words”, using rules or methods suitable for the language being processed. Sometimes the tokens split up surface words in ways suitable for further NLP-processing, for example “isn’t” becomes “is” and “n’t”. The tokenizer saves the beginning and end character offsets of each token in the input text.
cleanxml	CleanXmlAnnotator	XmlContextAnnotation	Removes XML tokens from the document. May use them to mark sentence ends or to extract metadata.
docdate	DocDateAnnotator	DocDateAnnotation	Allows the user to specify dates for documents.
ssplit	WordsToSentencesAnnotator	SentencesAnnotation	Splits a sequence of tokens into sentences. Part of tokenize by default.
pos	POSTaggerAnnotator	PartOfSpeechAnnotation	Labels tokens with their POS tag. For more details see this page.
lemma	MorphaAnnotator	LemmaAnnotation	Generates the word lemmas for all tokens in the corpus.
ner	NERCombinerAnnotator	NamedEntityTagAnnotation and NormalizedNamedEntityTagAnnotation	Recognizes named (PERSON, LOCATION, ORGANIZATION, MISC), numerical (MONEY, NUMBER, ORDINAL, PERCENT), and temporal (DATE, TIME, DURATION, SET) entities. Named entities are recognized using a combination of three CRF sequence taggers trained on various corpora, such as ACE and MUC. Numerical entities are recognized using a rule-based system. Numerical entities that require normalization, e.g., dates, are normalized to NormalizedNamedEntityTagAnnotation. For more details on the CRF tagger see this page. Sub-annotators: `docdate`, `regexner`, `tokensregex`, `entitymentions`, and `sutime`
entitymentions	EntityMentionsAnnotator	MentionsAnnotation	Groups NER-tagged tokens together into mentions. Run as part of: `ner`
regexner	TokensRegexNERAnnotator	NamedEntityTagAnnotation	Implements a simple, rule-based NER over token sequences using Java regular expressions. The goal of this Annotator is to provide a simple framework to incorporate NE labels that are not annotated in traditional NL corpora. For example, the default list of regular expressions that we distribute in the models file recognizes ideologies (IDEOLOGY), nationalities (NATIONALITY), religions (RELIGION), and titles (TITLE). Here is a simple example of how to use RegexNER. For more complex applications, you might consider TokensRegex.
tokensregex	TokensRegexAnnotator	-	Runs a TokensRegex pipeline within a full NLP pipeline.
parse	ParserAnnotator	TreeAnnotation, BasicDependenciesAnnotation, CollapsedDependenciesAnnotation, CollapsedCCProcessedDependenciesAnnotation	Provides full syntactic analysis, using both the constituent and the dependency representations. The constituent-based output is saved in TreeAnnotation. We generate three dependency-based outputs, as follows: basic, uncollapsed dependencies, saved in BasicDependenciesAnnotation; collapsed dependencies, saved in CollapsedDependenciesAnnotation; and collapsed dependencies with processed coordinations, saved in CollapsedCCProcessedDependenciesAnnotation. Most users of our parser will prefer the last of these representations. For more details on the parser, please see this page. For more details about the dependencies, please refer to this page.
depparse	DependencyParseAnnotator	BasicDependenciesAnnotation, CollapsedDependenciesAnnotation, CollapsedCCProcessedDependenciesAnnotation	Provides a fast syntactic dependency parser. We generate three dependency-based outputs, as follows: basic, uncollapsed dependencies, saved in BasicDependenciesAnnotation; collapsed dependencies, saved in CollapsedDependenciesAnnotation; and collapsed dependencies with processed coordinations, saved in CollapsedCCProcessedDependenciesAnnotation. Most users of our parser will prefer the last of these representations. For details about the dependency software, see this page. For more details about dependency parsing in general, see this page.
coref	CorefAnnotator	CorefChainAnnotation	Performs coreference resolution on a document, building links between entity mentions that refer to the same entity. Has a variety of modes, including rule-based, statistical, and neural. Sub-annotators: `coref.mention`
dcoref	DeterministicCorefAnnotator	CorefChainAnnotation	Implements both pronominal and nominal coreference resolution. The entire coreference graph (with head words of mentions as nodes) is saved in CorefChainAnnotation. For more details on the underlying coreference resolution algorithm, see this page.
relation	RelationExtractorAnnotator	MachineReadingAnnotations.RelationMentionsAnnotation	The Stanford relation extractor is a Java implementation that finds relations between two entities. The current relation extraction model is trained on the relation types (except the ‘kill’ relation) and data from Roth and Yih, “Global inference for entity and relation identification via a linear programming formulation”, 2007. Instead of the gold NER tags, we used the NER tags predicted by the Stanford NER classifier, to improve generalization. The default model predicts the relations `Live_In`, `Located_In`, `OrgBased_In`, `Work_For`, and `None`. For more details on how to use and train your own model, see this page.
natlog	NaturalLogicAnnotator	OperatorAnnotation, PolarityAnnotation	Marks quantifier scope and token polarity, according to natural logic semantics. Places an OperatorAnnotation on tokens which are quantifiers (or other natural logic operators), and a PolarityAnnotation on all tokens in the sentence.
openie	OpenIEAnnotator	EntailedSentencesAnnotation, RelationTriplesAnnotation	Extracts open-domain relation triples. The system is described in this paper.
entitylink	WikidictAnnotator	WikipediaEntityAnnotation	Links entity mentions to Wikipedia entities.
kbp	KBPAnnotator	KBPTriplesAnnotation	Extracts (subject, relation, object) triples from sentences, using a combination of a statistical model, patterns over tokens, and patterns over dependencies. Extracts TAC-KBP relations. Details about models and rules can be found in our write up for the TAC-KBP 2016 competition.
quote	QuoteAnnotator	QuotationAnnotation	Deterministically picks out quotes delimited by “ or ‘ from a text. All top-level quotes are supplied by the top-level annotation for a text. If a QuotationAnnotation corresponds to a quote that contains embedded quotes, these quotes will appear as embedded QuotationAnnotations that can be accessed from the QuotationAnnotation that they are embedded in. The QuoteAnnotator can handle multi-line and cross-paragraph quotes, but any embedded quote must be delimited by a different kind of quotation mark than its parent. Does not depend on any other annotators. Support for Unicode quotes is not yet present. Sub-annotators: `quote.attribution`
quote.attribution	QuoteAttributionAnnotator	-	Attributes quotes to speakers in the document. Run as part of: `quote`
sentiment	SentimentAnnotator	SentimentCoreAnnotations.AnnotatedTree	Implements Socher et al.’s sentiment model. Attaches a binarized tree of the sentence to the sentence-level CoreMap. The nodes of the tree then contain the annotations from RNNCoreAnnotations indicating the predicted class and scores for that subtree. See the sentiment page for more information about this project.
truecase	TrueCaseAnnotator	TrueCaseAnnotation and TrueCaseTextAnnotation	Recognizes the true case of tokens in text where this information was lost, e.g., all upper case text. This is implemented with a discriminative model using a CRF sequence tagger. The true case label, e.g., INIT_UPPER, is saved in TrueCaseAnnotation. The token text adjusted to match its true case is saved as TrueCaseTextAnnotation.
udfeats	UDFeatureAnnotator	CoNLLUFeats, CoarseTagAnnotation	Labels tokens with their Universal Dependencies universal part of speech (UPOS) and features.

Annotator Dependencies

Property name	Annotator class name	Requirements
tokenize	TokenizerAnnotator	None
cleanxml	CleanXmlAnnotator	`tokenize`
ssplit	WordsToSentencesAnnotator	`tokenize`
docdate	DocDateAnnotator	None
pos	POSTaggerAnnotator	`tokenize`
lemma	MorphaAnnotator	`tokenize, pos`
ner	NERCombinerAnnotator	`tokenize, pos, lemma`
regexner	TokensRegexNERAnnotator	`tokenize, pos`
sentiment	SentimentAnnotator	`tokenize, pos, parse`
parse	ParserAnnotator	`tokenize`
depparse	DependencyParseAnnotator	`tokenize, pos`
dcoref	DeterministicCorefAnnotator	`tokenize, pos, lemma, ner, parse`
coref	CorefAnnotator	`tokenize, pos, lemma, ner, parse` (Can also use `depparse`)
relation	RelationExtractorAnnotator	`tokenize, pos, lemma, ner, depparse`
natlog	NaturalLogicAnnotator	`tokenize, pos, lemma, depparse` (Can also use `parse`)
entitylink	WikidictAnnotator	`tokenize, ner`
kbp	KBPAnnotator	`tokenize, pos, lemma, parse, ner, coref` (Can also use `depparse` ; `coref` optional)
quote	QuoteAnnotator	`tokenize, pos, lemma, ner, depparse, coref`
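
As a rough illustration of how the requirements table can be read, the sketch below checks an annotator list against a few of its rows. It is not part of CoreNLP: the class `AnnotatorRequirements` and the method `findProblems` are hypothetical names, only six rows of the table are copied in, the "can also use" alternatives are ignored, and it assumes that every requirement has to appear earlier in the list than the annotator that needs it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper, not a CoreNLP class: checks an annotator list against the table. */
final class AnnotatorRequirements {

    // A few rows copied from the requirements table; alternatives such as
    // "can also use parse" are deliberately left out of this sketch.
    static final Map<String, List<String>> REQUIRES = new LinkedHashMap<>();
    static {
        REQUIRES.put("tokenize", Collections.<String>emptyList());
        REQUIRES.put("pos", Arrays.asList("tokenize"));
        REQUIRES.put("lemma", Arrays.asList("tokenize", "pos"));
        REQUIRES.put("ner", Arrays.asList("tokenize", "pos", "lemma"));
        REQUIRES.put("depparse", Arrays.asList("tokenize", "pos"));
        REQUIRES.put("natlog", Arrays.asList("tokenize", "pos", "lemma", "depparse"));
    }

    /** Returns one message per requirement that does not appear earlier in the list. */
    static List<String> findProblems(String annotators) {
        List<String> problems = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        for (String raw : annotators.split(",")) {
            String name = raw.trim();
            if (name.isEmpty()) {
                continue;
            }
            List<String> required = REQUIRES.get(name);
            if (required == null) {
                // Assumption: names outside the copied rows are reported, not guessed at.
                problems.add(name + " is not covered by this sketch");
            } else {
                for (String req : required) {
                    if (!seen.contains(req)) {
                        problems.add(name + " requires " + req);
                    }
                }
            }
            seen.add(name);
        }
        return problems;
    }

    public static void main(String[] args) {
        System.out.println(findProblems("tokenize,pos,lemma,ner")); // prints []
        System.out.println(findProblems("tokenize,lemma"));         // prints [lemma requires pos]
    }
}
```

A check like this only mirrors the table; the pipeline itself remains the authority on what each annotator needs.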

Sub-Annotators

While every annotator can technically be run as a top-level component, in some cases it makes sense for one annotator to run another as a sub-annotator. For instance, the `coref` annotator runs the `coref.mention` annotator (which identifies coref mentions) as a sub-annotator by default. So instead of supplying the annotator list `tokenize,parse,coref.mention,coref`, you can supply just `tokenize,parse,coref`. Another example is the `ner` annotator, which runs the `entitymentions` annotator to detect full entities. The table below summarizes the annotator/sub-annotator relationships that currently exist in the pipeline. By default, annotators generally run their sub-annotators.

Annotator	Sub-Annotators
coref	coref.mention
ner	docdate,sutime,regexner,tokensregex,entitymentions
quote	quote.attribution
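
The shortening of `tokenize,parse,coref.mention,coref` to `tokenize,parse,coref` described above can be sketched as a small list transformation. The class `SubAnnotators` and the method `collapse` are hypothetical names, not CoreNLP API, and the sketch assumes that every parent annotator runs its sub-annotators with default settings, so an explicitly listed sub-annotator is redundant whenever its parent is also in the list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper, not a CoreNLP class: drops sub-annotators implied by a parent. */
final class SubAnnotators {

    // Parent -> sub-annotators, copied from the table above.
    static final Map<String, List<String>> SUBS = new HashMap<>();
    static {
        SUBS.put("coref", Arrays.asList("coref.mention"));
        SUBS.put("ner", Arrays.asList("docdate", "sutime", "regexner", "tokensregex", "entitymentions"));
        SUBS.put("quote", Arrays.asList("quote.attribution"));
    }

    /** Removes entries that a parent annotator in the same list would run by default. */
    static String collapse(String annotators) {
        List<String> names = new ArrayList<>();
        for (String raw : annotators.split(",")) {
            String name = raw.trim();
            if (!name.isEmpty()) {
                names.add(name);
            }
        }
        // Collect every sub-annotator whose parent is present in the list.
        Set<String> implied = new HashSet<>();
        for (String name : names) {
            implied.addAll(SUBS.getOrDefault(name, Collections.<String>emptyList()));
        }
        // Keep the original order, skipping the implied entries.
        List<String> kept = new ArrayList<>();
        for (String name : names) {
            if (!implied.contains(name)) {
                kept.add(name);
            }
        }
        return String.join(",", kept);
    }

    public static void main(String[] args) {
        // prints tokenize,parse,coref
        System.out.println(collapse("tokenize,parse,coref.mention,coref"));
    }
}
```

A sub-annotator listed without its parent, such as `regexner` in a list that has no `ner`, is left in place, since nothing else in the list would run it.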
