Processors are units of the neural pipeline that create different annotations for a
Document. The neural pipeline now supports the following processors:
| Generated annotation | Description |
| --- | --- |
| Sentences, each containing a list of Tokens. This processor also predicts which tokens are multi-word tokens, but leaves expanding them to the MWT expander. | Tokenizes the text and performs sentence segmentation. |
| Expands multi-word tokens into multiple words when they are predicted by the tokenizer. | Expands multi-word tokens (MWT) predicted by the tokenizer. |
| Performs lemmatization on a Word using the Word.upos value. The result is accessible on the Word. | Generates the word lemmas for all tokens in the corpus. |
| UPOS, XPOS, and UFeats annotations. | Labels tokens with their universal POS (UPOS) tags, treebank-specific POS (XPOS) tags, and universal morphological features (UFeats). |
| The syntactic head of each word in a sentence and the dependency relation between the two words. | Provides an accurate syntactic dependency parser. |
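
The descriptions above imply an ordering between processors: the MWT expander consumes the tokenizer's predictions, and the lemmatizer reads the Word.upos value set by the tagger. The following is a minimal sketch of how such an ordering could be enforced when selecting processors. It does not call the library itself. The short processor names, the `REQUIRES` table, and the `build_processor_string` helper are all hypothetical, and the prerequisites listed for the dependency parser are an assumption that this page does not state.

```python
# Hypothetical short names for the five processors described above.
# The name column did not survive on this page, so these are assumptions.
PIPELINE_ORDER = ["tokenize", "mwt", "pos", "lemma", "depparse"]

# Prerequisites per processor. Taken from the descriptions above:
#   - the MWT expander works on tokens predicted by the tokenizer
#   - the lemmatizer reads Word.upos, which the POS tagger produces
# Assumed (not stated on this page): every processor needs tokenized
# input, and the dependency parser needs POS tags and lemmas.
REQUIRES = {
    "tokenize": set(),
    "mwt": {"tokenize"},
    "pos": {"tokenize"},
    "lemma": {"tokenize", "pos"},
    "depparse": {"tokenize", "pos", "lemma"},
}


def build_processor_string(requested):
    """Return a comma-separated processor list in a valid run order.

    Prerequisites of the requested processors are added automatically.
    """
    unknown = set(requested) - set(PIPELINE_ORDER)
    if unknown:
        raise ValueError(f"unknown processors: {sorted(unknown)}")

    # Collect the requested processors plus everything they depend on.
    needed = set()
    stack = list(requested)
    while stack:
        name = stack.pop()
        if name not in needed:
            needed.add(name)
            stack.extend(REQUIRES[name])

    # Emit the selection in canonical pipeline order.
    return ",".join(p for p in PIPELINE_ORDER if p in needed)


if __name__ == "__main__":
    print(build_processor_string(["lemma"]))
    print(build_processor_string(["depparse", "mwt"]))
```

If the installed library accepts a comma-separated processor list when constructing the pipeline, the returned string could be passed to it directly. Check the library's own documentation for the exact processor names and argument before relying on this.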