Processors are the units of the neural pipeline; each one adds a different kind of annotation to a Document. The neural pipeline currently supports the following processors:
| Name | Annotator class name | Generated Annotation | Description |
| --- | --- | --- | --- |
| tokenize | TokenizeProcessor | Segments a Document into Sentences, each containing a list of Tokens. This processor also predicts which tokens are multi-word tokens, but leaves expanding them to the MWT expander. | Tokenizes the text and performs sentence segmentation. |
| mwt | MWTProcessor | Expands multi-word tokens into multiple words when they are predicted by the tokenizer. | Expands multi-word tokens (MWT) predicted by the tokenizer. |
| lemma | LemmaProcessor | Performs lemmatization on a Word using the Word.text and Word.upos values. The result can be accessed in Word.lemma. | Generates the word lemmas for all tokens in the corpus. |
| pos | POSProcessor | UPOS, XPOS, and UFeats annotations, accessible through Word’s properties pos, xpos, and ufeats. | Labels tokens with their universal POS (UPOS) tags, treebank-specific POS (XPOS) tags, and universal morphological features (UFeats). |
| depparse | DepparseProcessor | Determines the syntactic head of each word in a sentence and the dependency relation between the two words, accessible through Word’s governor and dependency_relation attributes. | Provides an accurate syntactic dependency parser. |
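
Because some processors read annotations written by others (for example, lemma uses Word.upos, which pos produces), the order in which they run matters. The sketch below is one way to turn a set of wanted processors into an ordered, comma-separated list. The helper `build_processor_string`, the `REQUIRES` map, and the run order in `PIPELINE_ORDER` are hypothetical and only read off the table descriptions; they are not part of the library, and the library's real requirements may be stricter.

```python
# Assumed run order: the processors from the table, with pos ahead of lemma
# because lemma reads Word.upos. This ordering is an assumption.
PIPELINE_ORDER = ["tokenize", "mwt", "pos", "lemma", "depparse"]

# Assumed prerequisites, inferred from the table (not an official list):
# mwt expands tokens flagged by tokenize, lemma reads Word.upos set by pos,
# and every processor works on the tokens that tokenize produces.
REQUIRES = {
    "tokenize": [],
    "mwt": ["tokenize"],
    "pos": ["tokenize"],
    "lemma": ["tokenize", "pos"],
    "depparse": ["tokenize"],
}


def build_processor_string(wanted):
    """Return a comma-separated processor list that includes prerequisites.

    `wanted` is an iterable of processor names from the table.
    """
    selected = set()
    pending = list(wanted)
    while pending:
        name = pending.pop()
        if name not in REQUIRES:
            raise ValueError(f"unknown processor: {name}")
        if name not in selected:
            selected.add(name)
            # Queue the prerequisites so they are pulled in transitively.
            pending.extend(REQUIRES[name])
    # Emit in run order so upstream annotations exist before they are read.
    return ",".join(p for p in PIPELINE_ORDER if p in selected)


if __name__ == "__main__":
    print(build_processor_string(["lemma"]))            # tokenize,pos,lemma
    print(build_processor_string(["mwt", "depparse"]))  # tokenize,mwt,depparse
```

If the pipeline constructor accepts a comma-separated `processors` argument, the resulting string could be passed to it directly; check the constructor's documentation for the exact parameter name before relying on this.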