Processors are units of the neural pipeline that create different annotations for a Document. The neural pipeline now supports the following processors:
| Name | Annotator class name | Generated Annotation | Description |
| --- | --- | --- | --- |
| tokenize | TokenizeProcessor | Segments a Document into Sentences, each containing a list of Tokens. This processor also predicts which tokens are multi-word tokens, but leaves expanding them to the MWT expander. | Tokenizes the text and performs sentence segmentation. |
| mwt | MWTProcessor | Expands multi-word tokens into multiple words when they are predicted by the tokenizer. | Expands multi-word tokens (MWT) predicted by the tokenizer. |
| lemma | LemmaProcessor | Performs lemmatization on a Word using the Word.text and Word.upos values. The result can be accessed in Word.lemma. | Generates the word lemmas for all tokens in the corpus. |
| pos | POSProcessor | UPOS, XPOS, and UFeats annotations accessible through Word's properties pos, xpos, and ufeats. | Labels tokens with their universal POS (UPOS) tags, treebank-specific POS (XPOS) tags, and universal morphological features (UFeats). |
| depparse | DepparseProcessor | Determines the syntactic head of each word in a sentence and the dependency relation between the two words, accessible through Word's governor and dependency_relation attributes. | Provides an accurate syntactic dependency parser. |
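
The processors in the table are usually selected by name. The sketch below shows one way this might look in Python. It assumes the page documents the `stanfordnlp` package, whose `Pipeline` takes a comma-separated `processors` string. The `REQUIRES` mapping and the `processors_string` and `annotate` helpers are hypothetical names introduced only for illustration. The dependencies are inferred from the descriptions above (for example, lemma reads Word.upos, which pos produces), so the library itself may require more than is listed here.

```python
# Processor names from the table above, mapped to the processors whose output
# each one reads. This mapping is an assumption inferred from the table's
# descriptions, not the library's own requirement list.
REQUIRES = {
    "tokenize": (),
    "mwt": ("tokenize",),
    "pos": ("tokenize",),
    "lemma": ("tokenize", "pos"),
    "depparse": ("tokenize",),
}


def processors_string(names):
    """Validate processor names and return them as a comma-separated string."""
    names = list(names)
    seen = set()
    for name in names:
        if name not in REQUIRES:
            raise ValueError(f"unknown processor: {name}")
        missing = [dep for dep in REQUIRES[name] if dep not in seen]
        if missing:
            raise ValueError(f"{name} must come after: {', '.join(missing)}")
        seen.add(name)
    return ",".join(names)


def annotate(text, names, lang="en"):
    """Run the selected processors and print one line per word."""
    # Assumed package; its models must already be downloaded for `lang`.
    import stanfordnlp

    nlp = stanfordnlp.Pipeline(processors=processors_string(names), lang=lang)
    doc = nlp(text)
    for sentence in doc.sentences:
        for word in sentence.words:
            # Attributes named in the table: lemma, upos, governor, dependency_relation.
            print(word.text, word.lemma, word.upos,
                  word.governor, word.dependency_relation)
```

With the package and an English model installed, `annotate("The cats were sleeping.", ["tokenize", "pos", "lemma", "depparse"])` would print one line per word. Checking the names up front means a typo or a missing upstream processor is reported before any model is loaded.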