Processors are units of the neural pipeline that create different annotations for a Document. The neural pipeline currently supports the following processors:

| Name | Annotator class name | Generated Annotation | Description |
| --- | --- | --- | --- |
| tokenize | TokenizeProcessor | Segments a `Document` into `Sentence`s, each containing a list of `Token`s. This processor also predicts which tokens are multi-word tokens, but leaves expanding them to the MWT expander. | Tokenizes the text and performs sentence segmentation. |
| mwt | MWTProcessor | Expands multi-word tokens into multiple words when they are predicted by the tokenizer. | Expands multi-word tokens (MWT) predicted by the tokenizer. |
| lemma | LemmaProcessor | Performs lemmatization on a `Word` using the `Word.text` and `Word.upos` values. The result can be accessed in `Word.lemma`. | Generates the word lemmas for all tokens in the corpus. |
| pos | POSProcessor | UPOS, XPOS, and UFeats annotations accessible through `Word`’s properties `pos`, `xpos`, and `ufeats`. | Labels tokens with their universal POS (UPOS) tags, treebank-specific POS (XPOS) tags, and universal morphological features (UFeats). |
| depparse | DepparseProcessor | Determines the syntactic head of each word in a sentence and the dependency relation between the two words, accessible through `Word`’s `governor` and `dependency_relation` attributes. | Provides an accurate syntactic dependency parser. |
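
The processor descriptions above imply an ordering: `mwt` works on the tokenizer's predictions, and `lemma` reads `Word.upos`, which `pos` produces. The sketch below is one way to check that ordering before building a pipeline. The helper `processors_string` and the `REQUIRES` map are hypothetical and not part of the library; the prerequisites are only those the descriptions state (plus the assumption that every processor needs `tokenize` first), so the real requirements may be stricter. The commented-out call assumes the pipeline accepts a comma-separated `processors` string.

```python
# Hypothetical helper, not part of the library's API.
# Prerequisites are inferred from the processor descriptions only;
# the library's actual requirements may be stricter.
REQUIRES = {
    "tokenize": set(),
    "mwt": {"tokenize"},            # expands MWTs predicted by the tokenizer
    "pos": {"tokenize"},            # assumption: tagging needs tokenized text
    "lemma": {"tokenize", "pos"},   # lemmatizer reads Word.upos
    "depparse": {"tokenize"},       # assumption: parsing needs tokenized text
}


def processors_string(names):
    """Validate processor order and return a comma-separated spec."""
    seen = set()
    for name in names:
        if name not in REQUIRES:
            raise ValueError(f"unknown processor: {name!r}")
        missing = REQUIRES[name] - seen
        if missing:
            raise ValueError(
                f"{name!r} must come after: {', '.join(sorted(missing))}"
            )
        seen.add(name)
    return ",".join(names)


spec = processors_string(["tokenize", "mwt", "pos", "lemma", "depparse"])
print(spec)

# Assumed usage (requires the library and its models to be installed):
# import stanfordnlp
# nlp = stanfordnlp.Pipeline(processors=spec)
```

Listing `lemma` before `pos` makes the helper raise a `ValueError`, which surfaces an ordering mistake before any models are loaded.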