StanfordNLP

Processors are units of the neural pipeline that create different annotations for a `Document`. The neural pipeline currently supports the following processors:

Name	Annotator class name	Generated Annotation	Description
tokenize	TokenizeProcessor	Segments a `Document` into `Sentence`s, each containing a list of `Token`s. This processor also predicts which tokens are multi-word tokens, but leaves expanding them to the MWT expander.	Tokenizes the text and performs sentence segmentation.
mwt	MWTProcessor	Expands multi-word tokens into multiple words when they are predicted by the tokenizer.	Expands multi-word tokens (MWT) predicted by the tokenizer.
lemma	LemmaProcessor	Performs lemmatization on a `Word` using its `Word.text` and `Word.upos` values. The result can be accessed in `Word.lemma`.	Generates the word lemmas for all tokens in the corpus.
pos	POSProcessor	UPOS, XPOS, and UFeats annotations accessible through `Word`’s properties `pos`, `xpos`, and `ufeats`.	Labels tokens with their universal POS (UPOS) tags, treebank-specific POS (XPOS) tags, and universal morphological features (UFeats).
depparse	DepparseProcessor	Determines the syntactic head of each word in a sentence and the dependency relation between the two words, accessible through `Word`’s `governor` and `dependency_relation` attributes.	Provides syntactic dependency parsing.
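
The processors in the table can be chained by passing their names to the pipeline as a comma-separated string. The following is a minimal sketch rather than a definitive recipe: it assumes the `stanfordnlp` package is installed and that the English models have already been fetched (for example with `stanfordnlp.download('en')`). The helper names `word_row` and `annotate` are hypothetical and exist only for illustration; the `Word` attributes read here are the ones named in the table, plus `Word.xpos`.

```python
from types import SimpleNamespace

# Processor names taken from the table above, joined into the
# comma-separated string that the pipeline accepts.
PROCESSORS = "tokenize,mwt,pos,lemma,depparse"


def word_row(word):
    """Collect the annotations described in the table from a Word-like object.

    Hypothetical helper: it only reads attributes, so it works with any
    object that exposes them.
    """
    return (
        word.text,
        word.lemma,
        word.upos,
        word.xpos,
        word.governor,
        word.dependency_relation,
    )


def annotate(text, lang="en"):
    """Run the full pipeline on `text`; returns one list of rows per sentence.

    Assumes stanfordnlp is installed and models for `lang` are downloaded.
    """
    # Imported lazily so that word_row stays usable without the package.
    import stanfordnlp

    nlp = stanfordnlp.Pipeline(processors=PROCESSORS, lang=lang)
    doc = nlp(text)
    return [[word_row(w) for w in sent.words] for sent in doc.sentences]


# Stand-in object with made-up values, used only to show the row layout
# without loading any models.
demo_word = SimpleNamespace(
    text="dogs", lemma="dog", upos="NOUN", xpos="NNS",
    governor=2, dependency_relation="nsubj",
)
print(word_row(demo_word))
```

With the models in place, `annotate("Barack Obama was born in Hawaii.")` would return one list of rows per sentence; the actual tags depend on the model that is loaded.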

Processors Summary