Part-of-Speech & Morphological Features

Description

The Part-of-Speech (POS) & morphological features tagging module labels words with their universal POS (UPOS) tags, treebank-specific POS (XPOS) tags, and universal morphological features (UFeats). All three are predicted jointly by the POSProcessor in Stanza, which can be invoked with the name pos.

| Name | Annotator class name | Requirement | Generated Annotation | Description |
| --- | --- | --- | --- | --- |
| pos | POSProcessor | tokenize, mwt | UPOS, XPOS, and UFeats annotations are accessible through Word’s properties pos, xpos, and ufeats. | Labels tokens with their universal POS (UPOS) tags, treebank-specific POS (XPOS) tags, and universal morphological features (UFeats). |

Options

| Option name | Type | Default | Description |
| --- | --- | --- | --- |
| pos_batch_size | int | 5000 | When annotating, this argument specifies the maximum number of words to process as a minibatch for efficient processing. Caveat: the larger this number is, the more working memory is required (main RAM or GPU RAM, depending on the computing device). This parameter should be set larger than the number of words in the longest sentence in your input document, or you might run into unexpected behavior. |
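The caveat above suggests checking the longest sentence before choosing a batch size. The following is a minimal sketch of that check, not part of Stanza itself: the helpers `longest_sentence_length`, `choose_pos_batch_size`, and `build_pipeline` are hypothetical names, and whitespace splitting is only a rough stand-in for Stanza's tokenizer. It assumes the option is passed to `stanza.Pipeline` as the keyword argument `pos_batch_size`.

```python
def longest_sentence_length(sentences):
    """Return the word count of the longest sentence (whitespace-split approximation)."""
    return max((len(s.split()) for s in sentences), default=0)


def choose_pos_batch_size(sentences, default=5000):
    """Keep the default unless a sentence would not fit, then grow just past it."""
    longest = longest_sentence_length(sentences)
    return max(default, longest + 1)


def build_pipeline(batch_size):
    """Build an English pipeline with a custom POS batch size (needs stanza and its models)."""
    import stanza  # imported lazily so the helpers above run without stanza installed
    return stanza.Pipeline(lang='en', processors='tokenize,mwt,pos',
                           pos_batch_size=batch_size)


# Hypothetical pre-split sentences, used only to size the batch.
sentences = ['Barack Obama was born in Hawaii .', 'He served two terms .']
print(choose_pos_batch_size(sentences))  # 5000
```

With typical prose the default of 5000 is already large enough; the helper only raises it when a single sentence would otherwise exceed the batch.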

Example Usage

Running the POSProcessor requires the TokenizeProcessor and MWTProcessor. After the pipeline is run, the Document will contain a list of Sentences, and each Sentence will contain a list of Words. The part-of-speech tags can be accessed via the upos (also available as pos) and xpos fields of each Word, while the universal morphological features can be accessed via the feats field.
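The nesting described above (a Document holding Sentences, each holding Words) can be walked with two loops. The sketch below uses `SimpleNamespace` stand-ins in place of a real Stanza Document so that it runs without downloading models; `words_with_upos` is a hypothetical helper, and only the attribute names `sentences`, `words`, `text`, `upos`, `xpos`, and `feats` are taken from the description.

```python
from types import SimpleNamespace


def words_with_upos(doc, upos):
    """Collect the text of every word in the document whose UPOS tag equals `upos`."""
    return [word.text
            for sent in doc.sentences
            for word in sent.words
            if word.upos == upos]


# Stand-in for a processed Document: one sentence with three tagged words.
words = [
    SimpleNamespace(text='Obama', upos='PROPN', xpos='NNP', feats='Number=Sing'),
    SimpleNamespace(text='was', upos='AUX', xpos='VBD',
                    feats='Mood=Ind|Number=Sing|Person=3|Tense=Past|VerbForm=Fin'),
    SimpleNamespace(text='born', upos='VERB', xpos='VBN',
                    feats='Tense=Past|VerbForm=Part|Voice=Pass'),
]
doc = SimpleNamespace(sentences=[SimpleNamespace(words=words)])

print(words_with_upos(doc, 'PROPN'))  # ['Obama']
```

The same function should work on the object returned by a real pipeline, since it relies only on those attributes.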

Accessing POS and Morphological Features for a Word

Here is an example of tagging a piece of text and accessing part-of-speech and morphological features for each word:

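import stanza

# Build an English pipeline; pos needs tokenize and mwt to run before it.
nlp = stanza.Pipeline(lang='en', processors='tokenize,mwt,pos')
doc = nlp('Barack Obama was born in Hawaii.')
# Print one line per word: its text, UPOS tag, XPOS tag, and features ("_" if it has none).
print(*[f'word: {word.text}\tupos: {word.upos}\txpos: {word.xpos}\tfeats: {word.feats if word.feats else "_"}' for sent in doc.sentences for word in sent.words], sep='\n')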

As the result below shows, Stanza analyzes the word was as a third-person singular auxiliary verb in the past tense:

word: Barack    upos: PROPN     xpos: NNP       feats: Number=Sing
word: Obama     upos: PROPN     xpos: NNP       feats: Number=Sing
word: was       upos: AUX       xpos: VBD       feats: Mood=Ind|Number=Sing|Person=3|Tense=Past|VerbForm=Fin
word: born      upos: VERB      xpos: VBN       feats: Tense=Past|VerbForm=Part|Voice=Pass
word: in        upos: ADP       xpos: IN        feats: _
word: Hawaii    upos: PROPN     xpos: NNP       feats: Number=Sing
word: .         upos: PUNCT     xpos: .         feats: _
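The feats value shown above is a single string of `Name=Value` pairs joined by `|`. For looking up individual features it can help to split it into a dictionary. This is a minimal sketch; `parse_feats` is a hypothetical helper rather than a Stanza function, and it treats a missing value (`None`) or the placeholder `_` as "no features".

```python
def parse_feats(feats):
    """Turn 'Mood=Ind|Number=Sing' into {'Mood': 'Ind', 'Number': 'Sing'}.

    A value of None, an empty string, or '_' yields an empty dict.
    """
    if not feats or feats == '_':
        return {}
    return dict(pair.split('=', 1) for pair in feats.split('|'))


# The feature string reported for "was" in the output above.
was_feats = parse_feats('Mood=Ind|Number=Sing|Person=3|Tense=Past|VerbForm=Fin')
print(was_feats['Tense'], was_feats['Person'])  # Past 3
print(parse_feats(None))  # {}
```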

Training-Only Options

Most training-only options are documented in the argument parser of the POS/UFeats tagger.