Description

Recognize named entities for all token spans in the corpus.

Name Annotator class name Requirement Generated Annotation Description
ner NERProcessor tokenize, mwt Named entities accessible through Document or Sentence’s properties entities or ents. Token-level NER tags accessible through Token’s properties ner. Recognize named entities for all token spans in the corpus.

Options

Option name Type Default Description
ner_batch_size int 32 When annotating, this argument specifies the maximum number of sentences to process as a minibatch for efficient processing.
Caveat: the larger this number is, the more working memory is required (main RAM or GPU RAM, depending on the computating device).

Example Usage

Running the NERProcessor simply requires the TokenizeProcessor. After the pipeline is run, the Document will contain a list of Sentences, and the Sentences will contain lists of Tokens. Named entities can be accessed through Document or Sentence’s properties entities or ents. Alternatively, token-level NER tags can be accessed via the ner fields of Token.

Accessing Named Entities for Sentence and Document

Here is an example of performing named entity recognition for a piece of text and accessing the named entities in the entire document:

import stanza

nlp = stanza.Pipeline(lang='en', processors='tokenize,ner')
doc = nlp("Chris Manning teaches at Stanford University. He lives in the Bay Area.")
print(*[f'entity: {ent.text}\ttype: {ent.type}' for ent in doc.ents], sep='\n')

Instead of accessing entities in the entire document, you can also access the named entities in each sentence of the document. The following example provides an identical result from the one above, by accessing entities from sentences instead of the entire document:

import stanza

nlp = stanza.Pipeline(lang='en', processors='tokenize,ner')
doc = nlp("Chris Manning teaches at Stanford University. He lives in the Bay Area.")
print(*[f'entity: {ent.text}\ttype: {ent.type}' for sent in doc.sentences for ent in sent.ents], sep='\n')

As can be seen in the output, Stanza correctly identifies that Chris Manning is a person, Stanford University an organization, and the Bay Area is a location.

entity: Chris Manning	type: PERSON
entity: Stanford University	type: ORG
entity: the Bay Area	type: LOC

Accessing Named Entity Recogition (NER) Tags for Token

It might sometimes be useful to access the BIOES NER tags for each token, and here is an example how:

import stanza

nlp = stanza.Pipeline(lang='en', processors='tokenize,ner')
doc = nlp("Chris Manning teaches at Stanford University. He lives in the Bay Area.")
print(*[f'token: {token.text}\tner: {token.ner}' for sent in doc.sentences for token in sent.tokens], sep='\n')

The result is the BIOES representation of the entities we saw above

token: Chris	ner: B-PERSON
token: Manning	ner: E-PERSON
token: teaches	ner: O
token: at	ner: O
token: Stanford	ner: B-ORG
token: University	ner: E-ORG
token: .	ner: O
token: He	ner: O
token: lives	ner: O
token: in	ner: O
token: the	ner: B-LOC
token: Bay	ner: I-LOC
token: Area	ner: E-LOC
token: .	ner: O

Training-Only Options

Most training-only options are documented in the argument parser of the NER tagger.