Parts Of Speech
Table of contents
Description
Part of speech tagging assigns part of speech labels to tokens, such as whether they are verbs or nouns. Every token in a sentence is applied a tag. For instance, in the sentence Marie was born in Paris.
the word Marie
is assigned the tag NNP
.
Name | Annotator class name | Requirement | Generated Annotation | Description |
---|---|---|---|---|
pos | POSTaggerAnnotator | TokensAnnotation, SentencesAnnotation | PartOfSpeechAnnotation | Applies part of speech tags to tokens. |
Options
Option name | Type | Default | Description |
---|---|---|---|
pos.model | String | edu/stanford/nlp/models/pos-tagger/english-left3words-distsim.tagger | Model to use for part of speech tagging. |
pos.maxlen | int | Integer.MAX_VALUE | Maximum sentence length to tag. Sentences longer than this will not be tagged. |
Part Of Speech Tagging From The Command Line
This command will apply part of speech tags to the input text:
java edu.stanford.nlp.pipeline.StanfordCoreNLP -annotators tokenize,pos -file input.txt
Other output formats include conllu
, conll
, json
, and serialized
.
This command will apply part of speech tags using a non-default model (e.g. the more powerful but slower bidirectional model):
java edu.stanford.nlp.pipeline.StanfordCoreNLP -annotators tokenize,pos -pos.model edu/stanford/nlp/models/pos-tagger/english-bidirectional-distsim.tagger -file input.txt
If running on French, German, or Spanish, it is crucial to use the MWT annotator:
java edu.stanford.nlp.pipeline.StanfordCoreNLP -props french -annotators tokenize,mwt,pos -file input.txt
Part Of Speech Tagging From Java
package edu.stanford.nlp.examples;
import edu.stanford.nlp.ling.*;
import edu.stanford.nlp.pipeline.*;
import java.util.*;
public class POSTaggingExample {
public static String text = "Marie was born in Paris.";
public static void main(String[] args) {
// set up pipeline properties
Properties props = new Properties();
// set the list of annotators to run
props.setProperty("annotators", "tokenize,pos");
// build pipeline
StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
// create a document object
CoreDocument document = pipeline.processToCoreDocument(text);
// display tokens
for (CoreLabel tok : document.tokens()) {
System.out.println(String.format("%s\t%s", tok.word(), tok.tag()));
}
}
}
This demo code will print out the part of speech labels for each token:
Marie NNP
was VBD
born VBN
in IN
Paris NNP
. .