Parts Of Speech

Description

Part of speech tagging assigns part of speech labels to tokens, such as whether they are verbs or nouns. Every token in a sentence is assigned a tag. For instance, in the sentence "Marie was born in Paris." the word "Marie" is assigned the tag NNP (proper noun, singular).

Name | Annotator class name | Requirement | Generated Annotation | Description
pos | POSTaggerAnnotator | TokensAnnotation, SentencesAnnotation | PartOfSpeechAnnotation | Applies part of speech tags to tokens.

Options

Option name | Type | Default | Description
pos.model | String | edu/stanford/nlp/models/pos-tagger/english-left3words-distsim.tagger | Model to use for part of speech tagging.
pos.maxlen | int | Integer.MAX_VALUE | Maximum sentence length to tag. Sentences longer than this will not be tagged.
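In Java, the options in the table above are ordinary pipeline properties. The following is a minimal sketch, assuming you want the bidirectional model and an arbitrary length limit of 100; the class and method names (PosOptionsSketch, posProperties) are hypothetical. Only java.util.Properties is used here, so the resulting object would still have to be passed to new StanfordCoreNLP(props) to build a pipeline.

```java
import java.util.Properties;
import java.util.TreeSet;

// Hypothetical helper: builds the Properties that correspond to the
// pos.model and pos.maxlen options. Only java.util is used; pass the
// result to new StanfordCoreNLP(props) to actually build a pipeline.
class PosOptionsSketch {

  // Default model path, as listed in the options table.
  static final String DEFAULT_MODEL =
      "edu/stanford/nlp/models/pos-tagger/english-left3words-distsim.tagger";

  // Returns properties for a tokenize,pos pipeline using the given tagger
  // model; sentences longer than maxLen will not be tagged.
  static Properties posProperties(String model, int maxLen) {
    Properties props = new Properties();
    props.setProperty("annotators", "tokenize,pos");
    props.setProperty("pos.model", model);
    // Property values are strings, so the int option is stored as text.
    props.setProperty("pos.maxlen", Integer.toString(maxLen));
    return props;
  }

  public static void main(String[] args) {
    // Example choice: the bidirectional model with an arbitrary limit of 100.
    Properties props = posProperties(
        "edu/stanford/nlp/models/pos-tagger/english-bidirectional-distsim.tagger", 100);
    // Print the keys in sorted order so the output is stable.
    for (String key : new TreeSet<>(props.stringPropertyNames())) {
      System.out.println(key + "=" + props.getProperty(key));
    }
  }
}
```

Keeping the properties in one place like this makes it easy to switch models without touching the rest of the pipeline code.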

Part Of Speech Tagging From The Command Line

This command will apply part of speech tags to the input text:

java edu.stanford.nlp.pipeline.StanfordCoreNLP -annotators tokenize,pos -file input.txt

To get a different output format, add the -outputFormat option; other formats include conllu, conll, json, and serialized.

This command will apply part of speech tags using a non-default model (e.g. the more powerful but slower bidirectional model):

java edu.stanford.nlp.pipeline.StanfordCoreNLP -annotators tokenize,pos -pos.model edu/stanford/nlp/models/pos-tagger/english-bidirectional-distsim.tagger -file input.txt

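If running on French, German, or Spanish, it is crucial to include the multi-word token (mwt) annotator, which splits contractions such as French "du" into their component words ("de le") before tagging: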

java edu.stanford.nlp.pipeline.StanfordCoreNLP -props french -annotators tokenize,mwt,pos -file input.txt
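To make the role of the mwt step concrete, here is a toy sketch of what multi-word token expansion does to a token list before the tagger sees it. It is not CoreNLP's implementation: the class name MwtToySketch and its three-entry contraction table are hypothetical, and the real mwt annotator relies on its own resources and covers far more cases. After expansion, "de" and "le" are separate words, so each can receive its own part of speech tag.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Toy illustration of French multi-word token expansion.
// NOT CoreNLP's mwt annotator: the table below is a hand-picked,
// hypothetical sample of contractions.
class MwtToySketch {

  // Contraction -> component words ("\u00e0" is the letter a-grave).
  static final Map<String, List<String>> CONTRACTIONS = Map.of(
      "du", List.of("de", "le"),
      "au", List.of("\u00e0", "le"),
      "aux", List.of("\u00e0", "les"));

  // Replaces each known contraction with its component words;
  // all other tokens are passed through unchanged.
  static List<String> expand(List<String> tokens) {
    List<String> words = new ArrayList<>();
    for (String tok : tokens) {
      words.addAll(CONTRACTIONS.getOrDefault(tok.toLowerCase(), List.of(tok)));
    }
    return words;
  }

  public static void main(String[] args) {
    // "Il parle du livre." ("He talks about the book.")
    List<String> tokens = List.of("Il", "parle", "du", "livre", ".");
    System.out.println(expand(tokens)); // prints [Il, parle, de, le, livre, .]
  }
}
```

Without this step, a tagger trained on expanded words would be asked to tag "du" as a single unit it has rarely or never seen in that form.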

Part Of Speech Tagging From Java

package edu.stanford.nlp.examples;

import edu.stanford.nlp.ling.*;
import edu.stanford.nlp.pipeline.*;

import java.util.*;

public class POSTaggingExample {

  public static String text = "Marie was born in Paris.";

  public static void main(String[] args) {
    // set up pipeline properties
    Properties props = new Properties();
    // set the list of annotators to run
    props.setProperty("annotators", "tokenize,pos");
    // build pipeline
    StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
    // annotate the text and wrap the result in a CoreDocument
    CoreDocument document = pipeline.processToCoreDocument(text);
    // display tokens
    for (CoreLabel tok : document.tokens()) {
      System.out.println(String.format("%s\t%s", tok.word(), tok.tag()));
    }
  }
}

This demo code will print out the part of speech labels for each token:

Marie	NNP
was	VBD
born	VBN
in	IN
Paris	NNP
.	.
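The tags printed above come from the Penn Treebank tag set used by the English models. As a reading aid, the sketch below attaches a short gloss to each tag in that output. The class name TagGlossSketch is hypothetical, and the map deliberately covers only the tags in this example; the full tag set has many more.

```java
import java.util.Map;

// Glosses for the Penn Treebank tags that appear in the example output.
// Deliberately partial: only the tags used in "Marie was born in Paris."
class TagGlossSketch {

  static final Map<String, String> GLOSSES = Map.of(
      "NNP", "proper noun, singular",
      "VBD", "verb, past tense",
      "VBN", "verb, past participle",
      "IN", "preposition or subordinating conjunction",
      ".", "sentence-final punctuation");

  // Returns the gloss for a tag, or a placeholder for tags not in the map.
  static String gloss(String tag) {
    return GLOSSES.getOrDefault(tag, "(not covered by this sketch)");
  }

  public static void main(String[] args) {
    // Lines in the same word<TAB>tag format printed by POSTaggingExample.
    String[] lines = {"Marie\tNNP", "was\tVBD", "born\tVBN", "in\tIN", "Paris\tNNP", ".\t."};
    for (String line : lines) {
      String[] parts = line.split("\t");
      System.out.println(parts[0] + "\t" + parts[1] + "\t" + gloss(parts[1]));
    }
  }
}
```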