Parts Of Speech

Description
Options
Part Of Speech Tagging From The Command Line
Part Of Speech Tagging From Java

Description

Part of speech tagging assigns part of speech labels to tokens, such as whether they are verbs or nouns. Every token in a sentence is applied a tag. For instance, in the sentence Marie was born in Paris. the word Marie is assigned the tag NNP.

Name	Annotator class name	Requirement	Generated Annotation	Description
pos	POSTaggerAnnotator	TokensAnnotation, SentencesAnnotation	PartOfSpeechAnnotation	Applies part of speech tags to tokens.

Options

Option name	Type	Default	Description
pos.model	String	edu/stanford/nlp/models/pos-tagger/english-left3words-distsim.tagger	Model to use for part of speech tagging.
pos.maxlen	int	Integer.MAX_VALUE	Maximum sentence length to tag. Sentences longer than this will not be tagged.

Part Of Speech Tagging From The Command Line

This command will apply part of speech tags to the input text:

java edu.stanford.nlp.pipeline.StanfordCoreNLP -annotators tokenize,pos -file input.txt

Other output formats include conllu, conll, json, and serialized.

This command will apply part of speech tags using a non-default model (e.g. the more powerful but slower bidirectional model):

java edu.stanford.nlp.pipeline.StanfordCoreNLP -annotators tokenize,pos -pos.model edu/stanford/nlp/models/pos-tagger/english-bidirectional-distsim.tagger -file input.txt

If running on French, German, or Spanish, it is crucial to use the MWT annotator:

java edu.stanford.nlp.pipeline.StanfordCoreNLP -props french -annotators tokenize,mwt,pos -file input.txt

Part Of Speech Tagging From Java

package edu.stanford.nlp.examples;

import edu.stanford.nlp.ling.*;
import edu.stanford.nlp.pipeline.*;

import java.util.*;

public class POSTaggingExample {

  public static String text = "Marie was born in Paris.";

  public static void main(String[] args) {
    // set up pipeline properties
    Properties props = new Properties();
    // set the list of annotators to run
    props.setProperty("annotators", "tokenize,pos");
    // build pipeline
    StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
    // create a document object
    CoreDocument document = pipeline.processToCoreDocument(text);
    // display tokens
    for (CoreLabel tok : document.tokens()) {
      System.out.println(String.format("%s\t%s", tok.word(), tok.tag()));
    }
  }
}

This demo code will print out the part of speech labels for each token:

Marie	NNP
was	VBD
born	VBN
in	IN
Paris	NNP
.	.

Parts Of Speech

Table of contents

Description

Options

Part Of Speech Tagging From The Command Line

Part Of Speech Tagging From Java