Lemmatization

Description
Lemmatization From The Command Line
Lemmatization From Java

Description

Lemmatization maps a word to its lemma (dictionary form). For instance, the word was is mapped to the word be.

Name	Annotator class name	Requirement	Generated Annotation	Description
lemma	MorphaAnnotator	TokensAnnotation, SentencesAnnotation, PartOfSpeechAnnotation	LemmaAnnotation	Determine lemmas for each token.

Lemmatization From The Command Line

This command will find lemmas for the input text:

java -Xmx5g edu.stanford.nlp.pipeline.StanfordCoreNLP -annotators tokenize,ssplit,pos,lemma -file input.txt

Other output formats include conllu, conll, json, and serialized.

Lemmatization From Java

package edu.stanford.nlp.examples;

import edu.stanford.nlp.ling.*;
import edu.stanford.nlp.pipeline.*;

import java.util.*;


public class LemmatizingExample {

  public static String text = "Marie was born in Paris.";

  public static void main(String[] args) {
    // set up pipeline properties
    Properties props = new Properties();
    // set the list of annotators to run
    props.setProperty("annotators", "tokenize,ssplit,pos,lemma");
    // build pipeline
    StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
    // create a document object
    CoreDocument document = pipeline.processToCoreDocument(text);
    // display tokens
    for (CoreLabel tok : document.tokens()) {
      System.out.println(String.format("%s\t%s", tok.word(), tok.lemma()));
    }
  }

}

This demo code will print out the lemmas for each token:

Marie	Marie
was	be
born	bear
in	in
Paris	Paris
.	.

Lemmatization

Table of contents

Description

Lemmatization From The Command Line

Lemmatization From Java