Lemmatization maps a word to its lemma (dictionary form). For instance, the word "was" is mapped to the lemma "be".

| Name | Annotator class name | Requirement | Generated Annotation | Description |
| --- | --- | --- | --- | --- |
| lemma | MorphaAnnotator | TokensAnnotation, SentencesAnnotation, PartOfSpeechAnnotation | LemmaAnnotation | Determine lemmas for each token. |
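
One reason a lemmatizer needs part-of-speech tags is that the same surface form can have different lemmas depending on its tag. The sketch below illustrates that dependency with a small lookup table. The class ToyLemmaLookup and its entries are hypothetical and written only for this illustration; they are not part of CoreNLP, and this is not a description of how MorphaAnnotator is implemented.

```java
import java.util.HashMap;
import java.util.Map;

final class ToyLemmaLookup {

  // Hypothetical hand-written entries keyed by "word/POS"; not CoreNLP data.
  private static final Map<String, String> LEMMAS = new HashMap<>();
  static {
    LEMMAS.put("saw/VBD", "see"); // "saw" as a past-tense verb
    LEMMAS.put("saw/NN", "saw");  // "saw" as a noun (the tool)
    LEMMAS.put("was/VBD", "be");
  }

  /** Looks up a lemma for a word and a Penn Treebank tag; falls back to the word itself. */
  static String lemma(String word, String pos) {
    return LEMMAS.getOrDefault(word + "/" + pos, word);
  }

  public static void main(String[] args) {
    System.out.println(lemma("saw", "VBD")); // prints see
    System.out.println(lemma("saw", "NN"));  // prints saw
    System.out.println(lemma("was", "VBD")); // prints be
  }
}
```

Without the tag, the two readings of "saw" could not be told apart, which is why the pos annotator has to run before lemma.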

Lemmatization From The Command Line

This command will find lemmas for the input text:

java -Xmx5g edu.stanford.nlp.pipeline.StanfordCoreNLP -annotators tokenize,ssplit,pos,lemma -file input.txt

Other output formats, selected with the -outputFormat option, include conllu, conll, json, and serialized.

Lemmatization From Java

package edu.stanford.nlp.examples;

import edu.stanford.nlp.ling.*;
import edu.stanford.nlp.pipeline.*;

import java.util.*;

public class LemmatizingExample {

  public static String text = "Marie was born in Paris.";

  public static void main(String[] args) {
    // set up pipeline properties
    Properties props = new Properties();
    // set the list of annotators to run
    props.setProperty("annotators", "tokenize,ssplit,pos,lemma");
    // build pipeline
    StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
    // create a document object
    CoreDocument document = pipeline.processToCoreDocument(text);
    // print each token and its lemma, separated by a tab
    for (CoreLabel tok : document.tokens()) {
      System.out.println(String.format("%s\t%s", tok.word(), tok.lemma()));
    }
  }

}

This demo code prints each token followed by its lemma, separated by a tab:

Marie	Marie
was	be
born	bear
in	in
Paris	Paris
.	.
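
If the tab-separated output needs further processing, it can be read back with the standard library alone. The following is a minimal sketch that lists the tokens whose lemma differs from the surface form. The class LemmaOutputReader and the method changedTokens are hypothetical names chosen for this example and are not part of CoreNLP. The sketch assumes one token and one lemma per line, separated by a single tab, as in the output above.

```java
import java.util.ArrayList;
import java.util.List;

final class LemmaOutputReader {

  /** Returns "token -> lemma" strings for every line whose lemma differs from the token. */
  static List<String> changedTokens(String output) {
    List<String> changed = new ArrayList<>();
    for (String line : output.split("\\R")) { // \R matches any line break
      String[] cols = line.split("\t");
      if (cols.length != 2) {
        continue; // skip blank or malformed lines
      }
      if (!cols[0].equals(cols[1])) {
        changed.add(cols[0] + " -> " + cols[1]);
      }
    }
    return changed;
  }

  public static void main(String[] args) {
    String output = "Marie\tMarie\nwas\tbe\nborn\tbear\nin\tin\nParis\tParis\n.\t.\n";
    System.out.println(changedTokens(output)); // prints [was -> be, born -> bear]
  }
}
```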