Stanford CoreNLP provides a set of human language technology
tools. It can give the base forms of words and their parts of speech; recognize whether they are names of companies, people, etc.; normalize dates, times, and numeric quantities; mark up the structure of sentences in terms of phrases and syntactic dependencies; indicate which noun phrases refer to the same entities; indicate sentiment; extract particular or open-class relations between entity mentions; and extract the quotes people said.
Choose Stanford CoreNLP if you need:
- An integrated NLP toolkit with a broad range of grammatical analysis tools
- A fast, robust annotator for arbitrary texts, widely used in production
- A modern, regularly updated package, with the overall highest quality text analytics
- Support for a number of major (human) languages
- Available APIs for most major modern programming languages
- Ability to run as a simple web service
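As a sketch of the last point: once the distribution is unzipped, the bundled server class can be started from that directory and queried over HTTP. The memory setting, port, annotator list, and example sentence below are illustrative choices, not requirements:

```shell
# Start the server from the unzipped CoreNLP directory (all jars on the classpath).
java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000 -timeout 15000

# In another terminal, POST text to the server and get JSON annotations back.
wget --post-data 'The quick brown fox jumped over the lazy dog.' \
  'localhost:9000/?properties={"annotators":"tokenize,ssplit,pos","outputFormat":"json"}' -O -
```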
Stanford CoreNLP’s goal is to make it very easy to apply a suite of linguistic analysis tools to a piece of text. A tool pipeline can be run on a piece of plain text with just two lines of code. CoreNLP is designed to be highly flexible and extensible. With a single option you can change which tools are enabled or disabled.
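For illustration, a minimal pipeline along those lines might look like the following sketch. The class name, annotator selection, and example sentence are arbitrary choices, and the CoreNLP code and models jars must be on the classpath:

```java
import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.pipeline.CoreDocument;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import java.util.Properties;

public class PipelineDemo {
  public static void main(String[] args) {
    Properties props = new Properties();
    // One property controls which annotators run; add or drop names to change the pipeline.
    props.setProperty("annotators", "tokenize,ssplit,pos,lemma,ner");

    // The "two lines": build the pipeline, then annotate a document.
    StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
    CoreDocument doc = new CoreDocument("Stanford University is in California.");
    pipeline.annotate(doc);

    // Print each token with its part-of-speech tag and named-entity label.
    for (CoreLabel tok : doc.tokens()) {
      System.out.println(tok.word() + "\t" + tok.tag() + "\t" + tok.ner());
    }
  }
}
```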
Stanford CoreNLP integrates many of Stanford’s NLP tools,
including the part-of-speech (POS) tagger,
the named entity recognizer (NER),
the coreference resolution system,
bootstrapped pattern learning, and the open information extraction tools. Moreover, an annotator pipeline can include additional custom or third-party annotators.
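Custom annotators are registered through a property that maps an annotator name to its implementing class; the name and class below are hypothetical placeholders:

```
# Register a custom annotator named "mood" (class name is hypothetical),
# then include it in the pipeline like any built-in annotator.
customAnnotatorClass.mood = com.example.MoodAnnotator
annotators = tokenize,ssplit,pos,mood
```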
CoreNLP’s analyses provide the foundational building blocks for
higher-level and domain-specific text understanding applications.
Stanford CoreNLP can be downloaded via the link below. This will download a large (~500 MB) zip file containing (1) the CoreNLP code jar, (2) the CoreNLP models jar (required in your classpath for most tasks), (3) the libraries required to run CoreNLP, and (4) documentation / source code for the project. Unzip this file, open the folder that results and you’re ready to use it.
Alternatively, Stanford CoreNLP is available on Maven Central.
Source is available on GitHub.
For more information on obtaining CoreNLP, see the download page.
To download earlier versions of Stanford CoreNLP or language packs for earlier versions, go to the history page.
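For Maven users, the dependency typically looks like the following sketch; the version shown matches the release discussed below, and the models ship as a separate classifier artifact:

```xml
<dependency>
  <groupId>edu.stanford.nlp</groupId>
  <artifactId>stanford-corenlp</artifactId>
  <version>3.9.1</version>
</dependency>
<dependency>
  <groupId>edu.stanford.nlp</groupId>
  <artifactId>stanford-corenlp</artifactId>
  <version>3.9.1</version>
  <classifier>models</classifier>
</dependency>
```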
Note: Stanford CoreNLP 3.9.1 requires a minimum of Java 8, but also
works with Java 9 and 10. However, the SUTime component uses the
jollyday library which depends on JAXB, a Java EE component which
started to be removed in Java 9. If using Java 9 or 10, you either need to add a Java flag to your CoreNLP command that re-enables those modules, or you need to add JAXB dependencies (see this StackOverflow answer). The second solution becomes mandatory with Java 11. We’ll try to update the download package in advance of that….
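Concretely, the first option uses the standard Java 9/10 mechanism for re-enabling the deprecated Java EE aggregate module (which includes JAXB) at launch; the classpath and input file here are illustrative:

```shell
# Re-enable the java.se.ee modules (JAXB among them) on Java 9/10.
java --add-modules java.se.ee -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLP -file input.txt
```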
The table below has jars for the current release with all the models for each language we support.
Due to size issues we have divided the English resources into two
jars. The English (KBP) models jar contains extra resources needed to
run relation extraction and entity linking.
Human languages supported
The basic distribution provides model files for the analysis of well-edited English,
but the engine is compatible with models for other languages. In the
table above, we provide packaged models for
Arabic, Chinese, French, German, and Spanish.
We also provide two jars that contain all of our English models, including various model variants, and in particular models optimized for working with uncased English (e.g., text that is mostly or entirely uppercase or lowercase).
There is also some third party support for additional languages (and
we would welcome more!). You can find out more about using CoreNLP with
various human languages on the
other human languages page.
Programming languages and operating systems
Stanford CoreNLP is written in Java; recent releases require Java 1.8+. You need to have Java installed to run CoreNLP itself. However, you can interact with CoreNLP via the command line or its web service, and hence use it from Python or some other language.
You can use Stanford CoreNLP from the command-line,
via its original Java
programmatic API, via the object-oriented simple API,
via third party APIs for most major modern
programming languages, or via a web service.
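As a sketch of the command-line route, run from the unzipped distribution directory (the annotator list, memory setting, and file name are illustrative):

```shell
# Annotate a plain-text file; by default, output is written alongside the input.
java -cp "*" -mx3g edu.stanford.nlp.pipeline.StanfordCoreNLP \
  -annotators tokenize,ssplit,pos,lemma,ner -file input.txt
```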
It works on Linux, macOS, and Windows.
Stanford CoreNLP is licensed under the GNU General Public License
(v3 or later; in general Stanford NLP
code is GPL v2+, but CoreNLP uses several Apache-licensed libraries, and so the composite is v3+).
Note that the license is the full GPL, which allows many free uses, but not its use in proprietary software that is distributed to others. For distributors of proprietary software, CoreNLP is also available from Stanford under a commercial license; you can contact us for details.
If you don’t need a commercial license, but would like to support
maintenance of these tools, we welcome gift funding:
use this form
and write “Stanford NLP Group open source software” in the Special Instructions.
Citing Stanford CoreNLP in papers
If you’re just running the CoreNLP pipeline, please cite this CoreNLP paper:
Manning, Christopher D., Mihai Surdeanu, John Bauer, Jenny Finkel, Steven J. Bethard, and David McClosky. 2014. The Stanford CoreNLP Natural Language Processing Toolkit. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations, pp. 55–60. [pdf] [bib]
If you’re dealing in depth with particular annotators, you’re also encouraged to cite the papers that cover individual components, such as sentiment analysis or Open IE.
You can find more information on the Stanford NLP software pages.