About

Stanford CoreNLP provides a set of natural language analysis tools. It can give the base forms of words and their parts of speech; recognize whether they are names of companies, people, etc.; normalize dates, times, and numeric quantities; mark up the structure of sentences in terms of phrases and word dependencies; indicate which noun phrases refer to the same entities; indicate sentiment; extract particular or open-class relations between entity mentions; and extract quotes that people said.

Choose Stanford CoreNLP if you need:

  • An integrated toolkit with a good range of grammatical analysis tools
  • Fast, reliable analysis of arbitrary texts
  • The overall highest quality text analytics
  • Support for a number of major (human) languages
  • Available interfaces for most major modern programming languages
  • Ability to run as a simple web service

Stanford CoreNLP’s goal is to make it very easy to apply a range of linguistic analysis tools to a piece of text. A tool pipeline can be run on a piece of plain text with just two lines of code. CoreNLP is designed to be highly flexible and extensible: with a single option you can change which tools are enabled and which are disabled. Stanford CoreNLP integrates many of Stanford’s NLP tools, including the part-of-speech (POS) tagger, the named entity recognizer (NER), the parser, the coreference resolution system, sentiment analysis, bootstrapped pattern learning, and the open information extraction tools. Moreover, an annotator pipeline can include additional custom or third-party annotators. CoreNLP’s analyses provide the foundational building blocks for higher-level and domain-specific text understanding applications.
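As a rough illustration of the "two lines of code" and the single option that selects tools, here is a minimal sketch against the Java API. It assumes the CoreNLP code jar and the English models jar are on the classpath; the class name PipelineSketch, the helper buildProps, and the sample sentence are made up for this example, and the annotator list is only one possible choice.

```java
import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.util.CoreMap;

import java.util.Properties;

class PipelineSketch {

    // Hypothetical helper: the "annotators" property is the single option
    // that decides which tools the pipeline runs, and in what order.
    static Properties buildProps(String annotators) {
        Properties props = new Properties();
        props.setProperty("annotators", annotators);
        return props;
    }

    public static void main(String[] args) {
        // Loading the models can take a while and needs a few GB of heap.
        StanfordCoreNLP pipeline =
                new StanfordCoreNLP(buildProps("tokenize,ssplit,pos,lemma,ner"));

        // The two essential lines: wrap the text, then annotate it.
        Annotation document = new Annotation("Stanford University is located in California.");
        pipeline.annotate(document);

        // Print word, POS tag, lemma, and NER label for every token.
        for (CoreMap sentence : document.get(CoreAnnotations.SentencesAnnotation.class)) {
            for (CoreLabel token : sentence.get(CoreAnnotations.TokensAnnotation.class)) {
                System.out.printf("%s\t%s\t%s\t%s%n",
                        token.word(), token.tag(), token.lemma(), token.ner());
            }
        }
    }
}
```

Shortening the annotator string, for example to "tokenize,ssplit,pos", should skip the lemmatizer and the NER models; the token.lemma() and token.ner() calls would then return null.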


Download

Stanford CoreNLP can be downloaded via the link below. This will download a large (536 MB) zip file containing (1) the CoreNLP code jar, (2) the CoreNLP models jar (required in your classpath for most tasks), (3) the libraries required to run CoreNLP, and (4) documentation and source code for the project. Unzip this file, open the resulting folder, and you’re ready to use it.

Download CoreNLP 3.7.0

Alternatively, Stanford CoreNLP is available on Maven Central. Source is available on GitHub. For more information on obtaining CoreNLP, see the download page. To download earlier versions of Stanford CoreNLP, including the last stable release (3.6.0), or language packs for earlier versions, go to the history page.

You can find the jars for 3.7.0 containing the models for each language we support in the table below. Due to size issues we have divided the English resources into two jars. The English (KBP) models jar contains extra resources needed to run relation extraction and entity linking.

Language        Model jar   Version
Arabic          download    3.7.0
Chinese         download    3.7.0
English         download    3.7.0
English (KBP)   download    3.7.0
French          download    3.7.0
German          download    3.7.0
Spanish         download    3.7.0

Human languages supported

The basic distribution provides model files for the analysis of well-edited English, but the engine is compatible with models for other languages. We provide packaged models for Arabic, Chinese, French, German, and Spanish. We also provide a jar that contains all of our English models, including several variant models; in particular, it has one optimized for uncased English (e.g., text that is mostly or entirely uppercase or lowercase). There is also some third-party support for additional languages. You can find out more about using CoreNLP with various human languages on the other human languages page.

Programming languages and operating systems

Stanford CoreNLP is written in Java; current releases require Java 1.8+.

You can use Stanford CoreNLP from the command line, via its Java programmatic API, via third-party APIs for most major modern programming languages, or via a service. It works on Linux, OS X, and Windows.
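To make the service option more concrete, here is a hedged sketch of a small Java client. It assumes a CoreNLP server is already running locally on port 9000 (for instance started from the unzipped folder with something like java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000), and that the server reads its options from a properties query parameter and the text from the POST body. The class name ServerClientSketch and its methods are invented for this example, and the client uses java.net.http, so it needs Java 11 or later even though CoreNLP itself only requires Java 1.8+.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

class ServerClientSketch {

    // Builds the request URI. The annotator list and output format travel
    // as a URL-encoded JSON object in the "properties" query parameter.
    static URI buildUri(String baseUrl, String annotators) {
        String json = "{\"annotators\":\"" + annotators + "\",\"outputFormat\":\"json\"}";
        return URI.create(baseUrl + "/?properties="
                + URLEncoder.encode(json, StandardCharsets.UTF_8));
    }

    // Sends the raw text as the POST body and returns the server's response body.
    static String annotate(String baseUrl, String annotators, String text) throws Exception {
        HttpRequest request = HttpRequest.newBuilder(buildUri(baseUrl, annotators))
                .POST(HttpRequest.BodyPublishers.ofString(text, StandardCharsets.UTF_8))
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        return response.body();
    }

    public static void main(String[] args) throws Exception {
        // Assumed local server address; adjust host and port to your setup.
        String result = annotate("http://localhost:9000", "tokenize,ssplit,pos",
                "The quick brown fox jumped over the lazy dog.");
        System.out.println(result);
    }
}
```

Because the client only speaks HTTP, the same request shape should carry over to any language with an HTTP library, which is one way the third-party APIs can talk to CoreNLP.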

License

Stanford CoreNLP is licensed under the GNU General Public License (v3 or later; in general Stanford NLP code is GPL v2+, but CoreNLP uses several Apache-licensed libraries, and so the composite is v3+). Note that the license is the full GPL, which allows many free uses, but not use in proprietary software that is distributed to others. For distributors of proprietary software, commercial licensing is available from Stanford. You can contact us at java-nlp-support@lists.stanford.edu. If you don’t need a commercial license, but would like to support maintenance of these tools, we welcome gift funding: use this form and write “Stanford NLP Group open source software” in the Special Instructions.

Citing Stanford CoreNLP in papers

If you’re just running the CoreNLP pipeline, please cite this CoreNLP demo paper:

Manning, Christopher D., Mihai Surdeanu, John Bauer, Jenny Finkel, Steven J. Bethard, and David McClosky. 2014. The Stanford CoreNLP Natural Language Processing Toolkit. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations, pp. 55-60. [pdf] [bib]

If you’re dealing in depth with particular annotators, you’re also encouraged to cite the papers that cover individual components: POS tagging, NER, parsing (with parse annotator), dependency parsing (with depparse annotator), coreference resolution, sentiment, or Open IE. You can find more information on the Stanford NLP software pages and/or publications page.