CoreNLP is your one stop shop for natural language processing in Java! CoreNLP enables users to derive linguistic annotations for text, including token and sentence boundaries, parts of speech, named entities, numeric and time values, dependency and constituency parses, coreference, sentiment, quote attributions, and relations. CoreNLP currently supports 6 languages: Arabic, Chinese, English, French, German, and Spanish.
The centerpiece of CoreNLP is the pipeline. Pipelines take in raw text, run a series of NLP annotators on the text, and produce a final set of annotations.
Pipelines produce CoreDocuments, data objects that contain all of the annotation information, accessible with a simple API, and serializable to a Google Protocol Buffer.
CoreNLP generates a variety of linguistic annotations, including:
Download and unzip CoreNLP 4.2.1
Download model jars for the language you want to work on and move the jars to the distribution directory.
mv /path/to/stanford-corenlp-4.2.1-models-french.jar /path/to/stanford-corenlp-4.2.1
- Include the distribution directory in your CLASSPATH.
- You’re ready to go! There are many ways to run a CoreNLP pipeline. For instance here’s how to run a pipeline on a text file. The output will be available in a file called
java -Xmx5g edu.stanford.nlp.pipeline.StanfordCoreNLP -file input.txt
You can use Stanford CoreNLP from the command-line, via its original Java programmatic API, via the object-oriented simple API, via third party APIs for most major modern programming languages, or via a web service. It works on Linux, macOS, and Windows.
The full Stanford CoreNLP is licensed under the GNU General Public License v3 or later. More precisely, all the Stanford NLP code is GPL v2+, but CoreNLP uses some Apache-licensed libraries, and so our understanding is that the the composite is correctly licensed as v3+. You can run almost all of CoreNLP under GPL v2; you simply need to omit the time-related libraries, and then you lose the functionality of SUTime. Note that the license is the full GPL, which allows many free uses, but not its use in proprietary software which is distributed to others. For distributors of proprietary software, CoreNLP is also available from Stanford under a commercial licensing You can contact us at firstname.lastname@example.org. If you don’t need a commercial license, but would like to support maintenance of these tools, we welcome gift funding: use this form and write “Stanford NLP Group open source software” in the Special Instructions.
If you’re just running the CoreNLP pipeline, please cite this CoreNLP paper:
Manning, Christopher D., Mihai Surdeanu, John Bauer, Jenny Finkel, Steven J. Bethard, and David McClosky. 2014. The Stanford CoreNLP Natural Language Processing Toolkit In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations, pp. 55-60. [pdf] [bib]
If you’re dealing in depth with particular annotators, you’re also encouraged to cite the papers that cover individual components: POS tagging, NER, constituency parsing, dependency parsing, coreference resolution, sentiment, or Open IE. You can find more information on the Stanford NLP software pages and/or publications page.