Using CoreNLP within other programming languages and packages
Table of contents
- Go (golang)
- R (CRAN)
- Thrift server
- ZeroMQ/ØMQ servers
Below are interfaces and packages for running Stanford CoreNLP from other languages or within other packages. They have been written by many other people (thanks!). In general you should contact these people directly if you have problems with these packages.
- DataLinguist by Simon Gray wraps most of CoreNLP with an idiomatic Clojure API. As of 2022, this is the most complete and most up-to-date Clojure API for CoreNLP.
- org.clojurenlp.core extended the earlier wrapper at https://github.com/damienstanton/stanford-corenlp. It incorporates work by Cory Giles, Hans Engel, Damien Stanton, Andrew McCloud, Leon Talbot, and Marek Owsikowski. It covers tokenization, POS tagging, NER, and parsing, but is currently (2021) not very actively maintained.
- Clojure wrapper for CoreNLP by Nils Gruenwald. Very partial: it currently wraps only the tagger and TokensRegex, and it is no longer being developed.
Okay, Docker isn’t a language, but you know what we mean…
- stanford-corenlp-docker: a Dockerfile by Arne Neumann, updated Apr 2021. The NLPBox project provides Dockerfiles for many NLP tools. Source on GitHub.
- CoreNLP Complete dockerfile: a Dockerfile for the Stanford CoreNLP server by Graham MacDonald. It comes with good examples of use. Updated in Dec 2018. GitHub.
And there are about 200 others; it’s not so hard to build a Dockerfile! Here’s a list, which includes a number of Dockerfiles set up to run CoreNLP with different human languages:
Note on running the CoreNLP server under Docker: the container’s port 9000 has to be published to the host. For example, give a command like:
docker run -p 9000:9000 -itd --name CoreNLP graham3333/corenlp-complete
If, when going to localhost:9000/, you see the error “This site can’t be reached. localhost refused to connect”, then this is what you failed to do!
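One way to confirm this diagnosis is to check whether anything accepts TCP connections on the published port. Below is a minimal sketch that uses only Python’s standard library. It assumes the default host `localhost` and port 9000 from the command above. `server_reachable` is a hypothetical helper name and is not part of CoreNLP or Docker.

```python
import socket


# Hypothetical helper (not part of CoreNLP or Docker).
def server_reachable(host: str = "localhost", port: int = 9000,
                     timeout: float = 2.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        # Attempt a TCP handshake; the connection is closed immediately.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused connections, timeouts and name-resolution failures
        # are all subclasses of OSError.
        return False
```

If the container is running but `server_reachable()` returns False, check the `-p 9000:9000` option first. A True result only shows that the port is open. It does not show that CoreNLP has finished starting up.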
go-corenlp is a Golang wrapper for CoreNLP by Hironobu Saito.
corenlp-golang is another Go wrapper, written by Peter Bi in 2022.
- DKPro Core is a collection of NLP components, wrapped as UIMA components. It includes the Stanford CoreNLP components, and there is a tutorial on how to use them in the DKPro Core documentation. DKPro Core is part of the DKPro community. It is well maintained and is our recommended way of using Stanford CoreNLP within UIMA. DKPro Core was principally developed by Richard Eckart de Castilho at the Ubiquitous Knowledge Processing Lab (UKP) at the Technische Universität Darmstadt.
- cleartk-stanford-corenlp is a UIMA wrapper for Stanford CoreNLP built by Steven Bethard in the context of the ClearTK toolkit.
- A Vert.x module for accessing Stanford CoreNLP by Jonny Wray.
- Wrapper for each of Stanford’s Chinese tools by Mingli Yuan.
- RESTful API for integrating between Stanford CoreNLP and Apache Stanbol by Rupert Westenthaler and Cristian Petroaca.
- corenlp (github site) by Gerardo Bort is an actively developed node.js CoreNLP library. Multilingual support. You can run this package in your browser, using RunKit.
- corenlp-sentiment (github site) adds support for sentiment analysis to the above corenlp package. By Garrick James McMickell.
- CoreNLP-client (GitHub site), by Romain Beaumont, is a simple client for the CoreNLP HTTP server, built on request-promise. Christophe B. extended it as corenlp-client-multilang (GitHub site), which adds multilingual support.
- corenlp-request-wrapper (github site) is a wrapper for a Stanford CoreNLP server by nash403.
- stanford-corenlp (github site) is a simple node.js wrapper by hiteshjoshi.
- stanford-corenlp-node (github site) is a webservice interface to CoreNLP in node.js by Mike Hewett. No recent development.
- stanford-simple-nlp (github site) is a node.js CoreNLP wrapper by Taeho Kim (xissy). It doesn’t seem to have been updated lately. You’re better off using something else.
- corenlp-js-interface was a (too) simple interface to a CoreNLP server in node.js. It is deprecated, suffers from a command injection vulnerability and the GitHub site is no longer available.
- corenlp-js-prefab, by Noah Dessauer, was a simple interface to the CoreNLP server with a prefab function, so that you only had to send the text, with no extra parameters, on each call. It is deprecated and the GitHub site is no longer available.
- Perl wrapper by Kalle Raeisaenen.
- php-stanford-corenlp-adapter by Dennis De Swart. Well-maintained client connection to Stanford CoreNLP server. PHPclasses. Packagist.
- php-stanford-nlp-datastore by Dennis De Swart. Stores data analyzed by Stanford CoreNLP (words, NER, OpenIE triples, coreference) in an SQLite database, which can then be searched. PHPclasses. Packagist.
We are actively developing a Python package called Stanza, with state-of-the-art NLP performance enabled by deep learning. In addition, the package includes an API for starting a Stanford CoreNLP server and making requests to it. It is the recommended way to use Stanford CoreNLP in Python.
- Stanza: Official Stanford NLP Python package, covering 70+ human languages, as well as biomedical English text.
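Here is a rough illustration of the Stanza route. It starts a CoreNLP server through Stanza’s `CoreNLPClient` and collects the words of each sentence. It assumes that Stanza is installed, that CoreNLP has been downloaded, and that the `CORENLP_HOME` environment variable points to it. The annotator list, timeout and memory values are illustrative choices, and the helper names are hypothetical. The Stanza import is deferred so that the pure helper can be tried without Stanza installed.

```python
# Hypothetical helpers; only CoreNLPClient comes from Stanza.


def words_by_sentence(document) -> list:
    """Collect token words from a CoreNLP Document protobuf.

    Works with any object exposing .sentence (a list) whose items
    expose .token (a list of objects with a .word attribute).
    """
    return [[token.word for token in sentence.token]
            for sentence in document.sentence]


def annotate_with_stanza(text: str) -> list:
    """Start a CoreNLP server via Stanza, annotate text, then shut it down."""
    # Deferred import: requires `pip install stanza` and a CoreNLP
    # download reachable through the CORENLP_HOME environment variable.
    from stanza.server import CoreNLPClient

    with CoreNLPClient(annotators=["tokenize", "ssplit", "pos"],
                       timeout=30000, memory="4G") as client:
        # annotate() returns a Document protobuf by default.
        document = client.annotate(text)
        return words_by_sentence(document)
```

The `with` block starts the server on entry and stops it on exit, so every call to `annotate_with_stanza` pays the startup cost. To process many documents, annotate them all inside a single `with` block.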
These packages use the Stanford CoreNLP server that we’ve developed over the last couple of years.
- stanfordcorenlp by Lynten Guo. A Python wrapper for the Stanford CoreNLP server, version 3.9.1. PyPI page:
pip install stanfordcorenlp
- pycorenlp, a Python wrapper for Stanford CoreNLP by Smitha Milli that uses the new CoreNLP v3.6+ server. Available on PyPI.
- corenlp-pywrap by Sherin Thomas also uses the new CoreNLP v3.6+ server. Python 3.x (only). Also: PyPI page.
- Stanford CoreNLP Python Interface: a reference implementation of a Python interface to the Stanford CoreNLP server. By Arun Chaganty. PyPI page:
pip install stanford-corenlp
- pynlp A (Pythonic) Python wrapper for Stanford CoreNLP by Sina. PyPI page.
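At their core, clients for the CoreNLP server send an HTTP POST. The text goes in the request body, and the annotator settings go in a `properties` URL parameter encoded as JSON. The sketch below shows that shape using only Python’s standard library. It assumes a server at `http://localhost:9000` with `outputFormat` set to `json`. The function names are hypothetical, and the sketch leaves out the error handling and convenience features that the packages above provide.

```python
import json
import urllib.parse
import urllib.request


# Hypothetical helpers illustrating the server's HTTP interface.
def build_url(base_url: str, annotators: str) -> str:
    """Encode CoreNLP properties as JSON in the `properties` parameter."""
    props = {"annotators": annotators, "outputFormat": "json"}
    query = urllib.parse.urlencode({"properties": json.dumps(props)})
    return f"{base_url.rstrip('/')}/?{query}"


def words_and_tags(result: dict) -> list:
    """Flatten the server's JSON output into (word, POS) pairs per sentence."""
    return [
        [(tok["word"], tok.get("pos")) for tok in sentence["tokens"]]
        for sentence in result.get("sentences", [])
    ]


def annotate(text: str, base_url: str = "http://localhost:9000",
             annotators: str = "tokenize,ssplit,pos") -> dict:
    """POST raw UTF-8 text to the server and decode its JSON response."""
    request = urllib.request.Request(
        build_url(base_url, annotators),
        data=text.encode("utf-8"),
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.loads(response.read().decode("utf-8"))
```

With a server running, `words_and_tags(annotate("Stanford is in California."))` should return one list of (word, tag) pairs per sentence.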
- NLTK, since version 3.2.3 (from mid-2018), has a new interface to Stanford CoreNLP that uses the StanfordCoreNLPServer: nltk.parse.corenlp.CoreNLPParser. Please use it. There is a nice wiki page of instructions. See also the API for the dependency and constituency parsers (with many examples) and the code for this module. Here’s a friendly introduction to getting started, by Data District Labs. Much of this work was done by Dmitrijs Milajevs. NLTK also includes an older generation of interfaces to Stanford NLP tools. Unfortunately, NLTK does not want to remove them until version 4 for compatibility reasons, and, for some reason we don’t understand, the documentation doesn’t even warn you against using them. You should totally avoid the old Stanford tokenizer/segmenter/NER/parser classes (unless you are stuck on a very old version of NLTK). They are very slow, because they call Java via the command line on every invocation. That is, you should avoid nltk.tag.stanford.StanfordNERTagger, nltk.tag.stanford.StanfordPOSTagger, nltk.tokenize.stanford.StanfordTokenizer, nltk.tokenize.stanford_segmenter.StanfordSegmenter, and the parsers in nltk.parse.stanford (StanfordParser, StanfordDependencyParser, StanfordNeuralDependencyParser).
These packages are miscellaneous utilities or other frameworks that use Stanford CoreNLP.
- python-corenlp-protobuf: Stanford CoreNLP Python Bindings by Arun Chaganty. This package contains Python bindings for Stanford CoreNLP’s protobuf specifications, as generated by protoc. These bindings can be used to parse binary data produced by, e.g., the Stanford CoreNLP server. PyPI page.
- PyStanfordDependencies, a Python interface for converting Penn Treebank trees to Stanford Dependencies by David McClosky (see also: PyPI page). Last we checked, it is at Stanford CoreNLP v3.5.2 and can do Universal and Stanford dependencies (though it’s currently missing Universal POS tags and features).
- corenlp-xml, a library for handling interactions with CoreNLP’s XML output by Robert Elwell. Available on PyPI. Documentation.
- corpkit, a sophisticated corpus linguistics toolkit with GUI by Daniel McDonald. Interfaces with CoreNLP v3.6.0 to parse documents, and uses Tregex/CoreNLP XML to find patterns in corpora. Available on PyPI. A graphical interface is also available.
- corenlp-xml-reader by Edward Newell, on GitHub and also available as a PyPI package. He also wrote corenlpy, which runs Java in a subprocess; see its GitHub repository.
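If you only need a few fields from CoreNLP’s XML output (`-outputFormat xml`), Python’s standard library is often enough. The sketch below assumes the usual layout of that output: `root/document/sentences/sentence/tokens/token` elements whose children include `word` and `POS`. It is not the API of corenlp-xml or corenlp-xml-reader, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET


# Hypothetical helper; assumes CoreNLP's usual XML output layout.
def tokens_from_corenlp_xml(xml_text: str) -> list:
    """Return a list of sentences, each a list of (word, POS) tuples.

    Missing <POS> elements (e.g. when the pos annotator was not run)
    come back as None.
    """
    root = ET.fromstring(xml_text)
    sentences = []
    # ElementTree supports this limited XPath-style path syntax.
    for sentence in root.iterfind("./document/sentences/sentence"):
        tokens = []
        for token in sentence.iterfind("./tokens/token"):
            tokens.append((token.findtext("word"), token.findtext("POS")))
        sentences.append(tokens)
    return sentences
```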
These are previous generation Python interfaces to Stanford CoreNLP, using a subprocess or their own server. They are now not generally being developed and are obsolete. (But thanks a lot to the people who wrote them in the early days!)
- The “Wordseer fork” of stanford-corenlp-python, a Python wrapper for Stanford CoreNLP (see also: PyPI page). The “Wordseer fork” seems to merge the work of a number of people building on the original Dustin Smith wrapper, namely: Hiroyoshi Komatsu, Johannes Castner, Robert Elwell, Tristan Chong, Aditi Muralidharan. At Stanford CoreNLP v3.5.2, last we checked. See also Robert Elwell’s version (at CoreNLP v3.2.0, last we checked).
- stanford-corepywrapper, a Python wrapper by Brendan O’Connor (or perhaps John Beieler’s fork of it). At CoreNLP v3.5.0, last we checked.
- corenlp-python, an up-to-date fork of Smith’s wrapper (below) by Hiroyoshi Komatsu and Johannes Castner (see also: PyPI page). At CoreNLP v3.4.1, last we checked.
- stanford-corenlp-python, the original Python wrapper, including a JSON-RPC server, by Dustin Smith. At CoreNLP v3.4.1, last we checked.
- corenlp, a Python wrapper for Stanford CoreNLP by Chris Kedzie (see also: PyPI page). At Stanford CoreNLP v3.2.0, last we checked.
- cleanNLP: A Tidy Data Model for Natural Language Processing by Taylor Arnold. GitHub. Paper: pdf.
- coreNLP: Wrappers Around Stanford CoreNLP Tools by Taylor Arnold and Lauren Tilton. Github. Supports CoreNLP version ≥ 3.5.2.
- NLP: Natural Language Processing Infrastructure by Kurt Hornik. Code and models available here.
- Stanford CoreNLP Ruby bindings by Louis Mullie (see also: Ruby Gems page). (Updated in Feb 2017 to CoreNLP 3.5.0.)
- The larger TREAT NLP toolkit by Louis Mullie also makes available Stanford CoreNLP.
- corenlp by Lengio Corp. is another interface to CoreNLP (last updated for CoreNLP 3.4).
- stanford-core-nlp by Will Hayworth is another older interface to CoreNLP (also for CoreNLP 3.4).
CoreNLP wrapper for Apache Spark by Xiangrui Meng of Databricks. Last we checked it was at version 0.41 supporting version 3.9.1 of CoreNLP.
Scala API for CoreNLP by Mihai Surdeanu, one of the original developers of the CoreNLP package.
- Apache Thrift server for Stanford CoreNLP by Diane Napolitano. (Written in Java, but usable from many languages.)
- exist-stanford-nlp by Loren Cahlander.
- stanford-0mq by Diane Napolitano. An implementation of a server for Stanford’s CoreNLP suite using ØMQ, with a basic client/server/JSON request setup. Last commit: Oct 2015.
- stanford-corenlp-zeromq by URXtech. Basic JSON wrapper around CoreNLP.
- corenlp-zmq by Thom Neale. A Dockerfile and Ansible provisioning script to build and run a Stanford CoreNLP server process with a single ZMQ broker front-end that proxies incoming requests to one or more back-end Scala workers. Last commit: 2015.
- corenlp-server by Eric Kow. Simple Java server communicating with clients via XML through ZeroMQ. Example Python client included. Last commit: 2014.