Advanced Usage & Client Customization

Overview
CoreNLP Server Start Options (Pipeline)
CoreNLP Server Start Options (Server)
CoreNLP Server Start Options (Advanced)
- Changing server ID when using multiple CoreNLP servers on a machine
- Protecting a CoreNLP server with password
Switching Languages
Using a CoreNLP server on a remote machine
Dynamically Changing Properties for Each Annotation Request

This section shows how to customize the client options so that you can annotate a different language, use a different CoreNLP model, or have finer control over how the CoreNLP client or server starts.

Overview

By default, the CoreNLP server will run the following English annotators:

tokenize,ssplit,pos,lemma,ner,depparse,coref,kbp

There are a variety of ways to customize a CoreNLP pipeline, including:

using a different list of annotators (e.g. tokenize,ssplit,pos)
processing a different language (e.g. French)
using custom models (e.g. my-custom-depparse.gz)
returning different output formats (e.g. JSON)

These customizations are achieved by specifying properties.

The first step is always importing CoreNLPClient:

from stanza.server import CoreNLPClient

When starting a CoreNLP server via Stanza, a user can choose what properties to initialize the server with. For instance, here is an example of launching a server with a different parser model that returns JSON:

CUSTOM_PROPS = {"parse.model": "edu/stanford/nlp/models/srparser/englishSR.beam.ser.gz"}

with CoreNLPClient(properties=CUSTOM_PROPS, output_format="json") as client:
    pass  # annotate with client here

Or one could launch a server with CoreNLP French defaults as in this example:

with CoreNLPClient(properties="french") as client:
    pass  # annotate with client here

When communicating with a CoreNLP server via Stanza, a user can send specific properties for one-time use with a request. These request-level properties allow for a dynamic NLP application that applies different pipelines depending on the input text.

For instance, one could switch between German and French pipelines:

french_text = "Emmanuel Macron est le président de la France."
german_text = "Angela Merkel ist die deutsche Bundeskanzlerin."

with CoreNLPClient() as client:
    french_ann = client.annotate(french_text, properties="fr")
    german_ann = client.annotate(german_text, properties="de")

If a user has created custom biomedical and financial models, they could switch between them based on what kind of document they are processing:

BIOMEDICAL_PROPS = {
    "depparse.model": "/path/to/biomedical-parser.gz",
    "ner.model": "/path/to/biomedical-ner.ser.gz"
}
FINANCE_PROPS = {
    "depparse.model": "/path/to/finance-parser.gz",
    "ner.model": "/path/to/finance-ner.ser.gz"
}

with CoreNLPClient() as client:
    bio_ann = client.annotate(bio_text, properties=BIOMEDICAL_PROPS)
    finance_ann = client.annotate(finance_text, properties=FINANCE_PROPS)
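
One way to organize this kind of switching is a small lookup table keyed by document type. The sketch below is only an illustration: `PROPS_BY_DOC_TYPE`, `annotate_by_type`, and the labels "biomedical" and "finance" are hypothetical names, the model paths are the placeholders from the example above, and `client` is assumed to be an already started CoreNLPClient.

```python
# Hypothetical helper: choose request-level properties by document type.
BIOMEDICAL_PROPS = {
    "depparse.model": "/path/to/biomedical-parser.gz",
    "ner.model": "/path/to/biomedical-ner.ser.gz",
}
FINANCE_PROPS = {
    "depparse.model": "/path/to/finance-parser.gz",
    "ner.model": "/path/to/finance-ner.ser.gz",
}

# The keys are illustrative labels chosen for this sketch, not Stanza names.
PROPS_BY_DOC_TYPE = {
    "biomedical": BIOMEDICAL_PROPS,
    "finance": FINANCE_PROPS,
}


def annotate_by_type(client, text, doc_type):
    """Annotate `text` with the properties registered for `doc_type`.

    `client` is any object with an `annotate(text, properties=...)` method,
    such as a started CoreNLPClient.
    """
    try:
        props = PROPS_BY_DOC_TYPE[doc_type]
    except KeyError:
        raise ValueError(f"no properties registered for {doc_type!r}") from None
    return client.annotate(text, properties=props)
```

Inside a `with CoreNLPClient() as client:` block, a call such as `annotate_by_type(client, bio_text, "biomedical")` would then replace the explicit `properties=` argument.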

CoreNLP Server Start Options (Pipeline)

There are three ways to specify pipeline properties when starting a CoreNLP server:

Properties Type	Example	Description
Stanford CoreNLP supported language	french	One of {arabic, chinese, english, french, german, spanish}, or the corresponding ISO 639-1 code. Uses the Stanford CoreNLP defaults for that language.
Python dictionary	{'annotators': 'tokenize,ssplit,pos', 'pos.model': '/path/to/custom-model.ser.gz'}	A Python dictionary specifying the properties. The properties are written to a temporary file.
File path	/path/to/server.props	Path on the file system or CLASSPATH to a properties file

Below are examples that illustrate how to use the three different types of properties:

Using a language name:
```
with CoreNLPClient(properties='french') as client:
    pass  # annotate with client here
```
As introduced above, this option allows you to switch quickly between languages; a default list of models will be used for each language.

Using a Python dictionary:

with CoreNLPClient(properties={
        'annotators': 'tokenize,ssplit,pos',
        'pos.model': '/path/to/custom-model.ser.gz',
}) as client:
    pass  # annotate with client here

This option allows you to override the default models used by the server, by providing (model name, model path) pairs.

Using a properties file:
```
with CoreNLPClient(properties='/path/to/server.props') as client:
    pass  # annotate with client here
```
This option allows the finest level of control over what annotators and models are going to be used in the server. For details on how to write a properties file, please see the instructions on configuring CoreNLP property files.
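
A properties file is a plain-text file of `key = value` lines. As a minimal sketch, assuming simple ASCII values that need no escaping, the hypothetical helper `write_props_file` below writes a dictionary of properties to a temporary `.props` file and returns its path, which could then be passed as the `properties` argument. It is not part of Stanza, and values containing backslashes or non-ASCII characters may need escaping that this sketch omits.

```python
import os
import tempfile


def write_props_file(props, directory=None):
    """Write a dict of CoreNLP properties as `key = value` lines.

    Returns the path of the new file. Hypothetical helper, not part of Stanza.
    """
    fd, path = tempfile.mkstemp(suffix=".props", dir=directory, text=True)
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        for key, value in props.items():
            handle.write(f"{key} = {value}\n")
    return path


if __name__ == "__main__":
    path = write_props_file({
        "annotators": "tokenize,ssplit,pos",
        "pos.model": "/path/to/custom-model.ser.gz",
    })
    with open(path, encoding="utf-8") as handle:
        print(handle.read())
    os.remove(path)  # clean up the temporary file
```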

For convenience one can also specify the list of annotators and the desired output_format in the CoreNLPClient constructor.

Option name	Type	Default	Description
annotators	str	"tokenize,ssplit,lemma,pos,ner,depparse"	The default list of CoreNLP annotators the server will use
output_format	str	"serialized"	The default output format to use for the server response, unless otherwise specified. If set to "serialized", the response will be converted to local Python objects (see the usage examples).

The values for those two arguments will override any additional properties supplied at construction time.

with CoreNLPClient(properties='french', annotators='tokenize,ssplit,mwt,pos,ner,parse', output_format='json') as client:
    pass  # annotate with client here
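
The override rule can be pictured as a dictionary merge in which the two keyword arguments are applied last. The sketch below only models the documented precedence and is not Stanza's implementation: `resolve_start_properties` is a hypothetical name, and it assumes the properties are given as a dictionary, with `annotators` and `outputFormat` as the corresponding CoreNLP property keys.

```python
def resolve_start_properties(properties=None, annotators=None, output_format=None):
    """Model the precedence described above: explicit arguments win.

    `properties` is a dict of CoreNLP properties; the two keyword arguments,
    when given, replace the matching entries. The input dict is not modified.
    """
    resolved = dict(properties or {})
    if annotators is not None:
        resolved["annotators"] = annotators
    if output_format is not None:
        resolved["outputFormat"] = output_format
    return resolved


if __name__ == "__main__":
    props = {"annotators": "tokenize,ssplit,pos", "outputFormat": "text"}
    print(resolve_start_properties(props, annotators="tokenize,ssplit", output_format="json"))
```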

CoreNLP Server Start Options (Server)

In addition to customizing the pipeline the server will run, a variety of server-specific properties can be specified at server construction time.

Here we provide a list of commonly-used arguments that you can initialize your CoreNLPClient with, along with their default values and descriptions:

Option name	Type	Default	Description
endpoint	str	http://localhost:9000	The host and port that the CoreNLP server will run on. If port 9000 is already in use by something else on your machine, you can change this to another free port, for example `endpoint="http://localhost:9007"`. You can also point to a CoreNLP server running on a different machine.
classpath	str	None	Classpath to use for CoreNLP. None means using the classpath as set by the `$CORENLP_HOME` environment variable; "$CLASSPATH" means using the system CLASSPATH; any other string is used as the classpath itself.
timeout	int	60000	The maximum amount of time, in milliseconds, to wait for an annotation to finish before cancelling it.
threads	int	5	The number of threads to hit the server with. If, for example, the server is running on an 8 core machine, you can specify this to be 8, and the client will allow you to make 8 simultaneous requests to the server.
memory	str	"5G"	This specifies the memory used by the CoreNLP server process.
start_server	stanza.server.StartServer	FORCE_START	Whether to start the CoreNLP server when initializing the Python `CoreNLPClient` object. By default the CoreNLP server will be started using the provided options. Alternatively, `DONT_START` doesn’t start a new CoreNLP server and attempts to connect to an existing server instance at `endpoint`; `TRY_START` tries to start a new server instance at the endpoint provided, but doesn’t fail like `FORCE_START` if one is already running there. Note that this Enum is new in Stanza v1.1, and in previous versions it only supports boolean input.
stdout	file	sys.stdout	The standard output used by the CoreNLP server process.
stderr	file	sys.stderr	The standard error used by the CoreNLP server process.
be_quiet	bool	False	If set to False, the server process will print detailed error logs. Useful for diagnosing errors.
max_char_length	int	100000	The max number of characters that will be accepted and processed by the CoreNLP server in a single request.
preload	bool	True	Load the annotators immediately upon server start; otherwise the annotators will be lazily loaded when the first annotation request is made.
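
When port 9000 is taken, the `endpoint` option needs a port that is actually free. A minimal sketch, assuming a local server: the hypothetical helper `free_local_endpoint` asks the operating system for an unused port and formats it as an endpoint string. Another process could still claim the port between this check and the server start, so treat the result as a best guess.

```python
import socket


def free_local_endpoint(host="localhost"):
    """Return an endpoint string on a port the OS reports as currently free."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))  # port 0 asks the OS to pick any free port
        port = sock.getsockname()[1]
    return f"http://{host}:{port}"


if __name__ == "__main__":
    # The result could be passed as CoreNLPClient(endpoint=...).
    print(free_local_endpoint())
```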

Here is a quick example that specifies a list of annotators to load, allocates 8G of memory to the server, uses plain text output format, and requests the server to print detailed error logs during annotation:

with CoreNLPClient(
    annotators='tokenize,ssplit,pos,lemma,ner',
    output_format='text',
    memory='8G',
    be_quiet=False) as client:
    pass  # annotate with client here

Note:

The be_quiet option is set to False by default! It is advisable to review the CoreNLP server logs when starting out, to make sure that no errors are occurring on the server side of your application. Once your application is generally stable, you can set be_quiet=True to stop seeing CoreNLP server log output.
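
The max_char_length option in the table above caps the size of a single request, so very long inputs have to be split before they are sent (or the limit raised). The following is a minimal sketch, assuming it is acceptable to break the text at whitespace; `split_for_server` is a hypothetical helper, not a Stanza function. Because each chunk would be annotated independently, annotations that span a chunk boundary, such as coreference links, would be lost.

```python
def split_for_server(text, max_chars=100000):
    """Split `text` into chunks of at most `max_chars` characters.

    Prefers to break after the last space or newline inside each window and
    falls back to a hard cut when the window contains no usable whitespace.
    """
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + max_chars, len(text))
        if end < len(text):
            # Look for the last whitespace in the window to avoid cutting a word.
            cut = max(text.rfind(" ", start, end), text.rfind("\n", start, end))
            if cut > start:
                end = cut + 1  # keep the whitespace with the left-hand chunk
        chunks.append(text[start:end])
        start = end
    return chunks


if __name__ == "__main__":
    for chunk in split_for_server("aaa bbb ccc", max_chars=5):
        print(repr(chunk))  # each chunk could be sent with client.annotate(chunk)
```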

CoreNLP Server Start Options (Advanced)

Apart from the above options, there are some very advanced settings that you may need to customize how the CoreNLP server will start in the background. They are summarized in the following table:

Option	Description
server_id	ID for the server, label attached to server’s shutdown key file
status_port	Port on which to serve the status check endpoints
uriContext	URI context for server
strict	Obey strict HTTP standards
ssl	If true, start server with (an insecure) SSL connection
key	.jks file to load if ssl is enabled
username	The username component of a username/password basic auth credential
password	The password component of a username/password basic auth credential
blockList	A list of IPv4 addresses to ban from using the server

You can also find more documentation for the server’s startup options on the CoreNLP Server website.

Here we highlight two common use cases for these options.

Changing server ID when using multiple CoreNLP servers on a machine

When a CoreNLP server is started, it will write a special shutdown key file to the local disk, to indicate its running status. This will create an issue when multiple servers need to be run simultaneously on a single machine, since a second server won’t be able to write and delete its own shutdown key file. This is easily solvable by giving a special server ID to the second server instance, when the client is initialized:

with CoreNLPClient(server_id='second-server-name') as client:
    pass  # annotate with client here
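
When several servers run side by side, each one also needs its own port in addition to its own server ID. As a sketch under that assumption, the hypothetical helper `server_options` derives a distinct `endpoint` and `server_id` from an index; the base port and the ID prefix are illustrative defaults, not values required by Stanza.

```python
def server_options(index, base_port=9000, prefix="corenlp-server"):
    """Build distinct `endpoint` and `server_id` keyword arguments for server `index`."""
    if index < 0:
        raise ValueError("index must be non-negative")
    return {
        "endpoint": f"http://localhost:{base_port + index}",
        "server_id": f"{prefix}-{index}",
    }


if __name__ == "__main__":
    # Each dict could be unpacked into a client, e.g. CoreNLPClient(**server_options(1)).
    for i in range(2):
        print(server_options(i))
```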

Protecting a CoreNLP server with password

You can even password-protect a CoreNLP server process, so that other users on the same machine won’t be able to access or change your CoreNLP server:

with CoreNLPClient(username='myusername', password='1234') as client:
    pass  # annotate with client here, passing the same credentials

Now you’ll need to provide the same username and password when you call the annotate function of the client, so that the request can authenticate itself with the server:

ann = client.annotate(text, username='myusername', password='1234')
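
Since the same credentials are needed both when the client is created and on every annotate call, it can help to load them once rather than hard-coding them in two places. A minimal sketch, assuming the credentials are kept in environment variables: `credentials_from_env` is a hypothetical helper, and CORENLP_USERNAME and CORENLP_PASSWORD are variable names invented for this example that Stanza does not read itself.

```python
import os


def credentials_from_env(environ=None):
    """Read basic-auth credentials from environment variables.

    CORENLP_USERNAME and CORENLP_PASSWORD are hypothetical variable names
    chosen for this sketch.
    """
    environ = os.environ if environ is None else environ
    try:
        return {
            "username": environ["CORENLP_USERNAME"],
            "password": environ["CORENLP_PASSWORD"],
        }
    except KeyError as missing:
        raise RuntimeError(f"environment variable {missing.args[0]} is not set") from None


# Possible usage, with the dict unpacked in both places:
#   creds = credentials_from_env()
#   with CoreNLPClient(**creds) as client:
#       ann = client.annotate(text, **creds)
```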

Easy, right?

Switching Languages

Stanza by default starts an English CoreNLP pipeline when a client is initialized. You can switch to a different language by setting a simple properties argument when the client is initialized. The following example shows how to start a client with default French models:

with CoreNLPClient(properties='french') as client:
    pass  # annotate with client here

Alternatively, you can also use the ISO 639-1 code for a language:

with CoreNLPClient(properties='fr') as client:
    pass  # annotate with client here

This will initialize a CoreNLPClient object with the default set of French models. If you want to further customize the models used by the CoreNLP server, please read on.

Note:

Currently CoreNLP only provides official support for 6 human languages. For a full list of languages and models available, please see the CoreNLP website.
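
The correspondence between language names and ISO 639-1 codes for the six supported languages can be written down as a small table. The sketch below is a hypothetical validation helper (`normalize_language` and `ISO_TO_LANGUAGE` are names chosen here); since the client accepts either form directly, it would only be useful for rejecting unsupported user input early.

```python
# ISO 639-1 codes for the six languages listed in the properties table above.
ISO_TO_LANGUAGE = {
    "ar": "arabic",
    "zh": "chinese",
    "en": "english",
    "fr": "french",
    "de": "german",
    "es": "spanish",
}


def normalize_language(value):
    """Return the full lowercase language name for a name or ISO 639-1 code."""
    key = value.strip().lower()
    if key in ISO_TO_LANGUAGE:
        return ISO_TO_LANGUAGE[key]
    if key in ISO_TO_LANGUAGE.values():
        return key
    raise ValueError(f"unsupported language: {value!r}")
```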

Using a CoreNLP server on a remote machine

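With the endpoint option, you can even connect to a CoreNLP server running on a different machine. Because that server is already running, also pass start_server=StartServer.DONT_START (StartServer is importable from stanza.server) so that the client connects to it instead of trying to launch a new server locally:

with CoreNLPClient(endpoint='http://remote-server-address:9000', start_server=StartServer.DONT_START) as client:
    pass  # annotate with client here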
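
Whether an endpoint is local or remote affects which start_server value makes sense, since a server can only be launched on the local machine. As a sketch, the hypothetical helper `describe_endpoint` below parses an endpoint URL with the standard library and reports whether it points at the local host; `LOCAL_HOSTS` is an assumed list of local host names, not something defined by Stanza.

```python
from urllib.parse import urlparse

# Host names treated as "this machine" in this sketch.
LOCAL_HOSTS = {"localhost", "127.0.0.1", "::1"}


def describe_endpoint(endpoint):
    """Split an endpoint URL into host, port, and an is_local flag."""
    parsed = urlparse(endpoint)
    if parsed.hostname is None or parsed.port is None:
        raise ValueError(f"endpoint needs a scheme, host and port: {endpoint!r}")
    return {
        "host": parsed.hostname,
        "port": parsed.port,
        "is_local": parsed.hostname in LOCAL_HOSTS,
    }


if __name__ == "__main__":
    print(describe_endpoint("http://remote-server-address:9000"))
```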

Dynamically Changing Properties for Each Annotation Request

Properties for the CoreNLP pipeline can be set for each individual annotation request. If properties are set for a particular request, they override the server’s initialization properties for that request. This allows you to change your annotation needs dynamically, without starting a new client and server from scratch.

Request-level properties can be specified with a Python dictionary, or the name of a CoreNLP supported language.

Here is an example of making a request with a custom dictionary of properties:

FRENCH_CUSTOM_PROPS = {
    'annotators': 'tokenize,ssplit,pos,parse', 'tokenize.language': 'fr',
    'pos.model': 'edu/stanford/nlp/models/pos-tagger/french/french.tagger',
    'parse.model': 'edu/stanford/nlp/models/lexparser/frenchFactored.ser.gz',
    'outputFormat': 'text'
}

with CoreNLPClient() as client:
    ann = client.annotate(text, properties=FRENCH_CUSTOM_PROPS)

Alternatively, request-level properties can simply be a language that you want to run the CoreNLP pipeline for:

ann = client.annotate(text, properties='german')

Note:

A subtle point: when a request is sent with a dictionary of custom properties, those properties overwrite the ones the server was started with. When a CoreNLP language name is specified instead, the server start properties are ignored, and the CoreNLP defaults for that language are written on top of the original CoreNLP defaults.
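
The two cases in this note can be modeled with plain dictionaries. The sketch below is an illustration of the described behavior rather than Stanza's or CoreNLP's actual code: `effective_request_properties` is a hypothetical function, and the caller supplies the default dictionaries, since the real defaults live on the server.

```python
def effective_request_properties(start_props, request_props, corenlp_defaults, language_defaults):
    """Model which properties apply to a single request.

    - None: the server's start properties apply unchanged.
    - dict: request entries overwrite the server's start properties.
    - language name: start properties are ignored; that language's defaults
      are layered on top of the original CoreNLP defaults.
    """
    if request_props is None:
        return dict(start_props)
    if isinstance(request_props, str):
        merged = dict(corenlp_defaults)
        merged.update(language_defaults[request_props])
        return merged
    merged = dict(start_props)
    merged.update(request_props)
    return merged


if __name__ == "__main__":
    # All values below are placeholders for illustration only.
    start = {"annotators": "tokenize,ssplit,pos", "pos.model": "/path/to/custom-model.ser.gz"}
    base = {"annotators": "tokenize,ssplit", "tokenize.language": "en"}
    langs = {"german": {"tokenize.language": "de"}}
    print(effective_request_properties(start, {"annotators": "tokenize"}, base, langs))
    print(effective_request_properties(start, "german", base, langs))
```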

As with CoreNLPClient initialization, you can also specify the annotators and output format for an individual annotation request:

ann = client.annotate(text, properties=FRENCH_CUSTOM_PROPS, annotators='tokenize,ssplit,pos', output_format='json')
