To get started with StanfordNLP, we strongly recommend installing it from PyPI. Once you have pip installed, run the following on your command line:
pip install stanfordnlp
This will take care of all of the dependencies necessary to run StanfordNLP. The neural pipeline of StanfordNLP depends on PyTorch 1.0.0 or a later version with compatible APIs.
To try out StanfordNLP, you can simply follow these steps in the interactive Python interpreter:
>>> import stanfordnlp
>>> stanfordnlp.download('en') # This downloads the English models for the neural pipeline
>>> nlp = stanfordnlp.Pipeline() # This sets up a default neural pipeline in English
>>> doc = nlp("Barack Obama was born in Hawaii. He was elected president in 2008.")
>>> doc.sentences[0].print_dependencies()
The last command prints the words in the first sentence of the input string (or Document, as it is represented in StanfordNLP), along with the index of the word that governs each one in the Universal Dependencies parse of that sentence (its “head”) and the dependency relation between the two. The output should look like:
('Barack', '4', 'nsubj:pass')
('Obama', '1', 'flat')
('was', '4', 'aux:pass')
('born', '0', 'root')
('in', '6', 'case')
('Hawaii', '4', 'obl')
('.', '4', 'punct')
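To make the head indices easier to read, here is a minimal sketch that works only on the printed tuples and does not call StanfordNLP. It assumes that head indices are 1-based positions in the sentence and that 0 denotes the root, as in the Universal Dependencies convention. The helper name `resolve_heads` is hypothetical.

```python
# Rows in the form printed above: (word, head index as a string, relation).
rows = [
    ("Barack", "4", "nsubj:pass"),
    ("Obama", "1", "flat"),
    ("was", "4", "aux:pass"),
    ("born", "0", "root"),
    ("in", "6", "case"),
    ("Hawaii", "4", "obl"),
    (".", "4", "punct"),
]


def resolve_heads(rows):
    """Replace each 1-based head index with the head word; 0 means the root."""
    words = [word for word, _, _ in rows]
    resolved = []
    for word, head, relation in rows:
        head_index = int(head)
        head_word = "ROOT" if head_index == 0 else words[head_index - 1]
        resolved.append((word, head_word, relation))
    return resolved


for word, head_word, relation in resolve_heads(rows):
    print(f"{word} --{relation}--> {head_word}")
```

Read this way, "Barack" attaches to "born" as its passive subject, and "in" attaches to "Hawaii" as a case marker.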
To build a pipeline for another language, pass its language code to the constructor, for example stanfordnlp.Pipeline(lang="fr"). For a full list of the languages (and their corresponding language codes) supported by StanfordNLP, please see this section.
We also provide a demo script in our GitHub repository that demonstrates how to use StanfordNLP in languages other than English, for example Chinese (traditional):
python demo/pipeline_demo.py -l zh
You should see output like the following:
tokens of first sentence:
達沃斯 達沃斯 PROPN
世界 世界 NOUN
經濟 經濟 NOUN
論壇 論壇 NOUN
是 是 AUX
每年 每年 DET
全球 全球 NOUN
政 政 PART
商界 商界 NOUN
領袖 領袖 NOUN
聚 聚 VERB
在 在 VERB
一起 一起 NOUN
的 的 PART
年度 年度 NOUN
盛事 盛事 NOUN
。 。 PUNCT
dependency parse of first sentence:
('達沃斯', '4', 'nmod')
('世界', '4', 'nmod')
('經濟', '4', 'nmod')
('論壇', '16', 'nsubj')
('是', '16', 'cop')
('每年', '10', 'nmod')
('全球', '10', 'nmod')
('政', '9', 'case:pref')
('商界', '10', 'nmod')
('領袖', '11', 'nsubj')
('聚', '16', 'acl:relcl')
('在', '11', 'mark')
('一起', '11', 'obj')
('的', '11', 'mark:relcl')
('年度', '16', 'nmod')
('盛事', '0', 'root')
('。', '16', 'punct')
Why do I keep getting a SyntaxError: invalid syntax error message while trying to import stanfordnlp?
StanfordNLP will not work with Python 3.5 or below. If you have trouble importing the package, please try to upgrade your Python.
Why am I getting an OSError: [Errno 22] Invalid argument error and therefore a Vector file is not provided exception while the model is being loaded?
If you are getting this error, you are very likely running macOS with Python 3.6.7 or earlier, or with Python 3.7.0 or 3.7.1. In that case you are affected by a known Python bug on macOS, and upgrading to Python >= 3.6.8 or >= 3.7.2 should solve the issue. If you are not running macOS, or already have one of these Python versions and are still seeing the issue, please report it to us via the GitHub issue tracker.
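The two version constraints above can be checked before importing the package. The following is a minimal sketch using only the standard library. The function name `check_python` and the wording of its messages are hypothetical, and the version bounds are simply the ones stated in the answers above.

```python
import platform
import sys


def check_python(version=None, system=None):
    """Return a list of warnings for the given Python version and OS name."""
    version = tuple(version or sys.version_info[:3])
    system = system or platform.system()  # platform.system() is "Darwin" on macOS
    problems = []
    if version < (3, 6):
        # Python 3.5 or below cannot import the package at all.
        problems.append("Python 3.5 or below: importing stanfordnlp raises SyntaxError")
    elif system == "Darwin" and (
        version <= (3, 6, 7) or (3, 7) <= version <= (3, 7, 1)
    ):
        # Affected by the macOS Python bug described above.
        problems.append("macOS Python bug: upgrade to >= 3.6.8 or >= 3.7.2")
    return problems


for message in check_python():
    print("Warning:", message)
```

Calling `check_python()` with no arguments inspects the running interpreter. Passing an explicit version tuple and system name makes it easy to check another environment.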
Models for Human Languages
Downloading and Using Models
Downloading models for the human languages you are interested in, for use in the StanfordNLP pipeline, is as simple as
>>> import stanfordnlp
>>> stanfordnlp.download('ar') # replace "ar" with the language code or treebank code you need, see below
The language code or treebank code can be looked up in the next section. If only the language code is specified, we download the default models for that language (indicated in the table), which are trained on the largest treebank available for that language. If you need the models for a specific treebank, you can download them with the corresponding treebank code instead.
To use the default model for any language, simply build the pipeline as follows
>>> nlp = stanfordnlp.Pipeline(lang="es") # replace "es" with the language of interest
If you are using a non-default treebank for the language, make sure to also specify the treebank code, for example
>>> nlp = stanfordnlp.Pipeline(lang="it", treebank="it_postwita")
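The two ways of building a pipeline can be folded into one small helper. This is only a sketch. It assumes that treebank codes have the form `<language code>_<treebank name>`, as in `it_postwita`, which is inferred from the example above and is not a documented guarantee. The helper name `pipeline_kwargs` is hypothetical, and the sketch does not call StanfordNLP itself.

```python
def pipeline_kwargs(code):
    """Build keyword arguments for stanfordnlp.Pipeline from a language or treebank code.

    Assumption: a treebank code looks like "<lang>_<treebank>", e.g. "it_postwita".
    """
    if "_" in code:
        lang = code.split("_", 1)[0]
        return {"lang": lang, "treebank": code}
    return {"lang": code}


print(pipeline_kwargs("es"))           # default models for Spanish
print(pipeline_kwargs("it_postwita"))  # a non-default Italian treebank
```

Once the matching models have been downloaded, the result could be passed on as `stanfordnlp.Pipeline(**pipeline_kwargs("it_postwita"))`.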
Human Languages Supported by StanfordNLP
Below is a list of all of the (human) languages supported by StanfordNLP (through its neural pipeline). The performance of these systems on the CoNLL 2018 Shared Task official test set (in our unofficial evaluation) can be found here.
- Some models are flagged in the table because their unlabeled attachment score (UAS) is very low when evaluated end-to-end (from tokenization all the way to dependency parsing). Specifically, their UAS is lower than 50% on the CoNLL 2018 Shared Task test set. Using these models for serious syntactic analysis is strongly discouraged.
- A second marker identifies models that are at least 1% absolute UAS worse than the full neural pipeline presented in our paper (which uses the TensorFlow counterparts for the tagger and the parser), so these might raise a yellow flag.
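For readers unfamiliar with the metric, UAS is the share of words whose predicted head matches the gold-standard head, ignoring the relation label. The sketch below is a simplified illustration, not the shared task's evaluation script: it assumes that the gold and predicted parses share the same tokenization, which an end-to-end evaluation cannot take for granted. The function name and the predicted heads are hypothetical.

```python
def unlabeled_attachment_score(gold_heads, predicted_heads):
    """Percentage of words whose predicted head index matches the gold head index."""
    if len(gold_heads) != len(predicted_heads):
        raise ValueError("gold and predicted parses must cover the same words")
    if not gold_heads:
        return 0.0
    correct = sum(g == p for g, p in zip(gold_heads, predicted_heads))
    return 100.0 * correct / len(gold_heads)


gold = [4, 1, 4, 0, 6, 4, 4]       # head indices from the English example
predicted = [4, 1, 4, 0, 4, 4, 4]  # hypothetical parse with one wrong head
score = unlabeled_attachment_score(gold, predicted)
print(f"UAS = {score:.1f}%")
```

On this scale, a model below the 50% threshold attaches fewer than half of its words to the correct head.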