Models for Human Languages
Downloading and Using Models
Downloading a language pack (a set of machine learning models for a human language that you wish to use in the StanfordNLP pipeline) is as simple as
>>> import stanfordnlp
>>> stanfordnlp.download('ar') # replace "ar" with the language or treebank code you need, see below
The language code or treebank code can be looked up in the next section. If only the language code is specified, we will download the default models for that language. If you are seeking the language pack built from a specific treebank, you can download the corresponding models with the appropriate treebank code. By default, language packs are stored in a stanfordnlp_resources
folder inside your home directory.
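For example, to get the Ancient Greek models trained specifically on the PROIEL treebank, pass the treebank code instead of the language code (treebank codes are listed in the table further down this page):
>>> import stanfordnlp
>>> stanfordnlp.download('grc_proiel') # downloads the Ancient Greek language pack built from the PROIEL treebank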
To use the default language pack for any language, simply build the pipeline as follows:
>>> nlp = stanfordnlp.Pipeline(lang="es") # replace "es" with the language of interest
If you are using a non-default treebank for the language, make sure to also specify the treebank code, for example:
>>> nlp = stanfordnlp.Pipeline(lang="it", treebank="it_postwita")
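Putting downloading and pipeline construction together, a minimal end-to-end run looks roughly like this (the sample sentence is illustrative; print_dependencies() prints the dependency parse of a sentence in the annotated document):
>>> import stanfordnlp
>>> stanfordnlp.download('en')                     # download the default English language pack
>>> nlp = stanfordnlp.Pipeline(lang="en")          # build the pipeline from the default pack
>>> doc = nlp("Barack Obama was born in Hawaii.")  # run the neural pipeline on some text
>>> doc.sentences[0].print_dependencies()          # inspect the dependency parse of the first sentence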
Human Languages Supported by StanfordNLP
Below is a list of all the (human) languages supported by StanfordNLP (through this Python neural pipeline). All language packs are trained on data from, and annotated according to, Universal Dependencies v2. You can find more information about the POS tags, morphological features, and syntactic relations used on the Universal Dependencies website.
The performance of these systems on the CoNLL 2018 Shared Task official test set (in our unofficial evaluation) can be found here.
Notes
- marks models which have very low unlabeled attachment score (UAS) when evaluated end-to-end (from tokenization all the way to dependency parsing). Specifically, their UAS is lower than 50% on the CoNLL 2018 Shared Task test set. Users should be very cautious in using the output of these models for serious syntactic analysis.
- marks models that are at least 1% absolute UAS worse than the full neural pipeline presented in our paper (which uses the Tensorflow counterparts of the tagger and the parser). This may be a concern for users running parser comparison experiments, but in general these models are fine to use for syntactic analysis.
- marks the default language pack for a language, which is the language pack trained on the largest treebank available for that language.
- The copyright and licensing status of machine learning models is not very clear (to us). We list in the table below the Treebank License of the underlying data from which each language pack (set of machine learning models for a treebank) was trained. To the extent that The Trustees of Leland Stanford Junior University have ownership and rights over these language packs, all these StanfordNLP language packs are made available under the Open Data Commons Attribution License v1.0.
Models History
Models from earlier releases can be downloaded using the version argument. Note that not every release
has a distinct model set.
>>> import stanfordnlp
>>> stanfordnlp.download('ar', version='0.1.0')
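If you want to keep models from an earlier release separate from your current ones, one option is to download them into a dedicated folder and point the pipeline at it. The sketch below assumes the resource_dir argument of download() and the models_dir option of Pipeline, both of which otherwise default to the stanfordnlp_resources folder in your home directory:
>>> import stanfordnlp
>>> # resource_dir / models_dir are assumed keyword arguments for choosing the model folder
>>> stanfordnlp.download('ar', resource_dir='stanfordnlp_resources_0.1.0', version='0.1.0')
>>> nlp = stanfordnlp.Pipeline(lang='ar', models_dir='stanfordnlp_resources_0.1.0')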
Models from earlier releases can also be found in the table below.
| Language | Treebank | Language code | Treebank code | Models |
| --- | --- | --- | --- | --- |
| Afrikaans | AfriBooms | af | af_afribooms | 0.1.0 |
| Ancient Greek | Perseus | grc | grc_perseus | 0.1.0 |
| | PROIEL | grc | grc_proiel | 0.1.0 |
| Arabic | PADT | ar | ar_padt | 0.1.0 |
| Armenian | ArmTDP | hy | hy_armtdp | 0.1.0 |
| Basque | BDT | eu | eu_bdt | 0.1.0 |
| Bulgarian | BTB | bg | bg_btb | 0.1.0 |
| Buryat | BDT | bxr | bxr_bdt | 0.1.0 |
| Catalan | AnCora | ca | ca_ancora | 0.1.0 |
| Chinese (traditional) | GSD | zh | zh_gsd | 0.1.0 |
| Croatian | SET | hr | hr_set | 0.1.0 |
| Czech | CAC | cs | cs_cac | 0.1.0 |
| | FicTree | cs | cs_fictree | 0.1.0 |
| | PDT | cs | cs_pdt | 0.1.0 |
| Danish | DDT | da | da_ddt | 0.1.0 |
| Dutch | Alpino | nl | nl_alpino | 0.1.0 |
| | LassySmall | nl | nl_lassysmall | 0.1.0 |
| English | EWT | en | en_ewt | 0.1.0 |
| | GUM | en | en_gum | 0.1.0 |
| | LinES | en | en_lines | 0.1.0 |
| Estonian | EDT | et | et_edt | 0.1.0 |
| Finnish | FTB | fi | fi_ftb | 0.1.0 |
| | TDT | fi | fi_tdt | 0.1.0 |
| French | GSD | fr | fr_gsd | 0.1.0 |
| | Sequoia | fr | fr_sequoia | 0.1.0 |
| | Spoken | fr | fr_spoken | 0.1.0 |
| Galician | CTG | gl | gl_ctg | 0.1.0 |
| | TreeGal | gl | gl_treegal | 0.1.0 |
| German | GSD | de | de_gsd | 0.1.0 |
| Gothic | PROIEL | got | got_proiel | 0.1.0 |
| Greek | GDT | el | el_gdt | 0.1.0 |
| Hebrew | HTB | he | he_htb | 0.1.0 |
| Hindi | HDTB | hi | hi_hdtb | 0.1.0 |
| Hungarian | Szeged | hu | hu_szeged | 0.1.0 |
| Indonesian | GSD | id | id_gsd | 0.1.0 |
| Irish | IDT | ga | ga_idt | 0.1.0 |
| Italian | ISDT | it | it_isdt | 0.1.0 |
| | PoSTWITA | it | it_postwita | 0.1.0 |
| Japanese | GSD | ja | ja_gsd | 0.1.0 |
| Kazakh | KTB | kk | kk_ktb | 0.1.0 |
| Korean | GSD | ko | ko_gsd | 0.1.0 |
| | Kaist | ko | ko_kaist | 0.1.0 |
| Kurmanji | MG | kmr | kmr_mg | 0.1.0 |
| Latin | ITTB | la | la_ittb | 0.1.0 |
| | Perseus | la | la_perseus | 0.1.0 |
| | PROIEL | la | la_proiel | 0.1.0 |
| Latvian | LVTB | lv | lv_lvtb | 0.1.0 |
| North Sami | Giella | sme | sme_giella | 0.1.0 |
| Norwegian | Bokmaal | no_bokmaal | no_bokmaal | 0.1.0 |
| | Nynorsk | no_nynorsk | no_nynorsk | 0.1.0 |
| | NynorskLIA | no_nynorsk | no_nynorsklia | 0.1.0 |
| Old Church Slavonic | PROIEL | cu | cu_proiel | 0.1.0 |
| Old French | SRCMF | fro | fro_srcmf | 0.1.0 |
| Persian | Seraji | fa | fa_seraji | 0.1.0 |
| Polish | LFG | pl | pl_lfg | 0.1.0 |
| | SZ | pl | pl_sz | 0.1.0 |
| Portuguese | Bosque | pt | pt_bosque | 0.1.0 |
| Romanian | RRT | ro | ro_rrt | 0.1.0 |
| Russian | SynTagRus | ru | ru_syntagrus | 0.1.0 |
| | Taiga | ru | ru_taiga | 0.1.0 |
| Serbian | SET | sr | sr_set | 0.1.0 |
| Slovak | SNK | sk | sk_snk | 0.1.0 |
| Slovenian | SSJ | sl | sl_ssj | 0.1.0 |
| | SST | sl | sl_sst | 0.1.0 |
| Spanish | AnCora | es | es_ancora | 0.1.0 |
| Swedish | LinES | sv | sv_lines | 0.1.0 |
| | Talbanken | sv | sv_talbanken | 0.1.0 |
| Turkish | IMST | tr | tr_imst | 0.1.0 |
| Ukrainian | IU | uk | uk_iu | 0.1.0 |
| Upper Sorbian | UFAL | hsb | hsb_ufal | 0.1.0 |
| Urdu | UDTB | ur | ur_udtb | 0.1.0 |
| Uyghur | UDT | ug | ug_udt | 0.1.0 |
| Vietnamese | VTB | vi | vi_vtb | 0.1.0 |