Available Models & Languages

Stanza provides pretrained NLP models for a total of 66 human languages. This page gives detailed information on these models.

Pretrained models in Stanza can be divided into two categories, based on the datasets they were trained on:

  1. Universal Dependencies (UD) models, which are trained on the UD treebanks and cover tokenization, multi-word token (MWT) expansion, lemmatization, part-of-speech (POS) and morphological feature tagging, and dependency parsing;
  2. NER models, which support named entity tagging for 8 languages, and are trained on various NER datasets.
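
The two categories correspond to processors in a Stanza pipeline. The following is a minimal sketch, assuming the `stanza` package is installed and that French (`fr`) is the language of interest; the helper names `processor_string`, `build_pipeline` and `demo` are hypothetical and introduced only for illustration.

```python
# Processor names grouped by the two model categories described above.
UD_PROCESSORS = ["tokenize", "mwt", "pos", "lemma", "depparse"]
NER_PROCESSORS = ["ner"]


def processor_string(include_ner: bool = False) -> str:
    """Return a comma-separated processor list suitable for stanza.Pipeline."""
    names = UD_PROCESSORS + (NER_PROCESSORS if include_ner else [])
    return ",".join(names)


def build_pipeline(lang: str, include_ner: bool = False):
    """Download the default models for `lang` and build a pipeline.

    Needs the stanza package, plus network access the first time it runs.
    """
    import stanza  # imported lazily so processor_string() works without it

    stanza.download(lang)
    return stanza.Pipeline(lang, processors=processor_string(include_ner))


def demo() -> None:
    """Print UD annotations and named entities for one French sentence."""
    nlp = build_pipeline("fr", include_ner=True)
    doc = nlp("Barack Obama est né à Hawaï.")
    for sentence in doc.sentences:
        for word in sentence.words:
            print(word.text, word.lemma, word.upos, word.head, word.deprel)
    for ent in doc.ents:
        print(ent.text, ent.type)
```

Calling `demo()` downloads the default French models on first use. Setting `include_ner=True` only makes sense for the languages that have an NER model.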

Available UD Models

The following table lists all UD models supported by Stanza and pretrained on the Universal Dependencies v2.5 datasets. You can find more information about the POS tags, morphological features, and syntactic relations used on the Universal Dependencies website. We recommend that you always use the latest released models. However, you can still use these earlier models by downloading them and putting them in the correct directory. You can find the performance of all available models on the System Performance page.

Table Notes

  1. One icon in the Notes column marks models that have a very low unlabeled attachment score (UAS) when evaluated end-to-end (from tokenization all the way to dependency parsing). Specifically, their UAS is lower than 50% on the Universal Dependencies 2.5 test set. Use the output of these models with great caution for serious syntactic analysis.
  2. A second icon in the Notes column marks the default package for a language, which is the package trained on the largest treebank available for that language.
  3. The copyright and licensing status of machine learning models is not very clear (to us). We list in the table below the Treebank License of the underlying data from which each language pack (set of machine learning models for a treebank) was trained. To the extent that The Trustees of Leland Stanford Junior University have ownership and rights over these language packs, all these Stanza language packs are made available under the Open Data Commons Attribution License v1.0.

| Language | Language code | Package | Version | Treebank License | Treebank Doc | Notes |
| --- | --- | --- | --- | --- | --- | --- |
| Afrikaans | af | afribooms | 1.0.0 | Creative Commons License | | |
| Ancient Greek | grc | proiel | 1.0.0 | Creative Commons License | | |
| | grc | perseus | 1.0.0 | Creative Commons License | | |
| Arabic | ar | padt | 1.0.0 | Creative Commons License | | |
| Armenian | hy | armtdp | 1.0.0 | Creative Commons License | | |
| Basque | eu | bdt | 1.0.0 | Creative Commons License | | |
| Belarusian | be | hse | 1.0.0 | Creative Commons License | | |
| Bulgarian | bg | btb | 1.0.0 | Creative Commons License | | |
| Buryat | bxr | bdt | 1.0.0 | Creative Commons License | | |
| Catalan | ca | ancora | 1.0.0 | GNU License | | |
| Chinese (simplified) | zh / zh-hans | gsdsimp | 1.0.0 | Creative Commons License | | |
| Chinese (traditional) | zh-hant | gsd | 1.0.0 | Creative Commons License | | |
| Classical Chinese | lzh | kyoto | 1.0.0 | Creative Commons License | | |
| Coptic | cop | scriptorium | 1.0.0 | Creative Commons License | | |
| Croatian | hr | set | 1.0.0 | Creative Commons License | | |
| Czech | cs | cac | 1.0.0 | Creative Commons License | | |
| | cs | cltt | 1.0.0 | Creative Commons License | | |
| | cs | fictree | 1.0.0 | Creative Commons License | | |
| | cs | pdt | 1.0.0 | Creative Commons License | | |
| Danish | da | ddt | 1.0.0 | Creative Commons License | | |
| Dutch | nl | alpino | 1.0.0 | Creative Commons License | | |
| | nl | lassysmall | 1.0.0 | Creative Commons License | | |
| English | en | ewt | 1.0.0 | Creative Commons License | | |
| | en | gum | 1.0.0 | Creative Commons License | | |
| | en | lines | 1.0.0 | Creative Commons License | | |
| | en | partut | 1.0.0 | Creative Commons License | | |
| Estonian | et | edt | 1.0.0 | Creative Commons License | | |
| | et | ewt | 1.0.0 | Creative Commons License | | |
| Finnish | fi | ftb | 1.0.0 | Creative Commons License | | |
| | fi | tdt | 1.0.0 | Creative Commons License | | |
| French | fr | gsd | 1.0.0 | Creative Commons License | | |
| | fr | partut | 1.0.0 | Creative Commons License | | |
| | fr | sequoia | 1.0.0 | LGPLLR | | |
| | fr | spoken | 1.0.0 | Creative Commons License | | |
| Galician | gl | ctg | 1.0.0 | Creative Commons License | | |
| | gl | treegal | 1.0.0 | LGPLLR | | |
| German | de | gsd | 1.0.0 | Creative Commons License | | |
| | de | hdt | 1.0.0 | Creative Commons License | | |
| Gothic | got | proiel | 1.0.0 | Creative Commons License | | |
| Greek | el | gdt | 1.0.0 | Creative Commons License | | |
| Hebrew | he | htb | 1.0.0 | Creative Commons License | | |
| Hindi | hi | hdtb | 1.0.0 | Creative Commons License | | |
| Hungarian | hu | szeged | 1.0.0 | Creative Commons License | | |
| Indonesian | id | gsd | 1.0.0 | Creative Commons License | | |
| Irish | ga | idt | 1.0.0 | Creative Commons License | | |
| Italian | it | isdt | 1.0.0 | Creative Commons License | | |
| | it | partut | 1.0.0 | Creative Commons License | | |
| | it | postwita | 1.0.0 | Creative Commons License | | |
| | it | twittiro | 1.0.0 | Creative Commons License | | |
| | it | vit | 1.0.0 | Creative Commons License | | |
| Japanese | ja | gsd | 1.0.0 | Creative Commons License | | |
| Kazakh | kk | ktb | 1.0.0 | Creative Commons License | | |
| Korean | ko | gsd | 1.0.0 | Creative Commons License | | |
| | ko | kaist | 1.0.0 | Creative Commons License | | |
| Kurmanji | kmr | mg | 1.0.0 | Creative Commons License | | |
| Latin | la | ittb | 1.0.0 | Creative Commons License | | |
| | la | proiel | 1.0.0 | Creative Commons License | | |
| | la | perseus | 1.0.0 | Creative Commons License | | |
| Latvian | lv | lvtb | 1.0.0 | Creative Commons License | | |
| Lithuanian | lt | alksnis | 1.0.0 | Creative Commons License | | |
| | lt | hse | 1.0.0 | Creative Commons License | | |
| Livvi | olo | kkpp | 1.0.0 | Creative Commons License | | |
| Maltese | mt | mudt | 1.0.0 | Creative Commons License | | |
| Marathi | mr | ufal | 1.0.0 | Creative Commons License | | |
| North Sami | sme | giella | 1.0.0 | Creative Commons License | | |
| Norwegian (Bokmaal) | no / nb | bokmaal | 1.0.0 | Creative Commons License | | |
| Norwegian (Nynorsk) | nn | nynorsk | 1.0.0 | Creative Commons License | | |
| | nn | nynorsklia | 1.0.0 | Creative Commons License | | |
| Old Church Slavonic | cu | proiel | 1.0.0 | Creative Commons License | | |
| Old French | fro | srcmf | 1.0.0 | Creative Commons License | | |
| Old Russian | orv | torot | 1.0.0 | Creative Commons License | | |
| Persian | fa | seraji | 1.0.0 | Creative Commons License | | |
| Polish | pl | lfg | 1.0.0 | GNU License | | |
| | pl | pdb | 1.0.0 | Creative Commons License | | |
| Portuguese | pt | bosque | 1.0.0 | Creative Commons License | | |
| | pt | gsd | 1.0.0 | Creative Commons License | | |
| Romanian | ro | nonstandard | 1.0.0 | Creative Commons License | | |
| | ro | rrt | 1.0.0 | Creative Commons License | | |
| Russian | ru | gsd | 1.0.0 | Creative Commons License | | |
| | ru | syntagrus | 1.0.0 | Creative Commons License | | |
| | ru | taiga | 1.0.0 | Creative Commons License | | |
| Scottish Gaelic | gd | arcosg | 1.0.0 | Creative Commons License | | |
| Serbian | sr | set | 1.0.0 | Creative Commons License | | |
| Slovak | sk | snk | 1.0.0 | Creative Commons License | | |
| Slovenian | sl | ssj | 1.0.0 | Creative Commons License | | |
| | sl | sst | 1.0.0 | Creative Commons License | | |
| Spanish | es | ancora | 1.0.0 | GNU License | | |
| | es | gsd | 1.0.0 | Creative Commons License | | |
| Swedish | sv | lines | 1.0.0 | Creative Commons License | | |
| | sv | talbanken | 1.0.0 | Creative Commons License | | |
| Swedish Sign Language | swl | sslc | 1.0.0 | Creative Commons License | | |
| Tamil | ta | ttb | 1.0.0 | Creative Commons License | | |
| Telugu | te | mtg | 1.0.0 | Creative Commons License | | |
| Turkish | tr | imst | 1.0.0 | Creative Commons License | | |
| Ukrainian | uk | iu | 1.0.0 | Creative Commons License | | |
| Upper Sorbian | hsb | ufal | 1.0.0 | Creative Commons License | | |
| Urdu | ur | udtb | 1.0.0 | Creative Commons License | | |
| Uyghur | ug | udt | 1.0.0 | Creative Commons License | | |
| Vietnamese | vi | vtb | 1.0.0 | Creative Commons License | | |
| Wolof | wo | wtb | 1.0.0 | Creative Commons License | | |
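
A language with several packages can be loaded with a non-default one by passing the package name from the table. The following is a rough sketch, assuming the `stanza` package is installed; `UD_PACKAGES` holds only a small excerpt of the table, and the helpers `pipeline_kwargs` and `load` are hypothetical names used for illustration.

```python
# A small excerpt of the table above: language code -> available UD packages.
UD_PACKAGES = {
    "en": ["ewt", "gum", "lines", "partut"],
    "fi": ["ftb", "tdt"],
    "la": ["ittb", "proiel", "perseus"],
}


def pipeline_kwargs(lang: str, package: str = "default") -> dict:
    """Build keyword arguments shared by stanza.download and stanza.Pipeline.

    Languages missing from the excerpt are passed through unchecked.
    """
    known = UD_PACKAGES.get(lang)
    if known is not None and package != "default" and package not in known:
        raise ValueError(f"{package!r} is not a listed package for {lang!r}")
    return {"lang": lang, "package": package}


def load(lang: str, package: str = "default"):
    """Download the requested package and return a pipeline that uses it."""
    import stanza  # imported lazily so pipeline_kwargs() works without it

    kwargs = pipeline_kwargs(lang, package)
    stanza.download(**kwargs)
    return stanza.Pipeline(**kwargs)
```

For example, `load("en", "gum")` would request the English GUM package, while `load("en")` leaves the choice of default package to Stanza.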

Available NER Models

The following table lists all NER models supported by Stanza, pretrained on various NER datasets. Again, you can find the performance of all available models on the System Performance page.

Table Notes

  1. An icon in the Notes column marks the default package for a language.
  2. For packages with 4 named entity types, the supported types are PER (Person), LOC (Location), ORG (Organization) and MISC (Miscellaneous). For packages with 18 named entity types, the supported types are PERSON, NORP (Nationalities/religious/political group), FAC (Facility), ORG (Organization), GPE (Countries/cities/states), LOC (Location), PRODUCT, EVENT, WORK_OF_ART, LAW, LANGUAGE, DATE, TIME, PERCENT, MONEY, QUANTITY, ORDINAL and CARDINAL (details can be found on page 21 of the OntoNotes documentation).
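
| Language | Language code | Package | # Types | Corpus Doc | Notes |
| --- | --- | --- | --- | --- | --- |
| Arabic | ar | AQMAR | 4 | | |
| Chinese | zh | OntoNotes | 18 | | |
| Dutch | nl | CoNLL02 | 4 | | |
| Dutch | nl | WikiNER | 4 | | |
| English | en | CoNLL03 | 4 | | |
| English | en | OntoNotes | 18 | | |
| French | fr | WikiNER | 4 | | |
| German | de | CoNLL03 | 4 | | |
| German | de | GermEval14 | 4 | | |
| Russian | ru | WikiNER | 4 | | |
| Spanish | es | CoNLL02 | 4 | | |
| Spanish | es | AnCora | 4 | | |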
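
The entity label set a pipeline can emit depends on the NER package chosen. The following is an illustrative sketch that encodes the table and the two label sets from the notes above, assuming the `stanza` package is installed. It also assumes that package identifiers are the lowercased names from the table (for example `conll03`), which should be checked against the installed Stanza version. The helpers `entity_types` and `load_ner` are hypothetical names.

```python
FOUR_TYPES = ("PER", "LOC", "ORG", "MISC")
EIGHTEEN_TYPES = (
    "PERSON", "NORP", "FAC", "ORG", "GPE", "LOC", "PRODUCT", "EVENT",
    "WORK_OF_ART", "LAW", "LANGUAGE", "DATE", "TIME", "PERCENT", "MONEY",
    "QUANTITY", "ORDINAL", "CARDINAL",
)

# (language code, lowercased package name) -> number of entity types,
# copied from the NER table above.
NER_PACKAGES = {
    ("ar", "aqmar"): 4, ("zh", "ontonotes"): 18,
    ("nl", "conll02"): 4, ("nl", "wikiner"): 4,
    ("en", "conll03"): 4, ("en", "ontonotes"): 18,
    ("fr", "wikiner"): 4,
    ("de", "conll03"): 4, ("de", "germeval14"): 4,
    ("ru", "wikiner"): 4,
    ("es", "conll02"): 4, ("es", "ancora"): 4,
}


def entity_types(lang: str, package: str) -> tuple:
    """Return the label set for a (language, package) pair from the table."""
    count = NER_PACKAGES[(lang, package.lower())]
    return FOUR_TYPES if count == 4 else EIGHTEEN_TYPES


def load_ner(lang: str, package: str):
    """Build a tokenize + NER pipeline using the given NER package.

    Assumption: the lowercased table name is the identifier Stanza expects.
    """
    import stanza  # imported lazily so entity_types() works without it

    processors = {"tokenize": "default", "ner": package.lower()}
    stanza.download(lang, processors=processors)
    return stanza.Pipeline(lang, processors=processors)
```

With a pipeline from `load_ner("en", "CoNLL03")`, each item in `doc.ents` exposes `text` and `type`, and `type` should fall within `entity_types("en", "CoNLL03")`.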