Stanza provides pretrained NLP models for a total of 66 human languages. On this page we provide detailed information on how to download these models to process text in a language of your choosing.
Pretrained models in Stanza can be divided into two categories, based on the datasets they were trained on:
- Universal Dependencies (UD) models, which are trained on the UD treebanks and cover tokenization, multi-word token (MWT) expansion, lemmatization, part-of-speech (POS) and morphological feature tagging, and dependency parsing;
- NER models, which support named entity tagging for 8 languages, and are trained on various NER datasets.
For more information on what models are available for download, please see Available Models.
Downloading Stanza models is as simple as calling the stanza.download() method. We provide detailed examples on how to use the download interface on the Getting Started page. Detailed descriptions of all available options (i.e., arguments) of the download method are listed below:
|lang||Language code (e.g., |
|dir||Directory for storing the models downloaded for Stanza. By default, Stanza stores its models in a folder in your home directory.|
|package||Package to download for processors, where each package typically specifies what data the models are trained on. We provide a “default” package for all languages that contains NLP models most users will find useful, which will be used when the |
|processors||Processors to download models for. This can either be specified as a comma-separated list of processor names to use (e.g., |
|logging_level||Controls the level of logging information to display during download. Can be one of |
|verbose||Simplified option for logging level. If |
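As a rough illustration of how these options fit together, the sketch below wraps stanza.download() in a small helper. The function names processors_arg and download_models are hypothetical and not part of Stanza, and the language code "en" is assumed for the example. Only the package, processors, and verbose options from the table are passed; the dir option is left out so that models go to the default location.

```python
def processors_arg(names):
    """Join processor names into the comma-separated form described in the table."""
    return ",".join(name.strip() for name in names)


def download_models(lang, processors=None, package="default", verbose=None):
    """Download Stanza models for `lang` (requires the stanza package and network access)."""
    import stanza  # imported lazily so processors_arg works without stanza installed

    # Option names follow the table above; this wrapper itself is only a sketch.
    options = {"package": package, "verbose": verbose}
    if processors is not None:
        options["processors"] = processors_arg(processors)
    stanza.download(lang, **options)


# Example calls, commented out because they download model files:
# download_models("en")
# download_models("en", processors=["tokenize", "pos"])
```

Omitting processors keeps the "default" package behavior, while passing a list restricts the download to the named processors.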
You can override the default location ~/stanza_resources by setting an environment variable called