Frequently Asked Questions (FAQ)
Table of contents
- Model Output
- Model predictions are wrong on some of my examples, is this normal?
- The model prediction is inconsistent between Stanza and CoreNLP, different versions of Stanza, or their online demos
- Can I use Stanza models in CoreNLP, or the other way around?
- Can I run POS tagging/morphological feature tagging/lemmatization/dependency parsing without expanding multi-word tokens (MWTs)?
- Troubleshooting Download & Installation
- `ERROR: Could not find a version that satisfies the requirement torch` when installing Stanza
- `module 'stanza' has no attribute 'download'` when downloading models with Stanza
- Model download is very slow or I cannot connect to the server
- `requests.exceptions.ConnectionError` when downloading models
- Troubleshooting Running Stanza
- Why do I keep getting a `SyntaxError: invalid syntax` error message while trying to `import stanza`?
- `Segmentation fault` or other uninterpretable non-Python errors when trying to run the neural pipeline
- Why am I getting an `OSError: [Errno 22] Invalid argument` error and therefore a `Vector file is not provided` exception while the model is being loaded?
Model predictions are wrong on some of my examples, is this normal?
This is absolutely normal, as all models in Stanza (yes, even tokenization!) are statistical. Although they are quite accurate, they are not perfect, so it’s quite likely that you’ll find cases where a prediction clearly doesn’t make sense. Statistically speaking, though, performance on a large collection of text should not be far from what we report, as long as the genre of your text is similar to what the models were trained on.
The model prediction is inconsistent between Stanza and CoreNLP, different versions of Stanza, or their online demos
Stanza’s neural pipeline uses fundamentally different models from CoreNLP for all tasks, and they are usually trained on different data, so it is not unexpected that their behaviors differ.
As for the online demos, some demos may be running models that differ from the latest models available for download, so slight differences are possible there as well.
Can I use Stanza models in CoreNLP, or the other way around?
Since Stanza’s neural pipeline uses fundamentally different models from CoreNLP for all tasks, it is not possible to use Stanza’s models in CoreNLP or the other way around.
However, you could use CoreNLP for part of the annotation (e.g., tokenization) through the `CoreNLPClient`, and use the resulting annotations as input to Stanza’s neural pipeline.
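A minimal sketch of that hand-off, assuming a working local CoreNLP installation (pointed to by `CORENLP_HOME`) and the English Stanza models; the helper only reshapes CoreNLP’s annotation into the pretokenized format that Stanza accepts:

```python
def corenlp_to_pretokenized(ann):
    """Flatten a CoreNLP annotation (a protobuf Document) into the
    list-of-sentences / list-of-token-strings shape that Stanza's
    tokenize_pretokenized=True option expects."""
    return [[token.word for token in sentence.token] for sentence in ann.sentence]


def tag_with_corenlp_tokens(text):
    """Tokenize and sentence-split with CoreNLP, then POS-tag with Stanza.

    Requires a local CoreNLP install (CORENLP_HOME) and the English Stanza
    models (stanza.download("en")), so the imports are kept local.
    """
    import stanza
    from stanza.server import CoreNLPClient

    with CoreNLPClient(annotators=["tokenize", "ssplit"], be_quiet=True) as client:
        pretokenized = corenlp_to_pretokenized(client.annotate(text))

    # tokenize_pretokenized=True tells Stanza to trust the given token lists
    # instead of running its own tokenizer.
    nlp = stanza.Pipeline("en", processors="tokenize,pos",
                          tokenize_pretokenized=True)
    return nlp(pretokenized)
```

Calling `tag_with_corenlp_tokens("Barack Obama was born in Hawaii.")` then returns a Stanza document whose tokens come from CoreNLP rather than Stanza’s tokenizer.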
Can I run POS tagging/morphological feature tagging/lemmatization/dependency parsing without expanding multi-word tokens (MWTs)?
For syntactic tasks such as POS/morphological feature tagging, lemmatization, and dependency parsing, Stanza uses data made available through the Universal Dependencies project, which distinguishes between tokens (substrings of the input text) and syntactic words (see the UD documentation on this for more information). This means that if the language/dataset you want to use contains multi-word tokens (MWTs), nothing beyond tokenization and sentence segmentation can run unless MWTs are expanded with the MWT expansion model in the pipeline (with the exception of named entity recognition, which is based on tokens!).
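As an illustration, in French the surface token “du” covers the two syntactic words “de” + “le”. A small helper (a sketch; the `demo_french` function assumes the French models have been downloaded) can show how tokens map to the words produced by MWT expansion:

```python
def mwt_expansions(sentence):
    """Map each token in a Stanza sentence to the syntactic word(s) it
    covers; MWTs yield more than one word (e.g. French "du" -> de + le).
    Works on any object exposing token.text and token.words[].text."""
    return [(token.text, [word.text for word in token.words])
            for token in sentence.tokens]


def demo_french():
    """Run the French pipeline with the MWT expander in place.
    Assumes stanza.download("fr") has been run."""
    import stanza
    nlp = stanza.Pipeline("fr", processors="tokenize,mwt")
    doc = nlp("Je parle du professeur.")
    for sentence in doc.sentences:
        for token_text, words in mwt_expansions(sentence):
            print(token_text, "->", " + ".join(words))
```

Note that `mwt` must appear in the `processors` list for the downstream taggers and parser to receive syntactic words rather than raw tokens.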
`ERROR: Could not find a version that satisfies the requirement torch` when installing Stanza
This is usually because PyTorch does not have a version that satisfies Stanza’s requirements available through `pip`. You can usually work around this issue by installing PyTorch from your package manager (e.g., Anaconda) first, before installing Stanza.
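For example, with Anaconda (a sketch; the CPU-only build is shown, so pick the PyTorch variant matching your hardware from the official PyTorch install selector):

```shell
# Install PyTorch from conda first (CPU-only build shown)...
conda install -y pytorch cpuonly -c pytorch
# ...then install Stanza from pip.
pip install stanza
```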
`module 'stanza' has no attribute 'download'` when downloading models with Stanza
This is likely because you’re using Python 2. Note that Stanza only supports Python 3.6 or later.
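A quick way to confirm which interpreter you are actually running (the 3.6 floor comes from the answer above):

```python
import sys

# Stanza supports Python 3.6+; fail fast with a clear message otherwise.
MIN_VERSION = (3, 6)

def check_python_version():
    if sys.version_info < MIN_VERSION:
        raise RuntimeError(
            "Stanza requires Python %d.%d or later; found %d.%d"
            % (MIN_VERSION + tuple(sys.version_info[:2]))
        )
    return sys.version_info[:2]

print("Running Python %d.%d" % check_python_version())
```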
Model download is very slow or I cannot connect to the server
Although we try our best to keep our model server available, it does become unavailable from time to time for various reasons, e.g., hardware updates, power outages, etc. These issues are usually resolved within a few hours. Please be patient while we fix them on our side!
`requests.exceptions.ConnectionError` when downloading models
This is a known issue for users in certain areas, such as China. A common cause is that a connection to the `raw.githubusercontent.com` URL cannot be established, so the resource file required for downloading models cannot be accessed. Users have widely reported that using a VPN that provides stable access to GitHub services solves this issue.
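To check whether `raw.githubusercontent.com` is reachable from your machine at all, a quick diagnostic sketch (not part of Stanza):

```python
import socket

def can_reach(host, port=443, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

print("raw.githubusercontent.com reachable:",
      can_reach("raw.githubusercontent.com"))
```

If this prints `False` while other sites work, the GitHub connectivity problem described above is the likely culprit.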
Why do I keep getting a `SyntaxError: invalid syntax` error message while trying to `import stanza`?
Stanza will not work with Python 3.5 or below. If you have trouble importing the package, please upgrade your Python.
`Segmentation fault` or other uninterpretable non-Python errors when trying to run the neural pipeline
This is usually caused by a corrupted installation of PyTorch in your environment. Try reinstalling PyTorch and Stanza.
Why am I getting an `OSError: [Errno 22] Invalid argument` error and therefore a `Vector file is not provided` exception while the model is being loaded?
If you are getting this error, it is very likely that you are running macOS with Python version <= 3.6.7 or <= 3.7.1. If so, you are affected by a known Python bug on macOS, and upgrading your Python to >= 3.6.8 or >= 3.7.2 should solve this issue.
If you are not running macOS, or are already on a newer Python version and are still seeing this issue, please report it to us via the GitHub issue tracker.
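To check whether your interpreter falls into the affected range, a sketch of the version arithmetic from the answer above:

```python
import sys

def affected_by_macos_bug(version=None):
    """True if a CPython version is at most 3.6.7 or at most 3.7.1,
    i.e. it predates the fixes that landed in 3.6.8 and 3.7.2."""
    v = tuple(version or sys.version_info[:3])
    return v < (3, 6, 8) or ((3, 7, 0) <= v < (3, 7, 2))

print("Affected by the macOS bug:", affected_by_macos_bug())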