Dependency Parsing
Description
Provides a fast syntactic dependency parser. We generate three dependency-based outputs: basic, uncollapsed dependencies, saved in BasicDependenciesAnnotation; enhanced dependencies, saved in EnhancedDependenciesAnnotation; and enhanced++ dependencies, saved in EnhancedPlusPlusDependenciesAnnotation. Most users of our parser will prefer the last of these representations.
This is a separate annotator for a direct dependency parser. The parser requires prior part-of-speech tagging. If you need constituency parses, you should look at the parse annotator instead.
Property name | Annotator class name | Generated Annotation |
---|---|---|
depparse | DependencyParseAnnotator | BasicDependenciesAnnotation, EnhancedDependenciesAnnotation, EnhancedPlusPlusDependenciesAnnotation |
Options
Option name | Type | Default | Description |
---|---|---|---|
depparse.model | file, classpath, or URL | edu/stanford/nlp/models/parser/nndep/english_UD.gz | Dependency parsing model to use. There is no need to explicitly set this option, unless you want to use a different parsing model than the default. By default, this is set to the UD parsing model included in the stanford-corenlp-models JAR file. |
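As a minimal sketch of how these options fit together, the class below builds the properties for a pipeline that runs depparse. The class name `DepparseConfig` is hypothetical, and the annotator list `tokenize,ssplit,pos,depparse` is an assumption based on the requirement above that POS tagging runs before parsing. Only `depparse.model` and its default value come from the table.

```java
import java.util.Properties;

// Hypothetical helper, not part of CoreNLP: collects the settings for a depparse pipeline.
final class DepparseConfig {

    /**
     * Builds pipeline properties that run POS tagging before dependency parsing.
     * Pass null for modelPath to keep the default UD model.
     */
    static Properties build(String modelPath) {
        Properties props = new Properties();
        // Assumed annotator order: the parser needs tokens, sentences, and POS tags first.
        props.setProperty("annotators", "tokenize,ssplit,pos,depparse");
        if (modelPath != null) {
            // Only needed when using a model other than the default.
            props.setProperty("depparse.model", modelPath);
        }
        return props;
    }

    public static void main(String[] args) {
        // Explicitly setting the default model path from the options table.
        Properties props = build("edu/stanford/nlp/models/parser/nndep/english_UD.gz");
        props.list(System.out);
        // With CoreNLP on the classpath, these properties would typically be passed to
        // new edu.stanford.nlp.pipeline.StanfordCoreNLP(props).
    }
}
```

A model for another language would also need tokenizer and tagger settings that match that language; those are left out here.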
Training a model
Here is an example command for training your own model. In this example we will train a French dependency parser.
java edu.stanford.nlp.parser.nndep.DependencyParser -trainFile fr-ud-train.conllu -devFile fr-ud-dev.conllu -model new-french-UD-model.txt.gz -embedFile wiki.fr.vec -embeddingSize 300 -tlp edu.stanford.nlp.trees.international.french.FrenchTreebankLanguagePack -cPOS
- UD train/dev/test data for a variety of languages is available from the Universal Dependencies project
- There are many sources of word embedding data. This example uses Facebook fastText embeddings, which are available from the fastText project
- Note that you need a tokenizer for your language that matches the tokenization of the UD training files. You may have to reprocess the files to match the tokenization you plan to use
- Likewise, if you use the -cPOS setting, you will have to have POS tags that match the UD training data
- The amount of RAM necessary to train the model may vary depending on various factors
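One way to check the POS tag requirement is to list the tags that actually occur in the training file and compare them with what your tagger produces. The sketch below is a hypothetical helper, not part of CoreNLP. It assumes the standard 10-column CoNLL-U layout, and it assumes that -cPOS makes the parser use the coarse tag in the fourth column (UPOS). It skips multiword token ranges such as `1-2`, which are also a common source of tokenization mismatches.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: summarizes the coarse POS tags used in a CoNLL-U file.
final class ConlluTagCheck {

    /** Counts the values in the UPOS column (4th tab-separated field) of token lines. */
    static Map<String, Integer> countUpos(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            // Skip sentence separators and comment lines.
            if (line.isEmpty() || line.startsWith("#")) continue;
            String[] cols = line.split("\t");
            // Skip anything that is not a full 10-column token line.
            if (cols.length < 10) continue;
            // Skip multiword token ranges (e.g. 1-2) and empty nodes (e.g. 8.1).
            if (cols[0].contains("-") || cols[0].contains(".")) continue;
            counts.merge(cols[3], 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java ConlluTagCheck <file.conllu>");
            return;
        }
        // Reads the file as UTF-8 and prints one tag per line with its count.
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        countUpos(lines).forEach((tag, n) -> System.out.println(tag + "\t" + n));
    }
}
```

Running it on the training file from the example command, for instance `java ConlluTagCheck fr-ud-train.conllu`, prints the tag inventory your tagger's output would need to match.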
More information
For details about the dependency software, see this page. For more details about dependency parsing in general, see this page.