Provides a fast syntactic dependency parser. We generate three dependency-based outputs, as follows: basic, uncollapsed dependencies, saved in BasicDependenciesAnnotation; enhanced dependencies saved in EnhancedDependenciesAnnotation; and enhanced++ dependencies in EnhancedPlusPlusDependenciesAnnotation. Most users of our parser will prefer the latter representation.
This is a separate annotator for a direct dependency parser. These
parsers require prior part-of-speech tagging. If you need constituency
parses then you should look at the
||Annotator class name
||BasicDependenciesAnnotation, EnhancedDependenciesAnnotation, EnhancedPlusPlusDependenciesAnnotation
||file, classpath, or URL
||Dependency parsing model to use. There is no need to explicitly set this option, unless you want to use a different parsing model than the default. By default, this is set to the UD parsing model included in the stanford-corenlp-models JAR file.
Training a model
Here is an example command for training your own model. In this example we will train a French dependency parser.
java -Xmx12g edu.stanford.nlp.parser.nndep.DependencyParser -trainFile fr-ud-train.conllu -devFile fr-ud-dev.conllu -model new-french-UD-model.txt.gz -embedFile wiki.fr.vec -embeddingSize 300 -tlp edu.stanford.nlp.trees.international.french.FrenchTreebankLanguagePack -cPOS
- UD train/dev/test data for a variety of languages can be found here
- There are many places to find word embedding data, in this example Facebook fastText embeddings are being used, they are found here
- Note that you need a tokenizer for your language that matches the tokenization of the UD training files, you may have to reprocess the files to match the tokenizing you plan to use
- Likewise, if you use the
-cPOS setting, you will have to have POS tags that match the UD training data
- The amount of RAM necessary to train the model may vary depending on various factors
For details about the dependency software, see this page. For more details about dependency parsing in general, see this page.