Description
Provides an accurate syntactic dependency parser.
| Property name | Annotator class name | Generated Annotation |
| --- | --- | --- |
| depparse | DepparseProcessor | Determines the syntactic head of each word in a sentence and the dependency relation between the two words, accessible through Word's governor and dependency_relation attributes. |
Options
| Option name | Type | Default | Description |
| --- | --- | --- | --- |
| depparse_batch_size | int | 5000 | When annotating, this argument specifies the maximum number of words to process as a minibatch for efficient processing. Caveat: the larger this number is, the more working memory is required (main RAM or GPU RAM, depending on the computing device). This parameter should be set larger than the number of words in the longest sentence in your input document, or you might run into unexpected behavior. |
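The sizing rule above can be sketched as a small helper. This is only an illustration: `safe_batch_size` is a hypothetical function, not part of StanfordNLP, and it assumes you already have the sentences as lists of words (counted after multi-word token expansion, since the option is defined in words).

```python
from typing import List

DEFAULT_BATCH_SIZE = 5000  # default value listed in the options table


def safe_batch_size(sentences: List[List[str]], default: int = DEFAULT_BATCH_SIZE) -> int:
    """Hypothetical helper: return a batch size strictly larger than the longest sentence."""
    longest = max((len(sentence) for sentence in sentences), default=0)
    # Keep the default unless some sentence would not fit in one minibatch.
    return max(default, longest + 1)


sentences = [["Van", "Gogh", "grandit", "."], ["word"] * 6000]
batch_size = safe_batch_size(sentences)
print(batch_size)  # 6001

# Assumption: the value is then handed to the pipeline as a keyword argument, e.g.
# stanfordnlp.Pipeline(processors='tokenize,mwt,pos,lemma,depparse',
#                      depparse_batch_size=batch_size)
```

Raising the value only when needed keeps memory use at the default level for ordinary documents.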
Example Usage
The depparse processor depends on tokenize, mwt, pos, and lemma. After all these processors have run, each Sentence in the output is parsed into a Universal Dependencies (version 2) structure, where the governor index of each word can be accessed through word.governor, and the dependency relation between the word and its governor through word.dependency_relation. Note that the governor index starts at 1 for actual words and is 0 only when the word itself is the root of the tree. Because the sentence's word list is 0-indexed, subtract 1 from this index when looking up the governor word in the sentence. Here is an example of how to access dependency parse information:
import stanfordnlp
nlp = stanfordnlp.Pipeline(processors='tokenize,mwt,pos,lemma,depparse', lang='fr')
doc = nlp("Van Gogh grandit au sein d'une famille de l'ancienne bourgeoisie.")
sentence = doc.sentences[0]
for word in sentence.words:
    # governor is 1-based; 0 means the word itself is the root
    head = sentence.words[word.governor - 1].text if word.governor > 0 else 'root'
    print(f"index: {word.index.rjust(2)}\tword: {word.text.ljust(11)}\tgovernor index: {word.governor}\tgovernor: {head.ljust(11)}\tdeprel: {word.dependency_relation}")
This will generate the following output:
index: 1 word: Van governor index: 3 governor: grandit deprel: nsubj
index: 2 word: Gogh governor index: 1 governor: Van deprel: flat:name
index: 3 word: grandit governor index: 0 governor: root deprel: root
index: 4 word: à governor index: 6 governor: sein deprel: case
index: 5 word: le governor index: 6 governor: sein deprel: det
index: 6 word: sein governor index: 3 governor: grandit deprel: obl
index: 7 word: d' governor index: 9 governor: famille deprel: case
index: 8 word: une governor index: 9 governor: famille deprel: det
index: 9 word: famille governor index: 6 governor: sein deprel: nmod
index: 10 word: de governor index: 13 governor: bourgeoisie deprel: case
index: 11 word: l' governor index: 13 governor: bourgeoisie deprel: det
index: 12 word: ancienne governor index: 13 governor: bourgeoisie deprel: amod
index: 13 word: bourgeoisie governor index: 9 governor: famille deprel: nmod
index: 14 word: . governor index: 3 governor: grandit deprel: punct
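The governor indexing convention can also be illustrated without loading a model. The sketch below copies the rows from the output above into plain Python tuples; `Row`, `governor_of`, and `path_to_root` are hypothetical names introduced for illustration and are not part of StanfordNLP.

```python
from typing import List, NamedTuple, Optional


class Row(NamedTuple):
    idx: int        # 1-based position of the word in the sentence
    text: str
    governor: int   # 1-based index of the head word, 0 for the root
    deprel: str


# Rows copied from the example output above.
ROWS = [
    Row(1, "Van", 3, "nsubj"), Row(2, "Gogh", 1, "flat:name"),
    Row(3, "grandit", 0, "root"), Row(4, "à", 6, "case"),
    Row(5, "le", 6, "det"), Row(6, "sein", 3, "obl"),
    Row(7, "d'", 9, "case"), Row(8, "une", 9, "det"),
    Row(9, "famille", 6, "nmod"), Row(10, "de", 13, "case"),
    Row(11, "l'", 13, "det"), Row(12, "ancienne", 13, "amod"),
    Row(13, "bourgeoisie", 9, "nmod"), Row(14, ".", 3, "punct"),
]


def governor_of(rows: List[Row], row: Row) -> Optional[Row]:
    """Return the head of `row`, or None when `row` is the root."""
    if row.governor == 0:
        return None
    return rows[row.governor - 1]  # shift the 1-based index to a 0-based list position


def path_to_root(rows: List[Row], row: Row) -> List[str]:
    """Follow governor links from `row` up to the root word."""
    path = [row.text]
    while row.governor != 0:
        row = rows[row.governor - 1]
        path.append(row.text)
    return path


print(governor_of(ROWS, ROWS[1]).text)  # Van
print(path_to_root(ROWS, ROWS[12]))     # ['bourgeoisie', 'famille', 'sein', 'grandit']
```

The same subtraction by 1 applies when indexing into a sentence's word list in the pipeline output.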
Training-Only Options
Most training-only options are documented in the argument parser of the dependency parser.