Tregex, Tsurgeon, Semgrex, and Ssurgeon
About
Tregex is a utility for matching patterns in trees, based on tree relationships and regular expression matches on nodes (the name is short for “tree regular expressions”). Tregex comes with Tsurgeon, a tree transformation language. Also included from version 2.0 on is a similar package, called Semgrex, which operates on dependency graphs (class SemanticGraph). Recent versions of CoreNLP include a dependency graph editor based on Semgrex called Ssurgeon.
Tregex: The best introduction to Tregex is the brief PowerPoint tutorial for Tregex by Galen Andrew. The best way to learn to use Tregex is by working with the GUI (TregexGUI). It has help screens which summarize the syntax of Tregex. You can find brief documentation of Tregex’s pattern language on the TregexPattern javadoc page, and, of course, you should also be very familiar with Java regular expression syntax. Tregex contains essentially the same functionality as TGrep2 (which had a superset of the functionality of the original tgrep), plus several extremely useful relations for natural language trees, for example “A is the lexical head of B”, and “A and B share a (hand-specified) variable substring” (useful for finding nodes coindexed with each other). Because it does not create preprocessed indexed corpus files, it is somewhat slower than TGrep2 when searching over large treebanks, but in exchange it can be run on any trees without requiring index construction. As a Java application, it is platform independent and can be used programmatically in Java software. There is also both a graphical interface (also platform independent) and a command line interface through the TregexPattern main method. To launch the graphical interface, double-click the stanford-tregex.jar file.
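To make the idea of “tree regular expressions” concrete, here is a minimal, self-contained sketch of the two ingredients described above: a regular expression test on node labels and a tree relation (here, “dominates”). It is a toy illustration only and does not use the Tregex API or reflect how Tregex is implemented; the names `TreeRegexSketch`, `Node`, `dominates`, `find`, and `sampleTree` are hypothetical and introduced just for this example. The query it runs is loosely analogous to a Tregex pattern such as `NP|VP << /^NN/`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Toy illustration only: NOT the Tregex API or its implementation.
class TreeRegexSketch {

    /** A minimal labeled tree node (hypothetical type for this sketch). */
    static final class Node {
        final String label;
        final List<Node> children = new ArrayList<>();

        Node(String label, Node... kids) {
            this.label = label;
            for (Node k : kids) {
                children.add(k);
            }
        }
    }

    /** True if some proper descendant of n has a label fully matching the regex. */
    static boolean dominates(Node n, Pattern descendantLabel) {
        for (Node child : n.children) {
            if (descendantLabel.matcher(child.label).matches()
                    || dominates(child, descendantLabel)) {
                return true;
            }
        }
        return false;
    }

    /** Collects, in pre-order, nodes whose label matches `label` and that dominate a node matching `below`. */
    static List<Node> find(Node root, Pattern label, Pattern below) {
        List<Node> out = new ArrayList<>();
        collect(root, label, below, out);
        return out;
    }

    private static void collect(Node n, Pattern label, Pattern below, List<Node> out) {
        if (label.matcher(n.label).matches() && dominates(n, below)) {
            out.add(n);
        }
        for (Node c : n.children) {
            collect(c, label, below, out);
        }
    }

    /** Builds the tree (S (NP (DT the) (NN dog)) (VP (VBD barked))). */
    static Node sampleTree() {
        return new Node("S",
            new Node("NP",
                new Node("DT", new Node("the")),
                new Node("NN", new Node("dog"))),
            new Node("VP",
                new Node("VBD", new Node("barked"))));
    }

    public static void main(String[] args) {
        // Find NP or VP nodes that dominate a node whose label starts with NN.
        List<Node> hits = find(sampleTree(), Pattern.compile("NP|VP"), Pattern.compile("NN.*"));
        for (Node h : hits) {
            System.out.println(h.label);
        }
    }
}
```

On the sample tree, only the NP node qualifies, since the VP dominates no noun. Tregex itself offers many more relations than this single “dominates” check; the TregexPattern javadoc is the authoritative reference for the real pattern language.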
Tsurgeon: A good introduction is the PowerPoint slides for Tsurgeon by Marie-Catherine de Marneffe. Tsurgeon can be run from the command line and is also incorporated into the TregexGUI graphical interface. Its syntax is presented on the Tsurgeon javadoc page.
Semgrex: An included set of PowerPoint slides and the javadoc for SemgrexPattern provide an overview of this package.
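As a rough picture of what matching on a dependency graph involves, the sketch below searches a tiny hand-built graph for edges with a given relation whose governor word matches a regular expression. It is a toy illustration only and does not use SemgrexPattern or SemanticGraph; the names `DependencyMatchSketch`, `Edge`, `dependentsOf`, and `sampleGraph` are hypothetical. The query is loosely analogous to a Semgrex pattern such as `{word:/bark.*/} >nsubj {}`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Toy illustration only: NOT the Semgrex API or its implementation.
class DependencyMatchSketch {

    /** One labeled dependency edge: governor --relation--> dependent (hypothetical type). */
    static final class Edge {
        final String governor;
        final String relation;
        final String dependent;

        Edge(String governor, String relation, String dependent) {
            this.governor = governor;
            this.relation = relation;
            this.dependent = dependent;
        }
    }

    /** Returns dependents attached by `relation` to a governor whose word fully matches the regex. */
    static List<String> dependentsOf(List<Edge> graph, Pattern governorWord, String relation) {
        List<String> out = new ArrayList<>();
        for (Edge e : graph) {
            if (e.relation.equals(relation) && governorWord.matcher(e.governor).matches()) {
                out.add(e.dependent);
            }
        }
        return out;
    }

    /** A hand-built graph for the sentence "The dog barked loudly". */
    static List<Edge> sampleGraph() {
        List<Edge> g = new ArrayList<>();
        g.add(new Edge("barked", "nsubj", "dog"));
        g.add(new Edge("dog", "det", "The"));
        g.add(new Edge("barked", "advmod", "loudly"));
        return g;
    }

    public static void main(String[] args) {
        // Find the subject of any verb form starting with "bark".
        System.out.println(dependentsOf(sampleGraph(), Pattern.compile("bark.*"), "nsubj"));
    }
}
```

The real package supports far richer node attributes and relations than this flat edge scan; the SemgrexPattern javadoc describes the actual syntax.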
Ssurgeon: The javadoc page describes the basic operations available for Ssurgeon.
Tregex was written by Galen Andrew and Roger Levy. Tsurgeon was written by Roger Levy. The graphical interface for both was written by Anna Rafferty. A lot of bug fixing and various extensions to both were done by John Bauer. Semgrex was written by Chloé Kiddon and John Bauer. Ssurgeon was written by Eric Yeh and John Bauer. These programs also rely on classes developed by others as part of the Stanford JavaNLP project.
There is a paper describing Tregex and Tsurgeon. You’re encouraged to cite it if you use Tregex or Tsurgeon.
Roger Levy and Galen Andrew. 2006. Tregex and Tsurgeon: tools for querying and manipulating tree data structures. 5th International Conference on Language Resources and Evaluation (LREC 2006).
Semgrex is very briefly described in this paper:
Nathanael Chambers, Daniel Cer, Trond Grenager, David Hall, Chloé Kiddon, Bill MacCartney, Marie-Catherine de Marneffe, Daniel Ramage, Eric Yeh, and Christopher D. Manning. 2007. Learning Alignments and Leveraging Natural Logic. Proceedings of the Workshop on Textual Entailment and Paraphrasing, pages 165–170.
We published a more complete description of Semgrex and Ssurgeon at GURT 2023:
John Bauer, Chloé Kiddon, Eric Yeh, Alex Shan, and Christopher D. Manning. 2023. Semgrex and Ssurgeon, Searching and Manipulating Dependency Graphs. Proceedings of the 21st International Workshop on Treebanks and Linguistic Theories (TLT, GURT/SyntaxFest 2023).
Tregex, Tsurgeon, Semgrex, and Ssurgeon are licensed under the GNU General Public License (v2 or later). Note that this is the full GPL, which allows many free uses. For distributors of proprietary software, commercial licensing is available. Source is included. The package includes components for command-line invocation and a Java API.
Questions
There is a Tregex FAQ list (with answers!). Please send any other questions or feedback, or extensions and bugfixes, to our GitHub.
Extensions: Packages by others using Tregex/Semgrex
- Javascript (node.js): semgrex: NodeJs wrapper for Stanford NLP Semgrex. [GitHub]
- Python interface to tsurgeon, semgrex, and ssurgeon integrated into Stanza, officially supported by Stanford CoreNLP
Download
Tregex, Tsurgeon, Semgrex, and Ssurgeon are all included in the latest CoreNLP releases.
Standalone Package
Older versions were built as standalone packages, described here.
Contents
The download is a 9 Mb zip file. It contains:
- README-tregex.txt – Basic information about the distribution, including a “quickstart” guide.
- README-tsurgeon.txt – information about Tsurgeon.
- README-gui.txt – information about using the graphical interface
- LICENSE – Tregex is licensed under the GNU General Public License.
- stanford-tregex.jar – This is a JAR file containing all the Stanford classes necessary to run tregex.
- src directory – a directory with the source files for Tregex and Tsurgeon
- lib directory – library files required for recompiling the distribution (with Mac OS X customization; see lib/ABOUT-AppleJavaExtensions.txt for removing this dependency)
- build.xml, Makefile – files for recompiling (with ant or make) the distribution
- javadoc – Javadocs for the distribution
- tregex.sh, tsurgeon.sh – sample scripts for running Tregex and Tsurgeon from the command line
- run-tregex-gui.command, run-tregex-gui.bat – scripts for running the graphical interface for Tregex with more memory for searching larger treebanks; can be double-clicked to open on a Mac or PC, respectively
- examples directory – example files for Tregex and Tsurgeon
Download Tregex version 4.2.0 (source and executables for all platforms)
Download Tregex version 3.4 Mac OS X disk image (GUI packaged as Mac application; Java 1.7 runtime included)
Release history
Version | Date | Description |
---|---|---|
4.2.0 | 2020‑11‑17 | Update for compatibility |
4.0.0 | 2020‑04‑19 | Update for compatibility |
3.9.2 | 2018‑10‑16 | Update for compatibility |
3.9.1 | 2018‑02‑27 | Update for compatibility |
3.8.0 | 2017‑06‑09 | Update for compatibility |
3.7.0 | 2016‑10‑31 | Update for compatibility |
3.6.0 | 2015‑12‑09 | Updated for compatibility |
3.5.2 | 2015‑04‑20 | Update for compatibility |
3.5.1 | 2015‑01‑29 | Update for compatibility |
3.5.0 | 2014‑10‑26 | Upgrade to Java 8 |
3.4.1 | 2014‑08‑27 | Fix a thread safety issue in tsurgeon. Last version to support Java 6 and Java 7. |
3.4 | 2014‑06‑16 | Added a new tregex pattern, exact subtree, and improved efficiency for certain operations |
3.3.1 | 2014‑01‑04 | Added a new tsurgeon operation, createSubtree |
3.3.0 | 2013‑11‑12 | Add an option to get a TregexMatcher from a TregexPattern with a different HeadFinder |
3.2.0 | 2013‑06‑20 | Fix minor bug in tsurgeon indexing |
2.0.6 | 2013‑04‑04 | Updated for compatibility with other software releases |
2.0.5 | 2012‑11‑11 | Minor efficiency improvements |
2.0.4 | 2012‑07‑09 | Minor bug fixes |
2.0.3 | 2012‑05‑22 | Updated to maintain compatibility with other Stanford software. |
2.0.2 | 2012‑03‑09 | Regex matching efficiency improvement |
2.0.1 | 2012‑01‑06 | Fix matchesAt, fix category heads. Last version to support Java 5. |
2.0 | 2011‑09‑14 | Introduces semgrex, which operates on SemanticGraphs. |
1.4.4 | 2011‑06‑19 | Updated to maintain compatibility with other Stanford software. |
1.4.3 | 2011‑05‑15 | Updated to maintain compatibility with other Stanford software. |
1.4.2 | 2011‑04‑20 | Addition of tree difference display. Several bugfixes. |
1.4.1 | 2010‑11‑18 | Small fixes and improvements (multipattern Tsurgeon scripts, file and line numbers in sentence window, fixed GUI lock-up and tregex immediate domination path matching) |
1.4 | 2009‑08‑30 | GUI slider for tree size, allow @ and __ in path constraints, incompatibly generalize Tsurgeon relabel command, bug fix for links and backreferences being used as named node, more memory/space efficient treebank reading |
1.3.2 | 2008‑05‑06 | Additional features added to the graphical interface, which is now version 1.1: browse trees, better memory handling |
1.3.1 | 2007‑11‑20 | Additional features added to the graphical interface: better copy/paste and drag and drop support, capability to save matched sentences as well as matched trees, and can save files in different encodings |
1.3 | 2007‑09‑20 | Various bug fixes and improvements; additional Tsurgeon operations; and added a graphical interface |
1.2 | 2005‑11‑23 | Bundled in Tsurgeon. |
1.1.1 | 2005‑09‑15 | Fixed bugs: 1) in variable groups; 2) in number of reported matches for “<” relation |
1.1 | 2005‑07‑19 | Several new relations added; variable substring capability added too. |
1.0 | 2005‑02‑17 | Initial release |