Tregex, Tsurgeon, Semgrex, and Ssurgeon

About

Tregex is a utility for matching patterns in trees, based on tree relationships and regular expression matches on nodes (the name is short for “tree regular expressions”). Tregex comes with Tsurgeon, a tree transformation language. Also included from version 2.0 on is Semgrex, a similar package that operates on dependency graphs (class SemanticGraph). Recent versions of CoreNLP include Ssurgeon, a dependency graph editor based on Semgrex.

Tregex: The best introduction to Tregex is the brief PowerPoint tutorial for Tregex by Galen Andrew. The best way to learn to use Tregex is by working with the GUI (TregexGUI), whose help screens summarize the Tregex syntax. The TregexPattern javadoc page gives brief documentation of the pattern language, and you should also be very familiar with Java regular expression syntax.

Tregex contains essentially the same functionality as TGrep2 (which had a superset of the functionality of the original tgrep), plus several extremely useful relations for natural language trees, for example “A is the lexical head of B” and “A and B share a (hand-specified) variable substring” (useful for finding nodes coindexed with each other). Because it does not create preprocessed indexed corpus files, it is somewhat slower than TGrep2 when searching large treebanks, but it can be run on any trees without first building an index.

As a Java application, it is platform independent and can be used programmatically in Java software. There is also a graphical interface (also platform independent) and a command line interface through the TregexPattern main method. To launch the graphical interface, double-click the stanford-tregex.jar file.
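
To make the idea of "tree relationships plus regular expressions on node labels" concrete, here is a toy Python sketch. It is not Tregex and does not use its API or pattern syntax. The names `Node`, `parse`, `words`, `immediately_dominates`, and `dominates` are hypothetical helpers invented for this illustration. The two relations shown correspond only in spirit to Tregex's `A < B` (A immediately dominates B) and `A << B` (A dominates B), and the sketch assumes labels must match the regular expression in full.

```python
import re
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Node:
    """A tree node: a label plus zero or more children (hypothetical helper)."""
    label: str
    children: List["Node"] = field(default_factory=list)

    def subtrees(self) -> Iterator["Node"]:
        """Yield this node and every descendant in pre-order."""
        yield self
        for child in self.children:
            yield from child.subtrees()


def parse(text: str) -> Node:
    """Parse a Penn-Treebank-style bracketing such as (NP (DT the) (NN dog))."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def read() -> Node:
        nonlocal pos
        if tokens[pos] == "(":
            pos += 1                      # consume "("
            node = Node(tokens[pos])      # the next token is the label
            pos += 1
            while tokens[pos] != ")":
                node.children.append(read())
            pos += 1                      # consume ")"
            return node
        leaf = Node(tokens[pos])          # a bare token is a word (leaf)
        pos += 1
        return leaf

    return read()


def words(node: Node) -> List[str]:
    """Return the leaf labels under a node, left to right."""
    return [n.label for n in node.subtrees() if not n.children]


def immediately_dominates(tree: Node, parent_re: str, child_re: str) -> List[Node]:
    """Nodes matching parent_re that have a direct child matching child_re."""
    parent, child = re.compile(parent_re), re.compile(child_re)
    return [n for n in tree.subtrees()
            if parent.fullmatch(n.label)
            and any(child.fullmatch(c.label) for c in n.children)]


def dominates(tree: Node, ancestor_re: str, descendant_re: str) -> List[Node]:
    """Nodes matching ancestor_re that have any proper descendant matching descendant_re."""
    anc, desc = re.compile(ancestor_re), re.compile(descendant_re)
    return [n for n in tree.subtrees()
            if anc.fullmatch(n.label)
            and any(desc.fullmatch(d.label)
                    for c in n.children for d in c.subtrees())]


tree = parse("(ROOT (S (NP (DT the) (NN dog)) (VP (VBD chased) (NP (DT the) (NNS cats)))))")

# Noun phrases with a singular or plural noun as a direct child.
for match in immediately_dominates(tree, "NP", "NNS?"):
    print(" ".join(words(match)))        # prints "the dog", then "the cats"

# Verb phrases that contain a noun anywhere below them.
for match in dominates(tree, "VP", "NN.*"):
    print(" ".join(words(match)))        # prints "chased the cats"
```

The sketch walks every subtree for every query, which mirrors the trade-off described above: no index has to be built beforehand, at the cost of speed on large treebanks. The real pattern language is far richer; the TregexPattern javadoc is the reference for it.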

Tsurgeon: A good introduction is the PowerPoint slides for Tsurgeon by Marie-Catherine de Marneffe. Tsurgeon can be run from the command line and is also incorporated into the TregexGUI graphical interface. Its syntax is presented on the Tsurgeon javadoc page.

Semgrex: An included set of PowerPoint slides and the javadoc for SemgrexPattern provide an overview of this package.

Ssurgeon: The javadoc page describes the basic operations available for Ssurgeon.

Tregex was written by Galen Andrew and Roger Levy. Tsurgeon was written by Roger Levy. The graphical interface for both was written by Anna Rafferty. John Bauer fixed many bugs and added various extensions to both. Semgrex was written by Chloé Kiddon and John Bauer. Ssurgeon was written by Eric Yeh and John Bauer. These programs also rely on classes developed by others as part of the Stanford JavaNLP project.

There is a paper describing Tregex and Tsurgeon. You’re encouraged to cite it if you use Tregex or Tsurgeon.

Roger Levy and Galen Andrew. 2006. Tregex and Tsurgeon: tools for querying and manipulating tree data structures. 5th International Conference on Language Resources and Evaluation (LREC 2006).

Semgrex is very briefly described in this paper:

Nathanael Chambers, Daniel Cer, Trond Grenager, David Hall, Chloé Kiddon, Bill MacCartney, Marie-Catherine de Marneffe, Daniel Ramage, Eric Yeh, and Christopher D. Manning. 2007. Learning Alignments and Leveraging Natural Logic. Proceedings of the Workshop on Textual Entailment and Paraphrasing, pages 165–170.

We published a more complete description of Semgrex and Ssurgeon at GURT 2023:

John Bauer, Chloé Kiddon, Eric Yeh, Alex Shan, and Christopher D. Manning. 2023. Semgrex and Ssurgeon, Searching and Manipulating Dependency Graphs. Proceedings of the 21st International Workshop on Treebanks and Linguistic Theories (TLT, GURT/SyntaxFest 2023).

Tregex, Tsurgeon, Semgrex, and Ssurgeon are licensed under the GNU General Public License (v2 or later). Note that this is the full GPL, which allows many free uses. For distributors of proprietary software, commercial licensing is available. Source is included. The package includes components for command-line invocation and a Java API.

Questions

There is a Tregex FAQ list (with answers!). Please send any other questions or feedback, or extensions and bugfixes, to our GitHub.

Extensions: Packages by others using Tregex/Semgrex

  • JavaScript (Node.js): semgrex: Node.js wrapper for Stanford NLP Semgrex. [GitHub]
  • Python interface to tsurgeon, semgrex, and ssurgeon integrated into Stanza, officially supported by Stanford CoreNLP

Download

Tregex, Tsurgeon, Semgrex, and Ssurgeon are all included in the latest CoreNLP releases.

Standalone Package

Older versions were built as standalone packages, described here.

Contents

The download is a 9 MB zip file. It contains:

  1. README-tregex.txt – Basic information about the distribution, including a “quickstart” guide.
  2. README-tsurgeon.txt – Information about Tsurgeon.
  3. README-gui.txt – Information about using the graphical interface.
  4. LICENSE – Tregex is licensed under the GNU General Public License.
  5. stanford-tregex.jar – This is a JAR file containing all the Stanford classes necessary to run tregex.
  6. src directory – a directory with the source files for Tregex and Tsurgeon
  7. lib directory – library files required for recompiling the distribution (with Mac OS X customization; see lib/ABOUT-AppleJavaExtensions.txt for removing this dependency)
  8. build.xml, Makefile – files for recompiling (with ant or make) the distribution
  9. javadoc – Javadocs for the distribution
  10. tregex.sh, tsurgeon.sh – sample scripts for running Tregex and Tsurgeon from the command line
  11. run-tregex-gui.command, run-tregex-gui.bat – scripts for running the graphical interface for Tregex with more memory for searching larger treebanks; can be double-clicked to open on a Mac or PC, respectively
  12. examples directory – example files for Tregex and Tsurgeon

Download Tregex version 4.2.0 (source and executables for all platforms)

Download Tregex version 3.4 Mac OS X disk image (GUI packaged as Mac application; Java 1.7 runtime included)

Release history

Version  Date        Description
4.2.0    2020-11-17  Update for compatibility
4.0.0    2020-04-19  Update for compatibility
3.9.2    2018-10-16  Update for compatibility
3.9.1    2018-02-27  Update for compatibility
3.8.0    2017-06-09  Update for compatibility
3.7.0    2016-10-31  Update for compatibility
3.6.0    2015-12-09  Updated for compatibility
3.5.2    2015-04-20  Update for compatibility
3.5.1    2015-01-29  Update for compatibility
3.5.0    2014-10-26  Upgrade to Java 8
3.4.1    2014-08-27  Fix a thread safety issue in tsurgeon. Last version to support Java 6 and Java 7.
3.4      2014-06-16  Added a new tregex pattern, exact subtree, and improved efficiency for certain operations
3.3.1    2014-01-04  Added a new tsurgeon operation, createSubtree
3.3.0    2013-11-12  Add an option to get a TregexMatcher from a TregexPattern with a different HeadFinder
3.2.0    2013-06-20  Fix minor bug in tsurgeon indexing
2.0.6    2013-04-04  Updated for compatibility with other software releases
2.0.5    2012-11-11  Minor efficiency improvements
2.0.4    2012-07-09  Minor bug fixes
2.0.3    2012-05-22  Updated to maintain compatibility with other Stanford software.
2.0.2    2012-03-09  Regex matching efficiency improvement
2.0.1    2012-01-06  Fix matchesAt, fix category heads. Last version to support Java 5.
2.0      2011-09-14  Introduces semgrex, which operates on SemanticGraphs.
1.4.4    2011-06-19  Updated to maintain compatibility with other Stanford software.
1.4.3    2011-05-15  Updated to maintain compatibility with other Stanford software.
1.4.2    2011-04-20  Addition of tree difference display. Several bugfixes.
1.4.1    2010-11-18  Small fixes and improvements (multipattern Tsurgeon scripts, file and line numbers in sentence window, fixed GUI lock-up and tregex immediate domination path matching)
1.4      2009-08-30  GUI slider for tree size, allow @ and __ in path constraints, incompatibly generalize Tsurgeon relabel command, bug fix for links and backreferences being used as named node, more memory/space efficient treebank reading
1.3.2    2008-05-06  Additional features added to the graphical interface, which is now version 1.1: browse trees, better memory handling
1.3.1    2007-11-20  Additional features added to the graphical interface: better copy/paste and drag and drop support, capability to save matched sentences as well as matched trees, and can save files in different encodings
1.3      2007-09-20  Various bug fixes and improvements; additional Tsurgeon operations; and added a graphical interface
1.2      2005-11-23  Bundled in Tsurgeon.
1.1.1    2005-09-15  Fixed bugs: 1) in variable groups; 2) in number of reported matches for “<” relation
1.1      2005-07-19  Several new relations added; variable substring capability added too.
1.0      2005-02-17  Initial release