Stanford NLP Group Research

There has always been a broad range of research topics pursued by different students in the group. Nevertheless, here are a few overall trends and themes.

The Statistical NLP era: A turn to empirical language research

A shift from symbolic (rule-based or constraint-based) approaches to NLP to probabilistic (later, machine learning) approaches began in the late 1980s and became dominant by the mid-1990s. The Stanford NLP Group was not there at the very start of the Statistical NLP era, but it arrived early enough to become a dominant force in it.

1999–2004

Work in the group emphasized probabilistic parsing, part-of-speech tagging, named entity recognition, and grammar induction. The main technical methods were probabilistic context-free grammars and “maximum entropy” (softmax regression) models.
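The parenthetical above identifies "maximum entropy" models with multinomial logistic (softmax) regression. As a minimal sketch of that idea, not any of the group's actual systems, the following trains a softmax-regression classifier by gradient ascent on the log-likelihood over toy data (all data and dimensions here are illustrative):

```python
import numpy as np

def softmax(z):
    # subtract the row max for numerical stability before exponentiating
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Toy data: 100 examples with 4 real-valued features, labeled by a
# hidden linear scorer so the classes are linearly separable.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
true_W = np.array([[2.0, -1.0, 0.0, 0.5],
                   [-2.0, 1.0, 0.5, 0.0]])
y = np.argmax(X @ true_W.T, axis=1)

# A "maximum entropy" model: P(y | x) = softmax(W @ x), fit by
# plain gradient ascent on the average log-likelihood.
W = np.zeros((2, 4))
onehot = np.eye(2)[y]
for _ in range(200):
    P = softmax(X @ W.T)                     # (100, 2) class probabilities
    W += 0.1 * (onehot - P).T @ X / len(X)   # gradient of avg log-likelihood

acc = (np.argmax(softmax(X @ W.T), axis=1) == y).mean()
```

The gradient `(onehot - P).T @ X` is the standard log-likelihood gradient for softmax regression, which is what makes this the conditional "maximum entropy" estimator.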

2005–2012

Work started to extend into new areas such as semantic role labeling, the Stanford Dependencies representation for syntactic dependency parsing, and approaches to natural language inference, paraphrasing, and veridicality. We explored new approaches to unsupervised and distantly supervised learning and the use of Mechanical Turk data. In the later years, probabilistic NLP work emphasized relation extraction and event extraction, coreference resolution, knowledge base population, and semantic parsing with Sempre. The group's work also extended into broader studies of online and academic communities and topics like politeness, using techniques such as labeled LDA.

The Neural NLP era

We moved early into artificial neural network approaches to NLP, doing some of the first work on neural natural language understanding.

2010–2020

Work in this period still mainly targeted particular NLP problems: we worked on word vectors, semantic compositionality, sentiment analysis, natural language inference, reading comprehension and question answering, dependency parsing, neural machine translation, summarization, and dialogue generation. We created benchmark datasets that had considerable influence, including SQuAD for question answering, the Stanford Sentiment Treebank (SST) for sentiment analysis, and SNLI for natural language inference. Some work moved into multimodal areas, combining language with knowledge graphs, images, or tables, while other work addressed model robustness, fairness, and bias.

2021–2025: The Foundation Model era

Members of the group were central in producing the paper “On the Opportunities and Risks of Foundation Models” (August 2021), which was early in defining this new era of neural model research. Specific research topics include benchmarking, such as the Holistic Evaluation of Language Models, Dynabench, and MQuAKE; hallucinations; model calibration; emergent abilities of LLMs; differentially private LLMs; causal inference analysis of models; representation tuning; post-training methods like Direct Preference Optimization and Kahneman-Tversky Optimization; and the DSPy approach to language model programs. Techniques investigated include simple test-time scaling, vision-language-action models, methods for optimizing training data selection, the Alpaca approach to acquiring post-training data for instruction following and benchmarking, model unlearning, and text diffusion models. Applications include LLM output detection, neural model editing, Wikipedia authoring, news summarization, LLM coauthoring, generative agent simulations, conflict simulation, and generating novel research ideas. Concerns about foundation models are studied in work on demographic stereotypes, racial and gender biases, jailbreaking LLMs, and improving model safety.
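Of the post-training methods listed above, Direct Preference Optimization has a particularly compact objective. The sketch below computes the standard DPO loss for a single (chosen, rejected) response pair from sequence log-probabilities; it is an illustration of the published loss, not the group's implementation, and the function names are ours:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """DPO loss for one preference pair.

    logp_w / logp_l: the policy's total log-probabilities of the chosen
    and rejected responses; ref_logp_w / ref_logp_l: the same quantities
    under the frozen reference model. beta scales the implicit reward.
    """
    # Implicit reward margin: how much more the policy (relative to the
    # reference) prefers the chosen response over the rejected one.
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    return -np.log(sigmoid(margin))
```

When the policy matches the reference model the margin is zero and the loss is log 2; the loss falls as the policy raises the chosen response's likelihood relative to the rejected one, which is the behavior DPO optimizes directly without a separate reward model.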