Workflow-Guided Exploration (WGE) is a framework for exploring action sequences more efficiently when a small number of demonstrations is available. It helps a reinforcement learning (RL) agent discover rewards more quickly, even when rewards are sparse.

Motivation and Setup

Our motivating task is training an RL agent to use the Internet by controlling a web browser. Here is a simple example, where the agent has to forward Bob’s email to Alice:

[Animation: the email-inbox task]

The input goal is different for each episode, and there can be multiple subtasks (e.g., forward an email, reply to an email, delete an email). The agent receives only a sparse binary reward (success or failure) at the end of the episode.

To aid learning, suppose we also have access to a few (e.g., 10) human demonstrations of how to complete the tasks.
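Concretely, each demonstration pairs the episode's goal with the sequence of actions the human took. A minimal sketch of such a record (the field names and values here are illustrative, not the released data format):

```python
# One human demonstration for the email-inbox task (illustrative format).
# Each step records the observed page state and the action taken on it.
demo = {
    "goal": "Forward Bob's email to Alice",
    "steps": [
        {"state": "<inbox DOM snapshot>",
         "action": {"type": "click", "target": "email from Bob"}},
        {"state": "<email DOM snapshot>",
         "action": {"type": "click", "target": "Forward button"}},
        {"state": "<compose DOM snapshot>",
         "action": {"type": "type", "target": "To field", "text": "Alice"}},
        {"state": "<compose DOM snapshot>",
         "action": {"type": "click", "target": "Send button"}},
    ],
}
```

Note that the concrete targets (Bob, Alice) are specific to this one goal; another episode of the same subtask would use different values.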

Framework Overview

Instead of directly training a model on the demonstrations (which would overfit with so few examples), we use the demonstrations to constrain exploration.
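To make this concrete, here is a minimal sketch of the idea in Python (the names `Step`, `induce_workflow`, `explore`, and `ToyEnv` are hypothetical; the released code differs): each demonstrated action is abstracted into a constraint, and exploration samples only concrete actions that satisfy the constraint at each step.

```python
import random

class Step:
    """One workflow step: a constraint over concrete actions, e.g.
    'click any element tagged forward' rather than one exact click."""
    def __init__(self, action_type, tag=None):
        self.action_type = action_type
        self.tag = tag

    def matches(self, action):
        return (action["type"] == self.action_type and
                (self.tag is None or action.get("tag") == self.tag))

def induce_workflow(demonstration):
    """Abstract each demonstrated action into a constraint step."""
    return [Step(a["type"], a.get("tag")) for a in demonstration]

def explore(workflow, env):
    """Sample one episode whose actions all satisfy the workflow."""
    state = env.reset()
    for step in workflow:
        candidates = [a for a in env.valid_actions(state) if step.matches(a)]
        if not candidates:          # workflow not applicable here
            return 0.0
        state, reward, done = env.step(random.choice(candidates))
        if done:
            return reward
    return 0.0

class ToyEnv:
    """Stand-in environment: success means clicking 'forward'
    and then typing into the 'to' field."""
    def reset(self):
        self.history = []
        return "start"

    def valid_actions(self, state):
        return [{"type": "click", "tag": "forward"},
                {"type": "click", "tag": "delete"},
                {"type": "type", "tag": "to"}]

    def step(self, action):
        self.history.append((action["type"], action["tag"]))
        done = self.history == [("click", "forward"), ("type", "to")]
        return "state", (1.0 if done else 0.0), done

demo_actions = [{"type": "click", "tag": "forward"},
                {"type": "type", "tag": "to"}]
reward = explore(induce_workflow(demo_actions), ToyEnv())
# The workflow rules out irrelevant actions (like clicking 'delete'),
# so constrained exploration reaches the sparse reward reliably.
```

Episodes that earn reward under this constrained exploration can then be used as training data for the actual neural policy, which generalizes beyond the workflows.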


Results

Evaluation is done on the OpenAI Mini World of Bits (MiniWoB) benchmark. To aid further research, we have released the following resources:

Some results of the models learned using WGE, compared with models that use behavioral cloning (BC) + RL, are shown here:

[Animations comparing the learned policies on five tasks (social-media, enter-time, click-checkboxes-large, click-checkboxes-soft, and email-inbox-nl-turk) under four training regimes: WGE (10 demos), BC+RL (100 demos), BC+RL (300 demos), and BC+RL (1000 demos).]

References

Evan Zheran Liu*, Kelvin Guu*, Panupong (Ice) Pasupat*, Tianlin Shi, Percy Liang. Reinforcement Learning on Web Interfaces using Workflow-Guided Exploration. ICLR 2018.