Thursday, December 9, 2010

Paper notes - Reinforcement Learning for mapping instructions to actions

Branavan, Chen, Zettlemoyer, Barzilay. ACL 2009

Problem: Given a set of text instructions, choose a sequence of actions that carries out the instructions. For example, "Click Start, point to search, and then click for files and folders. In the search results dialog box, on the tools menu, click folder options..."

Contribution: The problem had already been attacked in a supervised setting. They introduce a reinforcement learning setting (assuming it can be known whether the actions were successful or not).

  • problem old, reinforcement setting old, algorithm (policy gradient) old, results worse than supervised case.
  • Neat and comprehensive representation of instruction, action, environment (where actions are taken).
  • Simple log-linear model for distributions - p(x) = \exp(\theta \cdot \phi(x)) / Z, where Z normalizes over the alternatives. (Estimate \theta.)
  • A lot of work in designing features (4400 features for one application, and 8000+ for the other).
  • A lot of work in designing the reward function for reinforcement.
  • Policy gradient (Sutton et al., 2000) algorithm for learning; niceties there too.
  • Testing the fundamental hypothesis - Analysis of the real impact of text on the task (checking whether the method works because of information other than the text, by measuring certain stats and by cleverly removing text info).
  • Baseline analysis - (a) Show that naive baselines do badly (so the task is non-trivial) (b) Show that the supervised baseline is not great ('hard' task) (c) Finer analysis - cause of hardness (d) Mixing methods to measure the partial impact of each method (a different kind of 'ablation' study)
  • Always report statistical significance (sign test)
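
To make the log-linear model and the policy gradient update above concrete, here is a minimal toy sketch. It is not the paper's implementation: it assumes a single one-step decision with a hand-made feature vector per action and an immediate reward, whereas the paper conditions on the document and environment state and rewards whole action sequences. The update is the basic REINFORCE-style estimator, and all names (action_probs, grad_log_prob, policy_gradient_step) are made up for illustration.

```python
import math
import random


def action_probs(theta, feats):
    """Log-linear distribution: p(a) = exp(theta . phi(a)) / Z."""
    scores = [sum(t * f for t, f in zip(theta, phi)) for phi in feats]
    m = max(scores)  # subtract the max score for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)  # normalizer Z over all candidate actions
    return [e / z for e in exps]


def grad_log_prob(theta, feats, chosen):
    """Gradient of log p(chosen) w.r.t. theta: phi(chosen) - E_p[phi]."""
    probs = action_probs(theta, feats)
    dim = len(theta)
    expected = [sum(p * phi[i] for p, phi in zip(probs, feats)) for i in range(dim)]
    return [feats[chosen][i] - expected[i] for i in range(dim)]


def policy_gradient_step(theta, feats, reward_fn, lr=0.1, rng=random):
    """Sample one action, observe its reward, move theta along reward * grad."""
    probs = action_probs(theta, feats)
    chosen = rng.choices(range(len(feats)), weights=probs)[0]
    reward = reward_fn(chosen)  # assumption: reward is available immediately
    grad = grad_log_prob(theta, feats, chosen)
    return [t + lr * reward * g for t, g in zip(theta, grad)]


if __name__ == "__main__":
    # Toy setup (hypothetical): two actions, one indicator feature each,
    # and only action 1 is "successful".
    feats = [[1.0, 0.0], [0.0, 1.0]]
    theta = [0.0, 0.0]
    rng = random.Random(0)
    for _ in range(200):
        theta = policy_gradient_step(
            theta, feats, lambda a: 1.0 if a == 1 else 0.0, rng=rng
        )
    print(action_probs(theta, feats))
```

In this toy the probability of the rewarded action can only go up, since unrewarded samples leave \theta unchanged. The real task is harder because the reward arrives only after a whole sequence of actions, which is where the reward-function design noted above comes in.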
