Learning Language from Perceptual Context

Doctoral Dissertation Proposal
for

David L. Chen

University of Texas at Austin
Department of Computer Sciences

Supervising Professor: Raymond J. Mooney
Committee Members: Peter Stone
Kristen Grauman
Katrin Erk
Regina Barzilay

[ Abstract | Full Text | Presentation | Links | ]

Abstract

Back to Top

Most current natural language processing (NLP) systems are built using statistical learning algorithms trained on large annotated corpora which can be expensive and time-consuming to collect. In contrast, humans can learn language through exposure to linguistic input in the context of a rich, relevant, perceptual environment. If a machine learning system can acquire language in a similar manner without explicit human supervision, then it can leverage the large amount of available text that refers to observed world states (e.g. sportscasts, instruction manuals, weather forecasts, etc.) Thus, my research focuses on how to build systems that use both text and the perceptual context in which it is used in order to learn a language.

I will first present a system we completed that can describe events in RoboCup 2D simulation games by learning only from sample language commentaries paired with traces of simulated activities without any language-specific prior knowledge. By applying an EM-like algorithm, the system was able to simultaneously learn a grounded language model as well as align the ambiguous training data. Human evaluations of the generated commentaries indicate they are of reasonable quality and in some cases even on par with those produced by humans.

For future work, I am proposing to solve the more complex task of learning how to give and receive navigation instructions in a virtual environment. In this setting, each instruction corresponds to a navigation plan that is not directly observable. Since an exponential number of plans can all lead to the same observed actions, we have to learn from compact representations of all valid plans rather than enumerating all possible meanings as we did in the sportscasting task. Initially, the system will passively observe a human giving instruction to another human, and try to learn the correspondences between the instructions and the intended plan. After the system has a decent understanding of the language, it can then participate in the interactions to learn more directly by playing either the role of the instructor or the follower.

Full Text

Back to Top

[PDF]

Presentation

Back to Top

Presented on Tuesday, December 15, 2009 at UT Austin

[PowerPoint] [PDF] [print version (PDF)]

Links

Back to Top