DeepMind Talk | Grounded Language Learning in Virtual Environments

Our talk by DeepMind’s Dr. Stephen Clark and Dr. Felix Hill took place on Thursday, 20 Feb at the Department of Engineering. Attached are some photos and details of the talk.

Our next talk, by Samsung AI, is just next week. Follow us on Facebook to stay tuned for more details!

Talk description:

In collaboration with the Cambridge University Language Technology Lab, CUMIN invites you to a talk given by DeepMind Research Scientists Dr. Stephen Clark and Dr. Felix Hill. Join us in room LT2 in the Engineering Department at 4pm on February 20th. As spaces are limited, please arrive a few minutes early to secure a place!

***
ABSTRACT:
Natural Language Processing is currently dominated by text-based language models such as BERT and GPT-2. One feature of these models is that they rely entirely on the statistics of text, without making any connection to the world, which raises the interesting question of whether such models could ever properly “understand” language. One way in which these models can be “grounded” is to connect them to images or videos, for example by conditioning the language models on visual input and using them for captioning.

In this talk we extend the grounding idea to a simulated virtual world: an environment which an agent can perceive and interact with. More specifically, a neural-network-based agent is trained, using distributed deep reinforcement learning, to associate words and phrases with things that it learns to see and do in the virtual world. The world is 3D, built in Unity, and contains recognisable objects, including some from the ShapeNet repository of assets.

One of the difficulties in training such networks is that they have a tendency to overfit to their training data, so first we’ll demonstrate how the interactive, first-person perspective of an agent provides it with a particular inductive bias that helps it to generalize to out-of-distribution settings. Another difficulty is providing the agent with enough linguistic experience so that it can learn to handle the variety and noise in natural language. One way to increase the agent’s linguistic knowledge is to provide it with BERT embeddings, and we’ll show how an agent endowed with BERT representations can achieve substantial (zero-shot) transfer from template-based language to noisy natural instructions given by humans with access to the agent’s world.

***
SPEAKER BIOS:
Dr. Stephen Clark is a Research Scientist at DeepMind and an Honorary Professor at Queen Mary University of London. He has previously worked at several UK universities, including the University of Edinburgh, the University of Oxford and the University of Cambridge.

Dr. Felix Hill is a Research Scientist at DeepMind. He holds a PhD in Computational Linguistics from the University of Cambridge and a Master of Mathematics from the University of Oxford.
