
DeepMind’s AI Learns Imagination-Based Planning | Two Minute Papers #178

December 2, 2019


Dear Fellow Scholars, this is Two Minute Papers with Károly Zsolnai-Fehér. A bit more than two years ago, the DeepMind guys implemented an algorithm that could play Atari Breakout at a superhuman level by looking at the video feed that you see here. And the news immediately took the world by storm. This original paper is a bit more than two years old and has already been referenced in well over a thousand other research papers. That is one powerful paper!

This algorithm was based on a combination of a neural network and reinforcement learning. The neural network was used to understand the video feed, and reinforcement learning is there to come up with the appropriate actions. This is the part that plays the game.

Reinforcement learning is very suitable for tasks where we are in a changing environment and we need to choose an appropriate action based on our surroundings to maximize some sort of score. This score can be, for instance, how far we’ve gotten in a labyrinth, or how many collisions we have avoided with a helicopter, or any sort of score that reflects how well we’re currently doing.

And this algorithm works similarly to how an animal learns new things. It observes the environment, tries different actions and sees if they worked well. If yes, it will keep doing that; if not, well, let’s try something else. Pavlov’s dog with the bell is an excellent example of that.

There are many existing works in this area, and reinforcement learning performs remarkably well for a number of problems and computer games, but only if the reward comes relatively quickly after the action. For instance, in Breakout, if we miss the ball, we lose a life immediately, but if we hit it, we’ll almost immediately break some bricks and increase our score. This is more than suitable for a well-built reinforcement learning algorithm.

However, this earlier work didn’t perform well on other games that required long-term planning. If Pavlov gave his dog a treat for something that it did two days ago, the animal would have no clue as to which action led to this tasty reward.

And this work’s subject is a game where we control this green character and our goal is to push the boxes onto the red dots. This game is particularly difficult, not only for algorithms but even for humans, for two important reasons. One, it requires long-term planning, which, as we know, is a huge issue for reinforcement learning algorithms. Just because a box is next to a dot doesn’t mean that it is the one that belongs there. This is a particularly nasty property of the game. And two, some mistakes we make are irreversible: for instance, pushing a box into a corner can make it impossible to complete the level. If we have an algorithm that tries a bunch of actions and sees if they stick, well, that’s not going to work here!

It is now hopefully easy to see that this is an obscenely difficult problem, and the DeepMind guys just came up with Imagination-Augmented Agents as a solution for it. So what is behind this really cool name? The interesting part about this novel architecture is that it uses imagination, which is a routine to cook up not only one action, but entire plans consisting of several steps, and finally choose the one that has the greatest expected reward over the long term. It takes information about the present, imagines possible futures, and chooses the one with the most handsome reward.

And as you can see, this is only the first paper on this new architecture and it can already solve a problem with seven boxes. This is just unreal. Absolutely amazing work. And please note that this is a fairly general algorithm that can be used for a number of different problems. This particular game was just one way of demonstrating the attractive properties of this new technique. The paper contains more results and is a great read, so make sure to have a look.

Also, if you’ve enjoyed this episode, please consider supporting Two Minute Papers on Patreon. Details are available in the video description, have a look! Thanks for watching and for your generous support, and I’ll see you next time!
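
For readers who would like something concrete, here is a minimal sketch of the contrast described in the episode: an agent that only looks at the immediate reward versus one that imagines several steps ahead before acting. This is not the architecture from the paper. The real Imagination-Augmented Agents learn their environment model and feed the imagined rollouts into a neural network, whereas this toy uses a hand-written model and simply tries every short plan. The corridor world and the names `model`, `imagine`, `greedy_action` and `planned_action` are all assumptions made up for illustration.

```python
from itertools import product

ACTIONS = ("left", "right")
TRAP, GOAL = 0, 4  # toy corridor: cells 0..4 (hypothetical, not from the paper)


def model(state, action):
    """Hand-written stand-in for an environment model.

    Returns (next_state, reward, done). Stepping into the trap pays a small
    reward right away but ends the episode (an irreversible mistake);
    the goal pays much more but is several steps away.
    """
    nxt = state - 1 if action == "left" else state + 1
    if nxt == TRAP:
        return nxt, 1.0, True
    if nxt == GOAL:
        return nxt, 10.0, True
    return nxt, 0.0, False


def imagine(state, plan):
    """Roll a whole plan forward in the model and sum the rewards."""
    total = 0.0
    for action in plan:
        state, reward, done = model(state, action)
        total += reward
        if done:
            break
    return total


def greedy_action(state):
    """Pick the action with the best immediate reward (no lookahead)."""
    return max(ACTIONS, key=lambda a: model(state, a)[1])


def planned_action(state, horizon=3):
    """Imagine every plan of `horizon` steps and take the first step of the best."""
    best_plan = max(product(ACTIONS, repeat=horizon),
                    key=lambda plan: imagine(state, plan))
    return best_plan[0]


if __name__ == "__main__":
    start = 1
    print("greedy :", greedy_action(start))
    print("planned:", planned_action(start))
```

Starting from cell 1, the greedy agent walks into the trap because that is the only move with an immediate payoff, while the planner heads right because one of its imagined three-step futures ends at the goal. Trying every plan only works in a world this small, since the number of plans grows exponentially with the horizon.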
