AlphaGo had already sealed its series victory by winning the first three games of the best-of-five.
That triumph dealt a blow to Facebook, which has also been developing AI to play Go, a hugely complex ancient Chinese board game.
In January, Facebook CEO Mark Zuckerberg said the company had created an AI system that was close to being able to beat the best human Go players. But before Facebook could show off its achievement, Google’s DeepMind unit put its AlphaGo system up against 18-time world champion Lee Sedol.
Zuckerberg posted a message of congratulations after AlphaGo’s third victory in a row: “Congrats to the Google DeepMind team on this historic milestone in AI research – a third straight victory over Go grandmaster Lee Sedol. We live in exciting times.”
However, Yann LeCun, Facebook’s head of AI research, has spent the last few days being rather more critical.
“Congrats to the DeepMind AlphaGo team for this Grand Slam,” LeCun said on Facebook. “Now, can you do it purely through reinforcement learning, without pre-training the convolutional net on recorded games between humans?”
Posted by Yann LeCun on Saturday, 12 March 2016
LeCun was suggesting that AlphaGo didn’t do enough of its own learning and was instead just aping the best moves of human Go players, drawn from millions of historical matches in its database.
‘Reinforcement learning’, on the other hand, requires a machine to analyze what it did to bring a positive or negative outcome, so it can learn how to make positive, winning moves and avoid moves that would lead it toward defeat.
In its defence, Google has said that AlphaGo did in fact undertake reinforcement learning, on top of the pre-training.
As DeepMind founder Demis Hassabis put it in a blog post, “Our goal is to beat the best human players, not just mimic them. To do this, AlphaGo learned to discover new strategies for itself, by playing thousands of games between its neural networks, and adjusting the connections using a trial-and-error process known as reinforcement learning.”
LeCun took several additional swipes at AlphaGo: posting a link to a blog post suggesting the significance of Google’s victory was being overblown; putting up a comment saying the next iteration of AlphaGo “should be called BettaGo (haha!) and include online adaptation to the opponent”; and linking to an article that called AlphaGo’s victory an “intermediate” accomplishment rather than a breakthrough.