If their learning material is based on expert human games, how can it ever get b...

brian_cloutier · on Jan 28, 2016

This was the question which originally led me to lose faith in deep learning for solving Go.

Existing research throws a bunch of professional games at a DCNN and trains it to predict the next move.

It generally does quite well but fails hilariously when you give it a situation which never comes up in pro games. Go involves lots of implicit threats which are rarely carried out. These networks learn to make the threats but, lacking training data, are incapable of following up.

The first step of creating AlphaGo worked the same way (and actually was worse at predicting the next move than the current state of the art), but DeepMind then took that base network and retrained it. Instead of playing the move a pro would play, it now plays the move most likely to result in a win.

For pros, this is the same move. But for AlphaGo, in this completely different MCTS environment, they are quite different. DeepMind then played the engine against older versions of itself and used reinforcement learning to make the network as accurate as possible.

They effectively used the human data to bootstrap a better player. The paper used a lot of other cool techniques and optimizations, but I think this one might be the coolest.
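To make the "bootstrap from human data, then improve by playing against yourself" idea concrete, here is a toy sketch. Everything in it is an assumption for illustration: the game is single-pile Nim (take 1 to 3 stones, whoever takes the last stone wins) rather than Go, the "expert" is a hypothetical player who always takes one stone, and the retraining step is an exact best-response update rather than the gradient-based reinforcement learning the paper describes.

```python
MAX_PILE = 12
MOVES = (1, 2, 3)


def legal_moves(pile):
    """Moves available with `pile` stones left."""
    return [m for m in MOVES if m <= pile]


def mover_wins(policy):
    """wins[n] is True if the player to move at pile n wins
    when BOTH sides follow `policy` (i.e. self-play)."""
    wins = [False] * (MAX_PILE + 1)  # wins[0]: no stones left, mover has lost
    for n in range(1, MAX_PILE + 1):
        wins[n] = not wins[n - policy[n]]
    return wins


def improve(policy):
    """One retraining round: at each pile, switch to a move that beats
    the previous version of the policy, if such a move exists."""
    wins = mover_wins(policy)
    new_policy = dict(policy)
    for n in range(1, MAX_PILE + 1):
        for m in legal_moves(n):
            if not wins[n - m]:  # opponent (old self) would lose from here
                new_policy[n] = m
                break
    return new_policy


def self_play_training(policy, max_rounds=10):
    """Repeat the improvement step until the policy stops changing."""
    for _ in range(max_rounds):
        better = improve(policy)
        if better == policy:
            break
        policy = better
    return policy


def play(first, second, pile):
    """Return 0 if `first` takes the last stone, 1 if `second` does."""
    players = (first, second)
    turn = 0
    while True:
        pile -= players[turn][pile]
        if pile == 0:
            return turn
        turn = 1 - turn


# "Human data": a hypothetical expert who always takes a single stone.
expert_policy = {n: 1 for n in range(1, MAX_PILE + 1)}
trained_policy = self_play_training(expert_policy)
```

In this toy setting the trained policy beats the expert it started from, whichever side moves first, for example with `play(trained_policy, expert_policy, 10)` and `play(expert_policy, trained_policy, 10)`.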

mourner · on Jan 28, 2016

Fantastic explanation, thank you!

space_fountain · on Jan 27, 2016

How can a human ever get better than their teacher?

In this case, though, they play and optimize against themselves.

kazinator · on Jan 28, 2016

> How can a human ever get better than their teacher?

By learning from other teachers, and by applying original thought. Also, due to innately superior intelligence. If your IQ is 140, and that of the teacher is 105, you will eventually outstrip the teacher.

jibalt · on Jan 31, 2016

The question was rhetorical. And what is needed is aptitude for the specific task, not "IQ" ... the two are often very different.

yvsong · on Jan 28, 2016

I concluded that the secret of the all-time no. 1 master, Go Seigen, is: 1. learn from all masters; 2. keep inventing/innovating. Most experts do 1 well, and are pretty much stuck there. Few are good at 2. I doubt that computers can invent/innovate.

kitd · on Jan 28, 2016

I would have thought (he says casually) that some kind of genetic algorithm of introducing random moves and evaluating outcomes for success would be entirely possible, no?

DanBC · on Jan 28, 2016

There's a large space of random moves. How many are likely to be useful?

jibalt · on Jan 31, 2016

Do you ask that of natural evolution, too?

jibalt · on Jan 31, 2016

"I doubt if computers can invent/innovate."

Sheer ignorance.

sawwit · on Jan 27, 2016

It's because they have a much larger stack size than a human brain (which does not have a stack at all, but just various kinds of short term memories). An expert Go player can realistically maybe consider 2-3 moves into the future and can have a rough idea about what will happen in the coming 10 moves, while this method does tree search all the way to the end of the game on multiple alternative paths for each move.
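The general idea of "playing many alternative paths out to the end of the game" can be sketched as flat Monte Carlo rollouts. This is only an illustration under assumptions: it uses single-pile Nim (take 1 to 3 stones, last stone wins) instead of Go, uniformly random playouts, and hypothetical function names. It is not a description of how AlphaGo's actual search works.

```python
import random


def legal_moves(pile):
    """Moves available with `pile` stones left."""
    return [m for m in (1, 2, 3) if m <= pile]


def random_playout(pile, rng):
    """Play uniformly random moves until the pile is empty.
    Return True if the player who moves first in this playout
    takes the last stone."""
    first_player_to_move = True
    while True:
        pile -= rng.choice(legal_moves(pile))
        if pile == 0:
            return first_player_to_move
        first_player_to_move = not first_player_to_move


def rollout_win_rates(pile, playouts=200, seed=0):
    """Estimate, for each legal move, how often we win if the rest
    of the game is played out at random."""
    rng = random.Random(seed)
    rates = {}
    for move in legal_moves(pile):
        remaining = pile - move
        if remaining == 0:
            rates[move] = 1.0  # taking the last stone wins immediately
            continue
        # The opponent moves first in the playout; we win when they lose.
        wins = sum(not random_playout(remaining, rng) for _ in range(playouts))
        rates[move] = wins / playouts
    return rates


def best_move(pile, **kwargs):
    """Pick the move with the highest estimated win rate."""
    rates = rollout_win_rates(pile, **kwargs)
    return max(rates, key=rates.get)
```

Calling `best_move(3)` picks the immediately winning move, while `rollout_win_rates(10)` gives only noisy estimates, which is why random playouts alone are a weak evaluator.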

donmaq · on Jan 28, 2016

Not true. Professional Go players read out 20+ moves consistently. Go Seigen's nemesis Kitani Minoru regularly read out 30-40 moves.

As an AGA amateur 4 dan, I read 10 moves pretty regularly, and that's including variations. And if the sequence includes joseki (known optimal sequences of 15-20+ moves), then pros will read even deeper...

sawwit · on Jan 28, 2016

Yes, the latter number was perhaps too conservative; no doubt deeper predictions are easily possible, but I doubt even expert players consider many alternative paths in the search tree. They might recognize overall strategies which reach many moves into the future, but extensive consideration of what will happen in the upcoming moves is probably constrained to only a few steps, at least relative to the number and depth of paths that AlphaGo considers.

jibalt · on Jan 31, 2016

"while this method does tree search all the way to the end of the game"

No it doesn't. You seem quite happy to just make stuff up that you know nothing about, like "2-3 moves into the future".

reddytowns · on Jan 28, 2016

If you took one expert and faced him against a room full of experts who all together decided on the next move, who would win?

blackskad · on Jan 28, 2016

The one expert, because the others would not be able to reach a decision on which move to play.

reacweb · on Jan 28, 2016

In fact, no. A big group of average experts appears to be better than a single super expert. This is the principal justification for the success of AI in oil prospecting (https://books.google.fr/books?id=6DNgIzFNSZsC&pg=SA30-PA5&lp...)

Jach · on Jan 28, 2016

Counterpoint: https://en.wikipedia.org/wiki/Kasparov_versus_the_World

I think a key missing component to crowd success on real expert knowledge (as opposed to trivia) is captured by the concept of prediction markets. (https://en.wikipedia.org/wiki/Prediction_market) The experts who are correct will make more money than the incorrect ones and eventually drive them out of the market for some particular area.

jibalt · on Jan 31, 2016

That's no counterpoint because the World team (of which I was a member) was made up of boobs on the internet, not players of Kasparov's strength, which was the premise of the question you responded to.

blackskad · on Jan 28, 2016

The easy thing about combining AI systems is that they don't argue. They don't try to change the opinion of the other experts. They don't try to argue with the entity that combines all opinions; every AI expert gets to state its opinion once.

With humans on the other hand, there will always be some discussion. And some human experts may be better at persuading other human experts or the combining entity.

I think it would be an interesting thing to try after they beat the number 1 player. Gather the top 10 (human) Go players and let them play as a team against AlphaGo.

jibalt · on Jan 31, 2016

This is nonsense. To combine AI systems requires a mechanism to combine their evaluations. The most effective way would be a feedback system, where each system uses evaluations from other systems as input to possibly modify its own evaluation, with the goal being consensus. This is simply a formalization of argumentation, which can be rational; it doesn't have to be based on personal benefit. And generalized AI systems may well some day have personal motivations, as has been discussed at length.
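A minimal sketch of such a feedback loop, under stated assumptions: each system's evaluation is a single number (say, an estimated win probability), and in every round each system moves part of the way toward the average of the others. The function name, the `trust` weight, and the averaging rule are all hypothetical choices for illustration, not a method taken from any particular system.

```python
def consensus(evaluations, trust=0.5, tolerance=1e-6, max_rounds=1000):
    """Iteratively pull each evaluation toward the mean of the others.

    Returns the final evaluations and the number of rounds used.
    `trust` is how much weight a system gives to its peers (0 = none).
    """
    values = list(evaluations)
    if len(values) < 2:
        return values, 0  # nothing to reconcile
    n = len(values)
    for rounds in range(max_rounds):
        if max(values) - min(values) <= tolerance:
            return values, rounds  # close enough to call it consensus
        total = sum(values)
        # Each system blends its own view with the average of the others.
        values = [
            (1 - trust) * v + trust * (total - v) / (n - 1)
            for v in values
        ]
    return values, max_rounds
```

With `consensus([0.2, 0.5, 0.8])` the three evaluations converge on their common mean of 0.5. A richer version would let each system re-run its own analysis using the others' evaluations as input, rather than just averaging numbers.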

panglott · on Jan 29, 2016

This reminds me of the story of the Game of the Century, with Go Seigen's shinfuseki. https://en.wikipedia.org/wiki/List_of_go_games#.22The_Game_o...

https://en.wikipedia.org/wiki/Shinfuseki

zodiac · on Jan 28, 2016

The expert human games are used just to predict future moves.