David Silver, Aja Huang1, Chris J. Maddison, Arthur Guez, Laurent Sifre1, George van den Driessche, Julian Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam, Marc Lanctot, Sander Dieleman, Dominik Grewe,
John Nham, Nal Kalchbrenner, Ilya Sutskever, Timothy Lillicrap, Madeleine Leach1, Koray Kavukcuoglu,
Thore Graepel1, Demis Hassabis
The game of Go has long been viewed as the most challenging of classic games for artificial intelligence owing to its enormous search space and the difficulty of evaluating board positions and moves. Here we introduce a new approach to computer Go that uses ‘value networks’ to evaluate board positions and ‘policy networks’ to select moves. These deep neural networks are trained by a novel combination of supervised learning from human expert games, and reinforcement learning from games of self-play. Without any lookahead search, the neural networks play Go at the level of stateof-the-art Monte Carlo tree search programs that simulate thousands of random games of self-play. We also introduce a new search algorithm that combines Monte Carlo simulation with value and policy networks. Using this search algorithm,our program AlphaGo achieved a 99.8% winning rate against other Go programs, and defeated the human European Go champion by 5 games to 0. This is the first time that a computer program has defeated a human professional player in the full-sized game of Go, a feat previously thought to be at least a decade away.
The challenge is daunting. In 1994, machines took the checkers crown, when a program called Chinook beat the top human. Then, three years later, they topped the chess world, IBM’s Deep Blue supercomputer besting world champion Garry Kasparov. Now, computers match or surpass top humans in a wide variety of games: Othello, Scrabble, backgammon, poker, even Jeopardy. But not Go. It’s the one classic game where wetware still dominates hardware.
David Ormerod: To start with please tell us a bit about yourself and your research interests. How did you learn of Go and how did you become involved in computer Go?
Martin Müller: I am a professor in the Department of Computing Science at the University of Alberta in Edmonton, Canada.
My research interests are in heuristic search, studying how to solve large, complex problems using computer searches.
The main application areas studied in my research group are games such as Go, and automated planning.
In recent years, Monte Carlo search methods have been our main focus – both for games and for planning. As part of my game-related activities, I am the leader of the team developing the open source software Fuego, which was the first program to defeat a top professional in an even game on 9×9.
I learned Go when I was 15 years old and played a lot in my teens and early twenties. I am a 5, 6 or 7 Dan amateur player, depending on the country. My biggest success was probably taking 2nd place at the US Go congress open tournament in 1998.
I became interested in computer Go as an undergraduate in my home country of Austria, through my supervisor. This was around 1985. I have stayed with the topic ever since, doing a Diploma thesis, a PhD and a few postdocs, before getting my current job.
What’s Monte Carlo?
Most people with any interest at all in computer Go know that the strongest programs these days use a ‘Monte Carlo’ algorithm, but many people don’t know much more about it than that.
Could you briefly explain where the term Monte Carlo came from and what it means in this context?
The term Monte Carlo refers to an affluent suburb of Monaco which is famous for its Casino. Monte Carlo methods use statistics collected from randomized simulations as a way to analyze complex systems which are too hard to ‘solve’ by other means.
They were first developed for nuclear physics and atomic bomb research in the 1940s. Nowadays they are very widely used, but their application to games such as Go took off just a few years ago.
Now that computers are powerful enough, Monte Carlo methods are used across a wide variety of disciplines.
For example, I’ve used them at work to help with risk analysis. It’s often difficult to explain to people why this approach works though, because it seems so counterintuitive at first.
Do you have a good analogy to explain how a large enough number of random simulations can provide a useful answer to a question?
Statistical sampling, which is at the core of Monte Carlo methods, is a very powerful technique.
For example, think about opinion polls. Any single random person who you ask about their opinion may be completely crazy, but if you ask one thousand people, who are carefully selected to represent the overall population, then you get quite a good idea of the general mood and can use that to make informed decisions.
This is why we keep getting more and more of those pesky phone calls doing surveys at dinner time!
How computer Go programs improved
It’s been more than five years since UCT (an extension of Monte Carlo search) was first applied to Go, but the strongest programs were still at the kyu level not that long ago (at least on 19×19 boards).
In contrast, the strongest programs these days are dan level and they seem relatively sharp, even in tactical situations.
To what extent do they make use of heuristics for shape, tesuji, life and death, the opening and so on?
Many programs use learned local patterns such as 3×3 for simple shape, and they modify the playouts to avoid some bad tactical moves.
Also, when there is a single important fight going on, the full board search will be able to analyze it quite deeply, and do well in the tactics. The problems start when there are several fights going on at the same time.
For the opening, some programs simply use large scale patterns to imitate popular openings played by human experts. But usually those are not absolute rules. These moves simply get a bonus, but the search can override them. So it is better than the hard coded ‘expert systems’ of the 1980s.
What other changes and improvements have helped computers get to their current mid-dan level on larger boards since then?
I think many factors are involved. Better patterns and rules as above, better search, better parallel scaling, several years of testing, debugging and tuning the programs, and better hardware all help.
What are the pros and cons of combining a knowledge based approach with a Monte Carlo approach?
Crazy Stone is a program that plays the game of Go (Weiqi, Baduk), by Rémi Coulom.
It is one of the first computer Go programs to utilize a modern variant of the Monte-Carlo tree search. It is part of the Computer Go effort. In January 2012 Crazy Stone was rated as 5 dan on KGS, in March 2014 as 6 dan.
Coulom began writing Crazy Stone in July 2005, and at the outset incorporated the Monte Carlo algorithm in its design. Early versions were initially available to download as freeware from his website, albeit no longer. Pattern recognition and searching was added in 2006, and later that year Crazy Stone took part in its first tournament, winning a gold medal in the 9×9 competition at the 11th Computer Olympiad. Coulom subsequently entered the program into the 12th Computer Olympiad the following year, winning bronze in the 9×9 and silver in the 19×19 competitions.
However, Crazy Stone’s most significant accomplishment was to defeat Kaori Aoba, a professional Japanese 4 dan, in an 8-stone handicap match in 2008. In doing so, the engine became the first to officially defeat an active professional in Japan with a handicap of less than nine stones. Three months later, on 12 December 2008, Crazy Stone defeated Aoba again in a 7-stone match.
On March 21, 2014, at the second annual Densei-sen competition, Crazy Stone defeated Norimoto Yoda, Japanese professional 9-dan, in a 19×19 game with four handicap stones by a margin of 2.5 points.
Crazy Stone computer Go program defeats Ishida Yoshio 9 dan with 4 stones
- 2014-03-24: Recently, Crazy Stone established a solid 6-dan rank on KGS, took the second place of the 7th UEC Cup, and won in the Densei-sen in a 4-stone game against Norimoto Yoda.
The Computer vs the computer
It was an ironic showdown between the computer and ‘The Computer’.
Ishida was nicknamed ‘The Computer’ in his prime, because of the accuracy of his counting and endgame skills.
Born in 1948, Ishida is now 64 years old.
However, back in the 70s, Ishida won the prestigious Honinbotitle for an impressive five consecutive years, making him one of the top players of that era.
After the game, Ishida said that he thought the program was a ‘genius’ and marvelled at the calmness and flexibility of its moves.
Zen is a strong Go engine by an individual Japanese programmer Yoji Ojima (cluster parallelism is added by Hideki Kato). On KGS several bots run engine maintaining ranks between 3d and 5d: Zen19, Zen19b, Zen19D and Zen19n. Zen was the first bot to hold a KGS 3d rating for more than 20 rated games in a row, and a blitz version seems to be holding 5 dan ratings in 2011. It was also the first to hold a 2d and 1d rating for more than 20 games, respectively. Hardware used to run Zen19 on KGS: Mac Pro 8 core, Xeon 2.26GHz.
It won the 2009 Computer Olympiad in Pamplona, Spain, running on the slowest hardware among the competitors. It also won the 2011 Olympiad in Tilburg.
Zen was released commercially under the name Tencho no Igo Zenith Go on 18 September 2009. Version 2 release on August 27, 2010 and version 3 release on 30 September 2011. Website for the software (Japanese) http://soft.mycom.co.jp/pcigo/tencho3/index.html
See latest go software updates for current version information.
In 2011, several different experiments of Zen started playing on KGS:
|Zen19N||4D||20 Minutes + 30 seconds Byo-Yomi||Mac Pro 8 cores, Xeon 2.26 GHz||Zen19N|
|Zen19B||5D||15 seconds per move||Mac Pro 8 cores, Xeon 2.26 GHz||Zen19B|
|Zen19D||6D||15 seconds per move||Mini-cluster of 6 PCs||Zen19D|
|Zen19S||5D||20 Minutes + 30 seconds Byo-Yomi||Mini-cluster of 6 PCs||Zen19S|
|Zen19||5D||15 seconds per move||Zen19|
The only version active in 2014 has been Zen19S