The theory behind the 'cold poker master.'
Jeopardy is a human problem "solved" by a machine, but poker is a machine problem at the outset. Winning, or maximizing one's winning potential, is a matter of analyzing a card game as a succession of states, wherein each state offers players probabilities of certain events happening or not happening that can then be maximized in the interests of winning. Poker is a natural computer science problem, but, fortunately, computers are rarely allowed at the table.
Yet, sometimes they are. Meet Lengpudashi, a poker bot developed by researchers at Carnegie Mellon University whose name translates to "cold poker master." Which is perfect. In a recent set of exhibition matches on China's Hainan island, Lengpudashi won $792,327 in poker chips over the course of five days and 36,000 hands. Its opposition was Team Dragon, a group of human engineers and computer scientists led by Yue Du, an amateur poker player and venture capitalist who took home a 2016 World Series of Poker golden bracelet (the first Chinese player to do so).
Lengpudashi's win meant an IRL $290,000 in prize money, which will be reinvested in Strategic Machine, an AI company founded by CMU researchers Tuomas Sandholm and Noam Brown. An earlier poker bot developed by the duo won $1,766,250 in chips earlier this year in an epic 20-day match against five of the world's top poker players.
However natural it may seem as an AI problem, poker isn't Go. Poker introduces the problem of incomplete information. A machine playing Go or chess can observe a board and know immediately the complete state of the game. Like a human poker player, however, the poker bot only knows what it sees. The best it can do is count cards—based on what's on the table and in my hand, what cards are left in the deck? This set of remaining cards only constrains possibilities. Luck makes the final call. Luck, after all, is just what we don't know.
More technically, poker is an example of an information-imperfect game. Sandholm and Brown go deeper in a paper presented earlier this year at the AAAI-17 Workshop on Computer Poker and Imperfect Information Games in San Francisco. The technique described in the paper for such problems is a variation of what's known as "endgame solving."
In a information-complete game like checkers or chess (or Go), the computer's strategy is to decompose the larger game into a bunch of smaller games. Iteratively solve those problems and you should wind up with a solution to the whole problem. This is endgame solving as it's usually understood. Checkers was definitively solved in this fashion.
"In imperfect-information games, endgame solving is drastically more challenging," Sandholm and Brown write. "In perfect-information games it is possible to solve just a part of the game in isolation, but this is not generally possible in imperfect-information games." Imperfect-information games have to be solved as whole entities, not through decomposition.
This is a problem with large games, like No-Limit Texas Hold'em. No-Limit Texas Hold'em has 10 165 nodes that have to be computed in order to be "solved."
The general answer to this problem is for algorithms to create more manageable abstractions of the whole game that are essentially miniaturized versions. Solve the little version of the game, and you can then remap it to the larger game. The catch is that a lot of complexity from the real game is lost in the abstraction process, which in many cases means the remapping just doesn't do the job.
The solution devised by Sandholm and Brown is known as Reach-Maxmargin refinement. The basic idea is to take this mini-games and imagine them along different paths that the whole game might take as it evolves. In their words, it's "a new method for refining endgames that considers what payoffs are achievable from other paths in the game."
The result is essentially an algorithm that knows how to bluff. It understands that it can benefit from preventing the game from progressing down certain paths and seeks to manipulate that. "People have a misunderstanding of what computers and people are each good at. People think that bluffing is very human —it turns out that's not true," Brown told Bloomberg. "A computer can learn from experience that if it has a weak hand and it bluffs, it can make more money."