Dwango Media Village (DMV)

Deep Learning Mahjong AI "NAGA"

This graph plots NAGA's confidence level (explained later) at each tile discard over the course of a half-game. Selecting a point on the graph shows the game state at that moment and the distribution of tile discarding reasons. Note that the confidence level at the end of the game is 0.

Overview

At DMV, we have been developing NAGA (Neural Architectural Game Agent), a Mahjong AI that uses deep learning on the game records of high-ranking players to decide which tile to discard and when to call. NAGA has reached a peak rank of Eight-Dan in the online Mahjong game Tenhou. This article introduces our work on building a strong Mahjong AI with deep learning.

Mahjong

Mahjong is a game of incomplete information: the tiles held by opponents are not visible, and it is uncertain which tile will be drawn next. Two things make it difficult. With four players, it is hard to predict what the table will look like by your next turn, and because the final result is decided over multiple rounds, long-term strategy is required. Conventional Mahjong AIs [Mizukami] have tackled these issues using methods such as expected final ranking and Monte Carlo methods, whereas NAGA uses deep learning.

Tenhou

Tenhou is one of the largest online Mahjong sites in Japan and uses a rank system. Players can only play at the tables their rank allows, and rank changes with rank points. The rank points gained or lost depend on the player's current rank, the table, and the match result (final placement). NAGA began playing on Tenhou on October 22, 2018, using a bot account that is allowed to play up to the Tokujou (upper-intermediate) table. Starting from the general table, it moved up to the intermediate table on October 24, 2018, and reached the Tokujou table on October 29, 2018, as a Fourth-Dan. We thank everyone who played against NAGA.

Rank distributions and rank point fluctuations as of January 17, 2019

The figure shows the number of players in each rank and the rank point changes as of January 17, 2019 (from Online Mahjong Tenhou / Manual). For an Eight-Dan playing in the Tokujou table, first place is worth +75 points, second place +30 points, third place ±0 points, and fourth place -150 points. A single fourth place therefore takes two first places to offset.
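
To make the arithmetic concrete, the following minimal sketch computes the expected rank point change per game from the point values quoted above. The function name and the example placement probabilities are illustrative assumptions, not NAGA's actual results.

```python
# Rank point changes for an Eight-Dan in the Tokujou table, as quoted in the text.
POINTS = {1: 75, 2: 30, 3: 0, 4: -150}

def expected_points(placement_probs):
    """Expected rank point change per game.

    placement_probs maps placement (1-4) to its probability.
    This helper is hypothetical and only illustrates the arithmetic.
    """
    if abs(sum(placement_probs.values()) - 1.0) > 1e-9:
        raise ValueError("placement probabilities must sum to 1")
    return sum(POINTS[place] * p for place, p in placement_probs.items())

# Assumed example: a player who finishes in every placement equally often.
uniform = {1: 0.25, 2: 0.25, 3: 0.25, 4: 0.25}
print(expected_points(uniform))      # -11.25

# Two first places exactly cancel one fourth place.
print(2 * POINTS[1] + POINTS[4])     # 0
```

Under these point values, a player with uniformly distributed placements loses points on average, so holding Eight-Dan requires finishing fourth noticeably less often than one game in four.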

AI Mechanism

This AI consists of four CNN models: a tile discarding model, a calling model, a riichi model, and a kan model. The tile discarding model is policy-based. It takes as input information that completely reproduces the current situation, such as the player's hand, the discards of all four players, calls, scores, and the number of unseen tiles, and it outputs a distribution over which tile to discard. The other models likewise take the table information available at that moment and decide what action to take.
A heuristic handles the decision to win in the final round (NAGA declines a win that would leave it in last place), while most other actions are determined solely by the CNN outputs. Each model was trained on game records from the highest-level table in Tenhou.
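
The two decision steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not NAGA's implementation: the 34-way tile-type output, the masking of tiles that are not in the hand, the tie handling, and all function names are hypothetical.

```python
import math

NUM_TILE_TYPES = 34  # 9 manzu + 9 pinzu + 9 souzu + 7 honor tiles

def discard_distribution(logits, hand_counts):
    """Turn raw model scores into a discard distribution.

    logits: one score per tile type (stand-in for the CNN output).
    hand_counts: how many copies of each tile type are in the hand.
    Assumption: tiles not in the hand get probability 0.
    """
    legal = [i for i, n in enumerate(hand_counts) if n > 0]
    if not legal:
        raise ValueError("hand is empty")
    top = max(logits[i] for i in legal)  # subtract the max for numerical stability
    weights = [
        math.exp(logits[i] - top) if hand_counts[i] > 0 else 0.0
        for i in range(len(logits))
    ]
    total = sum(weights)
    return [w / total for w in weights]

def should_accept_win(scores_after_win, me, is_final_round):
    """Final-round heuristic: decline a win that would leave us in last place.

    scores_after_win: the four players' scores if the win is taken.
    Assumption: sharing the lowest score is treated as last place.
    """
    if not is_final_round:
        return True  # no veto outside the final round
    return scores_after_win[me] > min(scores_after_win)

# Toy example: a hand holding tile types 0, 5, and 33 only.
hand = [0] * NUM_TILE_TYPES
hand[0], hand[5], hand[33] = 1, 2, 1
scores = [0.0] * NUM_TILE_TYPES
scores[5] = 2.0
probs = discard_distribution(scores, hand)
print(max(range(NUM_TILE_TYPES), key=probs.__getitem__))  # 5
```

The softmax is taken only over tiles that can legally be discarded, so the remaining probability mass is always spread across the actual hand.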

NAGA’s tile selection mechanism

In Mahjong, knowing when to fold is as important as aiming for a win. This folding decision has been a challenging task for traditional Mahjong AIs. NAGA learns appropriate behavior from the game records it is trained on, without offensive and defensive judgments being designed explicitly.

Situation where NAGA folds

Here, the player after NAGA in turn order (the lower player) has declared riichi. After drawing an unsafe tile (1p), NAGA chooses to discard a safe tile (6m) instead, breaking its own tenpai.

The outcome of a Mahjong match is decided over multiple rounds, as in East-only games and half-games, so the right strategy changes as the game progresses. Because NAGA learns from human play, it reflects these complex long-term strategies.

Comparison of riichi rates per round between NAGA and human players (highest-level table)

Situations of Uncertainty

After each match, NAGA tweets the match result along with a situation it was uncertain about. These situations are selected using confidence estimation [DeVries]. Each CNN model is trained with a confidence level: when the model is not confident in its predicted action distribution, it may refer to the correct answer at the cost of a penalty. Situations with low confidence are therefore those where NAGA was uncertain about its tile selection. On Twitter, NAGA posts the situation with the lowest confidence level in the match, provided that level is below 50%. If every situation has a confidence level of 50% or higher, it does not tweet an uncertain situation.
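
The idea can be sketched as follows, loosely following the loss in [DeVries]: the prediction is blended with the correct answer in proportion to (1 - confidence), and asking for that hint is penalized by -log(confidence). The function names, the penalty weight, and the selection helper are assumptions for illustration, and the training refinements described in the paper are omitted.

```python
import math

def confidence_loss(probs, confidence, target, penalty_weight=0.1):
    """Task loss on hint-blended probabilities plus a low-confidence penalty.

    probs: predicted action distribution (sums to 1).
    confidence: scalar in (0, 1] produced alongside the prediction.
    target: index of the action taken in the game record.
    penalty_weight: hypothetical weight of the confidence penalty.
    """
    if not 0.0 < confidence <= 1.0:
        raise ValueError("confidence must be in (0, 1]")
    one_hot = [1.0 if i == target else 0.0 for i in range(len(probs))]
    # Low confidence pulls the prediction toward the correct answer (the hint).
    blended = [confidence * p + (1.0 - confidence) * y
               for p, y in zip(probs, one_hot)]
    task_loss = -math.log(blended[target])
    hint_penalty = -math.log(confidence)  # cost of referring to the answer
    return task_loss + penalty_weight * hint_penalty

def pick_uncertain_situation(confidences, threshold=0.5):
    """Index of the lowest-confidence situation, or None if it is not below the threshold."""
    if not confidences:
        return None
    idx = min(range(len(confidences)), key=confidences.__getitem__)
    return idx if confidences[idx] < threshold else None

print(pick_uncertain_situation([0.9, 0.42, 0.7]))  # 1
print(pick_uncertain_situation([0.9, 0.6]))        # None
```

With full confidence the loss reduces to the ordinary cross-entropy, so the model is only rewarded for lowering its confidence when the hint saves more task loss than the penalty costs.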

NAGA’s match result tweet

Basis for Decisions

Since NAGA takes almost all table information as input, Guided Backpropagation [Springenberg] can be used to visualize the basis for each decision. Computing the gradient that amplifies the selected action shows which input features led to that selection. Conversely, computing the gradient that attenuates the selected action shows which features would lead NAGA not to choose it. In the deeper layers of the neural network, decisions are believed to be made at a more complex level, such as “having a double-sided wait of 3p5p7p, and 4p6p are not yet discarded,” but the current mechanism visualizes only at the level of the individual input units.
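
The Guided Backpropagation rule can be illustrated on a tiny network with one linear layer, a ReLU, and a linear output score. At each ReLU the gradient is passed back only where both the forward pre-activation and the incoming gradient are positive. The network shape, the weights, and the `sign` parameter (used here to switch between amplifying and attenuating the score) are illustrative assumptions and have nothing to do with NAGA's actual CNN.

```python
def guided_backprop(x, W, v, sign=1.0):
    """Guided gradient of score = v . relu(W x) with respect to the input x.

    sign=+1.0 asks which inputs amplify the score,
    sign=-1.0 asks which inputs attenuate it.
    """
    # Forward pass: pre-activations of the hidden layer.
    pre = [sum(w * xi for w, xi in zip(row, x)) for row in W]
    # Gradient of (sign * score) with respect to the hidden activations.
    grad_h = [sign * vj for vj in v]
    # Guided ReLU rule: keep the gradient only if the unit was active
    # in the forward pass AND the incoming gradient is positive.
    guided = [g if (p > 0 and g > 0) else 0.0 for g, p in zip(grad_h, pre)]
    # Backward pass through the linear layer: W^T * guided.
    return [sum(W[j][i] * guided[j] for j in range(len(W)))
            for i in range(len(x))]

x = [1.0, 0.0, 1.0]
W = [[1.0, -1.0, 0.5],
     [0.5, 1.0, -0.25]]
v = [2.0, -1.0]

print(guided_backprop(x, W, v, sign=1.0))   # [2.0, -2.0, 1.0]
print(guided_backprop(x, W, v, sign=-1.0))  # [0.5, 1.0, -0.25]
```

In this toy case the first hidden unit supports the score and the second opposes it, so the two calls return different input maps, matching the amplifying and attenuating views described above.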

Basis for tile selection

Conclusion

NAGA will continue to compete in the Tokujou table on Tenhou. We hope you enjoy playing against it.

References

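[Mizukami] N. Mizukami and Y. Tsuruoka, Building a computer Mahjong player based on estimation of expected final ranking (期待最終順位の推定に基づくコンピュータ麻雀プレイヤの構築, in Japanese), Proceedings of the 20th Game Programming Workshop, pp. 179–186 (2015). http://www.logos.ic.i.u-tokyo.ac.jp/~mizukami/paper/GPW_paper_2015.pdf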

[DeVries] T. DeVries and G. W. Taylor, Learning confidence for out-of-distribution detection in neural networks. arXiv (2018). https://arxiv.org/abs/1802.04865

[Springenberg] J. T. Springenberg, A. Dosovitskiy, T. Brox, and M. Riedmiller, Striving for simplicity: The all convolutional net. arXiv (2014). https://arxiv.org/abs/1412.6806

Published: 2019/01/21