Dwango Media Village (DMV)

A 24-hour live broadcast of Marltas' learning was held


On Wednesday, August 26th, Nico Nico Live hosted a live broadcast of ‘Marltas’, a system developed by DMV that learns to play games on RPG Atsumaru. The program broadcast Marltas' learning process from start to finish, using two games posted on RPG Atsumaru as the subject. At the opening and closing of the program, game commentator/creator Max (まっくす) introduced Marltas and the Nico Nico Indie Game Fest Rookie Award 2020. The program ended with a game showdown between Max and Marltas.

In this article, we will review the broadcast content, analyze how the learning progressed, and introduce the gameplay video of the high score reached after the program ended.

Broadcast Content

The program consisted of one-hour opening and closing segments, with the learning process broadcast continuously for 21 hours. The games used for learning were “To Hole of Hell (ver1.1)” and “COSMIC SHOOTER (ファミコン互換)” (the subtitle means “Famicom-compatible”), both posted on RPG Atsumaru. To Hole of Hell is a forced-scrolling game in which you control a character and try to descend as deep into the stage as possible without hitting enemy characters. COSMIC SHOOTER is a shooting game in which you aim for a high score by defeating enemies while avoiding enemy planes and bullets. Both games can end in a game over, and the broadcast showed Marltas playing them over and over. Below is a demo that reproduces the gameplay video together with the state of the gamepad during play.

COSMIC SHOOTER

The video above shows examples of gameplay from the beginning, middle, and end of the program. For COSMIC SHOOTER, “2Hours”, “10Hours”, and “22.5Hours” correspond to 2 hours, 10 hours, and 22.5 hours after the start of learning, respectively. The 22.5-hour mark was the one played in the showdown with Max. Early in learning the play is poor and quickly ends in a game over, but you can see it improve as learning progresses.

To Hole of Hell

In To Hole of Hell, as in COSMIC SHOOTER, “2Hours”, “10Hours”, and “19Hours” correspond to 2 hours, 10 hours, and 19 hours after the start of learning. The 19-hour mark was also used in the showdown.

During the live broadcast, Sasaki, a member of the development team, gave a technical explanation of deep reinforcement learning, the learning mechanism behind Marltas. Specifically, Marltas uses a method called Deep Q-Learning, which is also introduced in this article. The explanation began with the problem setting of reinforcement learning, then covered action values, the need for Q-learning, and finally the introduction of deep learning. Based on the properties of Deep Q-Learning, the kinds of games that Marltas is good at and those it struggles with were also introduced. The slides used in the program can be viewed below.
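As a rough illustration of the action-value idea covered in the technical segment, the classic tabular Q-learning update can be sketched as follows. This is not Marltas' implementation: Marltas uses the deep variant, in which a neural network replaces the table, and the function name, state labels, and parameter values here are assumptions chosen for illustration.

```python
from collections import defaultdict


def q_learning_update(q, state, action, reward, next_state, done,
                      actions, alpha=0.1, gamma=0.99):
    """Apply one tabular Q-learning step and return the new Q(state, action).

    q       : mapping from (state, action) to estimated action value
    actions : list of all actions available in next_state
    alpha   : learning rate (illustrative value)
    gamma   : discount factor (illustrative value)
    """
    # Value of the best action in the next state; zero after a game over.
    best_next = 0.0 if done else max(q[(next_state, a)] for a in actions)
    # Target: immediate reward plus discounted best future value.
    target = reward + gamma * best_next
    # Move the current estimate a fraction alpha toward the target.
    q[(state, action)] += alpha * (target - q[(state, action)])
    return q[(state, action)]


# Hypothetical usage with made-up states and gamepad actions.
q_table = defaultdict(float)
pad_actions = ["left", "right"]
q_learning_update(q_table, "s0", "left", 1.0, "s1", False,
                  pad_actions, alpha=0.5, gamma=0.9)
```

In Deep Q-Learning the same target is used, but the table lookup is replaced by a network that estimates the action values from the game screen.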

Analysis of Learning Results

Progress of To Hole of Hell scores
Progress of COSMIC SHOOTER scores
Score progression during learning

How much did Marltas improve through the learning conducted during the program? The graphs above plot the minimum/maximum, average/median, and top-10-percentile scores as learning progressed. In To Hole of Hell, scores increased linearly from the beginning but stopped growing after about 12 hours of learning. This is likely because Marltas could not deal effectively with the growing number of enemy characters as the game progresses, which requires both avoiding enemies and collecting life recovery items. In COSMIC SHOOTER, scores increased rapidly up to about 6 hours and then grew more slowly. This timing coincides with the appearance of enemy characters that do not appear in the early stages, so we believe that changes in the later part of the game caused difficulties similar to those in To Hole of Hell.
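Statistics like those plotted above could be computed along the following lines. This is a minimal sketch, not the analysis code used for the article: the function names, the one-hour binning, and the reading of “top 10 percentile” as the 90th-percentile score are all assumptions made for illustration.

```python
import statistics


def percentile(sorted_values, q):
    """Percentile q (0 to 100) of a pre-sorted list, by linear interpolation."""
    if not sorted_values:
        raise ValueError("percentile() needs at least one value")
    pos = (len(sorted_values) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] * (1 - frac) + sorted_values[hi] * frac


def summarize_scores(episodes, bin_hours=1.0):
    """Group (elapsed_hours, score) pairs into time bins and summarize each bin.

    Returns a dict mapping each bin's start hour to its min, max, mean,
    median, and 90th-percentile score.
    """
    bins = {}
    for hours, score in episodes:
        start = int(hours // bin_hours) * bin_hours
        bins.setdefault(start, []).append(score)

    summary = {}
    for start in sorted(bins):
        scores = sorted(bins[start])
        summary[start] = {
            "min": scores[0],
            "max": scores[-1],
            "mean": statistics.mean(scores),
            "median": statistics.median(scores),
            "p90": percentile(scores, 90),
        }
    return summary


# Hypothetical play log: (hours since learning started, final score).
example_log = [(0.1, 10), (0.5, 20), (0.9, 30), (1.2, 100), (1.8, 200)]
example_summary = summarize_scores(example_log)
```

Plotting each statistic against the bin start hour would give curves of the kind described above, where a plateau in the mean or median marks the point at which scores stop growing.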

Continued Learning After the Broadcast

Progress of To Hole of Hell scores
Progress of COSMIC SHOOTER scores
Learning progression after the program ended (including scores from unscreened play)

The learning shown on air ended with the program, but we continued training afterward. The graphs above show the results of this continued learning. The plotted scores include plays generated during exploration in learning, not only the gameplay shown during the program. For this reason, even within the first 24 hours, the values differ from those in the earlier graphs, which contain only the scores of plays shown during the program. In To Hole of Hell, score growth stalled toward the end of the program and stagnated for a while, but began to rise gradually again after about 90 hours. In COSMIC SHOOTER, the average score did not change significantly after the program, but the maximum score fluctuated.
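One way to see why exploration plays and displayed plays can score differently is epsilon-greedy action selection, a common exploration scheme in Q-learning. The article does not state which scheme Marltas uses, so the scheme itself, the function name, and the example values below are assumptions for illustration only.

```python
import random


def select_action(q_values, epsilon, rng):
    """Pick an action index from a list of estimated action values.

    With probability epsilon a uniformly random action is taken (exploration);
    otherwise the action with the highest estimated value is taken.
    """
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


rng = random.Random(0)
q_values = [0.1, 0.9, 0.3]  # made-up action values for three gamepad actions

# Exploration play: sometimes deviates from the current best guess.
exploring_action = select_action(q_values, epsilon=0.1, rng=rng)

# Display play: epsilon = 0 always follows the current best guess.
greedy_action = select_action(q_values, epsilon=0.0, rng=rng)
```

Under this assumption, plays generated with a nonzero epsilon include random moves, so their score distribution need not match that of plays chosen purely by the learned values.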

In the demo above, you can watch the best play of each game obtained after the program by selecting “Best”. In To Hole of Hell, Marltas did not quite reach the game clear (100th floor) that Max achieved. In COSMIC SHOOTER, it far surpassed the high score set during the program, reaching 9380 points.

Notes

We would like to thank あおいたく (To Hole of Hell) and suzukiplan (COSMIC SHOOTER), who developed the games used in the program, and all the viewers who watched. Thank you very much!

Through this project, we learned many things. Before the program, we were concerned that simply watching the learning process might be monotonous and boring. Once it started, however, we found ourselves hoping for a better score on the next play, and before we knew it we were engrossed. It was striking that the comments during the broadcast touched not only on Marltas’ button operations but also on the features and fun of the games themselves. You can understand the fun of a game by reading its promotional text or watching videos of human gameplay. On the other hand, there may also be a way of discovering a game's fun by idly watching many failed plays.

As announced during the program, the Nico Nico Indie Game Fest Rookie Award 2020 includes the “Marltas Award” for games that Marltas will play. As a unique prize, the award-winning games will actually be learned by Marltas. Beyond the Marltas Award, we plan to add more features for enjoying video games even more, so please continue to support Marltas.

Author

Published: 2019/09/05

Kazuma Sasaki