BrawlNet - Deep Reinforcement Learning for Brawlhalla

Personal Project


Introduction

BrawlNet is a deep reinforcement learning model that aims to play the video game "Brawlhalla" with proficiency comparable to human players. The project grew out of my enjoyment of the game, which I frequently play with a friend who is usually better at it than I am. I wondered whether I could train an AI to play the game and beat him as a harmless prank, with no intention of ever using the AI in online lobbies. "Brawlhalla" is a dynamic, fast-paced 2D platform fighter that demands strategic real-time decision-making. My primary goal was to build an AI agent that could understand and navigate the gameplay as proficiently as humans do, using reinforcement learning and convolutional neural networks.

Initial Approach

The initial idea was to create a synthetic image dataset by randomly placing a player character at a random location on a map with random health values, as sketched below. This dataset would be used to train a game state detector capable of parsing real game screenshots. The detector would in turn enable a second AI to learn top-level play by watching Brawlhalla championship videos. This approach would have been far simpler with the players' real-time input data, but such data is typically absent from professional gameplay videos, so the inputs would have to be inferred from the characters' kinematic motion. Recognizing the complexity of this pipeline, I decided to change strategy.
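
For illustration, here is a minimal sketch of how one such synthetic sample could have been generated with Pillow. The asset filenames, the output resolution, and the label format are all hypothetical:

```python
import random
from PIL import Image

def make_sample(map_path, sprite_path, out_size=(960, 540)):
    """Composite a character sprite at a random spot on a map background."""
    bg = Image.open(map_path).convert("RGB").resize(out_size)
    sprite = Image.open(sprite_path).convert("RGBA")  # needs an alpha channel
    x = random.randint(0, out_size[0] - sprite.width)
    y = random.randint(0, out_size[1] - sprite.height)
    bg.paste(sprite, (x, y), sprite)  # alpha-composite the character
    health = random.random()  # hypothetical normalized health label in [0, 1]
    return bg, {"x": x, "y": y, "health": health}
```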

Model Design

The core of the project was capturing and decoding real-time game state data. Screenshots served as the agent's view of the environment: each one was downscaled to a fixed size, grayscaled, and normalized to reduce the computational load. The five most recent processed frames were then stacked and fed into the model, dubbed 'BrawlNet', a convolutional neural network designed to pick up motion and the consequences of actions. To speed up training, the game's tick speed was increased twentyfold, letting the agent process and learn from game states far more quickly.
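
A minimal sketch of that preprocessing pipeline using OpenCV and NumPy; the 84x84 target resolution and the padding behavior at the start of an episode are assumptions, since the post does not specify them:

```python
import collections
import cv2
import numpy as np

FRAME_SIZE = (84, 84)  # assumed downscale target
STACK_LEN = 5          # the five most recent frames, as described above

def preprocess(frame_bgr):
    """Grayscale, downscale, and normalize one raw screenshot."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    small = cv2.resize(gray, FRAME_SIZE, interpolation=cv2.INTER_AREA)
    return small.astype(np.float32) / 255.0  # values in [0, 1]

class FrameStack:
    """Maintains the rolling window of processed frames fed to the network."""
    def __init__(self):
        self.frames = collections.deque(maxlen=STACK_LEN)

    def push(self, frame_bgr):
        self.frames.append(preprocess(frame_bgr))
        while len(self.frames) < STACK_LEN:  # pad with copies at episode start
            self.frames.append(self.frames[-1])
        return np.stack(self.frames)  # shape (5, 84, 84)
```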

BrawlNet accepts the stacked grayscale images as input and processes them through a series of convolutional layers that extract features describing the game's dynamics. Successive layers use increasing numbers of filters to capture progressively richer attributes of the game state. A final output layer produces scores for 21 distinct actions corresponding to the game's controls, such as movement and attack options.
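
A sketch of what such an architecture could look like in PyTorch. The specific filter counts, kernel sizes, and hidden width are assumptions; the description above only fixes the five-frame grayscale input, the increasing-filter pattern, and the 21 outputs:

```python
import torch
import torch.nn as nn

class BrawlNet(nn.Module):
    def __init__(self, n_frames=5, n_actions=21):
        super().__init__()
        self.conv = nn.Sequential(            # filters increase layer by layer
            nn.Conv2d(n_frames, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 128, kernel_size=3, stride=1), nn.ReLU(),
            nn.Flatten(),
        )
        with torch.no_grad():                 # infer the flattened feature size
            n_flat = self.conv(torch.zeros(1, n_frames, 84, 84)).shape[1]
        self.head = nn.Sequential(
            nn.Linear(n_flat, 512), nn.ReLU(),
            nn.Linear(512, n_actions),        # scores for the 21 game controls
        )

    def forward(self, x):                     # x: (batch, 5, 84, 84)
        return self.head(self.conv(x))
```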

The RL algorithm is a variant of the policy gradient family, chosen primarily because the model has to learn a policy over a large discrete action space from delayed rewards. The agent received feedback from a reward function tied to in-game events such as knockouts, damage dealt or taken, and falls off the stage.
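
A minimal REINFORCE-style sketch of such an update. The reward weights are hypothetical placeholders (the post names the events but not their values), and normalizing returns is one common variance-reduction choice, not necessarily the one used here:

```python
import torch
import torch.nn.functional as F

# Hypothetical reward weights for the in-game events mentioned above.
REWARDS = {"knockout": 10.0, "damage_dealt": 0.1, "damage_taken": -0.1, "fall": -5.0}

def shaped_reward(events):
    """events maps event names to counts observed since the last step."""
    return sum(REWARDS[name] * count for name, count in events.items())

def reinforce_update(model, optimizer, states, actions, rewards, gamma=0.99):
    """One policy gradient step over a finished episode.

    states:  (T, 5, 84, 84) float tensor of stacked frames
    actions: (T,) long tensor of chosen action indices
    rewards: list of T per-step shaped rewards
    """
    returns, g = [], 0.0
    for r in reversed(rewards):              # discounted returns, back to front
        g = r + gamma * g
        returns.append(g)
    returns = torch.tensor(list(reversed(returns)))
    returns = (returns - returns.mean()) / (returns.std() + 1e-8)

    log_probs = F.log_softmax(model(states), dim=1)
    chosen = log_probs[torch.arange(len(actions)), actions]
    loss = -(chosen * returns).sum()         # maximize expected return

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```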

Testing and Challenges

Over many training episodes the agent showed gradual improvement: it began to track gameplay dynamics, adapt to playing styles, evade attacks, and target opponents. After 8 hours of training, however, it was still only slightly better than at the start, so further work is needed. One next step is to pre-train the model on my own gameplay, giving it a head start on which actions earn high rewards and which are penalized; after this warm-up phase, the agent would continue training on its own (see the sketch below). Another avenue is to raise the game's tick speed even further and parallelize training across multiple game instances to accelerate learning.
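
A sketch of what the planned behavior-cloning warm start could look like, assuming the recorded gameplay is stored as (frame stack, action index) pairs; it is a plain supervised step, not a finalized design:

```python
import torch
import torch.nn.functional as F

def behavior_cloning_step(model, optimizer, frames, human_actions):
    """Fit the policy to recorded human play before RL fine-tuning.

    frames:        (B, 5, 84, 84) float tensor of preprocessed frame stacks
    human_actions: (B,) long tensor with the control the human pressed
    """
    logits = model(frames)
    loss = F.cross_entropy(logits, human_actions)  # match the human's choices
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```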

Conclusion

The project points to several directions for future work: integrating recurrent layers for better temporal understanding, experimenting with more advanced RL algorithms such as Proximal Policy Optimization, and continuing to tune the model's hyperparameters against performance metrics. While challenging on many fronts, it represents a significant step toward applying deep reinforcement learning in fast-paced, real-time game settings, and despite its limitations it has shown promising progress toward a capable AI game-playing agent.
