Introduction
In keeping with tradition, I created a ProbLog-based agent to challenge human players in the 2D video game called DareFightingICE. The agent's official repository is here.
This post explores probabilistic programming techniques and showcases a stochastic AI agent that can defeat novice human players, especially when they choose their next action more or less at random.
FightingICE is a 2D fighting game platform developed specifically for AI research and competitions. A game consists of a match of three rounds. In each round, the agent wins if the opponent's health points (HP) drop to 0, or if the opponent's HP is lower than the agent's when the round ends. The match is won by securing victory in at least two rounds. Each round lasts 60 seconds, and the game runs at 60 frames per second.

An agent can perform three types of actions: movement, attack, and defense. Attack actions are divided into two categories: normal, such as kicking or punching, and special, such as launching an energy ball or executing a super energy move. Each action must be decided within 1 to 15 milliseconds, which keeps the gameplay fast-paced, dynamic, and competitive.

By design, the opponent's state information is delayed by 14 to 15 frames. In other words, the agent receives updates about the opponent's HP, energy, actions taken, and other status indicators with this delay. This intentional lag forces the agent to make decisions based on slightly outdated information. As a result, players must anticipate their opponent's moves and adjust their strategy in real time, which adds an extra layer of complexity to the gameplay.
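To put these timing constraints in perspective, here is a quick back-of-the-envelope calculation in Python. It only uses the numbers quoted above and treats the delay as 15 frames:

FPS = 60                        # the game runs at 60 frames per second
FRAME_MS = 1000 / FPS           # about 16.7 ms of wall-clock time per frame
DELAY_FRAMES = 15               # opponent state arrives 14 to 15 frames late

print(f"per-frame budget: {FRAME_MS:.1f} ms")                    # ~16.7 ms
print(f"opponent state lag: {DELAY_FRAMES * FRAME_MS:.0f} ms")   # ~250 ms

So the agent effectively plans against a picture of the opponent that is roughly a quarter of a second old.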
For more on the game, please refer to the official repository, which can be found here.
ProbLog is a probabilistic logic programming toolbox designed to facilitate the development of complex, heterogeneous programs that model uncertainty in a domain. ProbLog extends traditional logic programming by allowing facts to be annotated with probabilities that reflect the likelihood of their truth. For example:
% Probabilistic facts:
0.5::heads1.
0.6::heads2.
% Rules:
someHeads :- heads1.
someHeads :- heads2.
% Queries:
query(someHeads).
This program computes the probability that at least one coin comes up heads (the 0.6 models a biased coin): P(someHeads) = 1 - (1 - 0.5) * (1 - 0.6) = 0.8.
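If the problog Python package is installed (pip install problog), the same program can be evaluated programmatically. The snippet below uses the package's standard PrologString and get_evaluatable API and should print the marginal of someHeads computed above:

from problog import get_evaluatable
from problog.program import PrologString

model = PrologString("""
0.5::heads1.
0.6::heads2.
someHeads :- heads1.
someHeads :- heads2.
query(someHeads).
""")

# Ground the program, compile it, and compute the marginal probability of each query atom
result = get_evaluatable().create_from(model).evaluate()
print(result)  # {someHeads: 0.8}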
For an in-depth exploration of the topic, please visit the official website.
My agent uses its knowledge base (KB) to determine the optimal next action based on the states of both agents. By analyzing detailed information about its own status and that of its opponent—including health, energy, and previous actions—the KB enables the agent to predict future moves and select the most effective strategy.
Here is an example of my agent (P2) playing against a Monte Carlo Tree Search (MCTS) based agent (P1):
Context
FightingICE is a Java-based fighting game platform that provides integration with Python through the pyftg package. To enable Python-based agents to interact with the game, the program must be launched with the --pyftg-mode argument, e.g.:
java -cp "bin:lib/*:lib/lwjgl/*:lib/lwjgl/natives/linux/amd64/*" Main --limithp 400 400 --grey-bg --gRPC --pyftg-mode
When this mode is activated, FightingICE initializes a socket communication channel that serves two key purposes:
- It allows the game to initialize Python agents
- It establishes a communication protocol for receiving commands (actions) from these agents
As seen in the script, the game is executed with several other parameters like --limithp 400 400, --grey-bg, and --gRPC, but --pyftg-mode specifically enables the Python interface functionality.
To create an agent for the DareFightingICE game, you must implement the abstract AIInterface class, which defines several essential methods called during game execution. If you want more context, you can see these examples.
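To give a feel for the structure, here is a minimal agent skeleton. It is only a sketch: the import path and the exact method signatures vary between pyftg versions, so treat the names below as assumptions and check them against the examples linked above:

from pyftg.aiinterface.ai_interface import AIInterface  # assumed import path; may differ per pyftg version


class ProblogAgent(AIInterface):
    def name(self) -> str:
        # Name reported to the game
        return "ProblogAgent"

    def initialize(self, game_data, player_number):
        # Called once before the match starts: remember static data and which side we control
        self.game_data = game_data
        self.player = player_number

    def get_information(self, frame_data, is_control):
        # Called every frame with the (delayed) state of both characters
        self.frame_data = frame_data
        self.is_control = is_control

    def processing(self):
        # Per-frame decision point: update the KB, query it, and choose the next action here
        pass

    def input(self):
        # Return the key/command object that encodes the chosen action
        return self.key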
The action selection logic in the Python component is primarily handled by the processing method, which performs the following functions:
- Historical Data Update: The method begins by updating historical data such as my_hps, opponent_actions, and other relevant game state information.
- Knowledge Base Clause Update: The method updates all relevant clauses in the ProbLog knowledge base (KB), including the current positions of both agents, their health and energy levels, and other dynamic game state parameters. This ensures the KB reflects the agent's current state and the opponent's slightly delayed one.
- Action Inference via ProbLog Query: The agent queries the KB using the find_my_best_action(BestAction, BestUtility) predicate to infer potential actions and their associated utilities.
- Action Selection with Weighted Randomness: The method processes the results of the ProbLog query, discarding any actions with a probability of 0. It then calculates a weight for each remaining action by summing its likelihood (probability) and utility score. Finally, it selects an action randomly, with the selection biased towards actions with higher weights. This introduces a strategic element of randomness, prioritizing actions the KB considers more promising while still allowing the exploration of potentially beneficial alternatives (a minimal sketch of this selection step follows this list).
The Agent's Core Logic
My bot leverages its knowledge base (KB) to determine the optimal action. This KB consists of facts and rules, some of which are annotated with a probability of being true.
The rule entry point for the action query is called find_my_best_action and is defined as follows:
find_my_best_action(BestAction, BestUtility) :-
curr_pos(me, X1, Y1),
curr_pos(opponent, X2, Y2),
curr_energy_value(me, MyEnergy),
prev_energy_value(me, MyPrevEnergy),
curr_energy_value(opponent, OppEnergy),
prev_energy_value(opponent, OppPrevEnergy),
predict_opp_next_action_type(PredOppActionType),
curr_hp_value(me, MyHP),
prev_hp_value(me, MyPrevHP),
curr_hp_value(opponent, OppHP),
prev_hp_value(opponent, OppPrevHP),
prev_action(me, MyPrevAction),
prev_action(opponent, OppPrevAction),
facing_dir(me, MyFDir),
hbox(opponent, OppHBox),
possible_actions(
X1, X2, Y1, Y2, MyFDir, MyHP, MyPrevHP, OppHP, OppPrevHP, MyEnergy, MyPrevEnergy,
OppEnergy, OppPrevEnergy, PredOppActionType, MyPrevAction, OppPrevAction,OppHBox, ActionList),
length(ActionList, L),
(
(L > 0,
find_utilities(ActionList, UtilityList),
sort(UtilityList, SortedList),
last(SortedList, BestUtility-BestAction)
);
(L =:= 0,
BestAction = crouch_fb,
BestUtility = 0.5
)
).
This rule binds all the relevant state values to variables; for example, curr_pos binds each agent's x, y position to X1, Y1 (for my agent) and X2, Y2 (for the opponent). It then estimates the opponent's next action type (special, attack, movement, defense, or non_attack, a catch-all for actions not covered by the previous categories) and returns the action with the highest utility. This is accomplished through the possible_actions rule, which selects at most the top k actions with the highest weights (calculated from potential damage, the current agent's state, and so on) that can be executed successfully (hit the opponent, block the opponent's attack action, and so on).
These candidate actions are then evaluated by the find_utilities rule, which computes a utility score for each one. Once the utility scores are determined, the system randomly selects an action, biased towards the one with the highest overall utility plus the marginal probability of that action. This final choice is executed by the bot, ensuring that its behavior is both reactive to the opponent's anticipated moves and optimized for success.
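To make the Python/ProbLog round trip concrete, here is a hedged sketch of how the agent could issue this query with the problog package. The file name agent_kb.pl is a placeholder; in practice the agent assembles the KB and the per-frame state clauses in memory:

from problog import get_evaluatable
from problog.program import PrologString

with open("agent_kb.pl") as f:            # placeholder path for the knowledge base
    kb = f.read()

# Ask ProbLog for the best action; every ground answer comes back with its marginal probability
kb += "\nquery(find_my_best_action(BestAction, BestUtility)).\n"
results = get_evaluatable().create_from(PrologString(kb)).evaluate()

for term, probability in results.items():
    best_action, best_utility = term.args  # arguments of the ground find_my_best_action/2 answer
    print(best_action, best_utility, probability)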
Running the Agent
To run the ProbLog-based AI agent, you have a couple of options: using the provided shell scripts or manually starting the game and agent in separate terminals.
- Option: Using the run_py_ag.sh script. This script simplifies the process by handling both launching the game and connecting the agent. Here are some examples:
# Basic usage
./run_py_ag.sh problog_agent/MainMctsVsProblog.py
# To run with a specific port
./run_py_ag.sh problog_agent/MainMctsVsProblog.py -p 4242
# To run in headless mode (without GUI)
./run_py_ag.sh problog_agent/MainMctsVsProblog.py --headless
# To play against the agent using a keyboard
./run_py_ag.sh problog_agent/MainKeyboardVsProblogA.py
- Manual startup:
- Start the DareFightingICE game:
cd DareFightingICE_CODE
./compile_run_linux.sh
- In a separate terminal, start the agent:
python problog_agent/MainMctsVsProblog.py
You can also specify command-line parameters such as --host and --port for the game execution and the Python agent connection.
Evaluation and Results
The runnable notebook is here.
Conclusion
The ProbLog agent demonstrated significantly superior performance compared to the other bot. However, its reliance on a single, albeit effective, strategy may present a vulnerability: opponents could exploit this predictability by developing counter-strategies, thereby fostering a more competitive environment. While this tactic proved successful against the MCTS bot, further investigation is warranted to assess its robustness against a wider range of opponents and playing styles. Still, the probabilistic agent is strong and responsive, and the experiments suggest that it might be adaptive.

The development process presented significant complexities. It required a comprehensive understanding of the game mechanics and the construction of a knowledge base that remained computationally tractable. Furthermore, the relative immaturity and infrequent updates of the chosen probabilistic programming language (ProbLog) introduced unexpected behaviors and challenges, potentially hindering the development of robust and functional programs. Despite these limitations, ProbLog offered a framework for representing and reasoning with uncertainty in a real-time FightingICE setting. The ability to integrate probabilistic inference with logical rules proved crucial for the agent's decision-making under conditions of incomplete information.

Future development could focus on enhancing the agent's action selection strategy by diversifying the range of inferred actions while maintaining a high damage output and win rate. This could involve incorporating a mechanism to select the optimal action based on the predicted opponent state. Additionally, the agent's adaptability could be significantly enhanced by an online learning mechanism that dynamically adjusts action probabilities: continuously monitoring opponent actions and updating the probabilities of different actions based on their observed frequency and effectiveness. Such an approach would enable the agent to evolve its strategy in response to opponent behavior and counter with a more diverse set of actions, potentially leading to more robust and unpredictable gameplay. For example, if the opponent frequently uses a jumping attack, the agent could increase the probability of selecting anti-air countermeasures.
That's all folks.
Credits:
Main photo: Merlin Lightpainting