Selective game state processing (ARV #0001)

Approaches for agents in games are simple at a high level – your agent observes some state from the game (e.g. where you are, your inventory, nearby items) and then decides on some action (e.g. dodge, shoot, interact, taunt).

This seems simple until you dive deeper into the state and action space of your game. You quickly encounter a world of extremely complex data – continuous positions, health and item information, match metadata, ability cooldowns, vision, animation progress, velocities, and more. A single frame holds a far more complex picture of the game than you could have ever imagined.

Screenshot of just some of the state observed in a single frame of Dota 2 – From OpenAI
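
To make that concrete, here is an entirely hypothetical sketch of the kind of per-frame observation an agent might receive – the field names are illustrative and not taken from any real game’s API:

```python
# Entirely hypothetical sketch of the kind of per-frame state an agent might see.
# Field names are illustrative and not taken from any real game's API.
from dataclasses import dataclass


@dataclass
class FrameObservation:
    position: tuple[float, float]               # continuous world coordinates
    velocity: tuple[float, float]
    health: float
    ability_cooldowns: list[float]              # seconds remaining per ability
    inventory: list[int]                        # item ids currently held
    visible_enemies: list[tuple[float, float]]  # positions of enemies in vision
    match_time: float                           # seconds since the match started
    team_kills: int
    animation_progress: float                   # 0.0 to 1.0 for the current animation
    # ...and a real game exposes hundreds or thousands more features per frame
```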

Thought experiment: how do players learn to play games? When I’m playing League of Legends (any Zilean mains out there?), I’m much more selective about the state I observe. In one moment, I may be considering where my enemies are. In another, I may be checking how many kills my team has, whether objectives are coming up soon, what abilities I have available, and so on. The thing is, I’m not looking at every available piece of state at every moment. Imagine learning a new game like League of Legends as a human, where at every moment you have to look at every single piece of state available to you before taking an action, with no initial advice on what matters most. It would take you a long time to learn – and that is exactly the situation typical AI agents in games are in.

I wonder if typical agent approaches – whether RL, imitation learning, LLMs, behavior trees, or something else – suffer from being given too much state every time an action needs to be sampled. I wonder if too much state causes:

  1. An excess of complexity in learning a policy (this is especially evident in behavior trees, where lots of state requires more and more configuration of the tree).
  2. The model getting “distracted” by this extra state during learning. Most ML practitioners would probably say the model simply learns to ignore irrelevant state, but does that make it take longer to converge? Maybe this is why it takes tens of thousands of years of gameplay experience to train a Dota 2 agent, when a human can become a great player within one lifetime.*

Maybe there is something here – is there a way to selectively filter for specific states at certain moments in a game, letting agents learn gameplay more easily by ignoring irrelevant state? This feels similar to the recent work on Mamba, where a foundation model was trained around the idea of “selective state spaces”: the model can “ignore” the parts of the input sequence it doesn’t care about, and the authors found this made it faster for both training and inference.
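
As a loose sketch of what that selectivity could look like for a game agent (this is not Mamba itself, just a toy analogue I’m imagining in PyTorch, with names I made up): a small network reads the current observation and produces a per-feature gate, and only the gated observation is passed on.

```python
import torch
import torch.nn as nn


class SelectiveStateFilter(nn.Module):
    """Input-dependent gate over observation features.

    Loosely inspired by Mamba's "selective" idea: which parts of the input
    matter is decided from the input itself rather than fixed in advance.
    This is a toy analogue, not the Mamba architecture.
    """

    def __init__(self, obs_dim: int, hidden_dim: int = 128):
        super().__init__()
        self.gate_net = nn.Sequential(
            nn.Linear(obs_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, obs_dim),
            nn.Sigmoid(),  # per-feature gate in [0, 1]
        )

    def forward(self, obs: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
        gate = self.gate_net(obs)   # decide, per frame, which features to keep
        return obs * gate, gate     # gated observation, plus the gate for inspection
```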

We should think about something similar for models and approaches in game AI – incorporating selective state filtering into the training phase, so that agents learn to filter out the irrelevant game state they don’t need.
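
Continuing the hypothetical sketch from above, one way this could work is for the filter to sit in front of an ordinary policy head and be trained end-to-end, so the gates are learned jointly with the policy rather than hand-configured the way a behavior tree is. A sparsity penalty on the gates is one possible way to push the agent toward genuinely switching state off:

```python
import torch
import torch.nn as nn

# SelectiveStateFilter is the module from the sketch above.


class GatedPolicy(nn.Module):
    """A policy head that only sees the filtered observation."""

    def __init__(self, obs_dim: int, num_actions: int):
        super().__init__()
        self.state_filter = SelectiveStateFilter(obs_dim)
        self.head = nn.Sequential(
            nn.Linear(obs_dim, 256),
            nn.ReLU(),
            nn.Linear(256, num_actions),
        )

    def forward(self, obs: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
        filtered, gate = self.state_filter(obs)
        return self.head(filtered), gate


policy = GatedPolicy(obs_dim=512, num_actions=32)
obs = torch.randn(8, 512)  # a batch of made-up observations
logits, gate = policy(obs)

# Inside whatever RL or imitation objective is used, an L1 penalty on the
# gate encourages the agent to switch off state it does not actually need.
sparsity_loss = 0.01 * gate.abs().mean()
```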


* I say lifetime because playing Dota 2 probably builds on knowledge learned simply from being human – skills picked up at a young age, such as how to read, what a character is, and what it means to attack something IRL.

Other notes and resources

https://openai.com/research/openai-five

Started writing this post on Feb 5, 2024
