Week 2: The Coding Journey Begins in Earnest
March 11, 2024
Hi everyone and welcome back to my blog! This week, I embarked on coding and training the AI assistant that will (hopefully soon) become a part of my game. Before we start, though, I’d like to define some reinforcement learning (RL) terms that will be used later in today’s blog entry; a short code snippet after the definitions shows how they look in practice.
Environment: what the agent (AI) must navigate/solve (in this case, the game itself)
State: the current configuration that the environment is in
Action space: the set of all actions that the agent can take
Observation space: the set of all things that the agent knows about a given state
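To make these terms concrete, here is a minimal sketch that loads Gym’s built-in blackjack environment (discussed below) and runs one episode with random actions. This is only an illustration, and it assumes a newer Gym/Gymnasium version (0.26+); older versions return slightly different tuples from reset() and step().

```python
import gym  # newer projects use the gymnasium package instead

# The environment: what the agent must navigate/solve.
env = gym.make("Blackjack-v1")

print(env.action_space)       # all actions the agent can take: Discrete(2) (hit or stick)
print(env.observation_space)  # what the agent knows about a state: player sum, dealer card, usable ace

# reset() gives the starting state; step() applies an action and returns the next state.
state, info = env.reset()
done = False
while not done:
    action = env.action_space.sample()  # random action, just for illustration
    state, reward, terminated, truncated, info = env.step(action)
    done = terminated or truncated
env.close()
```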
Q-Learning
I decided to model my RL environment and agent on a sample environment in OpenAI Gym called Blackjack. This environment simulates a game of blackjack in which the agent plays against a dealer and gradually learns an optimized strategy. Since blackjack involves drawing randomized cards, it was a natural base for my custom environment, which centers around randomly generating humanoids. Using this base, I created the simple version of my game as an RL environment, where the action space consists of only three actions (squish, save, skip) and the observation space consists of only three humanoid types (zombie, injured human, healthy human). I then modified the reward values for each possible action/state combination, and the environment was ready to be used.
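Here is a rough sketch of what that simplified environment could look like as a Gym class. The class name, humanoid encoding, reward numbers, and episode length are placeholder choices of mine, not necessarily what the project uses, and the sketch again assumes the newer Gym API.

```python
import gym
from gym import spaces

# Placeholder encodings; the real project may label these differently.
ZOMBIE, INJURED, HEALTHY = 0, 1, 2   # observation: which humanoid appears
SQUISH, SAVE, SKIP = 0, 1, 2         # actions

class HumanoidEnv(gym.Env):
    """Simplified squish/save/skip game, loosely modeled on Gym's Blackjack environment."""

    def __init__(self, max_humanoids=10):
        self.action_space = spaces.Discrete(3)       # squish, save, skip
        self.observation_space = spaces.Discrete(3)  # zombie, injured human, healthy human
        self.max_humanoids = max_humanoids

        # Illustrative reward table: rewards[state][action]
        self.rewards = {
            ZOMBIE:  {SQUISH: +1, SAVE: -1, SKIP: 0},
            INJURED: {SQUISH: -1, SAVE: +1, SKIP: 0},
            HEALTHY: {SQUISH: -1, SAVE: +1, SKIP: -1},
        }

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        self.remaining = self.max_humanoids
        self.state = int(self.np_random.integers(0, 3))  # randomly generate the first humanoid
        return self.state, {}

    def step(self, action):
        reward = self.rewards[self.state][action]
        self.remaining -= 1
        terminated = self.remaining <= 0                 # round ends after a fixed number of humanoids
        self.state = int(self.np_random.integers(0, 3))  # next randomly generated humanoid
        return self.state, reward, terminated, False, {}
```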
After setting up the environment and ensuring it could run with randomly selected actions, I began creating the Q-learning agent. Q-learning relies on the AI updating a Q-table, a matrix with one value for each action/state combination estimating how rewarding that action is in that state, as it trains. To ensure that the AI continues to train instead of building a new Q-table from scratch each time the code is run, I save the Q-table as a pickle file when the round is terminated.
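Below is a minimal sketch of how that training loop and the pickle save/load step might fit together, assuming the HumanoidEnv class from the sketch above. The filename and hyperparameter values are placeholders, and the update rule shown is the standard Q-learning update rather than the project’s exact code.

```python
import os
import pickle
import numpy as np

Q_TABLE_PATH = "q_table.pkl"              # placeholder filename
ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1    # placeholder learning rate, discount, exploration rate

# Load the existing Q-table if one was saved, so training continues across runs.
if os.path.exists(Q_TABLE_PATH):
    with open(Q_TABLE_PATH, "rb") as f:
        q_table = pickle.load(f)
else:
    q_table = np.zeros((3, 3))            # 3 states (humanoid types) x 3 actions

env = HumanoidEnv()
state, _ = env.reset()
done = False
while not done:
    # Epsilon-greedy: mostly exploit the best known action, occasionally explore.
    if np.random.random() < EPSILON:
        action = env.action_space.sample()
    else:
        action = int(np.argmax(q_table[state]))

    next_state, reward, terminated, truncated, _ = env.step(action)
    done = terminated or truncated

    # Standard Q-learning update for this state/action pair.
    target = reward if terminated else reward + GAMMA * np.max(q_table[next_state])
    q_table[state, action] += ALPHA * (target - q_table[state, action])
    state = next_state

# Persist the Q-table when the round terminates so the next run picks up where this one left off.
with open(Q_TABLE_PATH, "wb") as f:
    pickle.dump(q_table, f)
```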
GitHub
During our meeting, my advisors suggested I set up a GitHub repository to make updates, comments, and feedback more convenient. However, this turned out to be much more work than I had expected. I began by creating an empty remote repository and assigning a URL to it. Then, since GitHub no longer supports password authentication from the terminal, I had to learn about and create a personal access token. When I tried to use this token to access and clone my newly created empty repository, I found that the repository apparently had never been created. Thus, my next step is to try an alternate method of creating the repository so that I can add my code to it.
Literature Review
This week, I read MITRE’s Human-Machine Teaming (HMT) Systems Engineering Guide to learn more about how I can improve my game in terms of the AI’s collaboration with players. In particular, the General HMT Requirements section was useful for designing my game and ensuring that the AI assistant aids human players without obstructing their ability to make decisions somewhat independently. Some of the guidelines, on information presentation and clarity in directing attention, have made me reconsider my game’s UI layout in order to draw more attention to the AI suggestion component. I will also likely remove the time limit on decision-making in order to facilitate greater collaboration between the AI and the human player.
Thank you for reading and see you next week!
Works cited:
McDermott, P., Dominguez, C., Kasdaglis, N., Ryan, M., & Nelson, A. (2018). Human-Machine Teaming Systems Engineering Guide. Bedford, MA: MITRE. https://www.mitre.org/sites/default/files/2021-11/prs-17-4208-human-machine-teaming-systems-engineering-guide.pdf