Week 6: RL Meets Original Game
April 9, 2024
Hi everyone! I did a lot more work than expected this week (and so did my computer), but I am nearing completion of my first game prototype. For the past few weeks, the AI and game components of my project have remained separate, but now I am finally combining them.
AI Training
Throughout the week, I continued training my AI to produce a better Q table. To my surprise, training for a mere 10,000 iterations (compared to the customary 100,000) took approximately 2 days! I later found that this was because the AI trained much more slowly while my computer was sleeping, so I adjusted the computer’s settings so that it only sleeps after the program has finished running. Though this lets the training complete in only 14 hours, it also puts a greater strain on my computer, so I have been training the AI in intervals of 10,000 iterations instead of all at once. At the moment, my Q table is not necessarily the most accurate, but it’s enough to work with.
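For anyone curious what each of those iterations is doing, here is a minimal sketch of a tabular Q-learning update and of saving the table between runs. The state encoding, action count, hyperparameter values, and file name below are all my own placeholders for illustration, not my actual training code.

```python
import pickle
import numpy as np

N_STATES, N_ACTIONS = 1000, 3   # placeholder sizes: 3 actions per humanoid
ALPHA, GAMMA = 0.1, 0.9         # placeholder learning rate and discount factor

Q = np.zeros((N_STATES, N_ACTIONS))

def q_update(state, action, reward, next_state):
    # Nudge Q[state, action] toward reward + GAMMA * (best Q value of the next state)
    target = reward + GAMMA * np.max(Q[next_state])
    Q[state, action] += ALPHA * (target - Q[state, action])

def save_q_table(path="q_table.pkl"):
    # Saving the table between 10,000-iteration runs lets me stop and resume training
    with open(path, "wb") as f:
        pickle.dump(Q, f)
```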
Adding the AI to the Game
The original game from MIT Beaverworks, which I have been working with, already contained a different AI assistant. This assistant was an image-classification CNN (convolutional neural network), which identified which humanoid was in the currently displayed image and suggested an appropriate action to take. To obtain the CNN’s wisdom, players could use the “Suggest” and “Act” buttons. I have repurposed the Suggest button to offer my RL agent’s “suggestion” instead. However, to keep the suggestion deliberately ambiguous, I formatted it as 3 numbers, one for each of the 3 actions that players can take for each humanoid. Here is what it looks like so far:
Here, the highest value, 9.3 for save, indicates what the agent believes is the best choice, but 8.6 for skip also appears to be a close contender. I am hoping that certain situations will cause dilemmas for players, where two or more of the values are extremely close to each other.
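Conceptually, the repurposed Suggest button does something like the sketch below: look up the agent’s Q values for the current situation and show one number per action instead of a single recommendation. The function, action, and state names are placeholders of my own, not the game’s actual code, and the third value is made up.

```python
ACTIONS = ["save", "skip", "third_action"]   # stand-ins for the game's 3 per-humanoid actions

def on_suggest(q_table, state):
    # Return one rounded Q value per action, leaving the final decision to the player
    values = q_table[state]
    return {action: round(float(v), 1) for action, v in zip(ACTIONS, values)}

# Illustrative values only (matching the 9.3 / 8.6 from the screenshot above)
example_q_table = {"current_humanoid": [9.3, 8.6, 4.1]}
print(on_suggest(example_q_table, "current_humanoid"))
# -> {'save': 9.3, 'skip': 8.6, 'third_action': 4.1}
```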
Making the Suggestions Make More Sense
On their own, the raw Q values (which are what I’m displaying) don’t make much sense magnitude-wise, since they simply reflect the AI’s expected future reward for that action. For example, I currently have values spanning from about 10^-4 to 10, which can be confusing: what if the AI suggested 0.1 and 0.2 for the actions? Would that be different from 9 and 9.1?
Thus, my advisors proposed that I process them with a softmax (normalized exponential) function. The softmax function normalizes a set of numbers into a probability distribution, with values between 0 and 1 that sum to 1:
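softmax(z)_i = exp(z_i) / Σ_j exp(z_j)

Below is a small sketch of applying this to the example suggestion from above; the helper function is my own, and the third Q value is made up for illustration.

```python
import numpy as np

def softmax(z):
    z = np.asarray(z, dtype=float)
    e = np.exp(z - z.max())   # subtracting the max keeps the exponentials from overflowing
    return e / e.sum()

q_values = [9.3, 8.6, 4.0]    # save, skip, and a made-up third value
print(softmax(q_values))      # -> approximately [0.666, 0.331, 0.003]; the outputs sum to 1
```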
By doing this, I can give the suggestion values a meaning beyond just their magnitudes relative to each other, while narrowing the range of values present in the suggestions. I will most likely spend the rest of the week working on the suggestion values, as well as continuing to train my AI.
Thank you for reading and see you next week!
Works cited:
Goodfellow, Ian; Bengio, Yoshua; Courville, Aaron (2016). “6.2.2.3 Softmax Units for Multinoulli Output Distributions”. Deep Learning. MIT Press. pp. 180–184. ISBN 978-0-262-03561-3.
Sako, Y. (2018, June 2). “Is the term ‘softmax’ driving you nuts?” Medium. https://medium.com/@u39kun/is-the-term-softmax-driving-you-nuts-ee232ab4f6bd