Week 5: AI Training is Finally Done!
April 2, 2024
Hello everyone! This week marks a milestone in my senior project: the end of the AI phase. After completing the environment that represents my game last week, I modified my existing AI agent so that it could train on the new environment. Now that training is complete, I can begin adding the AI policy to the game itself as a UI feature.
But First, Some Modifications…
To my surprise, my AI agent took over 24 hours to train last week. This was a far cry from the 15 minutes it had spent training on the simpler environment, so I decided to investigate with the help of my advisors. When I printed the Q-table, I saw that it was much larger than before yet mostly populated with zeros. The cause was immediately clear: I had made the observation space unnecessarily specific. For example, rather than tracking the ambulance's exact capacity, I could represent it with a boolean, “has_space.” Similarly, the amount of time remaining could be expressed as “high,” “low,” or “none.” Once I make these modifications, I expect the training time to decrease significantly, along with the size of the Q-table.
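To illustrate the idea, here is a minimal sketch of that kind of coarse observation encoding. The names (`has_space`, the time thresholds) are illustrative assumptions, not the project's actual identifiers:

```python
def encode_observation(passengers, capacity, time_remaining):
    """Map raw state values onto a small discrete observation.

    Collapsing exact counts into booleans and buckets shrinks the
    observation space, and with it the Q-table the agent must fill.
    """
    has_space = passengers < capacity  # boolean instead of an exact count

    # Bucket the remaining time; the threshold of 10 is an arbitrary example.
    if time_remaining == 0:
        time_bucket = "none"
    elif time_remaining < 10:
        time_bucket = "low"
    else:
        time_bucket = "high"

    return (has_space, time_bucket)

# Only 2 x 3 = 6 possible observations, versus capacity * max_time
# states in the fully specific encoding.
```

With an encoding like this, most Q-table rows actually get visited during training instead of sitting at zero.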
An Update to the Render Function
With the help of the Python API, I eventually finished creating the render function. Most of the difficulty (besides figuring out functions and syntax) was in positioning the text and images in a reasonably organized layout. Though render doesn’t serve much of a purpose for training or gameplay, it’s useful as a visual reference.
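As a rough sketch of what a render function reports, here is a text-only stand-in (the actual project draws text and images with a graphics API; the state fields shown are assumptions):

```python
def render_text(state):
    """Return a plain-text snapshot of the environment state.

    A simplified, text-only version of a render function: the real one
    would position this information, plus images, on screen.
    """
    lines = [
        f"Time remaining: {state['time_remaining']}",
        f"Ambulance:      {state['passengers']}/{state['capacity']}",
        f"On screen:      {state['humanoid']}",
    ]
    return "\n".join(lines)

snapshot = render_text(
    {"time_remaining": 30, "passengers": 2, "capacity": 10, "humanoid": "injured"}
)
print(snapshot)
```

Even a stand-in like this is handy for sanity-checking that the agent sees the state you think it sees.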
Next Steps
The original version of my game contained an image-classification CNN that also acted as an assistant. To get suggestions from the CNN, players could use two buttons: “Suggest” and “Act.” Suggest displayed the type of humanoid the CNN believed was currently onscreen, and Act performed the corresponding action for that type of humanoid. I will be replacing the CNN with the RL agent I have just finished training, but the buttons will remain. Instead of suggesting what type of humanoid the player is looking at, though, the “Suggest” button will display the four possible actions with corresponding “confidence” values based on the Q-values from the completed Q-table. The “Act” button will perform the action with the highest confidence value when clicked.
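One reasonable way to turn a Q-table row into the “confidence” values described above is a softmax over the Q-values; this is a sketch of that approach, not necessarily the scheme the project will use:

```python
import math

def action_confidences(q_row):
    """Convert one Q-table row into normalized 'confidence' values.

    Softmax: exponentiate each Q-value (shifted by the max for
    numerical stability) and normalize so the results sum to 1.
    """
    shifted = [math.exp(q - max(q_row)) for q in q_row]
    total = sum(shifted)
    return [s / total for s in shifted]

def best_action(q_row):
    """'Act' button behavior: index of the highest-valued action."""
    return max(range(len(q_row)), key=lambda i: q_row[i])

# Example Q-values for the four actions in the current state.
q_row = [1.0, 3.0, 2.0, 0.0]
confidences = action_confidences(q_row)  # what "Suggest" would display
chosen = best_action(q_row)              # what "Act" would perform
```

Softmax has the nice property that larger Q-value gaps produce more decisive confidences, which matches the intuition players would expect from the display.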
Thank you for reading, and I hope to see you next week!