7: Reinforcement Learning, part 4
May 2, 2025
Right now, I’m debating whether, during training, the Q-network should be trained to produce Q-values based on the hands the LSTM predicted or the hands the players actually held.
Using predicted hands:
- CON: More logistically challenging to implement
- CON: Predictions could carry additional noise which may distract from the relationship between players’ cards and Q-values
- PRO: Uncertainty can be learnt as a factor in decision-making
Using actual hands:
- PRO: No new implementation required
- PRO: Allows the Q-network to learn the precise relationship between cards and the rewards from asking and calling
- CON: Does not reflect uncertainty, which leads to problems such as calling set when uncertainty is high
So I ended up using predicted hands instead, and it works well enough.
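To make the choice concrete, here is a minimal sketch of what feeding predicted hands into the Q-network could look like. It assumes PyTorch, and the class and variable names (QNetwork, hand_lstm, state_features, etc.) are hypothetical illustrations, not my actual implementation.

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Q-network that consumes the LSTM's predicted hand distribution
    (rather than the ground-truth hands) alongside other game features."""

    def __init__(self, hand_dim: int, state_dim: int, num_actions: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(hand_dim + state_dim, 256),
            nn.ReLU(),
            nn.Linear(256, num_actions),
        )

    def forward(self, predicted_hands: torch.Tensor, state: torch.Tensor) -> torch.Tensor:
        # predicted_hands: per-player card probabilities from the LSTM,
        # flattened to (batch, hand_dim); state: remaining game features.
        return self.net(torch.cat([predicted_hands, state], dim=-1))

# During training, the Q-network only ever sees the LSTM's (possibly noisy)
# beliefs, so uncertainty about opponents' cards becomes part of what the
# policy learns to handle:
#   hand_probs = hand_lstm(observation_history).softmax(dim=-1).flatten(1)
#   q_values = q_net(hand_probs.detach(), state_features)
```

The key design point is that the Q-values are conditioned on beliefs rather than hidden truth, which matches what the agent will actually have access to at play time.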
Also, my model still doesn’t know how to call set. I’m thinking I’ll just collect all the memories where the model correctly or incorrectly called set and run a dedicated training session on just those (a rough sketch of that idea is below). Other than that, it works surprisingly well: it averages around 35% validation accuracy at determining the location of each card (reasonable, given that in an actual game you wouldn’t know everyone’s cards anyway), and it also learns pretty well, based on its hand predictions, where it can successfully ask for cards. From passing observation, its weak spot is coordinating with teammates to collect all the cards in a set, since the set-calling mechanism isn’t well developed yet. In the last stretch, I will try to refine that process and move on to evaluating the model against human players.
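Here is one way the dedicated set-calling session could work: filter the replay buffer for transitions where the agent called set, then run extra gradient steps on only that slice. This is a hedged sketch; the field names (action_type, "call_set") and the loss_fn signature are assumptions for illustration, not my actual code.

```python
import random

def set_call_training_pass(replay_buffer, q_net, optimizer, loss_fn,
                           batch_size=64, epochs=5):
    """Extra training pass over only the memories where set was called."""
    # Keep only transitions whose chosen action was a set call (field name assumed).
    set_calls = [m for m in replay_buffer if m["action_type"] == "call_set"]
    if not set_calls:
        return

    for _ in range(epochs):
        random.shuffle(set_calls)
        for i in range(0, len(set_calls), batch_size):
            batch = set_calls[i:i + batch_size]
            loss = loss_fn(q_net, batch)  # standard TD loss, just on this slice
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
```

Oversampling these rare transitions should give the network more signal on when a set call pays off, at the cost of some distribution skew, so it would probably be interleaved with normal replay training rather than run in isolation.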