7: Reinforcement Learning, part 4
May 2, 2025
Right now, I’m debating whether, during training, the Q-network should be trained to produce Q-values based on the hands the LSTM predicted or the hands the players actually held.
Using predicted hands:
- CON: More logistically challenging to implement
- CON: Predictions could carry additional noise which may distract from the relationship between players’ cards and Q-values
- PRO: Uncertainty can be learnt as a factor in decision-making
Using actual hands:
- PRO: No new implementation required
- PRO: Allows the Q-network to learn the precise relationship between cards and the rewards from asking and calling
- CON: Does not reflect uncertainty, which leads to problems such as calling set when uncertainty is high
So I ended up using predicted hands instead, and it works well enough.
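To make the choice concrete, here is a minimal sketch of what feeding predicted hands into the Q-network could look like. It assumes PyTorch, and the class and variable names (QNetwork, hand_lstm, state_features, etc.) are hypothetical illustrations, not my actual implementation.

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Q-network that consumes the LSTM's predicted hand distribution
    (rather than the ground-truth hands) alongside other game features."""

    def __init__(self, hand_dim: int, state_dim: int, num_actions: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(hand_dim + state_dim, 256),
            nn.ReLU(),
            nn.Linear(256, num_actions),
        )

    def forward(self, predicted_hands: torch.Tensor, state: torch.Tensor) -> torch.Tensor:
        # predicted_hands: per-player card probabilities from the LSTM,
        # flattened to (batch, hand_dim); state: remaining game features.
        return self.net(torch.cat([predicted_hands, state], dim=-1))

# During training, the Q-network only ever sees the LSTM's (possibly noisy)
# beliefs, so uncertainty about opponents' cards becomes part of what the
# policy learns to handle:
#   hand_probs = hand_lstm(observation_history).softmax(dim=-1).flatten(1)
#   q_values = q_net(hand_probs.detach(), state_features)
```

The key design point is that the Q-values are conditioned on beliefs rather than hidden truth, which matches what the agent will actually have access to at play time.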
Also, my model still doesn’t know how to call set. I’m thinking I’ll just collect all the memories where the model correctly or incorrectly called set and run a dedicated training session on just those (a rough sketch of that idea is below). Other than that, it works surprisingly well: it averages around 35% validation accuracy at determining the location of each card (reasonable, given that in an actual game you wouldn’t know everyone’s cards anyway), and it also learns pretty well, based on its hand predictions, where it can successfully ask for cards. From passing observation, its weak spot is coordinating with teammates to collect all the cards in a set, since the set-calling mechanism isn’t well developed yet. In the last stretch, I will try to refine that process and move on to evaluating the model against human players.
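Here is one way the dedicated set-calling session could work: filter the replay buffer for transitions where the agent called set, then run extra gradient steps on only that slice. This is a hedged sketch; the field names (action_type, "call_set") and the loss_fn signature are assumptions for illustration, not my actual code.

```python
import random

def set_call_training_pass(replay_buffer, q_net, optimizer, loss_fn,
                           batch_size=64, epochs=5):
    """Extra training pass over only the memories where set was called."""
    # Keep only transitions whose chosen action was a set call (field name assumed).
    set_calls = [m for m in replay_buffer if m["action_type"] == "call_set"]
    if not set_calls:
        return

    for _ in range(epochs):
        random.shuffle(set_calls)
        for i in range(0, len(set_calls), batch_size):
            batch = set_calls[i:i + batch_size]
            loss = loss_fn(q_net, batch)  # standard TD loss, just on this slice
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
```

Oversampling these rare transitions should give the network more signal on when a set call pays off, at the cost of some distribution skew, so it would probably be interleaved with normal replay training rather than run in isolation.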