Blog #7: Learning Curve
April 13, 2025
Hi!
I hope you’ve had a good week. Sit down, pour yourself a cup of tea, and enjoy my blog post 😀
This week, I made some key changes to my model’s training setup: I fixed the tokenization issue by adding a pad token, lowered the learning rate to 5e-5, reduced the number of training epochs to 3, and added L2 regularization. The pad token gets tokenization working correctly; the other tweaks all help with generalization.
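For the curious, here’s roughly what that setup looks like in code. This is a minimal sketch assuming a Hugging Face Trainer workflow; the model name and the weight-decay value are placeholders, not my exact config:

from transformers import AutoTokenizer, AutoModelForCausalLM, TrainingArguments

model_name = "meta-llama/Llama-2-7b-chat-hf"  # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Llama-style tokenizers ship without a pad token, which breaks padded batching.
tokenizer.add_special_tokens({"pad_token": "[PAD]"})

model = AutoModelForCausalLM.from_pretrained(model_name)
model.resize_token_embeddings(len(tokenizer))  # make room for the new pad token

args = TrainingArguments(
    output_dir="rare-disease-ft",
    learning_rate=5e-5,   # lowered from my previous run
    num_train_epochs=3,   # fewer epochs to curb overfitting
    weight_decay=0.01,    # L2-style regularization (placeholder value)
)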
And it works now! The output is finally sensible.
[INST] <<SYS>>
{{ You are a specialist in the field of rare diseases. You will be provided and asked about a complicated clinical case; read it carefully and then provide a diverse and comprehensive differential diagnosis. }}
<</SYS>>
{{ This rare disease patient suffers from symptoms: ['Seizure', 'Spastic diplegia', 'Encephalopathy', 'Death in infancy', 'Fever', 'Spastic tetraplegia', 'Tachypnea', 'Glutaric aciduria']. Enumerate the top 10 most likely diagnoses. Be precise, listing one diagnosis per line, and try to cover many unique possibilities (at least 10). The top 10 diagnoses are: }} [/INST]
Carnitine uptake defect
Carnitine palmitoyltransferase II deficiency
Methylmalonyl-CoA mutase deficiency
Medium chain acyl-CoA dehydrogenase complex deficiency
Hepatic mitochondrial trifunctional protein deficiency
Pyruvate dehydrogenase complex deficiency
Ketone body metabolism disorder
Propionic aciduria type I
Methylmalonic acidemia type A
Methylmalonic acidemia type B
In this case, the list includes the right diagnosis: Medium chain acyl-CoA dehydrogenase deficiency.
To dig deeper, I took a look at the model’s (literal) learning curve:
Something weird you probably notice here is that the validation loss is lower than the training loss, especially in the earlier steps. After double-checking for data contamination and finding nothing suspicious, I was pretty confused too, so I turned to my trusty friend, Google. Apparently, this discrepancy is normal when you add regularization: the penalty term is added to the training loss but not the validation loss. As the penalty term shrinks over training, so does the gap.
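In symbols (my notation, assuming the classic setup where the L2 penalty is folded into the reported training loss):

$$\mathcal{L}_{\text{train}}(\theta) = \mathrm{CE}(\theta) + \lambda\,\lVert\theta\rVert_2^2, \qquad \mathcal{L}_{\text{val}}(\theta) = \mathrm{CE}(\theta)$$

Training shrinks the weights, so the $\lambda\,\lVert\theta\rVert_2^2$ term (and with it the train/val gap) gets smaller step by step.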
Otherwise, this graph looks pretty solid! The validation loss goes down alongside the training loss, and the two converge at the end, indicating a good fit. In the last steps, the training loss stops decreasing and plateaus, which suggests I chose a reasonable number of epochs.
Once I had a working model, I started grading its responses. One interesting observation: the fine-tuned model performs better on diseases that are more frequently represented in the dataset. For instance, its accuracy on PKU rose to 62.5% from a baseline accuracy of 50% (and even then, some of the zero-shot predictions judged “correct” only captured the broader class, Hyperphenylalaninemia).
As for the overall top-10 accuracy, the fine-tuned model hits 25.48%, up 1.67 percentage points from the baseline of 23.81%. I’m happy with any improvement, no matter how small, but I think I can do better if I try hard enough. The (figurative) learning curve starts slow, but we can get there!
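If you’re wondering what “top-10 accuracy” means here: a case counts as a hit if the true diagnosis shows up anywhere in the model’s list of 10. A simplified sketch of that check (exact string matching shown, which isn’t quite what real grading can use):

def top10_accuracy(cases):
    # cases: list of (gold_diagnosis, predictions) pairs, where predictions
    # is the list of 10 diagnoses the model printed, one per line.
    # NOTE: exact matching is a simplification -- it would actually miss the
    # MCAD example above, since the model wrote "Medium chain acyl-CoA
    # dehydrogenase complex deficiency". Real grading needs looser matching.
    hits = sum(
        any(gold.lower() == pred.lower() for pred in predictions)
        for gold, predictions in cases
    )
    return hits / len(cases)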
Back to work 💻
– Lucy