Blog #7: Learning Curve
April 13, 2025
Hi!
I hope you’ve had a good week. Sit down, pour yourself a cup of tea, and enjoy my blog post 😀
This week, I made some key changes to my model’s training setup: I fixed the tokenization issue by adding a pad token, lowered the learning rate to 5e-5, reduced the number of training epochs to 3, and added L2 regularization. The pad token gets tokenization working correctly; the other tweaks all help with generalization.
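For the curious, here’s roughly what that setup looks like in code. This is a minimal sketch assuming a Hugging Face Trainer workflow; the model name and the weight-decay value are placeholders, not my exact config:

from transformers import AutoTokenizer, AutoModelForCausalLM, TrainingArguments

model_name = "meta-llama/Llama-2-7b-chat-hf"  # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Llama-style tokenizers ship without a pad token, which breaks padded batching.
tokenizer.add_special_tokens({"pad_token": "[PAD]"})

model = AutoModelForCausalLM.from_pretrained(model_name)
model.resize_token_embeddings(len(tokenizer))  # make room for the new pad token

args = TrainingArguments(
    output_dir="rare-disease-ft",
    learning_rate=5e-5,   # lowered from my previous run
    num_train_epochs=3,   # fewer epochs to curb overfitting
    weight_decay=0.01,    # L2-style regularization (placeholder value)
)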
And it works now! The output is finally sensible.
[INST] <<SYS>>
{{ You are a specialist in the field of rare diseases. You will be provided and asked about a complicated clinical case; read it carefully and then provide a diverse and comprehensive differential diagnosis. }}
<</SYS>>
{{ This rare disease patient suffers from symptoms: ['Seizure', 'Spastic diplegia', 'Encephalopathy', 'Death in infancy', 'Fever', 'Spastic tetraplegia', 'Tachypnea', 'Glutaric aciduria']. Enumerate the top 10 most likely diagnoses. Be precise, listing one diagnosis per line, and try to cover many unique possibilities (at least 10). The top 10 diagnoses are: }} [/INST]
Carnitine uptake defect
Carnitine palmitoyltransferase II deficiency
Methylmalonyl-CoA mutase deficiency
Medium chain acyl-CoA dehydrogenase complex deficiency
Hepatic mitochondrial trifunctional protein deficiency
Pyruvate dehydrogenase complex deficiency
Ketone body metabolism disorder
Propionic aciduria type I
Methylmalonic acidemia type A
Methylmalonic acidemia type B
In this case, the list includes the right diagnosis: Medium chain acyl-CoA dehydrogenase deficiency.
To dig deeper, I took a look at the model’s (literal) learning curve:
Something weird you probably notice here is that the validation loss is lower than the training loss, especially in the earlier steps. After double-checking for data contamination and finding nothing suspicious, I was pretty confused too, so I turned to my trusty friend, Google. Apparently, this discrepancy is normal when you add regularization: the penalty term is added to the training loss but not the validation loss. As the penalty term shrinks over training, so does the gap.
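In symbols (my notation, assuming the classic setup where the L2 penalty is folded into the reported training loss):

$$\mathcal{L}_{\text{train}}(\theta) = \mathrm{CE}(\theta) + \lambda\,\lVert\theta\rVert_2^2, \qquad \mathcal{L}_{\text{val}}(\theta) = \mathrm{CE}(\theta)$$

Training shrinks the weights, so the $\lambda\,\lVert\theta\rVert_2^2$ term (and with it the train/val gap) gets smaller step by step.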
Otherwise, this graph looks pretty solid! The validation loss goes down alongside the training loss, and the two converge at the end, indicating a good fit. In the last steps, the training loss stops decreasing and plateaus, which suggests I chose a reasonable number of epochs.
Once I had a working model, I started grading its responses. One interesting observation: the fine-tuned model performs better on diseases that are more frequently represented in the dataset. For instance, its accuracy on PKU rose to 62.5% from a baseline accuracy of 50% (and even then, some of the zero-shot predictions judged “correct” only captured the broader class, Hyperphenylalaninemia).
As for the overall top-10 accuracy, the fine-tuned model hits 25.48%, up 1.67 percentage points from the baseline of 23.81%. I’m happy with any improvement, no matter how small, but I think I can do better if I try hard enough. The (figurative) learning curve starts slow, but we can get there!
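If you’re wondering what “top-10 accuracy” means here: a case counts as a hit if the true diagnosis shows up anywhere in the model’s list of 10. A simplified sketch of that check (exact string matching shown, which isn’t quite what real grading can use):

def top10_accuracy(cases):
    # cases: list of (gold_diagnosis, predictions) pairs, where predictions
    # is the list of 10 diagnoses the model printed, one per line.
    # NOTE: exact matching is a simplification -- it would actually miss the
    # MCAD example above, since the model wrote "Medium chain acyl-CoA
    # dehydrogenase complex deficiency". Real grading needs looser matching.
    hits = sum(
        any(gold.lower() == pred.lower() for pred in predictions)
        for gold, predictions in cases
    )
    return hits / len(cases)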
Back to work 💻
– Lucy