Week 11 (5/4 - 5/8) - Hyperparameter Tuning, Slide, Revisions, and Results!
May 9, 2026
Welcome back to my blog! This week was a lighter but still productive one as I wrapped up the project, focused on squeezing more performance out of the model through hyperparameter optimization, cleaning up the web app, and refining the project presentation.
The first thing I tackled was a few lingering issues with the web app. The main fix was removing the patient record that was incorrectly showing up at the end of the treatment recommendations output, which was cluttering the final display, alongside some general formatting cleanup. Small things, but they go a long way in making the demo feel polished.
The bulk of my time this week went into hyperparameter tuning, specifically experimenting with the LoRA rank and learning rate. My first experiment was increasing the LoRA rank from r = 16 to r = 32 for Stage 1. In LoRA fine-tuning, the rank controls how many trainable parameters are reintroduced into the model, and so a higher rank means more capacity to learn task-specific patterns. However, the results were disappointing: the results looked almost identical to before (the loss curves barely changed). This was a pretty clear signal that the bottleneck isn’t the model’s capacity, but rather the size and diversity of the training data itself.
I then tried adjusting the learning rate, bumping it up to 1e-4 to see if a more conservative update step would help. The loss curves immediately increased, so I brought it back to the original and am planning to stick to the current value of 2e-4. For Stage 2, I ran the same r = 32 experiment and saw severe overfitting, significantly worse than Stage 1, which confirmed that r = 16 is the right call for that stage too. At this point, I think the model has learned as much as it can from the available data. My next idea was to try increasing the max_seq_length from 2048 to 3072, since there’s a chance that longer patient records and guideline texts are getting truncated, which could explain why the model keeps hitting a ceiling. However, the same loss results were found, proving a similar conclusion that the LLM has learned all it can. Below I’ve attached one patient case as a demo of the LLM’s output!


On the presentation side, I spent a large part of the week working through feedback on my slides and decided to make the switch from Google Slides to Canva for a cleaner, better aesthetic. After seeing my cohort’s presentations, some ideas I have include: the main changes were adding title slides between sections to improve the flow, adding a dedicated research question slide upfront, and reordering the content so that the pipeline is explained before diving into the baseline models.
Looking at the slides as a whole, I’m also thinking more carefully about what context I can add. One thing that stands out to me is the “Next Steps” section at the end, which discusses how this project could fit into a broader ecosystem of AI tools in healthcare. Specifically, the connection to ambient clinical documentation tools (like Nuance DAX Copilot, Abridge, and Nabla) that listen to doctor-patient conversations and auto-generate clinical notes is really compelling. Pairing an LLM like mine with tools like those would allow the model to work with consistently structured, high-quality clinical notes, which is one of the biggest limitations of the current approach given how messy real-world patient data can be.
Finally, I sent the model’s outputs to Dr. Maecker for evaluation using the rubric we discussed last week: safety, actionability, and clinical appropriateness. I’m curious to see how the outputs hold up under expert review, especially as the training data for this project was sparse. That feedback will be included in next week’s post (along with the final presentation). Stay tuned!
Reader Interactions
Comments
Leave a Reply
You must be logged in to post a comment.

Hi Aanya, great updates as always. I had a question: when you were tuning your hyperparameters, were there any parameters that ended up affecting the model much more than you initially expected?