Week 8: It's Under Control
April 18, 2026
Hello readers! After a refreshing and restful spring break, week 8 of my senior project was quite productive!
This week, my main focus was running my control group test. To do this, I created a small sample set of 20 prompts completely unrelated to tabla. I gave these prompts to each model (fine-tuned and base) and evaluated the accuracy and instructional quality of their responses.
This is the same process I followed with my 140 tabla-related prompts over the past few weeks, except that these 20 prompts do not pertain to tabla. The expectation is that the two models' responses will be of roughly equal quality and accuracy. If that holds, it would strongly suggest that the discrepancies in my 140 tabla-related prompts are due to the fine-tuning process rather than some unrelated difference between the models. As a reminder, the fine-tuning process trained the model solely on tabla, as my dataset did not contain information on any other topic.
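As a rough illustration of that control check (a sketch only, not the actual evaluation script), the comparison could look something like this in Python. The function name `paired_gap` and the scores are made up for the example; the idea is simply that each control prompt gets one score per model, and a mean difference near zero is what the control group is expected to show.

```python
from statistics import mean, stdev

def paired_gap(base_scores, tuned_scores):
    """Return (mean difference, std. dev. of differences) for per-prompt score pairs.

    Hypothetical helper: differences are computed as fine-tuned minus base.
    """
    if len(base_scores) != len(tuned_scores):
        raise ValueError("each prompt needs a score from both models")
    diffs = [t - b for b, t in zip(base_scores, tuned_scores)]
    return mean(diffs), stdev(diffs)

# Made-up accuracy scores for five control prompts (illustration only).
base = [4, 5, 4, 3, 5]
tuned = [4, 4, 5, 3, 5]

gap, spread = paired_gap(base, tuned)
print(f"mean gap: {gap:+.2f}, spread: {spread:.2f}")
```

Pairing the scores by prompt, rather than comparing two overall averages, keeps easy and hard prompts from blurring the comparison.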
I chose to create a set of questions related to piano, as I expect another musical instrument will make for a closer apples-to-apples comparison. To stay consistent with how I spoke to tabla students to create my rubric, I also spoke to a friend who has been playing piano for nearly a decade. I found a beginner’s guide to playing piano online and asked her to read through it and cross-check the information against her own training. With her notes and approval, this article served as the ground truth for piano-playing information against which accuracy could be scored.
After collecting responses and running the script to evaluate them, I filled in my spreadsheet for the 20 piano-related prompts. Just scrolling through it, I can already see that the accuracy scores for piano are much higher than those for tabla. Though this doesn’t directly affect my conclusions, it does suggest that the base model is much more knowledgeable about piano than tabla, which is a bias I’d like to look into further and discuss in my final presentation.
Ideally, this marks the end of the data collection phase of my project. Next week, my focus will be on actually starting to analyze the scores collected. I’ll be noting outliers, calculating averages per topic, and compiling graphs and findings. At this point, I feel a bit ahead of schedule, but I expect next week’s analysis will provide insight into what worked and what didn’t, so I’m prepared to rerun tests and collect data again if needed. Looking forward to another productive week!
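For the analysis step, here is a minimal Python sketch of what "averages per topic" and "noting outliers" might look like. Everything in it is an assumption for illustration: the topic labels, the scores, the helper names, and the choice of a simple z-score cutoff of 2 for outliers, which is only one of several reasonable ways to flag unusual scores.

```python
from collections import defaultdict
from statistics import mean, stdev

def averages_by_topic(rows):
    """Average the scores for each topic; rows are (topic, score) pairs."""
    grouped = defaultdict(list)
    for topic, score in rows:
        grouped[topic].append(score)
    return {topic: mean(scores) for topic, scores in grouped.items()}

def flag_outliers(scores, z_cutoff=2.0):
    """Return scores more than z_cutoff sample standard deviations from the mean."""
    if len(scores) < 2:
        return []  # not enough data to estimate spread
    mu, sigma = mean(scores), stdev(scores)
    if sigma == 0:
        return []  # all scores identical, nothing stands out
    return [s for s in scores if abs(s - mu) / sigma > z_cutoff]

# Made-up rows standing in for spreadsheet entries (illustration only).
rows = [("topic_a", 3), ("topic_a", 5), ("topic_b", 4), ("topic_b", 2)]
print(averages_by_topic(rows))

# Made-up score list with one unusually low value.
print(flag_outliers([4, 4, 4, 4, 4, 4, 4, 4, 4, 0]))
```

Any score flagged this way would still need a manual look, since a low score could reflect a hard prompt rather than a data-entry or grading problem.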
