Week 8: Model Evaluation and Refinement
April 24, 2026
Hi everyone, welcome back! This week was originally planned as a buffer period to either catch up on modeling or get an early start on the research paper. However, since the pipeline and initial experiments are now in place, I decided to fully dedicate this week to evaluating and refining the models.
The primary goal was to evaluate the models more rigorously and systematically. Rather than relying on a single split into training and testing datasets, I began using cross-validation to measure performance across several different subsets of the original data. This helps to detect overfitting, one of the most common problems with machine learning models, because a model that has only memorized its training data will score poorly on the held-out subsets. It also gives a more accurate picture of how well the models generalize, that is, how well they perform on unseen data rather than only on the data they were trained on.
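To make the cross-validation idea concrete, here is a minimal sketch in plain Python. It is not my actual pipeline: the nearest-centroid classifier, the synthetic two-class data, and every function name below are stand-ins I made up for illustration.

```python
import random
from statistics import mean, stdev

def k_fold_splits(n_samples, k, seed=0):
    """Yield (train_idx, test_idx) pairs for k roughly equal, shuffled folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train_idx, test_idx

def fit_centroids(X, y):
    """Toy model (hypothetical): store the mean feature vector of each class."""
    centroids = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [mean(col) for col in zip(*rows)]
    return centroids

def predict_centroids(centroids, x):
    """Predict the class whose centroid is closest in squared Euclidean distance."""
    return min(
        centroids,
        key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])),
    )

def cross_val_accuracy(X, y, k=5, seed=0):
    """Train on k-1 folds, score on the held-out fold, and repeat for each fold."""
    scores = []
    for train_idx, test_idx in k_fold_splits(len(X), k, seed):
        model = fit_centroids([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct = sum(predict_centroids(model, X[i]) == y[i] for i in test_idx)
        scores.append(correct / len(test_idx))
    return scores

# Synthetic data for illustration only: two Gaussian blobs, 50 points each.
rng = random.Random(42)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(50)]
X += [[rng.gauss(3, 1), rng.gauss(3, 1)] for _ in range(50)]
y = [0] * 50 + [1] * 50

scores = cross_val_accuracy(X, y, k=5)
print("fold accuracies:", [round(s, 2) for s in scores])
print("mean:", round(mean(scores), 3), "std:", round(stdev(scores), 3))
```

The point of reporting every fold, and not just the mean, is that a large spread between folds is itself a warning sign about how well the model generalizes.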
Apart from cross-validation, I also worked on robustness testing. This helps to determine how sensitive the models are to changes in the data and configuration. In other words, by making slight changes to the inputs and observing how performance changes, it is easier to tell which models are robust and which are sensitive to small perturbations.
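One simple way to run this kind of check is to add increasing amounts of random noise to the inputs and track how accuracy changes. The sketch below assumes Gaussian noise and a toy threshold rule standing in for a trained model. Both are illustrative choices and not necessarily the perturbations or models used in this project.

```python
import random

def accuracy(predict, X, y):
    """Fraction of samples where the prediction matches the label."""
    return sum(predict(x) == t for x, t in zip(X, y)) / len(y)

def perturb(X, sigma, rng):
    """Return a copy of X with Gaussian noise (std = sigma) added to every feature."""
    return [[v + rng.gauss(0, sigma) for v in x] for x in X]

def robustness_curve(predict, X, y, sigmas, repeats=20, seed=0):
    """Average accuracy over several noisy copies of X, for each noise level."""
    rng = random.Random(seed)
    curve = {}
    for sigma in sigmas:
        accs = [accuracy(predict, perturb(X, sigma, rng), y) for _ in range(repeats)]
        curve[sigma] = sum(accs) / repeats
    return curve

# Synthetic data for illustration only: two Gaussian blobs, 50 points each.
data_rng = random.Random(7)
X = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(50)]
X += [[data_rng.gauss(3, 1), data_rng.gauss(3, 1)] for _ in range(50)]
y = [0] * 50 + [1] * 50

# Hypothetical stand-in for a trained model: a fixed linear threshold rule.
def predict(x):
    return int(x[0] + x[1] > 3.0)

sigmas = [0.0, 0.5, 1.0, 2.0]
baseline = accuracy(predict, X, y)
curve = robustness_curve(predict, X, y, sigmas)
for sigma in sigmas:
    print(f"noise std {sigma}: mean accuracy {curve[sigma]:.3f}")
```

A model whose accuracy falls off sharply at small noise levels is more fragile than one whose curve declines gradually, even if both start from the same baseline.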
Another crucial aspect of this week was expanding the set of evaluation metrics. Instead of relying only on accuracy, precision, and recall, I experimented with some other metrics to better capture different aspects of the models’ performance. I also started to think more about stability and generalization as key factors, because a model that performs well on average but varies widely across multiple runs is hard to trust. As for the specific metrics, I will give myself some extra time next week to experiment with and finalize them so they are ready to be included in the paper.
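Since the final metric list is still open, the sketch below only illustrates the two ideas in this paragraph: computing a metric beyond accuracy (F1 is used here purely as an example and not as a confirmed choice), and summarizing stability across runs with a mean and standard deviation. The per-run scores are made-up numbers, not results from my models.

```python
from statistics import mean, stdev

def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall, and F1 for one class from paired labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

def stability_summary(scores):
    """Summarize how much a metric varies across repeated runs."""
    return {
        "mean": mean(scores),
        "std": stdev(scores),
        "range": max(scores) - min(scores),
    }

# Tiny hand-made example of the per-class metrics.
print(precision_recall_f1([1, 1, 0, 0], [1, 0, 1, 0]))

# Made-up scores from five hypothetical runs of two hypothetical models.
runs_a = [0.81, 0.80, 0.82, 0.81, 0.80]  # similar average, low variance
runs_b = [0.95, 0.62, 0.90, 0.70, 0.88]  # similar average, high variance

summary_a = stability_summary(runs_a)
summary_b = stability_summary(runs_b)
print("model A:", {k: round(v, 3) for k, v in summary_a.items()})
print("model B:", {k: round(v, 3) for k, v in summary_b.items()})
```

The two hypothetical models have nearly the same mean score, so only the spread shows that the second one is much less dependable from run to run.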
Based on the evaluations I conducted this week, I started refining both preprocessing choices and model configurations. This included adjusting feature selections, revisiting how certain variables are handled, and modifying model parameters to improve performance. The goal was not just to find the best-performing model, but to understand why certain approaches work better than others.
Overall, this week was focused on moving from experimentation to deeper analysis. Instead of just running models, the emphasis shifted toward interpreting results and improving the system based on evidence. With a clearer understanding of model behavior, the next step will be to finalize results and begin structuring the findings for the research paper. See you next week!
