Week #9: My machine learning model is complete!
May 2, 2025
Hello everyone, I seriously cannot believe that this week is the second to last week of my senior project! This week, I finally completed my machine learning model… with some problems. Let’s get into it!
Clustering is an unsupervised machine learning technique, meaning it doesn't come with labeled training, testing, and validation datasets. It's like having x variables without a y value. So, how did I plan on predicting the threshold voltage (vt), you may ask?
As stated before, I had grouped my data into various clusters. For each cluster, I found its average threshold voltage. Whenever I had a new data point, I used my clustering algorithm to figure out which cluster that point fit into, and the predicted vt would just be that cluster's average vt. I originally thought this way of predicting would work well, but after evaluating it with the R² score and mean absolute error, I was proven wrong.
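To make the idea concrete, here's a minimal sketch of that first approach. The features, vt values, and cluster count are all made up for illustration (the post doesn't say which algorithm or settings were used, so I'm assuming k-means here):

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy stand-in data: the real features and vt values are assumptions here.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))        # one feature row per measurement
vt = rng.integers(10, 20, size=200)  # threshold voltage for each point

kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)

# Average vt of the points assigned to each cluster.
cluster_mean_vt = {
    c: vt[kmeans.labels_ == c].mean() for c in range(kmeans.n_clusters)
}

def predict_vt(x_new):
    """Assign the new point to a cluster and return that cluster's mean vt."""
    cluster = kmeans.predict(np.asarray(x_new).reshape(1, -1))[0]
    return cluster_mean_vt[cluster]

print(predict_vt([0.1, -0.2]))
```

The prediction is just a lookup: once the cluster is known, every point in it gets the same answer, which is exactly why the averages turned out to be too coarse.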
For context, a good R² score is around 0.8–1, and a good mean absolute error (MAE) is as close to 0 as possible. But my scores were bad. My R² score was negative, which means my model was performing even worse than just predicting the average vt every time. Additionally, my MAE value was far from 0.
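For anyone curious how those two metrics are computed, here's a tiny sketch with made-up true and predicted vt values (not my actual numbers), chosen so the predictions are worse than just guessing the mean:

```python
from sklearn.metrics import r2_score, mean_absolute_error

# Hypothetical true vs. predicted vt values, just to show the metric calls.
y_true = [12, 15, 11, 14, 13]
y_pred = [14, 11, 15, 12, 15]

r2 = r2_score(y_true, y_pred)                # 1 - SS_res / SS_tot
mae = mean_absolute_error(y_true, y_pred)    # mean of |true - pred|

print(r2, mae)  # → -3.4 2.8
```

A negative R² like this means the model's squared errors exceed the squared errors you'd get from predicting the average of `y_true` for every point.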
So I had to rethink everything. The problem was that averaging alone wasn't going to get me an accurate vt. A cluster's average vt would most likely be a decimal number that doesn't match any actual vt value, so it could never be an exact prediction.
I ended up grouping different sets of threshold voltages measured at the same point. And instead of taking the average, I took the most common value in the set, which lets the machine learning program ignore inconsistencies and outliers in the data. If a point only had one set, the program would skip that point and move on.
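That voting idea can be sketched in a few lines. The function name and the skip rule's exact threshold are my assumptions, but the logic follows the description above:

```python
from collections import Counter

def predict_vt_mode(vt_sets):
    """Predict vt at a point as the most common value across its sets.

    vt_sets: the vt readings collected at the same point, one per set.
    Returns None (skip the point) when only one set is available.
    """
    if len(vt_sets) < 2:
        return None  # only one set: nothing to vote with, so skip
    # most_common(1) returns the (value, count) of the mode, so a
    # one-off outlier among the readings gets out-voted.
    return Counter(vt_sets).most_common(1)[0][0]

print(predict_vt_mode([14, 14, 9]))  # outlier 9 is out-voted → 14
print(predict_vt_mode([12]))         # single set → None (skipped)
```

Unlike the average, the mode is always a value that actually occurred at the point, which sidesteps the decimal-number problem entirely.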
Lastly, I ended up choosing one block again: block 45. I chose this block because it appears consistently across 3 samples, so it would make the most accurate predictions, since there are more sets for each data point.
The final accuracy for block 45 was 72%, which isn't perfect, but it is the best accuracy I could reach given the span of my data. There were definitely some issues that held my accuracy back, such as: (1) not having enough training data, and (2) not factoring in other variables that may have affected the threshold voltage, such as the number of P/E cycles. But overall, I am very happy with the improvements I made this week.
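The post doesn't spell out how the 72% was scored, so here's one plausible sketch, assuming "accuracy" means the fraction of exact vt matches among the points that actually got a prediction (skipped points left out):

```python
def exact_match_accuracy(y_true, y_pred):
    """Fraction of scored points whose predicted vt matches exactly.

    Skipped points (prediction is None) are excluded from the score.
    """
    scored = [(t, p) for t, p in zip(y_true, y_pred) if p is not None]
    if not scored:
        return 0.0
    return sum(t == p for t, p in scored) / len(scored)

# 2 of the 3 scored points match; the None is not counted.
print(exact_match_accuracy([14, 12, 9, 15], [14, None, 9, 13]))
```

This definition is strict (no partial credit for being one level off), which is one way a model can land at a number like 72% even when most predictions are close.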
So all in all, I was able to make the predictions that I wanted in my proposal for one block. Though it is difficult to measure the predictions' impact on an SSD's lifespan due to time constraints, I strongly believe that I've contributed to setting a strong foundation in increasing data retention. Next week is my last week, and I'll be finishing up my Streamlit dashboard and working on prepping for my final presentation. I honestly can't believe how fast this flew by. See you next week!