Week 8: Almost There
April 26, 2024
Hey guys, welcome back to the blog. This week I have some exciting news so let’s get into it.
As you know from last week, I decided to code the neural network from scratch instead of using machine learning libraries. It was a daunting task at first, but once I started coding I realized I had built an intuitive understanding of neural networks from the first two weeks of preparation, as well as from the hours spent debugging PyTorch code.
I approached the task by modularizing it, or in other words, splitting it into functions. I first coded the activation functions sigmoid, ELU, and tanh, then the MSE loss function. Afterwards I moved on to the harder tasks: initializing parameters, forward propagation, backward propagation, and gradient descent. Doing it this way splits the objective into smaller pieces, which makes debugging much easier. Debugging at each step is always the right way to go, as it avoids the trouble of running into black magic (a reference from last week). I hate black magic.

After running the training many times and figuring out how to match the dimensions of the matrices for forward and backward propagation, it finally worked. Well, kind of. Randomly, I would get runtime overflow errors caused by NaN (Not a Number) values, which means that somewhere in the math the numbers grow far too large. I also noticed that if I turned down the learning rate, the NaN values disappeared. I reimplemented the parameter initialization and forward propagation more efficiently to see if that would make a difference. It did not fix the NaN values, but it improved the code, and because of the way I organized it, I didn't have to change the other functions to fit the new changes.
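To give you an idea of what I mean by modularizing, here is a rough sketch of the kind of building blocks I'm describing. This is not my exact code, and the layer sizes and names are just illustrative, but it shows how splitting things into small functions keeps each piece easy to test on its own:

```python
import numpy as np

def sigmoid(z):
    # clip the input so np.exp doesn't trigger the overflow warnings I mentioned
    return 1.0 / (1.0 + np.exp(-np.clip(z, -500, 500)))

def elu(z, alpha=1.0):
    # exp is only applied to the non-positive side, so it can't blow up
    return np.where(z > 0, z, alpha * (np.exp(np.minimum(z, 0)) - 1))

def tanh(z):
    return np.tanh(z)

def mse_loss(y_pred, y_true):
    return np.mean((y_pred - y_true) ** 2)

def init_params(layer_sizes, seed=0):
    # small random weights, zero biases; one (W, b) pair per layer
    rng = np.random.default_rng(seed)
    params = []
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        W = rng.normal(0, np.sqrt(2.0 / n_in), size=(n_out, n_in))
        b = np.zeros((n_out, 1))
        params.append((W, b))
    return params

def forward(x, params):
    # x has shape (n_inputs, batch); each layer is a matrix multiply plus bias
    a = x
    for i, (W, b) in enumerate(params):
        z = W @ a + b
        a = tanh(z) if i < len(params) - 1 else z  # linear output layer
    return a

# quick shape check -- testing each piece like this is what kept the
# dimension mismatches from turning into black magic
x = np.random.randn(1, 32)            # 1 input feature, batch of 32
params = init_params([1, 16, 16, 1])
print(forward(x, params).shape)       # (1, 32)
```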
At this point I was a little scared that this program was going to backfire, but then I realized something as I was playing with the learning rates and observing the loss. As the learning rate increased, the loss was unable to converge; in fact, it diverged even faster. This is a common problem with neural networks, and it probably arises from the fact that I was using plain gradient descent. So I implemented the Adam optimizer, which is meant to improve the rate of convergence and avoid saddle points in the loss function. And lo and behold, it worked. My anxiety dropped at the same rate as the loss. Adam is a very interesting formula with physics-inspired components, and I encourage anyone interested in deep learning to read up on it. After graphing the results, I still did not get the expected outcome, but that is not a problem with the neural network itself; it is a problem with the trial solution. A trial solution is an assumption about what the final function might look like, and by adjusting it, I might just be able to find a solution to this problem.
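For anyone curious, here is a minimal sketch of the Adam update for a single parameter array, using the default hyperparameters from the original paper (the exact hyperparameters and variable names in my own code may differ). The physics-inspired part is the first moment m, which behaves like momentum (a running average of the gradient), while v rescales each step by a running average of the squared gradient:

```python
import numpy as np

def adam_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    m = beta1 * m + (1 - beta1) * grad        # momentum-like first moment
    v = beta2 * v + (1 - beta2) * grad ** 2   # running mean of squared gradient
    m_hat = m / (1 - beta1 ** t)              # bias correction (t starts at 1)
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v

# toy usage: minimize f(w) = (w - 3)^2, whose gradient is 2(w - 3)
w, m, v = np.array([0.0]), np.zeros(1), np.zeros(1)
for t in range(1, 2001):
    grad = 2 * (w - 3)
    w, m, v = adam_step(w, grad, m, v, t, lr=0.05)
print(w)  # converges close to 3
```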
Immense progress was made this week and I am learning a ton. Coding from scratch is a great way to learn, and I would suggest that anyone interested in coding try it for themselves. I am excited to complete this project, even if success is not guaranteed.
Thanks for reading my blog and see you next week.