Week 6: Working Through Gradient Descent Problems
April 16, 2026
This week, most of my problems came from training the CV-CNN. I was able to program the full model relatively easily, as the documentation for CVNN libraries online is surprisingly plentiful. To understand the problems that come up while training a neural network, it's helpful to understand loss functions.
A neural network is trained by iteratively reducing the value of a loss function. The loss function is usually a measure of the distance between the actual values and the predicted values of the model. For classification problems (of which seizure prediction is one), the most common loss function is the cross-entropy loss, given by the negative average of the true label distribution multiplied by the log of the predicted class distribution. If the model assigns a high probability to the correct class, that prediction-label pair contributes very little to the loss. If the model assigns a low probability to the correct class, the pair contributes a large loss. It's important to note that because outputs are constrained to be probabilities between 0 and 1, the log term is always negative, so the leading negative sign keeps the loss positive, and the loss shrinks toward zero as the model's predictions improve.
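To make the behavior above concrete, here is a minimal sketch of cross-entropy in numpy (this is an illustration, not my actual model code; the `cross_entropy` helper and its toy inputs are made up for the example):

```python
import numpy as np

def cross_entropy(y_true, y_pred, eps=1e-12):
    """Average cross-entropy over a batch.
    y_true: one-hot labels, shape (N, C); y_pred: predicted class probabilities, shape (N, C)."""
    y_pred = np.clip(y_pred, eps, 1.0)  # avoid log(0)
    return -np.mean(np.sum(y_true * np.log(y_pred), axis=1))

labels = np.array([[0.0, 1.0]])  # true class is class 1

# confident correct prediction: contributes almost nothing to the loss
good = cross_entropy(labels, np.array([[0.05, 0.95]]))  # -log(0.95) ≈ 0.05

# confident wrong prediction: contributes a large loss
bad = cross_entropy(labels, np.array([[0.95, 0.05]]))   # -log(0.05) ≈ 3.0
```

Only the predicted probability of the *true* class enters the sum, which is why confident mistakes are punished so heavily.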
As mentioned before, the loss is then minimized by way of derivatives, which repeatedly nudge the model in a "direction" that reduces loss by adjusting its weights and biases (or in my case, kernel weights). This "direction" is a vector called the gradient, and the size of each step taken along it is controlled by a hyperparameter called the learning rate. If the learning rate is too large, a step might overshoot the minimum and accidentally land at a higher loss than where it started.
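The overshooting effect is easy to demonstrate on a toy problem. The sketch below (hypothetical, not from my model) runs plain gradient descent on f(x) = x², whose gradient is 2x, with two different learning rates:

```python
def gradient_descent(x0, lr, steps):
    """Minimize f(x) = x^2 starting from x0; grad f(x) = 2x."""
    x = x0
    for _ in range(steps):
        x = x - lr * 2 * x  # step opposite the gradient, scaled by the learning rate
    return x

# lr = 0.1: each step multiplies x by 0.8, so x decays toward the minimum at 0
converged = gradient_descent(1.0, lr=0.1, steps=20)

# lr = 1.1: each step multiplies x by -1.2, so |x| (and the loss) grows every step
diverged = gradient_descent(1.0, lr=1.1, steps=20)
```

With the too-large learning rate, every update jumps past the minimum to a point with higher loss, which is exactly the failure mode described above.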
When it comes to my model, its loss was not moving at all. This can stem from a variety of factors: my learning rate may be too large or too small, or my gradients may be shrinking toward zero as the model settles into a local minimum (the vanishing gradient problem). This next week, my goal is to diagnose and fix this issue.
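One quick sanity check I can run for the vanishing-gradient hypothesis: saturated activations like the sigmoid have near-zero derivatives away from the origin, so updates (learning rate × gradient) barely move the weights. A small illustration (assumed sigmoid activation, not a claim about my CVNN's layers):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_grad(z):
    # derivative of the sigmoid: s(z) * (1 - s(z))
    s = sigmoid(z)
    return s * (1.0 - s)

# near z = 0 the gradient is at its maximum (0.25), so learning proceeds
healthy = sigmoid_grad(0.0)

# at saturation (|z| large) the gradient nearly vanishes,
# so each weight update is tiny and the loss curve looks flat
vanished = sigmoid_grad(10.0)  # on the order of 1e-5
```

Logging gradient magnitudes per layer during training would tell me whether this, rather than the learning rate, is why the loss is stuck.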
