Week 4: Many Problems
This week, I began working on the conventional model. Since I am trying to show that my custom-designed model outperforms the most popular current method for NLP analysis, I first need to test the standard approach. Thus, I began researching what is being used in NLP today.
As of now, RoBERTa is one of the most widely used models in Natural Language Processing. It has even been applied to my specific topic: classifying news as real or fake. Thus, I began familiarizing myself with the RoBERTa model.
One thing I realized from this project is that research takes a lot of reading. In a given week, I spend at least two hours per day reading either a research paper, to learn more about a particular approach that has been tried, or an article about how a specific model or technique is built and implemented. Even though it is a long process, I have been learning many new things, and I am excited to utilize my newfound knowledge!
As I began implementing the RoBERTa model, I ran into multiple roadblocks. I spent a long time debugging but was unable to figure out the numerous errors in my code! After three weeks of smooth sailing, it seems I have finally hit a wall. Thus, I have reached out to my external advisor to learn more about the problem(s).
In the meantime, I did some extra research to try to solve the issues with my code. I had a lot of trouble understanding some of the error messages, so I turned to some intensive searching on Google. Eventually, I devised a plan.
For some context, neural networks are computing systems built from collections of connected nodes, or neurons, that learn underlying relationships in data. Networks are further designed with specialized architectures to address specific modeling challenges; for example, long short-term memory (LSTM) networks can connect contextual information across the words in a sentence. Training a neural network relies on a cost function, which measures how well the model predicts the outcomes in a training set, and a gradient descent algorithm, which adjusts the parameters to minimize that cost function.
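To make the idea concrete, here is a toy sketch of that training loop: plain gradient descent on a made-up one-parameter cost function f(w) = (w - 3)², whose minimum sits at w = 3. The function and learning rate are illustrative only, not part of my actual model.

```python
def cost(w):
    # Illustrative cost function with its minimum at w = 3.
    return (w - 3.0) ** 2

def grad(w):
    # Derivative of the cost with respect to the parameter w.
    return 2.0 * (w - 3.0)

w = 0.0      # initial parameter value
lr = 0.1     # learning rate
for _ in range(100):
    w -= lr * grad(w)   # step opposite the gradient to reduce the cost

print(round(w, 4))  # converges toward 3.0
```

Each step moves the parameter a little way downhill on the cost surface; real training does the same thing, just over millions of parameters at once.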
Gradient descent is an iterative algorithm used to find a local minimum of a given function. I plan to utilize Adam optimization, an extension of stochastic gradient descent that tracks both the momentum and the magnitude of the gradient. Stochastic gradient descent maintains the same learning rate throughout training; Adam, by contrast, borrows from the Adaptive Gradient Algorithm a per-parameter learning rate to improve performance, and it adapts to the magnitudes of the recent gradients for each weight using Root Mean Square Propagation.
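The two pieces described above can be sketched from scratch: a first-moment estimate (the momentum of the gradient) and a second-moment estimate (its magnitude, as in RMSProp), each with a bias correction. The quadratic cost and hyperparameter values below are for illustration, using Adam's common defaults; in practice I would call a library optimizer rather than write this by hand.

```python
import math

def grad(w):
    # Gradient of the illustrative cost f(w) = (w - 3)**2.
    return 2.0 * (w - 3.0)

w = 0.0
lr, beta1, beta2, eps = 0.05, 0.9, 0.999, 1e-8
m = 0.0  # first moment: exponential moving average of gradients (momentum)
v = 0.0  # second moment: moving average of squared gradients (RMSProp-style)

for t in range(1, 2001):
    g = grad(w)
    m = beta1 * m + (1 - beta1) * g          # update momentum estimate
    v = beta2 * v + (1 - beta2) * g * g      # update magnitude estimate
    m_hat = m / (1 - beta1 ** t)             # bias-correct the estimates,
    v_hat = v / (1 - beta2 ** t)             # important early in training
    w -= lr * m_hat / (math.sqrt(v_hat) + eps)  # per-parameter adaptive step

print(w)  # settles close to the minimum at w = 3
```

Dividing by the root of the squared-gradient average is what gives each parameter its own effective learning rate: parameters with consistently large gradients take proportionally smaller steps.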
Lastly, to prepare for the later stages of my project, I watched some YouTube videos on how to create a Chrome extension. I also have a backup plan in case this fails: creating an app to release on the iOS App Store. To that end, I signed up for and started an online course on Codecademy to learn the programming language Swift. Even if I don't end up needing to make an app, I have always wanted to learn Swift. I am enthusiastic about seeing where this project takes me!