Week 6: Intro To LSTM
April 15, 2023
This week was my first week working on my model, and I spent a lot of time learning about the proposed machine learning system. My model uses an LSTM. For background, LSTMs are a special type of RNN. An RNN, or Recurrent Neural Network, is a type of model that can connect previous information to the present task through loops. LSTMs are a type of RNN whose repeating module contains multiple interacting neural network layers instead of a single layer.
*Since LSTMs are an extension of RNNs, the technical term for them is LSTM-RNN. For this blog, I will simply refer to them as "LSTM".
In order to create my own LSTM, I had to do some extra research to get a better grasp of its framework, so I started by studying the background of LSTMs. Here is what I learned!
NNs, RNNs, and LSTM-RNNs
RNNs, or Recurrent Neural Networks, mimic the way humans process things. For example, when a human reads text, they don’t read every word separately; each word is tied together to form a phrase, then a sentence, and so on. Context is one of the key factors in reading and retaining information. This is the idea of “persistence”.
Plain/standard Neural Networks are unable to use this technique, so they cannot derive the same insights as humans. This is a major shortcoming of traditional Neural Networks, and it is the one the Recurrent Neural Network fixes. RNNs have loops and a specific structure that allows information to "persist". When an RNN is "unrolled", it looks like a chain, which is the easiest way to visualize it.
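To make the "loop" idea concrete for myself, I wrote a tiny sketch of a vanilla RNN step in NumPy. This is just my own illustration, not the model I will actually train; the sizes and weights are made up:

```python
import numpy as np

# Toy sizes picked just for illustration
input_size, hidden_size = 4, 3

# Randomly initialized weights (a real model would learn these)
rng = np.random.default_rng(0)
W_xh = rng.normal(size=(hidden_size, input_size))   # input  -> hidden
W_hh = rng.normal(size=(hidden_size, hidden_size))  # hidden -> hidden (the "loop")
b_h = np.zeros(hidden_size)

def rnn_step(x_t, h_prev):
    """One unrolled time step: the new hidden state mixes the current
    input with the previous hidden state, so information can persist."""
    return np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)

# "Unrolling" the loop over a sequence of 5 fake word vectors
sequence = rng.normal(size=(5, input_size))
h = np.zeros(hidden_size)
for x_t in sequence:
    h = rnn_step(x_t, h)   # h carries context forward, step by step
print(h)
```

Each pass through `rnn_step` is one link of the chain in the unrolled picture.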
Initially, I thought a simple RNN would be able to do the basic task of distinguishing between a fake and a true headline. However, what if the headline is long or has multiple context clues? This could pose a unique problem for an RNN. According to my research, RNNs struggle with long-term dependencies. In other words, if a headline had context clues at the start but needed to connect them with something at the end, a plain recurrent neural network would likely fail. After looking at my dataset, it became evident that a standard RNN would not suffice. This is illustrated below.
Figure 1. Demonstration of the problem in RNNs and their deficiencies in connecting information. As the gap between neurons grows, old information is lost1
To solve this problem, the LSTM was introduced. This is a type of RNN that is capable of learning long-term dependencies. Although it has the same chain of repeating neurons, each repeating module contains extra layers. This is demonstrated in Figure 2. For more information on the calculations behind cell states and layers, visit https://colah.github.io/posts/2015-08-Understanding-LSTMs/.
Figure 2. Difference between a neuron in an RNN vs. an LSTM1
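For my own notes, here are the standard equations behind those extra layers, in the same notation as the post linked above (forget gate $f_t$, input gate $i_t$, output gate $o_t$, and cell state $C_t$):

```latex
\begin{aligned}
f_t &= \sigma\!\left(W_f \cdot [h_{t-1}, x_t] + b_f\right) && \text{forget gate} \\
i_t &= \sigma\!\left(W_i \cdot [h_{t-1}, x_t] + b_i\right) && \text{input gate} \\
\tilde{C}_t &= \tanh\!\left(W_C \cdot [h_{t-1}, x_t] + b_C\right) && \text{candidate cell state} \\
C_t &= f_t * C_{t-1} + i_t * \tilde{C}_t && \text{updated cell state} \\
o_t &= \sigma\!\left(W_o \cdot [h_{t-1}, x_t] + b_o\right) && \text{output gate} \\
h_t &= o_t * \tanh(C_t) && \text{new hidden state}
\end{aligned}
```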
In conclusion, I have settled on the LSTM, specifically the Bi-directional LSTM-RNN. Figure 3 is a diagram that demonstrates just that.
Figure 3. Modeling Radiological Language with Bidirectional Long Short-Term Memory Networks2
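To get a feel for what a bidirectional LSTM might look like in code, I drafted a rough sketch with Keras. This is only a placeholder, assuming a headline dataset that has already been tokenized into word indices; the vocabulary size, sequence length, and layer sizes are numbers I made up and will almost certainly change:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# Placeholder hyperparameters (assumptions, not final choices)
VOCAB_SIZE = 10000   # number of words kept from the headlines
MAX_LEN = 32         # headlines padded/truncated to this many tokens
EMBED_DIM = 64

model = models.Sequential([
    layers.Input(shape=(MAX_LEN,)),
    # Turn word indices into dense vectors
    layers.Embedding(input_dim=VOCAB_SIZE, output_dim=EMBED_DIM),
    # Read the headline forwards and backwards at the same time
    layers.Bidirectional(layers.LSTM(64)),
    layers.Dense(16, activation="relu"),
    # Single probability: fake (0) vs. true (1) headline
    layers.Dense(1, activation="sigmoid"),
])

model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```

The `Bidirectional` wrapper is what gives the model the two-directional reading shown in Figure 3.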
At the core of the LSTM, the cell state passes through a tanh layer and is multiplied by the output of the sigmoid output gate, which filters out unimportant information. The result is that relevant information is retained over long periods of time, avoiding the long-term dependency problem. This has given me a framework for how I should create my model. I am excited to follow it next week!
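Before signing off, I wanted to check my understanding of that tanh/sigmoid step, so I also sketched a single LSTM step in plain NumPy, following the equations above. Again, the sizes and random weights are toy values, purely for my own intuition:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

input_size, hidden_size = 4, 3
rng = np.random.default_rng(1)

# One weight matrix and bias per gate, acting on [h_prev, x_t] stacked together
W_f, W_i, W_C, W_o = (rng.normal(size=(hidden_size, hidden_size + input_size)) for _ in range(4))
b_f = b_i = b_C = b_o = np.zeros(hidden_size)

def lstm_step(x_t, h_prev, C_prev):
    z = np.concatenate([h_prev, x_t])
    f = sigmoid(W_f @ z + b_f)          # forget gate: what to drop from the cell state
    i = sigmoid(W_i @ z + b_i)          # input gate: what new information to store
    C_tilde = np.tanh(W_C @ z + b_C)    # candidate values for the cell state
    C = f * C_prev + i * C_tilde        # updated cell state
    o = sigmoid(W_o @ z + b_o)          # output gate
    h = o * np.tanh(C)                  # cell state through tanh, scaled by the output gate
    return h, C

h, C = lstm_step(rng.normal(size=input_size), np.zeros(hidden_size), np.zeros(hidden_size))
print(h, C)
```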