Recurrent Neural Network (RNN)

Introduction

The biggest problem with the neural networks we studied earlier is that they cannot take previous data into consideration. For example, when we humans are given an incomplete sentence, we can guess what the next word will be, because we draw that conclusion from the previous words in the sentence. The neural networks we studied earlier have no mechanism for using previous information, and this is their major drawback. Another disadvantage is that they take fixed-size inputs; in a Convolutional Neural Network (CNN), for example, the input size is predefined. Both of these problems are addressed by a new kind of network called the Recurrent Neural Network (RNN). We will study the RNN in detail, discuss its advantages, disadvantages and applications, and in the next blog study one variant of the RNN called Long Short-Term Memory (LSTM), which is an improvement over the RNN.

APPLICATIONS

  • Machine Translation
  • Voice Recognition
  • Text Generation
  • Stock Market Prediction
  • Sales Prediction

Note: Before we start, I would like you to understand what a timestep means, as the term will be used frequently later. Suppose we have the sentence "I am a programmer". This sentence is represented as x<1> = I, x<2> = am, and so on. A timestep is simply the state at a particular time; for example, timestep 1 refers to the first input word, which is I.
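As a small illustration of this notation (just a toy Python snippet, not part of any model):

# Toy illustration of the x<t> notation: each word is the input at one timestep.
sentence = "I am a programmer".split()
for t, word in enumerate(sentence, start=1):
    print(f"x<{t}> = {word}")   # x<1> = I, x<2> = am, ...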

RECURRENT NEURAL NETWORK (RNN)

A Recurrent Neural Network, at each timestep, takes some previous information into consideration. In an RNN we have three different types of weights, generally represented as Waa, Wya and Wax.

Waa is the weight between the cell and the information vector.

Wax is the weight between the cell and the input at a given timestep.

Wya is the weight between the cell and the output at a given timestep.

The image above shows a typical RNN. The weights Waa, Wya and Wax are represented as W, V and U respectively. xt is the input to the RNN at the tth timestep, and ot is the output of the RNN at the tth timestep.

Now let us understand how an RNN works.

We start by initializing a vector a<0>, usually to zeros; this vector stores the information from previous timesteps. Then we compute the output of the 1st timestep using the following formulas:

First we compute the information vector a<1>

a<1> = activation(Waa a<0> + Wax x<1> + ba)

Then from this we compute the output of the 1st timestep.

y_pred<1> = activation(Wya a<1> + by)

where activation is the activation function to be used.

In general we can write down the output of the tth timestep as:

a<t> = activation(Waa a<t-1> + Wax x<t> + ba)

y_pred<t> = activation(Wya a<t> + by)
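To make these equations concrete, here is a minimal NumPy sketch of the forward pass. The sizes and the tanh/sigmoid activations are illustrative assumptions, not fixed choices.

import numpy as np

n_x, n_a, n_y, T = 3, 5, 2, 4            # input size, hidden size, output size, timesteps (assumed)
Wax = np.random.randn(n_a, n_x)
Waa = np.random.randn(n_a, n_a)
Wya = np.random.randn(n_y, n_a)
ba, by = np.zeros((n_a, 1)), np.zeros((n_y, 1))

x = np.random.randn(T, n_x, 1)           # one input vector x<t> per timestep
a = np.zeros((n_a, 1))                   # a<0>, initialised to zeros
for t in range(T):
    a = np.tanh(Waa @ a + Wax @ x[t] + ba)        # a<t>
    y_pred = 1 / (1 + np.exp(-(Wya @ a + by)))    # y_pred<t>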

BACKWARD PROPAGATION

The loss function for a timestep is defined as:

L<t>(y<t>, y_pred<t>) = -y<t> log(y_pred<t>) - (1 - y<t>) log(1 - y_pred<t>)

If we sum this up over all the timesteps, we get the total loss: L(y, y_pred) = Σt L<t>(y<t>, y_pred<t>).

Now we find the derivatives of the loss function defined above with respect to the weights and use them to update the weights. Since the same weights are shared across all timesteps, the gradient contributions from every timestep are summed; this procedure is known as backpropagation through time.
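As a quick worked example of the loss above, here is how the per-timestep loss and its sum could be computed in NumPy (the labels and predictions are made-up values):

import numpy as np

y_true = np.array([1, 0, 1])             # hypothetical labels y<t>
y_pred = np.array([0.9, 0.2, 0.7])       # hypothetical predictions y_pred<t>

# L<t> = -y<t> log(y_pred<t>) - (1 - y<t>) log(1 - y_pred<t>)
per_step = -(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))
total_loss = per_step.sum()              # summed over all timesteps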

RNN IMPLEMENTATION

Problem statement

We are given the opening prices of Google stock for about 1000 days. Our task is to predict the opening prices of the following days.

Note: You can find the complete code here: https://www.kaggle.com/amritansh22/stock-price-prediction-in-keras-and-pytorch. Please upvote it if you find it helpful.

KERAS

We start by loading necessary data pre-processing and computation libraries.

import numpy as np
import pandas as pd

Loading the dataset

dataset = pd.read_csv("/kaggle/input/google-stock-price/Google_Stock_Price_Train.csv")

Checking if there are any null values

dataset.isnull().sum()

Check dataset shape

dataset.shape

Viewing the dataset

dataset.head()

Out of the various values given, we will be predicting just the opening prices of the stock.

data = dataset.iloc[:, 1:2].values

Viewing the data

print(data[:5])

Checking the shape of the data

data.shape

Now we scale the data to the range of -1 to 1 using the scikit-learn library.

from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler(feature_range=(-1, 1))
data = scaler.fit_transform(data)

Now we convert the data into sequences, because the RNN layer only accepts input in this form. We take the opening prices of the first 50 days as one sample of X_train and store the 51st day's opening price in y_train. We slide this window over the whole dataset and store the results in the NumPy arrays X_train and y_train.

X_train = []
y_train = []
for i in range(50, 1258):
    X_train.append(data[i-50:i, 0])
    y_train.append(data[i, 0])
X_train, y_train = np.array(X_train), np.array(y_train)

We check the shape of X_train

X_train.shape

But the RNN layer does not accept this shape; Keras recurrent layers expect a 3-D input of shape (samples, timesteps, features). So we reshape the data to make it suitable for the RNN.

X_train = np.reshape(X_train, (X_train.shape[0], X_train.shape[1], 1))

So this is the new shape.

X_train.shape

Now we will build our model. We start by importing the necessary libraries.

from keras.models import Sequential
from keras.layers import Dense
from keras.layers import SimpleRNN
from keras.layers import Dropout

Now we start building our model.

We first create an instance of the Sequential class from Keras; a Sequential model is a linear stack of layers.

Then we add a SimpleRNN layer to the model, which we have named regressor. This layer has 50 units (cells), and we set return_sequences to True so that the layer returns its output at every timestep, which the next recurrent layer needs as its input. We also specify the shape of the input sequence that we will feed to the layer.

Then we add a Dropout layer in order to prevent overfitting.

We stack up several of these layer pairs.

Then we add the last SimpleRNN layer. Since this is the last recurrent layer, we only need its final output rather than the full sequence, so we leave return_sequences at its default value of False.

Finally we add one more Dropout layer and a Dense layer to produce the output.

regressor = Sequential()
regressor.add(SimpleRNN(units=50,return_sequences = True,input_shape = (X_train.shape[1],1)))
regressor.add(Dropout(0.2))
regressor.add(SimpleRNN(units = 50,return_sequences = True))
regressor.add(Dropout(0.2))
regressor.add(SimpleRNN(units = 50,return_sequences = True))
regressor.add(Dropout(0.2))
regressor.add(SimpleRNN(units = 50))
regressor.add(Dropout(0.2))
regressor.add(Dense(units = 1))

Now we get the summary of the model that we just built.

regressor.summary() 

Now we will compile the model. We use the Adam optimizer, and the loss function is mean_squared_error since this is a regression problem.

regressor.compile(optimizer = 'adam',loss = 'mean_squared_error')

Now we train (fit) the model. We will train for 100 epochs, with a batch size of 32.

regressor.fit(X_train, y_train, epochs = 100, batch_size = 32)

Output:

First 10 epochs
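After training, the model can be used to predict prices for new 50-day windows. A minimal sketch, assuming a hypothetical array test_windows prepared exactly like X_train (shape (samples, 50, 1)):

predicted = regressor.predict(test_windows)        # scaled predictions
predicted = scaler.inverse_transform(predicted)    # back to the original price scale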

PyTorch

We would study the same problem as above in PyTorch.

We start by loading necessary data pre-processing and computation libraries.

import numpy as np 
import pandas as pd

Loading the dataset

dataset = pd.read_csv("/kaggle/input/google-stock-price/Google_Stock_Price_Train.csv")

Checking if there are any null values

dataset.isnull().sum()

Check dataset shape

dataset.shape

Viewing the dataset

dataset.head()

Out of the various values given, we will be predicting just the opening prices of the stock.

data = dataset.iloc[:, 1:2].values

Viewing the data

print(data[:5])

Checking the shape of the data

data.shape

Now we scale the data to the range of -1 to 1 using the scikit-learn library.

from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler(feature_range=(-1, 1))
data = scaler.fit_transform(data)

Now we convert the data into sequences, because the RNN layer only accepts input in this form. We take the opening prices of the first 50 days as one sample of X_train and store the 51st day's opening price in y_train. We slide this window over the whole dataset and store the results in the NumPy arrays X_train and y_train.

X_train = []
y_train = []
for i in range(50, 1258):
    X_train.append(data[i-50:i, 0])
    y_train.append(data[i, 0])
X_train, y_train = np.array(X_train), np.array(y_train)

We check the shape of X_train

X_train.shape

But the RNN in PyTorch does not accept this shape. With the default batch_first=False, nn.RNN expects input of shape (seq_len, batch, input_size), so we reshape the data so that each 50-value window becomes the input at one timestep, with a batch size of 1.

X_train = np.reshape(X_train, (X_train.shape[0], 1, X_train.shape[1]))

Now we check the shape

X_train.shape

Output

(1208, 1, 50)

Now we import necessary PyTorch dependencies.

import torch.nn as nn 
import torch
from torch.autograd import Variable

Now we declare some variables to be used later.

INPUT_SIZE = 50 
HIDDEN_SIZE = 40
NUM_LAYERS = 2
OUTPUT_SIZE = 1

Now we create the RNN class.

class RNN(nn.Module):
    def __init__(self, input_size, hidden_size, num_layers, output_size):
        super(RNN, self).__init__()

        # stacked recurrent part: num_layers RNN layers on top of each other
        self.RNN = nn.RNN(
            input_size=input_size,
            hidden_size=hidden_size,
            num_layers=num_layers
        )
        # final fully connected layer that maps each hidden state to one output value
        self.out = nn.Linear(hidden_size, output_size)

    def forward(self, x, h_state):
        # r_out holds the output at every timestep, hidden_state the final hidden state
        r_out, hidden_state = self.RNN(x, h_state)

        hidden_size = hidden_state[-1].size(-1)
        # flatten the timestep and batch dimensions before the linear layer
        r_out = r_out.view(-1, hidden_size)
        outs = self.out(r_out)

        return outs, hidden_state

Here we create a network composed of multiple RNN layers stacked on top of one another; the number of stacked layers is defined by num_layers. The nn.Linear layer is the final layer that gives us the output.

In the forward function we perform the forward propagation of the network. First we store the per-timestep outputs and the final hidden state of the RNN in r_out and hidden_state. Then we read off hidden_size, which is needed to reshape r_out. We flatten r_out so that it has one row per timestep, pass it through the Linear layer, and obtain one prediction for each 50-day window.

Now we create an instance of the RNN class.

RNN = RNN(INPUT_SIZE, HIDDEN_SIZE, NUM_LAYERS, OUTPUT_SIZE)

Now we define the optimizer and the loss function. We also initialize the hidden state to None.

optimiser = torch.optim.Adam(RNN.parameters(), lr=0.01)
criterion = nn.MSELoss()
hidden_state = None

Now we will start the training.

for epoch in range(100):
    inputs = Variable(torch.from_numpy(X_train).float())
    labels = Variable(torch.from_numpy(y_train).float())

    output, hidden_state = RNN(inputs, hidden_state) 

    loss = criterion(output.view(-1), labels)
    optimiser.zero_grad()
    loss.backward(retain_graph=True)                     # back propagation
    optimiser.step()                                     # update the parameters
    
    print('epoch {}, loss {}'.format(epoch,loss.item()))

We train the network for 100 epochs.

First we convert X_train and y_train to PyTorch Variables. Then we compute the output and find the loss on it. zero_grad() is a PyTorch function that sets the gradients to zero, because PyTorch accumulates gradients on subsequent backward passes. Then, using the backward() function, we backpropagate through the network, and the step() function updates the parameters using the gradients calculated.

Then we print the loss after every epoch.

Output:

First 10 epochs

At 99th epoch
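After training, predictions can be made with the PyTorch model in a similar way. A minimal sketch, assuming a hypothetical array test_windows shaped like X_train ((samples, 1, 50)):

RNN.eval()                                           # switch to evaluation mode
with torch.no_grad():
    test_tensor = torch.from_numpy(test_windows).float()
    preds, _ = RNN(test_tensor, None)                # no initial hidden state
preds = scaler.inverse_transform(preds.numpy())      # back to the original price scale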
