**Recurrent Neural Network (RNN)**

**Introduction**

The biggest problem with the neural networks we studied earlier is that they cannot take previous data into consideration. For example, when we humans are given an incomplete sentence, we can guess the next word, and this is possible only because we draw our conclusion from the previous words in the sentence. The neural networks we studied earlier have no such ability to use previous information, which is their major drawback. Another disadvantage of those networks is that they take fixed-size inputs: in a Convolutional Neural Network (CNN), for example, the size of the input is predefined. The Recurrent Neural Network (RNN) overcomes both of these problems. We will study the RNN in detail and discuss its advantages, disadvantages and applications; in the next blog we will study a variant of the RNN called Long Short-Term Memory (LSTM), which improves on the RNN.

**APPLICATIONS**

- Machine Translation
- Voice Recognition
- Text Generation
- Stock Market Prediction
- Sales Prediction

**Note:** Before we start, I would like you to understand what a timestep means, as the term will be used frequently later. Suppose we have the sentence **I am a programmer**. This sentence is represented as x^{<1>} = **I**, x^{<2>} = **am**, and so on. A timestep is the state at a particular time: at timestep 1 we mean the first input word, which is **I**.
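As a quick illustration of this indexing (plain Python, just splitting the example sentence; no model involved):

```python
# Split the example sentence into timestep-indexed inputs x^<t>.
words = "I am a programmer".split()
for t, word in enumerate(words, start=1):
    print(f"x^<{t}> = {word}")
# x^<1> = I, x^<2> = am, x^<3> = a, x^<4> = programmer
```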

**RECURRENT NEURAL NETWORK (RNN)**

A Recurrent Neural Network takes some previous information into consideration at each timestep. In an RNN we have three different types of weights, generally represented as W_{aa}, W_{ax} and W_{ya}.

W_{aa} is the weight that carries the information vector (the hidden state) from one timestep to the next.

W_{ax} is the weight between the cell and the input at a given timestep.

W_{ya} is the weight between the cell and the output at a given timestep.

The image above shows a typical RNN. In the image, the weights W_{aa}, W_{ya} and W_{ax} are represented as W, V and U respectively. x_{t} is the input to the RNN at the t^{th} timestep, and o_{t} is the output of the RNN at the t^{th} timestep.

Now we will understand how a RNN works.

We start by initializing a vector a^{<0>}, usually to zeros; this vector stores the information from previous timesteps. Then we compute the output of the 1^{st} timestep using the following formulas.

First we compute the information vector a^{<1>}:

**a^{<1>} = activation(W_{aa} a^{<0>} + W_{ax} x^{<1>} + b_{a})**

Then from this we compute the output of the 1^{st} timestep:

**y_pred^{<1>} = activation(W_{ya} a^{<1>} + b_{y})**

where activation is the activation function to be used.

In general, we can write the computation at the t^{th} timestep as:

**a^{<t>} = activation(W_{aa} a^{<t-1>} + W_{ax} x^{<t>} + b_{a})**

**y_pred^{<t>} = activation(W_{ya} a^{<t>} + b_{y})**
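The two formulas above can be sketched in NumPy. The dimensions below and the choice of tanh for the hidden activation (with a linear output) are illustrative assumptions, not values from the post:

```python
import numpy as np

hidden, features, steps = 3, 2, 4            # illustrative sizes
rng = np.random.default_rng(0)
W_aa = rng.standard_normal((hidden, hidden))    # hidden-to-hidden weights
W_ax = rng.standard_normal((hidden, features))  # input-to-hidden weights
W_ya = rng.standard_normal((1, hidden))         # hidden-to-output weights
b_a, b_y = np.zeros(hidden), np.zeros(1)

a = np.zeros(hidden)                         # a^<0>, initialised to zero
xs = rng.standard_normal((steps, features))  # one input vector per timestep

for x_t in xs:
    a = np.tanh(W_aa @ a + W_ax @ x_t + b_a)   # a^<t>
    y_pred = W_ya @ a + b_y                    # y_pred^<t> (linear output here)

print(a.shape, y_pred.shape)   # (3,) (1,)
```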

**BACKWARD PROPAGATION**

The loss function for a timestep is defined as:

**L^{<t>}(y^{<t>}, y_pred^{<t>}) = -y^{<t>} log y_pred^{<t>} - (1 - y^{<t>}) log(1 - y_pred^{<t>})**

Summing this up over all the timesteps gives the total loss:

**L = Σ_{t} L^{<t>}(y^{<t>}, y_pred^{<t>})**

Now we find the derivatives of the loss function defined above with respect to the weights, and use them to update the weights.
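A tiny NumPy check of the per-timestep loss above, with made-up targets and predictions for three timesteps (the values are illustrative, not from the post):

```python
import numpy as np

y = np.array([1.0, 0.0, 1.0])        # targets y^<t> for three timesteps
y_pred = np.array([0.9, 0.2, 0.7])   # predictions y_pred^<t>

# Per-timestep loss L^<t>, then the total loss summed over timesteps.
per_step = -(y * np.log(y_pred) + (1 - y) * np.log(1 - y_pred))
total = per_step.sum()
print(per_step.round(3), round(total, 3))   # total is about 0.685
```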

**RNN IMPLEMENTATION**

**Problem statement**

We are given the opening prices of the Google stock for about 1,000 days. Our task is to predict the opening prices of the following days.

Note: You can find the complete code here: https://www.kaggle.com/amritansh22/stock-price-prediction-in-keras-and-pytorch. Please upvote it if you find it helpful.

**KERAS**

We start by loading necessary data pre-processing and computation libraries.

import numpy as np
import pandas as pd

Loading the dataset

dataset = pd.read_csv("/kaggle/input/google-stock-price/Google_Stock_Price_Train.csv")

Checking if there are any null values

dataset.isnull().sum()

Check dataset shape

dataset.shape

Viewing the dataset

dataset.head()

Out of the various values given, we will be predicting just the opening prices of the stock.

data = dataset.iloc[:, 1:2].values

Viewing the data

print(data[:5])

Checking the shape of the data

data.shape

Now we scale the data into the range -1 to 1 using the scikit-learn library.

from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler(feature_range=(-1, 1))
data = scaler.fit_transform(data)
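Since the model will learn and predict in this scaled range, the fitted scaler's inverse_transform is what maps predictions back to prices later. A minimal sketch with made-up stand-in prices (not the actual dataset):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

prices = np.array([[300.0], [310.0], [320.0], [330.0]])  # stand-in prices
scaler = MinMaxScaler(feature_range=(-1, 1))
scaled = scaler.fit_transform(prices)    # 300 -> -1.0, 330 -> 1.0

pred_scaled = np.array([[0.5]])                  # a hypothetical model output
pred_price = scaler.inverse_transform(pred_scaled)
print(pred_price)   # [[322.5]] — back on the original price scale
```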

Now we convert the data into sequences, as the RNN layer only accepts input in this form. We take the open prices of the first 50 days into X_train and store the 51st day's open price in y_train. We repeat this over the whole series, sliding the window one day at a time, and store the results in the NumPy arrays X_train and y_train.

X_train = []
y_train = []
for i in range(50, 1258):
    X_train.append(data[i-50:i, 0])
    y_train.append(data[i, 0])
X_train, y_train = np.array(X_train), np.array(y_train)

We check the shape of X_train

X_train.shape

But the RNN layer does not accept this shape, so we reshape the array into the 3-D form (samples, timesteps, features) that the Keras RNN layer expects.

X_train = np.reshape(X_train, (X_train.shape[0], X_train.shape[1], 1))

So this is the new shape.

X_train.shape

Now we start building our model. We begin by importing the necessary libraries.

from keras.models import Sequential
from keras.layers import Dense
from keras.layers import SimpleRNN
from keras.layers import Dropout

We first create an instance of the Sequential class from Keras; a Sequential model is a linear stack of layers.

Then we add a SimpleRNN layer to the model, which we have named regressor. This layer has 50 units (cells). We set return_sequences to True to tell the layer to return its output at every timestep, so that the next recurrent layer can use the full sequence. We also specify the shape of the input sequences that we will feed the layer.

Then we add a Dropout layer in order to prevent overfitting.

We stack up several of these layers.

Then we add the last SimpleRNN layer. Since this is the last recurrent layer, we only need its final output rather than the whole sequence, so we leave return_sequences at its default of False.

Next we add a Dropout layer and finally add a Dense Layer to get the output.

regressor = Sequential()
regressor.add(SimpleRNN(units = 50, return_sequences = True, input_shape = (X_train.shape[1], 1)))
regressor.add(Dropout(0.2))
regressor.add(SimpleRNN(units = 50, return_sequences = True))
regressor.add(Dropout(0.2))
regressor.add(SimpleRNN(units = 50, return_sequences = True))
regressor.add(Dropout(0.2))
regressor.add(SimpleRNN(units = 50))
regressor.add(Dropout(0.2))
regressor.add(Dense(units = 1))

Now we get the summary of the model that we just built.

regressor.summary()

Now we will compile the model. We use the Adam optimizer, and the loss function is mean_squared_error since we have a regression problem.

regressor.compile(optimizer = 'adam',loss = 'mean_squared_error')

Now we train, or fit, the model. We train for 10 epochs with a batch size of 32.

regressor.fit(X_train,y_train,epochs = 10, batch_size = 32)

**Output:**

First 10 epochs

**PyTorch**

We will now solve the same problem as above in PyTorch.

We start by loading necessary data pre-processing and computation libraries.

import numpy as np

import pandas as pd

Loading the dataset

dataset = pd.read_csv("/kaggle/input/google-stock-price/Google_Stock_Price_Train.csv")

Checking if there are any null values

dataset.isnull().sum()

Check dataset shape

dataset.shape

Viewing the dataset

dataset.head()

Out of the various values given, we will be predicting just the opening prices of the stock.

data = dataset.iloc[:, 1:2].values

Viewing the data

print(data[:5])

Checking the shape of the data

data.shape

Now we scale the data into the range -1 to 1 using the scikit-learn library.

from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler(feature_range=(-1, 1))
data = scaler.fit_transform(data)

Now we convert the data into sequences, as the RNN layer only accepts input in this form. We take the open prices of the first 50 days into X_train and store the 51st day's open price in y_train. We repeat this over the whole series, sliding the window one day at a time, and store the results in the NumPy arrays X_train and y_train.

X_train = []
y_train = []
for i in range(50, 1258):
    X_train.append(data[i-50:i, 0])
    y_train.append(data[i, 0])
X_train, y_train = np.array(X_train), np.array(y_train)

We check the shape of X_train

X_train.shape

But the RNN in PyTorch does not accept this shape, so we need to reshape the array into a form suitable for PyTorch's RNN.

X_train = np.reshape(X_train, (X_train.shape[0], 1, X_train.shape[1]))

Now we check the shape

X_train.shape

Output

(1208, 1, 50)
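As a sanity check on this shape: with batch_first left at its default of False, PyTorch's nn.RNN reads input as (seq_len, batch, input_size), so the (1208, 1, 50) array above is treated as 1208 timesteps of a single batch element with 50 features each. The sizes here mirror the INPUT_SIZE, HIDDEN_SIZE and NUM_LAYERS constants defined just below:

```python
import torch
import torch.nn as nn

rnn = nn.RNN(input_size=50, hidden_size=40, num_layers=2)
x = torch.zeros(1208, 1, 50)   # (seq_len, batch, input_size)
out, h = rnn(x)
print(out.shape)   # torch.Size([1208, 1, 40]) — one hidden vector per timestep
print(h.shape)     # torch.Size([2, 1, 40])   — final state for each layer
```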

Now we import necessary PyTorch dependencies.

import torch.nn as nn

import torch

from torch.autograd import Variable

Now we declare some variables to be used later.

INPUT_SIZE = 50

HIDDEN_SIZE = 40

NUM_LAYERS = 2

OUTPUT_SIZE = 1

Now we create the RNN class.

```
class RNN(nn.Module):
    def __init__(self, input_size, hidden_size, num_layers, output_size):
        super(RNN, self).__init__()
        self.RNN = nn.RNN(
            input_size=input_size,
            hidden_size=hidden_size,
            num_layers=num_layers
        )
        self.out = nn.Linear(hidden_size, output_size)

    def forward(self, x, h_state):
        r_out, hidden_state = self.RNN(x, h_state)
        hidden_size = hidden_state[-1].size(-1)
        r_out = r_out.view(-1, hidden_size)
        outs = self.out(r_out)
        return outs, hidden_state
```

In this we create a network composed of multiple RNN layers stacked together. The number of stacked layers is set by **num_layers**. The nn.Linear layer is the final layer that gives us the output.

In the forward function we perform the forward propagation of the network. First we store the output and the hidden state of the RNN in **r_out** and **hidden_state**. Then we compute **hidden_size**, which is used in the next step. To produce the output, we reshape **r_out** so that each timestep's hidden state becomes one row, pass it through the Linear layer, and get the output.

Now we create an instance of the RNN class.

`RNN = RNN(INPUT_SIZE, HIDDEN_SIZE, NUM_LAYERS, OUTPUT_SIZE)`

Now we define the optimizer and the loss function. We also initialise the hidden state to **None**.

optimiser = torch.optim.Adam(RNN.parameters(), lr=0.01)
criterion = nn.MSELoss()

hidden_state = None

Now we will train the network. We train it for 100 epochs.

```
for epoch in range(100):
    inputs = Variable(torch.from_numpy(X_train).float())
    labels = Variable(torch.from_numpy(y_train).float())

    output, hidden_state = RNN(inputs, hidden_state)

    loss = criterion(output.view(-1), labels)
    optimiser.zero_grad()
    loss.backward(retain_graph=True)  # back propagation
    optimiser.step()                  # update the parameters

    print('epoch {}, loss {}'.format(epoch, loss.item()))
```

First we convert **X_train** and **y_train** to PyTorch Variables. Then we calculate the output, and then find the loss on this output. **zero_grad** is a PyTorch function that sets our gradients to zero, since PyTorch *accumulates the gradients* on subsequent backward passes. Then, using the **backward()** function, we backpropagate through the network, and the **step()** function updates the parameters using the gradients calculated.

Then we print the loss after every epoch.

**Output:**

First 10 epochs

At 99^{th} epoch
