A neural network is a computational system that tries to mimic the way the human mind works. Neural networks are used mostly for classification, clustering, and regression problems, with applications in almost every field. They remain an active area of research and innovation, and together with artificial intelligence they are shaping a new era through applications in medicine, finance, entertainment, and more.
The first artificial neural network, the perceptron, was invented in 1958 by Frank Rosenblatt. It did not gain much popularity at the time owing to the limited computational power then available, but neural networks surged again with advances in computer hardware.
Neural networks are essentially interconnected graphs made of multiple units called nodes. Just as a brain has many neurons connected to one another, a neural network has many units, or nodes, interconnected and arranged in layers.
This computational system of neurons is called an Artificial Neural Network, or ANN. An ANN learns by modifying some of its values, called weights and biases.
The units of an ANN are arranged in a series of layers. The first layer is called the input layer and the last layer the output layer, while the intermediate layers are called hidden layers. Each connection between a node in one layer and a node in the adjacent layer carries a unique value called a weight, and each node also has a value called a bias.
The neural network learns by itself. We will understand this statement fully once we study how an ANN works; for now, the intuition is that an ANN modifies its weight and bias values to produce accurate and precise output.
The circles represent nodes and the black arrows represent connections between them.
(Image source: geekforgeeks.com)
w1, w2, w3, … represent weight values, and b1 and b2 represent the bias values of the nodes.
The working of a neural network can be divided into two phases: forward propagation and backward propagation.
In forward propagation (also called the forward pass), we traverse from the input layer to the output layer and then compare our result with the actual output. During this pass we calculate the value of each node using the formula:

a = activation(w · x + b)

Here activation is the activation function applied, w is the node's weight vector, x is the input, and b is the bias term. Activation functions are discussed later.
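As a minimal sketch of this formula, using a sigmoid activation (the weight, input, and bias values here are arbitrary placeholders, not taken from the XOR example below):

```python
import math

def sigmoid(z):
    # Squashes any real number into the range (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

def node_value(weights, inputs, bias):
    # activation(w . x + b): weighted sum of inputs plus bias,
    # passed through the activation function
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(z)

print(node_value([0.5, -0.5], [1.0, 0.0], 0.1))  # sigmoid(0.6), about 0.6457
```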
In backpropagation we traverse back from the output layer to the input layer, updating the weight values along the way. Each weight is updated by subtracting the derivative of the loss with respect to that weight, scaled by a learning rate. The goal of backpropagation is to minimize the loss, a measure of the difference (commonly the absolute or squared difference) between the predicted value and the actual value.
ACTIVATION FUNCTION: We use activation functions in an ANN to introduce non-linearity. An activation function decides whether a neuron should be activated or not, based on the neuron's weighted input.
There are various types of activation functions. Some famous ones are:
- Sigmoid
- Tanh
- ReLU
- Leaky ReLU
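A quick sketch of these functions in plain Python (the 0.01 slope used for Leaky ReLU is a common default, not the only choice):

```python
import math

def sigmoid(z):
    # Maps any real number into (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

def tanh(z):
    # Maps any real number into (-1, 1)
    return math.tanh(z)

def relu(z):
    # Zero for negative inputs, identity for positive inputs
    return max(0.0, z)

def leaky_relu(z, slope=0.01):
    # Like ReLU, but lets a small gradient through for negative inputs
    return z if z > 0 else slope * z

print(sigmoid(0.0), tanh(0.0), relu(-2.0), leaky_relu(-2.0))
```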
Let's take a simple example of a neural network that predicts the XOR of its inputs. This ANN will have two units in the input layer, one hidden layer with three units, and an output layer with a single unit. We will use the sigmoid activation function.
Now let’s make our own dataset:
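The XOR dataset consists of the four possible input pairs with their target outputs. In Python it might look like:

```python
# The four XOR input/output pairs (m = 4 training examples)
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
Y = [0, 1, 1, 0]  # XOR is 1 exactly when the two inputs differ
```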
Now let's do the forward pass first. We calculate the value of the first node of the hidden layer using the formula described earlier.
Likewise we do this for all the hidden units and find y1, y2, and y3.
Then we calculate the neural network's output as
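Putting the forward pass together for this 2-3-1 network, a sketch might look like the following (the weight and bias values are arbitrary placeholders, and the names W1, b1, W2, b2 are our own):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hidden layer: one weight pair and one bias per hidden unit
# (placeholder values; real networks initialize these randomly)
W1 = [[0.2, -0.4], [0.7, 0.1], [-0.5, 0.6]]   # 3 hidden units x 2 inputs
b1 = [0.0, 0.0, 0.0]
W2 = [0.3, -0.2, 0.8]                          # output unit weights
b2 = 0.0

def forward(x):
    # y1, y2, y3: activations of the three hidden units
    hidden = [sigmoid(sum(w * xi for w, xi in zip(ws, x)) + b)
              for ws, b in zip(W1, b1)]
    # Network output: sigmoid of the weighted hidden activations
    y = sigmoid(sum(w * h for w, h in zip(W2, hidden)) + b2)
    return hidden, y

hidden, y = forward((1, 0))
print(hidden, y)
```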
Now comes the main part: backward propagation.
We compare our final output y with the actual output and compute the difference between them, called the loss.
Now we backpropagate through the neural network and update our weight values as follows:
where m denotes the number of training examples; in our case it is 4.
We then find the derivative of the cost function with respect to each parameter:
We then update our weight and bias values as follows:
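The update rule — subtract the gradient scaled by the learning rate α — can be sketched as follows (the parameter and gradient values are purely illustrative):

```python
def update(params, grads, alpha=0.1):
    # Gradient descent step: each parameter moves opposite its gradient,
    # with step size controlled by the learning rate alpha
    return [p - alpha * g for p, g in zip(params, grads)]

print(update([0.5, -0.3], [0.2, -0.1]))  # approximately [0.48, -0.29]
```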
α here is called the learning rate. The learning rate is a hyperparameter chosen to set the step size of gradient descent. We will study gradient descent later, but for now you need to understand that it should be neither too big nor too small. Typical learning-rate values are 0.1, 0.01, 0.001, etc.
We then perform forward and backward propagation for a number of cycles called epochs. We see the neural network's loss J decrease steadily as the number of epochs increases.
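To tie everything together, here is an end-to-end sketch of the 2-3-1 XOR network in NumPy. The architecture and the forward/backward formulas follow the example above, but the random seed, the learning rate of 0.5, the mean-squared-error loss, and the epoch count are our own illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)

# XOR dataset: 4 examples, 2 features each
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
Y = np.array([[0], [1], [1], [0]], dtype=float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# 2-3-1 architecture with small random initial weights
W1 = rng.normal(0.0, 1.0, (2, 3))
b1 = np.zeros((1, 3))
W2 = rng.normal(0.0, 1.0, (3, 1))
b2 = np.zeros((1, 1))

alpha, epochs, m = 0.5, 5000, X.shape[0]
loss_history = []
for epoch in range(epochs):
    # Forward pass
    H = sigmoid(X @ W1 + b1)            # hidden activations y1, y2, y3
    Yhat = sigmoid(H @ W2 + b2)         # network output
    loss_history.append(np.mean((Yhat - Y) ** 2))  # loss J

    # Backward pass: gradients via the chain rule
    dYhat = 2 * (Yhat - Y) / m
    dZ2 = dYhat * Yhat * (1 - Yhat)     # sigmoid derivative at the output
    dW2 = H.T @ dZ2
    db2 = dZ2.sum(axis=0, keepdims=True)
    dH = dZ2 @ W2.T
    dZ1 = dH * H * (1 - H)              # sigmoid derivative at the hidden layer
    dW1 = X.T @ dZ1
    db1 = dZ1.sum(axis=0, keepdims=True)

    # Gradient descent update: w := w - alpha * dJ/dw
    W1 -= alpha * dW1; b1 -= alpha * db1
    W2 -= alpha * dW2; b2 -= alpha * db2

# The loss typically decreases steadily over the epochs
print(loss_history[0], loss_history[-1])
```

Running this, the final loss should be noticeably lower than the initial one, mirroring the steady decrease of J described above.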