We have already studied how a CNN works. In this blog we will discuss how to implement one.

We will implement it first using Keras and then using PyTorch.


Problem Statement: Given an image of a handwritten number, identify it. The numbers range from 0 to 9.

Dataset: We are given 60000 images for training and 10000 images for testing. The images are black and white (grayscale) with dimensions 28×28. This dataset is called MNIST.

First we import the dataset. We will use keras.datasets to import MNIST.

import keras
from keras.datasets import mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()

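Now let's examine the variables. x_train has shape (60000, 28, 28) and x_test has shape (10000, 28, 28), while y_train and y_test hold one integer label per image.

We have 60000 training examples and 10000 testing examples.

Let's now plot some of the x_train examples.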

import matplotlib.pyplot as plt

# plot the first four training images side by side
fig, axes = plt.subplots(1, 4)
for ax, image in zip(axes, x_train[:4]):
    ax.imshow(image, cmap='gray')
    ax.axis('off')
# show the plot
plt.show()

Now we need to reshape our inputs and outputs into the shapes Keras expects.

We first one-hot encode the target vectors, so that each label becomes a vector of length 10:

y_train = keras.utils.to_categorical(y_train, 10)
y_test = keras.utils.to_categorical(y_test, 10)
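To make the encoding concrete, here is a minimal NumPy sketch of what one-hot encoding does. The function name one_hot and the example labels are our own and are only for illustration; this is not the Keras implementation.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Return a (len(labels), num_classes) array with a single 1 per row."""
    labels = np.asarray(labels, dtype=int)
    encoded = np.zeros((labels.shape[0], num_classes), dtype="float32")
    # put a 1 in the column given by each label
    encoded[np.arange(labels.shape[0]), labels] = 1.0
    return encoded

example = one_hot([5, 0, 4], 10)
print(example.shape)  # (3, 10)
print(example[0])     # 1.0 at index 5, zeros elsewhere
```

Each row has exactly one non-zero entry, at the position of the digit it represents.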

Our x_train and x_test have shapes (60000,28,28) and (10000,28,28). Since the images are black and white, we need to set the number of channels to one (RGB images have 3 channels). So we reshape them as follows:

X_train = x_train.reshape(x_train.shape[0], 28, 28,1)
X_test = x_test.reshape(x_test.shape[0], 28, 28,1)

The updated shapes are (60000,28,28,1) for X_train and (10000,28,28,1) for X_test.


One last pre-processing step is to scale the input data. Since the pixel values range from 0 to 255, we divide all values by 255 so that they lie between 0 and 1. The same scaling must be applied to the test set.

X_train = X_train/255
X_test = X_test/255
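The two pre-processing steps can be tried on a small stand-in array without downloading anything. In this sketch fake_images is random data that we made up to mimic the shape and value range of MNIST images; it is not the real dataset.

```python
import numpy as np

# stand-in for a few MNIST images: random 8-bit pixel values (not real data)
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(5, 28, 28), dtype=np.uint8)

# add a trailing channel axis, then map pixel values from [0, 255] to [0, 1]
batch = fake_images.reshape(fake_images.shape[0], 28, 28, 1)
batch = batch.astype("float32") / 255

print(batch.shape)               # (5, 28, 28, 1)
print(batch.min(), batch.max())  # both lie between 0 and 1
```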

Now we import the layers and the model class we need. We will be using the high-level Keras Functional API to build the model.

from keras.layers import Conv2D,MaxPooling2D,Dense,Dropout,Flatten,Input
from keras.models import Model

We will first declare the input tensor:

inputs = Input(shape=(28,28,1))

Now we will create our first convolutional layer.

o1 = Conv2D(32, kernel_size = (3,3), strides=(1, 1),activation='relu', kernel_initializer='glorot_uniform')(inputs)

o1 is the output of our first convolutional layer. The layer is 2D since the input is a 2D image. 32 is the number of channels (filters) in the output of this layer. kernel_size is a list/tuple giving the height and width of the kernel. strides is also a list/tuple, giving how many positions the kernel moves at each step along each dimension. We set the activation function to relu. kernel_initializer specifies how the kernel values are initialized; glorot_uniform is the Keras default. We pass our inputs tensor as input to this layer.
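To see what a single filter of such a layer computes, here is a naive NumPy sketch of a 2D convolution without padding, followed by relu. It is a simplification under our own assumptions: one input channel, one kernel, no bias, and the helper name conv2d_valid is ours. A real Conv2D layer learns 32 such kernels together with their biases.

```python
import numpy as np

def conv2d_valid(image, kernel, stride=1):
    """Slide `kernel` over `image` without padding and sum the products."""
    k_h, k_w = kernel.shape
    out_h = (image.shape[0] - k_h) // stride + 1
    out_w = (image.shape[1] - k_w) // stride + 1
    output = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            top, left = i * stride, j * stride
            patch = image[top:top + k_h, left:left + k_w]
            output[i, j] = np.sum(patch * kernel)
    # relu activation: negative responses are clipped to zero
    return np.maximum(output, 0)

image = np.ones((28, 28))
kernel = np.ones((3, 3))
feature_map = conv2d_valid(image, kernel)
print(feature_map.shape)  # (26, 26)
```

With a 3×3 kernel and a stride of 1, a 28×28 image shrinks to 26×26, because the kernel cannot be centred on the border pixels.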

We then add another convolutional layer. This time we use 64 channels and keep everything else the same. We feed the output of the previous layer as input to this layer.

o2 = Conv2D(64, kernel_size = (3,3), strides=(1, 1),activation='relu', kernel_initializer='glorot_uniform')(o1)

Now we reduce the spatial dimensions using a max pooling layer. We use a pooling window of size (2,2), which halves the height and width.

o3 = MaxPooling2D(pool_size=(2, 2))(o2)
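The effect of a (2,2) pooling window can be sketched in NumPy on a single small feature map. The helper max_pool_2x2 is our own illustration and handles only one channel; it is not how Keras implements the layer.

```python
import numpy as np

def max_pool_2x2(feature_map):
    """Keep the largest value in each non-overlapping 2x2 block."""
    h2, w2 = feature_map.shape[0] // 2, feature_map.shape[1] // 2
    # drop a trailing odd row/column, then group the pixels into 2x2 blocks
    blocks = feature_map[:h2 * 2, :w2 * 2].reshape(h2, 2, w2, 2)
    return blocks.max(axis=(1, 3))

feature_map = np.arange(1, 17).reshape(4, 4)
pooled = max_pool_2x2(feature_map)
print(pooled)  # rows: [6, 8] and [14, 16]
```

In our model the same operation turns each 24×24 feature map coming out of the second convolution into a 12×12 one.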

Now we add a dropout layer, which helps prevent our model from overfitting. The argument to this layer is the fraction of input units to drop during training. We want to drop 25% of them, so we pass 0.25.

o4 = Dropout(0.25)(o3)
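As a rough sketch of what happens during training, dropout zeroes a random subset of the units and scales the remaining ones up so that the expected sum stays the same. The function below is our own NumPy illustration with an arbitrary random seed; at inference time dropout is simply switched off.

```python
import numpy as np

def dropout(x, rate, rng):
    """Zero each element with probability `rate` and rescale the survivors."""
    keep_mask = rng.random(x.shape) >= rate
    return x * keep_mask / (1.0 - rate)

rng = np.random.default_rng(0)
activations = np.ones(10000)
dropped = dropout(activations, 0.25, rng)

print((dropped == 0).mean())  # fraction of zeroed units, close to 0.25
print(dropped.mean())         # close to 1.0 thanks to the rescaling
```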

Now we will flatten the output using the Flatten layer.

o5 = Flatten()(o4)

This layer flattens each input sample into a vector. For example, if a sample passed to Flatten has shape (7,10,10), the output is a vector of length 700.
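The same idea can be checked with a NumPy reshape. The second part uses the shape our own model has at this point, assuming the layers defined above: two 3×3 convolutions take 28×28 down to 24×24, pooling halves it to 12×12, and there are 64 channels.

```python
import numpy as np

# one sample shaped (7, 10, 10), as in the example above
sample = np.zeros((7, 10, 10))
print(sample.reshape(-1).shape)  # (700,)

# a batch keeps its first axis; only the per-sample axes are flattened
batch = np.zeros((2, 12, 12, 64))
flat = batch.reshape(batch.shape[0], -1)
print(flat.shape)  # (2, 9216)
```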

The output vector from the Flatten layer is then connected to a layer of 128 nodes using the Dense layer.

o6 = Dense(128, activation='relu')(o5)

We apply dropout to the output of this layer and then connect it to a layer of 10 nodes, since we have 10 classes. This time we use the softmax activation because we want a probability for each class.

o7 = Dropout(0.5)(o6)

o8 = Dense(10, activation='softmax')(o7)
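For intuition, softmax turns a vector of raw scores into positive numbers that sum to one. The sketch below is a plain NumPy version with made-up scores; subtracting the maximum first is a common trick to avoid overflow in the exponentials.

```python
import numpy as np

def softmax(logits):
    """Convert raw scores into probabilities along the last axis."""
    shifted = logits - np.max(logits, axis=-1, keepdims=True)
    exps = np.exp(shifted)
    return exps / np.sum(exps, axis=-1, keepdims=True)

scores = np.array([2.0, 1.0, 0.1])
probs = softmax(scores)
print(probs)        # the largest score gets the largest probability
print(probs.sum())  # sums to 1 (up to floating point rounding)
```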

Now we will declare our model.

model = Model(inputs=inputs, outputs=o8)

We have now built our model, stored in the variable model.

We will now compile the model. We use categorical_crossentropy as our loss function since our target is one-hot encoded over several classes.
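To see what this loss measures, here is a minimal NumPy sketch of categorical cross-entropy on made-up predictions. The function and the clipping constant eps are our own choices for illustration, not the Keras source.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean over samples of -sum(true * log(predicted))."""
    y_pred = np.clip(y_pred, eps, 1.0)  # avoid log(0)
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=1)))

y_true = np.array([[0.0, 1.0, 0.0]])         # the correct class is index 1
confident = np.array([[0.05, 0.90, 0.05]])   # mostly right
unsure = np.array([[0.40, 0.30, 0.30]])      # mostly wrong

loss_confident = categorical_crossentropy(y_true, confident)
loss_unsure = categorical_crossentropy(y_true, unsure)
print(loss_confident, loss_unsure)  # the confident prediction has the lower loss
```

Only the probability assigned to the correct class contributes, so the loss shrinks towards zero as that probability approaches 1.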


We can inspect the model we created using the summary function, model.summary().


Now we can start training our model.

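# learning_rate_reduction is assumed to be a Keras callback created beforehand
# (for example keras.callbacks.ReduceLROnPlateau); its definition is not shown here
model.fit(X_train, y_train,
          validation_data=(X_test, y_test), callbacks=[learning_rate_reduction])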

Our model started to overfit, so we stopped training after the 3rd epoch.

We see that our model reached a training accuracy of 98% and a validation accuracy of 97%.


Next we implement a CNN using PyTorch.

Dataset: We will use CIFAR-10, which consists of colour images from 10 different classes, viz. airplane, truck, ship, etc.

Problem statement: Given an input image, we have to identify the class of the image.

We will start by importing the necessary libraries.

import torch
import torchvision
import torchvision.transforms as transforms

Next we import the dataset. We will use torchvision for this purpose; the torchvision package provides several popular datasets. The root argument is the path where the dataset is stored. The next argument is train: we set it to True to get the training set and False to get the test set. If download is True, the dataset is downloaded from the internet and placed in the root directory; if it has already been downloaded, it is not downloaded again. The last argument is transform, which converts each sample into a specified form. Here we convert the images to PyTorch tensors.

Now we come to DataLoader. It represents a Python iterable over a dataset. The first argument is the dataset object. The next is batch_size, the number of samples passed to the model at a time. When shuffle is True, the dataset is reshuffled at every epoch. num_workers is the number of subprocesses used to load the data.

trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transforms.ToTensor())
trainloader = torch.utils.data.DataLoader(trainset, batch_size=4,
                                          shuffle=True, num_workers=2)

testset = torchvision.datasets.CIFAR10(root='./data', train=False,
                                       download=True, transform=transforms.ToTensor())
testloader = torch.utils.data.DataLoader(testset, batch_size=4,
                                         shuffle=False, num_workers=2)

classes = ('plane', 'car', 'bird', 'cat',
           'deer', 'dog', 'frog', 'horse', 'ship', 'truck')
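To make the batch_size and shuffle arguments concrete, here is a simplified pure-Python sketch of iterating over a dataset in batches. The function iterate_batches and its seed argument are hypothetical names of our own; the real DataLoader additionally stacks samples into tensors and can load data in worker processes.

```python
import random

def iterate_batches(dataset, batch_size, shuffle=False, seed=None):
    """Yield lists of up to `batch_size` samples, optionally in random order."""
    indices = list(range(len(dataset)))
    if shuffle:
        random.Random(seed).shuffle(indices)
    for start in range(0, len(indices), batch_size):
        yield [dataset[i] for i in indices[start:start + batch_size]]

samples = list(range(10))
batches = list(iterate_batches(samples, batch_size=4, shuffle=True, seed=0))
print([len(b) for b in batches])  # [4, 4, 2]
```

Every sample appears exactly once per pass, and the last batch is smaller when the dataset size is not a multiple of the batch size.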

Now we will declare our model. We define our model class Net. In the constructor of the class we define all our CNN layers. The input image has 3 channels, with a height and width of 32 pixels each.

The first layer is a Conv2d. To this layer we pass the number of input channels, the number of output channels and the kernel size. In our first Conv2d we pass 3 input channels, 32 output channels and a kernel size of 2.

The next layer is max pooling. We pass it a kernel size of 2 and a stride of 2, and reuse the same pooling layer after each convolution.

The third layer is again a Conv2d. Here we pass 32 input channels, since the first Conv2d layer returned 32 channels, 64 output channels and a kernel size of 2.

Now we define our linear layers. The first takes an input of size 64 * 7 * 7, which is the size of the output after the two Conv2d and pooling stages: the 32×32 image becomes 31×31 after the first convolution, 15×15 after pooling, 14×14 after the second convolution and 7×7 after the second pooling, with 64 channels. Its output size is 120. The next linear layer takes an input of size 120 and outputs 84 values, and the last one maps those 84 values to the 10 classes.

Now in the forward function we call all the layers declared in the class constructor.

Notice the following line.

x = x.view(-1, 64 * 7 * 7)

In this line we convert the output of the Conv2d and pooling layers into a vector, which is then fed to the linear layer.

import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 32, 2)    # 3x32x32 -> 32x31x31
        self.pool = nn.MaxPool2d(2, 2)      # halves height and width
        self.conv2 = nn.Conv2d(32, 64, 2)   # 32x15x15 -> 64x14x14
        # after the second pooling the feature maps are 64x7x7
        self.fc1 = nn.Linear(64 * 7 * 7, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        # flatten every sample to a vector of length 64 * 7 * 7
        x = x.view(-1, 64 * 7 * 7)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

net = Net()

Finally, we create an instance of the Net class, called net.
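The input size of the first linear layer depends on how the convolution and pooling layers shrink the image. As a sanity check, the sketch below traces the spatial size for the layers declared in Net (two Conv2d layers with kernel size 2 producing 32 and 64 channels, each followed by 2×2 max pooling with stride 2), assuming a 32×32 input and no padding. The helper conv_out is our own and applies the usual output-size formula.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output length of a convolution or pooling window along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1

size = 32
size = conv_out(size, kernel=2)            # conv1: 32 -> 31
size = conv_out(size, kernel=2, stride=2)  # pool : 31 -> 15
size = conv_out(size, kernel=2)            # conv2: 15 -> 14
size = conv_out(size, kernel=2, stride=2)  # pool : 14 -> 7

flat_features = 64 * size * size           # 64 channels of 7x7
print(size, flat_features)  # 7 3136
```

This is the number of features each sample has when it reaches the first linear layer.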

Now we will declare our loss function and our optimizer.

import torch.optim as optim

criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.001)

We used the CrossEntropyLoss of PyTorch for this purpose and the SGD optimizer.
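The update rule behind plain SGD is simple: each parameter moves a small step against its gradient. Here is a toy NumPy sketch on a made-up objective; the function sgd_step, the objective and the learning rate of 0.1 are our own choices for illustration.

```python
import numpy as np

def sgd_step(params, grads, lr):
    """One plain SGD update: move against the gradient."""
    return params - lr * grads

# toy objective f(w) = sum(w**2), whose gradient is 2 * w
w = np.array([1.0, -2.0])
for _ in range(100):
    w = sgd_step(w, 2 * w, lr=0.1)

print(w)  # very close to the minimum at [0, 0]
```

In the training loop, optimizer.step() applies this kind of update to every parameter of the network, using the gradients computed by backward().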

We will train for 2 epochs. For each mini-batch we first set the gradients to zero, since PyTorch accumulates gradients on subsequent backward passes. Then we make predictions and store them in outputs. Next we calculate the loss using the loss function specified earlier. Using the backward() function we backpropagate the loss through the network. The step() function updates the parameters using the calculated gradients. Finally, we print the average loss after every 2000 mini-batches.

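for epoch in range(2):  # loop over the dataset twice

    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        # data is a pair of (inputs, labels)
        inputs, labels = data

        # zero the gradients accumulated from the previous step
        optimizer.zero_grad()

        # forward pass, loss, backward pass, parameter update
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        # print the average loss every 2000 mini-batches
        running_loss += loss.item()
        if i % 2000 == 1999:
            print('[%d, %5d] loss: %.3f' %
                  (epoch + 1, i + 1, running_loss / 2000))
            running_loss = 0.0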

This prints the running loss, averaged over each block of 2000 mini-batches.


Now we will test our model.

We first load our test images and their corresponding labels and store them in images and labels respectively. We store the predictions for the test data in output. Then we find the class with the maximum score for each image and compare the predicted class with the actual class. Finally, we print the accuracy on the test data.

correct = 0
total = 0

# gradients are not needed for evaluation
with torch.no_grad():
    for data in testloader:
        images, labels = data
        output = net(images)
        # the index of the highest score is the predicted class
        _, pred = torch.max(output, 1)
        total += labels.size(0)
        correct += (pred == labels).sum().item()

print('Accuracy of the network on the test images: %d %%' % (100 * correct / total))

We got an accuracy of 95%.
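The accuracy computation itself can be checked on a tiny example. The scores and labels below are made up purely for illustration and have nothing to do with the result above; argmax plays the role that torch.max plays in the test loop.

```python
import numpy as np

# made-up scores for 4 images and 3 classes (one row per image)
scores = np.array([[0.1, 2.0, 0.3],
                   [1.5, 0.2, 0.1],
                   [0.3, 0.2, 0.9],
                   [0.8, 0.1, 0.4]])
labels = np.array([1, 0, 2, 2])

# the predicted class is the index of the highest score in each row
predicted = scores.argmax(axis=1)
correct = int((predicted == labels).sum())
accuracy = 100 * correct / len(labels)
print(predicted, accuracy)  # [1 0 2 0] 75.0
```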

