Hey guys!! In today’s article, I am going to explain how to count people using Deep Learning and OpenCV. For this, I’ll be using the YOLOv3 object detector to detect objects in an image. Object detection tells us both what is in an image and where each object is located.

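I’ll apply the YOLO object detector to each frame of a video to count the number of persons in it. No training is needed: the pre-trained YOLOv3 weights used here were trained on the COCO detection dataset. (Training simultaneously on the ImageNet classification dataset and the COCO detection dataset is how the earlier YOLO9000 model was built; it is not required for this tutorial.)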

The COCO dataset consists of 80 labels, including, but not limited to:

  • People
  • Bicycles
  • Cars and trucks
  • Airplanes
  • Stop signs and fire hydrants
  • Animals, including cats, dogs, birds, horses, cows, and sheep, to name a few
  • Kitchen and dining objects, such as wine glasses, cups, forks, knives, spoons, etc.
  • …and many more!
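The detector reports each detection as a class index into this label list, so counting people depends on mapping indices back to names. Below is a minimal sketch, assuming a coco.names-style file with one label per line (in the commonly distributed file, "person" is the first entry). The helpers `load_class_names` and `class_index` are hypothetical names used only for illustration, and an in-memory string stands in for the real file.

```python
import io
from typing import List, TextIO


def load_class_names(stream: TextIO) -> List[str]:
    """Read one class label per line, skipping blank lines (hypothetical helper)."""
    return [line.strip() for line in stream if line.strip()]


def class_index(classes: List[str], name: str) -> int:
    """Return the position of `name` in the label list, i.e. the class ID YOLO predicts."""
    return classes.index(name)


# In-memory stand-in for the first few lines of coco.names
sample = io.StringIO("person\nbicycle\ncar\n")
classes = load_class_names(sample)
person_id = class_index(classes, "person")
print(classes, person_id)
```

With the real file, the object returned by `open("coco.names")` can be passed instead of the `StringIO` object. Comparing `class_id == person_id` is an alternative to comparing label strings for every detection.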

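You can download the pre-trained YOLOv3 weights from here. You will also need the matching yolov3.cfg file and the coco.names label file, both of which the code below loads.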

Importing necessary packages

import time

import cv2
import numpy as np

Loading YOLO weights and cfg

After importing the libraries, I’ll load the YOLO weights and cfg file with OpenCV’s DNN module and read the class labels from the coco.names file.

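net = cv2.dnn.readNet("yolov3.weights", "yolov3.cfg")
# read the 80 COCO class labels, one per line
with open("coco.names", "r") as f:
    classes = [line.strip() for line in f.readlines()]
layer_names = net.getLayerNames()
# getUnconnectedOutLayers() returns 1-based indices; flatten() handles both the
# older N x 1 and the newer flat return shapes across OpenCV versions
output_layers = [layer_names[i - 1] for i in np.array(net.getUnconnectedOutLayers()).flatten()]
# one random color per class for drawing boxes
colors = np.random.uniform(0, 255, size=(len(classes), 3))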
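`getUnconnectedOutLayers()` returns 1-based layer indices, and depending on the OpenCV version they come back either as an N x 1 array (older releases) or as a flat array (newer releases, from around 4.5.4), which is why the common `i[0]` idiom fails on newer installs. Here is a version-agnostic sketch of the lookup, written in pure Python so it can be tried without OpenCV. The function name `output_layer_names` and the sample layer names are made up for illustration.

```python
from typing import Iterable, List, Sequence


def output_layer_names(layer_names: Sequence[str], unconnected: Iterable) -> List[str]:
    """Map 1-based output-layer indices to names, accepting [[i], ...] or [i, ...]."""
    names = []
    for entry in unconnected:
        try:
            idx = entry[0]  # older builds: each entry is a one-element array
        except (TypeError, IndexError):
            idx = entry  # newer builds: each entry is a plain integer
        names.append(layer_names[int(idx) - 1])  # indices are 1-based
    return names


# Made-up layer list; the three "yolo" layers play the role of YOLOv3's output layers
layers = ["conv_0", "yolo_82", "conv_83", "yolo_94", "yolo_106"]
print(output_layer_names(layers, [[2], [4], [5]]))  # old-style nested indices
print(output_layer_names(layers, [2, 4, 5]))  # new-style flat indices
```

In the script this would be called as `output_layer_names(layer_names, net.getUnconnectedOutLayers())`. On OpenCV 4.x, `net.getUnconnectedOutLayersNames()` also returns the output names directly.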

Counting persons in a live video stream

Now I’ll open the video stream. The code below reads a file named person.mp4; passing 0 to cv2.VideoCapture uses the webcam instead. I’ll loop over the frames, construct a blob from each input frame, and perform a forward pass of the YOLO object detector, which gives us bounding boxes and their associated probabilities. Next, I’ll filter out weak predictions, i.e., only predictions with a confidence greater than 0.7 are kept, and apply non-maxima suppression to remove overlapping boxes. Finally, I’ll draw a bounding box over each person and count them.
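Each row that YOLOv3 emits holds the box centre and size as fractions of the frame, an objectness score, and then one score per COCO class. A pure-Python sketch of how a single row becomes a pixel box under the 0.7 threshold follows. Like the loop below, it uses the best class score as the confidence. The `decode_detection` name and the sample numbers are made up for illustration.

```python
from typing import Optional, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x, y, w, h): top-left corner plus size, in pixels


def decode_detection(row: Sequence[float], frame_w: int, frame_h: int,
                     threshold: float = 0.7) -> Optional[Tuple[Box, int, float]]:
    """Turn one YOLO output row into (box, class_id, confidence), or None if too weak."""
    scores = row[5:]  # per-class scores follow the 4 box values and the objectness score
    class_id = max(range(len(scores)), key=lambda k: scores[k])
    confidence = scores[class_id]
    if confidence <= threshold:  # filter out weak predictions
        return None
    # centre and size are relative to the frame, so scale them to pixels
    center_x = int(row[0] * frame_w)
    center_y = int(row[1] * frame_h)
    w = int(row[2] * frame_w)
    h = int(row[3] * frame_h)
    # convert the centre point to the top-left corner
    x = int(center_x - w / 2)
    y = int(center_y - h / 2)
    return (x, y, w, h), class_id, confidence


# Made-up row: centre (0.5, 0.5), size (0.25, 0.5), objectness 0.9, three class scores
row = [0.5, 0.5, 0.25, 0.5, 0.9, 0.95, 0.03, 0.02]
print(decode_detection(row, 640, 480))
```

For a 640x480 frame this row yields the box (240, 120, 160, 240) with class 0 ("person") and confidence 0.95.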

# load the input video stream (use cv2.VideoCapture(0) for a webcam)
cap = cv2.VideoCapture("person.mp4")
# initialize the writer and the frame dimensions
writer = None
(W, H) = (None, None)
starting_time = time.time()
frame_id = 0
while True:
    ret, frame = cap.read()
    # stop when the stream ends or a frame cannot be read
    if not ret:
        break
    frame_id += 1
    if W is None or H is None:
        (H, W) = frame.shape[:2]

    # Detecting objects: 416x416 blob, pixels scaled by ~1/255, BGR swapped to RGB
    blob = cv2.dnn.blobFromImage(frame, 0.00392, (416, 416), (0, 0, 0), True, crop=False)
    net.setInput(blob)
    outs = net.forward(output_layers)
    # initialize our lists of detected bounding boxes, confidences, and class IDs, respectively
    boxes = []
    confidences = []
    class_ids = []
    # loop over each of the layer outputs
    for out in outs:
        # loop over each of the detections
        for detection in out:
            # extract the class ID and confidence
            scores = detection[5:]
            class_id = np.argmax(scores)
            confidence = scores[class_id]
            # filter out weak predictions
            if confidence > 0.7:
                # YOLO returns the box centre and size relative to the frame
                center_x = int(detection[0] * W)
                center_y = int(detection[1] * H)
                w = int(detection[2] * W)
                h = int(detection[3] * H)

                # Rectangle coordinates (top-left corner)
                x = int(center_x - w / 2)
                y = int(center_y - h / 2)
                # update our list of bounding box coordinates, confidences, and class IDs
                boxes.append([x, y, w, h])
                confidences.append(float(confidence))
                class_ids.append(int(class_id))

    # apply non-maxima suppression to suppress weak, overlapping bounding boxes
    indexes = cv2.dnn.NMSBoxes(boxes, confidences, 0.3, 0.2)

    # 'p' counts the persons found in the current frame
    p = 0
    if len(indexes) > 0:
        # loop over the indexes we are keeping
        for i in np.array(indexes).flatten():
            label = str(classes[class_ids[i]])
            if label == 'person':
                p += 1
                # extract the bounding box coordinates
                (x, y) = (boxes[i][0], boxes[i][1])
                (w, h) = (boxes[i][2], boxes[i][3])
                # draw a bounding box rectangle and a numbered label on the frame
                color = [int(c) for c in colors[class_ids[i]]]
                cv2.rectangle(frame, (x, y), (x + w, y + h), color, 2)
                text = label + ':' + str(p)
                cv2.putText(frame, text, (x, y + 30), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 2)
    # show the total count for this frame in the top-left corner
    cv2.putText(frame, "Persons: " + str(p), (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 0, 255), 2)

    if writer is None:
        # initialize our video writer ("mp4v" matches the .mp4 container)
        fourcc = cv2.VideoWriter_fourcc(*"mp4v")
        writer = cv2.VideoWriter("person_out1.mp4", fourcc, 30, (W, H), True)
    writer.write(frame)

    elapsed_time = time.time() - starting_time
    fps = frame_id / elapsed_time
    print(str(round(fps, 2)))
    cv2.imshow("Frame", frame)
    # stop when ESC is pressed
    if cv2.waitKey(1) == 27:
        break

# release the video stream, the writer, and the display window
cap.release()
if writer is not None:
    writer.release()
cv2.destroyAllWindows()
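The non-maxima suppression step keeps the highest-scoring box and discards other boxes that overlap it too much, so one person is not counted twice. `cv2.dnn.NMSBoxes` does this internally. The pure-Python sketch below illustrates the same greedy, IoU-based idea and then counts the surviving person boxes. It is a simplified stand-in, not OpenCV’s implementation, and `iou`, `nms`, and the sample boxes are hypothetical.

```python
from typing import List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x, y, w, h)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two (x, y, w, h) boxes."""
    ix = max(0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: Sequence[Box], scores: Sequence[float],
        score_threshold: float, iou_threshold: float) -> List[int]:
    """Greedy NMS: return indices of kept boxes, highest score first."""
    order = sorted((i for i in range(len(boxes)) if scores[i] > score_threshold),
                   key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # keep a box only if it does not overlap any already-kept box too much
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


# Two heavily overlapping detections of one person, plus a second person
boxes = [(100, 100, 50, 120), (105, 102, 50, 118), (300, 90, 60, 130)]
scores = [0.92, 0.85, 0.88]
class_ids = [0, 0, 0]  # 0 = "person" in coco.names
kept = nms(boxes, scores, 0.3, 0.2)
person_count = sum(1 for i in kept if class_ids[i] == 0)
print(kept, person_count)
```

The first two boxes overlap with an IoU of roughly 0.8, well above the 0.2 threshold used in the script, so only the higher-scoring one survives and the frame counts two persons rather than three.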

The program is now ready to run. Each frame is run through the YOLO object detector, and the detected persons are highlighted and counted. The program can be stopped at any time by pressing the ESC key.



Yayyy!! Finally, we are able to count people in a real-time video stream. Try to implement it on your own and let us know in the comments section what you would do differently.

Shruti Sharma

