Hey guys!! In today’s article I am going to explain how to count people using deep learning and OpenCV. For this, I’ll be using the YOLOv3 object detector to detect objects in an image. By applying object detection we will be able to understand both what is in an image and where each object resides.

I’ll apply the YOLO object detector to each video frame to count the number of persons present. Rather than training a network from scratch, I’ll use a pre-trained YOLOv3 model: its weights were learned on the COCO detection dataset (with a backbone pre-trained on the ImageNet classification dataset), so we can load them directly.

The COCO dataset consists of 80 labels, including, but not limited to:

• People
• Bicycles
• Cars and trucks
• Airplanes
• Stop signs and fire hydrants
• Animals, including cats, dogs, birds, horses, cows, and sheep, to name a few
• Kitchen and dining objects, such as wine glasses, cups, forks, knives, spoons, etc.
• …and many more!
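These labels live in the `coco.names` file, one per line, and a label’s line index becomes its class ID, so "person" is ID 0. A minimal sketch of that mapping, using an inline sample of the file’s first few lines instead of reading it from disk:

```python
# The first few lines of coco.names (inline sample for illustration)
sample = "person\nbicycle\ncar\nmotorbike\naeroplane"

# The same parsing we apply to the real file below
classes = [line.strip() for line in sample.splitlines()]

print(classes[0])    # class ID 0 is "person"
print(len(classes))  # 5 labels in this sample; the real file has 80
```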

### Importing necessary packages

import cv2
import numpy as np
import time

After importing the libraries, I’ll load the YOLO weights and cfg files with OpenCV’s DNN module and read the class labels from the COCO names file.

net = cv2.dnn.readNet("yolov3.weights", "yolov3.cfg")
with open("coco.names", "r") as f:
    classes = [line.strip() for line in f.readlines()]
layer_names = net.getLayerNames()
# flatten() handles both the old (column vector) and new (flat array)
# return shapes of getUnconnectedOutLayers across OpenCV versions
output_layers = [layer_names[i - 1] for i in net.getUnconnectedOutLayers().flatten()]
colors = np.random.uniform(0, 255, size=(len(classes), 3))

### Counting Persons in live video stream

Now, I’ll open the input video stream (a recorded video here; swap in cv2.VideoCapture(0) for a live webcam). I’ll loop over the frames, construct a blob from each input frame, and then perform a forward pass of the YOLO object detector, which gives us bounding boxes and associated probabilities. Next, I’ll filter out weak predictions: only detections with a confidence greater than 0.7 are kept. Then, I’ll draw a bounding box rectangle around each person.
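Each row of YOLO’s output is an 85-element vector: the first four entries are the normalized box center and size, the fifth is the objectness score, and the remaining 80 are per-class scores. A minimal pure-Python sketch decoding one synthetic detection vector (the numbers are made up for illustration; the real code below uses np.argmax instead of the hand-rolled version):

```python
# Synthetic 85-element detection row: [cx, cy, w, h, objectness, 80 class scores]
detection = [0.5, 0.5, 0.2, 0.4, 0.9] + [0.0] * 80
detection[5] = 0.95  # pretend class 0 ("person") scored highest

W, H = 640, 480  # assumed frame size in pixels

# Pick the best-scoring class and its confidence
scores = detection[5:]
class_id = max(range(len(scores)), key=lambda i: scores[i])
confidence = scores[class_id]

# Convert the normalized center-format box to pixel corner coordinates
center_x = int(detection[0] * W)
center_y = int(detection[1] * H)
w = int(detection[2] * W)
h = int(detection[3] * H)
x = int(center_x - w / 2)
y = int(center_y - h / 2)

print(class_id, confidence, (x, y, w, h))  # 0 0.95 (256, 144, 128, 192)
```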

# load the input video stream (use cv2.VideoCapture(0) for a webcam)
cap = cv2.VideoCapture("person.mp4")
# initialize the writer and frame dimensions
writer = None
(W, H) = (None, None)
starting_time = time.time()
frame_id = 0

while True:
    # grab the next frame; stop when the stream ends
    ret, frame = cap.read()
    if not ret:
        break
    frame_id += 1
    if W is None or H is None:
        (H, W) = frame.shape[:2]

    # reset the person count for this frame
    p = 0

    # detecting objects: scale pixels by 1/255 and resize to 416x416
    blob = cv2.dnn.blobFromImage(frame, 0.00392, (416, 416), (0, 0, 0), True, crop=False)
    net.setInput(blob)
    outs = net.forward(output_layers)

    # initialize our lists of detected bounding boxes, confidences, and class IDs
    boxes = []
    confidences = []
    class_ids = []
    # loop over each of the layer outputs
    for out in outs:
        # loop over each of the detections
        for detection in out:
            # extract the class ID and confidence
            scores = detection[5:]
            class_id = np.argmax(scores)
            confidence = scores[class_id]
            # filter out weak predictions
            if confidence > 0.7:
                center_x = int(detection[0] * W)
                center_y = int(detection[1] * H)
                w = int(detection[2] * W)
                h = int(detection[3] * H)

                # rectangle (top-left corner) coordinates
                x = int(center_x - w / 2)
                y = int(center_y - h / 2)

                # update our lists of bounding box coordinates, confidences, and class IDs
                boxes.append([x, y, w, h])
                confidences.append(float(confidence))
                class_ids.append(class_id)

    # apply non-maxima suppression to suppress weak, overlapping bounding boxes
    indexes = cv2.dnn.NMSBoxes(boxes, confidences, 0.3, 0.2)

    # detecting persons
    if len(indexes) > 0:
        # loop over the indexes we are keeping
        for i in np.array(indexes).flatten():
            label = str(classes[class_ids[i]])
            if label != 'person':
                continue
            p += 1
            # extract the bounding box coordinates
            (x, y) = (boxes[i][0], boxes[i][1])
            (w, h) = (boxes[i][2], boxes[i][3])
            # draw a bounding box rectangle and label on the frame
            color = [int(c) for c in colors[class_ids[i]]]
            cv2.rectangle(frame, (x, y), (x + w, y + h), color, 2)
            text = label + ':' + str(p)
            cv2.putText(frame, text, (x, y + 30), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 2)

    if writer is None:
        # initialize our video writer (mp4v codec to match the .mp4 container)
        fourcc = cv2.VideoWriter_fourcc(*"mp4v")
        writer = cv2.VideoWriter("person_out1.mp4", fourcc, 30, (frame.shape[1], frame.shape[0]), True)

    # print the running frames-per-second estimate
    elapsed_time = time.time() - starting_time
    fps = frame_id / elapsed_time
    print("FPS:", round(fps, 2))

    cv2.imshow("Frame", frame)
    writer.write(frame)
    # stop when the 'ESC' key is pressed
    if cv2.waitKey(1) == 27:
        break

cap.release()
if writer is not None:
    writer.release()
cv2.destroyAllWindows()

The program is now ready to run. Each frame is passed through the YOLO object detector, and every detected person is highlighted with a bounding box. The program can be stopped at any time by pressing the ‘ESC’ key.
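The cv2.dnn.NMSBoxes call in the loop performs non-maximum suppression: when several boxes overlap the same object, only the highest-confidence one survives, so each person is counted once. A minimal pure-Python sketch of the same idea (greedy NMS with an IoU threshold, boxes in the same [x, y, w, h] format as in the code above):

```python
def iou(a, b):
    # Intersection-over-union of two [x, y, w, h] boxes
    ax1, ay1, ax2, ay2 = a[0], a[1], a[0] + a[2], a[1] + a[3]
    bx1, by1, bx2, by2 = b[0], b[1], b[0] + b[2], b[1] + b[3]
    ix = max(0, min(ax2, bx2) - max(ax1, bx1))
    iy = max(0, min(ay2, by2) - max(ay1, by1))
    inter = ix * iy
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, confidences, iou_threshold=0.2):
    # Greedy non-maximum suppression: keep the best box, drop heavy overlaps
    order = sorted(range(len(boxes)), key=lambda i: confidences[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep

# Two near-duplicate detections of the same person plus one distinct box
boxes = [[100, 100, 50, 100], [105, 102, 50, 100], [300, 100, 50, 100]]
confidences = [0.95, 0.80, 0.90]
print(nms(boxes, confidences))  # [0, 2] -- the duplicate (index 1) is suppressed
```

Without this step, the near-duplicate box would inflate the person count by one.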

### Conclusion

Yayyy!! Finally, we are able to count people in a real-time video stream. Try implementing it on your own, and let us know in the comments section what you would do differently.

Shruti Sharma
