Hey guys!! In today's article I am going to explain how to count people using deep learning and OpenCV. For this, I'll be using the YOLOv3 object detector to detect objects in an image. By applying object detection we will be able to understand what is in an image and where a given object resides.
I'll apply the YOLO object detector to each frame to count the number of persons present. The YOLO authors trained a model simultaneously on both the ImageNet classification dataset and the COCO detection dataset; the result was a model called YOLO9000. Here I'll use its successor, YOLOv3, pre-trained on the COCO dataset.
The COCO dataset consists of 80 labels, including, but not limited to:
- People
- Bicycles
- Cars and trucks
- Airplanes
- Stop signs and fire hydrants
- Animals, including cats, dogs, birds, horses, cows, and sheep, to name a few
- Kitchen and dining objects, such as wine glasses, cups, forks, knives, spoons, etc.
- …and many more!
You can download the pre-trained YOLOv3 weights from the official YOLO website (https://pjreddie.com/darknet/yolo/).
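If you prefer to fetch the files from a script, here is a minimal sketch; the URLs below point at the official Darknet site and repository, so verify them before relying on this:

```python
# Sketch: download the YOLOv3 weights, config, and COCO class names.
# URLs point at the official Darknet site/repo; check they are still current.
import urllib.request

files = {
    "yolov3.weights": "https://pjreddie.com/media/files/yolov3.weights",
    "yolov3.cfg": "https://raw.githubusercontent.com/pjreddie/darknet/master/cfg/yolov3.cfg",
    "coco.names": "https://raw.githubusercontent.com/pjreddie/darknet/master/data/coco.names",
}

for filename, url in files.items():
    print(f"Downloading {filename} ...")
    urllib.request.urlretrieve(url, filename)
```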
Importing necessary packages
```python
# OpenCV for the DNN module and video I/O, NumPy for array math,
# time for measuring FPS later on
import cv2
import numpy as np
import time
```
Loading YOLO weights and cfg
After importing the libraries, I'll load the YOLO weights and cfg files with OpenCV's DNN module and read the class names from the COCO names file.
```python
# load the network from the pre-trained weights and config
net = cv2.dnn.readNet("yolov3.weights", "yolov3.cfg")

# read the 80 COCO class labels, one per line
with open("coco.names", "r") as f:
    classes = [line.strip() for line in f.readlines()]

# determine the names of the three YOLO output layers
layer_names = net.getLayerNames()
output_layers = [layer_names[i - 1] for i in net.getUnconnectedOutLayers().flatten()]

# one random color per class, for drawing boxes
colors = np.random.uniform(0, 255, size=(len(classes), 3))
```
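Note that `getUnconnectedOutLayers()` returns a differently shaped array depending on your OpenCV version (nested in older 4.x releases, flat in newer ones); the `.flatten()` call above handles both. As a quick sanity check that everything loaded, a minimal sketch:

```python
# Sanity check (minimal sketch): confirm the labels and output layers loaded.
print("Loaded", len(classes), "class labels")  # should be 80 for coco.names
print("YOLO output layers:", output_layers)    # e.g. ['yolo_82', 'yolo_94', 'yolo_106']
```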
Counting persons in a live video stream
Now, I'll open the input video stream (the same code works with a live webcam; see the comment in the snippet below). I'll loop over the frames, construct a blob from each input frame, and perform a forward pass of the YOLO object detector, which gives us bounding boxes and associated probabilities. Next, I'll filter out weak predictions, i.e., only detections with a confidence greater than 0.7 will be kept. Then, I'll apply non-maxima suppression and draw a bounding box rectangle over each person, counting them as I go.
```python
# load the input video stream (use cv2.VideoCapture(0) for a webcam)
cap = cv2.VideoCapture("person.mp4")

# initialize the video writer and frame dimensions
writer = None
(W, H) = (None, None)

starting_time = time.time()
frame_id = 0

while True:
    ret, frame = cap.read()
    if not ret:
        break
    frame_id += 1

    if W is None or H is None:
        (H, W) = frame.shape[:2]

    # reset the person count 'p' for this frame
    p = 0

    # construct a blob and run a forward pass through the network
    blob = cv2.dnn.blobFromImage(frame, 0.00392, (416, 416), (0, 0, 0), True, crop=False)
    net.setInput(blob)
    outs = net.forward(output_layers)

    # initialize our lists of detected bounding boxes, confidences, and class IDs, respectively
    boxes = []
    confidences = []
    class_ids = []

    # loop over each of the layer outputs
    for out in outs:
        # loop over each of the detections
        for detection in out:
            # extract the class ID and confidence
            scores = detection[5:]
            class_id = np.argmax(scores)
            confidence = scores[class_id]

            # filter out weak predictions
            if confidence > 0.7:
                # scale the box back to the frame size
                center_x = int(detection[0] * W)
                center_y = int(detection[1] * H)
                w = int(detection[2] * W)
                h = int(detection[3] * H)

                # rectangle coordinates (top-left corner)
                x = int(center_x - w / 2)
                y = int(center_y - h / 2)

                # update our list of bounding box coordinates, confidences, and class IDs
                boxes.append([x, y, w, h])
                confidences.append(float(confidence))
                class_ids.append(class_id)

    # apply non-maxima suppression to suppress weak, overlapping bounding boxes
    indexes = cv2.dnn.NMSBoxes(boxes, confidences, 0.3, 0.2)

    # detecting persons
    if len(indexes) > 0:
        # loop over the indexes we are keeping
        for i in np.array(indexes).flatten():
            # extract the bounding box coordinates
            (x, y) = (boxes[i][0], boxes[i][1])
            (w, h) = (boxes[i][2], boxes[i][3])
            label = str(classes[class_ids[i]])

            # we only care about the 'person' class
            if label != 'person':
                continue
            p += 1

            # draw a bounding box rectangle and label on the frame
            color = [int(c) for c in colors[class_ids[i]]]
            cv2.rectangle(frame, (x, y), (x + w, y + h), color, 2)
            text = label + ':' + str(p)
            cv2.putText(frame, text, (x, y + 30), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 2)

    if writer is None:
        # initialize our video writer (MJPG pairs reliably with the .avi container)
        fourcc = cv2.VideoWriter_fourcc(*"MJPG")
        writer = cv2.VideoWriter("person_out1.avi", fourcc, 30,
                                 (frame.shape[1], frame.shape[0]), True)

    # report the processing speed
    elapsed_time = time.time() - starting_time
    fps = frame_id / elapsed_time
    print(str(round(fps, 2)))

    cv2.imshow("Frame", frame)
    writer.write(frame)

    # press ESC to quit
    if cv2.waitKey(1) == 27:
        break

cap.release()
if writer is not None:
    writer.release()
cv2.destroyAllWindows()
```
The program is now ready to run. Each frame is passed through the YOLO object detector, and detected persons are highlighted and counted. The program can be stopped at any time by pressing the ESC key.
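If you just want a count for a single image rather than a video, the same pipeline condenses nicely. Here is a minimal sketch; it reuses the `net`, `classes`, and `output_layers` loaded above, and `people.jpg` is a hypothetical input file:

```python
# Sketch: count persons in a single image using the same network.
# Assumes net, classes, and output_layers are defined as above;
# "people.jpg" is a hypothetical example input.
img = cv2.imread("people.jpg")
H, W = img.shape[:2]

# 1/255.0 is the same scale factor as the 0.00392 used above
blob = cv2.dnn.blobFromImage(img, 1 / 255.0, (416, 416), swapRB=True, crop=False)
net.setInput(blob)
outs = net.forward(output_layers)

boxes, confidences, class_ids = [], [], []
for out in outs:
    for detection in out:
        scores = detection[5:]
        class_id = int(np.argmax(scores))
        confidence = scores[class_id]
        if confidence > 0.7:
            cx, cy = detection[0] * W, detection[1] * H
            w, h = detection[2] * W, detection[3] * H
            boxes.append([int(cx - w / 2), int(cy - h / 2), int(w), int(h)])
            confidences.append(float(confidence))
            class_ids.append(class_id)

# non-maxima suppression, guarding against an empty detection list
indexes = cv2.dnn.NMSBoxes(boxes, confidences, 0.3, 0.2) if boxes else []
count = sum(1 for i in np.array(indexes).flatten() if classes[class_ids[i]] == "person")
print(f"Persons found: {count}")
```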
Conclusion
Yayyy!! Finally, we are able to count people in a real-time video stream. Try to implement it on your own and let us know what you would do differently in the comments section.
–Shruti Sharma