Using YOLO for Object Detection: How to Extract People Images
YOLO (You Only Look Once) is a popular open-source neural network model for object detection. In this post, we will explain how to use YOLO to extract images that contain people (at least one person).
Firstly, we need to install the dependencies. The code below uses OpenCV's DNN module to run YOLO, so the only libraries we need from the pip package manager are:
pip install numpy
pip install opencv-python
Next, we will download the pre-trained YOLO weights and configuration file from the official website. These files can be found at https://pjreddie.com/darknet/yolo/.
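Assuming the file layout used in the code below (the weights in the working directory and the config under ./darknet/cfg/), the two files can be fetched, for example, with wget:

```shell
# Pre-trained YOLOv3 weights (large file) from the official site
wget https://pjreddie.com/media/files/yolov3.weights

# Matching network configuration from the darknet repository
mkdir -p darknet/cfg
wget -O darknet/cfg/yolov3.cfg \
    https://raw.githubusercontent.com/pjreddie/darknet/master/cfg/yolov3.cfg
```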
Once we have the weights and configuration files, we can use them to perform object detection on our images. Here is an example of how to use YOLO to detect people in an image:
import cv2
import numpy as np

# Load the YOLO network from the weights and configuration files
net = cv2.dnn.readNet("./yolov3.weights", "./darknet/cfg/yolov3.cfg")

# Load the input image
image = cv2.imread("image.jpg")

# Get image dimensions
(height, width) = image.shape[:2]

# Build the network input: a 416x416 blob with pixel values scaled to [0, 1]
blob = cv2.dnn.blobFromImage(image, 1 / 255.0, (416, 416), swapRB=True, crop=False)
net.setInput(blob)

# Perform forward propagation through the unconnected output layers
output_layer_names = net.getUnconnectedOutLayersNames()
output_layers = net.forward(output_layer_names)

# Initialize list of detected people
people = []

# Loop over the output layers
for output in output_layers:
    # Loop over the detections
    for detection in output:
        # Extract the class ID and confidence of the current detection
        scores = detection[5:]
        class_id = np.argmax(scores)
        confidence = scores[class_id]
        # Only keep confident detections of class 0 ("person" in COCO)
        if class_id == 0 and confidence > 0.5:
            # YOLO returns the box center and size relative to the image
            center_x = int(detection[0] * width)
            center_y = int(detection[1] * height)
            w = int(detection[2] * width)
            h = int(detection[3] * height)
            # Convert to top-left corner coordinates
            x = int(center_x - w / 2)
            y = int(center_y - h / 2)
            # Add the detection to the list of people
            people.append((x, y, w, h))

# Draw bounding boxes around the people
for (x, y, w, h) in people:
    cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)

# Show the image
cv2.imshow("Image", image)
cv2.waitKey(0)
In this example, we first load the YOLO model using the cv2.dnn.readNet method. Then, we read the input image and get its dimensions. Next, we build the neural network input by creating a blob from the image and normalizing its pixel values. Finally, we perform forward propagation and loop over the output layers, keeping only the "person" detections and converting their relative bounding boxes into pixel coordinates. We then draw these bounding boxes on the original image using the cv2.rectangle method.
After running this code, you should see the input image with bounding boxes drawn around any people that were detected.
If you want to get rid of duplicate, overlapping rectangles, you can apply non-maximum suppression (NMS). OpenCV provides cv2.dnn.NMSBoxes, which takes the list of boxes along with a matching list of their confidences (collected during the detection loop) and returns the indices of the boxes to keep:
indices = cv2.dnn.NMSBoxes(boxes, confidences, score_threshold=0.5, nms_threshold=0.4)
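Under the hood, NMS greedily keeps the highest-scoring box and discards any remaining box that overlaps a kept one beyond the IoU threshold. A minimal pure-Python sketch of that logic (assuming boxes as (x, y, w, h) tuples, not OpenCV's actual implementation):

```python
def iou(a, b):
    """Intersection-over-union of two (x, y, w, h) boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    # Intersection rectangle
    ix, iy = max(ax, bx), max(ay, by)
    iw = min(ax + aw, bx + bw) - ix
    ih = min(ay + ah, by + bh) - iy
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    return inter / (aw * ah + bw * bh - inter)

def nms(boxes, confidences, score_threshold=0.5, nms_threshold=0.4):
    """Return indices of boxes to keep, mimicking cv2.dnn.NMSBoxes."""
    # Consider only boxes above the score threshold, best first
    order = sorted(
        (i for i, c in enumerate(confidences) if c >= score_threshold),
        key=lambda i: confidences[i], reverse=True)
    keep = []
    for i in order:
        # Keep this box only if it does not overlap an already-kept box too much
        if all(iou(boxes[i], boxes[j]) <= nms_threshold for j in keep):
            keep.append(i)
    return keep
```

For example, two nearly identical boxes around the same person collapse to one, while a distant box survives: nms([(0, 0, 100, 100), (5, 5, 100, 100), (200, 200, 50, 50)], [0.9, 0.8, 0.7]) keeps indices 0 and 2.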
Also, it is possible to add a caption for each detection with cv2.putText; here label, font and color are variables you define yourself (e.g. label = "person", font = cv2.FONT_HERSHEY_SIMPLEX, color = (0, 255, 0)):
cv2.putText(image, label, (x, y + 30), font, 2, color, 3)
Now you can easily count the number of people in the provided image: it is simply len(people).
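Since the goal is to extract the people images themselves, each bounding box can also be cropped out of the original image with NumPy slicing. A minimal sketch (the clamping guards against boxes that extend past the image border, which YOLO can produce; the helper name crop_people is our own):

```python
import numpy as np

def crop_people(image, people):
    """Crop each (x, y, w, h) box out of an H x W x 3 image array."""
    crops = []
    img_h, img_w = image.shape[:2]
    for (x, y, w, h) in people:
        # Clamp the box to the image bounds before slicing
        x0, y0 = max(x, 0), max(y, 0)
        x1, y1 = min(x + w, img_w), min(y + h, img_h)
        # Skip boxes that end up empty after clamping
        if x1 > x0 and y1 > y0:
            crops.append(image[y0:y1, x0:x1])
    return crops
```

Each crop is a regular image array, so it can be saved with, for example, cv2.imwrite("person_%d.jpg" % i, crop).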
If you want to learn more about object detection or YOLO, there are plenty of resources available online. Some good places to start include the official YOLO website and the TensorFlow object detection API documentation.
We encourage you to try out the code sample provided in this article and share your results with us. If you have any questions or feedback, please don’t hesitate to reach out.
Thank you for reading, and we hope you found this article helpful.