In this article, we are going to implement a pre-trained TensorFlow face mask detection model originally developed by Hussain Mujtaba. Some of the code and TensorFlow model training information can be found in his article.
To begin, let’s go through some of the basics of OpenCV.
First, make a new directory for the project files. Inside the directory, let’s create a virtual environment in which to install the necessary packages. If you do not have virtualenv, run the first line of code; otherwise, skip it.
Now that our environment is created, we can activate it by typing:
Inside our virtual environment, let’s download the necessary packages:
Now that we have all of our packages installed, let’s add some files and folders to our directory.
First introduced as an object detection method in 2001 by Paul Viola and Michael Jones in this paper, the Viola-Jones algorithm is one of the most efficient and computationally inexpensive face detection algorithms. Despite being almost 20 years old, it is still widely used. In fact, if you have ever used a digital camera that drew bounding boxes around faces in the viewfinder, chances are it was utilizing this algorithm.
The algorithm is trained on thousands of positive images (with a face) and negative images (without a face), and uses Haar features to measure the difference in brightness between neighboring regions of an image. Each feature is calculated by summing the pixel values in adjacent rectangular regions within a specified window and subtracting one sum from the other.
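As a rough illustration of the subtraction described above, the sketch below computes a two-rectangle (edge) Haar-like feature on a tiny grayscale patch. The function names, the patch values, and the left-minus-right sign convention are assumptions made for this example; they are not taken from OpenCV or from the trained classifier used later in the article.

```python
def region_sum(image, top, left, height, width):
    """Sum the pixel values inside a rectangle of a 2D list of intensities."""
    return sum(
        image[r][c]
        for r in range(top, top + height)
        for c in range(left, left + width)
    )


def edge_feature(image, top, left, height, width):
    """Two-rectangle Haar-like feature: left half minus right half.

    A large absolute value suggests a vertical edge inside the window.
    """
    half = width // 2
    left_sum = region_sum(image, top, left, height, half)
    right_sum = region_sum(image, top, left + half, height, half)
    return left_sum - right_sum


# Hypothetical 4x4 grayscale patch: bright on the left, dark on the right.
patch = [
    [200, 210, 30, 20],
    [205, 215, 25, 15],
    [198, 208, 35, 25],
    [202, 212, 28, 18],
]

print(edge_feature(patch, 0, 0, 4, 4))
```

A patch with uniform brightness would score 0 here, while a sharp light-to-dark transition gives a large value, which is what lets a feature like this respond to edges such as the boundary of a mouth or an eyebrow.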
Let’s say we have an image. In order to reduce computation time we convert that image to grayscale:
Let’s look at how our computer is ‘seeing’ the corner of Obama’s mouth:
Adding and subtracting all of these regions would be too computationally expensive to do in real time. To solve this problem, the concept of integral images was introduced. Each pixel in an integral image is calculated by adding all pixels above and to the left of a specific point in the original image. The integral image can then be used to quickly calculate the sum of any rectangular area of the input image, as opposed to having to make repeated sweeps over all pixels on every pass over the image.
In order to make detections, each feature is tested against an input image. In the beginning, the decision thresholds for the features are low, meaning that some faces will be detected, and some other things will be detected as well. A typical Haar classifier will have around 6,000 features, and as the classifier goes further and further through the features, it gets more and more picky. So it may let noise through when checking the first 10 features, but if the 11th feature rejects the image, then the classifier also rejects the image. This allows the algorithm to be fast and efficient.
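To make the idea concrete, here is a minimal pure-Python sketch of an integral image (also called a summed-area table) and the four-lookup rectangle sum it enables. The function names and the 3x3 example are assumptions for illustration. The extra row and column of zeros is a convenience chosen here to avoid special cases at the image border.

```python
def integral_image(image):
    """Return a summed-area table with one extra row and column of zeros."""
    rows, cols = len(image), len(image[0])
    table = [[0] * (cols + 1) for _ in range(rows + 1)]
    for r in range(rows):
        row_total = 0
        for c in range(cols):
            row_total += image[r][c]
            # Everything above this row plus everything so far in this row.
            table[r + 1][c + 1] = table[r][c + 1] + row_total
    return table


def rect_sum(table, top, left, height, width):
    """Sum of any rectangle using four lookups in the summed-area table."""
    bottom, right = top + height, left + width
    return (
        table[bottom][right]
        - table[top][right]
        - table[bottom][left]
        + table[top][left]
    )


image = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
table = integral_image(image)
print(rect_sum(table, 1, 1, 2, 2))  # sum of the pixels 5, 6, 8, 9
```

The table is built once per image. After that, every rectangle sum costs the same four lookups regardless of the rectangle's size, which is why thousands of Haar features can be evaluated quickly.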
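The early-rejection behavior can be sketched in a few lines. The example below is a toy, not a real Haar cascade: the two scoring functions (`mean` and `contrast`), their thresholds, and the sample windows are all hypothetical stand-ins for the trained features, chosen only to show how a window is dropped at the first failed check.

```python
def run_cascade(window, stages):
    """Return True only if the window passes every stage in order.

    Each stage is a (score_function, threshold) pair. The first stage that
    scores below its threshold rejects the window, and later stages never run.
    """
    for score, threshold in stages:
        if score(window) < threshold:
            return False
    return True


def mean(window):
    """Average intensity of a 2D list of pixel values."""
    values = [v for row in window for v in row]
    return sum(values) / len(values)


def contrast(window):
    """Difference between the brightest and darkest pixel."""
    values = [v for row in window for v in row]
    return max(values) - min(values)


# Hypothetical stages: a cheap, lenient check first, a stricter one after.
stages = [(mean, 50), (contrast, 100)]

flat_dark = [[10, 12], [11, 9]]
textured = [[200, 40], [60, 180]]

print(run_cascade(flat_dark, stages))
print(run_cascade(textured, stages))
```

Because most windows in a typical frame contain no face, most of them are discarded after only a handful of checks, and the expensive later checks run on a small fraction of the image.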
Implementing a face mask detector.
To see this in action we are going to implement a face mask detector on a public IP camera stream. These are older security/surveillance cameras connected to the internet that either have no password or still use the default password on devices with known security issues. You can implement computer vision on some of these streams because they transfer data as an MJPEG (.mjpg) stream, which can be loaded into OpenCV with the following method:
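The sketch below shows one common way to pull individual frames out of an MJPEG stream; it is an assumption about the approach, not necessarily the exact code used in this project. It relies on the fact that every JPEG image starts with the bytes `FF D8` and ends with `FF D9`, and the function name `iter_jpeg_frames` is made up for this example. It uses only the standard library so that it can be tried on fake data.

```python
SOI = b"\xff\xd8"  # JPEG start-of-image marker
EOI = b"\xff\xd9"  # JPEG end-of-image marker


def iter_jpeg_frames(chunks):
    """Yield complete JPEG byte strings found in an iterable of byte chunks."""
    buffer = b""
    for chunk in chunks:
        buffer += chunk
        while True:
            start = buffer.find(SOI)
            end = buffer.find(EOI, start + 2) if start != -1 else -1
            if start == -1 or end == -1:
                break  # no complete frame yet, wait for more data
            yield buffer[start:end + 2]
            buffer = buffer[end + 2:]


# Fake stream data: two tiny "frames" split awkwardly across three chunks.
fake_chunks = [
    b"--boundary\r\n\xff\xd8AAA",
    b"A\xff\xd9junk\xff\xd8BB\xff",
    b"\xd9tail",
]

for frame in iter_jpeg_frames(fake_chunks):
    print(len(frame))
```

With a real camera, the chunks could come from repeatedly calling `read(1024)` on the object returned by `urllib.request.urlopen(url)`, and each yielded frame could be turned into an OpenCV image with `cv2.imdecode(numpy.frombuffer(frame, dtype=numpy.uint8), cv2.IMREAD_COLOR)`. Frames that embed a thumbnail may contain extra markers, so a production reader would likely need to be more careful than this.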
Here is the stream we are going to use. It appears to be a doorbell camera facing towards the street. We are going to write some functions that place screenshots from this video feed into the folders with_mask and without_mask. When our program detects a person, it will take a screenshot and place the resulting screenshot.jpg in the proper folder. It will also compile a .csv with the relevant classification, time, prediction confidence (of the TensorFlow mask detection model), and file path to the image of the observation.
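The logging step could look something like the following sketch, which appends one row per observation to a .csv file. The column names, the function name `log_observation`, and the timestamp and confidence formats are assumptions for illustration; only the four recorded pieces of information come from the description above.

```python
import csv
import os
from datetime import datetime

# Hypothetical column names for the four values described in the text.
FIELDS = ["classification", "time", "confidence", "image_path"]


def log_observation(csv_path, classification, confidence, image_path, when=None):
    """Append one observation row, writing the header if the file is new."""
    is_new = not os.path.exists(csv_path) or os.path.getsize(csv_path) == 0
    when = when or datetime.now()
    with open(csv_path, "a", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        if is_new:
            writer.writeheader()
        writer.writerow({
            "classification": classification,
            "time": when.isoformat(timespec="seconds"),
            "confidence": f"{confidence:.3f}",
            "image_path": image_path,
        })
```

A call such as `log_observation("observations.csv", "with_mask", 0.97, "with_mask/screenshot.jpg")` would then add a single row. The file names in that call are placeholders. Opening the file in append mode means the log survives restarts of the program.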
We will then add an infinite while loop that repeatedly grabs images from the .mjpg stream and searches them for faces with the Haar Cascade Classifier. If the Haar Cascade matches a face, the pre-trained TensorFlow model predicts whether or not the person is wearing a mask, and the cycle repeats.
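The overall control flow can be sketched independently of OpenCV and TensorFlow. In the skeleton below, the detector, the classifier, and the recording step are passed in as plain functions, and all of the names are hypothetical. In a real program, `detect_faces` would presumably wrap `cv2.CascadeClassifier.detectMultiScale` on a grayscale frame, and `classify_face` would wrap the TensorFlow model's prediction on the cropped face. Iterating over a never-ending frame generator plays the role of the infinite while loop.

```python
def watch_stream(frames, detect_faces, classify_face, record):
    """Run detection, then classification, on every frame.

    frames        -- iterable of images (endless for a live stream)
    detect_faces  -- callable: frame -> list of (x, y, w, h) boxes
    classify_face -- callable: (frame, box) -> (label, confidence)
    record        -- callable: (frame, box, label, confidence) -> None
    """
    observations = 0
    for frame in frames:
        for box in detect_faces(frame):
            label, confidence = classify_face(frame, box)
            record(frame, box, label, confidence)
            observations += 1
    return observations


# Stand-in functions so the skeleton can be run without a camera or a model.
def fake_detect(frame):
    return [(0, 0, 10, 10)] if frame == "frame with face" else []


def fake_classify(frame, box):
    return ("with_mask", 0.9)


def fake_record(frame, box, label, confidence):
    print(label, confidence, box)


count = watch_stream(
    ["empty frame", "frame with face"], fake_detect, fake_classify, fake_record
)
print(count)
```

Keeping the three steps as separate functions makes it easy to swap in a different cascade file or a different model later without touching the loop itself.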
If you run the code from this article, you will see that it does a reasonable job of classifying people with and without masks. However, it often picks up random areas of the frame as either ‘mask’ or ‘no mask’ predictions, leading to a high number of false observations.
This is because the default Haar Cascade we are using was trained for frontal face positions. There are other trained classifiers, including haarcascade_profileface.xml, which may offer marginal improvements in performance. Note: the profileface.xml classifier linked above was trained on left-side profiles.
To improve classification performance, it would be interesting to train a custom Haar Classifier to detect some combination of profile and frontal facial positions. Stay tuned!
The GitHub repo can be found here.