Creating, comparing, and storing histograms
A grey-green colour that often finds itself on the walls of public institutions—e.g., hospitals, schools, government buildings—and, where appropriated, on sundry supplies and equipment.
-- "Institutional green", Segen's Medical Dictionary (2012)
I hesitate to make sweeping statements about the ideal color of paint on a wall. It depends. I have found solace in many walls of many colors. My mother is a painter and I like paint in general.
But not all color is paint. Some color is dirt. Some color is concrete or marble; plywood or mahogany. Some color is the sky through big windows, the ocean, the golf course, or the swimming pool or jacuzzi. Some color is discarded plastics and beer bottles, baked food on the stove, or perished vermin. Some color is unknown. Maybe the paint camouflages the dirt.
A typical camera can capture at least 16.7 million (256 * 256 * 256) distinct colors. For any given image, we can count the number of pixels of each color. This set of counts is called the color histogram of the image. Typically, most entries in the histogram will be 0 because most scenes are not polychromatic (many colored).
We can normalize the histogram by dividing the color counts by the total number of pixels. Since the number of pixels is factored out, normalized histograms are comparable even if the original images have different resolutions.
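As a minimal sketch of counting and normalizing, assume a tiny hypothetical color space of only 4 colors, where each pixel is already given as a color index. The function name normalized_histogram and the two sample images are illustrative assumptions, not part of the classifier built in this chapter.

```python
import numpy

def normalized_histogram(pixels, num_colors):
    """Count the pixels of each color index, then divide by the pixel count."""
    counts = numpy.bincount(pixels.ravel(), minlength=num_colors)
    return counts / float(counts.sum())

# Two hypothetical images with different resolutions but the same
# proportions of colors (half color 0, half color 3).
small = numpy.array([[0, 0],
                     [3, 3]])
large = numpy.array([[0, 0, 0, 0],
                     [3, 3, 3, 3],
                     [0, 3, 0, 3],
                     [3, 0, 3, 0]])

hist_small = normalized_histogram(small, 4)
hist_large = normalized_histogram(large, 4)
print(hist_small)
print(hist_large)
```

Because each histogram sums to 1, the two results match even though one image has 4 pixels and the other has 16.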
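Given a pair of normalized histograms, we can measure the histograms' similarity on a scale of 0 to 1. One measure of similarity is called the intersection of the histograms. Writing H_0 and H_1 for the two normalized histograms and i for a color index, it is computed as the sum, over all colors, of the smaller of the two histograms' values for that color:
$$\mathrm{intersection}(H_0, H_1) = \sum_i \min\bigl(H_0(i),\, H_1(i)\bigr)$$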
Here is the equivalent Python code (which we will optimize later):
    def intersection(hist0, hist1):
        assert len(hist0) == len(hist1), 'Histogram lengths are mismatched'
        result = 0
        for i in range(len(hist0)):
            result += min(hist0[i], hist1[i])
        return result
For example, suppose that in one image 50 percent of the pixels are black and 50 percent are white. In another image, 100 percent of the pixels are black. The similarity is:
min(50%, 100%) + min(50%, 0%) = 50% = 0.5.
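The arithmetic above can be checked with a short NumPy sketch. It assumes a hypothetical two-entry histogram in which index 0 is black and index 1 is white, and it uses numpy.minimum to take the element-wise minimum instead of looping.

```python
import numpy

def intersection(hist0, hist1):
    """Sum the element-wise minimum of two normalized histograms."""
    hist0 = numpy.asarray(hist0, dtype=float)
    hist1 = numpy.asarray(hist1, dtype=float)
    assert hist0.shape == hist1.shape, 'Histogram shapes are mismatched'
    return float(numpy.sum(numpy.minimum(hist0, hist1)))

# Hypothetical two-color histograms: [share of black, share of white].
half_black_half_white = [0.5, 0.5]
all_black = [1.0, 0.0]

similarity = intersection(half_black_half_white, all_black)
print(similarity)
```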
Note
Here, a similarity of 1 does not mean that the images are identical; it means that their normalized histograms are identical. Relative to the first image, the second image could be of a different size, could be flipped, or could even contain the same pixel values in a randomly different order.
Conversely, a similarity of 0 does not mean that the images look completely different to a layperson; it just means that they have no color values in common. For example, an image that is all black and another image that is all charcoal gray have histograms with a similarity of 0 by our definition.
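Both caveats can be illustrated with a small sketch. For brevity it assumes single-channel (grayscale) 8-bit images rather than the 3-channel images used elsewhere in this chapter, and the helper names, the image size, and the charcoal value of 54 are hypothetical.

```python
import numpy

def normalized_gray_histogram(image):
    """Normalized 256-bin histogram of an 8-bit, single-channel image."""
    counts = numpy.bincount(image.ravel(), minlength=256)
    return counts / float(counts.sum())

def intersection(hist0, hist1):
    return float(numpy.sum(numpy.minimum(hist0, hist1)))

rng = numpy.random.RandomState(0)
image = rng.randint(0, 256, size=(8, 8)).astype(numpy.uint8)

# The same pixel values in a randomly different order.
shuffled = rng.permutation(image.ravel()).reshape(8, 8)

# Two flat images that look similar but share no pixel values.
black = numpy.zeros((8, 8), numpy.uint8)
charcoal = numpy.full((8, 8), 54, numpy.uint8)

same_histogram = intersection(normalized_gray_histogram(image),
                              normalized_gray_histogram(shuffled))
no_overlap = intersection(normalized_gray_histogram(black),
                          normalized_gray_histogram(charcoal))
print(same_histogram, no_overlap)
```

The shuffled image is visually unrelated to the original yet scores a similarity of 1, while black and charcoal score 0.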
For the purpose of classifying images, we want to find the average similarity between a query histogram and a set of multiple reference histograms. A single reference histogram (and a single reference image) would be much too specific for a broad classification such as "Luxury, interior".
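A minimal sketch of this averaging step follows. The three-entry histograms and the function name mean_similarity are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
import numpy

def intersection(hist0, hist1):
    return float(numpy.sum(numpy.minimum(hist0, hist1)))

def mean_similarity(query_hist, reference_hists):
    """Average the intersection of the query with each reference."""
    total = 0.0
    for reference_hist in reference_hists:
        total += intersection(query_hist, reference_hist)
    return total / len(reference_hists)

query = numpy.array([0.5, 0.5, 0.0])
references = [
    numpy.array([1.0, 0.0, 0.0]),    # intersection with query: 0.5
    numpy.array([0.0, 0.0, 1.0]),    # intersection with query: 0.0
    numpy.array([0.5, 0.25, 0.25]),  # intersection with query: 0.75
]

average = mean_similarity(query, references)
print(average)
```

No single reference matches the query well, but the average (1.25 / 3) summarizes how close the query is to the set as a whole.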
Note
Although we will focus on one approach to compare histograms, there are many alternatives. For a discussion of several algorithms and their implementations in Python, see this blog post by Adrian Rosebrock: http://www.pyimagesearch.com/2014/07/14/3-ways-compare-histograms-using-opencv-python/.
Let's write a class called HistogramClassifier, which creates and stores sets of reference histograms and finds the average similarity between a query histogram and each set of reference histograms. To support this functionality, we will use OpenCV, NumPy, and SciPy. Create a file called HistogramClassifier.py and add the following import statements at the top:

    import numpy
    import cv2
    import scipy.io
    import scipy.sparse
An instance of HistogramClassifier stores several variables. A public Boolean called verbose controls the level of logging. A public float called minimumSimilarityForPositiveLabel defines a similarity threshold; if none of the average similarities exceeds this value, then the query image is given an "Unknown" classification. Several variables store values related to the color space. We will assume that our images have 3 color channels with 8 bits (256 possible values) per channel. Finally, and most importantly, a dictionary called _references maps string keys such as "Luxury, interior" to lists of reference histograms. Let's declare the variables in the __init__ method belonging to HistogramClassifier, as follows:

    class HistogramClassifier(object):

        def __init__(self):
            self.verbose = False
            self.minimumSimilarityForPositiveLabel = 0.075
            self._channels = list(range(3))
            self._histSize = [256] * 3
            # cv2.calcHist treats the upper bound of each range as
            # exclusive, so 256 is needed to include the value 255.
            self._ranges = [0, 256] * 3
            self._references = {}
Note
By convention, in a Python class, a variable or method name is prefixed with an underscore if the variable or method is meant to be protected (accessed only within the class and its subclasses). However, this level of protection is not actually enforced. Most of our member variables and methods in this book are marked as protected, but a few are public. Python supports private variables and methods (denoted by a double underscore prefix) that are meant to be inaccessible even to subclasses. However, we will avoid private variables and methods in this book because Python classes should typically be highly extensible.
HistogramClassifier has a method, _createNormalizedHist, which takes two arguments: an image and a Boolean value indicating whether to store the resulting histogram in a sparse (compressed) format. The histogram is computed using an OpenCV function, cv2.calcHist. As arguments, it takes a list of images, the indices of the channels to use, an optional mask (we pass None to use every pixel), the histogram size (that is, the dimensions of the color space), and the range of each color channel. We will flatten the resulting histogram into a single-column format that can be stored more efficiently. Then, optionally, we will convert the histogram to a sparse format using a SciPy function called scipy.sparse.csc_matrix.
Note
A sparse matrix uses a form of compression that relies on a default value, normally 0. That is to say, we do not store all the zeroes individually; instead, we store only the nonzero values along with their positions. For histograms, this is an important optimization because in a typical image, most of the possible colors are absent. Thus, most of the histogram values are 0.
Compared to an uncompressed format, a sparse format offers better memory efficiency but worse computational efficiency. The same tradeoff applies to compressed formats in general.
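The memory side of this tradeoff can be sketched as follows. To keep the example light, it assumes a hypothetical 4096-entry histogram instead of the full 16,777,216 entries, with only three nonzero values.

```python
import numpy
import scipy.sparse

# A hypothetical one-column histogram that is almost entirely zero.
dense = numpy.zeros((4096, 1), numpy.float32)
dense[[0, 100, 4095], 0] = [0.5, 0.25, 0.25]

# Compress: only the nonzero values and their positions are kept.
compressed = scipy.sparse.csc_matrix(dense)
compressed_bytes = (compressed.data.nbytes +
                    compressed.indices.nbytes +
                    compressed.indptr.nbytes)

# Decompress back to a dense NumPy array.
restored = compressed.toarray()

print(compressed.nnz, dense.nbytes, compressed_bytes)
```

The compressed form needs far fewer bytes, but it must be decompressed (or handled with sparse-aware operations) before element-wise arithmetic such as numpy.minimum can be applied.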
Here is the implementation of _createNormalizedHist:

        def _createNormalizedHist(self, image, sparse):
            # Create the histogram.
            hist = cv2.calcHist([image], self._channels, None,
                                self._histSize, self._ranges)
            # Normalize the histogram.
            hist[:] = hist * (1.0 / numpy.sum(hist))
            # Convert the histogram to one column for efficient storage.
            hist = hist.reshape(16777216, 1)
            if sparse:
                # Convert the histogram to a sparse matrix.
                hist = scipy.sparse.csc_matrix(hist)
            return hist
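To see what the reshape step does to the histogram's layout, here is a NumPy-only sketch. It assumes a three-dimensional histogram indexed by the three channel values and uses a hypothetical 4 bins per channel (instead of 256) so that the numbers stay small.

```python
import numpy

bins = 4  # Hypothetical; the classifier above uses 256 bins per channel.
hist3d = numpy.zeros((bins, bins, bins), numpy.float32)

# Pretend every pixel has channel values (1, 2, 3).
color = (1, 2, 3)
hist3d[color] = 1.0

# Flatten to one column, as _createNormalizedHist does.
flat = hist3d.reshape(bins ** 3, 1)

# With NumPy's default row-major order, the last channel varies fastest.
flat_index = (color[0] * bins + color[1]) * bins + color[2]
print(flat_index, flat[flat_index, 0])
```

Under these assumptions, each color maps to exactly one row of the column, so two flattened histograms can be compared entry by entry.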
A public method, addReference, accepts two arguments: an image and a label. (The label is a string that describes the classification.) We will pass the image to _createNormalizedHist in order to create a normalized histogram in a sparse format. For a reference histogram, the sparse format is more appropriate because we want to keep many reference histograms in memory for the entire duration of a classification session. After creating the histogram, we will add it to a list in _references using the label as the key. Here is the implementation of addReference:

        def addReference(self, image, label):
            hist = self._createNormalizedHist(image, True)
            if label not in self._references:
                self._references[label] = [hist]
            else:
                self._references[label] += [hist]
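The grouping of histograms by label can also be sketched in isolation. The following is one possible variation, not the chapter's implementation: it uses dict.setdefault to create the list on first use, and the function name add_reference and the placeholder strings standing in for histograms are hypothetical.

```python
def add_reference(references, hist, label):
    """Append hist to the list stored under label, creating it if needed."""
    references.setdefault(label, []).append(hist)

references = {}
# Placeholder strings stand in for real histograms here.
add_reference(references, 'hist A', 'Luxury, interior')
add_reference(references, 'hist B', 'Luxury, interior')
add_reference(references, 'hist C', 'Some other label')
print(references)
```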
For the purposes of Luxocator, reference images come from files on disk. Let's give HistogramClassifier a public method, addReferenceFromFile, which accepts a file path instead of directly accepting an image. It also accepts a label. We will load the image from a file using an OpenCV function called cv2.imread, which accepts a path and a color format. Based on our earlier assumption about having 3 color channels, we always want to load images in color, not grayscale. This option is represented by the value cv2.IMREAD_COLOR (older OpenCV 2.x code also spells it cv2.CV_LOAD_IMAGE_COLOR). Having loaded the image, we will pass it and the label to addReference. The implementation of addReferenceFromFile is as follows:

        def addReferenceFromFile(self, path, label):
            image = cv2.imread(path, cv2.IMREAD_COLOR)
            self.addReference(image, label)
Now, we have arrived at the crux of the matter: the classify public method, which accepts a query image as well as an optional string to identify the image in log output. For each set of reference histograms, we will compute the average similarity to the query histogram. If no average similarity exceeds minimumSimilarityForPositiveLabel, we will return the 'Unknown' label. Otherwise, we will return the label of the most similar set of reference histograms. If verbose is True, we will also log all the labels and their respective average similarities. Here is the method's implementation:

        def classify(self, queryImage, queryImageName=None):
            queryHist = self._createNormalizedHist(queryImage, False)
            bestLabel = 'Unknown'
            bestSimilarity = self.minimumSimilarityForPositiveLabel
            if self.verbose:
                print('================================================')
                if queryImageName is not None:
                    print('Query image:')
                    print(' %s' % queryImageName)
                print('Mean similarity to reference images by label:')
            for label, referenceHists in self._references.items():
                similarity = 0.0
                for referenceHist in referenceHists:
                    similarity += numpy.sum(numpy.minimum(
                        referenceHist.todense(), queryHist))
                similarity /= len(referenceHists)
                if self.verbose:
                    print(' %8f %s' % (similarity, label))
                if similarity > bestSimilarity:
                    bestLabel = label
                    bestSimilarity = similarity
            if self.verbose:
                print('================================================')
            return bestLabel
Note the use of the todense method to decompress a sparse matrix.
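The label-selection rule can be examined on its own, separately from the histogram arithmetic. The sketch below assumes that the mean similarities have already been computed; the function name pick_label, the second label, and the similarity values are hypothetical.

```python
def pick_label(mean_similarities, minimum_similarity=0.075):
    """Return the label with the highest mean similarity, or 'Unknown'
    if no mean similarity exceeds the threshold."""
    best_label = 'Unknown'
    best_similarity = minimum_similarity
    for label, similarity in mean_similarities.items():
        if similarity > best_similarity:
            best_label = label
            best_similarity = similarity
    return best_label

confident = pick_label({'Luxury, interior': 0.21, 'Some other label': 0.09})
uncertain = pick_label({'Luxury, interior': 0.05, 'Some other label': 0.02})
print(confident, uncertain)
```

Initializing the best similarity to the threshold means that a label must strictly beat the threshold before it can replace 'Unknown'.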
We will also provide a public method, classifyFromFile, which accepts a file path instead of directly accepting an image. The following code defines this method:

        def classifyFromFile(self, path, queryImageName=None):
            if queryImageName is None:
                queryImageName = path
            queryImage = cv2.imread(path, cv2.IMREAD_COLOR)
            return self.classify(queryImage, queryImageName)
Computing all our reference histograms will take a bit of time. We do not want to recompute them every time we run Luxocator. Thus, we need to serialize and deserialize (save and load) the histograms to/from the disk. For this purpose, SciPy provides two functions, scipy.io.savemat and scipy.io.loadmat. The former accepts a file (or file name) and a dictionary of variables to save, while the latter accepts a file (or file name); both also take various optional arguments.
We can implement a serialize method with optional compression, as follows:

        def serialize(self, path, compressed=False):
            # The with statement closes the file when we are done.
            with open(path, 'wb') as file:
                scipy.io.savemat(
                    file, self._references, do_compression=compressed)
While deserializing, we will get a dictionary from scipy.io.loadmat. However, this dictionary contains more than our original _references dictionary. It also contains some serialization metadata and some additional arrays that wrap the lists that were originally in _references. We will strip out these unwanted added contents and store the result back in _references. The implementation is as follows:

        def deserialize(self, path):
            with open(path, 'rb') as file:
                self._references = scipy.io.loadmat(file)
            # Iterate over a copy of the keys so that we can safely
            # delete entries from the dictionary as we go.
            for key in list(self._references.keys()):
                value = self._references[key]
                if not isinstance(value, numpy.ndarray):
                    # This entry is serialization metadata so delete it.
                    del self._references[key]
                    continue
                # The serializer wraps the data in an extra array.
                # Unwrap the data.
                self._references[key] = value[0]
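The extra contents that loadmat returns can be seen in a small round trip. This sketch writes to an in-memory buffer instead of a file on disk, and the variable name counts and its values are hypothetical.

```python
import io

import numpy
import scipy.io

# Save one small array to an in-memory buffer and load it back.
buffer = io.BytesIO()
scipy.io.savemat(buffer, {'counts': numpy.array([0.5, 0.25, 0.25])})
buffer.seek(0)
loaded = scipy.io.loadmat(buffer)

# Separate the serialization metadata from our own data.
metadata_keys = sorted(key for key in loaded if key.startswith('__'))
data_keys = sorted(key for key in loaded if not key.startswith('__'))

print(metadata_keys)
print(data_keys, loaded['counts'].shape)
```

The one-dimensional array comes back with an extra leading dimension, which is why value[0] recovers the original data, and the keys that begin with a double underscore are the metadata entries that deserialize discards.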
That is our classifier. Next, we will test our classifier by feeding it some reference images and a query image.