
# [Solved]: How are HOGs calculated?

Problem Detail:

Histograms of oriented gradients for human detection seems to be the paper that all other papers cite when they use HOG features. However, this was the only description I could find in it:

[...] In practice this is implemented by dividing the image window into small spatial regions ("cells"), for each cell accumulating a local 1-D histogram of gradient directions or edge orientations over the pixels of the cell. The combined histogram entries form the representation. For better invariance to illumination, shadowing, etc., it is also useful to contrast-normalize the local responses before using them. This can be done by accumulating a measure of local histogram "energy" over somewhat larger spatial regions ("blocks") and using the results to normalize all of the cells in the block. We will refer to the normalized descriptor blocks as Histogram of Oriented Gradient (HOG) descriptors.

It is not clear to me how the HOG features are now calculated:

• How big are the cells? How is that size chosen?
• What is a 1-D histogram?
• What is a histogram of directions?

Suppose we have the following 5x5 patch of a grayscale image (I hope that is a normal size for a cell - if not, please just copy this block as often as necessary or give another example):

000 025 255 016 200
000 255 255 017 201
010 012 210 012 111
000 000 000 000 000
255 254 255 254 255

What would be the HOG features of it?

(Can you give a citable resource for the description?)

###### Asked By: Martin Thoma

First, let's talk about what a histogram of directions is. You can think of the image as a 2D discrete function of x and y: I(x,y). You can take partial derivatives of this function: Ix and Iy. So at each pixel you have the gradient, which is the vector (Ix(x,y), Iy(x,y)). If you compute the magnitude of the gradient, and look at it as an image, you will see the edge map. If you compute the orientation of the gradient at each pixel, you get the edge orientation at that pixel. This is what you take a histogram of.
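As a minimal sketch of this step, here is how you could compute the per-pixel gradient magnitude and orientation of the 5x5 patch from the question with NumPy (using `np.gradient`, i.e. central differences in the interior and one-sided differences at the border; real HOG implementations often use the simple [-1, 0, 1] filter instead):

```python
import numpy as np

# The 5x5 example patch from the question.
I = np.array([
    [  0,  25, 255,  16, 200],
    [  0, 255, 255,  17, 201],
    [ 10,  12, 210,  12, 111],
    [  0,   0,   0,   0,   0],
    [255, 254, 255, 254, 255],
], dtype=float)

# np.gradient returns derivatives along axis 0 (rows, y) and axis 1 (cols, x).
Iy, Ix = np.gradient(I)

magnitude = np.hypot(Ix, Iy)                   # edge strength at each pixel
orientation = np.degrees(np.arctan2(Iy, Ix))   # edge direction at each pixel
```

Viewing `magnitude` as an image gives the edge map; `orientation` holds the angles that the histogram is built from.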

You divide the range of orientations into some number of bins (let's say 8). Then you look at a single cell. Inside the cell you compute the gradient of every pixel, take its orientation, and add it to the appropriate histogram bin. You can also weigh the contributions by the gradient magnitude, so that pixels with stronger edges contribute more to the histogram. Once you are done, the histogram encodes the distribution of edge orientations within the cell. Dividing the image into cells preserves some of the spatial layout of the edges.
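The binning described above can be sketched like this (an assumed, simplified version: unsigned orientations folded into 0-180 degrees and hard binning, without the interpolated voting between neighboring bins that Dalal and Triggs use):

```python
import numpy as np

def cell_histogram(patch, n_bins=8):
    """Magnitude-weighted histogram of unsigned gradient orientations (0-180 deg)."""
    Iy, Ix = np.gradient(patch.astype(float))
    magnitude = np.hypot(Ix, Iy)
    orientation = np.degrees(np.arctan2(Iy, Ix)) % 180.0  # fold into [0, 180)
    # Each pixel votes for the bin of its orientation, weighted by its magnitude.
    hist, _ = np.histogram(orientation, bins=n_bins,
                           range=(0.0, 180.0), weights=magnitude)
    return hist

# Applied to the 5x5 patch from the question:
patch = np.array([
    [  0,  25, 255,  16, 200],
    [  0, 255, 255,  17, 201],
    [ 10,  12, 210,  12, 111],
    [  0,   0,   0,   0,   0],
    [255, 254, 255, 254, 255],
])
hist = cell_histogram(patch)
```

Each of the 8 entries of `hist` is the total gradient magnitude of the pixels whose edge orientation fell into that 22.5-degree bin.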

The size of the cell is a parameter, which you can tune. Typically, it is set to 8x8. If the cell is too small, then you do not have enough pixels for a meaningful histogram. If the cell is too large, then you lose too much of the spatial information.
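The contrast normalization over "blocks" mentioned in the quoted paper text can be sketched as follows (assuming the common L2 scheme; the block size, like the cell size, is a tunable parameter, typically 2x2 cells):

```python
import numpy as np

def l2_normalize_block(cell_hists, eps=1e-6):
    """Concatenate the histograms of a block's cells and L2-normalize the result."""
    v = np.concatenate(cell_hists).astype(float)
    return v / np.sqrt(np.dot(v, v) + eps)  # eps guards against division by zero

# e.g. a 2x2 block of cells, each contributing an 8-bin histogram -> 32 values
block_descriptor = l2_normalize_block(
    [np.ones(8), np.zeros(8), np.ones(8), np.ones(8)]
)
```

Concatenating these normalized block descriptors over the whole detection window yields the final HOG feature vector.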