For a long time, one task has eluded computers and software even though a five-year-old performs it masterfully: object detection from visual input. Its applications range from locating tumours in X-ray images to detecting pedestrians on the street and faces on social media. However, before the advent of neural networks, particularly convolutional neural networks, object recognition algorithms tended to be highly specialized and required a lot of manual tuning.

Object detection requires solving two problems: localizing objects and recognizing (classifying) them (see Figure 1). The latter can be handled by a classifier; to learn how to make such classifiers robust, see our project on adversarial examples. This series focuses on the localization part.

Figure 1: The two tasks in object detection. Localization is visualized using bounding boxes, which in this case are rectangles.
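The split into localization and classification can be sketched as a two-stage pipeline: one function proposes bounding boxes, and another labels the pixels inside each box. This is a minimal illustration, not a method from the cited works; the box format `(x_min, y_min, x_max, y_max)`, the grayscale grid representation, and the helpers `crop`, `detect`, `toy_localize` and `toy_classify` are all hypothetical.

```python
from typing import Callable, List, Tuple

Box = Tuple[int, int, int, int]  # hypothetical format: (x_min, y_min, x_max, y_max), max exclusive
Image = List[List[int]]          # grayscale image: rows of pixel intensities


def crop(image: Image, box: Box) -> Image:
    """Return the sub-grid of pixels covered by the box."""
    x0, y0, x1, y1 = box
    return [row[x0:x1] for row in image[y0:y1]]


def detect(
    image: Image,
    localize: Callable[[Image], List[Box]],
    classify: Callable[[Image], str],
) -> List[Tuple[Box, str]]:
    """Localize candidate boxes, then classify the pixels inside each one."""
    return [(box, classify(crop(image, box))) for box in localize(image)]


# Hypothetical stand-ins for the two sub-tasks, for illustration only.
def toy_localize(image: Image) -> List[Box]:
    return [(0, 0, 2, 2), (2, 0, 4, 2)]


def toy_classify(patch: Image) -> str:
    return "bright" if sum(map(sum, patch)) > 10 else "dark"


toy_image = [[0, 0, 9, 9],
             [0, 0, 9, 9]]
print(detect(toy_image, toy_localize, toy_classify))
# → [((0, 0, 2, 2), 'dark'), ((2, 0, 4, 2), 'bright')]
```

Keeping the two stages behind separate callables means either one can be swapped out independently, for example a hand-tuned localizer for a learned one.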

Recognizing objects in images is difficult because what distinguishes one object from another varies greatly. In some cases colour separates objects; in others texture is more useful, or a combination of both is needed [1]. Moreover, object recognition algorithms must extract such features from nothing more than a grid of pixel intensities.
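As an example of turning raw pixel values into a colour feature, Selective Search [1] compares regions using colour histograms (25 bins per colour channel) and histogram intersection. The sketch below follows that idea under simplifying assumptions: pixels are plain `(R, G, B)` tuples in 0..255, the three channel histograms are concatenated and L1-normalised as a whole, and the function names are illustrative rather than taken from [1] or any library.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]  # assumed representation: (R, G, B), each in 0..255

BINS_PER_CHANNEL = 25  # bin count used for colour histograms in [1]


def colour_histogram(pixels: Sequence[Pixel], bins: int = BINS_PER_CHANNEL) -> List[float]:
    """Concatenate one histogram per channel, then L1-normalise the whole vector."""
    hist = [0.0] * (3 * bins)
    for pixel in pixels:
        for channel, value in enumerate(pixel):
            # Map intensities 0..255 onto bins 0..bins-1 of this channel.
            hist[channel * bins + value * bins // 256] += 1
    total = sum(hist)
    return [h / total for h in hist] if total else hist


def histogram_intersection(a: Sequence[float], b: Sequence[float]) -> float:
    """Similarity in [0, 1]: the overlap of two L1-normalised histograms."""
    return sum(min(x, y) for x, y in zip(a, b))


red = [(250, 10, 10)] * 4
green = [(10, 250, 10)] * 4
print(histogram_intersection(colour_histogram(red), colour_histogram(red)))    # → 1.0
print(histogram_intersection(colour_histogram(red), colour_histogram(green)))  # ≈ 0.333: only blue overlaps
```

A texture feature would be built the same way, just from local gradient statistics instead of raw intensities.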

Around 90 years ago, the Gestalt psychologists found that humans perceive scenes by grouping visual elements according to principles such as proximity and similarity, and that this grouping is hierarchical [2]. Object detection and segmentation algorithms often exploit this structure [1], [3], [4].
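One way to exploit hierarchical structure is bottom-up grouping, in the spirit of the hierarchical grouping used by Selective Search [1]: start from small initial regions and repeatedly merge the most similar pair of adjacent regions, recording every region created along the way. The sketch below assumes each initial region is summarised by its mean intensity and pixel count and uses absolute intensity difference as the (dis)similarity; that measure and the toy one-dimensional strip are illustrative assumptions, not the measures from [1].

```python
from typing import Dict, FrozenSet, List, Set


def hierarchical_grouping(
    means: Dict[int, float],
    sizes: Dict[int, int],
    neighbours: Set[FrozenSet[int]],
) -> List[FrozenSet[int]]:
    """Greedily merge the most similar adjacent regions until no pairs are left.

    means/sizes describe the initial regions (e.g. from an oversegmentation);
    neighbours holds frozenset({a, b}) for every pair of adjacent regions.
    Returns every region in the hierarchy as a set of initial region ids.
    """
    means, sizes = dict(means), dict(sizes)  # copy: merged regions get added
    members = {r: frozenset([r]) for r in means}
    hierarchy = list(members.values())
    pairs = set(neighbours)
    next_id = max(means) + 1

    def dissimilarity(pair: FrozenSet[int]) -> float:
        a, b = pair
        return abs(means[a] - means[b])  # assumed measure: intensity difference

    while pairs:
        # Pick the most similar adjacent pair; sorted ids break ties deterministically.
        best = min(pairs, key=lambda p: (dissimilarity(p), sorted(p)))
        a, b = best
        t, next_id = next_id, next_id + 1
        sizes[t] = sizes[a] + sizes[b]
        means[t] = (means[a] * sizes[a] + means[b] * sizes[b]) / sizes[t]
        members[t] = members[a] | members[b]
        hierarchy.append(members[t])
        # The merged region inherits the neighbours of both of its parts.
        inherited = {n for p in pairs if p & best for n in p} - best
        pairs = {p for p in pairs if not p & best}
        pairs |= {frozenset((t, n)) for n in inherited}
    return hierarchy


# Toy example: a 1-D strip of regions 0-1-2-3, two dark and two bright.
strip = hierarchical_grouping(
    means={0: 10.0, 1: 12.0, 2: 50.0, 3: 55.0},
    sizes={0: 1, 1: 1, 2: 1, 3: 1},
    neighbours={frozenset((0, 1)), frozenset((1, 2)), frozenset((2, 3))},
)
print([sorted(r) for r in strip])
# → [[0], [1], [2], [3], [0, 1], [2, 3], [0, 1, 2, 3]]
```

Every entry in the returned hierarchy is a candidate object at some scale: here the two dark regions and the two bright regions are each grouped before the whole strip is.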


References

[1]   Uijlings, J. R. R., van de Sande, K. E. A., Gevers, T., & Smeulders, A. W. M. (2013). Selective Search for Object Recognition. International Journal of Computer Vision, 104(2), 154–171. https://doi.org/10.1007/s11263-013-0620-5

[2]   Wertheimer, M. (1938). Laws of organization in perceptual forms (partial translation). In W. D. Ellis (Ed.), A Sourcebook of Gestalt Psychology (pp. 71–88). Harcourt, Brace and Company.

[3]   Shi, J., & Malik, J. (2000). Normalized Cuts and Image Segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(8), 888–905.

[4]   Comaniciu, D., & Meer, P. (2002). Mean shift: A robust approach toward feature space analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(5), 603–619. https://doi.org/10.1109/34.1000236