Accuracy - A measure of model performance. For classification tasks (binary or multi-class), the proportion of correct predictions out of the total number of predictions.
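
As a minimal sketch of the definition above, assuming labels and predictions are given as two equal-length Python lists (the function name `accuracy` is chosen here for illustration):

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        return 0.0  # assumption: report 0.0 when there is nothing to score
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)
```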

Anchor Box - An initial bounding box used in object detection which is then normally altered to better fit the object. Its size and aspect ratio are either set manually or determined by k-means clustering. See Faster R-CNN for more details.

Average Pooling - An operation used to condense information within a region of an image by taking the average value within that region. When taking the maximum instead of the average, this is called max pooling. See ConvNet Basics for more details.
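
The two pooling variants can be sketched as follows. This is an illustration only: it assumes a 2D image stored as a list of lists, square non-overlapping windows, and a hypothetical helper name `pool2d`.

```python
def pool2d(image, size=2, mode="avg"):
    """Pool a 2D list of numbers with non-overlapping size x size windows."""
    rows, cols = len(image), len(image[0])
    out = []
    for r in range(0, rows - size + 1, size):
        out_row = []
        for c in range(0, cols - size + 1, size):
            # Collect every value inside the current window.
            window = [image[r + i][c + j] for i in range(size) for j in range(size)]
            if mode == "max":
                out_row.append(max(window))                 # max pooling
            else:
                out_row.append(sum(window) / len(window))   # average pooling
        out.append(out_row)
    return out
```

Rows or columns that do not fill a complete window are dropped in this sketch; real libraries usually offer padding options for that case.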

Average Precision (AP) - A measure of the performance of an object detection algorithm by calculating the area under the precision-recall curve. See metrics section for more details.
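
One way to approximate the area under the precision-recall curve is sketched below. It assumes each detection has already been marked as a true or false positive (for example by an IoU threshold) and uses the plain step-wise sum without interpolation. Benchmarks differ in the exact interpolation they use, so treat this as one convention among several; all names are illustrative.

```python
def average_precision(scores, is_true_positive, num_ground_truth):
    """Step-wise area under the precision-recall curve for one class.

    scores: confidence of each detection.
    is_true_positive: 1 if the detection matched a ground-truth object, else 0.
    num_ground_truth: total number of ground-truth objects of this class.
    """
    if num_ground_truth == 0:
        return 0.0
    # Visit detections from most to least confident.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    true_positives = 0
    area = 0.0
    for rank, i in enumerate(order, start=1):
        if is_true_positive[i]:
            true_positives += 1
            # Recall rises by 1 / num_ground_truth here; weight that step
            # by the precision at this rank.
            area += (true_positives / rank) / num_ground_truth
    return area
```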

Batch Normalization - A technique where the input to a neural network layer is standardized by transforming each feature of the mini-batch to have its mean set to \(0\) and standard deviation set to \(1\). The standardized values are usually then rescaled and shifted by learnable parameters. See Batch Norm and ResNets for more details.
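
The standardization step can be sketched as below, assuming the mini-batch is a list of rows with one column per feature. The small constant `eps` (added for numerical stability) is an assumption of this sketch, and the learnable scale and shift used in practice are left out.

```python
import math

def batch_norm(batch, eps=1e-5):
    """Standardize each feature (column) of a mini-batch to mean 0, std 1."""
    n = len(batch)
    n_features = len(batch[0])
    # Per-feature mean and (biased) variance over the mini-batch.
    means = [sum(row[j] for row in batch) / n for j in range(n_features)]
    variances = [
        sum((row[j] - means[j]) ** 2 for row in batch) / n
        for j in range(n_features)
    ]
    return [
        [(row[j] - means[j]) / math.sqrt(variances[j] + eps) for j in range(n_features)]
        for row in batch
    ]
```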

Bounding Box - Rectangular box used to identify the locations of objects in object detection.

Convolutional Feature Map - Output of a convolutional layer or of a Fully Convolutional Network. Has a height, a width, and a depth equal to the number of kernels in the layer that produced it. Used for object detection in the R-CNN family of networks.

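Convex Optimization Problem - Optimization problem in which a convex objective function is minimized over a convex feasible set. Here, any local minimum is also a global minimum, so it can be approximately located by picking a starting point and then descending along the negative gradient. Many classes of convex problems can be solved to a given accuracy in polynomial time.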

Corner - An intersection of two or more edges.

Dropout - A regularization technique in which randomly chosen neurons are temporarily removed (their outputs are set to zero) at each training step. The full network is used at inference time.
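
A minimal sketch of dropout on a list of activations follows. It uses the common "inverted dropout" variant, which rescales the surviving activations during training so nothing changes at inference. That rescaling, the drop probability `p`, and the `rng` parameter are assumptions of this sketch rather than part of the definition above.

```python
import random

def dropout(activations, p=0.5, training=True, rng=random):
    """Zero each activation with probability p (0 <= p < 1) during training."""
    if not training or p == 0.0:
        return list(activations)  # inference: all neurons are kept
    keep = 1.0 - p
    # Kept activations are scaled by 1 / keep so the expected sum is unchanged.
    return [a / keep if rng.random() < keep else 0.0 for a in activations]
```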

Edge - A type of feature in an image where there is a substantial change in pixel values between neighbouring pixels.

Feature - In computer vision, an intermediate output of a neural network or machine learning algorithm. It could be a colour, an edge, the presence of a wheel or face, or a combination of these. Generally, it is a useful input for a machine learning model.

Feature Pyramid - Similar to an Image Pyramid, but consisting of feature maps at multiple scales instead of images.

Fully Convolutional Network (FCN) - A ConvNet without fully connected layers.

Image Pyramid - A multi-scale image representation in which an image is repeatedly smoothed and subsampled to create the different levels of the pyramid.

Intersection over Union (IoU) - A measure to quantify the overlap of two shapes, for example bounding boxes. Calculated by dividing the area of overlap (the intersection) by the total area covered by both shapes (the union). Ranges from \(0\) (no overlap) to \(1\) (identical shapes).
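
For axis-aligned bounding boxes the calculation can be sketched as below. The `(x1, y1, x2, y2)` corner format is an assumption of this example; other box formats need converting first.

```python
def iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    # Corners of the overlapping rectangle, if there is one.
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])

    # Clamp at zero so boxes that do not overlap give an intersection of 0.
    intersection = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0
```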

K-Means Clustering - An unsupervised clustering algorithm in which a number of centroids (k) is chosen and each datapoint is assigned to its closest centroid. The centroids are then iteratively moved to minimize the sum of the squared distances between each centroid and the datapoints in its cluster. The datapoints may change clusters as the centroids are moved.
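
The assign-then-move loop can be sketched as follows. Starting from the first k points and running a fixed number of iterations are simplifying assumptions of this sketch; practical implementations usually use random or k-means++ initialization and a convergence check.

```python
def kmeans(points, k, iterations=10):
    """Basic k-means on a list of equal-length tuples.

    Returns (centroids, labels), where labels[i] is the cluster of points[i].
    """
    centroids = [tuple(p) for p in points[:k]]  # assumption: naive initialization
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its closest centroid.
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its cluster.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, labels
```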

Kernel - Also called a filter. It is a grid of weights, for example used in convolutions to summarize spatial information in an area of an image. See ConvNet Basics.

Max Pooling - See Average Pooling.

Mean Average Precision (mAP) - AP averaged over all object classes. Sometimes referred to as just AP based on context.

Non-Max Suppression - Algorithm used to remove redundant bounding boxes based on an IoU threshold.
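
A greedy form of the algorithm can be sketched as below: keep the highest-scoring box, drop every remaining box that overlaps it by more than the threshold, and repeat. The box format `(x1, y1, x2, y2)`, the default threshold of 0.5, and the function names are assumptions of this example.

```python
def iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Return the indices of the boxes to keep, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)  # most confident box still under consideration
        keep.append(best)
        # Discard boxes that overlap the chosen box too much.
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep
```

In multi-class detection this is typically applied separately for each class.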

Padding - Values added around the boundaries of an image or feature map to control the size and shape of the output during a convolution or other operation. Normally these values are zeroes (zero padding), but can be other values too.

Precision - Ratio of true positives over the predicted positives. Tells you how often the positive predictions are correct.

Recall - Ratio of true positives over all positives which exist in the dataset. Tells you what fraction of the actual positives the predictions catch.
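
Precision and recall can be computed together from the same counts. The sketch below assumes binary labels encoded as 1 (positive) and 0 (negative), and returns 0.0 when a ratio is undefined; both are choices made for this example.

```python
def precision_recall(y_true, y_pred):
    """Return (precision, recall) for binary labels encoded as 0 and 1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```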

Region Proposal - Also called Region of Interest (RoI). Term used in the R-CNN family of networks for regions which could contain objects. RPs are computed by either a dedicated algorithm like Selective Search or the Region Proposal Network.

Regularization - The process of using techniques that prevent overfitting to help generalize the model or algorithm to data outside of the training dataset.

ResNet - Short for Residual Network. Often used to mitigate the vanishing gradient problem, which makes it possible to train very deep neural networks with hundreds of layers. Works by creating "identity shortcut connections" that add the input from one or more layers earlier to the output of the current layer. See Batch Norm and ResNets for more details.
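
In a residual block, the shortcut is combined with the layer output by element-wise addition. A minimal sketch on plain lists follows. The `layer_fn` argument stands in for whatever layers the block wraps and is hypothetical, and applying ReLU after the addition follows the original ResNet design but is an assumption here.

```python
def residual_block(x, layer_fn):
    """Compute relu(layer_fn(x) + x) element-wise on a list of numbers."""
    fx = layer_fn(x)  # output of the wrapped layers; must match len(x)
    if len(fx) != len(x):
        raise ValueError("layer output and shortcut must have the same size")
    # Identity shortcut: add the unchanged input to the layer output.
    return [max(0.0, f + s) for f, s in zip(fx, x)]
```

If `layer_fn` outputs zeros, the block simply passes its (rectified) input through, which is part of why very deep stacks of such blocks remain trainable.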

Region of Interest (RoI) - See Region Proposal.

Selective Search - Algorithm to segment an image. Used in earlier R-CNN networks to generate Region Proposals.

Support Vector Machine - A type of supervised learning model that may be used for classification or regression. Works by creating a decision boundary to separate two classes of data and maximizing the margin between the decision boundary and the closest data points (the support vectors). Can be used to separate non-linearly separable data using the kernel trick, which implicitly maps the data into a higher-dimensional space where they may be linearly separable.

Stride - The number of pixels or units to slide over after completing a step of a convolution or other operation.
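
Stride and padding together determine the output size of a convolution or pooling operation. Along one dimension, with input size \(n\), kernel size \(k\), padding \(p\) on each side, and stride \(s\), the output size is \(\lfloor (n + 2p - k) / s \rfloor + 1\). The sketch below assumes symmetric padding and no dilation; the function name is illustrative.

```python
def conv_output_size(input_size, kernel_size, padding=0, stride=1):
    """Output length along one dimension of a convolution or pooling op."""
    if stride < 1:
        raise ValueError("stride must be at least 1")
    padded = input_size + 2 * padding  # padding is added on both sides
    if kernel_size > padded:
        raise ValueError("kernel does not fit inside the padded input")
    return (padded - kernel_size) // stride + 1
```

For example, a 3-wide kernel with padding 1 and stride 1 leaves the size unchanged, while stride 2 roughly halves it.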