The earliest and simplest method to generate adversarial examples is the Fast Gradient Sign Method (FGSM), introduced in Explaining and Harnessing Adversarial Examples by Goodfellow et al. This non-iterative method generates adversarial examples in a single step and, despite its simplicity, produces effective adversarial inputs. It computes the gradient of the cost function with respect to the input and takes a step of magnitude \(\epsilon\) in the direction of the sign of this gradient:

\begin{equation} \tag{1.1} \widetilde{X} = X + \eta \end{equation}

\begin{equation} \tag{1.2} \eta = \epsilon \, \operatorname{sign}(\nabla_{x} J(\Theta, x, y)) \end{equation}

Essentially, FGSM takes a single step that increases the cost function for the correct label, in the hope that this is enough to change the top prediction. The main benefit of this technique is that it takes relatively little computational time to create adversarial images.
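
To make equations (1.1) and (1.2) concrete, here is a minimal sketch of the core step in PyTorch. It assumes a pretrained classifier model, an input tensor x, and its true label y; the names fgsm_step, model, x, and y are hypothetical and only serve to illustrate the idea:

import torch
import torch.nn.functional as F

def fgsm_step(model, x, y, epsilon):
    ## Track the gradient with respect to the input, not the weights
    x = x.clone().detach().requires_grad_(True)

    ## Forward pass: cost J(theta, x, y) for the correct label y
    loss = F.cross_entropy(model(x), y)

    ## Backward pass yields the gradient of the cost w.r.t. the input x
    loss.backward()

    ## Equation (1.2): eta = epsilon * sign(grad); equation (1.1): x_tilde = x + eta
    eta = epsilon * x.grad.sign()
    return (x + eta).detach()

The full implementation below additionally accounts for the normalization and standardization applied during data preparation.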

One downside of FGSM is that the manipulated images are often perceptible to humans for anything but the smallest values of \(\epsilon\). One reason is that the method can only shift every pixel up or down by the same constant amount rather than by a seemingly random value. Additionally, manipulations using this technique are particularly noticeable in the darker areas of an image, because the magnitude of the perturbation is large relative to the original pixel values there. This can be improved by using iterative methods.

The notebook is available here.

We first load and preprocess the data as previously explained. The attack is implemented as:

def attack_FGSM(mean, std, image, epsilon, grad_x):
    '''
    Applies Fast Gradient Sign Method (FGSM) attack on the input image.
    
    Inputs:
    mean           -- Mean from data preparation
    std            -- Standard deviation from data preparation
    image          -- Image data as tensor of shape (1, 3, 224, 224)  
    epsilon        -- Hyperparameter for sign method. Has to be scaled to epsilon/255
    grad_x         -- Gradient obtained from prediction with image on model
    
    Returns:
    image_tilde    -- Adversarial image as tensor
    '''
    
    ## Calculate the normalized epsilon and convert it to a tensor   
    eps_normed = [epsilon/s for s in std]
    eps_normed = torch.tensor(eps_normed, dtype=torch.float).unsqueeze(-1).unsqueeze(-1)
    
    ## Compute eta part
    eta = eps_normed * grad_x.sign()

    ## Apply perturbation
    image_tilde = image + eta    
    
    ## Clip image to maintain the range [min, max]
    image_tilde = torch.clamp(image_tilde, image.detach().min(), image.detach().max())
    
    ## Calculate normalized range [0, 1] and convert them to tensors
    zero_normed = [-m/s for m,s in zip(mean, std)]
    zero_normed = torch.tensor(zero_normed, dtype=torch.float).unsqueeze(-1).unsqueeze(-1)
    
    max_normed = [(1-m)/s for m,s in zip(mean,std)]
    max_normed = torch.tensor(max_normed, dtype=torch.float).unsqueeze(-1).unsqueeze(-1)
    
    ## Clip image so after denormalization and destandardization, the range is [0, 255]
    image_tilde = torch.max(image_tilde, zero_normed)
    image_tilde = torch.min(image_tilde, max_normed)
    
    return image_tilde

In the data preparation step, the data is both normalized and standardized; therefore \(\epsilon\) has to be as well. \(\epsilon\) is normalized by dividing it by 255 before being passed to the function, and standardized inside the function by dividing it by the per-channel standard deviation of the data (the eps_normed computation above). The adversarial image is then clipped to ensure that its pixel values lie between 0 and 255 after destandardization and denormalization.
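
As a usage sketch, the gradient grad_x can be obtained from a forward and backward pass through the model before calling attack_FGSM. The snippet below is illustrative only: model, image, and label are hypothetical names, the mean and std values are the commonly used ImageNet normalization constants, and the raw budget of 8 pixel values is just an example:

import torch
import torch.nn.functional as F

## Assumed ImageNet normalization constants from the data preparation step
mean = [0.485, 0.456, 0.406]
std  = [0.229, 0.224, 0.225]

## Preprocessed input of shape (1, 3, 224, 224) and its true label
image.requires_grad_(True)
loss = F.cross_entropy(model(image), label)
loss.backward()
grad_x = image.grad

## Raw budget of 8 pixel values, scaled to epsilon/255 as required by the docstring
epsilon = 8 / 255
image_tilde = attack_FGSM(mean, std, image, epsilon, grad_x)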

We investigate how the FGSM attack performs in the Results section.