Iterative Least Likely Method

Both of the previous methods are untargeted attacks, so they only try to change the prediction to another class, most likely one similar to that of the original class which can lead to uninteresting results. For example, adversarially perturbation may only confuse the classifier from labelling a Husky as an Alaskan Malamute, two similar looking breeds of dog. With some small modifications to the Basic Iterative Method, it is possible to turn it into a targetted adversarial attack.

By changing the BIM algorthm to alter the image towards a specific target class instead of away from the correct class, it yields the Iterative Gradient Sign Method. A problem arises in studying the effectiveness of the algorithm because it would be dependent of the target class. In targetting the least likely class by choosing the lowest confidence class in each image gives the Iterative Least Likely Class Method (ILLM) Adversarial Examples in the Physical World. By choosing the least likely class for each example, it gives an idea of the worst case scenario for the algorithm.

Similar to BIM a clean image is used for initialization in iteration :

The next step is similar to BIM. The most noteable changes are the change in the + to a - to represent the modification of the image towards class , the least likely class, as opposed to altering the image away from the correct label.

As in equation 2.3 the adversary is generated by finally clipping the pixel values:

These steps are repeated times. For the hyperparameters and the authors use the same values as for BIM.

We implement ILLM as follows:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
def attack_ILLM(mean, std, model, image, class_index, epsilon, alpha, num_iterations=10):
    '''
    Applies given number of steps of the Iterative Least Likely Method (ILLM) attack on the input image.

    Inputs:
    mean           -- Mean from data preparation
    std            -- Standard deviation from data preparation
    model          -- Network under attack
    image          -- Image data as tensor of shape (1, 3, 224, 224)
    class_index    -- Label from image as numpy array   
    epsilon        -- Hyperparameter for sign method. Has to be scaled to epsilon/255
    alpha          -- Hyperparameter for iterative step as absolute value. Has to be scaled to alpha/255
    num_iterations -- Number of iterations to perform. Default is 10. It is recommended to use the heuristic from the
                      paper "Adversarial Examples in the Pysical World" to determine the number of iterations
    
    Returns:
    image_adver    -- Adversarial image as tensor
    '''

    # Convert label to torch tensor of shape (1)
    class_index = torch.tensor([class_index])

    # Check input image and label shapes
    assert(image.shape == torch.Size([1, 3, 224, 224]))
    assert(class_index.shape == torch.Size([1]))
    
    # Initialize adversarial image as image according to equation 3.1
    image_adver = image.clone()   
    
    # Calculate normalized range [0, 1] and convert them to tensors
    zero_normed = [-m/s for m,s in zip(mean, std)]
    zero_normed = torch.tensor(zero_normed, dtype=torch.float).unsqueeze(-1).unsqueeze(-1)
    
    max_normed = [(1-m)/s for m,s in zip(mean,std)]
    max_normed = torch.tensor(max_normed, dtype=torch.float).unsqueeze(-1).unsqueeze(-1)
    
    # Calculate normalized alpha
    alpha_normed = [alpha/s for s in std]
    alpha_normed = torch.tensor(alpha_normed, dtype=torch.float).unsqueeze(-1).unsqueeze(-1)

    # Calculated normalized epsilon and convert it to a tensor
    eps_normed = [epsilon/s for s in std]
    eps_normed = torch.tensor(eps_normed, dtype=torch.float).unsqueeze(-1).unsqueeze(-1)
    
    # Calculate the maximum change in pixel value using epsilon to be later used in clip function
    image_plus = image + eps_normed
    image_minus = image - eps_normed
    
    for i in range(num_iterations):
        
        # Make a copy and detach so the computation graph can be constructed
        image_adver = image_adver.clone().detach()
        image_adver.requires_grad=True
        
        # Compute gradient of cost with least likely class     
        pred = model(image_adver)
        least_likeliest_class = torch.argmin(pred)
        least_likeliest_class.unsqueeze_(0)     
        loss = F.nll_loss(pred, least_likeliest_class)        
        model.zero_grad()        
        loss.backward()        
        grad_x = image_adver.grad.data       

        # Check if gradient exists
        assert(image_adver.grad is not None)
               
        # Compute X_prime according to equation 3.2
        image_prime = image_adver - alpha_normed * grad_x.detach().sign()
        assert(torch.equal(image_prime, image_adver) == False)
      
        # Equation 3.3 part 1
        third_part_1 = torch.max(image_minus, image_prime)
        third_part = torch.max(zero_normed, third_part_1)
              
        # Equation 3.3 part 2
        image_adver = torch.min(image_plus, third_part)                 
        image_adver = torch.min(max_normed, image_adver)                        

    return image_adver