To appear at the Beyond Backpropagation workshop at NeurIPS 2020 🔥
🎭 Adversarial attacks generate slightly perturbed inputs that are misclassified by neural networks. The accuracy of some machine learning models can collapse like a house of cards when facing attacks that a human would not even notice. This brittleness is a tough obstacle to the deployment of trained models in real-life scenarios. Furthermore, it raises serious questions about what neural networks are actually learning. Training models resistant to adversarial attacks is a hot topic, both from a practical and a theoretical point of view.
Adversarial attacks can be categorized into white-box and black-box attacks. Our focus is on white-box attacks, where the attacker has access to the data distribution, the full model architecture, and its parameters. Using this information, the attacker can easily craft a specific input to fool the network. While the most famous examples are in computer vision, adversarial attacks are an issue in many other AI domains, like speech recognition and reinforcement learning.
The effectiveness of white-box attacks stems from the great deal of information carried by gradients computed with backpropagation. Just as they can be used to tweak the parameters of a neural network to improve performance, gradients can also allow an attacker to craft a deceitful sample that destroys such performance. The Fast Gradient Sign Method (FGSM) uses the following simple equation to generate adversarial examples:

x_adv = x + ε · sign(∇ₓ L(θ, x, y))

where L is the loss, θ the model parameters, y the true label, and ε the attack strength.
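In code, FGSM is essentially a one-liner. Here is a minimal NumPy sketch on a toy logistic model (the model, its gradient, and ε = 0.1 are hypothetical choices for illustration, not the setup used in our experiments):

```python
import numpy as np

def fgsm(x, grad_x, epsilon):
    """FGSM: perturb x by epsilon in the direction of the loss gradient's sign."""
    return x + epsilon * np.sign(grad_x)

# Toy model: logistic regression with loss L = -log sigmoid(y * w.x)
rng = np.random.default_rng(0)
w = rng.normal(size=8)
x = rng.normal(size=8)
y = 1.0

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def loss(x):
    return -np.log(sigmoid(y * w @ x))

# dL/dx for the logistic loss: -(1 - sigmoid(y * w.x)) * y * w
grad_x = -(1.0 - sigmoid(y * w @ x)) * y * w

x_adv = fgsm(x, grad_x, epsilon=0.1)
assert loss(x_adv) > loss(x)  # the perturbed input has a strictly higher loss
```

Each coordinate moves by exactly ±ε, so the perturbation is bounded in L∞ norm while still pushing the loss uphill.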
More sophisticated attacks like Projected Gradient Descent (PGD) iterate the previous equation, projecting back into the neighborhood of the original sample when needed. The dynamics of this algorithm in the loss landscape are shown in Figure 1.
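The iterate-and-project loop can be sketched in a few lines of NumPy (the logistic model supplying the gradient and all constants are hypothetical, chosen only to make the example self-contained):

```python
import numpy as np

def pgd_attack(x0, grad_fn, epsilon, alpha, steps):
    """PGD: repeat signed-gradient steps, projecting back into the
    L-infinity ball of radius epsilon around the original sample x0."""
    x = x0.copy()
    for _ in range(steps):
        x = x + alpha * np.sign(grad_fn(x))          # gradient ascent step
        x = np.clip(x, x0 - epsilon, x0 + epsilon)   # projection onto the eps-ball
    return x

# Toy logistic model providing the loss gradient w.r.t. the input
rng = np.random.default_rng(1)
w = rng.normal(size=8)
x0 = rng.normal(size=8)
y = 1.0
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
grad_fn = lambda x: -(1.0 - sigmoid(y * w @ x)) * y * w

x_adv = pgd_attack(x0, grad_fn, epsilon=0.1, alpha=0.03, steps=10)
assert np.max(np.abs(x_adv - x0)) <= 0.1 + 1e-9  # never leaves the eps-ball
```

With a step size α smaller than ε, the projection only triggers once the iterates reach the boundary of the allowed neighborhood.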
🛡 Current defensive strategies
Current strategies to build robust architectures can be divided into two groups:
The first is adversarial (or robust) training: the training process does not minimize a standard loss but a so-called robust loss. In short, training is also performed on adversarial examples, as an extreme form of data augmentation. Despite being an effective method to increase model robustness, it comes at a cost. Indeed, robust training is data- and time-expensive. Furthermore, it has been shown that robustness achieved in this way naturally introduces a trade-off with accuracy [3]. As a consequence, robust training does not easily scale to larger architectures and datasets.
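As a rough sketch of the idea, here is a toy adversarial-training loop in NumPy on a logistic model: each step first attacks the input with FGSM, then trains on the perturbed copy (the model, data, ε, and learning rate are all hypothetical choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=8) * 0.1
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def grads(w, x, y):
    """Gradients of L = -log sigmoid(y * w.x) w.r.t. the weights and the input."""
    s = sigmoid(y * w @ x)
    return -(1 - s) * y * x, -(1 - s) * y * w   # dL/dw, dL/dx

# Synthetic linearly-separable data
X = rng.normal(size=(100, 8))
Y = np.sign(X @ rng.normal(size=8))

eps, lr = 0.1, 0.05
for x, y in zip(X, Y):
    _, gx = grads(w, x, y)
    x_adv = x + eps * np.sign(gx)     # inner attack step (FGSM)
    gw, _ = grads(w, x_adv, y)
    w -= lr * gw                      # train on the adversarial example
```

The inner attack is what makes robust training expensive: every parameter update requires at least one extra gradient computation (and many more for multi-step attacks like PGD).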
Another family of defenses consists of actively masking gradients while maintaining high-quality training. Some examples are:
- ⛏️ Shattered gradients: numerical instability or non-differentiable layers generate zero or incorrect gradients.
- 🔀 Stochastic gradients: randomness is introduced in the network itself or in the input data.
- 💥 Exploding and vanishing gradients: using a pipeline similar to recurrent neural networks to create very deep networks where the gradients become extremely small, or extremely large.
The authors of [4] find that these kinds of defenses “give a false sense of security”, because they can be circumvented. For example, one can build a differentiable approximation of an operation that produces shattered gradients (the authors call this a Backward Pass Differentiable Approximation, BPDA).
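As a sketch of how such a workaround operates, here is a straight-through-style BPDA for a uniform quantizer in NumPy (the quantizer, loss, and target are hypothetical; this only illustrates the principle of replacing a useless gradient with a usable one):

```python
import numpy as np

def quantize(x, bits=8):
    """Non-differentiable op: uniform quantization in [0, 1].
    Its true gradient is zero almost everywhere -- useless to an attacker."""
    levels = 2 ** bits - 1
    return np.round(np.clip(x, 0.0, 1.0) * levels) / levels

def bpda_gradient(x, grad_fn_after):
    """BPDA: run the real forward pass, but differentiate as if the
    quantizer were the identity, so the downstream gradient flows through."""
    return grad_fn_after(quantize(x))

# Toy downstream loss L(z) = 0.5 * ||z - t||^2 with a hypothetical target t
t = np.array([0.2, 0.8, 0.5])
grad_after = lambda z: z - t

x = np.array([0.31, 0.64, 0.12])
g = bpda_gradient(x, grad_after)   # usable attack gradient despite quantization
```

The attacker never needs the quantizer's true (zero) gradient; any smooth approximation that is close enough on the forward pass yields attack directions that transfer.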
It has been shown that attacks generated with synthetic gradients, such as those produced by Direct Feedback Alignment (DFA) [1], are less effective than attacks using the true gradients [2]. Nevertheless, in a white-box setting the attacker is free to choose their favorite method to compute adversarial examples. We need a way to force them to use the inefficient strategy.
💡 Leveraging ignorance to increase robustness
A LightOn Optical Processing Unit (OPU) performs a matrix product followed by a nonlinearity that is in practice non-differentiable, since:
- 🧐 The entries of the transmission matrix are unknown. Even though it is possible to recover them through phase retrieval methods [5], this quickly becomes unfeasible as the size increases, and there is always some residual error.
- 👾 The input and the output are quantized to 1 and 8 bits respectively.
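For intuition, the OPU operation can be simulated in NumPy. This is only a toy sketch: in the real device the complex transmission matrix is a physical medium whose entries nobody knows, and the normalization below is a simplifying assumption:

```python
import numpy as np

rng = np.random.default_rng(42)
n_in, n_out = 64, 128

# Complex Gaussian transmission matrix -- simulated here, unknown on the device
A = (rng.normal(size=(n_out, n_in)) + 1j * rng.normal(size=(n_out, n_in))) / np.sqrt(2)

def opu_forward(x_binary):
    """Simulated OPU: random projection followed by an intensity measurement
    |Ax|^2, with a 1-bit input and an 8-bit quantized output."""
    assert set(np.unique(x_binary)).issubset({0, 1})  # 1-bit input
    y = np.abs(A @ x_binary) ** 2                     # optical intensity (nonlinearity)
    y = y / y.max()                                   # simplifying normalization to [0, 1]
    return np.round(y * 255).astype(np.uint8)         # 8-bit output

x = (rng.random(n_in) > 0.5).astype(float)
out = opu_forward(x)
```

Even in simulation, backpropagating through this layer is hopeless without `A`: the chain rule needs the transmission matrix, and the quantization steps shatter whatever gradient remains.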
We can take advantage of these characteristics to build an architecture that is robust by design: the attacker is forced to use inefficient attack methods.
An architecture equipped with a layer performing this operation can be trained only with an algorithm that does not use the forward weights in the backward pass, and can handle non-differentiable layers. DFA is a perfect fit.
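To see why DFA fits, here is a minimal NumPy sketch of a DFA update on a toy two-layer network with squared loss (sizes, scales, and the learning rate are arbitrary illustrative choices, not our experimental setup):

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_hidden, d_out = 16, 32, 4

W1 = rng.normal(scale=0.1, size=(d_hidden, d_in))
W2 = rng.normal(scale=0.1, size=(d_out, d_hidden))
B = rng.normal(scale=0.1, size=(d_hidden, d_out))  # fixed random feedback matrix

def tanh_prime(a):
    return 1.0 - np.tanh(a) ** 2

def dfa_step(x, target, lr=0.05):
    """One DFA update: the hidden layer's error signal is the output error
    projected through the fixed random matrix B, not through W2.T -- so no
    transpose of the forward weights (and no gradient through a
    non-differentiable layer) is ever needed."""
    global W1, W2
    a1 = W1 @ x
    h = np.tanh(a1)
    y = W2 @ h                           # linear output layer
    e = y - target                       # output error (squared-loss gradient)
    delta_h = (B @ e) * tanh_prime(a1)   # DFA: random projection of the error
    W2 -= lr * np.outer(e, h)
    W1 -= lr * np.outer(delta_h, x)
    return 0.5 * np.sum(e ** 2)

x = rng.normal(size=d_in)
target = rng.normal(size=d_out)
losses = [dfa_step(x, target) for _ in range(200)]
assert losses[-1] < losses[0]  # the loss decreases under DFA updates
```

The key point for our defense: the backward pass only needs the output error and the random matrix `B`, so the unknown OPU transmission matrix never has to be inverted or differentiated.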
Figure 2 shows a network with an OPU layer. During training and attacks, DFA is used to bypass the non-differentiable OPU random operation, while the convolutional layers are still trained with BP. In our experiments, we used a VGG-16 architecture equipped with an OPU, which we named VGG-OPU, trained on CIFAR-10.
To quantify the robustness of the VGG-OPU we have attacked it using FGSM and PGD. The natural baseline for our comparison is the performance of a standard VGG-16 under the same kind of attacks. We show the results in Figure 3:
Our architecture is more robust than a standard VGG-16, and this happens by design, without the hassle of expensive alternative training techniques. The defense, arising from the nature of our co-processor, is very expensive, if not impossible, to bypass for the attacker, who is limited in their malicious intentions.
Our defense can scale to larger datasets and architectures since its robustness comes at no additional cost.
🚧 What’s next?
Adversarial threats are not limited to images: gradient-based attacks naturally extend beyond the computer vision domain. Our next step is creating a family of robust-by-design, OPU-based architectures for the different domains of AI.
[1] Arild Nøkland, “Direct Feedback Alignment Provides Learning in Deep Neural Networks”
[2] Mohamed Akrout, “On the Adversarial Robustness of Neural Networks without Weight Transport”
[3] Dimitris Tsipras et al., “Robustness May Be at Odds with Accuracy”
[4] Anish Athalye, Nicholas Carlini, et al., “Obfuscated Gradients Give a False Sense of Security: Circumventing Defenses to Adversarial Examples”
[5] Sidharth Gupta et al., “Don’t take it lightly: Phasing optical random projections with unknown operators”