Adversarial samples are inputs designed to fool a model: they are inputs created by applying perturbations to example inputs in the dataset such that the perturbed inputs result in the model outputting an incorrect answer with high confidence. Often, perturbations are so small that they are imperceptible to the human eye — they are inconspicuous.
Adversarial samples are a concern in a world where algorithms make decisions that affect lives: imagine an imperceptibly altered stop sign that the otherwise high-accuracy image recongnition algorithm of a self-driving car misclassifies as a toilet. Curiously and concerningly, the same adversarial example is often misclassified by a variety of classifiers with different architectures trained on different subsets of data. Attackers can use their own model to generate adversarial samples to fool models they did not build.
Accessorize to a crime (paper), a pair of (physical) eyeglasses to fool facial recognition systems. Impersonators carrying out the attack are shown in the top row and corresponding impersonation targets in the bottom row (including Milla Jovovich).
But adversarial samples are useful, too. They inform us about the inner workings of models by giving us an inuition for what aspects of model input matter for model output (cf. influence functions). In case of adversarial examples, aspects of model input matter for model output that should not matter. Adversarial samples can help expose weaknesses of models. Combined with fast and efficient methods for generation of adversarial examples, such as the Fast Sign, Iterative, and L-BFGS method, adversarial samples can help train neural networks to be less vulnerable to adversarial attack.
The model is fooled by the (distractor) sentence (in blue) (paper).
Adversarial samples will inform the direction of research within the community. Adversarial samples are a consequence of models being too linear. Linear models are easier to optimize but they lack the capacity to resist adversarial perturbation. Ease of optimization has come at the cost of models that are easily misled. This motivates the development of optimization procedures that are able to train models whose behavior is more locally stable … and less vulnerable to attack.
Self-driving cars analyze images from varying distances and viewpoints. A recent paper shows that current methods for generation of adversarial samples generate samples that only fool models at certain distances and from certain viewing angels. Or maybe not … that claim is already being challenged.
More from the Blog
Sep 29 2017
For some, Mayweather-McGregor was the prizefight of the summer. For others, it has been Musk-Zuckerberg going toe-to-toe over the risks posed by AI, with Musk voicing his reservations about artificial intelligence while Zuck remains more sanguine. Musk has called AI possibly the “biggest threat” to humanity and gone so far as to suggest the decidedly un-Catholic opinion that Silicon Valley shou...
Oct 2 2017
by — Earlier this year we launched a research report on probabilistic programming, an emerging programming paradigm that makes it easier to describe and train probabilistic models. The Bayesian probabilistic approach to model building and inference has many advantages in practical data science, including the ability to quantify risk (a superpower in industries like finance and insurance) and the abi...
Dec 6 2018
by — Our prototypes are designed to demonstrate the value of the technologies we research. For our most recent prototype, Turbofan Tycoon, we decided that the best way to demonstrate the value of federated learning was to place you in an interactive simulation where you’re in charge of maintaining four turbofan engines. In this post, I’m going to try and explain a bit about why we decided that, an...