Slides are available here, made from the same org file that this Hugo blogpost was generated from.

Motivating the Rules of the Game for Adversarial Example Research 🔗

Justin Gilmer, Ryan P. Adams, Ian Goodfellow, David Andersen, George E. Dahl (July 2018)

Presented by Christabella Irwanto

Goals of the paper 🔗

Background and definitions 🔗

…and motivated by security, rather than by generalization performance or biological mimicry

Key contributions 🔑 🔗

Possible rules of the game 🔗

To study security, we need a threat model that is…

Goals of the attacker 🔗

Knowledge of the attacker 🔗

Who goes first, and is the game repeated? 🔗

Action space of the attacker 🔗

What are they allowed to do?

Action space of the attacker 🔗

Can they do this instead?

Action space of the attacker 🔗

Can they pick their own starting point?

Action spaces 🔗

| Security setting                | Constraints on input (human perception) | Starting point |
|---------------------------------|------------------------------------------|----------------|
| Indistinguishable perturbation  | Changes must be undetectable             | Fixed          |
| Content-preserving perturbation | Change must preserve content             | Fixed          |
| Non-suspicious input            | Input must look real                     | Any input      |
| Content-constrained input       | Input must preserve content or function  | Any input      |
| Unconstrained                   | Any                                      | Any            |
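For concreteness, the first row (indistinguishable perturbation, fixed starting point) is the rule set most attacks in the literature implicitly play by: perturb a given input anywhere inside a small \(l_\infty\) ball around it. A minimal sketch of that action space, using a hypothetical `grad_loss_wrt_input` helper in place of a real framework's gradient computation (an FGSM-style step, not code from the paper):

```python
# Sketch of the "indistinguishable perturbation" action space: the attacker may
# move a *fixed* input x anywhere inside an l_inf ball of radius eps around it.
# `grad_loss_wrt_input` is a hypothetical placeholder for a model's input gradient.
import numpy as np

def linf_attack_step(x, y, grad_loss_wrt_input, eps=8 / 255):
    """One signed-gradient step, projected back into the eps-ball and the valid pixel range."""
    grad = grad_loss_wrt_input(x, y)          # dL/dx for the true label y
    x_adv = x + eps * np.sign(grad)           # step toward higher loss
    x_adv = np.clip(x_adv, x - eps, x + eps)  # respect the threat model's budget
    return np.clip(x_adv, 0.0, 1.0)           # stay a valid image
```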

Content-constrained input 🔗

Non-suspicious input 🔗

Unconstrained input 🔗

[Image: screenshot_20181104_172433.png] Source: https://wired.com/story/hackers-say-broke-face-id-security/

Content-preserving perturbation 🔗

[Image: screenshot_20181104_172852.png] Source: https://qz.com/721615/smart-pirates-are-fooling-youtubes-copyright-bots-by-hiding-movies-in-360-degree-videos

Imperceptible perturbation 🔗

Recent literature 🔗

probability that a random \(x\) is within distance \(\epsilon\) of a misclassified sample
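Written out (a standard formalisation assumed here, not a quote from the slides), this metric is

\[ \mathbb{P}_{(x, y) \sim D}\left[\, \exists\, x' : \|x' - x\|_p \le \epsilon \ \wedge\ f(x') \ne y \,\right] \]

i.e. the probability mass of test points that have a misclassified neighbour within an \(l_p\) ball of radius \(\epsilon\).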

Problems with measuring robustness 🔗

Robustness is a misleading metric 🔗

Suspicious adversarial defenses

Where does it fit in the taxonomy? 🔗

\(l_p \neq\) perceived similarity 🔗
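A tiny illustration of this mismatch (my own sketch, not from the paper): a two-pixel translation preserves content completely yet is "far" from the original in \(l_p\), while a typical \(8/255\)-bounded adversarial perturbation is nearly invisible and "close".

```python
# l_p distance is a poor proxy for perceived similarity: a content-preserving
# 2-pixel shift is large in l_2 and l_inf, while imperceptible noise at a
# typical attack budget (8/255) is small in both norms.
import numpy as np

rng = np.random.default_rng(0)
image = rng.random((224, 224, 3))            # stand-in for a natural image in [0, 1]

shifted = np.roll(image, shift=2, axis=1)    # content-preserving 2-pixel translation
noised = np.clip(image + rng.uniform(-8 / 255, 8 / 255, image.shape), 0, 1)

for name, other in [("2-pixel shift", shifted), ("8/255 noise", noised)]:
    diff = (other - image).ravel()
    print(f"{name}: l_2 = {np.linalg.norm(diff):.1f}, l_inf = {np.abs(diff).max():.3f}")
```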

Plausibility of examples in literature 🔗

If security is the motivation, we should…

Let’s look at common motivating scenarios for the standard ruleset in the literature

Stop Sign Attack 🔗

Knocked-over Stop Sign Attack 🔗

… simply covering the sign, or knocking it over

Figure 3: The “knocked-over stop sign attack” is 100% successful at “tricking” the model, is robust to lighting and perspective changes, and, even worse, already occurs “in the wild”!

Evading Malware Detection 🔗

Fooling Facial Recognition 👓 🔗

Test Set Attack 🔗

Evaluating errors realistically 🔗

Moving forward 🔗

Evaluating SOTA defense realistically 🔗

Don’t forget simpler attacks 🔗

Vikas Verma, Alex Lamb, Christopher Beckham, Aaron Courville, Ioannis Mitliagkas, and Yoshua Bengio. “Manifold Mixup: Encouraging Meaningful On-Manifold Interpolation as a Regularizer”. In: arXiv preprint arXiv:1806.05236 (2018).

Logan Engstrom, Dimitris Tsipras, Ludwig Schmidt, and Aleksander Madry. “A Rotation and a Translation Suffice: Fooling CNNs with Simple Transformations”. In: arXiv preprint arXiv:1712.02779 (2017).
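As an illustration of how cheap such attacks are, here is a minimal sketch (not code from either paper) of the brute-force rotation-and-translation search in the spirit of Engstrom et al., with `predict` as a hypothetical stand-in for a classifier's top-1 prediction:

```python
# Grid-search small rotations and translations of a fixed HxWxC input until the
# classifier's label flips.
from scipy.ndimage import rotate, shift

def rotation_translation_attack(image, true_label, predict,
                                angles=range(-30, 31, 5), offsets=range(-3, 4, 3)):
    for angle in angles:
        rotated = rotate(image, angle, axes=(0, 1), reshape=False, mode="nearest")
        for dx in offsets:
            for dy in offsets:
                candidate = shift(rotated, (dy, dx, 0), mode="nearest")
                if predict(candidate) != true_label:   # label flipped: attack succeeded
                    return candidate, (angle, dx, dy)
    return None, None                                  # robust on this transformation grid
```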

Security-centric proxy metrics 🔗

Conclusion 🔗

Discussion 🔗

no free lunch: https://github.com/MadryLab/robust-features-code

https://github.com/vikasverma1077/manifold_mixup