Slides are available here, made from the same org file that this Hugo blogpost was generated from.

## Membership Inference Attacks against Machine Learning Models 🔗

Reza Shokri, Marco Stronati, Congzheng Song, and Vitaly Shmatikov (2017)

Presented by Christabella Irwanto

### Machine learning as a service 🔗

The elements of the output vector are in [0, 1] and sum to 1.
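For instance, a classifier served through an MLaaS prediction API typically returns a softmax probability vector. A minimal sketch of that output format in Python (the logit values are made up for illustration):

```python
import numpy as np

def softmax(logits):
    """Turn raw class scores into a prediction vector whose elements
    lie in [0, 1] and sum to 1, as a prediction API would return."""
    z = logits - np.max(logits)   # shift for numerical stability
    exp_z = np.exp(z)
    return exp_z / exp_z.sum()

y_vec = softmax(np.array([2.0, 0.5, -1.0]))
print(y_vec, y_vec.sum())         # elements in [0, 1], sum equal to 1.0
```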

### Machine learning privacy 🔗

In the context of the overall ML pipeline, we are considering a malicious client.

### Basic membership inference attack 🔗

E.g. patients’ clinical records in disease-related models.

• 😈 Who: malicious client
• 🕔 Attack time: inference
• 🥅 Goal: compromise training data privacy
• determine if a data record $$\mathbf{x}$$ was in the model $$f_{target}$$’s (sensitive) training dataset $$D^{train}_{target}$$
• 💪 Capability:
• labeled data record $$(\mathbf{x}, y)$$
• model query access to obtain prediction vector $$\mathbf{y}$$ for $$\mathbf{x}$$
• format of inputs and outputs, e.g. shape and range of values
• either
• (1) architecture and training algorithm of model, or
• (2) black-box access to the oracle (e.g., an “ML as a service” platform) that was used to train the model

### Key contributions 🔑 🔗

• Turn membership inference into a binary classification problem
• Invent “shadow training technique” to mimic black-box models
• Develop 3 effective methods to generate training data for the shadow models
• Successfully evaluate membership inference techniques against neural networks, Amazon ML, and the Google Prediction API on realistic tasks
• Quantify how membership leakage relates to performance and overfitting
• Evaluate mitigation strategies

## Membership inference approach 🔗

For a given labeled data record $$(\mathbf{x}, y)$$ and a model $$f$$’s prediction vector $$\mathbf{y} = f(\mathbf{x})$$, determine if $$(\mathbf{x}, y)$$ was in the model’s training dataset $$D^{train}_{target}$$

### How is this even possible? 🔗

• Intuition: machine learning models often behave differently on data that they were trained on 🐵 versus “unseen” data 🙈
• Overfitting is one of the reasons
• We can construct an attack model that learns this behaviour difference

### End-to-end attack process 🔗

• With labeled record $$(\mathbf{x}, y)$$, use target model $$f_{target}$$ to compute prediction vector $$\mathbf{y} = f_{target}(\mathbf{x})$$
• Attack model $$f_{attack}$$ receives both true class label $$y$$ and $$\mathbf{y}$$
• We need $$y$$ since $$\mathbf{y}$$’s distribution depends heavily on it
• $$f_{attack}$$ computes the membership probability $$\Pr[(\mathbf{x}, y) \in D^{train}_{target}]$$ (sketched below)
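A minimal sketch of this inference step, assuming black-box query access to the target model and a dictionary of per-class attack models (built in the shadow-training steps below); the sklearn-style `predict_proba` interface and all names are assumptions, not the paper's code:

```python
import numpy as np

def infer_membership(x, y_true, target_model, attack_models):
    """Return the estimated probability that (x, y_true) was in the
    target model's training set."""
    y_vec = target_model.predict_proba(x.reshape(1, -1))[0]  # prediction vector y
    f_attack = attack_models[y_true]                          # class-specific attack model
    # The attack model sees the prediction vector; the true label selected the model.
    return f_attack.predict_proba(y_vec.reshape(1, -1))[0, 1]
```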

### How to train $$f_{attack}$$ without detailed knowledge of $$f_{target}$$ or its training set? 🔗

• Mimic target model with “shadow models”
• Train shadow models on proxy targets for which we will know the membership ground truth
• Becomes supervised training
• A binary classification task predicting “in” or “out”

• $$k$$ shadow models, each $$f^i_{shadow}$$ trained on dataset $$D^{train}_{shadow^i}$$ of same format and similar distribution as $$D^{train}_{target}$$
• Assume the worst case for the attacker: $$\forall i,\ D^{train}_{shadow^i} \cap D^{train}_{target} = \emptyset$$
• $$\uparrow k \implies \uparrow$$ training fodder for $$f_{attack} \implies \uparrow$$ accuracy of $$f_{attack}$$

Any overlap would only make the attack perform better.

• The training datasets of the shadow models may overlap.
• Shadow models must be trained similarly to the target model, either with the same training algorithm and model structure (if known) or with the same ML service (see the sketch below).
• All models’ internal parameters are trained independently.
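A minimal sketch of this shadow-training setup, using an sklearn MLP as a stand-in for the (unknown) target architecture; `shadow_datasets` and all hyperparameters are hypothetical:

```python
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

def train_shadow_models(shadow_datasets):
    """Train one shadow model per shadow dataset, keeping both the
    records it was trained on ("in") and held-out records ("out")."""
    shadow_models = []
    for X, y in shadow_datasets:                       # k datasets disjoint from the target's
        X_in, X_out, y_in, y_out = train_test_split(X, y, test_size=0.5, random_state=0)
        model = MLPClassifier(hidden_layer_sizes=(128,), max_iter=300)  # stand-in architecture
        model.fit(X_in, y_in)
        shadow_models.append((model, (X_in, y_in), (X_out, y_out)))
    return shadow_models
```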

### Synthesizing datasets for $$f_{shadow}$$ 🔗

• Model-based synthesis: Synthesize records that $$f_{target}$$ classifies with high confidence, starting from random noise, via hill-climbing search and sampling
• Statistics-based synthesis: Requires statistical information about the population from which $$D^{train}_{target}$$ was drawn
• Simulated by independently sampling from the marginal distribution of each feature
• Noisy real data: Real data from a different population or sampled non-uniformly
• Simulated by flipping the binary values of 10% or 20% of randomly selected features (see the sketch after this list)
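The noisy-real-data variant can be simulated as in the sketch below; the flip fractions come from the slide, while the binary feature matrix and RNG seed are assumptions:

```python
import numpy as np

def add_feature_noise(records, flip_fraction=0.1, seed=0):
    """Flip a randomly chosen fraction (e.g. 10% or 20%) of each
    record's binary features to simulate noisy real data."""
    rng = np.random.default_rng(seed)
    noisy = records.copy()
    n_features = records.shape[1]
    n_flip = int(flip_fraction * n_features)
    for row in noisy:                                   # each row is a view into `noisy`
        idx = rng.choice(n_features, size=n_flip, replace=False)
        row[idx] = 1 - row[idx]
    return noisy
```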

### Model-based synthesis 🔗

• (1) Search the space of possible data records to find high confidence inputs, and (2) sample from these records.
• Initialization values are sampled uniformly at random from the entire possible range.
• Hill-climbing objective (sketched after this list)
• not met: $$k$$ features are randomized again
• met, but not sufficient: increment the rejection count, and reduce $$k$$ if there are too many rejections
• this controls the diameter of the search around the last accepted record
• met, and sufficient: select the record with probability $$y_c$$
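A rough sketch of this search loop; it only approximates the paper's algorithm, assumes binary features and an sklearn-style query interface, and all threshold and parameter values are made up:

```python
import numpy as np

def synthesize_record(target_model, target_class, n_features,
                      conf_min=0.8, k_max=64, k_min=2,
                      rej_max=5, max_iter=1000, seed=0):
    """Hill-climb towards a record that the target model classifies as
    `target_class` with high confidence, then sample it."""
    rng = np.random.default_rng(seed)
    x = rng.integers(0, 2, size=n_features)          # random initialization
    best_x, best_conf = x.copy(), 0.0
    k, rejections = k_max, 0
    for _ in range(max_iter):
        y_vec = target_model.predict_proba(x.reshape(1, -1))[0]
        conf = y_vec[target_class]
        if conf > best_conf and np.argmax(y_vec) == target_class:
            best_x, best_conf = x.copy(), conf        # objective met: accept the record
            rejections = 0
            if conf > conf_min and rng.random() < conf:
                return best_x                          # sufficient: sample with probability y_c
        else:
            rejections += 1
            if rejections > rej_max:                   # too many rejections:
                k = max(k_min, k // 2)                 # shrink the search diameter
                rejections = 0
        x = best_x.copy()                              # propose a new record by randomizing
        idx = rng.choice(n_features, size=k, replace=False)
        x[idx] = rng.integers(0, 2, size=k)            # k features around the accepted record
    return None                                        # search failed for this class
```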

### Training dataset for $$f_{attack}$$ 🔗

• Query each $$f^i_{shadow}$$ with its own $$D^{train}_{shadow^i}$$ and a disjoint $$D^{test}_{shadow^i}$$ (see the sketch below).
• $$\forall (\mathbf{x}, y) \in D^{train}_{shadow^i}$$, get $$\mathbf{y} = f^i_{shadow}(\mathbf{x})$$ and add $$(y, \mathbf{y}, \text{in})$$ to $$D^{train}_{attack}$$
• $$\forall (\mathbf{x}, y) \in D^{test}_{shadow^i}$$, get $$\mathbf{y} = f^i_{shadow}(\mathbf{x})$$ and add $$(y, \mathbf{y}, \text{out})$$ to $$D^{train}_{attack}$$
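A sketch of how $$D^{train}_{attack}$$ could be assembled from the shadow models trained above (same hypothetical data structures as in the earlier sketch):

```python
import numpy as np

def build_attack_dataset(shadow_models):
    """Query each shadow model on its own "in" and "out" records and
    collect (true label, prediction vector, membership) triples."""
    vecs, labels, membership = [], [], []
    for model, (X_in, y_in), (X_out, y_out) in shadow_models:
        for X, y, member in ((X_in, y_in, 1), (X_out, y_out, 0)):
            vecs.append(model.predict_proba(X))          # prediction vectors y
            labels.append(y)                             # true class labels
            membership.append(np.full(len(X), member))   # 1 = "in", 0 = "out"
    return np.vstack(vecs), np.concatenate(labels), np.concatenate(membership)
```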

### Training $$f_{attack}$$ 🔗

• Partition $$D^{train}_{attack}$$ by class and train a separate model for each class label $$y$$
• Given $$y$$ and $$\mathbf{y}$$, predict membership status (“in” or “out”) for $$\mathbf{x}$$
• Class-specific models $$\uparrow$$ accuracy because the true class heavily influences the target model’s behaviour (produces different output distributions)
• If using method 1, model-based synthesis, the records in both $$D^{train}_{shadow^i}$$ and $$D^{test}_{shadow^i}$$ are classified by the target model with high confidence

• $$\implies$$ $$f_{attack}$$ does not simply learn to classify “in” vs “out” based on high confidence, but performs a much subtler task
• Method-agnostic: can use any state-of-the-art machine learning framework or service to build the attack model
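A sketch of the per-class attack training; the classifier choice is arbitrary (the paper notes any ML framework or service can be used), and the data structures follow the previous sketch:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

def train_attack_models(vecs, labels, membership):
    """Train one binary "in"/"out" classifier per true class label."""
    attack_models = {}
    for c in np.unique(labels):
        mask = labels == c
        clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=300)
        clf.fit(vecs[mask], membership[mask])
        attack_models[c] = clf
    return attack_models
```

Together with the `infer_membership` sketch earlier, this closes the loop from shadow data to membership predictions.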

## Experiments 🔗

• Datasets for classification
• CIFAR-10 and CIFAR-100: 32x32 color images
• Shopping purchases to predict shopping style: 600 binary features
• Clustered into different numbers of classes: {2, 10, 20, 50, 100}
• Foursquare check-ins to predict geosocial type: 446 binary features
• Texas hospital stays to predict procedure: 6,170 binary features
• MNIST: 32 x 32 monochrome images
• UCI Adult (Census Income): predict if annual income exceeds \$50K
• Target models
• Google Prediction API: no configurable parameters
• Amazon ML: a few tweakable metaparameters; the authors test the defaults and one other configuration
• Standard CNN for CIFAR and standard fully-connected neural network for purchases

## Evaluation methodology 🔗

• Equal number of members (“in”) and non-members (“out”) to maximize uncertainty of inference; baseline accuracy is 0.5
• Metrics
• ✅ Precision: what fraction of records inferred as members are indeed members
• ☂ Recall: coverage, i.e. what fraction of members are correctly inferred
• Training datasets for different shadow models may overlap

### Effect of overfitting 🔗

• Large gaps between training and test accuracy $$\implies$$ overfitting
• Higher test accuracy $$\implies$$ 👍 generalizability, predictive power
• $$\uparrow$$ overfitting, $$\uparrow$$ leakage (Fig. 11)… but only within the same model type
• Amazon (100, 1e-4) overfits and leaks more than Amazon (10, 1e-6)
• But Google leaks more than both Amazon models, even though it is less overfitted and generalizes better
• Overfitting is not the only factor in vulnerability; different model structures “remember” different amounts of information

### Precision on CIFAR against CNN (Fig. 4) 🔗

• Low test accuracies (0.6 for CIFAR-10, 0.2 for CIFAR-100) $$\implies$$ heavily overfitted
• Precision follows the same pattern across all classes
• $$\uparrow$$ training dataset size, $$\uparrow$$ variance across classes and $$\downarrow$$ precision
• Attack performs much better than baseline, especially CIFAR-100
• The more classes, the more leakage because models need to “remember” more about training data
• CIFAR-100 is more overfitted to training dataset

### Precision on Purchase Dataset against all target models 🔗

• Each point (x, y) shows the cumulative fraction y of classes for which the attacker can obtain a membership inference precision of at most x
• The 50th, 75th, and 90th percentiles of precision are (0.74, 0.79, 0.84), (0.84, 0.88, 0.91), and (0.94, 0.97, 1) respectively
• E.g. 50% of classes get at most 0.74 precision for Amazon
• Recall is close to 1 on all

### Failure modes 🔗

• Failed on MNIST (0.517 precision) because of the small number of classes and the lack of randomness in the data within each class
• Failed on the Adult dataset because
• The model is not overfitted
• The model is a binary classifier, so the attacker essentially has only one signal to infer membership from

### Effect of noisy shadow data on precision (Fig. 8) 🔗

|           | Real data | 10% noise | 20% noise |
|-----------|-----------|-----------|-----------|
| Precision | 0.678     | 0.666     | 0.613     |
| Recall    | 0.98      | 0.99      | 1.00      |
• Concludes that the attack is robust even if the assumptions about $$D^{train}_{target}$$ are not very accurate

### Real data vs synthetic data (Fig. 9) 🔗

• Overall precision: 0.935 on real data, 0.795 for marginal-based synthetic data, 0.895 for model-based synthetic data
• Much lower for marginal-based, but still very high for most classes
• Dual behaviour for model-based: precision is mostly very high, but very low for a few classes
• These classes make up < 0.6% of $$D^{train}_{target}$$, so the synthesis algorithm cannot find representative records for them via search
• Concludes that the attack can be trained with only black-box access

## Why do the attacks work? 🔗

### Overfitting from train-test gap 🔗

Models with higher generalizability are less vulnerable to membership inference attacks.

### Relating accuracy and uncertainty of $$\mathbf{y}$$ to membership 🔗

• Fig. 12: differences in output metrics between member and non-member inputs are more observable in the cases where the attack is more successful (e.g. the purchase dataset with more classes)

## Mitigation strategies 🔗

• Restrict prediction vector $$\mathbf{y}$$ to top $$k$$ classes
• Round $$\mathbf{y}$$ to $$d$$ floating point digits
• Increase entropy of $$\mathbf{y}$$ by increasing normalizing temperature $$t$$ of softmax layer
• Use $$L_2$$-norm regularization with various factors $$\lambda$$
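A sketch of the three output-side mitigations applied at prediction time (regularization is a training-time change and is not shown); all parameter values here are illustrative only:

```python
import numpy as np

def mitigated_prediction(logits, temperature=2.0, digits=2, top_k=3):
    """Return a coarsened prediction vector: temperature-scaled softmax,
    rounded to `digits` decimal places, restricted to the top-k classes."""
    z = logits / temperature                 # higher t => higher-entropy softmax
    probs = np.exp(z - z.max())
    probs /= probs.sum()
    probs = np.round(probs, digits)          # round to d floating point digits
    top = np.argsort(probs)[::-1][:top_k]    # report only the top k classes
    return {int(c): float(probs[c]) for c in top}
```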

## Evaluation of strategies 🔗

• Target model’s prediction accuracy maintained or improved (regularization)
• Unless the regularization factor $$\lambda$$ is too large; it needs to be chosen carefully
• Nevertheless, regularization seems necessary and useful both for generalizing and decreasing information leakage
• Not just $$L_2$$-norm regularization; dropout has also been shown to strengthen privacy guarantees
• Overall, attack is robust against mitigation strategies
• Restricting the output to the top $$k=1$$ class is not enough, as members and non-members are misclassified differently

## Conclusion 🔗

• First membership inference attack against machine learning models
• Shadow training technique using noisy real data or synthetic data, generated without prior knowledge of $$D^{train}_{target}$$

### Model inversion 🔗

• The authors ran Fredrikson et al.’s (2015) model inversion attack on CIFAR-10.
• If images in a class are diverse, “reconstructions” from model inversion are semantically meaningless
• Model inversion produces an average of a class and does not reconstruct any specific image, nor infer membership

### Privacy-preserving machine learning 🔗

• ❎ Secure multiparty computation (SMC), or training on encrypted data, would not mitigate inference attacks
• ✅ Differentially private models are, by construction, secure against this attack (to be covered this Friday)

### ML Models that Remember Too Much (MTRTM) 🔗

• If this paper is a “side channel”, MTRTM is like a “covert channel”
• Malicious training algorithm intentionally designed to leak information
• Differences
• Extent of information leakage: membership inference only vs. up to 70% of the training corpus
• Relies on low generalizability of $$f$$ vs. aims for high generalizability

## Commentary 🔗

• 👍 Simplicity, intuition, many experiments, novelty
• Good evaluation and synthesis of related work
• Could have picked fewer binary datasets; the attack mostly performed quite poorly on them for the reasons mentioned (not enough signal for the attack to extract useful membership information)
• The datasets could have included more real-valued features
• Why was CIFAR chosen as the locally hosted target model? Because it has the largest training set? It would be nice to understand more of the reasoning behind certain choices.
• Some information is missing or hard to find, e.g. in Fig. 12, which method was used to generate the shadow training data (to explain the low prediction confidence for Google members)

## Commentary 🔗

• Why does a larger training dataset lead to lower precision, both overall (Fig. 1) and within each class (Fig. 11)? More overfitting, more signals…?
• Empirical CDF diagrams are a little confusing
• Why 10% and 20% noise, and is that realistic? 20% noise also performs a lot worse than 10%.
• Should the “out” data records for $$f_{attack}$$ not all belong in the test set of the shadow models? Does it matter?
• Can you synthesize an adversarial example accidentally with model-based synthesis, and would it matter?

## Discussion topics 🤔 🔗

• Realistic applications
• Do the paper’s results have “substantial practical privacy implications”?
• Or is this approach more practically useful as an evaluation metric, if lack of information leakage correlates with how well a model is regularized/generalizes?
• Also to measure effectiveness of privacy preserving techniques
• When should ML service providers be held accountable for training their models in a privacy-preserving manner?
• Having more generalizable models will both increase privacy and utility. Is this the root characteristic of a good ML model?

## Discussion topics 🤔 🔗

• Scalability with the number of classes, since a different $$f_{attack}$$ is trained for each class label $$y$$
• The model-based data synthesis method may also need many queries, especially as the size of the possible input space grows
• How to extend to regression models?
• How do differences in model structures affect information leakage?