Reza Shokri, Marco Stronati, Congzheng Song, and Vitaly Shmatikov (2017)
Presented by Christabella Irwanto
E.g. patients’ clinical records in disease-related models.
For a given labeled data record \((\mathbf{x}, y)\) and a model \(f\)’s prediction vector \(\mathbf{y} = f(\mathbf{x})\), determine whether \((\mathbf{x}, y)\) was in the model’s training dataset \(D^{train}_{target}\).
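The task above is a binary decision over a prediction vector and a label. A minimal sketch of that interface, assuming a fixed confidence threshold as a stand-in for the attack model; the paper trains attack models with shadow models rather than using a threshold, and the names `infer_membership` and `threshold` and the example vectors are hypothetical:

```python
import numpy as np

def infer_membership(prediction_vector, true_label, threshold=0.9):
    """Guess 'member' when the confidence assigned to the true label
    meets a threshold (a simplification, not the shadow-model attack)."""
    y = np.asarray(prediction_vector, dtype=float)
    return bool(y[true_label] >= threshold)

# Hypothetical prediction vectors over three classes
confident = [0.97, 0.02, 0.01]   # model is very sure about class 0
uncertain = [0.40, 0.35, 0.25]   # model is unsure

is_member_confident = infer_membership(confident, true_label=0)
is_member_uncertain = infer_membership(uncertain, true_label=0)
```

The sketch only illustrates the input and output of the inference; a learned attack model would replace the threshold comparison.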
With method 1 (model-based synthesis), the records used in both \(D^{train}_{target}\) and \(D^{test}_{target}\) are classified with high confidence.
Metric | Real data | 10% noise | 20% noise |
Precision | 0.678 | 0.666 | 0.613 |
Recall | 0.98 | 0.99 | 1.00 |
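For reading the table: precision is the fraction of records inferred as members that really are members, and recall is the fraction of true members that the attack identifies. A small sketch of the arithmetic, where the function name and the counts are illustrative assumptions and not taken from the paper:

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Compute attack precision and recall from confusion counts."""
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return precision, recall

# Hypothetical counts: 90 members correctly flagged, 30 non-members
# wrongly flagged as members, 10 members missed by the attack.
precision, recall = precision_recall(
    true_positives=90, false_positives=30, false_negatives=10
)
```

With these made-up counts the precision is 0.75 and the recall is 0.9. A recall near 1 with lower precision, as in the table, means the attack misses few members but also flags some non-members.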
Models that generalize better are less vulnerable to membership inference attacks.