📑 “Privacy-Preserving Deep Learning” [Shokri] 🔗
Paper summary 🔗
How do we enable multiple parties to jointly learn an accurate neural network model without sharing their input datasets? For example, hospitals could benefit from other hospitals’ datasets to improve the accuracy of their own models.
The devised collaborative deep learning protocol, titled “Selective Stochastic Gradient Descent (SGD)” (Fig. 2), consists of \(N\) separate participants with the same model architecture and training strategy, communicating with a shared parameter server.
- Each participant initializes their own parameter values and trains a model locally.
- After a local SGD epoch, they select some parameters (by choosing the largest values, or by random selection above a threshold) and upload their gradients to the parameter server. Selective parameter sharing is effective because SGD can be parallelized and run asynchronously.
- The server updates the shared parameters, applying a decay factor so that fresher updates contribute more.
- Each participant downloads the latest parameter values from the server before training the next epoch.
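The steps above can be sketched in a few lines of Python. This is a minimal toy, not the authors’ implementation: the variable names are hypothetical, a simple quadratic loss stands in for a neural network, and the server’s decay factor and per-parameter learning rates are omitted.

```python
import random

random.seed(0)

D = 10       # number of model parameters
THETA = 0.3  # fraction of gradients each participant uploads
LR = 0.1     # learning rate

# Each participant fits the shared model to its own private target vector
# (a stand-in for a private dataset).
targets = [[random.uniform(-1, 1) for _ in range(D)] for _ in range(3)]
server_w = [0.0] * D  # parameters held by the parameter server

def local_gradient(w, target):
    # Gradient of the squared error 0.5 * sum((w_i - t_i)^2) is (w - t).
    return [wi - ti for wi, ti in zip(w, target)]

def mean_loss(w):
    return sum(sum((wi - ti) ** 2 for wi, ti in zip(w, t))
               for t in targets) / len(targets)

loss_before = mean_loss(server_w)

for epoch in range(50):
    for target in targets:
        w = list(server_w)        # download the latest parameters
        grad = local_gradient(w, target)
        # Select the theta fraction of gradients with the largest magnitude...
        k = max(1, int(THETA * D))
        chosen = sorted(range(D), key=lambda i: abs(grad[i]), reverse=True)[:k]
        for i in chosen:          # ...and upload only those to the server
            server_w[i] -= LR * grad[i]

loss_after = mean_loss(server_w)
print(f"mean loss: {loss_before:.3f} -> {loss_after:.3f}")
```

Even though each participant uploads only 30% of its gradients per round, the shared model’s average loss across the private “datasets” drops, mirroring the paper’s observation that partial sharing still helps everyone.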
Table 4 confirms the intuition that sharing even a small fraction of the parameters (10%, 1%, or 0.1%) reduces overfitting and yields better accuracy than each model training on its own (“Standalone”). The higher the fraction, the higher the accuracy, but also the higher the privacy loss.
To ensure differential privacy of the parameter updates, Laplacian random noise is added to perturb each shared gradient \(\Delta \mathbf{w}^{(i)}\) during selection and upload, so that the inputs cannot be inverted while the update retains its utility. \(\Delta \mathbf{w}^{(i)}\) is clipped to \([-\gamma, \gamma]\), which both curbs overfitting and avoids the heavy computation of the true sensitivity of SGD, \(\Delta f\), by simply using the upper bound \(2\gamma\). A lower privacy budget \(\epsilon\) results in more noise and stronger differential privacy guarantees, but reduces accuracy, as shown in Fig. 13.
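The perturbation step can be illustrated as follows. This is a hedged sketch: `privatize`, `gamma`, and `epsilon` are hypothetical names, and the noise scale assumes the paper’s \(2\gamma\) sensitivity bound with Laplace noise of scale \(\Delta f / \epsilon\).

```python
import math
import random

random.seed(1)

def laplace(scale):
    # Sample Laplace(0, scale) by inverting the CDF.
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def privatize(grad, gamma=0.001, epsilon=0.1):
    sensitivity = 2.0 * gamma  # upper bound used instead of the true delta-f
    noisy = []
    for g in grad:
        g = max(-gamma, min(gamma, g))  # clip the gradient to [-gamma, gamma]
        noisy.append(g + laplace(sensitivity / epsilon))
    return noisy

print(privatize([0.0005, -0.002, 0.0009]))
```

Note the trade-off in the last argument: shrinking `epsilon` inflates the Laplace scale `2 * gamma / epsilon`, drowning the clipped gradient in noise, which is exactly the accuracy cost the paper reports.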
Another privacy measure is an oblivious parameter server that can identify neither the participants nor which gradients each of them uploads.
Discussion summary 🔗
- What are the pros and cons of centralized vs distributed ML models?
Sometimes it’s not a matter of which is better, but of what the setup dictates. Nevertheless, when there is a choice, distributed approaches can ensure kinds of privacy (of the model, of the data) that centralized approaches cannot. However, distributed approaches still introduce their own set of privacy challenges (elaborated below).
- Are there any other attacks that could happen in this distributed deep learning model?
- Model evasion: Each participant knows what the shared model looks like (white-box), in contrast to a black-box MLaaS scenario, making it easier to, for example, train substitute models and generate adversarial examples. White-box access also increases the possibility of model leakage attacks.
- Model poisoning: A participant could feed in bad weights, even if the damage to the shared parameters may be limited as long as most parties are “honest”. It would be difficult to assess whether each party is providing useful gradients, since anomalous behaviour alone does not imply maliciousness. Furthermore, preventive measures, such as keeping track of each participant’s history, would themselves compromise privacy.
Conclusion 🔗
Exploiting the fact that SGD is parallelizable and can be run asynchronously, and that the stochasticity of the learning process is beneficial, the authors propose and evaluate a feasible differentially private distributed selective SGD protocol that improves the accuracy of multiple parties’ deep learning models beyond what is individually achievable.
However, even the strictest (lowest) privacy budget shown here is calculated per parameter, and deep learning models typically have many parameters, so a participant’s total privacy loss is actually tremendously high, exceeding several thousand on MNIST [Abadi]. Additionally, a follow-up paper [Hitaj] has experimentally shown this approach to be vulnerable, by training a Generative Adversarial Network (GAN) to generate prototypical samples of the training data of a target class.
📑 “GAZELLE: A Low Latency Framework for Secure Neural Network Inference” [Juvekar] 🔗
Paper summary 🔗
The premise is that a client wishes to submit input data to a model owner for inference without revealing details of the data, while the model owner wishes to return inference results without revealing the model parameters. Applications include clients with highly sensitive data and Machine Learning as a Service (MLaaS) providers offering a “try before you buy” option.
The main contributions are a homomorphic encryption library providing fast algorithms for basic homomorphic operations, homomorphic linear algebra kernels mapping neural network layers to optimized homomorphic matrix-vector multiplication, and optimized encryption switching protocols which seamlessly convert between homomorphic and garbled circuit encodings for complete neural network inference.
Homomorphic encryption (HE) allows computation on encrypted data without knowledge of the original data. While it is fast at computing the linear layers of neural networks, it introduces noise that accumulates and slows computation. This paper uses Packed Additively Homomorphic Encryption (PAHE).
Garbled circuits (GC) are a form of two-party computation (2PC) that “encrypts a computation” to reveal only its output, allowing two parties to jointly evaluate a function over their private inputs. While GC works well for nonlinear layers, it incurs high communication bandwidth on linear layers.
The paper proposes a protocol, GAZELLE, based on the alternating use of PAHE and GC:
- The client uses HE to encrypt their input to send to the server.
- The server uses HE to evaluate the linear layer.
- The client and server evaluate the non-linear layer with GC together.
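To illustrate the HE linear-layer step, here is a toy additively homomorphic evaluation of a dot product using textbook Paillier encryption. This is not the lattice-based PAHE scheme GAZELLE actually uses, and the tiny primes provide no security whatsoever; it only demonstrates that a server can compute \(\mathbf{w} \cdot \mathbf{x}\) on ciphertexts without seeing \(\mathbf{x}\).

```python
import math
import random

random.seed(2)

p, q = 17, 19        # toy primes; real deployments use very large primes
n = p * q
n2 = n * n
g = n + 1
lam = math.lcm(p - 1, q - 1)
mu = pow(lam, -1, n)  # valid because g = n + 1

def encrypt(m):
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(c):
    return ((pow(c, lam, n2) - 1) // n) * mu % n

# Client encrypts its input vector and sends only ciphertexts.
x = [3, 5, 4]
cts = [encrypt(xi) for xi in x]

# Server evaluates the linear layer w . x homomorphically, using
# Enc(a) * Enc(b) = Enc(a + b) and Enc(a)^k = Enc(k * a), all mod n^2.
w = [2, 0, 1]
acc = encrypt(0)
for ci, wi in zip(cts, w):
    acc = (acc * pow(ci, wi, n2)) % n2

print(decrypt(acc))  # 2*3 + 0*5 + 1*4 = 10
```

The server never learns `x`, and the client decrypts only the layer output. GAZELLE’s contribution is making this step fast (packing, optimized matrix-vector kernels) and then switching to garbled circuits for the nonlinearity instead of trying to evaluate it homomorphically.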
On MNIST and CIFAR-10, GAZELLE obtains 20-30x lower latency and 2.5-88x lower online bandwidth than multiple 2PC-based state-of-the-art secure network inference solutions, and more than three orders of magnitude lower latency and two orders of magnitude lower bandwidth than purely homomorphic approaches.
Discussion summary 🔗
- GAZELLE is impractical for real-world applications, not only because of slow computation, but even more crucially because of high communication bandwidth. Any resulting communication delay also adds to runtime, especially if the client and server are separated by a large distance.
- Input data are often more complex than the benchmark datasets used, e.g. high-resolution hospital images instead of CIFAR-10.
- The model’s number of layers and the size of each layer are exposed to any potentially malicious party. However, model stealing is unlikely to pose a threat, given the trouble the communication overhead entails.
Conclusion 🔗
While fusing HE and 2PC has been done by others, this paper designed and implemented fast algorithms and linear algebra kernels in HE, optimized the switching between HE and 2PC, and showed experimental results with marked improvement over existing systems.
Still, these secure network inference protocols remain impractical, unless they can be scaled to handle larger neural networks and data sizes with reasonable time and communication overheads.
Bibliography
[Shokri] Shokri & Shmatikov, Privacy-preserving deep learning, 2015 53rd Annual Allerton Conference on Communication, Control, and Computing (Allerton), (2015). link. doi. ↩
[Abadi] Abadi, Chu, Goodfellow, McMahan, Mironov, Talwar & Zhang, Deep Learning with Differential Privacy, Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security - CCS’16, (2016). ↩
[Hitaj] Hitaj, Ateniese, & Perez-Cruz, Deep Models Under the GAN, Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security - CCS ’17, (2017). link. doi. ↩
[Juvekar] Juvekar, Vaikuntanathan & Chandrakasan, Gazelle: A low latency framework for secure neural network inference, arXiv preprint arXiv:1801.05507, (2018). ↩