📑 “Privacy-Preserving Deep Learning” [Shokri]

Paper summary

How do we enable multiple parties to jointly learn an accurate neural network model without sharing their input datasets? For example, hospitals could benefit from other hospitals’ datasets to improve the accuracy of their own models.

The devised collaborative deep learning protocol titled “Selective Stochastic Gradient Descent (SGD)” (Fig. 2) consists of \(N\) separate participants with the same model and training strategy, communicating with a shared parameter server.
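The selection-and-upload step can be sketched as follows. This is a minimal illustration, not the authors’ implementation: it assumes that each participant uploads the largest-magnitude entries of its local update, and the names `select_updates`, `ParameterServer`, and `fraction` are hypothetical.

```python
import numpy as np

def select_updates(delta_w, fraction):
    """Pick the `fraction` of entries of delta_w with the largest magnitude.

    Assumption: largest-magnitude selection; other criteria are possible.
    """
    k = max(1, int(round(fraction * delta_w.size)))
    idx = np.argpartition(np.abs(delta_w), -k)[-k:]
    return idx, delta_w[idx]

class ParameterServer:
    """Hypothetical shared server holding the global parameter vector."""

    def __init__(self, n_params):
        self.w = np.zeros(n_params)

    def upload(self, idx, values):
        # Add only the selected coordinates of a participant's update.
        np.add.at(self.w, idx, values)

    def download(self):
        # Participants fetch a copy of the current global parameters.
        return self.w.copy()

# One participant shares 40% of a 5-parameter update.
server = ParameterServer(n_params=5)
delta_w = np.array([0.1, -5.0, 0.3, 2.0, -0.2])
idx, values = select_updates(delta_w, fraction=0.4)
server.upload(idx, values)
```

Because the server only ever adds sparse updates to the shared vector, participants can upload and download asynchronously without coordinating with one another.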

Table 4 confirms the intuition that sharing even a small fraction of the parameters (10%, 1%, or 0.1%) reduces overfitting and yields better accuracy than training each model on its own (“Standalone”). The higher the fraction shared, the higher the accuracy, but also the higher the privacy loss.

To ensure differential privacy of parameter updates, Laplace noise is added to perturb the parameter update (gradient) \(\Delta \mathbf{w}^{(i)}\) during selection and upload, so that the input cannot be inverted while the utility of the update is maintained. \(\Delta \mathbf{w}^{(i)}\) is bounded to \([-\gamma, \gamma]\), which both limits overfitting and avoids the heavy computation of the true sensitivity of SGD, \(\Delta f\): the upper bound \(2\gamma\) is used instead. A lower privacy budget \(\epsilon\) results in more noise and a stronger differential privacy guarantee, but reduces accuracy, as shown in Fig. 13.
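A minimal sketch of the clip-then-perturb step is given below. It assumes the whole budget \(\epsilon\) is spent on releasing each value, whereas the paper also spends part of the budget on selection. The function name `privatize_update` and its parameters are illustrative.

```python
import numpy as np

def privatize_update(delta_w, gamma, epsilon, rng):
    """Clip an update to [-gamma, gamma] and add Laplace noise.

    Assumption: sensitivity is bounded by 2 * gamma, so the Laplace
    scale is 2 * gamma / epsilon (the standard Laplace mechanism).
    """
    clipped = np.clip(delta_w, -gamma, gamma)
    scale = 2.0 * gamma / epsilon
    return clipped + rng.laplace(loc=0.0, scale=scale, size=clipped.shape)

rng = np.random.default_rng(0)
update = np.array([5.0, -5.0, 0.01])
noisy = privatize_update(update, gamma=0.1, epsilon=1.0, rng=rng)
```

Halving \(\epsilon\) doubles the noise scale, which is the accuracy and privacy trade-off described above.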

Another privacy measure is an oblivious parameter server that cannot identify the participants or which gradients they upload.

Discussion summary

Whether a centralized or a distributed approach is used is sometimes dictated by the setup rather than by which is better. Nevertheless, when there is a choice, distributed approaches can ensure kinds of privacy (of the model, of the data) that centralized approaches cannot. However, distributed approaches introduce their own set of privacy challenges (elaborated below).

Conclusion

Exploiting the fact that SGD is parallelizable and can be run asynchronously, and that the stochasticity of the learning process is beneficial, the authors propose and evaluate a feasible differentially private distributed selective SGD protocol that improves the accuracy of multiple parties’ deep learning models beyond what is individually achievable.

However, even the strictest (lowest) privacy budget shown here is calculated per parameter. Deep learning models typically have many parameters, so a participant’s total privacy loss is very high, exceeding several thousand on MNIST [Abadi]. Additionally, a follow-up paper [Hitaj] has shown experimentally that this approach is vulnerable, by training a Generative Adversarial Network (GAN) to generate prototypical samples of training data from a target class.
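The effect of per-parameter accounting can be made concrete with basic sequential composition: releasing \(k\) values, each through an \(\epsilon_{\text{param}}\)-differentially private mechanism, is only guaranteed to be \(k \, \epsilon_{\text{param}}\)-differentially private overall. The numbers below are hypothetical and only illustrate the scale; they are not taken from either paper.

\[
\epsilon_{\text{total}} = k \, \epsilon_{\text{param}}, \qquad \text{e.g. } k = 10^{4},\ \epsilon_{\text{param}} = 1 \ \Rightarrow\ \epsilon_{\text{total}} = 10^{4}.
\]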

📑 “GAZELLE: A Low Latency Framework for Secure Neural Network Inference” [Juvekar]

Paper summary

The premise is that a client wishes to provide some input data (without revealing details of the data) to a model owner, who wishes to provide inference results back (without revealing model parameters). Example applications include clients with highly sensitive data and Machine Learning as a Service (MLaaS) providers offering a “try before you buy” option.

The main contributions are a homomorphic encryption library providing fast algorithms for basic homomorphic operations, homomorphic linear algebra kernels mapping neural network layers to optimized homomorphic matrix-vector multiplication, and optimized encryption switching protocols which seamlessly convert between homomorphic and garbled circuit encodings for complete neural network inference.

Homomorphic encryption (HE) allows computation on encrypted data without knowledge of the original data. While it is fast at computing the linear layers of neural networks, it introduces noise that slows computation as it accumulates. This paper uses Packed Additively Homomorphic Encryption (PAHE).
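To illustrate what “additively homomorphic” means for a linear layer, here is a toy sketch using the Paillier cryptosystem. This is a swapped-in scheme chosen for brevity: it is not the lattice-based PAHE used in the paper, it does no packing, and the tiny primes make it completely insecure. All function names are illustrative.

```python
import math
import secrets

def keygen(p=1009, q=1013):
    """Toy Paillier keys from two small primes (insecure, demo only)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # valid because the generator is g = n + 1
    return n, (lam, mu)

def encrypt(n, m):
    """Encrypt an integer 0 <= m < n under public key n."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2

def decrypt(n, priv, c):
    lam, mu = priv
    return ((pow(c, lam, n * n) - 1) // n * mu) % n

def encrypted_dot(n, enc_x, weights):
    """Dot product of an encrypted vector with plaintext integer weights.

    Multiplying ciphertexts adds plaintexts; raising a ciphertext to the
    power w multiplies its plaintext by w (w must be non-negative here).
    """
    n2 = n * n
    acc = encrypt(n, 0)
    for c, w in zip(enc_x, weights):
        acc = (acc * pow(c, w, n2)) % n2
    return acc

n, priv = keygen()
enc_x = [encrypt(n, v) for v in [3, 1, 4]]   # client encrypts its input
enc_y = encrypted_dot(n, enc_x, [2, 7, 1])   # server never sees x
result = decrypt(n, priv, enc_y)             # 3*2 + 1*7 + 4*1 = 17
```

The server computes one output neuron of a linear layer without seeing the input, and the client learns only the result. Nonlinear activations are not expressible this way, which is where garbled circuits come in.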

Garbled circuits (GC) are a form of two-party computation (2PC) that “encrypts a computation” to reveal only its output, allowing two parties to jointly evaluate a function over private inputs. While GC works well for nonlinear layers, its communication bandwidth is high for linear layers.
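The idea of “encrypting a computation” can be shown on a single AND gate. The sketch below is a textbook-style simplification and not the construction used in GAZELLE: it uses SHA-256 as a one-time pad, an all-zero tag to recognise the correct row, and it omits the oblivious transfer through which the evaluator would obtain the label for its own input. The wire names `a`, `b`, `c` and all function names are illustrative.

```python
import hashlib
import secrets

LABEL_LEN = 16
TAG = bytes(16)  # all-zero tag marks a correctly decrypted row

def _pad(ka, kb):
    # 32-byte pad derived from the two input labels.
    return hashlib.sha256(ka + kb).digest()

def _xor(x, y):
    return bytes(i ^ j for i, j in zip(x, y))

def garble_and_gate():
    """Garbler: pick random labels per wire value and build the table."""
    labels = {
        w: (secrets.token_bytes(LABEL_LEN), secrets.token_bytes(LABEL_LEN))
        for w in "abc"
    }
    table = []
    for va in (0, 1):
        for vb in (0, 1):
            out_label = labels["c"][va & vb]
            pad = _pad(labels["a"][va], labels["b"][vb])
            table.append(_xor(pad, out_label + TAG))
    secrets.SystemRandom().shuffle(table)  # hide which row is which
    return labels, table

def evaluate(table, ka, kb):
    """Evaluator: holds one label per input wire, learns one output label."""
    pad = _pad(ka, kb)
    for row in table:
        plain = _xor(pad, row)
        if plain[LABEL_LEN:] == TAG:
            return plain[:LABEL_LEN]
    raise ValueError("no row decrypted correctly")

labels, table = garble_and_gate()
out = evaluate(table, labels["a"][1], labels["b"][1])
assert out == labels["c"][1]  # 1 AND 1 -> label for output value 1
```

Every gate costs a table of ciphertexts that must be sent to the evaluator, which is why circuits with many multiplications, such as linear layers, are bandwidth-heavy.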

The paper proposes a protocol, GAZELLE, based on the alternating use of PAHE for the linear layers and GC for the nonlinear layers.

On MNIST and CIFAR-10, GAZELLE obtains 20-30x lower latency and 2.5-88x lower online bandwidth than multiple 2PC-based state-of-the-art secure network inference solutions, and more than 3 orders of magnitude lower latency and 2 orders of magnitude lower bandwidth than purely homomorphic approaches.

Discussion summary

Conclusion

While fusing HE and 2PC has been done by others, this paper designed and implemented fast algorithms and linear algebra kernels in HE, optimized the switching between HE and 2PC, and showed experimental results with marked improvement over existing systems.

Still, these secure network inference protocols remain impractical unless they can be scaled to handle larger neural networks and data sizes with reasonable time and communication overheads.

Bibliography

[Shokri] Shokri & Shmatikov, Privacy-preserving deep learning, 2015 53rd Annual Allerton Conference on Communication, Control, and Computing (Allerton), (2015).

[Abadi] Abadi, Chu, Goodfellow, McMahan, Mironov, Talwar, & Zhang, Deep Learning with Differential Privacy, Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security - CCS ’16, (2016).

[Hitaj] Hitaj, Ateniese, & Perez-Cruz, Deep Models Under the GAN, Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security - CCS ’17, (2017).

[Juvekar] Juvekar, Vaikuntanathan & Chandrakasan, Gazelle: A low latency framework for secure neural network inference, arXiv preprint arXiv:1801.05507, (2018).