Why Tutorial?

Disparate communities in ML research, large model R&D, and safety and policy urgently need an overview of the state of privacy technologies and evaluations suited to their purposes, especially as generative AI becomes more prominent. Yet no single technology fits all use cases, and without in-depth knowledge of both machine learning and cryptography, it can be very difficult to initiate research in this impactful and fast-moving field. We hope that clarifying these technologies for this purpose will support effective scientific coordination.

Abstract

Meaningful Privacy-Preserving Machine Learning and How To Evaluate AI Privacy

In the world of large model development, model details and training data are increasingly closed off, pushing privacy to the forefront of machine learning. How do we protect the privacy of data used to train a model while permitting more widespread data-sharing collaborations? How will individuals trust these technologies with their data? How do we verify that the integration of individuals' privately owned data is both useful to the rest of the participating federation and, more importantly, safe for the data owner? How do regulations integrate into this complex infrastructure?

These open questions require balancing a multitude of considerations among the incentives of model developers, data-owning parties, and overseeing agencies. Many cryptographic solutions target these incentive problems, but do they cover all essential components of trustworthy data sharing? Are they practical, or likely to become practical soon?

In this tutorial, we attempt to answer questions about the specific capabilities of privacy technologies in three parts: (1) overarching incentive issues with respect to data and evaluations; (2) where cryptographic and optimisation solutions can help: for evaluations, we delve deep into secure computation, while giving in-depth, real-world views on differential privacy, federated learning, and machine unlearning; and (3) cultural, societal, and research agendas relating to practically implementing these technologies.
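To make part (2) concrete, differential privacy can be illustrated with the classic Laplace mechanism: a query answer is released with calibrated noise whose scale grows as the privacy parameter epsilon shrinks. The sketch below is illustrative only and not from the tutorial materials; the function name and parameters are our own.

```python
import math
import random

def laplace_mechanism(true_value, sensitivity, epsilon, rng=None):
    """Release true_value plus Laplace noise with scale sensitivity/epsilon.

    Smaller epsilon means more noise and a stronger privacy guarantee.
    This is a minimal sketch, not a production-grade implementation.
    """
    rng = rng or random.Random()
    scale = sensitivity / epsilon
    # Sample Laplace(0, scale) via inverse-CDF from a uniform draw in (-0.5, 0.5).
    while True:
        u = rng.random() - 0.5
        if abs(u) < 0.5:  # avoid log(0) at the boundary
            break
    sign = 1 if u >= 0 else -1
    noise = -scale * sign * math.log(1 - 2 * abs(u))
    return true_value + noise

# Example: releasing a count of 100 with sensitivity 1 and epsilon 1.
noisy_count = laplace_mechanism(100, sensitivity=1.0, epsilon=1.0)
```

Even this toy version shows the core trade-off the tutorial examines: the data owner's safety (noise magnitude) is bought at the cost of utility to the rest of the federation (accuracy of the released statistic).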

We hope that, by identifying the boundaries of PrivacyML technologies and providing a technical, structured framework for reasoning over these issues, we can empower the general audience to integrate these principles (and practical solutions) into their existing research. Those already interested in applying the technology can gain a deeper, hands-on understanding of implementation, useful for modeling and developing incentive-compatible solutions for their own work.