ICML 2010 Tutorial on Privacy and Machine Learning
Stan Matwin, University of Ottawa, Canada
Overview
This tutorial will give a bird's-eye view of the area of privacy as it pertains to Machine Learning. The area is sometimes known as Privacy-preserving Data Mining (PPDM). This is an interesting and highly significant topic for the community because privacy is one of the main ethical and societal concerns surrounding IT in general and Machine Learning in particular. Many believe that there is a moral obligation for at least some in the community to work in this area and propose privacy-protecting solutions. Moreover, there is an emerging body of work in Privacy-preserving Data Mining that needs to be presented to the community. Finally, privacy is a fertile area of research for people looking for thesis topics, new research directions, etc.
Who should attend
Since this is an important topic for the community, the tutorial will be of interest to graduate students and researchers. The tutorial has no specific prerequisites for the target audience.
Contents
- What is privacy? Some historical definitions and legal principles.
- Why is data privacy important? Why is a naive view of data privacy inadequate?
- A taxonomy of PPDM research.
- A principled approach to data anonymization and its limitations. k-anonymity and other concepts.
- Data randomization (Agrawal) and swapping approaches. Negative results (Kargupta). Empirical results.
- PPDM and Statistical Disclosure Control (SDC): commonalities and differences. Measures of data quality in SDC.
- Distributed data privacy problems: secure multiparty computation (SMC) and cryptographic approaches, with a simplified example.
- Privacy of learning results. Inference channels. Protecting against discriminatory DM results.
- Selected new directions and open problems facing the field.
Slides
Slides are available here (pdf, two per page)
Instructor
Stan Matwin is a professor of Computer Science at the University of Ottawa who has been active in Machine Learning research and teaching for many years. Privacy and PPDM are among his active research areas.