Title: Differential Privacy for Statistics
Abstract
Much
of the prior work on privacy focuses on classifying attributes as
sensitive or non-sensitive; in contrast, we focus on privacy-preserving statistical
analysis of data. The whole point of a statistical database is to teach
general truths, for example, that smoking causes cancer. However,
learning this fact can potentially reveal whether certain individuals
will develop cancer, even though they are not necessarily in the
database. Differential privacy arose in this context: it constrains a
computation so that an adversary's ability to inflict any harm, or
good, on an individual is essentially the same whether that individual
opts in to, or opts out of, the database.
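As a point of reference for this guarantee, the standard formulation (the symbols below are the usual ones, not taken from this abstract) says that a randomized mechanism M is epsilon-differentially private if, for all pairs of databases D and D' differing in the data of a single individual and for all sets of outputs S,

    \Pr[M(D) \in S] \;\le\; e^{\varepsilon} \cdot \Pr[M(D') \in S].

Smaller epsilon means the two output distributions are closer, so any one individual's presence or absence has little influence on any outcome.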
In this presentation, we will motivate and review
the definition of differential privacy and discuss how differentially
private algorithms add random noise to the output of a computation
without significantly distorting any individual answer. There are also
settings in which adding noise to achieve privacy makes no sense. We
consider the potential of applying probabilistic inference to improve
the accuracy of existing approaches. Finally, we show that the algorithms
can be applied to personalized recommender systems, and that they can be
adapted to protect the internal states of computations over click-stream data.
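As a minimal sketch of the noise-addition step mentioned above, the following Python fragment applies the standard Laplace mechanism to a counting query. The function name, the example data, and the parameter values are illustrative assumptions, not part of the presentation itself.

import numpy as np

def laplace_mechanism(true_answer, sensitivity, epsilon, rng=None):
    # Add Laplace noise with scale sensitivity/epsilon: the usual calibration,
    # so stronger privacy (smaller epsilon) means more noise.
    rng = rng or np.random.default_rng()
    return true_answer + rng.laplace(loc=0.0, scale=sensitivity / epsilon)

# A counting query has sensitivity 1: adding or removing one person's
# record changes the true count by at most 1.
records = [0, 1, 1, 0, 1, 1, 1]   # illustrative data
true_count = sum(records)
private_count = laplace_mechanism(true_count, sensitivity=1.0, epsilon=0.5)
print(true_count, round(private_count, 2))

Releasing private_count instead of true_count preserves the general statistic up to the noise scale, while limiting what can be inferred about any single record.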