"The goal of these tutorials is to give a hands-on introduction to *differential privacy*, a framework for thinking about the privacy risks inherent when doing statistics or data analytics on private or sensitive data. Many approaches to protecting data privacy seek to \"anonymize\" the data by removing obvious (or not so obvious) identifiers. For example, a data set might have names, addresses, social security numbers, and other personally identifying information removed. However, that does not guarantee that publishing a stripped-down data set is still safe -- there have been many well-publicized attacks on supposedly \"sanitized\" data that use a small amount of auxiliary (and sometimes public) information to re-identify individuals in the data set.\n",

"\n",

"The fundamental difficulty in these examples is that the *data itself is uniquely identifying*. The follow-on implication is that if we publish the output of a program (say, a statistical analysis method) that runs on private data, we *reveal something about the individuals in the data*. The *differential privacy* model is a way to quantify this additional risk of re-identification. Privacy is a property of the *algorithm that operates on the data*; different algorithms incur different *privacy risks*. While we have a \n",

"The fundamental difficulty in these examples is that the *data itself is uniquely identifying*. The follow-on implication is that if we publish the output of a program (say, a statistical analysis method) that runs on private data, we *reveal something about the individuals in the data*. The *differential privacy* model is a way to quantify this additional risk of re-identification. Privacy is a property of the *algorithm that operates on the data*; different algorithms incur different *privacy risks*.\n",

"\n",

"Differential privacy was first proposed in a paper by Dwork, McSherry, Nissim, and Smith in 2006 [DMNS06]. In the intervening years there has been a rapid growth in the research literature on differentially private approaches for many statistical, data mining, and machine learning algorithms of interest. The goal of this package is to provide easy-to-use implementations of these methods as well as tutorials (via ipython notebooks) to show how to use these methods."

]

...

...

@@ -19,7 +19,7 @@

"source": [

"## References\n",

"\n",

"[DMNS06]"

"[DMNS06] Dwork, C., McSherry, F., Nissim, K., and Smith, A. (2006). “Calibrating noise to sensitivity in private data analysis,” in Theory of Cryptography. Lecture notes in computer science, Vol. 3876, eds S. Halevi and T. Rabin (Berlin, Heidelberg: Springer), 265–284."