In privacy-preserving machine learning, empirical privacy defenses are commonly used to protect the privacy of training data without significantly degrading model utility. These defenses typically require reference data: an additional dataset drawn from the same source as the training data, or a similar one. This paper presents the first comprehensive analysis of empirical privacy defenses that examines how prior work obtains and treats reference data, and argues that the privacy of the reference data itself must be accounted for when comparing defenses. The authors propose a baseline defense, weighted empirical risk minimization (WERM), whose tradeoff between model utility, training data privacy, and reference data privacy is easy to reason about. Surprisingly, this simple method outperforms existing privacy defenses in most scenarios. The study concludes that privacy defenses should be evaluated along all three axes: model utility, training data privacy, and reference data privacy.
Publication date: 19 Oct 2023
Project Page: ?
Paper: https://arxiv.org/pdf/2310.12112
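As a reading aid, here is a minimal PyTorch sketch of the WERM idea, assuming (as the summary suggests) that it amounts to minimizing a convex combination of the empirical risks on the training set and the reference set. The function name `werm_step` and the weight hyperparameter `w` are illustrative choices, not identifiers from the paper.

```python
import torch
import torch.nn as nn

def werm_step(model, loss_fn, optimizer, train_batch, ref_batch, w=0.5):
    """One optimization step on the weighted empirical risk.

    w = 1 recovers ordinary ERM on the training data (no reference-data
    usage); w = 0 trains on the reference data alone (no training-data
    usage). Intermediate values trade the two datasets' privacy against
    each other.
    """
    x_tr, y_tr = train_batch
    x_ref, y_ref = ref_batch
    optimizer.zero_grad()
    # Convex combination of the empirical risks on the two datasets.
    loss = w * loss_fn(model(x_tr), y_tr) + (1 - w) * loss_fn(model(x_ref), y_ref)
    loss.backward()
    optimizer.step()
    return loss.item()

# Toy usage: a linear classifier on random data.
torch.manual_seed(0)
model = nn.Linear(10, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()
train_batch = (torch.randn(32, 10), torch.randint(0, 2, (32,)))
ref_batch = (torch.randn(32, 10), torch.randint(0, 2, (32,)))
for _ in range(5):
    werm_step(model, loss_fn, optimizer, train_batch, ref_batch, w=0.7)
```

Sweeping `w` from 0 to 1 traces out the kind of tradeoff curve the summary describes: each endpoint fully favors one dataset's privacy at the expense of the other's.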