This paper introduces PriorBoost, an adaptive algorithm for learning from aggregate responses. It studies how to construct the aggregation sets (termed ‘bags’) for event-level loss functions. The authors show that, for linear regression and generalized linear models, the optimal bagging problem reduces to one-dimensional size-constrained k-means clustering. They then propose PriorBoost, which forms increasingly homogeneous bags of samples to improve model quality. The paper also studies label differential privacy for aggregate learning and reports experiments supporting PriorBoost’s effectiveness.
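
To make the clustering view concrete, here is a minimal Python sketch of size-constrained bagging in one dimension. It is an illustration under stated assumptions, not the authors' PriorBoost implementation: it assumes each sample already has a one-dimensional score (for instance, an earlier model's predicted response), that every bag has the same size, and that sorting by score and cutting into consecutive chunks is an acceptable way to form homogeneous bags. The function names `sorted_bags`, `random_bags`, `within_bag_sse`, and `aggregate_labels` are hypothetical.

```python
import random
from statistics import fmean


def sorted_bags(scores, bag_size):
    """Sort sample indices by score, then cut them into consecutive bags."""
    if bag_size <= 0:
        raise ValueError("bag_size must be positive")
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    return [order[i:i + bag_size] for i in range(0, len(order), bag_size)]


def random_bags(n, bag_size, seed=0):
    """Baseline: shuffle the indices and cut them into bags of the same size."""
    rng = random.Random(seed)
    order = list(range(n))
    rng.shuffle(order)
    return [order[i:i + bag_size] for i in range(0, n, bag_size)]


def within_bag_sse(scores, bags):
    """k-means-style objective: squared deviation of scores from their bag mean."""
    total = 0.0
    for bag in bags:
        values = [scores[i] for i in bag]
        mean = fmean(values)
        total += sum((v - mean) ** 2 for v in values)
    return total


def aggregate_labels(labels, bags):
    """The learner only observes one averaged response per bag."""
    return [fmean([labels[i] for i in bag]) for bag in bags]


if __name__ == "__main__":
    # Assumed toy data: one score and one binary response per sample.
    scores = [0.9, 0.1, 0.5, 0.2, 0.8, 0.4]
    labels = [1, 0, 1, 0, 1, 0]

    curated = sorted_bags(scores, bag_size=2)
    baseline = random_bags(len(scores), bag_size=2)

    print("curated bags:", curated)
    print("curated SSE:", within_bag_sse(scores, curated))
    print("random SSE:", within_bag_sse(scores, baseline))
    print("aggregate responses:", aggregate_labels(labels, curated))
```

An adaptive scheme in the spirit of the summary would repeat this step: fit a model on the aggregate responses, rescore the samples with it, and re-bag so that later bags become more homogeneous. That loop is left out here because its details are not given in this summary.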

Publication date: 7 Feb 2024
Project Page: https://arxiv.org/abs/2402.04987
Paper: https://arxiv.org/pdf/2402.04987