DsDm: Model-Aware Dataset Selection with Datamodels
This article explores the problem of dataset selection for training large-scale models. The authors argue that traditional methods, which filter data based on human notions of quality, often do not…