The paper presents a systematic study of example difficulty scores, which are used for dataset pruning and defect identification. The authors analyze how consistent these scores are across different training runs, scoring methods, and model architectures. They find that the scores are noisy over individual training runs, are strongly correlated with a single underlying notion of difficulty, and reveal examples ranging from highly sensitive to largely insensitive to the inductive biases of particular model architectures. They also propose a simple method for fingerprinting model architectures using a few of these sensitive examples. The findings guide practitioners in maximizing the consistency of their scores and provide comprehensive baselines for evaluating scores in the future.
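As a rough illustration of the consistency analysis described above (not the authors' code), the sketch below quantifies run-to-run agreement of per-example difficulty scores with a mean pairwise Spearman rank correlation. The array shape, function name, and synthetic data are assumptions made for the example.

```python
# Illustrative sketch: how consistent are per-example difficulty scores
# across independent training runs? `scores_per_run` is a hypothetical
# array of shape (n_runs, n_examples), e.g. loss- or forgetting-based
# scores collected from repeated runs of the same model.
import numpy as np
from scipy.stats import spearmanr

def run_to_run_consistency(scores_per_run: np.ndarray) -> float:
    """Mean Spearman rank correlation over all pairs of runs."""
    n_runs = scores_per_run.shape[0]
    correlations = []
    for i in range(n_runs):
        for j in range(i + 1, n_runs):
            rho, _ = spearmanr(scores_per_run[i], scores_per_run[j])
            correlations.append(rho)
    return float(np.mean(correlations))

# Synthetic example: noisy observations of a shared latent difficulty.
# A low correlation signals noisy scores, suggesting averaging over
# several runs before using the scores for pruning or defect detection.
rng = np.random.default_rng(0)
true_difficulty = rng.random(1000)
noisy_scores = true_difficulty + 0.5 * rng.standard_normal((5, 1000))
print(run_to_run_consistency(noisy_scores))
```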

Publication date: 4 Jan 2024
Abstract: https://arxiv.org/abs/2401.01867v1
Paper: https://arxiv.org/pdf/2401.01867