The article presents Counterfactual Concept Bottleneck Models (CF-CBMs), a new class of deep learning models designed to answer three fundamental queries: predicting class labels, explaining task predictions, and imagining alternative scenarios that would lead to different predictions. By pairing accurate predictions with simple concept-level explanations and interpretable counterfactuals, CF-CBMs improve the interpretability of AI systems. The authors argue that CF-CBMs can also sample or estimate the most probable counterfactual, which allows them to explain the effect of concept interventions on task predictions, show users how to obtain a desired class label, and propose concept changes via task-driven interventions.
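To make the concept-bottleneck structure the article refers to concrete, below is a minimal PyTorch sketch of a standard CBM together with a hand-crafted concept intervention. This is not the authors' CF-CBM implementation; CF-CBMs additionally learn to sample counterfactual concept vectors, whereas this sketch only shows the bottleneck and a manual intervention. All names (`SimpleCBM`, `n_concepts`, etc.) are hypothetical.

```python
# Minimal sketch of a concept bottleneck model (CBM) with a manual
# concept intervention. NOT the paper's CF-CBM implementation; it only
# illustrates the input -> concepts -> task structure described above.
import torch
import torch.nn as nn

class SimpleCBM(nn.Module):
    def __init__(self, n_features: int, n_concepts: int, n_classes: int):
        super().__init__()
        # Input -> interpretable concept scores (the "bottleneck").
        self.concept_net = nn.Sequential(
            nn.Linear(n_features, 64), nn.ReLU(),
            nn.Linear(64, n_concepts), nn.Sigmoid(),
        )
        # Concept scores -> task prediction.
        self.task_net = nn.Linear(n_concepts, n_classes)

    def forward(self, x, concepts=None):
        # If `concepts` is supplied, bypass the concept predictor:
        # this is a concept intervention.
        c = self.concept_net(x) if concepts is None else concepts
        return c, self.task_net(c)

model = SimpleCBM(n_features=16, n_concepts=4, n_classes=3)
x = torch.randn(1, 16)

c, y = model(x)
print("predicted concepts:", c.detach().round())
print("predicted class:", y.argmax(dim=-1).item())

# Intervene: flip one concept and observe the effect on the task output.
# This mirrors the "what if a concept were different?" question, which
# CF-CBMs answer by generating counterfactual concept vectors instead
# of relying on manual edits.
c_intervened = c.detach().clone()
c_intervened[0, 0] = 1.0 - c_intervened[0, 0]
_, y_cf = model(x, concepts=c_intervened)
print("class after intervention:", y_cf.argmax(dim=-1).item())
```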

Publication date: 5 Feb 2024
Project Page: Not Provided
Paper: https://arxiv.org/pdf/2402.01408