
This track focuses on debugging, modifying, and evaluating machine learning models to enhance their performance and reliability using eXplainable AI (XAI) techniques. Topics include, but are not limited to, (1) identifying and mitigating the effects of learned spurious correlations (also known as the Clever Hans effect or shortcut learning) and of distribution shifts in the data, (2) diagnosing failure modes and edge cases that lead to undesired model behaviour, and (3) improving models through targeted corrections. For each of these points, XAI methods serve as a critical foundation: they make it possible to understand model decision strategies (and their flaws) and to identify the parts of the model or dataset that are most amenable to improvement. The track additionally covers the development and application of rigorous evaluation methods for verifying that such enhancements achieve the intended outcomes. Join us to discuss best practices, emerging tools, innovative techniques, and case studies showcasing how thoughtful model refinement can lead to fair, robust, and responsible AI solutions.
- Automatically revealing spurious behavior of large models and/or spurious correlations in large datasets
- Investigations of when, why, and how spurious correlations are learned by neural networks
- Automatically probing large models for the presence of, or reliance on, spurious features
- Identifying spurious correlations in specialized settings, e.g., medicine, the natural sciences, or industrial applications, using XAI
- Analyses of shortcut learning in multi-modal foundation models (e.g., vision-language models) using XAI
- Methods for finding adversarial model vulnerabilities using XAI
- Reliable and faithful estimation of directions or regions in latent space corresponding to harmful features
- Modification of models with shortcut behavior to reduce their reliance on spurious correlations
- Detection and cleaning of spurious training data (e.g., with generative models)
- Augmenting training data as a model-agnostic way to correct models
- Correcting (social) biases of foundation models
- Training-free/post-hoc model correction via pruning or modifications of model components using limited data
- Understanding and modifying the mechanistic role of model components via feature steering or activation patching (see the sketch after this list)
- Machine unlearning: Unlearning (groups of) data samples, e.g., for privacy and regulatory compliance
- Evaluating concept dependence: Advancing techniques to quantify a model's reliance on predefined concepts
- Evaluation benchmarks that take into account full training, fine-tuning, or post-hoc approaches
- Meta-evaluations of which evaluation techniques are reliable and which are not
- Advanced evaluation settings, e.g., realistic or difficult benchmarks for model robustness
- Model robustness evaluation techniques for settings where labels are only partially available
- Model robustness evaluations that more directly measure model reliance on spurious concepts
- XAI-based model correction frameworks that make it possible to reveal harmful behavior, revise models, and evaluate the revisions
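
To make the feature-steering/activation-patching bullet above concrete, below is a minimal activation-patching sketch in PyTorch. Everything here is illustrative: the toy model, the chosen layer, and the random inputs are hypothetical stand-ins, and a real study would use a trained model together with carefully constructed clean/corrupted input pairs.

```python
import torch
import torch.nn as nn

# Hypothetical toy model; in practice this would be a trained network.
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 2))
layer = model[1]  # component whose output we cache and later patch

clean_x = torch.randn(1, 8)    # input exhibiting the behavior of interest
corrupt_x = torch.randn(1, 8)  # input where that behavior is absent

# 1) Cache the component's activation on the clean input.
cache = {}
def save_hook(module, inputs, output):
    cache["act"] = output.detach()

handle = layer.register_forward_hook(save_hook)
with torch.no_grad():
    model(clean_x)
handle.remove()

# 2) Patch the cached activation into a forward pass on the corrupted input.
def patch_hook(module, inputs, output):
    return cache["act"]  # returning a tensor overrides the module's output

handle = layer.register_forward_hook(patch_hook)
with torch.no_grad():
    patched_logits = model(corrupt_x)
handle.remove()

with torch.no_grad():
    baseline_logits = model(corrupt_x)

# The logit shift indicates how strongly this component mediates the behavior.
print(patched_logits - baseline_logits)
```

Returning a tensor from a forward hook replaces the module's output, which allows this kind of causal intervention without changing the model's code; the same hook mechanism also supports feature steering by adding a concept direction to the activation instead of replacing it.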
