
This track focuses on debugging, modifying, and evaluating machine learning models to enhance their performance and reliability using eXplainable AI (XAI) techniques. Topics include, but are not limited to, (1) identifying and mitigating the effects of learned spurious correlations (also known as the Clever Hans effect or shortcut learning) and of distribution shifts in the data, (2) diagnosing failure modes and edge cases that lead to undesired model behaviour, and (3) improving models through targeted corrections. For each of these points, XAI methods serve as a critical foundation: they make it possible to understand model decision strategies (and their flaws) and to identify the parts of the model or dataset that are most amenable to improvement. The track additionally covers the development and application of rigorous evaluation methods for verifying that such enhancements achieve the intended outcomes. Join us to discuss best practices, emerging tools, innovative techniques, and case studies showcasing how thoughtful model refinement can lead to fair, robust, and responsible AI solutions.
- Automatically revealing spurious behavior of large models and/or spurious correlations in large datasets
- Investigations of when, why, and how spurious correlations are learned by neural networks
- Automatically probing large models for the presence of, or reliance on, spurious features
- Identifying spurious correlations in specialized settings, e.g., medicine, the natural sciences, or industrial applications, using XAI
- Analyses of shortcut learning in multi-modal foundation models (e.g., vision-language models) using XAI
- Methods for finding adversarial model vulnerabilities using XAI
- Reliable and faithful estimation of directions or regions in latent space corresponding to harmful features
- Modification of models with shortcut behavior to reduce their reliance on spurious correlations
- Detection and cleaning of spurious training data (e.g., with generative models)
- Augmenting training data as a model-agnostic way to correct models
- Correcting (social) biases of foundation models
- Training-free/post-hoc model correction via pruning or modifications of model components using limited data
- Understanding and modifying the mechanistic role of model components via feature steering or activation patching (see the sketch after this list)
- Machine unlearning: Unlearning (groups of) data samples, e.g., for privacy and regulatory compliance
- Evaluating concept dependence: Advancing techniques to quantify a model's reliance on predefined concepts
- Evaluation benchmarks that take into account full training, fine-tuning, or post-hoc approaches
- Meta-evaluations of which evaluation techniques are reliable and which are not
- Advanced evaluation settings, e.g., realistic or difficult benchmarks for model robustness
- Model robustness evaluation techniques for settings where labels are only partially available
- Model robustness evaluations that more directly measure model reliance on spurious concepts
- XAI-based model correction frameworks that make it possible to reveal harmful behavior, revise models, and evaluate the revisions
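
To make the feature-steering/activation-patching bullet above concrete, below is a minimal activation-patching sketch in PyTorch. Everything here is illustrative: the toy model, the chosen layer, and the random inputs are hypothetical stand-ins, and a real study would use a trained model together with carefully constructed clean/corrupted input pairs.

```python
import torch
import torch.nn as nn

# Hypothetical toy model; in practice this would be a trained network.
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 2))
layer = model[1]  # component whose output we cache and later patch

clean_x = torch.randn(1, 8)    # input exhibiting the behavior of interest
corrupt_x = torch.randn(1, 8)  # input where that behavior is absent

# 1) Cache the component's activation on the clean input.
cache = {}
def save_hook(module, inputs, output):
    cache["act"] = output.detach()

handle = layer.register_forward_hook(save_hook)
with torch.no_grad():
    model(clean_x)
handle.remove()

# 2) Patch the cached activation into a forward pass on the corrupted input.
def patch_hook(module, inputs, output):
    return cache["act"]  # returning a tensor overrides the module's output

handle = layer.register_forward_hook(patch_hook)
with torch.no_grad():
    patched_logits = model(corrupt_x)
handle.remove()

with torch.no_grad():
    baseline_logits = model(corrupt_x)

# The logit shift indicates how strongly this component mediates the behavior.
print(patched_logits - baseline_logits)
```

Returning a tensor from a forward hook replaces the module's output, which allows this kind of causal intervention without changing the model's code; the same hook mechanism also supports feature steering by adding a concept direction to the activation instead of replacing it.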
