Explainable AI methods can be broadly categorized into post-hoc approaches and intrinsically interpretable (ante-hoc) models. Post-hoc methods, such as SHAP, LIME, GradCAM, and their derivatives, can easily be applied to existing decision models. However, this convenience comes at a price: one cannot attribute the explanation output to either the underlying model or the explanation method, and the explanation might not be faithful to the model. Essentially, post-hoc approaches explain black-box models with black-box methods. To overcome these shortcomings, researchers have introduced intrinsically interpretable (ante-hoc) models, such as ProtoPNet, Concept Bottleneck Models, B-Cos, and their derivatives. These decision models are designed to transparently reveal the logic behind their predictions during inference. The primary goal is to enhance the model's interpretability through explanations that are true to the model's intrinsic logic, thereby making the decision-making process accessible to and replicable by humans. Despite their benefits, intrinsically interpretable models require dedicated architectures and training procedures, and they may exhibit a drop in performance compared to their black-box counterparts. This track explores the latest challenges and developments in deep intrinsically interpretable models, including evaluation techniques, the balance between interpretability and accuracy, their practical applications, and their broader impact on society.
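To make the ante-hoc idea concrete, below is a minimal sketch of a concept-bottleneck-style classifier in PyTorch. The class name, concept count, dimensions, and the toy backbone are illustrative assumptions for this sketch (concept supervision and training are omitted); it is not the reference CBM implementation.

```python
# Minimal sketch of a concept-bottleneck-style classifier (illustrative only).
# A backbone predicts a small set of human-interpretable concepts; the final
# label is a linear function of those concepts, so every prediction can be
# read off as "which concepts fired and how much each contributed".
import torch
import torch.nn as nn


class ConceptBottleneckClassifier(nn.Module):
    def __init__(self, backbone: nn.Module, backbone_dim: int,
                 num_concepts: int, num_classes: int):
        super().__init__()
        self.backbone = backbone                      # any feature extractor
        self.concept_head = nn.Linear(backbone_dim, num_concepts)
        self.task_head = nn.Linear(num_concepts, num_classes)

    def forward(self, x):
        features = self.backbone(x)
        concept_logits = self.concept_head(features)  # trained against concept labels
        concepts = torch.sigmoid(concept_logits)      # interpretable bottleneck
        class_logits = self.task_head(concepts)       # decision uses only the concepts
        return class_logits, concepts


# Hypothetical usage: inspect which concepts drove a prediction.
backbone = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 128), nn.ReLU())
model = ConceptBottleneckClassifier(backbone, backbone_dim=128,
                                    num_concepts=8, num_classes=4)
x = torch.randn(1, 3, 32, 32)
class_logits, concepts = model(x)
pred = class_logits.argmax(dim=1).item()
# Per-concept contribution to the predicted class: activation * class weight.
contributions = concepts[0] * model.task_head.weight[pred]
print("predicted class:", pred)
print("concept contributions:", contributions.tolist())
```

Because the task head sees only the concept activations, the per-concept contributions computed at the end are the model's own decision logic rather than a post-hoc approximation of it.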
Topics
Intrinsically Interpretable Architectures (e.g., ProtoPNet, PIP-Net, ProtoPool, B-Cos, and CBM)
Intrinsically Interpretable Representation Learning (e.g., ProtoVAE and others)
Evaluation metrics for Intrinsically Interpretable Models (e.g., coherence and compactness, the Co-12 XAI evaluation framework)
Evaluation metrics designed for interpretable models such as ProtoPNet (e.g., the spatial misalignment benchmark)
Novel XAI evaluation benchmarks and frameworks for interpretable-by-design solutions
Extensions of existing benchmarks to intrinsically interpretable architectures (e.g., Quantus and OpenXAI)
Proposal and evaluation of trade-offs between interpretability and performance in intrinsically interpretable models
Applications of Intrinsically Interpretable Models in biomedical imaging (e.g., ProtoMIL, PIP-Net)
Evaluation of interpretable ante-hoc architectures for biomedicine with multiple data modalities (e.g., ProtoGNN, ProGReST)
User-Centric Aspects of Intrinsically Interpretable Models
Human-centric aspects of interpretability-by-design from the cognitive sciences (e.g., the design of ProtoTree and ProtoPool)
Evaluation of users' perception and understanding of explanations provided by intrinsically interpretable models (e.g., HIVE)
Intrinsically interpretable models designed for multiple modalities (e.g., vision and text)
Multimodal interpretability by design and interoperability (e.g., SPANet)
Evaluation of the robustness of Intrinsically Interpretable Models (adversarial robustness, counterfactuals, explanation sanity checks)
Evaluation of intrinsically interpretable architectures using the FunnyBirds framework
Intrinsically Interpretable generative models
Human-Adaptable Intrinsically Interpretable Models
Novel mechanisms for model behaviour change in ante-hoc architectures (e.g., bias fixes in PIP-Net for medical data and in ProtoPDebug)