Deep Neural Networks have demonstrated remarkable success across various disciplines, primarily due to their ability to learn intricate data representations. However, the semantics of these representations remain elusive, posing challenges for the responsible application of Deep Learning methods, particularly in safety-critical domains. In response to this challenge, this special track focuses on global explainability, a subfield of Explainable AI (XAI). Global explainability methods aim to interpret which abstractions a network has learned, for example by analyzing the network's reliance on specific concepts or by examining individual neurons and their functional roles within the model. This line of work also extends to identifying and interpreting circuits, i.e., computational subgraphs within the model that reveal how information flows through complex architectures. Furthermore, global explainability can be employed to explain the local decision-making of models, an approach termed glocal explainability.
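The concept-reliance analysis mentioned above can be made concrete with a short sketch. The following is a minimal, illustrative example in the spirit of quantitative testing with concept activation vectors (one of the listed topics); the inputs (concept_acts, random_acts, logit_grads) are hypothetical placeholders for activations and gradients extracted from a chosen layer of a trained network, not part of any specific framework.

```python
# Minimal sketch, assuming hidden-layer activations have already been collected
# for a set of concept examples and a set of random examples.
import numpy as np
from sklearn.linear_model import LogisticRegression

def concept_activation_vector(concept_acts, random_acts):
    """Fit a linear probe separating concept from random activations;
    the probe's weight vector serves as the concept direction (CAV)."""
    X = np.concatenate([concept_acts, random_acts])    # (n, d) hidden activations
    y = np.concatenate([np.ones(len(concept_acts)),    # 1 = concept examples
                        np.zeros(len(random_acts))])   # 0 = random examples
    clf = LogisticRegression(max_iter=1000).fit(X, y)
    cav = clf.coef_[0]
    return cav / np.linalg.norm(cav)                   # unit-norm concept direction

def concept_reliance_score(logit_grads, cav):
    """Directional derivatives of a class logit along the CAV for each input;
    the fraction of positive values is a TCAV-style global reliance score."""
    scores = logit_grads @ cav                          # (n,) directional derivatives
    return float(np.mean(scores > 0))
```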
Topics
quantification of the interpretability of deep visual representations via network dissection |
compositional explanations of neurons |
labelling neural representations with inverse recognition |
XAI methods for the automatic description of neuron representations in deep vision networks |
natural language-based descriptions of deep visual features for XAI |
identification and analysis of interpretable subspaces in image representations |
magnitude-constrained optimization methods for feature visualization in deep neural networks |
human-understandable explanations through concept relevance propagation |
attribution maps for enhancing the explainability of concept-based features |
concept recursive activation factorization methods for XAI |
quantitative testing via concept activation vectors |
completeness-aware concept-based explanations in deep learning |
non-negative concept activation vectors for invertible concept-based explanations in convolutional neural networks |
multi-dimensional concept discovery methods for XAI |
automated circuit discovery for mechanistic interpretability |
brain-inspired modular training for mechanistic interpretability |
mechanistic interpretability methods for vision-language models |
measuring grokking via mechanistic interpretability