Recent regulatory frameworks have emphasized the importance of responsibility in artificial intelligence (AI), focusing on key desiderata such as fairness, explainability, and privacy. These principles are central to ensuring that AI systems are trustworthy and aligned with societal values. However, theoretical and empirical studies have increasingly highlighted the inherent tensions between these desiderata, which can conflict with one another in practice.

For instance, research has demonstrated a two-way tension between privacy and explainability. On the one hand, providing explanations for model predictions can inadvertently expose sensitive information, exposing the system to adversarial privacy attacks. On the other hand, privacy-preserving techniques such as differential privacy can degrade the quality and utility of explanations, making it harder to interpret a model's behavior. The relationship between fairness and both privacy and explainability is similarly complex: efforts to achieve fairness in machine learning models can introduce privacy vulnerabilities or undermine the interpretability of the system. For example, fairness-enhancing algorithms may require access to sensitive attributes to ensure equitable outcomes, increasing privacy risks, while fairness adjustments can obscure a model's decision-making process and complicate the generation of clear, intuitive explanations.

Research that explores these interdependencies and sheds light on the trade-offs between fairness, explainability, and privacy is therefore crucial. Studies that examine how these desiderata interact, both in theory and in practice, are particularly valuable for guiding the development of AI systems that balance these often-competing priorities effectively.
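To make the privacy-explainability tension concrete, the minimal sketch below (a toy illustration only, not part of the call) trains a linear model and measures how a feature-attribution explanation degrades as noise is injected into the model's parameters. The synthetic dataset, the Laplace noise on the coefficients (a crude stand-in for an actual differentially private mechanism such as DP-SGD), and the noise scales are all assumptions for illustration.

```python
# Toy illustration (assumption-laden sketch): how noise added for privacy
# can perturb feature-importance explanations. Laplace noise on the learned
# coefficients is NOT real differential privacy; it merely mimics the effect
# of a privacy mechanism on downstream explanations.
import numpy as np
from scipy.stats import spearmanr
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X, y)

def attribution(coefs, x):
    # Feature attribution for a linear model: coefficient times input value.
    return coefs * x

clean_attr = attribution(model.coef_[0], X[0])

for scale in (0.01, 0.1, 1.0):
    rng = np.random.default_rng(0)
    noisy_coefs = model.coef_[0] + rng.laplace(0.0, scale, size=model.coef_[0].shape)
    noisy_attr = attribution(noisy_coefs, X[0])
    # Rank correlation between the clean and noisy attributions:
    # lower values mean the explanation's feature ranking has been distorted.
    rho, _ = spearmanr(clean_attr, noisy_attr)
    print(f"noise scale {scale:.2f}: attribution rank correlation = {rho:.2f}")
```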

  • Tensions/Trade-offs between AI desiderata (privacy, fairness, and explainability)
  • Metrics for evaluating the trade-off between privacy, explainability, and fairness
  • Privacy attacks using XAI (e.g. membership inference, linkage, model inversion)
  • XAI for adversarial robustness: how XAI can be used to identify and mitigate adversarial attacks
  • Privacy-preserving explanations
  • Fair explanations (e.g. fair counterfactuals)
  • Fairness through the lens of group vs. individual explanations
  • XAI for auditing algorithmic bias and fairness (e.g. with counterfactuals)
  • Privacy implications of fairness with XAI
  • The impact of privacy-preserving techniques on the utility of explanations (e.g. differential privacy)
  • Explainability in federated settings
  • Explainability for privacy-preserving and heterogeneous federated learning
  • Causal XAI for responsible AI
  • XAI for data minimization and privacy by design
  • Formal verification of fairness and privacy with explainable AI
  • Survey papers, datasets, and benchmarks addressing topics related to the three desiderata
  • The role of human-centric design in balancing privacy, fairness, and explainability
  • Explainability and privacy with high-dimensional data (e.g. time series, NLP)
  • Position papers on the future of explainability, privacy, and fairness