After a decade of deep learning applications, it is no surprise that the resulting models can be difficult to interpret. A possibly more imminent risk, however, is the leakage of training data from models. Research in recent years has shown that by systematically exploiting pre-trained models with varying levels of access, including output confidence values, statistical assumptions about the training data, direct access to neural network weights, or access to data that shares characteristics with the original training data, it is in surprisingly many cases possible to infer whether an individual data record was included in the training data (membership inference attacks). Increasingly severe model inference attacks even demonstrate reconstruction of the actual values of the training data. Despite the continuous cycle of new model attack methods and the subsequent release of risk mitigation strategies, there is a clear need for explainable AI methods for assessing and understanding the risk of successful attacks. Research on explainable AI methods is also needed to build knowledge on the effectiveness and influence of model risk mitigation strategies in various respects, including vulnerabilities in parts of the input feature space and risks associated with different data modalities. Legal frameworks and guidelines on AI are strongly linked to preserving the privacy of individuals' training data; research on explainable AI methods for understanding how such legal and technical challenges can be handled is therefore welcome.
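As a rough illustration of the membership inference attacks mentioned above, the minimal sketch below mounts a confidence-threshold attack against a deliberately overfitted classifier. The synthetic dataset, the random-forest target model, and the median-based threshold are assumptions made purely for illustration, not methods prescribed by this call.

# Minimal sketch (illustrative only): confidence-threshold membership
# inference against an overfitted classifier on synthetic data.
# Dataset, model, and threshold choice are assumptions of this sketch.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic data: "members" are used for training, "non-members" are held out.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_mem, X_non, y_mem, y_non = train_test_split(X, y, test_size=0.5, random_state=0)

# A deliberately overfitted target model (fully grown trees).
target = RandomForestClassifier(n_estimators=50, max_depth=None, random_state=0)
target.fit(X_mem, y_mem)

# Attack signal: the maximum class confidence output by the target model.
conf_mem = target.predict_proba(X_mem).max(axis=1)
conf_non = target.predict_proba(X_non).max(axis=1)

# Simple attack rule: predict "member" when confidence exceeds a threshold
# taken from the pooled confidence distribution (an assumption of this sketch).
threshold = np.median(np.concatenate([conf_mem, conf_non]))
tpr = (conf_mem > threshold).mean()   # members correctly flagged
fpr = (conf_non > threshold).mean()   # non-members wrongly flagged
print(f"threshold={threshold:.3f}  member hit rate={tpr:.2f}  false alarm rate={fpr:.2f}")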

Topics

Explainable ML-based data/model inference attack detection (e.g. explainable supervised/unsupervised ML-based detection methods, statistical change-point detection methods)
Privacy-preserving explainable models (e.g. models resistant to model inversion attacks and membership inference attacks)
Applications of XAI methods for understanding/quantifying attack risk mitigation strategies for individual predictions (e.g. via local XAI methods or individual Shapley values; a minimal sketch follows this list)
Applications of XAI methods for understanding privacy attacks in federated learning (e.g. visualizations of data leakage under attacks for understanding vulnerabilities, error bounds, similarity-based metrics)
Novel XAI methods for understanding/quantifying privacy attack risk and performance
Explainability-based defence mechanisms for attacks on anonymization processes (e.g. neuro-symbolic techniques with automated-reasoning capabilities)
Explainable models that facilitate data contributors’ ‘right to be forgotten’ (e.g. via models able to forget, machine unlearning methods, model-specific/agnostic approximate unlearning)
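As referenced in the topic on individual predictions above, the sketch below estimates per-feature Shapley contributions to a simple per-record "attack risk" score via Monte Carlo permutation sampling. The risk definition (the target model's top-class confidence), the mean-vector baseline, and the dataset/model choices are assumptions made for illustration only.

# Minimal sketch (illustrative only): Monte Carlo estimation of individual
# Shapley values for a per-record "attack risk" score, here defined as the
# model's top-class confidence (the same kind of signal exploited by the
# membership attack sketched earlier). Baseline and value function are
# assumptions of this sketch.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

rng = np.random.RandomState(0)
X, y = make_classification(n_samples=1000, n_features=8, random_state=0)
model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
baseline = X.mean(axis=0)          # reference point used for "absent" features

def risk(x):
    """Risk score of a single record: the model's top-class confidence."""
    return model.predict_proba(x.reshape(1, -1)).max()

def shapley_values(x, n_samples=200):
    """Monte Carlo Shapley values of each feature for risk(x)."""
    d = x.shape[0]
    phi = np.zeros(d)
    for _ in range(n_samples):
        order = rng.permutation(d)     # random feature ordering
        current = baseline.copy()
        prev_value = risk(current)
        for j in order:
            current[j] = x[j]          # add feature j to the coalition
            new_value = risk(current)
            phi[j] += new_value - prev_value
            prev_value = new_value
    return phi / n_samples

x0 = X[0]
for j, c in enumerate(shapley_values(x0)):
    print(f"feature {j}: contribution to risk score {c:+.4f}")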
