Abstract
We aim to build neural networks that are intrinsically robust against adversarial attacks. We focus on classifying images in real-world scenarios with complex backgrounds, under attacks that were not foreseen at training time. Previous defenses lack interpretability and offer limited robustness against such unforeseen attacks, so they fail to earn users' trust. We will study Bayesian models, which are more interpretable and intrinsically robust. We will explore two directions: extending an existing Bayesian classifier with better models, and building new Bayesian models from discriminative models.
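To make the Bayesian direction concrete, below is a minimal sketch (an illustrative assumption on our part, not the project's actual method) of how a Bayesian classifier assigns labels by Bayes' rule: score each class y by p(x | y) p(y) and return the argmax. The toy Gaussian class-conditional models and priors here are placeholders for the learned generative models a real image classifier would use.

```python
import numpy as np
from scipy.stats import multivariate_normal

# Hypothetical per-class generative models over 2-D toy features (assumption,
# not the project's models): one Gaussian per class, with equal priors.
class_means = {0: np.array([0.0, 0.0]), 1: np.array([3.0, 3.0])}
class_cov = np.eye(2)
class_priors = {0: 0.5, 1: 0.5}

def bayes_classify(x):
    """Return argmax_y p(x | y) p(y), i.e. the Bayes-rule label for x."""
    scores = {
        y: multivariate_normal.pdf(x, mean=mu, cov=class_cov) * class_priors[y]
        for y, mu in class_means.items()
    }
    return max(scores, key=scores.get)

print(bayes_classify(np.array([0.2, -0.1])))  # -> 0
print(bayes_classify(np.array([2.8, 3.1])))   # -> 1
```

Because the decision is an explicit comparison of class-conditional likelihoods, one can inspect which generative model explains an input and how strongly, which is the kind of interpretability the abstract refers to.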
Findings, Papers, and Presentations
TBA