March 9, 2020

Video: CLTC Seminar on “Veridical Data Science”

On February 19, 2020, the AI Security Initiative at the Center for Long-Term Cybersecurity (CLTC) hosted a lunchtime seminar featuring Bin Yu, Chancellor’s Professor in the Departments of Statistics and Electrical Engineering & Computer Science at UC Berkeley.

CLTC’s AI Security Initiative (AISI) works across technical, institutional, and policy domains to support trustworthy development of AI systems today and into the future. AISI facilitates research and dialogue to help AI practitioners and decision-makers prioritize the actions they can take today that will have an outsized impact on the future trajectory of AI security around the world.

In this seminar, Professor Yu presented her latest work on a predictability, computability, and stability (PCS) framework, which aims to produce responsible, reliable, reproducible, and transparent results across the entire data science life cycle. The framework is meant to bridge machine learning and statistics. It uses predictability as a “reality check” and accounts for the role of computation in data collection and storage and in algorithm design. Her paper on the PCS framework was recently published and can be read in full online.

Yu then highlighted DeepTune, a project that motivated the principles of the PCS framework. DeepTune is proposed as a way to elicit interpretations of deep neural network-based models of single neurons in visual cortex area V4. For the remainder of her talk, Professor Yu covered two other projects: a predictive, descriptive, and relevant (PDR) framework for interpretable machine learning, and the use of agglomerative contextual decomposition (ACD) to interpret deep neural networks.

“People are the key to making data science veridical,” concluded Professor Yu. To get there, the statistics, data science, machine learning, and AI communities need to come together on several opportunities and challenges: collaborating on a few common, robust, and reliable products that everybody can take advantage of (e.g., deep learning, random forests); certification and labels for open-source and safe software; a more rigorous evaluation process for new algorithms; quality research and trustworthy publication norms, as opposed to paper counting; a ‘team-brain’ approach to solving complex transdisciplinary problems; and a fair collaborative environment in which the best arguments win.

Watch the video of the presentation above or on YouTube.