On December 10, the Center for Long-Term Cybersecurity hosted the third event in our 2020 Research Exchange, a series of three virtual conferences that showcased CLTC-funded researchers working across a wide spectrum of cybersecurity-related topics. The December event, themed “Long-Term Security Implications of AI/ML Systems,” featured talks from a diverse group of UC Berkeley researchers who are studying artificial intelligence (AI) and machine learning (ML), and developing tools and methods to help keep society more secure as technology continues to advance.
“There’s a huge amount of relevant work on this subject being done across the Berkeley campus, and of course everywhere in the world right now,” said Steven Weber, CLTC’s Faculty Director, in his introductory remarks. “We only had the privilege and capability to fund and engage with a small swath of interesting people doing work in this area on the Berkeley campus. But it’s a really important swath and an excellent one…. I have a feeling that a lot of this work is going to become very relevant in a very tangible way as we go through 2021.”
The full presentation can be viewed above, or click the titles of the talks below to see the individual talks from the event.
Covert Embodied Choice: Decision-Making, VR, and the Limits of Privacy Under Biometric Surveillance
The first presentation of the day was by Jeremy Gordon, a PhD student in the UC Berkeley School of Information, who presented his research on how AI systems may learn to predict human behavior, and how humans may adjust their behavior accordingly. Gordon provided an overview of an experiment he and his colleagues conducted using a virtual card-matching game, with a “virtual adversary” tracking players’ physiology and behaviors through an eye tracker, head-mounted display, and skin conductance monitor, as well as head controller positions. Participants were asked to try to alter their behavior to prevent the AI system from anticipating their move. The researchers found that, even when the participants tried to use deceptive behavior, the AI was largely successful in predicting their next move.
“There are many proposals that exist about how to help address the potential and real threats of remote sensing in public and private spaces,” Gordon said. “Our findings here indicate that initiatives like surveillance warning symbols… and DIY hardware filters may punt the responsibility for privacy to individuals. I think this work might suggest that these aren’t really solutions, if not all individuals have a nuanced understanding about the dynamics and the capabilities of the algorithms operating on their data. Nor should they have to.”
Novel Metrics for Robust Machine Learning
In a five-minute “lightning talk,” N. Benjamin Erichson, a postdoctoral researcher in the UC Berkeley Department of Statistics, introduced a novel approach for quantifying the uncertainty in the predictions that AI models make, including in real-world applications, like self-driving cars.
“To take out some of these challenges, we try to leverage ideas from dynamical systems,” Erichson explained. “And to do so, we try to break apart a deep neural network into simpler building blocks that we can then study using tools from control theory and dynamical systems theory. And, in turn, these insights that we generate can then be used to design more robust models… I think it’s really important that we start to rethink how we train and how we design deep neural networks in order to improve the interpretability of such models. And in order to improve the robustness. And there are many different approaches how you can tackle this problem. But the dynamical systems point of view is a particularly interesting one that has recently generated a range of interesting results.”
Secure Machine Learning
David Wagner, Professor of Computer Science in the UC Berkeley Department of Electrical Engineering and Computer Science (EECS), shared his work on secure machine learning. “When we have systems that are powered by machine learning and making decisions using machine learning autonomously, what we’re discovering in the research community is that this opens up an avenue for attack known as adversarial examples, where an attacker who can control some of the inputs to that system can manipulate its behavior. As we integrate machine learning into our systems, we’re opening them up to a new kind of attack, and machine learning systems as used today are very fragile.” He cited examples of how the AI systems that guide self-driving cars have been shown to, for example, misinterpret a stop sign as a speed limit sign.
Wagner explained that his group is using a form of adversarial training that adds noise to images of things like stop signs, helping to build the “immune system” of AI systems so they are more resistant to adversarial attacks. “It’s kind of like vaccinating the machine learning system by exposing it in advance to some example attacks as generative models, where the defense works by machine learning…. Adding noise seems to make the machine learning classifiers less fragile, less susceptible to these kind of attacks.”
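To make the idea of adding noise to training images concrete, here is a minimal Python sketch. It is not Wagner's method: it shows only plain Gaussian noise augmentation, a simpler relative of adversarial training, and the function names, the toy four-pixel "images," and the `sigma` values are all assumptions made for illustration.

```python
import random

def add_gaussian_noise(pixels, sigma, rng):
    """Return a copy of `pixels` (floats in [0, 1]) with Gaussian noise added and clipped."""
    return [min(1.0, max(0.0, p + rng.gauss(0.0, sigma))) for p in pixels]

def augment_with_noise(dataset, copies=2, sigma=0.1, seed=0):
    """Keep every (pixels, label) pair and append `copies` noisy variants of it."""
    rng = random.Random(seed)
    augmented = []
    for pixels, label in dataset:
        augmented.append((pixels, label))
        for _ in range(copies):
            augmented.append((add_gaussian_noise(pixels, sigma, rng), label))
    return augmented

# Hypothetical toy "images": flat lists of grayscale pixel values.
toy_dataset = [
    ([0.9, 0.1, 0.9, 0.1], "stop"),
    ([0.2, 0.8, 0.2, 0.8], "speed_limit"),
]
training_set = augment_with_noise(toy_dataset, copies=3, sigma=0.05)
```

A classifier trained on `training_set` sees each sign several times under slightly different perturbations, which is the intuition behind the "vaccination" analogy. Defending against a deliberate attacker generally requires perturbations chosen adversarially rather than at random.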
Detecting Images Generated by Neural Networks
In a lightning talk, Alexei Efros, Professor in the UC Berkeley Department of Electrical Engineering and Computer Science (EECS), introduced his research on detecting images generated by neural networks. “Fakes are getting much easier to make with the advances of neural networks,” Efros said. “What we wanted to do is we want to see if we can get some detector to be universal to work on all different types of deep network software.”
Efros’ research involves developing algorithms to identify “deepfakes” by exploiting the limited representational power of neural networks. He explained how using approaches such as “data augmentation,” adding various blurry versions of an image, proved effective in helping to identify deepfakes. “There is still much left to do,” he said. “In the real world, people use this in a different way…. Real fakers are probably just going to try a whole bunch of images and just pick the best one… And the future architectures will change. And also, our method is susceptible to adversarial attacks. But I think this is a very exciting direction.”
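The blur-based data augmentation described above can be sketched in a few lines of Python. This is an illustrative assumption rather than the detector's actual pipeline: it uses a simple box blur on a small grayscale grid, and the names `box_blur` and `blur_variants` are hypothetical.

```python
def box_blur(image, radius=1):
    """Blur a 2-D grayscale image (list of rows) by averaging each pixel's neighborhood."""
    height, width = len(image), len(image[0])
    blurred = []
    for y in range(height):
        row = []
        for x in range(width):
            # Collect the neighbors that fall inside the image bounds.
            window = [
                image[ny][nx]
                for ny in range(max(0, y - radius), min(height, y + radius + 1))
                for nx in range(max(0, x - radius), min(width, x + radius + 1))
            ]
            row.append(sum(window) / len(window))
        blurred.append(row)
    return blurred

def blur_variants(image, radii=(1, 2)):
    """Return the original image followed by one blurred copy per radius."""
    return [image] + [box_blur(image, r) for r in radii]

# Hypothetical 3x3 image with a single bright pixel in the center.
sample = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
variants = blur_variants(sample)
```

Training a detector on every entry of `variants`, each with the same real-or-fake label, is one way to discourage it from relying on fine detail that blurring or recompression would erase.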
What Is My Data Worth? Towards a Principled and Practical Approach for Data Valuation
The next presentation featured Ruoxi Jia, formerly a researcher in the UC Berkeley Department of Electrical Engineering and Computer Science (EECS) and currently an assistant professor at Virginia Tech, discussing new approaches to valuing data. “We all know that data is new oil,” Jia said. “It has driven great progress in our current economy…. And in many cases, the data comes from people, but the problem is that those people who contribute data are not getting sufficient benefits from their data…. The question is, how can we price each individual’s data in an equitable and reasonable way?”
“The value of data is really outcome dependent,” she said. “It depends on how the data is used, and what is the performance of a machine learning model trained on this data. And thirdly, data can be used multiple times. The same piece of data can be used to serve multiple tasks. And the more use the data has, the more valuable it is. Our framework is the first one that tries to formalize the different properties and try to have a unified way to address all these different properties in a single framework.”
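One simple way to see what "outcome dependent" means is leave-one-out valuation: a data point's value is the drop in model performance when that point is removed. The Python sketch below is not Jia's framework, which the talk summary does not spell out; it is a deliberately simplified stand-in that uses a hypothetical 1-nearest-neighbor classifier on one-dimensional toy data.

```python
def nearest_neighbor_accuracy(train, test):
    """Accuracy of a 1-nearest-neighbor classifier on 1-D points; 0.0 if `train` is empty."""
    if not train:
        return 0.0
    correct = 0
    for x, label in test:
        _, predicted = min(train, key=lambda point: abs(point[0] - x))
        correct += predicted == label
    return correct / len(test)

def leave_one_out_values(train, test):
    """Value of each training point = accuracy with it minus accuracy without it."""
    full = nearest_neighbor_accuracy(train, test)
    return [
        full - nearest_neighbor_accuracy(train[:i] + train[i + 1:], test)
        for i in range(len(train))
    ]

# Hypothetical contributors: two near-duplicate "a" points and a single "b" point.
train = [(0.0, "a"), (0.1, "a"), (1.0, "b")]
test = [(0.05, "a"), (0.9, "b")]
values = leave_one_out_values(train, test)
```

Here the two near-duplicate "a" points each receive a value of 0.0, because either one can stand in for the other, while the lone "b" point receives 0.5. That redundancy effect is a known weakness of leave-one-out scoring and one motivation for more principled valuation schemes.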
Hands-on Teaching Tools for Identifying and Addressing Machine Learning Bias
In a lightning talk, Inderpal Kaur, a fifth-year student in the Master of Information and Data Science program at the UC Berkeley School of Information, provided an overview of the mlfailures labs, a series of courses developed by the Daylight Security Research Lab to train students to identify how bias may be integrated into machine-learning algorithms.
“Our team has spent the past year developing hands-on educational lab materials that demonstrate examples of machine learning bias in real world settings, and teach students how to address it,” Kaur explained. Each mlfailures lab focuses on a different domain: one looks at racial bias in healthcare, another focuses on how bias is built into algorithms for mortgage lending, and a third lab discusses gender bias in hiring and how to train a classifier to counteract this bias. The labs are interactive “Jupyter Notebooks” containing Python code that students can run and edit for themselves.
“These Jupyter Notebooks lay out the technical knowledge for identifying and correcting for bias,” she said. “More importantly, the labs probe students to think critically about bias in context. That includes examining how we define bias and who gets to do so…. Our aim is to close the gap between awareness of the problem and having the ability to do something about it. As far as we know, our labs are the first to teach students how to identify and ameliorate bias in machine learning.”
(Non-)Private Machine Learning
The final research talk of the day was presented by Nicholas Carlini, who earned his PhD in computer science from UC Berkeley in 2018 and is currently a research scientist at Google Brain, where he works at the intersection of machine learning and computer security. Carlini provided an overview of his research on “privacy-preserving machine learning.”
Carlini noted that machine learning classifiers may present privacy concerns, for example, if individuals’ medical images are used. “You can see how it would be problematic if it would be possible to fairly easily pull out individual images of specific people,” he said. “The general setup for this is, how can we train classifiers with this property? What researchers have started to study recently is something called ‘privacy through instance encoding.’ And the way it works is, instead of taking your standard images and feeding them to the classifier, what we’re going to do is we’re going to turn them into encoded images. And these encoded images are going to look essentially like noise to all of us, but are going to have the property that you can still train the classifier on them, and it will still be accurate on the original images…. It turns out that as a result, the classifiers still receive almost exactly the same original accuracy. The question is, are they private?”
Implementing Trustworthy AI for the Long-Term: A View from the Field
The final portion of the December Research Exchange was a panel discussion, with CLTC’s Executive Director Ann Cleaveland in conversation with Rachel Azafrani, an AI and IoT Security Strategist at Microsoft, and Priyanka Saxena, Senior Engagement Leader at Deloitte Consulting.
The panel focused on the role industry and government will play in advancing trustworthy AI over the next decade.
“AI security research is going to be an absolutely pivotal component of implementing trustworthy AI, and I think will become increasingly visible to the public as well,” said Azafrani. “We’ve seen, over the last couple of years, many countries have adopted AI national strategies, they’ve adopted AI principles, and there have been a number of other organizations that have created sets of principles for responsible and ethical and trustworthy AI. The question is, how do we actually begin to operationalize these principles in legislation?”
She explained that the EU may roll out a regulatory framework for AI as soon as 2021, focusing on “high-risk applications of artificial intelligence” that relate to issues of safety and security. “I do think that increasingly we’re going to see more activity trying to establish normative behavior among nation states for responsible development and use of AI.”
Saxena agreed that AI security will be an important area of concern for companies and other organizations. “AI is becoming extremely mainstream,” said Saxena. “But that also means that the ethical challenges that come with AI deployments are also coming front and center. And what I’m seeing is that a lot of the large financial services firms, and honestly even other industries, realize that this is something that needs to be taken care of… What is being challenged at this point is, how do we go about putting those guardrails? And what is a structure that would work for a company to actually implement guardrails around the ethical implementation of AI? A lot of the companies that are consulting companies are really helping some of our larger clients define their own regulatory frameworks and work through that, to self-regulate at this point.”
“It’s crucial to maintain the integration of research along with the real world, where the rubber hits the road,” said Saxena. “Researchers are such an important part of the overall AI ecosystem, because when we’re talking about what’s happening right now, research is talking about what’s going to happen next. And we’re building the future as we do research. So it’s very important to stay tightly connected and tightly knit together.”