Update May 10th, 2023: First Full Draft of Profile available here for review and comment!
Overview of Project
CLTC researchers Tony Barrett and Jessica Newman, along with UC Berkeley colleagues Brandie Nonnecke and Dan Hendrycks, are leading an effort to create an AI risk-management standards profile for increasingly multi- or general-purpose AI, such as cutting-edge large language models. The profile guidance will be primarily for use by developers of such AI systems, in conjunction with the NIST AI Risk Management Framework (AI RMF) or the AI risk management standard ISO/IEC 23894. This profile will be a contribution to standards on AI safety, security, ethics, and policy, with risk-management practices or controls for identifying, analyzing, and mitigating risks. We aim to publish Version 1.0 by the end of 2023, preceded by draft versions for feedback.
We are seeking participants to provide input and review, especially experts in AI safety, security, ethics, and policy. Participants will receive invitations to attend optional workshops approximately once every three months, and/or to provide input or feedback on drafts of the profile at their convenience. We also will make a version of the latest draft publicly available approximately once every three months, for anyone to provide feedback to the project lead via email. The final Version 1.0 and later updates will be available for free online for anyone to use.
If you are interested in participating, contact Tony Barrett (email@example.com).
Purpose and Intended Audience
Increasingly multi- or general-purpose AI systems, such as BERT, CLIP, GPT-4, DALL-E 2, and PaLM, can provide many beneficial capabilities, but they also introduce risks of adverse events with societal-scale consequences. This document provides risk-management practices or controls for identifying, analyzing, and mitigating risks of such AI systems. We intend this document primarily for developers of these AI systems; others who can benefit from this guidance include downstream developers of end-use applications that build on a multi- or general-purpose AI system platform. This document facilitates conformity with leading AI risk management standards and frameworks, adapting and building on the generic voluntary guidance in the NIST AI RMF and the ISO/IEC 23894 AI risk management standard, with a focus on the unique issues faced by developers of increasingly general-purpose AI systems.
Examples of Preliminary Guidance
This project builds on our recent work on arXiv, “Actionable Guidance for High-Consequence AI Risk Management: Towards Standards Addressing AI Catastrophic Risks” (https://arxiv.org/abs/2206.08966). In Section 3 of that work, we provide actionable-guidance recommendations for: identifying risks from potential unintended uses and misuses of AI systems; including catastrophic-risk factors within the scope of risk assessments and impact assessments; identifying and mitigating human rights harms; and reporting information on AI risk factors.
In Section 4 of that work, we outline our preliminary ideas for an AI RMF Profile, with supplementary guidance on cutting-edge, increasingly multi- or general-purpose AI systems. Our ideas for guidance included the following examples:
- As part of risk identification:
- Identify potential use cases and misuse cases, to be considered in decisions on disallowed/unacceptable use-case categories of applications.
- As part of risk assessment:
- Use red teams and adversarial testing as part of extensive interaction with the AI systems to identify emergent properties such as new capabilities and failure modes.
- As part of risk mitigation:
- When training state-of-the-art machine learning models, increase the amount of compute incrementally (e.g., by not more than three times between each increment), and test models after each incremental increase to identify emergent properties.
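The incremental compute-scaling guidance above can be sketched as a simple training-loop skeleton. This is only an illustrative sketch, not a prescribed implementation: the `train_step` and `evaluate` callables are hypothetical placeholders for a developer's own training procedure and emergent-property evaluations (e.g., red-teaming and capability tests), and are not part of any real framework.

```python
# Illustrative sketch of incremental compute scaling with evaluation
# checkpoints. All function names here are hypothetical placeholders.

MAX_SCALE_FACTOR = 3.0  # increase compute by no more than 3x per increment


def scale_training(initial_compute, target_compute, train_step, evaluate):
    """Train toward target_compute in increments of at most MAX_SCALE_FACTOR,
    evaluating the model for emergent properties after each increment."""
    compute = initial_compute
    results = []
    while compute < target_compute:
        compute = min(compute * MAX_SCALE_FACTOR, target_compute)
        model = train_step(compute)          # train/continue training at this budget
        findings = evaluate(model, compute)  # e.g., red-team and capability evals
        results.append((compute, findings))
        if findings.get("halt", False):      # pause scaling if risks are identified
            break
    return results
```

In this sketch, a `halt` flag from the evaluation step stops further scaling, reflecting the intent of testing after each increment rather than only at the end of training.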
Why Create this Profile?
Other initial AI RMF profiles seem likely to focus on specific industry sectors and end-use applications, e.g., in critical infrastructure or other high-risk categories of the draft EU AI Act. That seems valuable, especially for downstream developers of end-use applications, and could help the AI RMF achieve interoperability with other regulatory regimes such as the EU AI Act. However, an approach focused on end-use applications could overlook an opportunity to provide profile guidance for upstream developers of increasingly general-purpose AI, including AI systems sometimes referred to as “foundation models”. Such AI systems can have many uses, and they raise early-development risk issues, such as emergent properties, that upstream developers are often in a better position to address than downstream developers building on AI platforms for specific end-use applications.
Guidance in this profile focuses on managing the broad context and associated risks of increasingly general-purpose AI, e.g.:
- To address important underlying risks and early-development risks in a way that does not rely on having certainty about each specific end-use application of the technology.
- To provide guidance on sharing of risk management responsibilities between upstream and downstream developers.
We are proceeding with the following profile-creation stages and approximate dates:
- Planning and preliminary outreach – Q3-Q4 2022
- Preliminary draft of the profile created, initial workshop and interviews – Q1 2023
- First full draft of the profile publicly available, second workshop and interviews, alpha test – Q2 2023
- Second full draft profile publicly available, third workshop and interviews, beta test – Q3 2023
- Release Profile 1.0 on UC Berkeley Center for Long-Term Cybersecurity and arXiv websites – Q4 2023
Anthony M. Barrett, Ph.D., PMP
Visiting Scholar, AI Security Initiative, Center for Long-Term Cybersecurity, UC Berkeley
Jessica Newman
Director, AI Security Initiative, Center for Long-Term Cybersecurity, UC Berkeley; Co-Director, AI Policy Hub, UC Berkeley
Dan Hendrycks
Berkeley AI Research Lab, UC Berkeley
Brandie Nonnecke, Ph.D.
Director, CITRIS Policy Lab, UC Berkeley
Co-Director, AI Policy Hub, UC Berkeley
We are seeking experts in AI safety, security, ethics, and policy who are interested in standards development, risk management, and the particular opportunities and challenges associated with increasingly general-purpose AI, such as cutting-edge large language models. Participants will receive invitations to attend optional quarterly workshops, or to provide input or feedback on drafts at their convenience. All activities are optional; no minimum time commitment is required.
Individual-level participation options include:
- Providing ideas in workshops or interviews
- Reviewing drafts
- Serving as a test user
Organization-level support options include:
- Providing time for employees to participate
- Allowing use of the organization's logo on the Profile
Please contact Tony Barrett (firstname.lastname@example.org) if you or your organization want to participate!
Available Drafts or Versions
The following is our most recent publicly available draft Profile:
- First Full Draft (May 10th, 2023, 128 pp, Google Doc format) – or choose PDF format.
Comments on this First Full Draft by June 6th, 2023 would be most helpful.
Questions for readers of this First Full Draft:
- What key risks of large language models or other increasingly general-purpose AI have we not addressed in this version or identified as an issue to address in future versions?
- What should we add regarding supplemental best practices, resources, and standards for developers of increasingly general-purpose AI, beyond the broadly applicable guidance of the NIST AI Risk Management Framework and Playbook?
- Does our current draft guidance meet the criteria in Appendix 2 (e.g., for actionability, measurability, and being sufficiently future-proof for AI systems over the next 10 years)?
- What other key items are missing?
Please send input or feedback to email@example.com. If you would rather not be listed in the Participants or Acknowledgments sections of future drafts or versions of the Profile as someone who provided input or feedback, please let us know when you send your comments.