UC Berkeley AI Risk-Management Standards Profile for General-Purpose AI Systems (GPAIS) and Foundation Models

Update November 8th, 2023: Version 1.0 of the Profile is now available here!

Overview of Project

UC Berkeley researchers Tony Barrett, Jessica Newman, and Brandie Nonnecke are leading an effort to create an AI risk-management standards profile for general-purpose AI systems (GPAIS), foundation models, and generative AI, such as cutting-edge large language models. The profile guidance will be primarily for use by developers of such AI systems, in conjunction with the NIST AI Risk Management Framework (AI RMF) or the AI risk management standard ISO/IEC 23894. The profile will contribute to AI policy, safety, security, and ethics standards by providing risk-management practices or controls for identifying, analyzing, and mitigating risks of GPAIS and foundation models. We aim to publish Version 1.0 by the end of 2023, free online for anyone to use.

The Second Full Draft of the Profile is available here. If you have feedback or would like to discuss the draft Profile, contact Tony Barrett (anthony.barrett@berkeley.edu). 

We have been developing this profile in a multi-stakeholder process with input and feedback on drafts from more than 70 people representing a range of stakeholders, including organizations developing large-scale GPAIS and foundation models, and other organizations across industry, civil society, academia, and government. Our Berkeley GPAIS and foundation model profile effort is separate from, but aims to complement and inform the work of, other guidance development efforts such as the PAI protocols for large-scale model deployment and the NIST Generative AI Public Working Group.

Purpose and Intended Audience

Increasingly general-purpose AI systems, such as BERT, CLIP, GPT-4, DALL-E 2, and PaLM, can provide many beneficial capabilities, but they also introduce risks of adverse events with societal-scale consequences. This document provides risk-management practices or controls for identifying, analyzing, and mitigating risks of such AI systems. We intend this document primarily for developers of these AI systems; others who can benefit from this guidance include downstream developers of end-use applications that build on a general-purpose AI system platform. This document facilitates conformity with leading AI risk management standards and frameworks, adapting and building on the generic voluntary guidance in the NIST AI RMF and ISO/IEC 23894 AI risk management standard, with a focus on the unique issues faced by developers of increasingly general-purpose AI systems.

Examples of High Priority Guidance

The following is an excerpt from the Second Full Draft Profile executive summary:

Users of this Profile should place high priority on the following risk management steps and corresponding Profile guidance sections:

  • Take responsibility for risk assessment and risk management tasks for which your organization has substantially greater information and capability than others in the value chain (Section 3.1, Govern 2.1)
    • We also recommend applying this principle throughout other risk assessment and risk management steps, and we refer to it frequently in other guidance sections.
  • Set risk-tolerance thresholds to prevent unacceptable risks (Map 1.5)
    • For example, the NIST AI RMF 1.0 recommends the following: “In cases where an AI system presents unacceptable negative risk levels – such as where significant negative impacts are imminent, severe harms are actually occurring, or catastrophic risks are present – development and deployment should cease in a safe manner until risks can be sufficiently managed. [emphasis added]” (NIST 2023a, p.8)
  • Identify the potential uses, misuses, or abuses of a GPAIS, and identify reasonably foreseeable potential impacts (e.g., to fundamental rights) (Map 1.1)
  • Identify whether a GPAIS could lead to significant, severe, or catastrophic impacts, e.g., because of correlated failures or errors across high-stakes deployment domains, dangerous emergent behaviors, or harmful misuses and abuses by AI actors (Map 5.1)
  • Use red teams and adversarial testing as part of extensive interaction with GPAIS to identify dangerous capabilities, vulnerabilities, or other emergent properties of such systems (Measure 1.1) 
  • Track important identified risks (e.g., vulnerabilities from data poisoning and other attacks, or mis-specification of objectives) even if they cannot yet be measured (Measure 1.1 and Measure 3.2)
  • Implement risk-reduction controls as appropriate throughout a GPAIS lifecycle, e.g., independent auditing, incremental scale-up, red-teaming, structured access or staged release, and other steps (Manage 1.3, Manage 2.3, and Manage 2.4)  
  • Incorporate identified AI system risk factors, and circumstances that could result in impacts or harms, into reporting to internal and external stakeholders (e.g., to downstream developers, regulators, users, impacted communities, etc.) on the AI system as appropriate, e.g., using model cards, or system cards (Govern 4.2)
  • Check or update, and incorporate, each of the above when making go/no-go decisions, especially on whether to proceed on major stages or investments for development or deployment of cutting-edge large-scale GPAIS (Manage 1.1) 

We also recommend: Document the process used to consider each item, the options considered, and the reasons for choices, including for guidance in Section 3 of this document. (Documentation on many items should be shared in publicly available material such as system cards. Some details on particular items, such as security vulnerabilities, can be responsibly omitted from public materials to reduce misuse potential, especially if those details remain available to auditors, Information Sharing and Analysis Organizations, or other parties as appropriate.)

GPAIS-related risk topics and corresponding guidance sections in this Profile document include the following. (Some of these topics overlap with others, in part because the guidance often involves iterative assessments for additional depth on issues identified at earlier stages.)

  • Reasonably foreseeable impacts (Section 3.2, Map 1.1), including:
    • To individuals, including impacts to health, safety, well-being, or fundamental rights
    • To groups, including populations vulnerable to disproportionate adverse impacts or harms
    • To society, including environmental impacts
  • Significant, severe, or catastrophic harm factors (Section 3.2, Map 5.1), including:
    • Correlated bias and discrimination
    • Impacts to societal trust or democratic processes
    • Correlated robustness failures
    • Capability to manipulate or deceive humans in harmful ways
    • Loss of understanding and control of an AI system in a real-world context
  • AI trustworthiness characteristics (Section 3.4, Measure 2.x), including:
    • Safety, reliability, and robustness (Measure 2.5, Measure 2.6)
    • Security and resiliency (Measure 2.7)
    • Accountability and transparency (Measure 2.8)
    • Explainability and interpretability (Measure 2.9)
    • Privacy (Measure 2.10)
    • Fairness and bias (Measure 2.11)

Additional topics to address in future versions of the Profile are listed in Appendix 3.

Widespread norms for using best practices such as those in this Profile can help ensure that developers of increasingly general-purpose AI systems can remain competitive without compromising on practices for AI safety, security, accountability, and related issues.

Why Create this Profile?

Other initial AI RMF profiles have seemed likely to focus on specific industry sectors and end-use applications, e.g., in critical infrastructure or other high-risk categories of the draft EU AI Act. That seems valuable, especially for downstream developers of end-use applications, and could help the AI RMF achieve interoperability with other regulatory regimes such as the EU AI Act. However, an approach focused on end-use applications could overlook an opportunity to provide profile guidance for upstream developers of increasingly general-purpose AI, including AI systems sometimes referred to as “foundation models”. Such AI systems can have many uses, and they raise early-development risk issues, such as emergent properties, that upstream developers are often better positioned to address than downstream developers building on AI platforms for specific end-use applications.

Guidance in this profile focuses on managing the broad context and associated risks of increasingly general-purpose AI, e.g.:

  • To address important underlying risks and early-development risks in a way that does not rely on having certainty about each specific end-use application of the technology.
  • To provide guidance on sharing of risk management responsibilities between upstream and downstream developers.

Milestones

We are proceeding with the following profile-creation stages and approximate dates:

  • Planning and preliminary outreach – Q3-Q4 2022
  • Preliminary draft of the profile created, initial workshop and interviews – Q1 2023
  • First full draft of the profile publicly available, second workshop and interviews, alpha test – Q2 2023
  • Second full draft profile publicly available, third workshop and interviews, beta test – Q3 2023
  • Release Profile 1.0 on UC Berkeley Center for Long-Term Cybersecurity and arXiv websites – Q4 2023

Project Leads:

Anthony M. Barrett, Ph.D., PMP
Visiting Scholar, AI Security Initiative, Center for Long-Term Cybersecurity, UC Berkeley
anthony.barrett@berkeley.edu 

Jessica Newman
Director, AI Security Initiative, Center for Long-Term Cybersecurity, UC Berkeley
Co-Director, AI Policy Hub, UC Berkeley

Brandie Nonnecke, Ph.D.
Director, CITRIS Policy Lab, UC Berkeley
Co-Director, AI Policy Hub, UC Berkeley

Seeking Participants

We are seeking experts in AI policy, safety, security, and ethics who are interested in standards development, risk management, and the particular opportunities and challenges associated with increasingly general-purpose AI, such as cutting-edge large language models. Participants will receive invitations to attend optional quarterly workshops, or to provide input or feedback on drafts at their convenience. All activities are optional; no minimum time commitment is required.

Individual-level participation options include:

  • Providing ideas in workshops or interviews
  • Reviewing drafts 
  • Serving as a test user

Organization-level support options include:

  • Providing time for employees to participate
  • Allowing use of the organizational logo on the Profile

Please contact Tony Barrett (anthony.barrett@berkeley.edu) if you or your organization want to participate.

Available Drafts or Versions

The following is our most recent publicly available version of the Profile:

Version 1.0 of the Profile is now available here!

For version history and comparison, the following are earlier publicly available draft documents:

Please send input or feedback to anthony.barrett@berkeley.edu. If you would rather not be listed in the Acknowledgements section of future drafts or versions of the Profile as someone who provided input or feedback, please let us know when you provide us your comments.