UC Berkeley researchers Tony Barrett, Jessica Newman, and Brandie Nonnecke are leading an effort to create an AI risk-management standards profile for general-purpose AI systems (GPAIS), foundation models, and generative AI, such as cutting-edge large language models. The profile guidance will be primarily for use by developers of such AI systems, in conjunction with the NIST AI Risk Management Framework (AI RMF) or the AI risk management standard ISO/IEC 23894. The profile will contribute to AI policy, safety, security, and ethics standards by providing risk-management practices and controls for identifying, analyzing, and mitigating risks of GPAIS and foundation models.
Below is a policy brief related to the standards, published on September 27, 2023. Additional background information about this initiative can be found here. For more information, email Tony Barrett at anthony.barrett@berkeley.edu.
Policy Brief
Policymakers in the US and EU are considering approaches to regulating AI that include risk management requirements for large language models (LLMs) such as GPT-4, Claude 2, and Llama 2, or other general-purpose AI systems (GPAIS), foundation models, and generative AI.1 These AI systems can provide many beneficial capabilities, but they also pose risks of adverse events with profound consequences.
We are UC Berkeley researchers with expertise in AI research and development, safety, security, policy, and ethics. Over the past year, we have led the development and testing of an AI risk management standards “profile” with input and feedback from more than 70 people representing a range of stakeholders, resulting in over 100 pages of guidance for developers of cutting-edge GPAIS and foundation models. (See: https://cltc.berkeley.edu/seeking-input-and-feedback-ai-risk-management-standards-profile-for-increasingly-multi-purpose-or-general-purpose-ai/.2) The profile is aligned with the NIST AI Risk Management Framework (AI RMF) and other AI standards such as ISO/IEC 23894.
This policy brief highlights key policy implications of the profile, as well as AI risk-related policies that would be especially valuable beyond the profile. We recommend that US and EU policymakers employ the following three strategies as they seek to regulate GPAIS, foundation models, and generative AI:
- Ensure that developers of GPAIS, foundation models, and generative AI adhere to appropriate AI risk management standards and guidance
- Ensure that GPAIS, foundation models, and generative AI undergo sufficient pre-release evaluations to identify and mitigate risks of severe harm, including for open source or downloadable releases of models that cannot be made unavailable after release
- Ensure that AI regulations and enforcement agencies provide sufficient oversight and penalties for non-compliance
1. Ensure that developers of GPAIS, foundation models, and generative AI adhere to appropriate AI risk management standards and guidance
The guidance in our profile, which can be used in conjunction with the NIST AI Risk Management Framework (AI RMF) or related standards such as ISO/IEC 23894, provides practices and controls for identifying, analyzing, and mitigating risks of advanced AI systems. These guidelines are currently voluntary, but such guidance should become part of required standards for GPAIS and foundation models, or for models used in high-risk situations. Certifications or licenses of GPAIS and foundation models should not be approved in the absence of appropriate, comprehensive risk management practices throughout the AI lifecycle.
Our profile places high priority on the following risk management steps, among others:
- Check or update, and incorporate, each of the following when making go/no-go decisions, especially decisions on whether to proceed with major stages of, or investments in, the development or deployment of cutting-edge, large-scale GPAIS:
  - Set risk-tolerance thresholds to prevent unacceptable risks (e.g., “where significant negative impacts are imminent, severe harms are actually occurring, or catastrophic risks are present”, as recommended in the NIST AI RMF)
  - Identify reasonably foreseeable uses, misuses, or abuses for a GPAIS (e.g., automated generation of toxic or illegal content or disinformation, or aiding with proliferation of cyber, chemical, biological, or radiological weapons), and identify reasonably foreseeable potential impacts (e.g., to fundamental rights)
  - Identify whether a GPAIS could lead to significant, severe, or catastrophic impacts (e.g., because of correlated failures or errors across high-stakes deployment domains, dangerous emergent behaviors or vulnerabilities, or harmful misuses by AI actors)
- Use red teams and adversarial testing as part of extensive interaction with GPAIS (e.g., to identify dangerous capabilities or vulnerabilities of such systems)
- Implement risk-reduction controls, as appropriate, throughout a GPAIS lifecycle (e.g., independent auditing, incremental scale-up, red teaming, structured access or staged release, and other steps)
Many of these risk management steps, such as internal and external red team evaluations of models and public reporting of societal risks, are similar to commitments recently made by several GPAIS developers regarding the development and release of models more broadly capable than GPT-4 or other models available as of July 2023.3
2. Ensure that GPAIS, foundation models, and generative AI undergo sufficient pre-release evaluations to identify and mitigate risks of severe harm, including for open source or downloadable releases of models that cannot be made unavailable after release
Our profile provides guidance to AI developers about setting risk tolerance thresholds, including unacceptable-risk thresholds, and about evaluations and risk factors to consider when deciding whether to release, deploy, or use an AI system. Examples of unacceptable risks include “where significant negative impacts are imminent, severe harms are actually occurring, or catastrophic risks are present”, as recommended in the NIST AI RMF. Policymakers should strengthen and expand upon that guidance as appropriate, and incorporate it into regulations applicable to GPAIS, foundation models, and generative AI. Developers should not release GPAIS, foundation models, or generative AI models that pose substantial risks to people, communities, society, or the environment.
If severe harms are identified, policymakers, regulators, or others should be able to request that an AI system be removed or otherwise shut down. There is precedent in the United States, as the FTC has required companies to delete algorithmic systems built with data that violated data protection laws.4
However, GPAIS developers that publicly release model parameter weights (via downloadable, fully open, or open source access), as well as developers whose model weights are leaked, will in effect be unable to shut down or decommission GPAIS that others build using those weights. This consideration should be weighed against the benefits of open source models, especially for the largest-scale and most broadly capable models, which pose the greatest risks of enabling severe harms, including from malicious misuse to harm the public. Many of the benefits of open source, such as review and evaluation by a broader set of stakeholders, can be supported through transparency, engagement, and other openness mechanisms that do not require making a model’s parameter weights downloadable or open source, or by releasing smaller-scale, less broadly capable open source models.
GPAIS and foundation model developers that plan to provide downloadable, fully open, or open source access to their models should first use a staged-release approach (e.g., not releasing parameter weights until after an initial closed source or structured access release where no substantial risks or harms have emerged over a sufficient time period), and should not proceed to a final step of releasing model parameter weights until a sufficient level of confidence in risk management has been established, including for safety risks and risks of misuse and abuse. (The largest-scale or most capable models should be given the greatest duration and depth of pre-release evaluations, as they are the most likely to have dangerous capabilities or vulnerabilities that can take some time to discover.)
3. Ensure that AI regulations and enforcement agencies provide sufficient oversight and penalties for non-compliance
Our profile details numerous recommended practices to manage and mitigate risks of GPAIS and foundation models. But there is limited accountability when developers fail to follow best practices or when harmful impacts materialize. Government agencies and departments should be provided with sufficient resources to uphold pre-existing laws in the age of AI proliferation. Where gaps are identified, additional regulatory authority or new federal AI laws with provisions for sufficient oversight and fines or other penalties for irresponsible actions would also be highly valuable. Without such federal laws and enforcement, companies may perceive net incentives to move too hastily to develop and deploy excessively risky AI systems. Finally, any agency responsible for enforcing a law must have the authority and resources to assess and enforce compliance.5
Conclusion
We broadly support US and EU regulatory requirements for LLMs and other GPAIS, foundation models and generative AI, as outlined in the EU AI Act position of the European Parliament and the bipartisan frameworks and bills proposed by Congress. Harmonized regulatory requirements can help ensure that developers of such systems can compete without compromising on safety, trustworthiness, or other key aspects of such systems.
AI-related regulations and laws should, at minimum, incorporate and build on AI best practices, standards, and guidance, such as the NIST AI RMF and the Blueprint for an AI Bill of Rights. As part of regulations that build on the NIST AI RMF and address GPAIS, foundation models, and generative AI, policymakers should adapt or incorporate guidance for model developers, such as our profile, that provides actionable practices for industry to address key risks of severe harm to the public.
Notes
1. See, e.g., the “Bipartisan Framework for U.S. AI Act” by Senator Richard Blumenthal and Senator Josh Hawley, and the amendments adopted by the European Parliament on 14 June 2023 on the EU AI Act.
2. Our multi-stakeholder engagement process has included participants from developers of GPAIS, foundation models, and generative AI, as well as other parts of industry, academia, civil society, and government. We have conducted several workshops and made two full drafts publicly available. We have also tested application of the draft guidance to four recently released, large-scale models (GPT-4, Claude 2, PaLM 2, and Llama 2). We plan to publish Version 1.0 of the profile by the end of 2023.
3. “Ensuring Safe, Secure, and Trustworthy AI,” White House, https://www.whitehouse.gov/wp-content/uploads/2023/07/Ensuring-Safe-Secure-and-Trustworthy-AI.pdf
4. See, e.g., “The FTC’s biggest AI enforcement tool? Forcing companies to delete their algorithms,” CyberScoop, https://cyberscoop.com/ftc-algorithm-disgorgement-ai-regulation/
5. NIST received authorization from Congress to develop the AI RMF only as voluntary guidance for industry and other actors. As far as we are aware, neither NIST nor any other US federal agency currently has regulatory authority and policies in place that require industry to use the AI RMF, nor the resources to assess and enforce compliance.