Biden Administration’s Voluntary AI Safety Agreement

Christopher Dodson

3 years ago

The Biden administration announced that it has brokered a voluntary agreement with several of the biggest technology and artificial intelligence (AI) companies. Under the agreement, the companies commit to a number of actions intended to encourage safe, secure, and trustworthy development of AI technologies, particularly generative AI systems. While the commitments are not as extensive as other frameworks, such as the NIST AI Risk Management Framework or the administration’s Blueprint for an AI Bill of Rights, they are in some ways more concrete and actionable, and could serve as a model for other companies entering the AI market.

Safety

Signatories to the agreement commit to adversarial testing (red-teaming) to evaluate areas such as misuse, societal risks, and national security concerns. Adversarial testing should be performed both internally and by independent third parties. Testing will cover several specific areas:

Biological, chemical, and radiological risks, such as the potential of an AI system to lower barriers to entry for weapons design, development, or use;
Cyber capabilities, such as the ways in which an AI system can be used for vulnerability discovery or the exploitation or defense of a computer system;
The effects of an AI system’s interaction with other systems, particularly the capacity to control physical systems;
The capacity for an AI system to self-replicate; and
Societal risks of the AI systems, such as bias and discrimination.

Security

Signatories will invest in cybersecurity and insider threat safeguards for their AI systems. Additionally, they will offer incentives, including bug bounties, contests, or prizes, for third parties to discover and report unsafe behaviors, vulnerabilities, and other issues with the AI system.

Trust

AI systems should use mechanisms that enable users to determine whether audio or visual content is AI-generated, including watermarking that identifies the service or model. Signatories will also develop tools or APIs to determine whether a particular piece of content was created with their AI system.

AI companies will publicly report model or system capabilities, limitations, and domains of appropriate and inappropriate use. Reports should include information about the safety evaluations conducted (including information about dangerous capabilities, to the extent it is responsible to publicly disclose this information), significant limitations in performance that have implications for the domains of appropriate use, a discussion of the model’s effects on societal risks such as fairness and bias, and the results of adversarial testing conducted to evaluate the model’s fitness for deployment.

Signatories should evaluate the societal risks posed by an AI system, including the potential for harmful bias and discrimination, and should protect privacy. AI systems should be designed to avoid creating or propagating harmful bias and discrimination. Companies should also employ trust and safety teams, advance AI safety research, advance privacy, protect children, and proactively manage the risks of an AI system.
