Transparency
v1.0 · Published May 19, 2026
Transparency makes safety-relevant information legible to stakeholders outside AI labs, including scientists, civil society, and governments. This both incentivizes safer practices and gives external stakeholders a better-informed view of risk.
Like Guidelight's other standards, this standard is organized into high-level goals (principles), concrete things we recommend developers do (practices), and experimental directions for development.
Principles
Expose your risk assessment to public scrutiny.
Structured public risk assessment. Publish a report that states the developer's top-level conclusion about catastrophic risk from relevant models, including:
Publication frequency. Publish or update the report at least quarterly, and make clear what would trigger a re-assessment between cycles.
Risk category inclusion. Report, at minimum, the following risk categories:
For each category, describe your threat models and the key assumptions the assessment is operating under.
Impact of mitigations. Where the risk argument relies on mitigations, disclose:
Legibility of arguments. Make the structure of the risk argument legible to outside readers, such that a public reader can see:
External review. Disclose what, if any, external review of the assessment was conducted, including:
Senior employee attestation. Have a named senior employee publicly attest that the report accurately describes:
Standardized uncertainty scale. Use a standardized uncertainty scale in which key terms are defined (e.g., probability ranges for likelihood judgements and criteria for strength of different forms of evidence).
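As a rough illustration of the uncertainty-scale practice above, a developer could publish its likelihood vocabulary as a small lookup table so that every term maps to one probability range. The sketch below is only an example: the term labels, the range boundaries, and the names `LikelihoodTerm`, `SCALE`, and `term_for` are hypothetical and are not prescribed by this standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LikelihoodTerm:
    label: str
    low: float   # inclusive lower bound
    high: float  # exclusive upper bound (the last term also includes 1.0)


# Hypothetical vocabulary: a real scale would be chosen and published
# by the developer, not taken from this sketch.
SCALE = (
    LikelihoodTerm("remote", 0.00, 0.05),
    LikelihoodTerm("unlikely", 0.05, 0.35),
    LikelihoodTerm("roughly even", 0.35, 0.65),
    LikelihoodTerm("likely", 0.65, 0.95),
    LikelihoodTerm("almost certain", 0.95, 1.00),
)


def term_for(probability: float) -> str:
    """Return the defined term whose range contains the given probability."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    for term in SCALE:
        if term.low <= probability < term.high:
            return term.label
    # Only probability == 1.0 reaches this point.
    return SCALE[-1].label
```

Publishing the table alongside the report lets an outside reader translate a phrase such as "unlikely" back into the range the authors had in mind.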
Internal processes. Disclose internal policies, practices, and roles that support the integrity of the risk assessment process.[1]
Documented changes. Preserve prior versions of risk assessments in a public archive, with a changelog that distinguishes substantive changes from editorial ones.
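One possible way to structure such a changelog is to tag each entry with the kind of change it records, so readers can filter for revisions that affect the risk conclusions. This is a minimal sketch under our own assumptions: the field names, the `ChangeKind` values, and the sample entries (including their dates and summaries) are hypothetical and purely illustrative.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum


class ChangeKind(Enum):
    SUBSTANTIVE = "substantive"  # alters a conclusion, evidence, or assumption
    EDITORIAL = "editorial"      # wording, formatting, or typo fixes only


@dataclass(frozen=True)
class ChangelogEntry:
    version: str
    published: date
    kind: ChangeKind
    summary: str


def substantive_changes(entries):
    """Return only the entries that record a substantive change."""
    return [e for e in entries if e.kind is ChangeKind.SUBSTANTIVE]


# Illustrative entries only; they do not describe any real report.
CHANGELOG = [
    ChangelogEntry("1.0", date(2026, 1, 15), ChangeKind.SUBSTANTIVE,
                   "Initial assessment published."),
    ChangelogEntry("1.0.1", date(2026, 1, 22), ChangeKind.EDITORIAL,
                   "Corrected typos in the summary."),
    ChangelogEntry("1.1", date(2026, 4, 15), ChangeKind.SUBSTANTIVE,
                   "Revised a threat model after new evaluation results."),
]
```

Calling `substantive_changes(CHANGELOG)` on the sample data returns the 1.0 and 1.1 entries and skips the editorial fix.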
Inform the public about incidents in a complete and timely fashion.
Incident definition. Publish a clear definition of what the developer considers a reportable incident, including any severity thresholds.
Incident comprehensiveness. In the incident definition, include, at minimum:
Comprehensiveness of reporting. Report all incidents meeting the developer's definition, and disclose for each (subject to withholding):
Withholding of information. For any incident disclosure, identify:
Timeframe for disclosure. Publish and adhere to an incident-reporting time window for public disclosures. The developer may make a preliminary disclosure within the window and follow up with additional information thereafter, provided the initial disclosure indicates when follow-up will occur.
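The disclosure-window rule above can be sketched as a simple check. The example assumes a 15-day window that starts when the incident is identified; both the length and the starting point are hypothetical choices for illustration, since this standard leaves them to the developer. The function names are likewise our own.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Hypothetical window length; the developer publishes its own value.
REPORTING_WINDOW = timedelta(days=15)


def disclosure_deadline(identified_at: datetime) -> datetime:
    """Latest time a first public disclosure may be made (assumed start: identification)."""
    return identified_at + REPORTING_WINDOW


def preliminary_disclosure_ok(
    identified_at: datetime,
    disclosed_at: datetime,
    follow_up_due: Optional[datetime],
) -> bool:
    """Check a preliminary disclosure against the published window.

    It passes only if it was made within the window and states a
    follow-up date that falls after the disclosure itself.
    """
    within_window = disclosed_at <= disclosure_deadline(identified_at)
    has_follow_up = follow_up_due is not None and follow_up_due > disclosed_at
    return within_window and has_follow_up
```

For example, a preliminary disclosure made 10 days after identification with a stated follow-up date passes the check, while one made 20 days after identification, or one with no follow-up date, does not.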
Incident identification. Publish a description of how the organization identifies potential incidents, including the monitoring systems and review processes used to surface and triage candidates.
Internal channels. Disclose the existence and basic structure of any internal channels through which employees can flag potential incidents, including:
[1] Such as, but not limited to: disclosure of the teams and roles responsible for conducting, drafting, and editing the assessment; anonymous methods of registering disagreement; and any role with authority to overrule or revise findings.
[2] Such as, but not limited to: discoveries that certain forms of model refusal can be jailbroken; that a monitor has missed a class of behavior it was designed to catch; or that an evaluation was contaminated, gamed, or otherwise does not support its prior conclusion.