Transparency

v1.0 · Published May 19, 2026

Transparency makes safety-relevant information legible to stakeholders outside AI labs, including scientists, civil society, and governments. This both incentivizes safer practices and gives external stakeholders a better-informed view of risk.

Like Guidelight's other standards, this standard is organized into high-level goals (principles), concrete things we recommend developers do (practices), and experimental directions-for-development. Read more about our standards development process. Share feedback here.

Principles

Principle 1

Expose your risk assessment to public scrutiny.

a

Structured public risk assessment. Publish a report that states the developer's top-level conclusion about catastrophic risk from relevant models, including:

i. the type of risk claim being made (e.g., an absolute risk claim, or a marginal risk claim relative to existing industry deployments)
ii. the conclusion itself
iii. the key premises on which the conclusion depends

b

Publication frequency. Publish or update the report at least quarterly, and make clear what would trigger a re-assessment between cycles.

c

Risk category inclusion. Report, at minimum, the following risk categories:

i. misuse while the model remains in the developer's possession
ii. misuse arising if the model is stolen or otherwise no longer in the developer's possession
iii. loss of control due to misalignment

For each category, describe your threat models and the key assumptions the assessment is operating under.
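
One way a developer might check a draft report against practices (a) and (c) is with a small completeness check. The sketch below is illustrative only: the class names, field names, and the `missing_elements` helper are hypothetical and not part of this standard, and passing the check says nothing about the quality of the underlying argument.

```python
from dataclasses import dataclass, field
from enum import Enum


class RiskCategory(Enum):
    # The three minimum categories named in practice (c).
    MISUSE_IN_POSSESSION = "misuse while the model remains in the developer's possession"
    MISUSE_OUT_OF_POSSESSION = "misuse if the model is stolen or otherwise leaves the developer's possession"
    LOSS_OF_CONTROL = "loss of control due to misalignment"


@dataclass
class CategoryAssessment:
    threat_models: list[str]
    key_assumptions: list[str]


@dataclass
class RiskReport:
    claim_type: str  # e.g. "absolute" or "marginal" (hypothetical labels)
    conclusion: str
    key_premises: list[str]
    categories: dict[RiskCategory, CategoryAssessment] = field(default_factory=dict)


def missing_elements(report: RiskReport) -> list[str]:
    """List the minimum elements that the draft report does not yet contain."""
    gaps = []
    if not report.claim_type.strip():
        gaps.append("claim type")
    if not report.conclusion.strip():
        gaps.append("conclusion")
    if not report.key_premises:
        gaps.append("key premises")
    for category in RiskCategory:
        assessment = report.categories.get(category)
        if assessment is None:
            gaps.append(f"category: {category.name}")
        elif not assessment.threat_models or not assessment.key_assumptions:
            gaps.append(f"threat models or assumptions: {category.name}")
    return gaps
```

A report that states a claim type, conclusion, and premises but covers no risk categories would be flagged with one gap per missing category.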

d

Impact of mitigations. Where the risk argument relies on mitigations, disclose:

i. the risk the model would pose absent those mitigations
ii. the mitigations relied upon
iii. the residual risk with mitigations operating as designed
iv. the risk under foreseeable mitigation failure modes, and which failure modes were considered

e

Legibility of arguments. Make the structure of the risk argument legible to outside readers, such that a public reader can see:

i. the claims being defended
ii. the evidence relied on for each claim
iii. the inferential steps connecting evidence to claims
iv. the assumptions that, if false, would invalidate the argument

f

External review. Disclose what, if any, external review of the assessment was conducted, including:

i. who the reviewer was
ii. what access they had and when
iii. whether they audited and verified the underlying factual claims they relied upon
iv. whether they received any information that was redacted from the public materials and for what reason (and whether any information was redacted even from the external reviewer)
v. what their review represents (e.g., sign-off, commentary, dissent)
vi. any potential conflicts of interest between the reviewer and the developer, and how these were managed

g

Senior employee attestation. Have a named senior employee publicly attest that the report accurately describes:

i. the headline characterization of risk
ii. the methodology used
iii. the evidence the conclusions rest on
iv. the assumptions the argument depends on
v. known limitations and disagreements surfaced during drafting

h

Standardized uncertainty scale. Use a standardized uncertainty scale in which key terms are defined (e.g., probability ranges for likelihood judgements and criteria for strength of different forms of evidence).
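
As a rough illustration of what a defined likelihood scale could look like in practice, the sketch below maps probability estimates to fixed terms. The labels and cut-offs are assumptions chosen for the example, not values prescribed by this standard; a developer would publish its own definitions.

```python
from typing import NamedTuple


class LikelihoodTerm(NamedTuple):
    label: str
    low: float   # inclusive lower bound on probability
    high: float  # exclusive upper bound (the top band also includes 1.0)


# Illustrative scale only: these labels and ranges are assumptions for the example.
SCALE = [
    LikelihoodTerm("very unlikely", 0.00, 0.05),
    LikelihoodTerm("unlikely", 0.05, 0.25),
    LikelihoodTerm("roughly even", 0.25, 0.75),
    LikelihoodTerm("likely", 0.75, 0.95),
    LikelihoodTerm("very likely", 0.95, 1.00),
]


def term_for(probability: float) -> str:
    """Return the scale term whose range contains the given probability."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    for term in SCALE:
        if term.low <= probability < term.high:
            return term.label
    return SCALE[-1].label  # only reached when probability == 1.0
```

Publishing the table alongside the report lets a reader translate a phrase such as "unlikely" back into the range the authors had in mind.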

i

Internal processes. Disclose internal policies, practices, and roles that support the integrity of the risk assessment process. [1]

j

Documented changes. Preserve prior versions of risk assessments in a public archive, with a changelog that distinguishes substantive changes from editorial ones.
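
A changelog of this kind could be kept as structured data so that readers can filter for the changes that matter. The following is a minimal sketch under assumed names (`ChangeKind`, `ChangelogEntry`, and `substantive_changes` are hypothetical); where the line between substantive and editorial falls is left to the developer.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum


class ChangeKind(Enum):
    SUBSTANTIVE = "substantive"  # changes a conclusion, premise, or piece of evidence
    EDITORIAL = "editorial"      # wording, formatting, or typo fixes only


@dataclass(frozen=True)
class ChangelogEntry:
    version: str
    published: date
    kind: ChangeKind
    summary: str


def substantive_changes(entries: list[ChangelogEntry]) -> list[ChangelogEntry]:
    """Return only the substantive entries, oldest first."""
    selected = [e for e in entries if e.kind is ChangeKind.SUBSTANTIVE]
    return sorted(selected, key=lambda e: e.published)
```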

Principle 2

Inform the public about incidents in a complete and timely fashion.

a

Incident definition. Publish a clear definition of what the developer considers a reportable incident, including any severity thresholds.

b

Incident comprehensiveness. In the incident definition, include, at minimum:

i. near-misses of actual harms that would have been incidents had they come to fruition
ii. events that materially reduce the developer's confidence in its ability to detect or prevent these incidents, such as discovering that mitigations are less robust than previously believed [2]

c

Comprehensiveness of reporting. Report all incidents meeting the developer's definition, and disclose for each (subject to withholding):

i. which models were involved, and in what context each model was being used when the incident occurred
ii. the evidence establishing that the model was involved in the incident
iii. what happened, when it happened (start and end, or best approximation), when and how it was discovered, the chain of events that led to the incident, and a root cause analysis
iv. an assessment of the incident's severity
v. an overview of the response, including what mitigations were applied and when, what mitigations are still planned, and the expected residual risk, with more detail provided for higher-severity incidents
vi. any patterns from the developer's monitoring that are reasonably connected to the incident, including aggregate data on related near-misses

d

Withholding of information. For any incident disclosure, identify:

i. any categories of information withheld (whether redacted from public disclosure, routed to a confidential regulatory channel, or otherwise not made public), and the reasons for withholding
ii. any other parties notified about the incident (e.g., regulators, law enforcement, affected partners), and when they were notified

e

Timeframe for disclosure. Publish and adhere to an incident-reporting time window for public disclosures. The developer may make a preliminary disclosure within the window and follow up with additional information thereafter, provided the initial disclosure indicates when follow-up will occur.
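
Adherence to a published window can be checked mechanically. The sketch below makes two assumptions that this standard does not specify: a 30-day window, and a clock that starts when the incident is discovered. The `Disclosure` record and `disclosure_problems` helper are hypothetical names used for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Assumption for the example: the developer has published a 30-day window.
REPORTING_WINDOW = timedelta(days=30)


@dataclass
class Disclosure:
    discovered_at: datetime               # assumed start of the reporting clock
    published_at: Optional[datetime]      # None if nothing has been published yet
    is_preliminary: bool = False
    follow_up_due: Optional[datetime] = None


def disclosure_problems(
    d: Disclosure, now: datetime, window: timedelta = REPORTING_WINDOW
) -> list[str]:
    """Return the ways a disclosure falls short of the published window."""
    problems = []
    deadline = d.discovered_at + window
    if d.published_at is None:
        if now > deadline:
            problems.append("no disclosure published within the window")
        return problems
    if d.published_at > deadline:
        problems.append("disclosure published after the window closed")
    if d.is_preliminary and d.follow_up_due is None:
        problems.append("preliminary disclosure does not say when follow-up will occur")
    return problems
```

A preliminary disclosure published on day 5 with no follow-up date would be flagged for the missing date, but not for lateness.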

f

Incident identification. Publish a description of how the organization identifies potential incidents, including the monitoring systems and review processes used to surface and triage candidates.

g

Internal channels. Disclose the existence and basic structure of any internal channels through which employees can flag potential incidents, including:

i. whether anonymous reporting is supported
ii. whether there are anti-retaliation protections
iii. what mechanisms exist to inform internal reporters of any outcomes of their report

1

Such as, but not limited to: disclosures of the teams and roles responsible for conducting, drafting, and editing the assessment; anonymous methods of registering disagreement; and any role with authority to overrule or revise findings.

2

Such as, but not limited to: discoveries that certain forms of model refusal can be jailbroken; that a monitor has missed a class of behavior it was designed to catch; or that an evaluation was contaminated, gamed, or otherwise doesn't support its prior conclusion.