Transparency

v1.0 · Published May 19, 2026

Transparency makes safety-relevant information legible to stakeholders outside AI labs, including scientists, civil society, and governments. This both incentivizes safer practices and gives external stakeholders better-informed views of risk.

Like Guidelight's other standards, this standard is organized into high-level goals (principles) and concrete things we recommend developers do (practices). We also include “directions-for-development” where new practices are needed but the specifics aren’t yet worked out. Read more about our standards development process.

Principles

Principle 1

Expose your risk assessment to public scrutiny.

Structured public risk assessment. Publish a report that states the developer's top-level conclusion about catastrophic risk from risk-relevant models, including:

i. the type of risk claim being made (e.g., an absolute risk claim, a marginal risk claim relative to existing industry deployments)

ii. the conclusion itself

iii. the key premises on which the conclusion depends

Publication frequency. Publish or update the report at least quarterly, and make clear what would trigger a re-assessment between cycles.

Risk category inclusion. Report, at minimum, the following risk categories:

i. misuse while the model remains in the developer's possession

ii. misuse arising if the model is stolen or otherwise no longer in the developer's possession

iii. loss of control due to misalignment

For each category, describe your threat models and the key assumptions the assessment is operating under.

Impact of mitigations. Where the risk argument relies on mitigations, disclose:

i. the risk the model would pose absent those mitigations

ii. the mitigations relied upon

iii. the residual risk with mitigations operating as designed

iv. the risk under foreseeable mitigation failure modes, and which failure modes were considered

Legibility of arguments. Make the structure of the risk argument legible to outside readers, such that a public reader can see:

i. the claims being defended

ii. the evidence relied on for each claim

iii. the inferential steps connecting evidence to claims

iv. the assumptions that, if false, would invalidate the argument

External review. Disclose what, if any, external review of the assessment was conducted, including:

i. who the reviewer was

ii. what access they had and when

iii. whether they audited and verified the underlying factual claims they relied upon

iv. whether they received any information that was redacted from the public materials and for what reason (and whether any information was redacted even from the external reviewer)

v. what their review represents (e.g., sign-off, commentary, dissent)

vi. any potential conflicts of interest between the reviewer and the developer, and how these were managed

Senior employee attestation. A named senior employee publicly attests that the report accurately describes:

i. the headline characterization of risk

ii. the methodology used

iii. the evidence the conclusions rest on

iv. the assumptions the argument depends on

v. known limitations and disagreements surfaced during drafting

Standardized uncertainty scale. Use a standardized uncertainty scale in which key terms are defined (e.g., probability ranges for likelihood judgements and criteria for strength of different forms of evidence).

Internal processes. Disclose internal policies, practices, and roles that support the integrity of the risk assessment process.¹

Documented changes. Preserve prior versions of risk assessments in a public archive with a changelog, while distinguishing substantive from editorial changes.
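
One way to keep a changelog that separates substantive from editorial changes is sketched below. This is only an illustration: the `ChangeKind` and `ChangelogEntry` names, the fields, and the sample entries are all hypothetical and are not part of this standard.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum


class ChangeKind(Enum):
    SUBSTANTIVE = "substantive"  # alters a conclusion, premise, or piece of evidence
    EDITORIAL = "editorial"      # wording, formatting, or typo fixes only


@dataclass(frozen=True)
class ChangelogEntry:
    version: str
    published: date
    kind: ChangeKind
    summary: str


def substantive_changes(entries):
    """Return only the entries that changed the substance of the assessment, oldest first."""
    return sorted(
        (e for e in entries if e.kind is ChangeKind.SUBSTANTIVE),
        key=lambda e: e.published,
    )


# Hypothetical sample data, for illustration only.
changelog = [
    ChangelogEntry("1.1", date(2030, 4, 2), ChangeKind.EDITORIAL, "Fixed typos in section 2."),
    ChangelogEntry("1.0", date(2030, 1, 15), ChangeKind.SUBSTANTIVE, "Initial assessment."),
    ChangelogEntry("2.0", date(2030, 7, 9), ChangeKind.SUBSTANTIVE, "Revised misuse threat model."),
]

for entry in substantive_changes(changelog):
    print(entry.version, entry.published.isoformat(), entry.summary)
```

Tagging each entry at publication time, rather than inferring the kind later from a diff, lets a public reader filter the archive down to the versions where the risk argument itself changed.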

Principle 2

Inform the public about incidents in a complete and timely fashion.

Incident definition. Publish a clear definition of what the developer considers a reportable incident, including any severity thresholds.

Incident comprehensiveness. In the incident definition, include, at minimum:

i. near-misses: events that would have been incidents had the harm materialized

ii. events that materially reduce the developer's confidence in its ability to detect or prevent incidents, such as realizing that mitigations are less robust than previously believed.²

Comprehensiveness of reporting. Report all incidents meeting the developer's definition, and disclose for each (subject to the withholding provisions below):

i. which models were involved, and in what context the model was being used when the incident occurred

ii. the evidence establishing that the model was involved in the incident

iii. what happened; when it happened (start and end, or best approximation); when and how it was discovered; the chain of events that led to the incident; and a root cause analysis

iv. an assessment of the incident's severity

v. an overview of the response, including what mitigations were applied and when, what mitigations are still planned, and the expected residual risk, with more detail provided for higher-severity incidents

vi. any patterns from the developer's monitoring that are reasonably connected to the incident, including aggregate data on related near-misses

Withholding of information. For any incident disclosure, identify:

i. any categories of information withheld (whether redacted from public disclosure, routed to a confidential regulatory channel, or otherwise not made public), and the reasons for withholding

ii. any other parties notified about the incident (e.g., regulators, law enforcement, affected partners), and when they were notified

Timeframe for disclosure. Publish and adhere to an incident-reporting time window for public disclosures. The developer may make a preliminary disclosure within the window and follow up with additional information thereafter, provided the initial disclosure indicates when follow-up will occur.

Incident identification. Publish a description of how the organization identifies potential incidents, including the monitoring systems and review processes used to surface and triage candidates.

Internal channels. Disclose the existence and basic structure of any internal channels through which employees can flag potential incidents, including:

i. whether anonymous reporting is supported

ii. whether there are anti-retaliation protections

iii. what mechanisms exist to inform internal reporters of any outcomes of their report

catastrophic risk

Risk of severe, large-scale harm arising from frontier AI systems.

Possible causes include (but are not limited to): harms enabled by chemical, biological, radiological, or nuclear weapons; autonomous cyberattacks on critical systems; large-scale autonomous criminal action; and loss of human control over AI systems acting in pursuit of misaligned objectives.

For the purpose of a structured risk assessment, the developer may set their own severity threshold and should specify it in their public report. One example threshold would be the loss of more than 100 lives or more than $1 billion in property damage.
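
The example threshold above could be written down as a simple check. This is a minimal sketch: the `SeverityThreshold` class and its field names are hypothetical, and the default values simply restate the example figures rather than a recommended threshold.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SeverityThreshold:
    # Defaults restate the example in the text; a developer would set their own.
    lives_lost: int = 100
    property_damage_usd: int = 1_000_000_000

    def is_exceeded(self, lives_lost: int, property_damage_usd: int) -> bool:
        """True if either limit is strictly exceeded ("more than", not "at least")."""
        return (
            lives_lost > self.lives_lost
            or property_damage_usd > self.property_damage_usd
        )


threshold = SeverityThreshold()
print(threshold.is_exceeded(lives_lost=100, property_damage_usd=0))
print(threshold.is_exceeded(lives_lost=0, property_damage_usd=2_000_000_000))
```

Stating the comparison explicitly removes ambiguity about whether a scenario sitting exactly at the limit counts as catastrophic under the developer's definition.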

incident-reporting time window

The deadline, after discovering a reportable incident, by which the developer must first disclose it. Can vary with the incident's severity and complexity.
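
A severity-dependent window can be made concrete as a lookup from severity to a deadline. The sketch below is illustrative only: the severity labels, the window lengths, and the function names are hypothetical assumptions, and this standard does not prescribe any of them.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical window lengths; a developer would publish its own values.
DISCLOSURE_WINDOWS = {
    "critical": timedelta(hours=72),
    "high": timedelta(days=15),
    "moderate": timedelta(days=30),
}


def disclosure_deadline(discovered_at: datetime, severity: str) -> datetime:
    """Latest time by which the first public disclosure is due."""
    if discovered_at.tzinfo is None:
        raise ValueError("discovered_at must be timezone-aware")
    return discovered_at + DISCLOSURE_WINDOWS[severity]


def is_overdue(discovered_at: datetime, severity: str, now: datetime) -> bool:
    """True if the window has closed without reaching `now`'s disclosure time."""
    return now > disclosure_deadline(discovered_at, severity)


discovered = datetime(2030, 1, 1, 9, 0, tzinfo=timezone.utc)
print(disclosure_deadline(discovered, "critical").isoformat())
```

The clock starts at discovery rather than at occurrence, matching the definition above. An incident that went undetected for months still gets the full window once it is found.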

mitigation

Any measure adopted by the developer that is relied on, in whole or in part, to reduce the likelihood or severity of a catastrophic risk. This could include not only technical safeguards (e.g., training-time alignment techniques, implementations of refusals, monitoring systems, and rate limits) but also other measures (e.g., usage policies and their enforcement).

risk-relevant models

The set of models a developer should consider when assessing catastrophic risk — broadly, any model that could materially contribute to that risk.

At minimum, this should include the most capable model deployed internally, the most capable model deployed externally, any domain-specialized version relevant to a catastrophic risk category, and any model whose behavior during training or testing is itself a source of catastrophic risk.

standardized uncertainty scale

A scale for expressing uncertainty where terms have a defined, consistent meaning.

Examples: the U.S. Intelligence Community's ICD 203, which ties likelihood terms to probability ranges; and the IPCC's calibrated uncertainty language, which separately standardizes likelihood (probability ranges) and confidence in evidence (defined evidentiary criteria).
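
A likelihood scale of this kind can be expressed as a table of terms and probability bands. In the sketch below, the bands follow the ICD 203 ranges as commonly cited and should be checked against the directive before reuse. The function names and the rule for resolving shared boundaries are our own assumptions.

```python
# (term, lower bound, upper bound) in percent.
# Bands as commonly cited from ICD 203; verify against the directive before reuse.
LIKELIHOOD_SCALE = [
    ("almost no chance", 1, 5),
    ("very unlikely", 5, 20),
    ("unlikely", 20, 45),
    ("roughly even chance", 45, 55),
    ("likely", 55, 80),
    ("very likely", 80, 95),
    ("almost certain", 95, 99),
]


def likelihood_term(percent: float) -> str:
    """Map a probability in percent to its term.

    Assumption: a value on a shared boundary (e.g. 20) resolves to the higher band.
    """
    if not 1 <= percent <= 99:
        raise ValueError("probability is outside the range the scale covers")
    for term, low, _high in reversed(LIKELIHOOD_SCALE):
        if percent >= low:
            return term
    raise AssertionError("unreachable: every value in [1, 99] falls in a band")


def probability_range(term: str) -> tuple[int, int]:
    """Look up the (low, high) percent band for a defined term."""
    for name, low, high in LIKELIHOOD_SCALE:
        if name == term:
            return (low, high)
    raise KeyError(f"{term!r} is not a defined term on this scale")


print(likelihood_term(50))
print(probability_range("very likely"))
```

Rejecting undefined terms outright is the point of a standardized scale: a report cannot use a word like "plausible" unless the scale assigns it a meaning.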

¹ Such as, but not limited to: disclosures of the teams and roles responsible for conducting, drafting, and editing the assessment; anonymous methods of registering disagreement; and any role with authority to overrule or revise findings.

² Such as, but not limited to: discoveries that certain forms of model refusal can be jailbroken; that a monitor has missed a class of behavior it was designed to catch; or that an evaluation was contaminated, gamed, or otherwise doesn't support its prior conclusion.

This standard is authored by the Guidelight team, with input from a wide range of sources; read more about our process.

Cite as:

@misc{guidelight2026transparency,
  author       = {{Guidelight AI Standards}},
  title        = {Transparency},
  year         = {2026},
  month        = may,
  note         = {Version 1.0, Standard},
  howpublished = {\url{https://www.guidelight.ai/transparency}},
  url          = {https://www.guidelight.ai/transparency},
  urldate      = {2026-06-29}
}