Our Standards

Guidelight's standards describe what safe frontier AI development looks like in concrete terms, balanced against what is achievable. Read more about our standards development process.

Control (v1.0)

View full standard →

Control refers to the technical and operational measures that constrain what an AI system can do, regardless of whether it is aligned. These measures reduce catastrophic risk from a misaligned AI and can also surface evidence of its misalignment.

Be able to see what your AI is doing during internal deployment.

Scan for signs of concerning behavior.

Stress-test the sufficiency of your scanning.

Stop the AI from taking harmful actions even if it tries.

Have independent third parties verify the adequacy of your control regime.

Prepare for a possible breach of control.

Capability Testing (v1.0)

View full standard →

Capability testing measures which risk-relevant abilities an AI system has, which can inform decisions such as which safeguards are needed for deployment. This testing concerns the ability to cause harm, not the tendency to cause harm.

Evaluate the capabilities most relevant to important threat models.

Assess system capabilities at defined milestones.

Use a sufficiently strong set of evaluations for measuring a capability.

Elicit the maximum reasonably achievable performance on evaluations.

Establish well-supported interpretations of evaluation results.

Ensure evaluation is insulated from business pressures.

Transparency (v1.0)

View full standard →

Transparency makes safety-relevant information legible to stakeholders outside AI labs, including scientists, civil society, and governments. This both incentivizes safer practices and gives external stakeholders a better-informed view of risk.

Expose your risk assessment to public scrutiny.

Inform the public about incidents in a complete and timely fashion.