Our Standards
Guidelight's standards describe, in concrete terms, what safe frontier AI development looks like, balanced against what is achievable. Read more about our standards development process.
Control (v1.0)
Control refers to the technical and operational measures that constrain what an AI system can do, regardless of whether it is aligned. These measures both reduce catastrophic risk from a misaligned AI and can surface evidence of an AI's misalignment.
Principle 1: Be able to see what your AI is doing during internal deployment.
Principle 2: Scan for signs of concerning behavior.
Principle 3: Stress-test the sufficiency of your scanning.
Principle 4: Stop the AI from taking harmful actions even if it tried.
Principle 5: Have independent third parties verify the adequacy of your control regime.
Principle 6: Prepare for a possible breach of control.
Capability Testing (v1.0)
Capability testing measures which risk-relevant abilities an AI system has, which can inform decisions such as what safeguards are needed for deployment. This testing concerns a system's ability to cause harm, not its tendency to do so.
Principle 1: Evaluate the capabilities most relevant to important threat models.
Principle 2: Assess system capabilities at defined milestones.
Principle 3: Use a sufficiently strong set of evaluations for measuring a capability.
Principle 4: Elicit the maximum reasonably achievable performance on evaluations.
Principle 5: Establish well-supported interpretations of evaluation results.
Principle 6: Ensure evaluation is insulated from business pressures.
Transparency (v1.0)
Transparency makes safety-relevant information legible to stakeholders outside AI labs, including scientists, civil society, and governments. This both incentivizes safer practices and gives external stakeholders better-informed views of risk.