AI Lab — playtest skin

Feed — findings · tips · events · news

Market caps

Market information

Model in training

Safety interventions — fine-tuning

Pretrain run

Released models (frozen · permanent attack surface)

Public benchmarks — the shared scoreboard

Public & passive: everyone sees every lab's scores. Scores are read off measured capability, not true capability, so a sandbagging model looks clean here. The world releases benchmarks for free, and old ones saturate as the frontier passes them. Chasing these scores for investors is the Goodhart trap.
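
A minimal sketch of how a scoreboard like this could behave, assuming scores are a clipped ratio of measured capability to a per-benchmark ceiling. The names `Model`, `Benchmark`, `sandbag`, and `ceiling` are hypothetical and illustrative only; the game's actual formulas are not given here.

```python
from dataclasses import dataclass


@dataclass
class Model:
    true_capability: float  # hidden state, never shown on the scoreboard
    sandbag: float = 0.0    # capability concealed from instruments (assumed mechanic)

    @property
    def measured_capability(self) -> float:
        # Instruments only ever see true capability minus whatever is concealed.
        return max(0.0, self.true_capability - self.sandbag)


@dataclass
class Benchmark:
    name: str
    ceiling: float  # capability at which the benchmark tops out (assumed)

    def score(self, model: Model) -> float:
        # The score is read off measured capability only, clipped to [0, 1].
        return min(1.0, model.measured_capability / self.ceiling)

    def saturated(self, frontier: float) -> bool:
        # Once the frontier reaches the ceiling, the benchmark stops discriminating.
        return frontier >= self.ceiling


honest = Model(true_capability=60.0)
sandbagger = Model(true_capability=90.0, sandbag=30.0)
old_bench = Benchmark(name="old-bench", ceiling=80.0)

honest_score = old_bench.score(honest)
sandbagger_score = old_bench.score(sandbagger)
```

Under these assumptions the two models post the same score even though one is far more capable, which is why the scoreboard alone cannot flag sandbagging.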

Capability research

Safety evaluations

Safety advances — training

In progress

Completed advances

Policies — lobby (pre-passage) · litigate (post-passage) · defect

sign safe-harbor code (penalty shelter)

Worry bar (LEVEL vs CONFIDENCE — synthesis of evidence you collected)

level

confidence

Alignment evidence — what your evals & incidents have surfaced

Rivals (public info only)

Truth — god's-eye true state (debug)

DEV ONLY. This bypasses the true-vs-measured firewall the game is built on: it shows every lab's true alignment & capability (red) beside what instruments measure — including rivals' hidden, sandbagged danger. Expand a model for its turn-by-turn true trajectory.
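
A minimal sketch of the true-vs-measured firewall described above, assuming each lab carries both true and measured values and that normal panels only ever receive the measured ones. `LabState`, `public_view`, and `debug_view` are hypothetical names, not the game's actual code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LabState:
    name: str
    true_alignment: float
    true_capability: float
    measured_alignment: float
    measured_capability: float


def public_view(lab: LabState) -> dict:
    # What every normal panel gets: measured values only.
    return {
        "name": lab.name,
        "alignment": lab.measured_alignment,
        "capability": lab.measured_capability,
    }


def debug_view(lab: LabState) -> dict:
    # DEV ONLY: adds the true values beside what instruments measure.
    view = public_view(lab)
    view["true_alignment"] = lab.true_alignment
    view["true_capability"] = lab.true_capability
    # Gap between true and measured capability, e.g. from sandbagging.
    view["hidden_capability"] = lab.true_capability - lab.measured_capability
    return view


rival = LabState(
    name="Rival A",
    true_alignment=0.4,
    true_capability=85.0,
    measured_alignment=0.7,
    measured_capability=60.0,
)
public = public_view(rival)
debug = debug_view(rival)
```

Keeping the true fields out of `public_view` entirely, rather than hiding them in the UI, is one way to make sure only the debug panel can cross the firewall.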