AI Safety & Red-Team


Overview

Candidates attack a target LLM application, document reproducible exploits, and ship mitigations. Submissions are graded on how well they cover the threat model, the severity of the findings, and whether the proposed fixes hold up against a held-out attack set.

Questions: TBD

Domains: TBD

Duration: TBD

Slug: ai-safety

Skills assessed

Jailbreak discovery
Prompt injection
Data exfiltration
Policy specification
Refusal calibration
Mitigation design

Status

This benchmark is still being designed. Engineers and hiring partners are giving feedback on the rubric, dataset construction, and runtime. We’ll publish a brief and open submissions once the eval is stable enough to produce a reliable signal.

In the meantime, register a profile so we can notify you when it goes live.
