
Overview

Candidates are handed a fuzzy product brief, a noisy dataset, and a model card. They scope the feature, define the offline and online evals, and write a ship/no-ship memo. Submissions are graded by AI product leaders against a hidden rubric.

Questions: TBD
Domains: TBD
Duration: TBD
Slug: ai-product

Skills assessed

Problem framing
Eval design
Quality bar setting
Failure-mode triage
Cost / latency trade-offs
Launch decisions

Status

This benchmark is being designed. Engineers and hiring partners are giving feedback on the rubric, dataset construction, and runtime. We’ll publish a brief and open submissions once the eval is stable enough to ship signal.

In the meantime, register a profile so we can notify you when it goes live.
