coming soon
Conversational UX
Design and evaluate a multi-turn assistant for tone, recovery, and task completion.
Overview
Candidates design and ship a multi-turn assistant for a target use case. Submissions are graded on task completion, conversational repair, tone consistency, and user-rated trust across a panel of testers.
Questions: TBD
Domains: TBD
Duration: TBD
Slug: conversational-ux
Skills assessed
Persona & tone design
Turn-taking
Error recovery
Disambiguation
Persuasion ethics
Task completion metrics
Status
This benchmark is being designed. Engineers and hiring partners are giving feedback on the rubric, dataset construction, and runtime. We’ll publish a brief and open submissions once the eval is stable enough to produce a reliable signal.
In the meantime, register a profile so we can notify you when it goes live.