← Benchmarks
coming soon
LLM Systems & Inference
Optimise an inference stack for latency, throughput, and cost under a fixed quality bar.
Overview
Candidates take a reference inference stack and improve tokens/sec, p95 latency, and $/1k requests without dropping below a quality floor. Graded on the Pareto frontier they reach against a held-out workload.
QuestionsTBD
DomainsTBD
DurationTBD
Slugllm-systems
Skills assessed
Batching & schedulingKV-cache optimisationQuantisationSpeculative decodingGPU profilingCost / quality trade-offs
Status
This benchmark is being designed. Engineers and hiring partners are giving feedback on the rubric, dataset construction, and runtime. We’ll publish a brief and open submissions once the eval is stable enough to ship signal.
In the meantime, register a profile so we can notify you when it goes live.
Create profileGet notified
Create a profile and we’ll notify you when this benchmark opens.
Create profile