LLM Reasoning — StarPlan


Overview

Candidates ship a reasoning system that solves multi-step problems in math, logic puzzles, and constrained planning. Submissions are graded on correctness, sample efficiency, and how reliably the system's verifier catches the system's own mistakes.

Questions: TBD

Domains: TBD

Duration: TBD

Slug: llm-reasoning

Skills assessed

Chain-of-thought design
Self-critique
Verifier construction
Planner / executor split
Search & backtracking
Hallucination control
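The task format and submission interface for this benchmark are not yet published, so the following is only a hypothetical toy sketch of how three of the skills above (planner / executor split, verifier construction, search & backtracking) fit together. The problem (ordering tasks under "must come before" constraints) and every name in it (`propose`, `make_verifier`, `solve`) are illustrative assumptions, not part of the benchmark.

```python
from typing import Callable, Optional

# Hypothetical toy problem: order tasks so every (before, after) constraint holds.
Plan = list[str]


def propose(plan: Plan, tasks: list[str]) -> list[str]:
    """Planner: candidate next steps are the tasks not yet scheduled."""
    return [t for t in tasks if t not in plan]


def make_verifier(constraints: list[tuple[str, str]]) -> Callable[[Plan], bool]:
    """Verifier: reject a partial plan as soon as it breaks a constraint."""
    def verify(plan: Plan) -> bool:
        pos = {t: i for i, t in enumerate(plan)}
        for before, after in constraints:
            # 'after' is already placed but 'before' is missing or placed later.
            if after in pos and (before not in pos or pos[before] > pos[after]):
                return False
        return True
    return verify


def solve(tasks: list[str], verify: Callable[[Plan], bool],
          plan: Optional[Plan] = None) -> Optional[Plan]:
    """Depth-first search that extends the plan one verified step at a time."""
    plan = plan if plan is not None else []
    if len(plan) == len(tasks):
        return plan
    for step in propose(plan, tasks):
        candidate = plan + [step]
        if verify(candidate):  # prune invalid branches early
            result = solve(tasks, verify, candidate)
            if result is not None:
                return result
    return None  # dead end: the caller backtracks to its next candidate


if __name__ == "__main__":
    constraints = [("build", "test"), ("test", "deploy")]
    print(solve(["deploy", "build", "test"], make_verifier(constraints)))
```

Running the script prints `['build', 'test', 'deploy']`. Because the verifier checks partial plans, the search abandons a branch like `['deploy']` immediately instead of expanding it, which is one simple way a verifier can improve sample efficiency. An unsatisfiable constraint set makes `solve` return `None`.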

Status

This benchmark is being designed. Engineers and hiring partners are giving feedback on the rubric, dataset construction, and runtime. We’ll publish a brief and open submissions once the eval is stable enough to produce a reliable signal.

In the meantime, register a profile so we can notify you when it goes live.
