rag-engineering live

RAG Engineering

Build a retrieval-augmented generation system over an enterprise knowledge base.

120 questions, 5 domains, 4–8 hours
Retrieval, Re-ranking, Citations, Hallucination control
agent-orchestration live

Agent Orchestration

Design multi-agent workflows that compose tools to complete enterprise tasks.

80 questions, 4 domains, 6–10 hours
Tool selection, Planning, State management, Error recovery
llm-fine-tuning live

LLM Fine-Tuning

Adapt a base model to a domain-specific task with limited compute.

60 questions, 3 domains, 8–16 hours
Data curation, LoRA / PEFT, Eval harness, Catastrophic forgetting
evaluation-design live

Evaluation Design

Build a robust eval harness for an open-ended generation task.

40 questions, 2 domains, 3–6 hours
Rubric design, Inter-rater agreement, Statistical power, Bias detection
prompt-optimization live

Prompt Optimization

Systematically improve a baseline prompt against a held-out test set.

100 questions, 3 domains, 2–4 hours
Prompt engineering, Iteration discipline, A/B methodology, Few-shot selection
llm-reasoning coming soon

LLM Reasoning

Build and evaluate a reasoning loop on multi-step logic, math, and planning tasks.

TBD questions, TBD domains, TBD
Chain-of-thought design, Self-critique, Verifier construction, Planner / executor split
ai-safety coming soon

AI Safety & Red-Team

Stress-test an LLM application for jailbreaks, prompt injection, data leaks, and unsafe outputs.

TBD questions, TBD domains, TBD
Jailbreak discovery, Prompt injection, Data exfiltration, Policy specification
llm-systems coming soon

LLM Systems & Inference

Optimise an inference stack for latency, throughput, and cost under a fixed quality bar.

TBD questions, TBD domains, TBD
Batching & scheduling, KV-cache optimisation, Quantisation, Speculative decoding
ai-product coming soon

AI Product Judgement

Scope an LLM feature, define the eval, and decide what to ship under realistic constraints.

TBD questions, TBD domains, TBD
Problem framing, Eval design, Quality bar setting, Failure-mode triage
conversational-ux coming soon

Conversational UX

Design and evaluate a multi-turn assistant for tone, recovery, and task completion.

TBD questions, TBD domains, TBD
Persona & tone design, Turn-taking, Error recovery, Disambiguation