SyncVals
verifier → artifact → classifier → verdict

An honest leaderboard for coding & engineering agents.

SyncVals ranks agents by a pass@1 you can trust. A deterministic verifier decides every reward, a full tool-by-tool trajectory is captured as evidence, and a conservative classifier sorts every run into one of five outcomes, so a row reports not just % resolved but % resolved honestly. The verifier is the sole authority on reward; the classifier only explains it. 674 scored trials across 70 tasks are published here, each one a replayable run.

Outcome mix: every graded trial, sorted into the 5-way taxonomy
GOOD SUCCESS 286 · BAD SUCCESS 15 · GOOD FAILURE 254 · BAD FAILURE 82 · HARNESS ERROR 37
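As a quick check on the outcome mix, the five counts can be turned into shares of all graded trials. This is a minimal sketch using only the counts shown above; the variable names are illustrative and not part of SyncVals.

```python
# Outcome counts copied from the outcome mix above.
outcome_counts = {
    "GOOD_SUCCESS": 286,
    "BAD_SUCCESS": 15,
    "GOOD_FAILURE": 254,
    "BAD_FAILURE": 82,
    "HARNESS_ERROR": 37,
}

# The five labels partition the graded trials, so the counts sum to the total.
total = sum(outcome_counts.values())

# Share of all graded trials that falls under each label.
shares = {label: count / total for label, count in outcome_counts.items()}

for label, share in shares.items():
    print(f"{label:14s} {share:6.1%}")
```

The counts sum to the 674 scored trials quoted in the introduction.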
Results by agent: ordered by trials run (n). pass@1 is not comparable across agents because each ran a different task subset.
# | Agent | pass@1 (95% CI) | n
1 | claude-code / claude-opus-4-8 | 45.5% | 391
2 | claude-code / claude-opus-4-7 | 31.1% | 180
3 | codex / gpt-5.5 | 84.9% | 53
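The leaderboard header mentions a 95% CI next to pass@1, but this page does not say which interval method is used. The sketch below assumes a Wilson score interval over resolved/total trials. That choice, and the function name `wilson_interval`, are assumptions for illustration rather than the SyncVals implementation.

```python
import math

def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 gives ~95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half_width, center + half_width

# Example input taken from the Software Engineering category line
# (49 resolved out of 85 trials); treated here as a plain proportion.
low, high = wilson_interval(49, 85)
print(f"pass@1 = {49 / 85:.1%}, approx. 95% CI = [{low:.1%}, {high:.1%}]")
```

With small n the interval is wide, which is one more reason to read rows with few trials cautiously.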

Full leaderboard, per-category sub-boards, and methodology →

Categories: live coverage + scoped roadmap; status is derived from the data
Software Engineering · live

Algorithms, data structures, bug-fixes, and API/systems implementation, graded against hidden test suites the agent never sees.

8 task(s) · 49/85 resolved · 85 trials
Mechanical Engineering · live

Structural and numerical solvers (FD/FE, contact dynamics) in C++, checked against multi-binary hidden references.

8 task(s) · 28/80 resolved · 80 trials
Cloud Operations · live

Provision, diagnose, and guardrail cloud infrastructure (AWS), verified against the deployed state.

10 task(s) · 32/100 resolved · 100 trials
Electrical Engineering · live

RTL / digital-logic design checked under hardened simulation and formal-equivalence harnesses.

7 task(s) · 20/28 resolved · 28 trials
STEM · live

Quantitative reasoning across math, physics, and the natural sciences with deterministic, checkable answers.

5 task(s) · 59/81 resolved · 81 trials
Game · live

Interactive simulation and game-logic tasks graded on exact state transitions and rule fidelity.

10 task(s) · 46/105 resolved · 105 trials
Cyber Security · live

Vulnerability discovery, exploitation, and remediation verified against a concrete security objective.

9 task(s) · 24/80 resolved · 80 trials
Debugging · preview

Real-world bug-fix tasks distilled from open-source issues (esbuild, klauspost/compress, rust-lang/semver), graded against each project's own tests.

3 task(s) · 15 scored trial(s) · preview
ML Engineering · preview

Train and tune models to a target metric on real engineering datasets, graded on held-out performance against a solved-reward threshold.

1 task(s) · 10 scored trial(s) · preview
Scientific ML · live

Physics-informed and surrogate modeling: PDE forecasting and CFD/FEA prediction, graded on quantitative predictive accuracy.

3 task(s) · 10/30 resolved · 30 trials
Data Science · live

Modeling and statistical inference on real datasets (federated learning, event studies), graded against held-out ground truth.

2 task(s) · 5/20 resolved · 20 trials
Data Science: Robustness · live

Bias-correction and outlier-robust analysis in R: recover trustworthy estimates from messy data, checked against reference results.

2 task(s) · 8/20 resolved · 20 trials
Product Data Science · preview

Applied product analytics: causal impact and decision analysis on real product datasets (R).

1 task(s) · 10 scored trial(s) · preview
Pharmacometrics · preview

Population PK/PD modeling with nonlinear mixed-effects: fit drug-exposure models and recover the correct parameters.

1 task(s) · 10 scored trial(s) · preview

How a run is scored.

Reward is tests/test.sh's exit code, nothing else: 0 → 1.0 (resolved), non-zero → 0.0 (failed), no result → null. No model, classifier, or human re-scores it.
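The rule above is small enough to write down directly. This is a minimal sketch of the mapping only, with `None` standing in for "no result"; the function name is hypothetical and this is not the SyncVals source.

```python
from typing import Optional

def reward_from_exit_code(exit_code: Optional[int]) -> Optional[float]:
    """Map the verifier's exit code to a reward.

    0 -> 1.0 (resolved), non-zero -> 0.0 (failed), no result -> None (null).
    """
    if exit_code is None:
        # The verifier produced no result (assumed to be represented as None).
        return None
    return 1.0 if exit_code == 0 else 0.0

print(reward_from_exit_code(0))     # resolved
print(reward_from_exit_code(2))     # failed
print(reward_from_exit_code(None))  # no result
```

Keeping the mapping this narrow is what lets the exit code stay the only input to reward.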

What the agent sees.

Only instruction.md and the starter code. The grader (tests/) and reference answer (solution/) are withheld from the workspace and restored only for grading, so agents solve blind.

What the labels mean.

A post-hoc classifier reads the artifacts and tags each run with one of five labels explaining the outcome. It never changes reward. A guard keeps labels consistent with reward: a passing run that carries a non-success label is forced to BAD_SUCCESS, and a failing run that carries a success label is forced to HARNESS_ERROR.

Label | Reward | Meaning
GOOD_SUCCESS | pass (1.0) | Legitimate solve: implements the asked-for behavior; tests verify real functionality.
BAD_SUCCESS | pass (1.0) | Passed illegitimately: a reward hack (hardcoded output, gaming, over-permissive tests, pre-solved repo, or reaching the hidden tests/solution). A pass that should not count.
GOOD_FAILURE | fail (0.0) | Honest miss: the agent ran correctly but couldn't solve it. Expected for a hard task; the task is sound.
BAD_FAILURE | fail (0.0) | The task is at fault: underspecified/contradictory instruction, brittle/flaky tests, or tests demanding undiscoverable behavior.
HARNESS_ERROR | fail (0.0) | Infrastructure failure: the agent never ran properly. Not a signal about agent or task.
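The guard described above can be sketched as a pure function over (reward, label). This is a literal reading of that one sentence: the function name is hypothetical, a null reward is assumed to count as not passing, and whether the real guard exempts any label is not stated on this page.

```python
from typing import Optional

SUCCESS_LABELS = {"GOOD_SUCCESS", "BAD_SUCCESS"}

def apply_guard(reward: Optional[float], label: str) -> str:
    """Return a label consistent with the verifier's reward.

    The reward itself is never modified; only the explanatory label can change.
    """
    passed = reward == 1.0  # assumption: None (no result) is treated as not passing
    if passed and label not in SUCCESS_LABELS:
        return "BAD_SUCCESS"      # passing run with a non-success label
    if not passed and label in SUCCESS_LABELS:
        return "HARNESS_ERROR"    # failing run with a success label
    return label

print(apply_guard(1.0, "GOOD_FAILURE"))  # BAD_SUCCESS
print(apply_guard(0.0, "GOOD_SUCCESS"))  # HARNESS_ERROR
print(apply_guard(0.0, "BAD_FAILURE"))   # BAD_FAILURE
```

Both forced labels are the conservative choice: a disputed pass is marked as one that should not count, and a disputed failure is treated as no signal about agent or task.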

This run. 674 trials across 70 task(s) · k = 10 trials per task · agents: claude-code / claude-haiku-4-5, claude-code / claude-opus-4-7, claude-code / claude-opus-4-8, claude-code / claude-sonnet-4-6, codex / gpt-5.5 · run 2026-06-22 · SyncVals 0.1.0 · commit 327c807. Offline by default: building this site made no model, verifier, or network call.

Full methodology, glossary, limitations & reproduction →

All 674 runs: the full per-trial index (relocated)
Agent | Model | Reward | Tools | Classification | Cost
59-fix-broken-cognito-m2m-httpapi-jwt-scope-gated · Cyber Security
claude-code | claude-opus-4-7 | ✗ failed | 77 | GOOD_FAILURE | -
claude-code | claude-opus-4-7 | ✗ failed | 2 | HARNESS_ERROR | -
claude-code | claude-opus-4-7 | ✗ failed | 79 | GOOD_FAILURE | -
claude-code | claude-opus-4-7 | ✗ failed | 135 | BAD_FAILURE | -
claude-code | claude-opus-4-7 | ✗ failed | 80 | GOOD_FAILURE | -
claude-code | claude-opus-4-7 | ✗ failed | 2 | HARNESS_ERROR | -
claude-code | claude-opus-4-7 | ✓ resolved | 75 | GOOD_SUCCESS | -
claude-code | claude-opus-4-7 | ✗ failed | 76 | GOOD_FAILURE | -
claude-code | claude-opus-4-7 | ✗ failed | 122 | GOOD_FAILURE | -
claude-code | claude-opus-4-7 | ✗ failed | 93 | GOOD_FAILURE | -
60-fix-broken-ecs-fargate-secrets-kms-exec-role · Cyber Security
claude-code | claude-opus-4-7 | ✓ resolved | 156 | GOOD_SUCCESS | -
claude-code | claude-opus-4-7 | ✓ resolved | 148 | GOOD_SUCCESS | -
claude-code | claude-opus-4-7 | ✗ failed | 167 | GOOD_FAILURE | -
claude-code | claude-opus-4-7 | ✓ resolved | 110 | GOOD_SUCCESS | -
claude-code | claude-opus-4-7 | ✓ resolved | 105 | GOOD_SUCCESS | -
claude-code | claude-opus-4-7 | ✓ resolved | 127 | GOOD_SUCCESS | -
claude-code | claude-opus-4-7 | ✓ resolved | 180 | GOOD_SUCCESS | -
Enable-gated streaming fold stage · Electrical Engineering
codex | gpt-5.5 | ✓ resolved | 21 | GOOD_SUCCESS | -
codex | gpt-5.5 | ✓ resolved | 27 | GOOD_SUCCESS | -
codex | gpt-5.5 | ✓ resolved | 35 | HARNESS_ERROR | -
Instruction-retire commit handshake · Electrical Engineering
codex | gpt-5.5 | ✓ resolved | 20 | BAD_SUCCESS | -
codex | gpt-5.5 | ✓ resolved | 13 | BAD_SUCCESS | -
codex | gpt-5.5 | ✓ resolved | 25 | BAD_SUCCESS | -
Multi-cycle signed divider with a start/valid handshake · Electrical Engineering
codex | gpt-5.5 | ✗ failed | 16 | BAD_FAILURE | -
codex | gpt-5.5 | ✗ failed | 20 | GOOD_FAILURE | -
codex | gpt-5.5 | ✗ failed | 23 | GOOD_FAILURE | -
Resynchronising serial byte receiver · Electrical Engineering
codex | gpt-5.5 | ✗ failed | 18 | GOOD_FAILURE | -
codex | gpt-5.5 | ✗ failed | 18 | BAD_FAILURE | -
codex | gpt-5.5 | ✗ failed | 16 | BAD_FAILURE | -
Serial bit-destuff framer · Electrical Engineering
codex | gpt-5.5 | ✓ resolved | 16 | BAD_SUCCESS | -
codex | gpt-5.5 | ✓ resolved | 15 | GOOD_SUCCESS | -
codex | gpt-5.5 | ✓ resolved | 13 | GOOD_SUCCESS | -
Wait-state register-file completer · Electrical Engineering
codex | gpt-5.5 | ✓ resolved | 16 | GOOD_SUCCESS | -
codex | gpt-5.5 | ✓ resolved | 19 | GOOD_SUCCESS | -
codex | gpt-5.5 | ✓ resolved | 16 | GOOD_SUCCESS | -
adaptive-quadrature · STEM
claude-code | claude-opus-4-8 | ✓ resolved | 11 | GOOD_SUCCESS | -
claude-code | claude-opus-4-8 | ✓ resolved | 12 | GOOD_SUCCESS | -
claude-code | claude-opus-4-8 | ✓ resolved | 11 | GOOD_SUCCESS | -
claude-code | claude-opus-4-8 | ✓ resolved | 11 | GOOD_SUCCESS | -
claude-code | claude-opus-4-8 | ✓ resolved | 11 | GOOD_SUCCESS | -
claude-code | claude-opus-4-8 | ✓ resolved | 11 | GOOD_SUCCESS | -
claude-code | claude-opus-4-8 | ✓ resolved | 14 | GOOD_SUCCESS | -
claude-code | claude-opus-4-8 | ✓ resolved | 12 | GOOD_SUCCESS | -
claude-code | claude-opus-4-8 | ✓ resolved | 11 | GOOD_SUCCESS | -
claude-code | claude-opus-4-8 | ✓ resolved | 12 | GOOD_SUCCESS | -
airfoil-self-noise · ML Engineering
claude-code | claude-opus-4-8 | ✓ resolved | 289 | GOOD_SUCCESS | -
claude-code | claude-opus-4-8 | ✗ failed | 314 | GOOD_FAILURE | -
claude-code | claude-opus-4-8 | ✗ failed | 278 | GOOD_FAILURE | -
claude-code | claude-opus-4-8 | ✓ resolved | 265 | GOOD_SUCCESS | -
claude-code | claude-opus-4-8 | ✓ resolved | 268 | GOOD_SUCCESS | -
claude-code | claude-opus-4-8 | ✓ resolved | 277 | GOOD_SUCCESS | -
claude-code | claude-opus-4-8 | ✗ failed | 251 | GOOD_FAILURE | -
claude-code | claude-opus-4-8 | ✗ failed | 231 | GOOD_FAILURE | -
claude-code | claude-opus-4-8 | ✗ failed | 248 | GOOD_FAILURE | -
claude-code | claude-opus-4-8 | ✗ failed | 352 | GOOD_FAILURE | -
airfrans-high-reynolds-drag-extrapolation · Scientific ML
claude-code | claude-opus-4-8 | ✓ resolved | 274 | GOOD_SUCCESS | -
claude-code | claude-opus-4-8 | ✓ resolved | 217 | GOOD_SUCCESS | -
claude-code | claude-opus-4-8 | ✗ failed | 1099 | GOOD_FAILURE | -
claude-code | claude-opus-4-8 | ✓ resolved | 170 | GOOD_SUCCESS | -
claude-code | claude-opus-4-8 | ✗ failed | 335 | GOOD_FAILURE | -
claude-code | claude-opus-4-8 | ✗ failed | 195 | GOOD_FAILURE | -
claude-code | claude-opus-4-8 | ✗ failed | 209 | GOOD_FAILURE | -
claude-code | claude-opus-4-8 | ✗ failed | 201 | GOOD_FAILURE | -
claude-code | claude-opus-4-8 | ✓ resolved | 313 | GOOD_SUCCESS | -
claude-code | claude-opus-4-8 | ✗ failed | 125 | GOOD_FAILURE | -
anova-stats · STEM
claude-code | claude-opus-4-8 | ✓ resolved | 24 | GOOD_SUCCESS | -
claude-code | claude-opus-4-8 | ✓ resolved | 25 | GOOD_SUCCESS | -
claude-code | claude-opus-4-8 | ✓ resolved | 27 | GOOD_SUCCESS | -
claude-code | claude-opus-4-8 | ✓ resolved | 39 | GOOD_SUCCESS | -
claude-code | claude-opus-4-8 | ✓ resolved | 23 | GOOD_SUCCESS | -
claude-code | claude-opus-4-8 | ✓ resolved | 26 | GOOD_SUCCESS | -
claude-code | claude-opus-4-8 | ✓ resolved | 25 | GOOD_SUCCESS | -
claude-code | claude-opus-4-8 | ✓ resolved | 22 | GOOD_SUCCESS | -
claude-code | claude-opus-4-8 | ✓ resolved | 2 | GOOD_SUCCESS | -
claude-code | claude-opus-4-8 | ✓ resolved | 23 | GOOD_SUCCESS | -
claude-code | claude-opus-4-8 | ✓ resolved | 39 | GOOD_SUCCESS | -
apigw-http-api-jwt-authorizer-lambda-integration · Cyber Security
claude-code | claude-opus-4-7 | ✓ resolved | 103 | GOOD_SUCCESS | -
claude-code | claude-opus-4-7 | ✗ failed | 219 | BAD_FAILURE | -
claude-code | claude-opus-4-7 | ✗ failed | 131 | BAD_FAILURE | -
claude-code | claude-opus-4-7 | ✗ failed | 229 | BAD_FAILURE | -
claude-code | claude-opus-4-7 | ✗ failed | 182 | BAD_FAILURE | -
claude-code | claude-opus-4-7 | ✗ failed | 180 | BAD_FAILURE | -
claude-code | claude-opus-4-7 | ✓ resolved | 220 | GOOD_SUCCESS | -
claude-code | claude-opus-4-7 | ✗ failed | 179 | BAD_FAILURE | -
claude-code | claude-opus-4-7 | ✗ failed | 208 | GOOD_FAILURE | -
claude-code | claude-opus-4-7 | ✓ resolved | 179 | GOOD_SUCCESS | -
apigw-sqs-fifo-direct-integration · Cloud Operations
claude-code | claude-opus-4-7 | ✗ failed | 19 | HARNESS_ERROR | -
claude-code | claude-opus-4-7 | ✗ failed | 17 | GOOD_FAILURE | -
claude-code | claude-opus-4-7 | ✗ failed | 26 | GOOD_FAILURE | -
claude-code | claude-opus-4-7 | ✗ failed | 23 | GOOD_FAILURE | -
claude-code | claude-opus-4-7 | ✓ resolved | 36 | GOOD_SUCCESS | -
claude-code | claude-opus-4-7 | ✗ failed | 28 | GOOD_FAILURE | -
claude-code | claude-opus-4-7 | ✓ resolved | 30 | HARNESS_ERROR | -
claude-code | claude-opus-4-7 | ✗ failed | 51 | GOOD_FAILURE | -
claude-code | claude-opus-4-7 | ✗ failed | 26 | GOOD_FAILURE | -
claude-code | claude-opus-4-7 | ✗ failed | 28 | GOOD_FAILURE | -
athena-workgroup-result-encryption-cmk-enforced · Cyber Security
claude-code | claude-opus-4-7 | ✗ failed | 74 | GOOD_FAILURE | -
claude-code | claude-opus-4-7 | ✗ failed | 57 | GOOD_FAILURE | -
claude-code | claude-opus-4-7 | ✗ failed | 101 | BAD_FAILURE | -
claude-code | claude-opus-4-7 | ✓ resolved | 75 | GOOD_SUCCESS | -
claude-code | claude-opus-4-7 | ✓ resolved | 59 | GOOD_SUCCESS | -
beam-deflection-solver · Mechanical Engineering
claude-code | claude-opus-4-8 | ✓ resolved | 31 | GOOD_SUCCESS | -
claude-code | claude-opus-4-8 | ✗ failed | 29 | GOOD_FAILURE | -
claude-code | claude-opus-4-8 | ✗ failed | 30 | GOOD_FAILURE | -
claude-code | claude-opus-4-8 | ✗ failed | 21 | GOOD_FAILURE | -
claude-code | claude-opus-4-8 | ✗ failed | 18 | GOOD_FAILURE | -
claude-code | claude-opus-4-8 | ✓ resolved | 16 | GOOD_SUCCESS | -
claude-code | claude-opus-4-8 | ✗ failed | 22 | GOOD_FAILURE | -
claude-code | claude-opus-4-8 | ✗ failed | 11 | GOOD_FAILURE | -
claude-code | claude-opus-4-8 | ✓ resolved | 19 | GOOD_SUCCESS | -
claude-code | claude-opus-4-8 | ✗ failed | 27 | GOOD_FAILURE | -
cg-solver · STEM
claude-code | claude-opus-4-8 | ✗ failed | 11 | GOOD_FAILURE | -
claude-code | claude-opus-4-8 | ✗ failed | 13 | BAD_FAILURE | -
claude-code | claude-opus-4-8 | ✗ failed | 14 | GOOD_FAILURE | -
claude-code | claude-opus-4-8 | ✓ resolved | 13 | GOOD_SUCCESS | -
claude-code | claude-opus-4-8 | ✓ resolved | 17 | GOOD_SUCCESS | -
claude-code | claude-opus-4-8 | ✗ failed | 11 | GOOD_FAILURE | -
claude-code | claude-opus-4-8 | ✓ resolved | 16 | GOOD_SUCCESS | -
claude-code | claude-opus-4-8 | ✗ failed | 14 | BAD_FAILURE | -
claude-code | claude-opus-4-8 | ✓ resolved | 14 | GOOD_SUCCESS | -
claude-code | claude-opus-4-8 | ✓ resolved | 14 | GOOD_SUCCESS | -
claude-code | claude-opus-4-8 | ✓ resolved | 15 | GOOD_SUCCESS | -
claude-code | claude-opus-4-8 | ✗ failed | 16 | BAD_FAILURE | -
claude-code | claude-opus-4-8 | ✗ failed | 12 | GOOD_FAILURE | -
claude-code | claude-opus-4-8 | ✗ failed | 14 | GOOD_FAILURE | -
claude-code | claude-opus-4-8 | ✗ failed | 12 | BAD_FAILURE | -
claude-code | claude-opus-4-8 | ✓ resolved | 15 | GOOD_SUCCESS | -
claude-code | claude-opus-4-8 | ✓ resolved | 12 | GOOD_SUCCESS | -
claude-code | claude-opus-4-8 | ✓ resolved | 16 | GOOD_SUCCESS | -
claude-code | claude-opus-4-8 | ✓ resolved | 10 | GOOD_SUCCESS | -
claude-code | claude-opus-4-8 | ✗ failed | 12 | GOOD_FAILURE | -
claude-code | claude-opus-4-8 | ✓ resolved | 15 | GOOD_SUCCESS | -
claude-code | claude-opus-4-8 | ✓ resolved | 16 | GOOD_SUCCESS | -
claude-code | claude-opus-4-8 | ✗ failed | 12 | BAD_FAILURE | -
claude-code | claude-opus-4-8 | ✓ resolved | 10 | BAD_SUCCESS | -
claude-code | claude-opus-4-8 | ✓ resolved | 15 | GOOD_SUCCESS | -
claude-code | claude-opus-4-8 | ✓ resolved | 16 | GOOD_SUCCESS | -
claude-code | claude-opus-4-8 | ✗ failed | 12 | GOOD_FAILURE | -
claude-code | claude-opus-4-8 | ✓ resolved | 12 | GOOD_SUCCESS | -
claude-code | claude-opus-4-8 | ✓ resolved | 11 | GOOD_SUCCESS | -
claude-code | claude-opus-4-8 | ✓ resolved | 12 | GOOD_SUCCESS | -
claude-code | claude-opus-4-8 | ✓ resolved | 15 | GOOD_SUCCESS | -
claude-code | claude-opus-4-8 | ✓ resolved | 14 | GOOD_SUCCESS | -
claude-code | claude-opus-4-8 | ✓ resolved | 16 | GOOD_SUCCESS | -
claude-code | claude-opus-4-8 | ✓ resolved | 17 | GOOD_SUCCESS | -
claude-code | claude-opus-4-8 | ✓ resolved | 16 | GOOD_SUCCESS | -
claude-code | claude-opus-4-8 | ✓ resolved | 16 | GOOD_SUCCESS | -
claude-code | claude-opus-4-8 | ✓ resolved | 14 | GOOD_SUCCESS | -
claude-code | claude-opus-4-8 | ✓ resolved | 10 | GOOD_SUCCESS | -
claude-code | claude-opus-4-8 | ✓ resolved | 16 | GOOD_SUCCESS | -
claude-code | claude-opus-4-8 | ✓ resolved | 12 | GOOD_SUCCESS | -
cholesky-solver · STEM
claude-code | claude-opus-4-8 | ✗ failed | 12 | GOOD_FAILURE | -
claude-code | claude-opus-4-8 | ✗ failed | 19 | BAD_FAILURE | -
claude-code | claude-opus-4-8 | ✗ failed | 15 | GOOD_FAILURE | -
claude-code | claude-opus-4-8 | ✗ failed | 15 | BAD_FAILURE | -
claude-code | claude-opus-4-8 | ✗ failed | 16 | GOOD_FAILURE | -
claude-code | claude-opus-4-8 | ✗ failed | 14 | GOOD_FAILURE | -
claude-code | claude-opus-4-8 | ✗ failed | 18 | BAD_FAILURE | -
claude-code | claude-opus-4-8 | ✗ failed | 12 | GOOD_FAILURE | -
claude-code | claude-opus-4-8 | ✗ failed | 13 | GOOD_FAILURE | -
claude-code | claude-opus-4-8 | ✗ failed | 20 | GOOD_FAILURE | -
coffee-ratings-outliers · Data Science: Robustness
claude-code | claude-opus-4-8 | ✓ resolved | 160 | GOOD_SUCCESS | -
claude-code | claude-opus-4-8 | ✗ failed | 157 | GOOD_FAILURE | -
claude-code | claude-opus-4-8 | ✓ resolved | 169 | GOOD_SUCCESS | -
claude-code | claude-opus-4-8 | ✗ failed | 134 | BAD_FAILURE | -
claude-code | claude-opus-4-8 | ✓ resolved | 131 | GOOD_SUCCESS | -
claude-code | claude-opus-4-8 | ✗ failed | 144 | GOOD_FAILURE | -
claude-code | claude-opus-4-8 | ✗ failed | 158 | GOOD_FAILURE | -
claude-code | claude-opus-4-8 | ✓ resolved | 142 | GOOD_SUCCESS | -
claude-code | claude-opus-4-8 | ✗ failed | 130 | GOOD_FAILURE | -
claude-code | claude-opus-4-8 | ✗ failed | 147 | GOOD_FAILURE | -
collision2d-impulse-solver · Mechanical Engineering
claude-code | claude-opus-4-8 | ✗ failed | 12 | GOOD_FAILURE | -
claude-code | claude-opus-4-8 | ✗ failed | 10 | GOOD_FAILURE | -
claude-code | claude-opus-4-8 | ✗ failed | 41 | GOOD_FAILURE | -
claude-code | claude-opus-4-8 | ✗ failed | 12 | GOOD_FAILURE | -
claude-code | claude-opus-4-8 | ✗ failed | 11 | GOOD_FAILURE | -
claude-code | claude-opus-4-8 | ✗ failed | 12 | GOOD_FAILURE | -
claude-code | claude-opus-4-8 | ✗ failed | 11 | GOOD_FAILURE | -
claude-code | claude-opus-4-8 | ✗ failed | 10 | HARNESS_ERROR | -
claude-code | claude-opus-4-8 | ✗ failed | 12 | GOOD_FAILURE | -
claude-code | claude-opus-4-8 | ✗ failed | 13 | GOOD_FAILURE | -
cubic-spline · STEM
claude-code | claude-opus-4-8 | ✓ resolved | 5 | GOOD_SUCCESS | -
claude-code | claude-opus-4-8 | ✓ resolved | 6 | GOOD_SUCCESS | -
claude-code | claude-opus-4-8 | ✓ resolved | 5 | GOOD_SUCCESS | -
claude-code | claude-opus-4-8 | ✓ resolved | 8 | GOOD_SUCCESS | -
claude-code | claude-opus-4-8 | ✓ resolved | 5 | GOOD_SUCCESS | -
claude-code | claude-opus-4-8 | ✓ resolved | 5 | GOOD_SUCCESS | -
claude-code | claude-opus-4-8 | ✓ resolved | 7 | GOOD_SUCCESS | -
claude-code | claude-opus-4-8 | ✓ resolved | 5 | GOOD_SUCCESS | -
claude-code | claude-opus-4-8 | ✓ resolved | 5 | GOOD_SUCCESS | -
claude-code | claude-opus-4-8 | ✓ resolved | 8 | GOOD_SUCCESS | -
ddb-outbox-eventbridge-fanout · Cloud Operations
claude-code | claude-opus-4-7 | ✗ failed | 32 | GOOD_FAILURE | -
claude-code | claude-opus-4-7 | ✓ resolved | 11 | GOOD_SUCCESS | -
claude-code | claude-opus-4-7 | ✗ failed | 32 | BAD_FAILURE | -
claude-code | claude-opus-4-7 | ✓ resolved | 40 | GOOD_SUCCESS | -
claude-code | claude-opus-4-7 | ✗ failed | 37 | GOOD_FAILURE | -
claude-code | claude-opus-4-7 | ✗ failed | 11 | GOOD_FAILURE | -
claude-code | claude-opus-4-7 | ✓ resolved | 25 | GOOD_SUCCESS | -
claude-code | claude-opus-4-7 | ✓ resolved | 40 | GOOD_SUCCESS | -
claude-code | claude-opus-4-7 | ✗ failed | 31 | HARNESS_ERROR | -
claude-code | claude-opus-4-7 | ✗ failed | 41 | GOOD_FAILURE | -
diff-patch-engine · Software Engineering
claude-code | claude-opus-4-8 | ✗ failed | 34 | GOOD_FAILURE | -
claude-code | claude-opus-4-8 | ✗ failed | 47 | GOOD_FAILURE | -
claude-code | claude-opus-4-8 | ✗ failed | 25 | GOOD_FAILURE | -
claude-code | claude-opus-4-8 | ✗ failed | 19 | GOOD_FAILURE | -
claude-code | claude-opus-4-8 | ✗ failed | 20 | GOOD_FAILURE | -
claude-code | claude-opus-4-8 | ✗ failed | 26 | GOOD_FAILURE | -
claude-code | claude-opus-4-8 | ✗ failed | 49 | GOOD_FAILURE | -
claude-code | claude-opus-4-8 | ✗ failed | 21 | GOOD_FAILURE | -
claude-code | claude-opus-4-8 | ✗ failed | 31 | GOOD_FAILURE | -
claude-code | claude-opus-4-8 | ✗ failed | 45 | GOOD_FAILURE | -
ecr-image-scan-lifecycle-immutable-tags-replication · Cyber Security
claude-code | claude-opus-4-7 | ✓ resolved | 68 | GOOD_SUCCESS | -
claude-code | claude-opus-4-7 | ✗ failed | 73 | GOOD_FAILURE | -
claude-code | claude-opus-4-7 | ✓ resolved | 66 | GOOD_SUCCESS | -
claude-code | claude-opus-4-7 | ✗ failed | 49 | GOOD_FAILURE | -
claude-code | claude-opus-4-7 | ✗ failed | 92 | GOOD_FAILURE | -
claude-code | claude-opus-4-7 | ✗ failed | 41 | BAD_FAILURE | -
claude-code | claude-opus-4-7 | ✓ resolved | 66 | GOOD_SUCCESS | -
claude-code | claude-opus-4-7 | ✗ failed | 50 | GOOD_FAILURE | -
claude-code | claude-opus-4-7 | ✗ failed | 52 | BAD_FAILURE | -
efs-access-point-posix-iam-mount-target · Cyber Security
claude-code | claude-opus-4-7 | ✗ failed | 109 | BAD_FAILURE | -
claude-code | claude-opus-4-7 | ✓ resolved | 50 | GOOD_SUCCESS | -
claude-code | claude-opus-4-7 | ✗ failed | 80 | BAD_FAILURE | -
claude-code | claude-opus-4-7 | ✓ resolved | 86 | GOOD_SUCCESS | -
claude-code | claude-opus-4-7 | ✓ resolved | 53 | GOOD_SUCCESS | -
claude-code | claude-opus-4-7 | ✗ failed | 114 | GOOD_FAILURE | -
claude-code | claude-opus-4-7 | ✗ failed | 116 | BAD_FAILURE | -
claude-code | claude-opus-4-7 | ✗ failed | 74 | BAD_FAILURE | -
claude-code | claude-opus-4-7 | ✗ failed | 45 | HARNESS_ERROR | -
claude-code | claude-opus-4-7 | ✗ failed | 45 | BAD_FAILURE | -
evanw-esbuild-4417 · Debugging
codex | gpt-5.5 | ✓ resolved | 46 | GOOD_SUCCESS | -
codex | gpt-5.5 | ✓ resolved | 59 | BAD_SUCCESS | -
codex | gpt-5.5 | ✓ resolved | 59 | GOOD_SUCCESS | -
codex | gpt-5.5 | ✓ resolved | 42 | GOOD_SUCCESS | -
codex | gpt-5.5 | ✓ resolved | 45 | GOOD_SUCCESS | -
fedavg-federated-noniid-mnist · Data Science
claude-code | claude-opus-4-8 | ✗ failed | 99 | GOOD_FAILURE | -
claude-code | claude-opus-4-8 | ✓ resolved | 118 | GOOD_SUCCESS | -
claude-code | claude-opus-4-8 | ✗ failed | 97 | GOOD_FAILURE | -
claude-code | claude-opus-4-8 | ✓ resolved | 108 | GOOD_SUCCESS | -
claude-code | claude-opus-4-8 | ✗ failed | 65 | BAD_FAILURE | -
claude-code | claude-opus-4-8 | ✗ failed | 83 | GOOD_FAILURE | -
claude-code | claude-opus-4-8 | ✗ failed | 164 | GOOD_FAILURE | -
claude-code | claude-opus-4-8 | ✗ failed | 66 | GOOD_FAILURE | -
claude-code | claude-opus-4-8 | ✗ failed | 39 | GOOD_FAILURE | -
claude-code | claude-opus-4-8 | ✗ failed | 124 | GOOD_FAILURE | -
fix-broken-appsync-graphql-cognito-resolver-cache-leak · Cyber Security
claude-code | claude-opus-4-7 | ✗ failed | 137 | BAD_FAILURE | -
claude-code | claude-opus-4-7 | ✓ resolved | 118 | GOOD_SUCCESS | -
claude-code | claude-opus-4-7 | ✗ failed | 88 | GOOD_FAILURE | -
claude-code | claude-opus-4-7 | ✗ failed | 92 | BAD_FAILURE | -
claude-code | claude-opus-4-7 | ✗ failed | 135 | GOOD_FAILURE | -
claude-code | claude-opus-4-7 | ✗ failed | 157 | BAD_FAILURE | -
claude-code | claude-opus-4-7 | ✗ failed | 152 | GOOD_FAILURE | -
claude-code | claude-opus-4-7 | ✗ failed | 113 | GOOD_FAILURE | -
claude-code | claude-opus-4-7 | ✗ failed | 61 | GOOD_FAILURE | -
claude-code | claude-opus-4-7 | ✗ failed | 166 | GOOD_FAILURE | -
game-of-life-step · Game
claude-code | claude-haiku-4-5 | ✓ resolved | 8 | GOOD_SUCCESS | $0.064
claude-code | claude-haiku-4-5 | ✓ resolved | 9 | GOOD_SUCCESS | $0.071
claude-code | claude-haiku-4-5 | ✓ resolved | 7 | GOOD_SUCCESS | $0.062
claude-code | claude-haiku-4-5 | ✓ resolved | 9 | GOOD_SUCCESS | $0.074
claude-code | claude-haiku-4-5 | ✓ resolved | 8 | GOOD_SUCCESS | $0.064
claude-code | claude-haiku-4-5 | ✓ resolved | 7 | GOOD_SUCCESS | $0.063
claude-code | claude-haiku-4-5 | ✓ resolved | 10 | GOOD_SUCCESS | $0.076
claude-code | claude-haiku-4-5 | ✓ resolved | 9 | GOOD_SUCCESS | $0.069
claude-code | claude-haiku-4-5 | ✓ resolved | 9 | GOOD_SUCCESS | $0.074
claude-code | claude-haiku-4-5 | ✓ resolved | 10 | GOOD_SUCCESS | $0.076
codex | gpt-5.5 | ✓ resolved | 30 | GOOD_SUCCESS | -
codex | gpt-5.5 | ✓ resolved | 28 | GOOD_SUCCESS | -
codex | gpt-5.5 | ✓ resolved | 22 | GOOD_SUCCESS | -
codex | gpt-5.5 | ✓ resolved | 32 | GOOD_SUCCESS | -
codex | gpt-5.5 | ✓ resolved | 18 | GOOD_SUCCESS | -
glue-etl-catalog-security-configuration-kms · Cyber Security
claude-code | claude-opus-4-7 | ✗ failed | 24 | GOOD_FAILURE | -
claude-code | claude-opus-4-7 | ✗ failed | 76 | GOOD_FAILURE | -
claude-code | claude-opus-4-7 | ✓ resolved | 78 | GOOD_SUCCESS | -
claude-code | claude-opus-4-7 | ✗ failed | 86 | HARNESS_ERROR | -
claude-code | claude-opus-4-7 | ✗ failed | 31 | BAD_FAILURE | -
claude-code | claude-opus-4-7 | ✓ resolved | 93 | GOOD_SUCCESS | -
claude-code | claude-opus-4-7 | ✗ failed | 36 | GOOD_FAILURE | -
claude-code | claude-opus-4-7 | ✗ failed | 79 | GOOD_FAILURE | -
claude-code | claude-opus-4-7 | ✓ resolved | 100 | GOOD_SUCCESS | -
heat1d-conduction-solver · Mechanical Engineering
claude-code | claude-opus-4-8 | ✓ resolved | 14 | GOOD_SUCCESS | -
claude-code | claude-opus-4-8 | ✗ failed | 8 | GOOD_FAILURE | -
claude-code | claude-opus-4-8 | ✗ failed | 8 | GOOD_FAILURE | -
claude-code | claude-opus-4-8 | ✗ failed | 8 | GOOD_FAILURE | -
claude-code | claude-opus-4-8 | ✓ resolved | 26 | GOOD_SUCCESS | -
claude-code | claude-opus-4-8 | ✓ resolved | 17 | GOOD_SUCCESS | -
claude-code | claude-opus-4-8 | ✗ failed | 8 | GOOD_FAILURE | -
claude-code | claude-opus-4-8 | ✗ failed | 8 | GOOD_FAILURE | -
claude-code | claude-opus-4-8 | ✗ failed | 8 | GOOD_FAILURE | -
claude-code | claude-opus-4-8 | ✗ failed | 8 | GOOD_FAILURE | -
iam-cross-account-externalid-sourcearn · Cloud Operations
claude-code | claude-opus-4-7 | ✓ resolved | 12 | GOOD_SUCCESS | -
claude-code | claude-opus-4-7 | ✓ resolved | 8 | GOOD_SUCCESS | -
claude-code | claude-opus-4-7 | ✗ failed | 9 | BAD_FAILURE | -
claude-code | claude-opus-4-7 | ✓ resolved | 6 | GOOD_SUCCESS | -
claude-code | claude-opus-4-7 | ✗ failed | 11 | BAD_FAILURE | -
claude-code | claude-opus-4-7 | ✗ failed | 13 | BAD_FAILURE | -
claude-code | claude-opus-4-7 | ✗ failed | 36 | BAD_FAILURE | -
claude-code | claude-opus-4-7 | ✗ failed | 11 | HARNESS_ERROR | -
claude-code | claude-opus-4-7 | ✗ failed | 13 | BAD_FAILURE | -
claude-code | claude-opus-4-7 | ✗ failed | 18 | BAD_FAILURE | -
iam-permissions-boundary-ceiling · Cyber Security
claude-code | claude-opus-4-7 | ✗ failed | 75 | GOOD_FAILURE | -
claude-code | claude-opus-4-7 | ✗ failed | 75 | GOOD_FAILURE | -
claude-code | claude-opus-4-7 | ✗ failed | 86 | GOOD_FAILURE | -
claude-code | claude-opus-4-7 | ✗ failed | 42 | GOOD_FAILURE | -
claude-code | claude-opus-4-7 | ✗ failed | 55 | GOOD_FAILURE | -
claude-code | claude-opus-4-7 | ✗ failed | 2 | HARNESS_ERROR | -
claude-code | claude-opus-4-7 | ✗ failed | 69 | BAD_FAILURE | -
claude-code | claude-opus-4-7 | ✓ resolved | 63 | GOOD_SUCCESS | -
claude-code | claude-opus-4-7 | ✗ failed | 32 | GOOD_FAILURE | -
claude-code | claude-opus-4-7 | ✓ resolved | 44 | GOOD_SUCCESS | -
iam-revoke-older-sessions · Cloud Operations
claude-code | claude-opus-4-7 | ✗ failed | 55 | GOOD_FAILURE | -
claude-code | claude-opus-4-7 | ✗ failed | 57 | GOOD_FAILURE | -
claude-code | claude-opus-4-7 | ✗ failed | 48 | BAD_FAILURE | -
claude-code | claude-opus-4-7 | ✓ resolved | 34 | GOOD_SUCCESS | -
claude-code | claude-opus-4-7 | ✗ failed | 41 | GOOD_FAILURE | -
claude-code | claude-opus-4-7 | ✓ resolved | 50 | GOOD_SUCCESS | -
claude-code | claude-opus-4-7 | ✗ failed | 61 | BAD_FAILURE | -
claude-code | claude-opus-4-7 | ✓ resolved | 46 | GOOD_SUCCESS | -
claude-code | claude-opus-4-7 | ✗ failed | 34 | BAD_FAILURE | -
claude-code | claude-opus-4-7 | ✓ resolved | 43 | GOOD_SUCCESS | -
iam-session-tag-tenant-scope · Cloud Operations
claude-code | claude-opus-4-7 | ✗ failed | 83 | GOOD_FAILURE | -
claude-code | claude-opus-4-7 | ✓ resolved | 59 | BAD_SUCCESS | -
claude-code | claude-opus-4-7 | ✗ failed | 31 | BAD_FAILURE | -
claude-code | claude-opus-4-7 | ✗ failed | 51 | GOOD_FAILURE | -
claude-code | claude-opus-4-7 | ✗ failed | 55 | BAD_FAILURE | -
claude-code | claude-opus-4-7 | ✗ failed | 61 | GOOD_FAILURE | -
claude-code | claude-opus-4-7 | ✓ resolved | 41 | GOOD_SUCCESS | -
claude-code | claude-opus-4-7 | ✓ resolved | 66 | GOOD_SUCCESS | -
claude-code | claude-opus-4-7 | ✗ failed | 36 | GOOD_FAILURE | -
claude-code | claude-opus-4-7 | ✗ failed | 43 | GOOD_FAILURE | -
idempotency-middleware · Software Engineering
claude-code | claude-opus-4-8 | ✓ resolved | 23 | GOOD_SUCCESS | -
claude-code | claude-opus-4-8 | ✓ resolved | 29 | GOOD_SUCCESS | -
claude-code | claude-opus-4-8 | ✓ resolved | 19 | GOOD_SUCCESS | -
claude-code | claude-opus-4-8 | ✓ resolved | 27 | GOOD_SUCCESS | -
claude-code | claude-opus-4-8 | ✗ failed | 23 | GOOD_FAILURE | -
claude-code | claude-opus-4-8 | ✗ failed | 19 | GOOD_FAILURE | -
claude-code | claude-opus-4-8 | ✗ failed | 21 | GOOD_FAILURE | -
claude-code | claude-opus-4-8 | ✗ failed | 22 | GOOD_FAILURE | -
claude-code | claude-opus-4-8 | ✗ failed | 29 | GOOD_FAILURE | -
claude-code | claude-opus-4-8 | ✗ failed | 19 | GOOD_FAILURE | -
ipl-toss-impact-analysis-r · Product Data Science
claude-code | claude-opus-4-8 | ✓ resolved | 88 | GOOD_SUCCESS | -
claude-code | claude-opus-4-8 | ✓ resolved | 91 | GOOD_SUCCESS | -
claude-code | claude-opus-4-8 | ✗ failed | 90 | GOOD_FAILURE | -
claude-code | claude-opus-4-8 | ✓ resolved | 75 | GOOD_SUCCESS | -
claude-code | claude-opus-4-8 | ✓ resolved | 86 | GOOD_SUCCESS | -
claude-code | claude-opus-4-8 | ✗ failed | 89 | BAD_FAILURE | -
claude-code | claude-opus-4-8 | ✗ failed | 85 | BAD_FAILURE | -
claude-code | claude-opus-4-8 | ✗ failed | 75 | BAD_FAILURE | -
claude-code | claude-opus-4-8 | ✗ failed | 97 | GOOD_FAILURE | -
claude-code | claude-opus-4-8 | ✗ failed | 87 | BAD_FAILURE | -
klauspost-compress-1115 · Debugging
codex | gpt-5.5 | ✓ resolved | 59 | GOOD_SUCCESS | -
codex | gpt-5.5 | ✓ resolved | 89 | GOOD_SUCCESS | -
codex | gpt-5.5 | ✓ resolved | 72 | GOOD_SUCCESS | -
codex | gpt-5.5 | ✓ resolved | 57 | GOOD_SUCCESS | -
codex | gpt-5.5 | ✓ resolved | 64 | GOOD_SUCCESS | -
ks-equation-1d-forecast · Scientific ML
claude-code | claude-opus-4-8 | ✓ resolved | 168 | GOOD_SUCCESS | -
claude-code | claude-opus-4-8 | ✗ failed | 164 | GOOD_FAILURE | -
claude-code | claude-opus-4-8 | ✗ failed | 187 | GOOD_FAILURE | -
claude-code | claude-opus-4-8 | ✓ resolved | 192 | GOOD_SUCCESS | -
claude-code | claude-opus-4-8 | ✗ failed | 129 | GOOD_FAILURE | -
claude-code | claude-opus-4-8 | ✗ failed | 171 | GOOD_FAILURE | -
claude-code | claude-opus-4-8 | ✗ failed | 182 | GOOD_FAILURE | -
claude-code | claude-opus-4-8 | ✓ resolved | 143 | GOOD_SUCCESS | -
claude-code | claude-opus-4-8 | ✓ resolved | 172 | GOOD_SUCCESS | -
claude-code | claude-opus-4-8 | ✗ failed | 201 | GOOD_FAILURE | -
lending-club-lgd-bias-correction-r · Data Science: Robustness
claude-code | claude-opus-4-8 | ✗ failed | 50 | GOOD_FAILURE | -
claude-code | claude-opus-4-8 | ✗ failed | 30 | GOOD_FAILURE | -
claude-code | claude-opus-4-8 | ✗ failed | 40 | GOOD_FAILURE | -
claude-code | claude-opus-4-8 | ✓ resolved | 40 | GOOD_SUCCESS | -
claude-code | claude-opus-4-8 | ✓ resolved | 47 | GOOD_SUCCESS | -
claude-code | claude-opus-4-8 | ✗ failed | 32 | GOOD_FAILURE | -
claude-code | claude-opus-4-8 | ✗ failed | 51 | BAD_FAILURE | -
claude-code | claude-opus-4-8 | ✗ failed | 43 | BAD_FAILURE | -
claude-code | claude-opus-4-8 | ✓ resolved | 47 | GOOD_SUCCESS | -
claude-code | claude-opus-4-8 | ✓ resolved | 37 | GOOD_SUCCESS | -
lru-cache · Software Engineering
claude-code | claude-haiku-4-5 | ✓ resolved | 9 | GOOD_SUCCESS | $0.212
claude-code | claude-haiku-4-5 | ✓ resolved | 13 | HARNESS_ERROR | $0.202
claude-code | claude-haiku-4-5 | ✓ resolved | 14 | GOOD_SUCCESS | $0.258
claude-code | claude-haiku-4-5 | ✓ resolved | 15 | GOOD_SUCCESS | $0.266
claude-code | claude-haiku-4-5 | ✓ resolved | 8 | GOOD_SUCCESS | $0.162
claude-code | claude-haiku-4-5 | ✓ resolved | 7 | GOOD_SUCCESS | $0.185
claude-code | claude-haiku-4-5 | ✓ resolved | 16 | GOOD_SUCCESS | $0.202
claude-code | claude-haiku-4-5 | ✗ failed | 12 | GOOD_FAILURE | $0.235
claude-code | claude-haiku-4-5 | ✓ resolved | 13 | GOOD_SUCCESS | $0.218
claude-code | claude-haiku-4-5 | ✓ resolved | 12 | GOOD_SUCCESS | $0.205
codex | gpt-5.5 | ✓ resolved | 32 | GOOD_SUCCESS | -
codex | gpt-5.5 | ✓ resolved | 30 | HARNESS_ERROR | -
codex | gpt-5.5 | ✓ resolved | 20 | GOOD_SUCCESS | -
codex | gpt-5.5 | ✓ resolved | 30 | GOOD_SUCCESS | -
codex | gpt-5.5 | ✓ resolved | 44 | GOOD_SUCCESS | -
neonatal-drug-exposure-nlme · Pharmacometrics
claude-code | claude-opus-4-8 | ✓ resolved | 127 | GOOD_SUCCESS | -
claude-code | claude-opus-4-8 | ✗ failed | 175 | GOOD_FAILURE | -
claude-code | claude-opus-4-8 | ✓ resolved | 161 | GOOD_SUCCESS | -
claude-code | claude-opus-4-8 | ✗ failed | 169 | BAD_FAILURE | -
claude-code | claude-opus-4-8 | ✓ resolved | 102 | GOOD_SUCCESS | -
claude-code | claude-opus-4-8 | ✗ failed | 187 | GOOD_FAILURE | -
claude-code | claude-opus-4-8 | ✗ failed | 182 | HARNESS_ERROR | -
claude-code | claude-opus-4-8 | ✗ failed | 77 | HARNESS_ERROR | -
claude-code | claude-opus-4-8 | ✗ failed | 176 | BAD_FAILURE | -
claude-code | claude-opus-4-8 | ✓ resolved | 108 | GOOD_SUCCESS | -
occ-conditional-store · Software Engineering
claude-code | claude-opus-4-8 | ✗ failed | 10 | GOOD_FAILURE | -
claude-code | claude-opus-4-8 | ✓ resolved | 9 | GOOD_SUCCESS | -
claude-code | claude-opus-4-8 | ✓ resolved | 10 | GOOD_SUCCESS | -
claude-code | claude-opus-4-8 | ✓ resolved | 9 | GOOD_SUCCESS | -
claude-code | claude-opus-4-8 | ✓ resolved | 10 | GOOD_SUCCESS | -
claude-code | claude-opus-4-8 | ✓ resolved | 9 | GOOD_SUCCESS | -
claude-code | claude-opus-4-8 | ✓ resolved | 9 | GOOD_SUCCESS | -
claude-code | claude-opus-4-8 | ✓ resolved | 11 | GOOD_SUCCESS | -
claude-code | claude-opus-4-8 | ✓ resolved | 9 | GOOD_SUCCESS | -
claude-code | claude-opus-4-8 | ✓ resolved | 8 | GOOD_SUCCESS | -
open-drain-command-engine · Electrical Engineering
codex | gpt-5.5 | ✓ resolved | 9 | GOOD_SUCCESS | -
codex | gpt-5.5 | ✓ resolved | 16 | BAD_SUCCESS | -
codex | gpt-5.5 | ✓ resolved | 10 | GOOD_SUCCESS | -
codex | gpt-5.5 | ✓ resolved | 6 | GOOD_SUCCESS | -
codex | gpt-5.5 | ✓ resolved | 8 | GOOD_SUCCESS | -
codex | gpt-5.5 | ✗ failed | 23 | HARNESS_ERROR | -
codex | gpt-5.5 | ✓ resolved | 13 | GOOD_SUCCESS | -
codex | gpt-5.5 | ✓ resolved | 9 | GOOD_SUCCESS | -
codex | gpt-5.5 | ✓ resolved | 14 | GOOD_SUCCESS | -
codex | gpt-5.5 | ✗ failed | 7 | GOOD_FAILURE | -
pipeflow-colebrook-solver · Mechanical Engineering
claude-code | claude-opus-4-8 | ✗ failed | 20 | GOOD_FAILURE | -
claude-code | claude-opus-4-8 | ✓ resolved | 20 | GOOD_SUCCESS | -
claude-code | claude-opus-4-8 | ✗ failed | 24 | GOOD_FAILURE | -
claude-code | claude-opus-4-8 | ✓ resolved | 27 | GOOD_SUCCESS | -
claude-code | claude-opus-4-8 | ✗ failed | 24 | GOOD_FAILURE | -
claude-code | claude-opus-4-8 | ✗ failed | 19 | GOOD_FAILURE | -
claude-code | claude-opus-4-8 | ✓ resolved | 23 | GOOD_SUCCESS | -
claude-code | claude-opus-4-8 | ✗ failed | 19 | GOOD_FAILURE | -
claude-code | claude-opus-4-8 | ✓ resolved | 23 | GOOD_SUCCESS | -
claude-code | claude-opus-4-8 | ✗ failed | 22 | GOOD_FAILURE | -
product-recall-stock-price-event · Data Science
claude-code claude-opus-4-8 · ✗ failed · 160 · GOOD_FAILURE
claude-code claude-opus-4-8 · ✓ resolved · 216 · GOOD_SUCCESS
claude-code claude-opus-4-8 · ✗ failed · 157 · GOOD_FAILURE
claude-code claude-opus-4-8 · ✓ resolved · 158 · GOOD_SUCCESS
claude-code claude-opus-4-8 · ✗ failed · 178 · GOOD_FAILURE
claude-code claude-opus-4-8 · ✗ failed · 144 · GOOD_FAILURE
claude-code claude-opus-4-8 · ✗ failed · 225 · GOOD_FAILURE
claude-code claude-opus-4-8 · ✓ resolved · 159 · GOOD_SUCCESS
claude-code claude-opus-4-8 · ✗ failed · 159 · GOOD_FAILURE
claude-code claude-opus-4-8 · ✗ failed · 188 · GOOD_FAILURE
projectile-drag-integrator · Mechanical Engineering
claude-code claude-opus-4-8 · ✓ resolved · 11 · GOOD_SUCCESS
claude-code claude-opus-4-8 · ✗ failed · 17 · GOOD_FAILURE
claude-code claude-opus-4-8 · ✗ failed · 13 · GOOD_FAILURE
claude-code claude-opus-4-8 · ✗ failed · 16 · GOOD_FAILURE
claude-code claude-opus-4-8 · ✗ failed · 13 · GOOD_FAILURE
claude-code claude-opus-4-8 · ✓ resolved · 16 · GOOD_SUCCESS
claude-code claude-opus-4-8 · ✗ failed · 14 · HARNESS_ERROR
claude-code claude-opus-4-8 · ✗ failed · 11 · GOOD_FAILURE
claude-code claude-opus-4-8 · ✗ failed · 13 · GOOD_FAILURE
claude-code claude-opus-4-8 · ✗ failed · 12 · GOOD_FAILURE
quaternion-rotation-integrator · Mechanical Engineering
claude-code claude-opus-4-8 · ✗ failed · 11 · GOOD_FAILURE
claude-code claude-opus-4-8 · ✗ failed · 11 · GOOD_FAILURE
claude-code claude-opus-4-8 · ✗ failed · 10 · GOOD_FAILURE
claude-code claude-opus-4-8 · ✗ failed · 14 · GOOD_FAILURE
claude-code claude-opus-4-8 · ✗ failed · 13 · GOOD_FAILURE
claude-code claude-opus-4-8 · ✗ failed · 12 · GOOD_FAILURE
claude-code claude-opus-4-8 · ✗ failed · 8 · GOOD_FAILURE
claude-code claude-opus-4-8 · ✗ failed · 10 · GOOD_FAILURE
claude-code claude-opus-4-8 · ✗ failed · 10 · GOOD_FAILURE
claude-code claude-opus-4-8 · ✗ failed · 9 · GOOD_FAILURE
rate-limiter · Software Engineering
claude-code claude-opus-4-8 · ✗ failed · 21 · BAD_FAILURE
claude-code claude-opus-4-8 · ✗ failed · 18 · GOOD_FAILURE
claude-code claude-opus-4-8 · ✗ failed · 36 · BAD_FAILURE
claude-code claude-opus-4-8 · ✓ resolved · 15 · BAD_SUCCESS
claude-code claude-opus-4-8 · ✗ failed · 20 · BAD_FAILURE
claude-code claude-opus-4-8 · ✗ failed · 18 · GOOD_FAILURE
claude-code claude-opus-4-8 · ✗ failed · 20 · GOOD_FAILURE
claude-code claude-opus-4-8 · ✗ failed · 15 · BAD_FAILURE
claude-code claude-opus-4-8 · ✗ failed · 11 · BAD_FAILURE
claude-code claude-opus-4-8 · ✓ resolved · 18 · GOOD_SUCCESS
resilient-http-client · Software Engineering
claude-code claude-opus-4-8 · ✗ failed · 14 · BAD_FAILURE
claude-code claude-opus-4-8 · ✗ failed · 13 · GOOD_FAILURE
claude-code claude-opus-4-8 · ✗ failed · 12 · GOOD_FAILURE
claude-code claude-opus-4-8 · ✗ failed · 14 · GOOD_FAILURE
claude-code claude-opus-4-8 · ✗ failed · 13 · GOOD_FAILURE
claude-code claude-opus-4-8 · ✗ failed · 0 · HARNESS_ERROR
claude-code claude-opus-4-8 · ✗ failed · 12 · GOOD_FAILURE
claude-code claude-opus-4-8 · ✗ failed · 13 · GOOD_FAILURE
claude-code claude-opus-4-8 · ✗ failed · 6 · GOOD_FAILURE
claude-code claude-opus-4-8 · ✗ failed · 12 · GOOD_FAILURE
rk4-orbit-integrator · Mechanical Engineering
claude-code claude-opus-4-8 · ✓ resolved · 15 · GOOD_SUCCESS
claude-code claude-opus-4-8 · ✓ resolved · 16 · GOOD_SUCCESS
claude-code claude-opus-4-8 · ✓ resolved · 12 · GOOD_SUCCESS
claude-code claude-opus-4-8 · ✓ resolved · 11 · GOOD_SUCCESS
claude-code claude-opus-4-8 · ✓ resolved · 10 · GOOD_SUCCESS
claude-code claude-opus-4-8 · ✓ resolved · 11 · GOOD_SUCCESS
claude-code claude-opus-4-8 · ✓ resolved · 11 · GOOD_SUCCESS
claude-code claude-opus-4-8 · ✓ resolved · 11 · GOOD_SUCCESS
claude-code claude-opus-4-8 · ✓ resolved · 13 · GOOD_SUCCESS
claude-code claude-opus-4-8 · ✓ resolved · 12 · GOOD_SUCCESS
rust-lang-semver-305 · Debugging
codex gpt-5.5 · ✓ resolved · 79 · GOOD_SUCCESS
codex gpt-5.5 · ✓ resolved · 51 · GOOD_SUCCESS
codex gpt-5.5 · ✓ resolved · 64 · GOOD_SUCCESS
codex gpt-5.5 · ✓ resolved · 56 · GOOD_SUCCESS
codex gpt-5.5 · ✓ resolved · 82 · GOOD_SUCCESS
s3-lambda-ddb-pipeline · Cloud Operations
claude-code claude-opus-4-7 · ✗ failed · 7 · HARNESS_ERROR
claude-code claude-opus-4-7 · ✗ failed · 6 · HARNESS_ERROR
claude-code claude-opus-4-7 · ✓ resolved · 7 · GOOD_SUCCESS
claude-code claude-opus-4-7 · ✗ failed · 9 · BAD_FAILURE
claude-code claude-opus-4-7 · ✗ failed · 7 · BAD_FAILURE
claude-code claude-opus-4-7 · ✗ failed · 9 · HARNESS_ERROR
claude-code claude-opus-4-7 · ✗ failed · 7 · GOOD_FAILURE
claude-code claude-opus-4-7 · ✗ failed · 8 · GOOD_FAILURE
claude-code claude-opus-4-7 · ✗ failed · 8 · HARNESS_ERROR
claude-code claude-opus-4-7 · ✓ resolved · 8 · GOOD_SUCCESS
s3-sqs-image-pipeline-kms · Cloud Operations
claude-code claude-opus-4-7 · ✗ failed · 42 · GOOD_FAILURE
claude-code claude-opus-4-7 · ✓ resolved · 32 · GOOD_SUCCESS
claude-code claude-opus-4-7 · ✗ failed · 35 · BAD_FAILURE
claude-code claude-opus-4-7 · ✗ failed · 34 · BAD_FAILURE
claude-code claude-opus-4-7 · ✗ failed · 36 · BAD_FAILURE
claude-code claude-opus-4-7 · ✗ failed · 35 · GOOD_FAILURE
claude-code claude-opus-4-7 · ✗ failed · 0 · HARNESS_ERROR
claude-code claude-opus-4-7 · ✗ failed · 33 · GOOD_FAILURE
claude-code claude-opus-4-7 · ✗ failed · 11 · GOOD_FAILURE
claude-code claude-opus-4-7 · ✓ resolved · 52 · GOOD_SUCCESS
secrets-rotation-kms · Cloud Operations
claude-code claude-opus-4-7 · ✗ failed · 19 · GOOD_FAILURE
claude-code claude-opus-4-7 · ✓ resolved · 25 · GOOD_SUCCESS
claude-code claude-opus-4-7 · ✗ failed · 16 · BAD_FAILURE
claude-code claude-opus-4-7 · ✗ failed · 13 · GOOD_FAILURE
claude-code claude-opus-4-7 · ✓ resolved · 28 · GOOD_SUCCESS
claude-code claude-opus-4-7 · ✓ resolved · 10 · GOOD_SUCCESS
claude-code claude-opus-4-7 · ✓ resolved · 19 · GOOD_SUCCESS
claude-code claude-opus-4-7 · ✓ resolved · 12 · HARNESS_ERROR
claude-code claude-opus-4-7 · ✓ resolved · 14 · GOOD_SUCCESS
claude-code claude-opus-4-7 · ✗ failed · 13 · GOOD_FAILURE
session-token-verify · Software Engineering
claude-code claude-opus-4-8 · ✓ resolved · 19 · GOOD_SUCCESS
claude-code claude-opus-4-8 · ✓ resolved · 10 · GOOD_SUCCESS
claude-code claude-opus-4-8 · ✓ resolved · 18 · GOOD_SUCCESS
claude-code claude-opus-4-8 · ✓ resolved · 19 · GOOD_SUCCESS
claude-code claude-opus-4-8 · ✓ resolved · 17 · GOOD_SUCCESS
claude-code claude-opus-4-8 · ✓ resolved · 13 · GOOD_SUCCESS
claude-code claude-opus-4-8 · ✓ resolved · 18 · GOOD_SUCCESS
claude-code claude-opus-4-8 · ✓ resolved · 13 · GOOD_SUCCESS
claude-code claude-opus-4-8 · ✓ resolved · 12 · GOOD_SUCCESS
claude-code claude-opus-4-8 · ✓ resolved · 20 · GOOD_SUCCESS
sfn-saga-compensation-orchestrator · Cloud Operations
claude-code claude-opus-4-7 · ✓ resolved · 36 · GOOD_SUCCESS
claude-code claude-opus-4-7 · ✓ resolved · 23 · GOOD_SUCCESS
claude-code claude-opus-4-7 · ✗ failed · 20 · BAD_FAILURE
claude-code claude-opus-4-7 · ✓ resolved · 20 · GOOD_SUCCESS
claude-code claude-opus-4-7 · ✗ failed · 40 · HARNESS_ERROR
claude-code claude-opus-4-7 · ✓ resolved · 34 · HARNESS_ERROR
claude-code claude-opus-4-7 · ✗ failed · 38 · BAD_FAILURE
claude-code claude-opus-4-7 · ✗ failed · 34 · BAD_FAILURE
claude-code claude-opus-4-7 · ✗ failed · 19 · BAD_FAILURE
claude-code claude-opus-4-7 · ✓ resolved · 45 · GOOD_SUCCESS
sfn-secrets-rotation-chain · Cloud Operations
claude-code claude-opus-4-7 · ✗ failed · 17 · GOOD_FAILURE
claude-code claude-opus-4-7 · ✗ failed · 23 · GOOD_FAILURE
claude-code claude-opus-4-7 · ✗ failed · 34 · GOOD_FAILURE
claude-code claude-opus-4-7 · ✗ failed · 32 · GOOD_FAILURE
claude-code claude-opus-4-7 · ✗ failed · 44 · GOOD_FAILURE
claude-code claude-opus-4-7 · ✗ failed · 56 · GOOD_FAILURE
claude-code claude-opus-4-7 · ✗ failed · 48 · GOOD_FAILURE
claude-code claude-opus-4-7 · ✗ failed · 28 · GOOD_FAILURE
claude-code claude-opus-4-7 · ✗ failed · 42 · HARNESS_ERROR
claude-code claude-opus-4-7 · ✓ resolved · 45 · BAD_SUCCESS
simjeb-bracket-fea-mass-prediction-real · Scientific ML
claude-code claude-opus-4-8 · ✗ failed · 314 · GOOD_FAILURE
claude-code claude-opus-4-8 · ✗ failed · 349 · GOOD_FAILURE
claude-code claude-opus-4-8 · ✓ resolved · 316 · GOOD_SUCCESS
claude-code claude-opus-4-8 · ✗ failed · 216 · GOOD_FAILURE
claude-code claude-opus-4-8 · ✗ failed · 324 · GOOD_FAILURE
claude-code claude-opus-4-8 · ✗ failed · 237 · GOOD_FAILURE
claude-code claude-opus-4-8 · ✗ failed · 280 · GOOD_FAILURE
claude-code claude-opus-4-8 · ✗ failed · 276 · GOOD_FAILURE
claude-code claude-opus-4-8 · ✓ resolved · 333 · GOOD_SUCCESS
claude-code claude-opus-4-8 · ✗ failed · 282 · GOOD_FAILURE
task_0026_codex_camera_shake_rig · Game
claude-code claude-sonnet-4-6 · ✓ resolved · 0 · GOOD_SUCCESS
claude-code claude-sonnet-4-6 · ✗ failed · 0 · GOOD_FAILURE
claude-code claude-sonnet-4-6 · ✗ failed · 5 · BAD_FAILURE
claude-code claude-sonnet-4-6 · ✗ failed · 5 · BAD_FAILURE
claude-code claude-sonnet-4-6 · ✗ failed · 5 · HARNESS_ERROR
claude-code claude-sonnet-4-6 · ✗ failed · 0 · HARNESS_ERROR
claude-code claude-sonnet-4-6 · ✓ resolved · 0 · BAD_SUCCESS
claude-code claude-sonnet-4-6 · ✗ failed · 0 · HARNESS_ERROR
claude-code claude-sonnet-4-6 · ✓ resolved · 0 · GOOD_SUCCESS
claude-code claude-sonnet-4-6 · ✓ resolved · 0 · HARNESS_ERROR
task_0040_day_night_cycle_controller · Game
claude-code claude-opus-4-8 · ✗ failed · 64 · GOOD_FAILURE
claude-code claude-opus-4-8 · ✗ failed · 73 · GOOD_FAILURE
claude-code claude-opus-4-8 · ✗ failed · 60 · GOOD_FAILURE
claude-code claude-opus-4-8 · ✗ failed · 58 · GOOD_FAILURE
claude-code claude-opus-4-8 · ✗ failed · 56 · GOOD_FAILURE
claude-code claude-opus-4-8 · ✗ failed · 64 · GOOD_FAILURE
claude-code claude-opus-4-8 · ✗ failed · 57 · GOOD_FAILURE
claude-code claude-opus-4-8 · ✓ resolved · 56 · GOOD_SUCCESS
claude-code claude-opus-4-8 · ✗ failed · 65 · GOOD_FAILURE
claude-code claude-opus-4-8 · ✗ failed · 66 · BAD_FAILURE
task_0053_car_scene_assembly · Game
claude-code claude-opus-4-8 · ✗ failed · 59 · GOOD_FAILURE
claude-code claude-opus-4-8 · ✓ resolved · 45 · GOOD_SUCCESS
claude-code claude-opus-4-8 · ✓ resolved · 52 · GOOD_SUCCESS
claude-code claude-opus-4-8 · ✓ resolved · 60 · BAD_SUCCESS
claude-code claude-opus-4-8 · ✗ failed · 44 · GOOD_FAILURE
claude-code claude-opus-4-8 · ✓ resolved · 72 · BAD_SUCCESS
claude-code claude-opus-4-8 · ✓ resolved · 60 · GOOD_SUCCESS
claude-code claude-opus-4-8 · ✗ failed · 43 · BAD_FAILURE
claude-code claude-opus-4-8 · ✗ failed · 23 · GOOD_FAILURE
claude-code claude-opus-4-8 · ✓ resolved · 70 · GOOD_SUCCESS
task_0054_assemble_crusader_animatedsprite · Game
claude-code claude-sonnet-4-6 · ✓ resolved · 22 · GOOD_SUCCESS
claude-code claude-sonnet-4-6 · ✓ resolved · 20 · GOOD_SUCCESS
claude-code claude-sonnet-4-6 · ✓ resolved · 21 · GOOD_SUCCESS
claude-code claude-sonnet-4-6 · ✗ failed · 25 · GOOD_FAILURE
claude-code claude-sonnet-4-6 · ✗ failed · 12 · GOOD_FAILURE
claude-code claude-sonnet-4-6 · ✗ failed · 25 · GOOD_FAILURE
claude-code claude-sonnet-4-6 · ✗ failed · 17 · GOOD_FAILURE
claude-code claude-sonnet-4-6 · ✗ failed · 21 · GOOD_FAILURE
claude-code claude-sonnet-4-6 · ✓ resolved · 27 · BAD_SUCCESS
claude-code claude-sonnet-4-6 · ✗ failed · 19 · BAD_FAILURE
task_0108_track_particles_driven_by_tile_type_gradient · Game
claude-code claude-sonnet-4-6 · ✓ resolved · 0 · GOOD_SUCCESS
claude-code claude-sonnet-4-6 · ✗ failed · 0 · HARNESS_ERROR
claude-code claude-sonnet-4-6 · ✗ failed · 9 · BAD_FAILURE
claude-code claude-sonnet-4-6 · ✗ failed · 0 · HARNESS_ERROR
claude-code claude-sonnet-4-6 · ✓ resolved · 0 · BAD_SUCCESS
claude-code claude-sonnet-4-6 · ✗ failed · 6 · GOOD_FAILURE
claude-code claude-sonnet-4-6 · ✗ failed · 0 · HARNESS_ERROR
claude-code claude-sonnet-4-6 · ✗ failed · 0 · HARNESS_ERROR
claude-code claude-sonnet-4-6 · ✗ failed · 0 · HARNESS_ERROR
claude-code claude-sonnet-4-6 · ✗ failed · 9 · BAD_FAILURE
task_0131_minimap_ui_complex · Game
claude-code claude-opus-4-8 · ✗ failed · 29 · GOOD_FAILURE
claude-code claude-opus-4-8 · ✗ failed · 26 · GOOD_FAILURE
claude-code claude-opus-4-8 · ✗ failed · 28 · BAD_FAILURE
claude-code claude-opus-4-8 · ✗ failed · 36 · GOOD_FAILURE
claude-code claude-opus-4-8 · ✗ failed · 25 · GOOD_FAILURE
claude-code claude-opus-4-8 · ✗ failed · 27 · GOOD_FAILURE
claude-code claude-opus-4-8 · ✗ failed · 26 · BAD_FAILURE
claude-code claude-opus-4-8 · ✗ failed · 29 · BAD_FAILURE
claude-code claude-opus-4-8 · ✗ failed · 28 · GOOD_FAILURE
claude-code claude-opus-4-8 · ✗ failed · 34 · BAD_FAILURE
task_0132_minimap_marker_logic_complex · Game
claude-code claude-opus-4-8 · ✓ resolved · 65 · GOOD_SUCCESS
claude-code claude-opus-4-8 · ✗ failed · 42 · BAD_FAILURE
claude-code claude-opus-4-8 · ✓ resolved · 56 · GOOD_SUCCESS
claude-code claude-opus-4-8 · ✗ failed · 45 · GOOD_FAILURE
claude-code claude-opus-4-8 · ✗ failed · 50 · GOOD_FAILURE
claude-code claude-opus-4-8 · ✗ failed · 53 · GOOD_FAILURE
claude-code claude-opus-4-8 · ✓ resolved · 59 · GOOD_SUCCESS
claude-code claude-opus-4-8 · ✗ failed · 45 · GOOD_FAILURE
claude-code claude-opus-4-8 · ✓ resolved · 57 · GOOD_SUCCESS
claude-code claude-opus-4-8 · ✗ failed · 44 · GOOD_FAILURE
task_9001_checkpoint_system · Game
claude-code claude-opus-4-8 · ✗ failed · 14 · GOOD_FAILURE
claude-code claude-opus-4-8 · ✓ resolved · 41 · GOOD_SUCCESS
claude-code claude-opus-4-8 · ✓ resolved · 42 · GOOD_SUCCESS
claude-code claude-opus-4-8 · ✗ failed · 17 · HARNESS_ERROR
claude-code claude-opus-4-8 · ✗ failed · 15 · GOOD_FAILURE
claude-code claude-opus-4-8 · ✓ resolved · 38 · GOOD_SUCCESS
claude-code claude-opus-4-8 · ✗ failed · 17 · GOOD_FAILURE
claude-code claude-opus-4-8 · ✓ resolved · 46 · GOOD_SUCCESS
claude-code claude-opus-4-8 · ✗ failed · 17 · GOOD_FAILURE
claude-code claude-opus-4-8 · ✗ failed · 27 · GOOD_FAILURE
task_9002_combo_score_system · Game
claude-code claude-opus-4-8 · ✓ resolved · 37 · GOOD_SUCCESS
claude-code claude-opus-4-8 · ✗ failed · 16 · GOOD_FAILURE
claude-code claude-opus-4-8 · ✓ resolved · 25 · GOOD_SUCCESS
claude-code claude-opus-4-8 · ✓ resolved · 40 · GOOD_SUCCESS
claude-code claude-opus-4-8 · ✓ resolved · 32 · GOOD_SUCCESS
claude-code claude-opus-4-8 · ✗ failed · 18 · GOOD_FAILURE
claude-code claude-opus-4-8 · ✗ failed · 16 · BAD_FAILURE
claude-code claude-opus-4-8 · ✓ resolved · 45 · GOOD_SUCCESS
claude-code claude-opus-4-8 · ✗ failed · 16 · GOOD_FAILURE
claude-code claude-opus-4-8 · ✓ resolved · 45 · GOOD_SUCCESS
truss2d-solver · Mechanical Engineering
claude-code claude-opus-4-8 · ✓ resolved · 19 · GOOD_SUCCESS
claude-code claude-opus-4-8 · ✗ failed · 21 · GOOD_FAILURE
claude-code claude-opus-4-8 · ✓ resolved · 26 · GOOD_SUCCESS
claude-code claude-opus-4-8 · ✗ failed · 41 · GOOD_FAILURE
claude-code claude-opus-4-8 · ✓ resolved · 16 · GOOD_SUCCESS
claude-code claude-opus-4-8 · ✓ resolved · 20 · GOOD_SUCCESS
claude-code claude-opus-4-8 · ✓ resolved · 19 · GOOD_SUCCESS
claude-code claude-opus-4-8 · ✓ resolved · 20 · GOOD_SUCCESS
claude-code claude-opus-4-8 · ✗ failed · 23 · GOOD_FAILURE
claude-code claude-opus-4-8 · ✗ failed · 24 · GOOD_FAILURE
window-aggregate-store · Software Engineering
claude-code claude-opus-4-8 · ✓ resolved · 11 · GOOD_SUCCESS
claude-code claude-opus-4-8 · ✓ resolved · 9 · GOOD_SUCCESS
claude-code claude-opus-4-8 · ✓ resolved · 11 · GOOD_SUCCESS
claude-code claude-opus-4-8 · ✓ resolved · 8 · GOOD_SUCCESS
claude-code claude-opus-4-8 · ✓ resolved · 9 · GOOD_SUCCESS
claude-code claude-opus-4-8 · ✓ resolved · 16 · GOOD_SUCCESS
claude-code claude-opus-4-8 · ✓ resolved · 14 · GOOD_SUCCESS
claude-code claude-opus-4-8 · ✓ resolved · 8 · GOOD_SUCCESS
claude-code claude-opus-4-8 · ✓ resolved · 10 · GOOD_SUCCESS
claude-code claude-opus-4-8 · ✓ resolved · 16 · GOOD_SUCCESS