SyncVals · verifier → artifact → classifier → verdict

apigw-http-api-jwt-authorizer-lambda-integration

Cyber Security · 3/10 resolved
pass@1: 30.0%
pass@k: 100.0% @10
resolved: 3/10
dominant: BAD_FAILURE
During each run the agent saw only the starter workspace; tests/ and solution/ were withheld and restored only for grading. The solution/ answer key below is sealed to keep this task usable as a benchmark.
Runs (10): every recorded attempt at this task
Agent | Model | Reward | Tools | Classification
claude-code | claude-opus-4-7 | ✓ resolved | 103 | GOOD_SUCCESS
claude-code | claude-opus-4-7 | ✗ failed | 219 | BAD_FAILURE
claude-code | claude-opus-4-7 | ✗ failed | 131 | BAD_FAILURE
claude-code | claude-opus-4-7 | ✗ failed | 229 | BAD_FAILURE
claude-code | claude-opus-4-7 | ✗ failed | 182 | BAD_FAILURE
claude-code | claude-opus-4-7 | ✗ failed | 180 | BAD_FAILURE
claude-code | claude-opus-4-7 | ✓ resolved | 220 | GOOD_SUCCESS
claude-code | claude-opus-4-7 | ✗ failed | 179 | BAD_FAILURE
claude-code | claude-opus-4-7 | ✗ failed | 208 | GOOD_FAILURE
claude-code | claude-opus-4-7 | ✓ resolved | 179 | GOOD_SUCCESS
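
The summary figures for this task can be reproduced from the ten recorded runs. The sketch below is an assumption about how they are derived, not the dashboard's actual code: it uses the commonly cited unbiased pass@k estimator, 1 - C(n-c, k) / C(n, k), and takes "dominant" to mean the most frequent classification. The names `runs` and `pass_at_k` are hypothetical.

```python
from collections import Counter
from math import comb

# (resolved?, classification) for the ten recorded runs, in listed order.
runs = [
    (True, "GOOD_SUCCESS"),
    (False, "BAD_FAILURE"),
    (False, "BAD_FAILURE"),
    (False, "BAD_FAILURE"),
    (False, "BAD_FAILURE"),
    (False, "BAD_FAILURE"),
    (True, "GOOD_SUCCESS"),
    (False, "BAD_FAILURE"),
    (False, "GOOD_FAILURE"),
    (True, "GOOD_SUCCESS"),
]


def pass_at_k(n: int, c: int, k: int) -> float:
    """Assumed estimator: chance that at least one of k samples drawn
    without replacement from n runs (c of them resolved) is resolved."""
    return 1.0 - comb(n - c, k) / comb(n, k)


n = len(runs)                                   # total runs
c = sum(1 for resolved, _ in runs if resolved)  # resolved runs
# Assumption: "dominant" is the most frequent classification label.
dominant = Counter(label for _, label in runs).most_common(1)[0][0]

print(f"resolved {c}/{n}")
print(f"pass@1 {pass_at_k(n, c, 1):.1%}")
print(f"pass@{n} {pass_at_k(n, c, n):.1%}")
print(f"dominant {dominant}")
```

Under these assumptions the script matches the reported 3/10 resolved, 30.0% pass@1, 100.0% pass@k at k = 10, and BAD_FAILURE as the dominant classification.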
Task files: browse the workspace, grader & sealed answer key

The task as the agent saw it and the verifier graded it. Files under solution/ are sealed.