| Agent | Model | Reward | Tools | Classification |
|---|---|---|---|---|
| claude-code | claude-opus-4-8 | ✗ failed | 34 | GOOD_FAILURE |
| claude-code | claude-opus-4-8 | ✗ failed | 47 | GOOD_FAILURE |
| claude-code | claude-opus-4-8 | ✗ failed | 25 | GOOD_FAILURE |
| claude-code | claude-opus-4-8 | ✗ failed | 19 | GOOD_FAILURE |
| claude-code | claude-opus-4-8 | ✗ failed | 20 | GOOD_FAILURE |
| claude-code | claude-opus-4-8 | ✗ failed | 26 | GOOD_FAILURE |
| claude-code | claude-opus-4-8 | ✗ failed | 49 | GOOD_FAILURE |
| claude-code | claude-opus-4-8 | ✗ failed | 21 | GOOD_FAILURE |
| claude-code | claude-opus-4-8 | ✗ failed | 31 | GOOD_FAILURE |
| claude-code | claude-opus-4-8 | ✗ failed | 45 | GOOD_FAILURE |

The task as the agent saw it and the verifier graded it. Files under solution/ are sealed.