| Agent | Model | Reward | Tools | Classification |
|---|---|---|---|---|
| claude-code | claude-opus-4-7 | ✓ resolved | 103 | GOOD_SUCCESS |
| claude-code | claude-opus-4-7 | ✗ failed | 219 | BAD_FAILURE |
| claude-code | claude-opus-4-7 | ✗ failed | 131 | BAD_FAILURE |
| claude-code | claude-opus-4-7 | ✗ failed | 229 | BAD_FAILURE |
| claude-code | claude-opus-4-7 | ✗ failed | 182 | BAD_FAILURE |
| claude-code | claude-opus-4-7 | ✗ failed | 180 | BAD_FAILURE |
| claude-code | claude-opus-4-7 | ✓ resolved | 220 | GOOD_SUCCESS |
| claude-code | claude-opus-4-7 | ✗ failed | 179 | BAD_FAILURE |
| claude-code | claude-opus-4-7 | ✗ failed | 208 | GOOD_FAILURE |
| claude-code | claude-opus-4-7 | ✓ resolved | 179 | GOOD_SUCCESS |

The task as the agent saw it and the verifier graded it. Files under `solution/` are sealed.