| Agent | Model | Reward | Tools | Classification |
|---|---|---|---|---|
| claude-code | claude-opus-4-8 | ✗ failed | 11 | GOOD_FAILURE |
| claude-code | claude-opus-4-8 | ✗ failed | 13 | BAD_FAILURE |
| claude-code | claude-opus-4-8 | ✗ failed | 14 | GOOD_FAILURE |
| claude-code | claude-opus-4-8 | ✓ resolved | 13 | GOOD_SUCCESS |
| claude-code | claude-opus-4-8 | ✓ resolved | 17 | GOOD_SUCCESS |
| claude-code | claude-opus-4-8 | ✗ failed | 11 | GOOD_FAILURE |
| claude-code | claude-opus-4-8 | ✓ resolved | 16 | GOOD_SUCCESS |
| claude-code | claude-opus-4-8 | ✗ failed | 14 | BAD_FAILURE |
| claude-code | claude-opus-4-8 | ✓ resolved | 14 | GOOD_SUCCESS |
| claude-code | claude-opus-4-8 | ✓ resolved | 14 | GOOD_SUCCESS |
| claude-code | claude-opus-4-8 | ✓ resolved | 15 | GOOD_SUCCESS |
| claude-code | claude-opus-4-8 | ✗ failed | 16 | BAD_FAILURE |
| claude-code | claude-opus-4-8 | ✗ failed | 12 | GOOD_FAILURE |
| claude-code | claude-opus-4-8 | ✗ failed | 14 | GOOD_FAILURE |
| claude-code | claude-opus-4-8 | ✗ failed | 12 | BAD_FAILURE |
| claude-code | claude-opus-4-8 | ✓ resolved | 15 | GOOD_SUCCESS |
| claude-code | claude-opus-4-8 | ✓ resolved | 12 | GOOD_SUCCESS |
| claude-code | claude-opus-4-8 | ✓ resolved | 16 | GOOD_SUCCESS |
| claude-code | claude-opus-4-8 | ✓ resolved | 10 | GOOD_SUCCESS |
| claude-code | claude-opus-4-8 | ✗ failed | 12 | GOOD_FAILURE |
| claude-code | claude-opus-4-8 | ✓ resolved | 15 | GOOD_SUCCESS |
| claude-code | claude-opus-4-8 | ✓ resolved | 16 | GOOD_SUCCESS |
| claude-code | claude-opus-4-8 | ✗ failed | 12 | BAD_FAILURE |
| claude-code | claude-opus-4-8 | ✓ resolved | 10 | BAD_SUCCESS |
| claude-code | claude-opus-4-8 | ✓ resolved | 15 | GOOD_SUCCESS |
| claude-code | claude-opus-4-8 | ✓ resolved | 16 | GOOD_SUCCESS |
| claude-code | claude-opus-4-8 | ✗ failed | 12 | GOOD_FAILURE |
| claude-code | claude-opus-4-8 | ✓ resolved | 12 | GOOD_SUCCESS |
| claude-code | claude-opus-4-8 | ✓ resolved | 11 | GOOD_SUCCESS |
| claude-code | claude-opus-4-8 | ✓ resolved | 12 | GOOD_SUCCESS |
| claude-code | claude-opus-4-8 | ✓ resolved | 15 | GOOD_SUCCESS |
| claude-code | claude-opus-4-8 | ✓ resolved | 14 | GOOD_SUCCESS |
| claude-code | claude-opus-4-8 | ✓ resolved | 16 | GOOD_SUCCESS |
| claude-code | claude-opus-4-8 | ✓ resolved | 17 | GOOD_SUCCESS |
| claude-code | claude-opus-4-8 | ✓ resolved | 16 | GOOD_SUCCESS |
| claude-code | claude-opus-4-8 | ✓ resolved | 16 | GOOD_SUCCESS |
| claude-code | claude-opus-4-8 | ✓ resolved | 14 | GOOD_SUCCESS |
| claude-code | claude-opus-4-8 | ✓ resolved | 10 | GOOD_SUCCESS |
| claude-code | claude-opus-4-8 | ✓ resolved | 16 | GOOD_SUCCESS |
| claude-code | claude-opus-4-8 | ✓ resolved | 12 | GOOD_SUCCESS |

The task as the agent saw it and the verifier graded it. Files under solution/ are sealed.