Bracket Calculator Validation Report
We validate the ScrollVault Commander Bracket Calculator against an authoritative reference set of 27 decks: WotC-published Commander precons (Brackets 1–3) and community-canon cEDH archetypes (Bracket 5). Every reference deck has a clickable source URL in the table below — you can audit every entry yourself.
Headline accuracy
- Bracket-in-range: 100% (27/27)
- Bracket-±1: 100% (27/27)
- Bracket-exact: 88.9% (24/27)
- Power-in-range: 100% (27/27)
Methodology
Reference deck sourcing
Every reference deck has a public source URL. We derive bracket assignments only from authoritative sources:
- WotC official precons: decklists come from MTGJSON's canonical deck data, with secondary URLs to WotC's announcements. Bracket assignment follows WotC's Commander Brackets Beta framework: stock precons without Game Changers fall in B1–B2 (boundary fuzzy by design); stock precons with Game Changers are forced to a B3 floor.
- cEDH archetype canonicals: sourced from the cEDH Decklist Database, which curates competitive-tier decks via community submission and curator review. By definition, any cEDH archetype is Bracket 5 per WotC's framework. The decklists are representative archetype lists, not specific tournament copies.
Audit methodology
We cross-checked every precon's mainboard against the bracket calculator's 53-card Game Changers list (stored in /tools/commander-bracket/bracket.js's GAME_CHANGERS constant, mirroring the WotC Feb 2026 update). One precon, AbzanArmor (Tarkir Dragonstorm Commander), contains Seedborn Muse, which is on the GC list. Per WotC's framework, any Game Changer forces a Bracket 3 floor — so AbzanArmor's expected_bracket = 3, not B1. This is documented in the reference data and matches the calculator's verdict.
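The cross-check described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the contents of bracket.js: `GAME_CHANGERS_SAMPLE` is a three-card stand-in for the real 53-card list, and `findGameChangers` and `expectedBracketRange` are hypothetical helper names.

```javascript
// Illustrative subset only; the real GAME_CHANGERS constant has 53 entries.
const GAME_CHANGERS_SAMPLE = new Set(['Seedborn Muse', 'Rhystic Study', 'Cyclonic Rift']);

// Return the mainboard card names that appear on the Game Changers list.
function findGameChangers(mainboard, gameChangers = GAME_CHANGERS_SAMPLE) {
  return mainboard.filter((name) => gameChangers.has(name));
}

// For a stock precon: any Game Changer forces a Bracket 3 floor,
// otherwise the deck keeps the fuzzy B1-B2 range.
function expectedBracketRange(mainboard, gameChangers = GAME_CHANGERS_SAMPLE) {
  const hits = findGameChangers(mainboard, gameChangers);
  return hits.length > 0 ? { min: 3, max: 3, hits } : { min: 1, max: 2, hits };
}

// Toy mainboard (not a real decklist) containing one Game Changer.
const toyMainboard = ['Sol Ring', 'Seedborn Muse', 'Command Tower'];
console.log(expectedBracketRange(toyMainboard));
```

A set lookup keeps the check linear in deck size, and returning the matching cards alongside the range makes each B3 assignment auditable.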
cEDH provenance chain
Every cEDH reference deck is sourced via a two-link chain: cEDH Decklist Database (community-curated tier list of cEDH archetypes) → linked Moxfield primer (community-vetted decklist for that archetype). We fetched the canonical Moxfield decklist via api2.moxfield.com/v3/decks/all/<id> on 2026-05-06 and confirmed each list is exactly 100 cards. Each row's source ↗ link goes to the human-readable Moxfield primer page; you can verify the decklist is identical to ours.
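The 100-card confirmation can be sketched as a simple quantity sum. The deck shape used here is a simplified, hypothetical one (`commanders` and `mainboard` arrays of `{ name, quantity }`), not the actual Moxfield API response, and `countCards` and `isHundredCards` are illustrative names.

```javascript
// Hypothetical, simplified deck shape:
// { commanders: [{ name, quantity }], mainboard: [{ name, quantity }] }
function countCards(deck) {
  const sum = (entries) => entries.reduce((total, entry) => total + entry.quantity, 0);
  return sum(deck.commanders) + sum(deck.mainboard);
}

// A Commander deck is exactly 100 cards, commander(s) included.
function isHundredCards(deck) {
  return countCards(deck) === 100;
}

// Toy deck for illustration: 1 commander + 99 other cards.
const toyDeck = {
  commanders: [{ name: 'Example Commander', quantity: 1 }],
  mainboard: [
    { name: 'Island', quantity: 50 },
    { name: 'Forest', quantity: 49 },
  ],
};
console.log(countCards(toyDeck), isHundredCards(toyDeck));
```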
Pass criteria
For each deck, we record three bracket-accuracy criteria and one power criterion:
- Bracket-in-range: predicted bracket ∈ [expected_bracket_min, expected_bracket_max]. Primary metric. WotC's B1/B2 boundary is intentionally fuzzy, so stock precons get a [1,2] range.
- Bracket-±1: predicted within 1 of the expected_bracket midpoint. Secondary metric, reported for comparability with industry tools (ScryCheck reports 80% bracket-exact, 92% bracket-±1).
- Bracket-exact: predicted === expected_bracket midpoint. Strictest. Affected by the inherent fuzziness of WotC's framework on stock precons.
- Power-in-range: predicted power level ∈ [expected_power_min, expected_power_max]. Independent check on the engine's continuous output.
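The four criteria above reduce to a handful of comparisons per deck. The sketch below assumes a reference record carrying the field names used in the criteria and a prediction of the form `{ bracket, power }`; the record shape and the `evaluateDeck` name are assumptions, not the validation script's actual code.

```javascript
// ref: reference record using the field names from the criteria above.
// predicted: { bracket, power } as produced by the calculator (assumed shape).
function evaluateDeck(ref, predicted) {
  return {
    inRange:
      predicted.bracket >= ref.expected_bracket_min &&
      predicted.bracket <= ref.expected_bracket_max,
    within1: Math.abs(predicted.bracket - ref.expected_bracket) <= 1,
    exact: predicted.bracket === ref.expected_bracket,
    powerInRange:
      predicted.power >= ref.expected_power_min &&
      predicted.power <= ref.expected_power_max,
  };
}

// A stock precon with range [1,2], midpoint B1, and power range 1-5.5,
// predicted as B2 at power 3.7.
const verdict = evaluateDeck(
  {
    expected_bracket_min: 1,
    expected_bracket_max: 2,
    expected_bracket: 1,
    expected_power_min: 1,
    expected_power_max: 5.5,
  },
  { bracket: 2, power: 3.7 }
);
console.log(verdict);
// { inRange: true, within1: true, exact: false, powerInRange: true }
```

The example shows why the strictest metric diverges from the primary one: a B2 prediction for a B1-midpoint precon passes in-range and ±1 but fails exact.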
Per-bracket accuracy
| Expected | N | In-range | Within-1 | Exact | Power-in-range |
|---|---|---|---|---|---|
| B1 | 5 | 5/5 (100%) | 5/5 (100%) | 2/5 (40%) | 5/5 (100%) |
| B2 | 6 | 6/6 (100%) | 6/6 (100%) | 6/6 (100%) | 6/6 (100%) |
| B3 | 6 | 6/6 (100%) | 6/6 (100%) | 6/6 (100%) | 6/6 (100%) |
| B5 | 10 | 10/10 (100%) | 10/10 (100%) | 10/10 (100%) | 10/10 (100%) |
Confusion matrix
Rows = expected bracket; columns = predicted bracket. Diagonal = exact match.
| | Pred B1 | Pred B2 | Pred B3 | Pred B4 | Pred B5 |
|---|---|---|---|---|---|
| Exp B1 | 2 | 3 | 0 | 0 | 0 |
| Exp B2 | 0 | 6 | 0 | 0 | 0 |
| Exp B3 | 0 | 0 | 6 | 0 | 0 |
| Exp B4 | 0 | 0 | 0 | 0 | 0 |
| Exp B5 | 0 | 0 | 0 | 0 | 10 |
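For readers who want to recompute the matrix, here is a minimal sketch that tallies (expected, predicted) pairs and derives bracket-exact accuracy from the diagonal. The function names and the `{ expected, predicted }` record shape are assumptions; the counts fed in are the ones from the table above.

```javascript
const BRACKETS = [1, 2, 3, 4, 5];

// results: array of { expected, predicted } with bracket numbers 1-5.
// Rows are expected brackets, columns are predicted brackets.
function confusionMatrix(results) {
  const matrix = BRACKETS.map(() => BRACKETS.map(() => 0));
  for (const { expected, predicted } of results) {
    matrix[expected - 1][predicted - 1] += 1;
  }
  return matrix;
}

// Exact-match accuracy is the diagonal sum divided by the total count.
function exactAccuracy(matrix) {
  const total = matrix.flat().reduce((a, b) => a + b, 0);
  const diagonal = matrix.reduce((sum, row, i) => sum + row[i], 0);
  return diagonal / total;
}

// Rebuild the 27 results from the counts reported in the table.
const repeat = (n, expected, predicted) =>
  Array.from({ length: n }, () => ({ expected, predicted }));
const results = [
  ...repeat(2, 1, 1),
  ...repeat(3, 1, 2),
  ...repeat(6, 2, 2),
  ...repeat(6, 3, 3),
  ...repeat(10, 5, 5),
];

const matrix = confusionMatrix(results);
console.log(matrix[0]); // [ 2, 3, 0, 0, 0 ]
console.log((exactAccuracy(matrix) * 100).toFixed(1)); // 88.9
```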
Engine vs frontier LLMs
Independent cross-validation: each model was given the decklist plus WotC's bracket framework and the 53-card Game Changers list, and asked to assign a bracket and power score. The same 27 reference decks were used for every column. Methodology and per-deck verdicts are in llm-validation-results.json.
| Metric | ScrollVault engine | claude-sonnet-4-6 | claude-opus-4-7 | claude-haiku-4-5-20251001 |
|---|---|---|---|---|
| Bracket-in-range | 100% (27/27) | 100% (27/27) | 100% (27/27) | 100% (27/27) |
| Bracket-±1 | 100% (27/27) | 100% (27/27) | 100% (27/27) | 100% (27/27) |
| Bracket-exact | 88.9% (24/27) | 81.5% (22/27) | 81.5% (22/27) | 81.5% (22/27) |
| Power-in-range | 100% (27/27) | 88.9% (24/27) | 77.8% (21/27) | 63.0% (17/27) |
Per-deck results — 27 decks
Every row links to the deck's source URL. Click "source ↗" to verify decklist + bracket assignment yourself.
| Deck ID | Name | Expected | Predicted | Verdict | Power | Power range | Tipping | Source |
|---|---|---|---|---|---|---|---|---|
| wotc-precon-silverquillstatement-c21 | Silverquill Statement | B1–B2 | B1 | ✓ in range | 1.1 | 1–5.5 | T4 | source ↗ |
| wotc-precon-prismariperformance-c21 | Prismari Performance | B1–B2 | B2 | ✓ in range | 3.7 | 1–5.5 | T5 | source ↗ |
| wotc-precon-quantumquandrix-c21 | Quantum Quandrix | B1–B2 | B2 | ✓ in range | 3.8 | 1–5.5 | T4 | source ↗ |
| wotc-precon-witherbloomwitchcraft-c21 | Witherbloom Witchcraft | B1–B2 | B1 | ✓ in range | 1.1 | 1–5.5 | T4 | source ↗ |
| wotc-precon-loreholdlegacies-c21 | Lorehold Legacies | B1–B2 | B2 | ✓ in range | 4.0 | 1–5.5 | T4 | source ↗ |
| wotc-precon-abzanarmor-tdc | Abzan Armor | B3 | B3 | ✓ in range | 6.3 | 5–7.5 | T3 | source ↗ |
| wotc-precon-jeskaistriker-tdc | Jeskai Striker | B1–B2 | B2 | ✓ in range | 4.1 | 1–5.5 | T3 | source ↗ |
| wotc-precon-mardusurge-tdc | Mardu Surge | B1–B2 | B2 | ✓ in range | 3.9 | 1–5.5 | T3 | source ↗ |
| wotc-precon-sultaiarisen-tdc | Sultai Arisen | B1–B2 | B2 | ✓ in range | 3.7 | 1–5.5 | T4 | source ↗ |
| wotc-precon-temurroar-tdc | Temur Roar | B1–B2 | B2 | ✓ in range | 3.7 | 1–5.5 | T4 | source ↗ |
| wotc-precon-eternalmight-drc | Eternal Might | B1–B2 | B2 | ✓ in range | 4.0 | 1–5.5 | T3 | source ↗ |
| wotc-precon-livingenergy-drc | Living Energy | B1–B2 | B2 | ✓ in range | 4.1 | 1–5.5 | T4 | source ↗ |
| wotc-precon-counterblitzfinalfantasyx-fic | Counter Blitz (FINAL FANTASY X) | B3 | B3 | ✓ in range | 6.7 | 5–7.5 | T3 | source ↗ |
| wotc-precon-20waystowin-sld | 20 Ways to Win | B3 | B3 | ✓ in range | 6.8 | 5–7.5 | T3 | source ↗ |
| wotc-precon-creativeenergy-m3c | Creative Energy | B3 | B3 | ✓ in range | 6.5 | 5–7.5 | T4 | source ↗ |
| wotc-precon-deadlydisguise-mkc | Deadly Disguise | B3 | B3 | ✓ in range | 6.5 | 5–7.5 | T4 | source ↗ |
| wotc-precon-deepcluesea-mkc | Deep Clue Sea | B3 | B3 | ✓ in range | 6.2 | 5–7.5 | T4 | source ↗ |
| cedh-kinnan-infinite-mana | Kinnan Infinite Mana | B5 | B5 | ✓ in range | 10.0 | 9–10 | T2 | source ↗ |
| cedh-thrasios-tymna-blue-farm | Blue Farm (Thrasios+Tymna) | B5 | B5 | ✓ in range | 10.0 | 9–10 | T2 | source ↗ |
| cedh-najeela-blade-blossom | Najeela Combat Combo | B5 | B5 | ✓ in range | 10.0 | 9–10 | T2 | source ↗ |
| cedh-tivit-stax | Tivit Stax | B5 | B5 | ✓ in range | 10.0 | 9–10 | T2 | source ↗ |
| cedh-rograkh-silas-turbo-naus | Rograkh+Silas Turbo Ad Nauseam | B5 | B5 | ✓ in range | 10.0 | 9–10 | T2 | source ↗ |
| cedh-kraum-tymna-breach | Kraum+Tymna Breach (Blue Farm) | B5 | B5 | ✓ in range | 10.0 | 9–10 | T2 | source ↗ |
| cedh-halana-tymna-hulk | Halana+Tymna Flash Hulk | B5 | B5 | ✓ in range | 10.0 | 9–10 | T2 | source ↗ |
| cedh-tana-tymna-turbo-naus | Tana+Tymna Turbo Ad Nauseam | B5 | B5 | ✓ in range | 10.0 | 9–10 | T2 | source ↗ |
| cedh-yuriko-tempo | Yuriko Tempo | B5 | B5 | ✓ in range | 10.0 | 9–10 | T3 | source ↗ |
| cedh-malcolm-tymna-esper-turbo | Malcolm+Tymna Esper Turbo | B5 | B5 | ✓ in range | 10.0 | 9–10 | T2 | source ↗ |
Limits and honest framing
- Reference set is small (27 decks). This is v1 — initial seed. We're expanding incrementally. Larger sets reduce variance but don't change the fundamental accuracy story for B1 (precons) and B5 (cEDH), which are well-defined by WotC.
- B4 ("Optimized") coverage is deliberately zero. No public source — not WotC, not the cEDH Decklist Database, not any tournament site — publishes a canonical set of "Bracket 4" decks. WotC's announcement explicitly declines to provide example B4 decklists; community labels at B4 are interpretive. Rather than synthesize B4 references and weaken our "every deck has authoritative provenance" claim, we leave the gap and document the standard we'll accept: a B4 reference must (a) link to a publicly hosted decklist (Moxfield, Archidekt, MTGGoldfish), (b) carry independent corroboration of B4 status from at least two non-affiliated sources (e.g., a tournament finish + a published primer + a community tier-list), and (c) not match B5 criteria (cEDH-tier two-card combos, fast-mana density, tutor count). Until those exist for a given deck, we don't include it.
- B3 coverage is currently six stock precons (Abzan Armor among them). Stock precons with one or more Game Changers are the cleanest authoritative path to a B3 reference. Recent precons that include GCs would expand B3 coverage; older precons predate the GC framework and aren't classified.
- cEDH decklists are canonical Moxfield primers from cEDH-DDB tier-list panels. Bracket assignment (B5) is unambiguous per WotC's framework. The exact card-by-card list will vary across tournament copies; the primer is the community's reference build at its last_verified date.
- The B1/B2 boundary is fuzzy by design. WotC's framework defines B1 ("Exhibition") and B2 ("Core") with inherent overlap: modern precons (2024+) are often stronger than pre-2023 precons of the same class. Our calculator tends to report stock precons as B2 even when WotC's framework would call them B1. The bracket-in-range metric accommodates this by accepting [1,2] for stock precons without Game Changers.
- Bracket-exact accuracy is held back by precon B1/B2 ambiguity, not engine error. When we compare the predicted bracket exactly to the midpoint, the metric drops to 88.9% (24/27): the three misses are B1-midpoint precons predicted as B2, and every prediction is still in the legitimate range.
Reproduce these results yourself
This validation is reproducible end-to-end. From a clone of the repo:
- node scripts/build-reference-decks.cjs: fetches MTGJSON precon data + cEDH archetype lists into data/reference-decks.json with full provenance metadata.
- node scripts/run-validation.cjs: runs each deck through the live bracket calculator via Puppeteer (defaults to staging; pass --prod for production).
- node scripts/render-validation-page.cjs: regenerates this page from the latest validation results.
Expected runtime: roughly 4 minutes for 27 decks (~9,490 ms per deck on this run, about 256 s in total).
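If you prefer a single entry point, the three commands can be chained from a small Node wrapper. This is a hypothetical convenience sketch, not a script that ships with the repo: the script paths and the --prod flag come from the steps above, while `validationSteps` and `runAll` are illustrative names.

```javascript
const { execFileSync } = require('child_process');

// Build the argument lists for the three steps, in order.
// `prod` adds the --prod flag to the validation step only.
function validationSteps({ prod = false } = {}) {
  return [
    ['scripts/build-reference-decks.cjs'],
    prod ? ['scripts/run-validation.cjs', '--prod'] : ['scripts/run-validation.cjs'],
    ['scripts/render-validation-page.cjs'],
  ];
}

// Run each step with the current Node binary. execFileSync throws on a
// non-zero exit code, so a failing step stops the chain.
function runAll(options) {
  for (const args of validationSteps(options)) {
    execFileSync(process.execPath, args, { stdio: 'inherit' });
  }
}

// Usage, from the repo root:
//   runAll();               // staging (default)
//   runAll({ prod: true }); // production
```

Running the steps synchronously keeps the order explicit: the render step should only see results from a validation run that completed.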
What's next
- Expand the reference set toward 250+ decks. Priority: more cEDH archetypes (B5), more recent precons (B1–B3), authoritatively-tagged B3–B4 decks (community + tournament).
- Add automated CI: re-run validation on every bracket.js change. Keep accuracy honest as the engine evolves.
Browse the full precon library
Beyond this 27-deck reference set, we've run every recent Commander precon through the same engine. Browse all 61 analyzed precons → and filter by bracket, set, or color identity. Each precon links to a full per-deck analysis with the same passport (bracket, power, Tipping Point) the calculator produces.
The methodology behind the metric
For the long-form story on how the engine produces the Tipping Point chip you see on every analysis — including the WASM Monte Carlo internals, comparison to Frank Karsten's land-count formula, and why no competing bracket calculator can replicate it — read "We Simulated 5 Million Mana Bases. Here's What We Learned About Tipping Points." →