Bracket Calculator Validation Report

We validate the ScrollVault Commander Bracket Calculator against an authoritative reference set of 27 decks: WotC-published Commander precons (Brackets 1–3) and community-canon cEDH archetypes (Bracket 5). Every reference deck has a clickable source URL in the table below — you can audit every entry yourself.

Last run against https://staging.scrollvault.net · 27 decks tested · total run time 285.5s · avg 9490ms per analysis

Headline accuracy

  • Bracket-in-range: 100% (27/27 decks fall within their expected bracket range)
  • Bracket-±1: 100% (27/27 decks within 1 bracket of the expected midpoint)
  • Bracket-exact: 88.9% (24/27 match the expected midpoint exactly; the B1/B2 boundary is fuzzy by design in WotC's framework)
  • Power-in-range: 100% (27/27 predicted power scores fall in the deck's expected range)

Methodology

Reference deck sourcing

Every reference deck has a public source URL. We derive bracket assignments only from authoritative sources:

  • WotC official precons: decklists come from MTGJSON's canonical deck data, with secondary links to WotC's product announcements. Bracket assignment follows WotC's Commander Brackets Beta framework. Stock precons without Game Changers fall in B1–B2 (the boundary is fuzzy by design), and stock precons with Game Changers are forced to a B3 floor.
  • cEDH archetype canonicals — sourced from the cEDH Decklist Database, which curates competitive-tier decks via community submission + curator review. By definition, any cEDH archetype is Bracket 5 per WotC's framework. The decklists are representative archetype lists, not specific tournament copies.

Audit methodology

We cross-checked every precon's mainboard against the bracket calculator's 53-card Game Changers list (stored in the GAME_CHANGERS constant in /tools/commander-bracket/bracket.js, mirroring the WotC Feb 2026 update). Six of the 17 precons contain at least one Game Changer. For example, AbzanArmor (Tarkir Dragonstorm Commander) contains Seedborn Muse, which is on the GC list. Per WotC's framework, any Game Changer forces a Bracket 3 floor, so these precons carry expected_bracket = 3 rather than the B1–B2 range of other stock precons. This is documented in the reference data and matches the calculator's verdict for all six.
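The cross-check and floor rule can be sketched as a set intersection plus one conditional. This is a minimal illustration, not code from bracket.js. `findGameChangers` and `expectedBracketRange` are hypothetical names, and mainboards are assumed to be arrays of card-name strings. The three-card list stands in for the real 53-card GAME_CHANGERS constant. Treating a GC precon's expected range as exactly [3, 3] is also an assumption, based on how the per-deck table labels those decks.

```javascript
// Illustrative sample only; the real GAME_CHANGERS constant in bracket.js has 53 entries.
const SAMPLE_GAME_CHANGERS = ['Seedborn Muse', 'Rhystic Study', 'Cyclonic Rift'];

// Return the Game Changers present in a mainboard (case-insensitive name match).
function findGameChangers(mainboard, gameChangers) {
  const gcSet = new Set(gameChangers.map((name) => name.toLowerCase()));
  const hits = mainboard.filter((card) => gcSet.has(card.toLowerCase()));
  return [...new Set(hits)]; // de-duplicate in case a name is listed twice
}

// Expected bracket range for a stock precon under the floor rule:
// no Game Changers -> [1, 2] (fuzzy B1/B2 boundary); any Game Changer -> B3.
// Assumption: GC precons are expected at exactly B3, as in the per-deck table.
function expectedBracketRange(mainboard, gameChangers) {
  const gcs = findGameChangers(mainboard, gameChangers);
  return gcs.length > 0
    ? { min: 3, max: 3, gameChangers: gcs }
    : { min: 1, max: 2, gameChangers: [] };
}

// Usage: a trimmed, hypothetical mainboard that includes Seedborn Muse.
const abzanSample = ['Sol Ring', 'Seedborn Muse', 'Command Tower'];
console.log(expectedBracketRange(abzanSample, SAMPLE_GAME_CHANGERS));
// -> { min: 3, max: 3, gameChangers: [ 'Seedborn Muse' ] }
```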

cEDH provenance chain

Every cEDH reference deck is sourced via a two-link chain: cEDH Decklist Database (community-curated tier list of cEDH archetypes) → linked Moxfield primer (community-vetted decklist for that archetype). We fetched the canonical Moxfield decklist via api2.moxfield.com/v3/decks/all/<id> on 2026-05-06 and confirmed each list is exactly 100 cards. Each row's source ↗ link goes to the human-readable Moxfield primer page; you can verify the decklist is identical to ours.
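The 100-card confirmation reduces to summing quantities over the commander and mainboard zones while ignoring any sideboard. A minimal sketch, assuming a hypothetical normalized entry shape `{ name, quantity, board }`. The Moxfield API response is not shown here and would need to be mapped into this shape first.

```javascript
// Hypothetical normalized entry: { name: string, quantity: number, board: 'commanders' | 'mainboard' | 'sideboard' }.
// Commander decks must total exactly 100 cards across commanders + mainboard.
function countDeckCards(entries) {
  return entries
    .filter((e) => e.board === 'commanders' || e.board === 'mainboard')
    .reduce((total, e) => total + e.quantity, 0);
}

function assertHundredCards(deckId, entries) {
  const total = countDeckCards(entries);
  if (total !== 100) {
    throw new Error(`${deckId}: expected 100 cards, found ${total}`);
  }
  return total;
}

// Usage: two partner commanders plus 98 other cards; the filler entry stands in
// for the rest of the list, and the sideboard entry is excluded from the count.
const blueFarmSample = [
  { name: 'Thrasios, Triton Hero', quantity: 1, board: 'commanders' },
  { name: 'Tymna the Weaver', quantity: 1, board: 'commanders' },
  { name: 'Island', quantity: 1, board: 'mainboard' },
  { name: 'Filler (hypothetical)', quantity: 97, board: 'mainboard' },
  { name: 'Sideboard card', quantity: 1, board: 'sideboard' },
];
console.log(assertHundredCards('cedh-thrasios-tymna-blue-farm', blueFarmSample)); // 100
```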

Pass criteria

For each deck, we record four criteria: three bracket-accuracy checks and one power check.

  • Bracket-in-range — predicted bracket ∈ [expected_bracket_min, expected_bracket_max]. Primary metric. WotC's B1/B2 boundary is intentionally fuzzy, so stock precons get [1,2] range.
  • Bracket-±1 — predicted within 1 of expected_bracket midpoint. Secondary metric reported for comparability with industry tools (ScryCheck reports 80% bracket-exact, 92% bracket-±1).
  • Bracket-exact — predicted === expected_bracket midpoint. Strictest. Affected by the inherent fuzziness of WotC's framework on stock precons.
  • Power-in-range — predicted power level ∈ [expected_power_min, expected_power_max]. Independent check on the engine's continuous output.
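The four criteria can be expressed as one scoring function per deck. This is a sketch under stated assumptions. `scoreDeck` is a hypothetical helper. The reference entry uses the snake_case field names from the list above, plus a single `expected_bracket` value for the ±1 and exact checks. Both range checks are treated as inclusive on both ends.

```javascript
// Hypothetical reference entry: the snake_case fields named in the pass criteria,
// plus a single `expected_bracket` that the ±1 and exact checks compare against.
// `predicted` is the calculator's output for the same deck: { bracket, power }.
function scoreDeck(expected, predicted) {
  const inClosedRange = (x, lo, hi) => x >= lo && x <= hi;
  return {
    inRange: inClosedRange(predicted.bracket, expected.expected_bracket_min, expected.expected_bracket_max),
    withinOne: Math.abs(predicted.bracket - expected.expected_bracket) <= 1,
    exact: predicted.bracket === expected.expected_bracket,
    powerInRange: inClosedRange(predicted.power, expected.expected_power_min, expected.expected_power_max),
  };
}

// Usage: a stock precon expected at B1 (range [1, 2]) that the engine rates B2 at power 3.7.
const precon = {
  expected_bracket_min: 1,
  expected_bracket_max: 2,
  expected_bracket: 1,
  expected_power_min: 1,
  expected_power_max: 5.5,
};
console.log(scoreDeck(precon, { bracket: 2, power: 3.7 }));
// -> { inRange: true, withinOne: true, exact: false, powerInRange: true }
```

A deck like this passes three criteria and misses only bracket-exact. The three bracket-exact misses in this run follow the same pattern.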

Per-bracket accuracy

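| Expected | N | In-range | Within-1 | Exact | Power-in-range |
|---|---|---|---|---|---|
| B1 | 5 | 5/5 (100%) | 5/5 (100%) | 2/5 (40%) | 5/5 (100%) |
| B2 | 6 | 6/6 (100%) | 6/6 (100%) | 6/6 (100%) | 6/6 (100%) |
| B3 | 6 | 6/6 (100%) | 6/6 (100%) | 6/6 (100%) | 6/6 (100%) |
| B5 | 10 | 10/10 (100%) | 10/10 (100%) | 10/10 (100%) | 10/10 (100%) |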
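Per-bracket rows are a group-by over scored decks. A minimal sketch, assuming a hypothetical row shape with one scored deck per row. `formatRate` reproduces the "hits/n (pct%)" formatting used in this report, rounded to one decimal place.

```javascript
// Format hits over n as "hits/n (pct%)", rounding to one decimal place.
// Template literals print 100 and 40 without a trailing ".0", matching this report's tables.
function formatRate(hits, n) {
  const pct = Math.round((hits / n) * 1000) / 10;
  return `${hits}/${n} (${pct}%)`;
}

// Hypothetical row shape: { expectedBracket, inRange, withinOne, exact, powerInRange }.
function perBracketSummary(rows) {
  const groups = new Map();
  for (const row of rows) {
    if (!groups.has(row.expectedBracket)) groups.set(row.expectedBracket, []);
    groups.get(row.expectedBracket).push(row);
  }
  const summary = {};
  for (const [bracket, group] of [...groups].sort((a, b) => a[0] - b[0])) {
    const tally = (field) => formatRate(group.filter((r) => r[field]).length, group.length);
    summary[`B${bracket}`] = {
      n: group.length,
      inRange: tally('inRange'),
      withinOne: tally('withinOne'),
      exact: tally('exact'),
      powerInRange: tally('powerInRange'),
    };
  }
  return summary;
}

// Usage: the five expected-B1 precons from this run (2 predicted B1, 3 predicted B2).
const b1Rows = [true, true, false, false, false].map((exact) => ({
  expectedBracket: 1, inRange: true, withinOne: true, exact, powerInRange: true,
}));
console.log(perBracketSummary(b1Rows).B1.exact); // 2/5 (40%)
console.log(formatRate(24, 27)); // 24/27 (88.9%)
```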

Confusion matrix

Rows = expected bracket; columns = predicted bracket. Diagonal = exact match.

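| | Pred B1 | Pred B2 | Pred B3 | Pred B4 | Pred B5 |
|---|---|---|---|---|---|
| Exp B1 | 2 | 3 | 0 | 0 | 0 |
| Exp B2 | 0 | 6 | 0 | 0 | 0 |
| Exp B3 | 0 | 0 | 6 | 0 | 0 |
| Exp B4 | 0 | 0 | 0 | 0 | 0 |
| Exp B5 | 0 | 0 | 0 | 0 | 10 |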
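The confusion matrix and the bracket-exact figure come from the same (expected, predicted) pairs. Exact accuracy is the diagonal sum divided by the total, and within-1 accuracy is the band where |row − column| ≤ 1. A minimal sketch with hypothetical helper names. The usage rebuilds this run's pairs from the non-zero cells above.

```javascript
// Rows = expected bracket, columns = predicted bracket; brackets are integers 1-5.
function confusionMatrix(pairs) {
  const m = Array.from({ length: 5 }, () => Array(5).fill(0));
  for (const { expected, predicted } of pairs) {
    for (const b of [expected, predicted]) {
      if (!Number.isInteger(b) || b < 1 || b > 5) throw new RangeError(`bad bracket: ${b}`);
    }
    m[expected - 1][predicted - 1] += 1;
  }
  return m;
}

// Bracket-exact accuracy = diagonal sum / total count.
function exactAccuracy(m) {
  let diag = 0;
  let total = 0;
  m.forEach((row, i) => row.forEach((count, j) => {
    total += count;
    if (i === j) diag += count;
  }));
  return diag / total;
}

// Usage: expand this run's non-zero cells back into (expected, predicted) pairs.
const reportedCells = [[1, 1, 2], [1, 2, 3], [2, 2, 6], [3, 3, 6], [5, 5, 10]];
const pairs = reportedCells.flatMap(([expected, predicted, count]) =>
  Array.from({ length: count }, () => ({ expected, predicted })));
const matrix = confusionMatrix(pairs);
console.log(matrix[0]); // [ 2, 3, 0, 0, 0 ]
console.log(exactAccuracy(matrix).toFixed(3)); // 0.889
```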

Engine vs frontier LLMs

Independent cross-validation: each model was given the decklist plus WotC's bracket framework and the 53-card Game Changers list, and asked to assign a bracket and power score. The same 27 reference decks were used for every column. Methodology and per-deck verdicts are in llm-validation-results.json.

| Metric | ScrollVault engine | claude-sonnet-4-6 | claude-opus-4-7 | claude-haiku-4-5-20251001 |
|---|---|---|---|---|
| Bracket-in-range | 100% (27/27) | 100% (27/27) | 100% (27/27) | 100% (27/27) |
| Bracket-±1 | 100% (27/27) | 100% (27/27) | 100% (27/27) | 100% (27/27) |
| Bracket-exact | 88.9% (24/27) | 81.5% (22/27) | 81.5% (22/27) | 81.5% (22/27) |
| Power-in-range | 100% (27/27) | 88.9% (24/27) | 77.8% (21/27) | 63% (17/27) |

Run timestamp: · Models: claude-sonnet-4-6, claude-opus-4-7, claude-haiku-4-5-20251001.

Per-deck results — 27 decks

Every row links to the deck's source URL. Click "source ↗" to verify decklist + bracket assignment yourself.

| Deck ID | Name | Expected | Predicted | Verdict | Power | Power range | Tipping Point | Source |
|---|---|---|---|---|---|---|---|---|
| wotc-precon-silverquillstatement-c21 | Silverquill Statement | B1–B2 | B1 | ✓ in range | 1.1 | 1–5.5 | T4 | source ↗ |
| wotc-precon-prismariperformance-c21 | Prismari Performance | B1–B2 | B2 | ✓ in range | 3.7 | 1–5.5 | T5 | source ↗ |
| wotc-precon-quantumquandrix-c21 | Quantum Quandrix | B1–B2 | B2 | ✓ in range | 3.8 | 1–5.5 | T4 | source ↗ |
| wotc-precon-witherbloomwitchcraft-c21 | Witherbloom Witchcraft | B1–B2 | B1 | ✓ in range | 1.1 | 1–5.5 | T4 | source ↗ |
| wotc-precon-loreholdlegacies-c21 | Lorehold Legacies | B1–B2 | B2 | ✓ in range | 4.0 | 1–5.5 | T4 | source ↗ |
| wotc-precon-abzanarmor-tdc | Abzan Armor | B3 | B3 | ✓ in range | 6.3 | 5–7.5 | T3 | source ↗ |
| wotc-precon-jeskaistriker-tdc | Jeskai Striker | B1–B2 | B2 | ✓ in range | 4.1 | 1–5.5 | T3 | source ↗ |
| wotc-precon-mardusurge-tdc | Mardu Surge | B1–B2 | B2 | ✓ in range | 3.9 | 1–5.5 | T3 | source ↗ |
| wotc-precon-sultaiarisen-tdc | Sultai Arisen | B1–B2 | B2 | ✓ in range | 3.7 | 1–5.5 | T4 | source ↗ |
| wotc-precon-temurroar-tdc | Temur Roar | B1–B2 | B2 | ✓ in range | 3.7 | 1–5.5 | T4 | source ↗ |
| wotc-precon-eternalmight-drc | Eternal Might | B1–B2 | B2 | ✓ in range | 4.0 | 1–5.5 | T3 | source ↗ |
| wotc-precon-livingenergy-drc | Living Energy | B1–B2 | B2 | ✓ in range | 4.1 | 1–5.5 | T4 | source ↗ |
| wotc-precon-counterblitzfinalfantasyx-fic | Counter Blitz (FINAL FANTASY X) | B3 | B3 | ✓ in range | 6.7 | 5–7.5 | T3 | source ↗ |
| wotc-precon-20waystowin-sld | 20 Ways to Win | B3 | B3 | ✓ in range | 6.8 | 5–7.5 | T3 | source ↗ |
| wotc-precon-creativeenergy-m3c | Creative Energy | B3 | B3 | ✓ in range | 6.5 | 5–7.5 | T4 | source ↗ |
| wotc-precon-deadlydisguise-mkc | Deadly Disguise | B3 | B3 | ✓ in range | 6.5 | 5–7.5 | T4 | source ↗ |
| wotc-precon-deepcluesea-mkc | Deep Clue Sea | B3 | B3 | ✓ in range | 6.2 | 5–7.5 | T4 | source ↗ |
| cedh-kinnan-infinite-mana | Kinnan Infinite Mana | B5 | B5 | ✓ in range | 10.0 | 9–10 | T2 | source ↗ |
| cedh-thrasios-tymna-blue-farm | Blue Farm (Thrasios+Tymna) | B5 | B5 | ✓ in range | 10.0 | 9–10 | T2 | source ↗ |
| cedh-najeela-blade-blossom | Najeela Combat Combo | B5 | B5 | ✓ in range | 10.0 | 9–10 | T2 | source ↗ |
| cedh-tivit-stax | Tivit Stax | B5 | B5 | ✓ in range | 10.0 | 9–10 | T2 | source ↗ |
| cedh-rograkh-silas-turbo-naus | Rograkh+Silas Turbo Ad Nauseam | B5 | B5 | ✓ in range | 10.0 | 9–10 | T2 | source ↗ |
| cedh-kraum-tymna-breach | Kraum+Tymna Breach (Blue Farm) | B5 | B5 | ✓ in range | 10.0 | 9–10 | T2 | source ↗ |
| cedh-halana-tymna-hulk | Halana+Tymna Flash Hulk | B5 | B5 | ✓ in range | 10.0 | 9–10 | T2 | source ↗ |
| cedh-tana-tymna-turbo-naus | Tana+Tymna Turbo Ad Nauseam | B5 | B5 | ✓ in range | 10.0 | 9–10 | T2 | source ↗ |
| cedh-yuriko-tempo | Yuriko Tempo | B5 | B5 | ✓ in range | 10.0 | 9–10 | T3 | source ↗ |
| cedh-malcolm-tymna-esper-turbo | Malcolm+Tymna Esper Turbo | B5 | B5 | ✓ in range | 10.0 | 9–10 | T2 | source ↗ |

Limits and honest framing

  • Reference set is small (27 decks). This is v1, an initial seed, and we're expanding it incrementally. A larger set reduces variance but shouldn't change the accuracy picture for stock precons and cEDH (B5), whose bracket assignments follow directly from WotC's framework.
  • B4 ("Optimized") coverage is deliberately zero. No public source — not WotC, not the cEDH Decklist Database, not any tournament site — publishes a canonical set of "Bracket 4" decks. WotC's announcement explicitly declines to provide example B4 decklists; community labels at B4 are interpretive. Rather than synthesize B4 references and weaken our "every deck has authoritative provenance" claim, we leave the gap and document the standard we'll accept: a B4 reference must (a) link to a publicly hosted decklist (Moxfield, Archidekt, MTGGoldfish), (b) carry independent corroboration of B4 status from at least two non-affiliated sources (e.g., a tournament finish + a published primer + a community tier-list), and (c) not match B5 criteria (cEDH-tier two-card combos, fast-mana density, tutor count). Until those exist for a given deck, we don't include it.
  • B3 coverage is currently six decks, all stock precons that contain at least one Game Changer (AbzanArmor among them). Stock precons with one or more Game Changers are the cleanest authoritative path to a B3 reference. More recent precons that include GCs would expand B3 coverage; older precons predate the GC framework and aren't classified.
  • cEDH decklists are canonical Moxfield primers from cEDH-DDB tier-list panels. Bracket assignment (B5) is unambiguous per WotC framework. The exact card-by-card list will vary across tournament copies — the primer is the community's reference build at last_verified date.
  • The B1/B2 boundary is fuzzy by design. WotC's framework defines B1 ("Exhibition") and B2 ("Core") with inherent overlap — modern precons (2024+) are often stronger than pre-2023 precons of the same class. Our calculator tends to report stock precons as B2 even when WotC's framework would call them B1. The bracket-in-range metric accommodates this by accepting [1,2] for stock precons without Game Changers.
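  • Bracket-exact accuracy is held back by precon B1/B2 ambiguity, not engine error. All three bracket-exact misses are precons with expected_bracket = 1 that the engine rated B2, which is still inside their legitimate [1,2] range. Compared exactly against the expected midpoint, the metric drops to 88.9% (24/27) even though every prediction is in range.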
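The three-part B4 acceptance standard can be written as a predicate, which makes the gate explicit. This is a sketch, not an existing tool. The candidate shape, `authorOrg`, and the org-name test for "non-affiliated" are all assumptions. Criterion (c) is passed in as a precomputed boolean because the report does not define numeric thresholds for combo, fast-mana, or tutor density.

```javascript
// Criterion (a): hosts named in the B4 standard.
const ACCEPTED_HOSTS = ['moxfield.com', 'archidekt.com', 'mtggoldfish.com'];

function isPubliclyHosted(url) {
  let host;
  try {
    host = new URL(url).hostname.replace(/^www\./, '');
  } catch {
    return false; // not a parseable URL
  }
  return ACCEPTED_HOSTS.some((h) => host === h || host.endsWith(`.${h}`));
}

// Criterion (b): count distinct organizations that label the deck B4 and are not the
// deck author's own org. Reducing "affiliation" to an org-name comparison is a simplification.
function countIndependentCorroborations(candidate) {
  const orgs = new Set();
  for (const c of candidate.corroborations) {
    if (c.bracket === 4 && c.org !== candidate.authorOrg) orgs.add(c.org);
  }
  return orgs.size;
}

// Criterion (c) depends on B5 checks (two-card combos, fast mana, tutors); this sketch
// takes their result as a precomputed boolean instead of inventing thresholds.
function acceptAsB4Reference(candidate) {
  return (
    isPubliclyHosted(candidate.decklistUrl) &&
    countIndependentCorroborations(candidate) >= 2 &&
    !candidate.meetsB5Criteria
  );
}

// Usage: a hypothetical candidate with a tournament finish and a tier-list placement.
const candidate = {
  decklistUrl: 'https://www.moxfield.com/decks/example',
  authorOrg: 'deck-author',
  corroborations: [
    { org: 'tournament-organizer', bracket: 4 },
    { org: 'tier-list-site', bracket: 4 },
  ],
  meetsB5Criteria: false,
};
console.log(acceptAsB4Reference(candidate)); // true
```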

Reproduce these results yourself

This validation is reproducible end-to-end. From a clone of the repo:

  1. node scripts/build-reference-decks.cjs — fetches MTGJSON precon data + cEDH archetype lists into data/reference-decks.json with full provenance metadata.
  2. node scripts/run-validation.cjs — runs each deck through the live bracket calculator via Puppeteer (defaults to staging; pass --prod for production).
  3. node scripts/render-validation-page.cjs — regenerates this page from the latest validation results.
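The three steps can be chained from a single wrapper so that a failed step stops the run. A minimal sketch, assuming a hypothetical run-all.cjs at the repo root. Only `--prod` is forwarded, and only to run-validation.cjs, because that is the one flag documented above. `execFileSync` throws on a non-zero exit code, and that is what halts the chain.

```javascript
const { execFileSync } = require('node:child_process');

// The three documented steps, in order. `--prod` goes only to the validation step.
function buildPlan({ prod = false } = {}) {
  return [
    ['scripts/build-reference-decks.cjs'],
    ['scripts/run-validation.cjs', ...(prod ? ['--prod'] : [])],
    ['scripts/render-validation-page.cjs'],
  ];
}

// Run each step with the current node binary; inherited stdio keeps Puppeteer output visible.
// execFileSync throws if a step exits non-zero, so later steps never run on a failed stage.
function runPipeline(options = {}) {
  for (const args of buildPlan(options)) {
    execFileSync(process.execPath, args, { stdio: 'inherit' });
  }
}
```

Calling `runPipeline()` with no argument would target staging, the default, and `runPipeline({ prod: true })` would mirror step 2's production mode.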

Expected runtime: about 5 minutes for 27 decks (this run took 285.5s in total, averaging ~9490ms per deck).

What's next

  • Expand the reference set toward 250+ decks. Priority: more cEDH archetypes (B5), more recent precons (B1–B3), and authoritatively tagged B3–B4 decks (community + tournament).
  • Add automated CI: re-run validation on every bracket.js change. Keep accuracy honest as the engine evolves.
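A CI gate for bracket.js changes could compare each run's hit counts against a stored baseline and fail on any drop. A minimal sketch with a hypothetical baseline shape. Comparing integer hit counts over a fixed deck total avoids floating-point percentage noise. If the deck count changes, the gate throws and asks for an explicit baseline refresh instead of comparing silently.

```javascript
// Hypothetical baseline/current shape: { total, hits: { metricName: hitCount } }.
function findRegressions(baseline, current) {
  if (current.total !== baseline.total) {
    throw new Error(`deck count changed (${baseline.total} -> ${current.total}); refresh the baseline first`);
  }
  const regressions = [];
  for (const [metric, baseHits] of Object.entries(baseline.hits)) {
    const curHits = current.hits[metric];
    // A missing metric counts as a regression so a renamed field can't slip through.
    if (curHits === undefined || curHits < baseHits) {
      regressions.push({ metric, baseline: baseHits, current: curHits });
    }
  }
  return regressions;
}

// Usage: this run's headline counts as the baseline; a later run loses one exact match.
const baseline = { total: 27, hits: { inRange: 27, withinOne: 27, exact: 24, powerInRange: 27 } };
const next = { total: 27, hits: { inRange: 27, withinOne: 27, exact: 23, powerInRange: 27 } };
const regressions = findRegressions(baseline, next);
console.log(regressions); // [ { metric: 'exact', baseline: 24, current: 23 } ]
// In CI, a non-empty list would fail the job, e.g. process.exitCode = regressions.length ? 1 : 0;
```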

Browse the full precon library

Beyond the 17 precons in this reference set, we've run every recent Commander precon through the same engine. Browse all 61 analyzed precons → (filter by bracket, set, or color identity). Each precon links to a full per-deck analysis with the same passport (bracket, power, Tipping Point) the calculator produces.

The methodology behind the metric

For the long-form story on how the engine produces the Tipping Point chip you see on every analysis — including the WASM Monte Carlo internals, comparison to Frank Karsten's land-count formula, and why no competing bracket calculator can replicate it — read "We Simulated 5 Million Mana Bases. Here's What We Learned About Tipping Points." →