TL;DR
This project treats AI quality as a consequence of environment design, evaluation rigor, and training signal quality. The simulator provides deterministic ground truth. The corpus pipeline turns heterogeneous expert knowledge into semantically atomic JSONL. The long-term policy loop learns against real outcomes, not weak proxies.
Example training records appear later in this article.
The strongest present-tense artifact is the domain training pipeline: curated rules, expert gameplay insights, strategy articles, and card-specific research transformed into a canonical schema designed to support both human-curated knowledge and future simulator-generated preference data.
Current Dataset Snapshot
The corpus is intentionally engineered rather than scraped. Strategy articles,
expert gameplay analysis, rules references, and card-level research are curated,
reviewed, and normalized into a shared schema so the resulting dataset supports
retrieval, supervised learning, and future preference generation without
changing the data model.
- Strategy Articles: 271 curated articles from The EPIC Storm website
- Expert Gameplay Review: 30+ hours of Bryant Cook gameplay analyzed with Gemini extraction + manual verification
- Card Research Corpora: 24 curated card-focused deep research corpora
- Rules Corpus: curated subset of the comprehensive Magic rules reduced to deck-relevant mechanics
- Normalization Target: canonical shared schema across heterogeneous knowledge sources
- Training Output: compact, semantically atomic JSONL records with one clear concept per entry
A decision space where each choice can collapse entire strategies
In high-level play, small resource differences completely change which lines are valid. A single land drop, an extra artifact, or the availability of a discard-compatible payoff can transform a position from dead to winning.
In The EPIC Storm, the system must reason about mana colors, storm count, graveyard accessibility, tutor chains, conditional sequencing, and interactions like:
- Echo of Eons reshaping the future decision surface via a seven-card redraw
- Lion’s Eye Diamond creating mana while invalidating hand-based lines
- Chrome Mox forcing color-specific imprint decisions with opportunity cost
- Fetch lands + dual lands expanding color reach through conditional search space
- Tendrils of Agony making sequencing quality inseparable from final outcome
That makes the domain useful for AI engineering for the same reason many toy benchmarks are not: success depends on structured reasoning under hard constraints, not just language fluency.
The result is a concrete environment where the central question is:
given state S, does decision sequence A convert more reliably than decision sequence B?
This is the key framing shift: the project is not “an LLM for a card game.”
It is an evaluation-driven decision system built inside a brutally constrained,
high-branching environment that makes good measurement non-optional.
Branching explosion in real decision spaces
The central difficulty of this domain is not rule complexity alone — it is
the combinatorial explosion of legal decision sequences. Even a modest mid-game
position can produce dozens of legal actions, each of which changes the
available future action space.
Initial state S₀
│
│ Battlefield: Lotus Petal
│ Hand: Brainstorm, Gamble, Beseech the Mirror, Dark Ritual, Chrome Mox, Echo of Eons
│
├─ crack Lotus Petal → generate BLUE
│ │
│ └─ cast Brainstorm
│ │
│ ├─ draw 3: Lion's Eye Diamond, Scalding Tarn, & Mox Opal
│
│ Hand after draw:
│ A: Gamble
│ B: Beseech the Mirror
│ C: Dark Ritual
│ D: Chrome Mox
│ E: Echo of Eons
│ F: Lion's Eye Diamond
│ G: Scalding Tarn
│ H: Mox Opal
│
│        (choose any 2 cards {X,Y} from {A–H} to put back on top; C(8,2) = 28 branches)
│
│ ├─ put back A & {B C D E F G H}
│ ├─ put back B & {C D E F G H}
│ ├─ put back C & {D E F G H}
│ ├─ put back D & {E F G H}
│ ├─ put back E & {F G H}
│ ├─ put back F & {G H}
│ └─ put back G & {H}
│
│ (all branches continue with updated hand states)
│
│ │
│ └─ if Scalding Tarn (G) ∉ {X,Y}
│ │
│ ├─ play Scalding Tarn
│ │ ├─ do not crack
│ │ │ └─ continue with known top cards (X,Y)
│ │ │
│ │ └─ crack Scalding Tarn
│ │ ├─ fetch Badlands
│ │ ├─ fetch Underground Sea
│ │ └─ fetch Taiga
│ │
│ │ shuffle library (destroy Brainstorm information)
│ │
│ │ effect:
│ │ ├─ removes known top cards (X,Y)
│ │ └─ randomizes next draws
│ │
│ └─ proceed with updated state
│
├─ crack Lotus Petal → generate RED
│ │
│ └─ cast Gamble
│ │
│ └─ search for Lion's Eye Diamond
│ │
│ ├─ discard Brainstorm
│ ├─ discard Beseech the Mirror
│ ├─ discard Dark Ritual
│ ├─ discard Chrome Mox
│ ├─ discard Echo of Eons
│ └─ discard Lion's Eye Diamond → DEAD END
│
│ (remaining branches continue with LED in hand)
│
│ │
│ ├─ cast Lion's Eye Diamond
│ │ └─ crack Lion's Eye Diamond → generate 3 BLUE
│ │ └─ cast Echo of Eons → shuffle graveyard & hand into library and draw 7 cards
│ │
│ └─ pass the turn
│
├─ crack Lotus Petal → generate BLACK
│ │
│ └─ cast Dark Ritual
│ │
│ └─ cast Chrome Mox
│ │
│ ├─ imprint Brainstorm
│ ├─ imprint Gamble
│ ├─ imprint Echo of Eons
│ └─ imprint Beseech the Mirror → DEAD END
│
│ │
│ └─ remaining branches may cast
│ │
│ └─ Beseech the Mirror
│ │
│ ├─ search & cast Gaea's Will (Recursive Engine)
│ ├─ search & cast Song of Creation (Draw Engine)
│ └─ search & cast Tendrils of Agony (Win Condition)
│
└─ pass the turn
Each branch produces a new state with a different mana pool, card availability,
and future line feasibility. Within only a few turns the total number of
reachable states becomes enormous.
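The 28-way put-back fan-out in the tree above is just C(8,2). A few lines of Python (hand contents taken from the example state; the per-decision multiplier at the end is illustrative, not measured) make the early explosion concrete:

```python
import math
from itertools import combinations

# Hand after the Brainstorm draw in the example tree (cards A–H).
hand = ["Gamble", "Beseech the Mirror", "Dark Ritual", "Chrome Mox",
        "Echo of Eons", "Lion's Eye Diamond", "Scalding Tarn", "Mox Opal"]

# Brainstorm returns 2 cards to the top; treated order-insensitively here.
put_backs = list(combinations(hand, 2))
assert len(put_backs) == math.comb(8, 2) == 28

# Illustrative compounding: if each of the next 5 decisions averaged
# 20 legal options, the naive frontier would already exceed 3 million states.
assert 20 ** 5 == 3_200_000
```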
The role of the neural policy is not to replace the simulator.
It is to guide exploration toward promising regions of this decision space
so the deterministic engine can evaluate them precisely.
A naïve exhaustive search is infeasible. The simulator must aggressively prune
impossible or dominated branches while still preserving the sequences that
represent legitimate winning lines.
- Decision branching: Typical mid-combo states can produce dozens of legal actions, each spawning further sequencing branches.
- State sensitivity: Small resource differences — a single mana source or card — can completely change which lines remain viable.
- Outcome brittleness: Seemingly minor sequencing differences often determine whether a combo line succeeds or collapses.
- Engineering implication: Efficient simulation, branch pruning, and candidate ranking become mandatory for tractable evaluation.
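The pruning-plus-ranking pattern can be sketched as a toy beam search. This is an illustration, not the project's actual search code: `legal_actions`, `apply_action`, and `score` are placeholders for the simulator's action enumeration, its deterministic transition, and a learned policy ranking.

```python
import heapq

def beam_search(initial_state, legal_actions, apply_action, score,
                depth, beam_width):
    """Keep only the top `beam_width` states per ply, ranked by `score`.

    Hypothetical sketch: the callables stand in for the simulator's
    action enumeration, deterministic transition, and policy scoring.
    """
    frontier = [initial_state]
    for _ in range(depth):
        # Expand every surviving state by every legal action.
        candidates = [
            apply_action(state, action)
            for state in frontier
            for action in legal_actions(state)
        ]
        if not candidates:
            break
        # Prune: only the highest-scoring states survive to the next ply.
        frontier = heapq.nlargest(beam_width, candidates, key=score)
    return frontier
```

With a beam of width k, the frontier stays at k states per ply instead of growing multiplicatively, while the deterministic engine still evaluates every surviving line exactly.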
System architecture
1) Deterministic symbolic layer
A high-throughput .NET simulator represents full game state, enumerates legal actions under constraints,
performs deterministic transitions, and computes reproducible outcome quality.
- Deterministic state transitions
- Strict legality and invariant enforcement
- Branch-sensitive evaluation
- Reproducible output for identical inputs
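That last invariant is directly testable: identical state plus identical action must yield byte-identical successor states. A minimal sketch of the check, with a stub transition function standing in for the .NET simulator:

```python
import json

def apply_action(state, action):
    """Stub deterministic transition: a pure function of (state, action)."""
    new_state = dict(state)
    new_state["storm_count"] = state["storm_count"] + 1
    new_state["mana_pool"] = state["mana_pool"] + action["mana_produced"]
    return new_state

def canonical(state):
    # Serialize with sorted keys so equality is byte-level reproducible.
    return json.dumps(state, sort_keys=True)

state = {"storm_count": 0, "mana_pool": 0}
action = {"mana_produced": 1}
# Reproducibility invariant: same inputs, byte-identical output.
assert canonical(apply_action(state, action)) == canonical(apply_action(state, action))
```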
2) Neural policy layer
Python-based model training learns to rank candidate decisions and generalize across states the symbolic
layer can evaluate but cannot cheaply search exhaustively.
- Supervised fine-tuning
- Preference optimization
- Future closed-loop policy improvement
- ONNX export for in-process inference
DOMAIN KNOWLEDGE PIPELINE
┌────────────────────────────────────────────────────────────────┐
│ Rules · Strategy Articles · Expert Gameplay · Research │
└────────────────────────────────────────────────────────────────┘
│
▼
┌──────────────────────────┐
│ Canonical normalization │
│ + schema enforcement │
└──────────────────────────┘
│
▼
┌──────────────────────────┐
│ Semantically atomic JSONL│
│ training corpus │
└──────────────────────────┘
│
▼
NEURAL TRAINING LAYER
┌──────────────────────────┐
│ Python / PyTorch pipeline│
│ SFT · Preference · Eval │
└──────────────────────────┘
│
▼
ONNX MODEL EXPORT
│
▼
INFERENCE + EVALUATION
┌─────────────────────────────────┐
│ Deterministic C# simulation │
│ - state transitions │
│ - legality enforcement │
│ - candidate line evaluation │
└─────────────────────────────────┘
│
▼
Comparative outcomes / preferences
│
└──────────────┐
▼
Policy retraining
Engineering the domain knowledge pipeline
Source curation
Comprehensive rules, expert gameplay video, long-form strategy articles, and per-card deep research are collected because each source contributes different kinds of signal: formal rules, tactical nuance, sequencing heuristics, and card-specific edge cases.
Model-assisted extraction with human verification
Gemini and ChatGPT Deep Research are used as extraction tools, not authorities. Their outputs are manually reviewed, corrected, trimmed, and rewritten where needed before entering the pipeline.
Canonical normalization
Raw artifacts are cleaned into a shared schema designed to support both prose-heavy knowledge sources and
future simulation-derived records without changing the downstream interface.
Semantically atomic JSONL generation
Each record encodes one clear concept, constraint, or strategic principle, making the output useful for
retrieval, fine-tuning, preference construction, and auditability.
Raw sources
-> extracted notes / draft markdown
-> manual verification + deletion of weak content
-> cleaned markdown artifact
-> canonical structured representation
-> semantically atomic JSONL
-> training / retrieval / future preference generation
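The final two stages of that flow can be sketched in Python. Everything here is hypothetical helper code, not the project's pipeline: the real steps (model-assisted extraction, manual review) happen upstream, and this only shows the atomic-JSONL shape of the output.

```python
import hashlib
import uuid

def normalize(artifact_text, artifact_id):
    """Sketch: split a cleaned markdown artifact into one-concept records."""
    records = []
    for i, paragraph in enumerate(p.strip() for p in artifact_text.split("\n\n")):
        if not paragraph:
            continue
        records.append({
            "id": str(uuid.uuid4()),            # stable per-record identity
            "record_type": "insight",
            "text": paragraph,                   # one concept per record
            "artifact_id": artifact_id,          # provenance: source artifact
            "segment_id": f"{artifact_id.lower()}_{i}",  # provenance: segment
        })
    return records

def artifact_id_for(raw_bytes):
    # Content-addressed artifact id (illustrative choice of hash).
    return hashlib.md5(raw_bytes).hexdigest().upper()
```

Keeping provenance fields (`artifact_id`, `segment_id`) on every record is what makes the corpus auditable: any training example can be traced back to the human-reviewed artifact it came from.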
Closed-loop learning design
The long-term learning loop is intentionally evaluation-first. The policy does not learn from vibes, static
correctness labels, or the kind of proxy metrics that create "accuracy theater". It learns from deterministic
comparative outcomes.
1. Policy proposes candidate lines
A neural policy ranks decision sequences for a given state.
2. Simulator evaluates outcomes deterministically
Each line is executed against the same symbolic environment with identical rules and constraints.
3. Comparative supervision is generated
Better/worse line pairs, conversion metrics, and structured failure reasons are derived from actual
outcomes.
4. Policy is retrained and re-benchmarked
New SFT and DPO data is folded back into training, then measured longitudinally against prior policy
versions.
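Step 3 above can be sketched concretely. This is an assumed shape, not the project's code: `evaluated_lines` maps a candidate line to its simulated conversion rate, and the near-tie `margin` is an illustrative threshold.

```python
from itertools import combinations

def preference_pairs(state_prompt, evaluated_lines, margin=0.05):
    """Turn deterministic outcome metrics into chosen/rejected DPO pairs."""
    pairs = []
    for (line_a, score_a), (line_b, score_b) in combinations(
            evaluated_lines.items(), 2):
        if abs(score_a - score_b) < margin:
            continue  # skip near-ties: weak supervision signal
        chosen, rejected = (
            (line_a, line_b) if score_a > score_b else (line_b, line_a)
        )
        pairs.append({
            "prompt": state_prompt,
            "chosen": chosen,
            "rejected": rejected,
        })
    return pairs
```

Because the scores come from deterministic simulation rather than human labels, every pair carries an outcome-grounded reason the chosen line is preferred.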
CURRENT GAME STATE
│
▼
Neural policy proposes lines
┌──────────────┼──────────────┐
▼ ▼ ▼
Line A Line B Line C
│ │ │
▼ ▼ ▼
Deterministic simulator executes each line
│ │ │
▼ ▼ ▼
Outcome metrics: success rate · resource use · stability
│
▼
Comparative supervision generated
(preference pairs / regret / failure reasons)
│
▼
Policy retraining
│
▼
New policy benchmarked vs prior versions
Model Architecture & Training Configuration
The policy model uses a three-stage training curriculum: domain knowledge
fine-tuning, behavioral tuning on simulator outcomes, and preference
optimization on counterfactual line comparisons.
Base Model
Qwen2.5-14B (INT4 / QLoRA)
Training Hardware
Single NVIDIA RTX 4090 (24GB)
Precision
BF16 + gradient checkpointing
Training Curriculum
Domain SFT → Outcome SFT → Counterfactual DPO
Stage 1: Domain Knowledge
High-capacity adapter trained on curated domain corpus.
- Method: SFT (QLoRA)
- LoRA rank: 48 (α=96)
- Target modules: q, k, v, o, gate, up, down
- Effective batch: 32
- Learning rate: 1e-4
- Epochs: 3
Stage 2: Simulator Outcomes
Behavioral tuning on preferred action sequences.
- Method: SFT (QLoRA)
- LoRA rank: 16 (α=32)
- Target modules: attention projections (q, v)
- Effective batch: 32
- Learning rate: 2e-4
- Epochs: 2
Stage 3: Counterfactual DPO
Preference optimization on simulator-derived line comparisons.
- Method: Direct Preference Optimization
- β: 0.05
- Effective batch: 32
- Learning rate: 3e-6
- Epochs: 1
Curriculum rationale:
Stage 1 injects domain knowledge using a high-capacity adapter across all
projection layers. Stage 2 narrows the adapter to attention layers and
tunes behavior toward successful simulator outcomes. Stage 3 refines
decision quality through counterfactual preference optimization, teaching
the model why certain lines outperform alternatives.
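Assuming the QLoRA stages are built on Hugging Face peft, the two adapter shapes can be written as configuration objects. Argument names follow peft's `LoraConfig`; treat this as an illustrative sketch, not the project's actual training code.

```python
from peft import LoraConfig

# Stage 1 adapter: high capacity, all projection layers (rank 48, alpha 96).
stage1 = LoraConfig(
    r=48,
    lora_alpha=96,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)

# Stage 2 adapter: narrowed to attention q/v projections (rank 16, alpha 32).
stage2 = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
```

The rank/alpha ratio of 1:2 is kept constant across stages; only the adapter's capacity and reach shrink as training moves from broad knowledge injection to targeted behavioral tuning.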
Example training records
Each JSONL record encodes one semantically atomic concept derived from the domain knowledge pipeline.
JSONL sample · rules record
{
"id": "f884d4a7-07ff-4bbc-938b-b415546d5287",
"record_type": "insight",
"cards": [],
"frame_kind": "game_mechanic",
"event_kind": "rules_definition",
"insight_type": "rules",
"text": "Flashback allows an instant or sorcery card to be cast from the graveyard by paying its flashback cost instead of its mana cost.",
"artifact_id": "681249302ECD202DEF83BDF2205FB9F2",
"segment_id": "comprehensive_rules_702_34a_1"
}
JSONL sample · technique record
{
"id": "4b8efbce-5d34-477e-828c-8652a7373ada",
"record_type": "insight",
"cards": ["thoughtseize","echo_of_eons"],
"frame_kind": "graveyard_setup",
"event_kind": "engine_activation",
"insight_type": "technique",
"text": "Targeting yourself with Thoughtseize can place Echo of Eons into the graveyard to enable its flashback ability.",
"artifact_id": "9E0286190ABAE83F76D5CBB44151DC11",
"segment_id": "tes_video_match1_2"
}
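A lightweight audit pass can enforce that contract before records enter training. The helper below is hypothetical; the length threshold is an illustrative proxy for semantic atomicity, not a rule from the pipeline.

```python
import json

REQUIRED_KEYS = {"id", "record_type", "cards", "frame_kind",
                 "event_kind", "insight_type", "text",
                 "artifact_id", "segment_id"}

def validate_record(jsonl_line):
    """Return a list of problems for one JSONL line (empty list = valid)."""
    problems = []
    record = json.loads(jsonl_line)
    missing = REQUIRED_KEYS - set(record)
    if missing:
        problems.append(f"missing keys: {sorted(missing)}")
    # Crude atomicity heuristic: one concept should fit in a short passage.
    if len(record.get("text", "")) > 500:
        problems.append("text too long to be semantically atomic")
    if record.get("record_type") != "insight":
        problems.append("unexpected record_type")
    return problems
```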
Design principles shaping the system
Several architectural constraints shape the system. These are not domain-specific choices;
they are design principles intended to make policy improvement measurable and reproducible.
Evaluation before modeling
The system begins with a deterministic environment capable of evaluating candidate
strategies reproducibly. Model quality is therefore measured against stable outcomes
rather than proxy metrics or human intuition.
- Deterministic state transitions
- Reproducible scenario replay
- Comparative outcome evaluation
Symbolic correctness + neural generalization
The symbolic layer enforces legality, constraints, and deterministic execution.
The neural layer learns to rank candidate strategies and generalize across
previously unseen states.
- Hard invariants enforced by simulator
- Policy model ranks candidate lines
- Clear separation of reasoning roles
Training–deployment symmetry
Training occurs in Python with PyTorch, but deployment is designed for ONNX
inference embedded directly inside the simulator. This keeps the runtime
decision loop deterministic and avoids Python production dependencies.
- PyTorch training pipeline
- ONNX model export
- In-process runtime inference