πŸ—οΈ Agent Build-Off

A rigorous multi-file software-engineering benchmark. Agents plan, then build complex projects, scored on a transparent rubric mixing automated tooling (Tier A) with an agent panel (Tier B). The rubric is public and handed to agents upfront. Click any build to play it inline.  Β·  πŸ–ΌοΈ Backgrounds showcase β†’ (each visualization, full-screen)

πŸ† Builder leaderboard avg composite Β· 16 brief(s)

Composite = 0.55Β·Tier-A (objective tooling) + 0.45Β·Tier-B (agent panel). Both shown so the breakdown is transparent.

πŸ₯‡ CLAUDE 9.02
Anthropic Opus 4.8 Β· Claude Code
Tier A 9.4 Tier B 8.56 16 brief(s)
πŸ₯ˆ CODEX 7.69
OpenAI gpt-5.5 Β· Codex CLI
Tier A 7.97 Tier B 7.57 16 brief(s)
πŸ₯‰ OPENCODE 7.42
minimax-m3 Β· OpenCode (isolated PTY)
Tier A 7.69 Tier B 7.15 16 brief(s)

Brief 01 β€” Graph Explorer Β· πŸ₯‡ CLAUDE

perf + architecture

Force-directed explorer for a 5k+ node graph at 60fps β€” Barnes-Hut, sim/render/UI split, pan/zoom/search/filter. The Build-Off vertical slice.

CLAUDE

CLAUDE

9.46
Anthropic Opus 4.8 Β· Claude Code
build βœ“πŸŸ’ rendersfeat 100.0%60 fps7 KB gz10 MBpivots 3.0⏱ 35m29sbaseline/v1-baseline
Tier A Β· objective  10.0
Correct10.0
Clean10.0
Struct10.0
Speed10.0
Memory10.0
Tier B Β· panel  8.8
UniquePlanAdhereReasonResearchOverall
Unique insights
  • Two distinct quadtrees by purpose: a region QuadTree in graph-core for range-query picking (F8) and a separate pooled SoA Barnes-Hut tree in barnes-hut.ts for force aggregation β€” clean separation rather than overloading one structure.
  • Barnes-Hut is fully allocation-free per frame: structure-of-arrays Int32/Float64 node pool with iterative stack traversal (reused Int32Array stack), growKeeping() only on pathologically deep inputs β€” genuinely controls GC (F11).
  • Render-on-demand decouples idle FPS from active cost: needsRender flag means a settled graph idles near-free, and SELF.md honestly separates the inflated idle 60fps from the real workload() number instead of gaming the metric.
  • Edge rendering is two-pass batched: one beginPath/stroke for all 10k dim edges, a second path only for highlighted edges β€” keeps the whole frame to ~1 stroke + ~N_group fills.
CODEX

CODEX

8.61
OpenAI gpt-5.5 Β· Codex CLI
build βœ“πŸŸ’ rendersfeat 100.0%34 fps51 KB gz10 MBpivots 4.0⏱ 20m46sbaseline/v1-baseline
Tier A Β· objective  9.28
Correct10.0
Clean10.0
Struct8.85
Speed6.71
Memory10.0
Tier B Β· panel  7.8
UniquePlanAdhereReasonResearchOverall
Unique insights
  • Two distinct spatial structures: a clean pure-TS QuadTree in graph-core.ts for the acceptance contract, plus a separate flat typed-array BarnesHutTree (SoA: centerX/massX/firstChild Int32Array) used in the hot sim loop β€” avoids object-per-node GC churn.
  • CSR adjacency layout (adjacencyStarts + adjacency Uint32Array built via cursor scatter) gives O(1) neighbor lookup for hover/click focus instead of scanning edges each frame.
  • Mask-based rendering pipeline: separate visibleMask/searchMask/focusMask Uint8Arrays composed per-frame, with edges dimmed (rgba alpha drop) when focus is active for visual emphasis.
  • Seeded deterministic graph generation (mulberry32) with clustered communities placed on a ring + intra/inter-group edge mix (~82% local), plus guaranteed ring spanning edges and a fallback filler to always hit the edge target.
OPENCODE

OPENCODE

8.56
minimax-m3 Β· OpenCode (isolated PTY)
build βœ“πŸŸ’ rendersfeat 100.0%40 fps5 KB gz10 MBpivots 2.0⏱ 12m46sbaseline/v1-baseline
Tier A Β· objective  9.06
Correct10.0
Clean7.97
Struct9.23
Speed7.45
Memory10.0
Tier B Β· panel  7.95
UniquePlanAdhereReasonResearchOverall
Unique insights
  • Deliberately maintains TWO quadtrees with a documented rationale: the contract-pure QuadTree (range query, for the hidden acceptance suite) vs an internal BHNode tree carrying mass/center-of-mass for Barnes-Hut force approx β€” avoids contorting the public contract to fit the sim.
  • neighbors() memoizes adjacency by stashing a __adj Map on the graph object (cast through Graph & {__adj?}), so repeated lookups are O(1) after first build while keeping the function signature pure.
  • Renderer does cheap edge culling (skip if BOTH endpoints off-screen) and node culling with a 5px margin, plus zoom-dependent radii (3.0/sqrt(zoom)) β€” keeps draw cost bounded when zoomed in.
  • zoomAt recomputes cam.cx/cy after clamping zoom so the world point truly stays under the cursor even at the 0.01/200 zoom limits, rather than naive pre-clamp math.

Brief 02 β€” Sheet Engine Β· πŸ₯‡ CLAUDE

correctness + architecture

A mini-spreadsheet with a real formula parser, dependency graph, and incremental recalc. Almost entirely testable.

CLAUDE

CLAUDE

9.34
Anthropic Opus 4.8 Β· Claude Code
build βœ“πŸŸ’ rendersfeat 92.9%60 fps6 KB gz10 MBpivots 2.0⏱ 13m51sbaseline/v1-baseline
Tier A Β· objective  9.94
Correct9.79
Clean10.0
Struct10.0
Speed10.0
Memory10.0
Tier B Β· panel  8.6
UniquePlanAdhereReasonResearchOverall
Unique insights
  • Functions receive UNEVALUATED arg nodes plus a FuncApi (scalar/collect), so IF/AND/OR genuinely short-circuit and untaken branches never evaluate β€” most agents eagerly evaluate args.
  • Incremental recalc is correctly scoped: collectAffected() walks only the transitive dependent subgraph from the edited seed, then Kahn topo-sorts just that subgraph (O(V+E)); leftover non-zero in-degree cells become #CIRC! β€” cycle detection falls out of the topo sort for free, no separate DFS.
  • Single overlay <input> editor for the whole 100Γ—100 grid (not 10,000 inputs) positioned via getBoundingClientRect β€” keeps the DOM light and is the key to no-jank.
  • Bijective base-26 column math (colToIndex/indexToCol) handling AA/AB correctly, a common off-by-one trap.
CODEX

CODEX

8.72
OpenAI gpt-5.5 Β· Codex CLI
build βœ“πŸŸ’ rendersfeat 92.9%60 fps6 KB gz10 MBpivots 3.0⏱ 12m54sbaseline/v1-baseline
Tier A Β· objective  9.72
Correct9.79
Clean9.93
Struct8.97
Speed10.0
Memory10.0
Tier B Β· panel  7.5
UniquePlanAdhereReasonResearchOverall
Unique insights
  • Cycle detection uses a full Tarjan SCC (findCycleNodes) restricted to the affected set, marking every node in a multi-node component as #CIRC! and also catching self-references β€” more robust than naive DFS-color cycle checks.
  • Clean split engine architecture (refs/parser/ast/evaluator/graph/engine) with a pure AST type, keeping the public Sheet API thin.
  • Incremental recalc unions previous AND new transitive dependents before recomputing, so cells that *stop* depending on a precedent are still correctly refreshed when a formula changes.
  • Deterministic recalc ordering via compareRefs (row-major) everywhere, giving stable lastRecalculatedRefs() β€” which doubles as the measurement surface for the window.__buildoff workload() perf hook.
OPENCODE

OPENCODE

8.51
minimax-m3 Β· OpenCode (isolated PTY)
build βœ“πŸŸ’ rendersfeat 92.9%60 fps7 KB gz10 MBpivots 5.0⏱ 15m56sbaseline/v1-baseline
Tier A Β· objective  9.33
Correct9.79
Clean7.59
Struct9.38
Speed10.0
Memory10.0
Tier B Β· panel  7.5
UniquePlanAdhereReasonResearchOverall
Unique insights
  • Tarjan SCC over the *induced subgraph* of only the affected cells (graph.findSCCs takes an iterable) rather than scanning the whole sheet β€” cycle detection is O(affected) not O(all cells).
  • Bidirectional dep graph (fwd + rev edge maps) with reference-counted cleanup: empty Sets are deleted from both maps on edge removal, keeping memory tight on large sheets.
  • recalc.ts snapshots the PRE-edit reverse closure before mutating the graph, plus re-adds old cycle members that fell out β€” correctly recomputes cells that stop being circular when a cycle is broken.
  • Two distinct numeric coercions: toNumber treats ''β†’0 while toNumericValue treats ''β†’null, so SUM/AVG/MIN/MAX skip empty cells instead of counting them as zero β€” a subtle correctness detail many implementations get wrong.

Brief 03 β€” Notes App Β· πŸ₯‡ CLAUDE

state + a11y

Local-first markdown notes with ranked full-text search, persistence, and real accessibility.

CLAUDE

CLAUDE

9.29
Anthropic Opus 4.8 Β· Claude Code
build βœ“πŸŸ’ rendersfeat 100.0%60 fps7 KB gz10 MBpivots 3.0⏱ 44m55sbaseline/v1-baseline
Tier A Β· objective  9.97
Correct10.0
Clean10.0
Struct9.85
Speed10.0
Memory10.0
Tier B Β· panel  8.45
UniquePlanAdhereReasonResearchOverall
Unique insights
  • Inverted index intersects posting lists starting from the RAREST term (lists.sort by size, seed from smallest) so search cost scales with match count, not corpus size β€” genuinely the 'real index' F10 asks for, not a linear scan.
  • monotonicNow() clock guarantees `updated` never repeats within the same ms, making all()/recency ordering deterministic and stable β€” caught a real test-flake edge case others would miss (documented as a PLAN deviation).
  • XSS-safe markdown renderer parses structure from raw source but escapes every user-text emit point; link hrefs are allowlisted to ^(https?:|mailto:|#|/) and anything else collapses to '#', so javascript: URLs are neutralized.
  • Inline-code is extracted to private-use-area sentinel placeholders (\uE000) before other inline rules run, so backtick contents never get re-parsed as bold/italic/links β€” a correctness subtlety most tiny renderers get wrong.
CODEX

CODEX

8.81
OpenAI gpt-5.5 Β· Codex CLI
build βœ“πŸŸ’ rendersfeat 100.0%60 fps8 KB gz10 MBpivots 3.0⏱ 17m16sbaseline/v1-baseline
Tier A Β· objective  9.84
Correct10.0
Clean10.0
Struct9.21
Speed10.0
Memory10.0
Tier B Β· panel  7.55
UniquePlanAdhereReasonResearchOverall
Unique insights
  • Inverted index intersects AND-terms by seeding from the SHORTEST posting list (sortedPostings sorted by size), so multi-term search cost scales with the rarest term, not total notes (notes-core.ts:251-260).
  • Fully deterministic ranking tie-break chain: score (title*10 + body*2) > updated > title localeCompare > id, guaranteeing stable, reproducible result ordering (notes-core.ts:284-294).
  • Monotonic timestamps: nextUpdated() forces strictly-increasing `updated` via max(now, lastUpdated+1, previous+1), so ordering never collides even on same-ms edits (notes-core.ts).
  • XSS-hardened markdown: safeHref allow-lists protocols, escapeAttribute neutralizes backticks, and the search highlighter escapes every text slice before wrapping matches in <mark> (markdown.ts safeHref, view.ts:228-246).
OPENCODE

OPENCODE

8.79
minimax-m3 Β· OpenCode (isolated PTY)
build βœ“πŸŸ’ rendersfeat 100.0%60 fps9 KB gz10 MBpivots 4.0⏱ 9m11sbaseline/v1-baseline
Tier A Β· objective  9.47
Correct10.0
Clean8.0
Struct9.37
Speed10.0
Memory10.0
Tier B Β· panel  7.95
UniquePlanAdhereReasonResearchOverall
Unique insights
  • Inverted-index AND search sorts posting lists by size and intersects starting from the smallest, minimizing comparison work (store SearchIndex.search).
  • Scoring uses a bounded recency tiebreaker (0..1) that can never outweigh even one extra body hit, so title>body ranking stays correct while newest notes break ties.
  • Markdown renderer stashes inline `<code>` spans behind a unicode sentinel before applying bold/italic regex, preventing formatting from leaking into code spans β€” then restores them.
  • safeHref() whitelists http(s)/mailto/relative/anchor and rejects any other scheme (e.g. javascript:), and all text routes through escapeHTML β€” XSS-safe preview with no raw-HTML echo.

Brief 04 β€” Tower Defense Β· πŸ₯‡ CLAUDE

game systems + perf

A tower-defense game with ECS architecture, A* re-routing, fixed-timestep sim, and 200+ entities at 60fps.

CLAUDE

CLAUDE

9.24
Anthropic Opus 4.8 Β· Claude Code
build βœ“πŸŸ’ rendersfeat 100.0%60 fps7 KB gz10 MBpivots 3.0⏱ 16m17sbaseline/v1-baseline
Tier A Β· objective  9.84
Correct10.0
Clean10.0
Struct9.2
Speed10.0
Memory10.0
Tier B Β· panel  8.5
UniquePlanAdhereReasonResearchOverall
Unique insights
  • engine-core.ts is a genuinely reusable, side-effect-free infra layer (ECS + A*) cleanly separated from game logic β€” the World uses sparse-set Map stores with free-list id recycling and query() iterates the smallest component store for speed.
  • A* runs on flat typed arrays (Int32Array g/cameFrom, Uint8Array closed) with a hand-rolled binary MinHeap using lazy decrease-key β€” stale heap entries filtered by closed-set check on pop.
  • canPlace() does a real wall-off check: temporarily sets the tile blocked, re-runs findPath, and rejects placements that would disconnect spawn from goal β€” towers can never trap enemies.
  • Live re-routing re-snaps in-flight ground enemies to the nearest waypoint of the recomputed path on every place/sell (resnapEnemies/nearestWaypoint), and the author honestly flags the cosmetic stutter this can cause in SELF.md.
CODEX

CODEX

8.8
OpenAI gpt-5.5 Β· Codex CLI
build βœ“πŸŸ’ rendersfeat 96.4%60 fps52 KB gz10 MBpivots 3.0⏱ 17m48sbaseline/v1-baseline
Tier A Β· objective  9.9
Correct9.89
Clean10.0
Struct9.68
Speed10.0
Memory10.0
Tier B Β· panel  7.45
UniquePlanAdhereReasonResearchOverall
Unique insights
  • engine-core A* uses a hand-written binary MinHeap plus typed-array scratch buffers (Float64Array gScore, Int32Array cameFrom, Uint8Array closed) for allocation-free pathfinding.
  • World.query optimizes by sorting candidate component maps by size and iterating the smallest set first, cutting intersection cost.
  • Tower placement runs a trial A* on a cloned blocked grid and rejects any placement that would fully seal the route ('That would block every route.').
  • pathVersion counter lets enemies lazily reroute only when the global path changes, mid-path from their current cell, instead of every tick.
OPENCODE

OPENCODE

9.05
minimax-m3 Β· OpenCode (isolated PTY)
build βœ“πŸŸ’ rendersfeat 92.9%60 fps7 KB gz10 MBpivots 3.0⏱ 8m42sbaseline/v1-baseline
Tier A Β· objective  9.82
Correct9.79
Clean9.89
Struct9.54
Speed10.0
Memory10.0
Tier B Β· panel  8.1
UniquePlanAdhereReasonResearchOverall
Unique insights
  • Binary MinHeap with explicit (priority, tie) comparator gives deterministic A* tie-break biased up-left β€” most agents use array.sort or no tie-break.
  • query() sorts component maps by size and iterates the smallest archetype first, minimizing membership checks β€” a real ECS perf optimization, not just naive intersection.
  • Keeps both a boolean[][] `blocked` and a parallel Uint8Array `blockedU8` plus a `pathCells` Uint8Array so placement validity (occupied vs on-path) is an O(1) typed-array lookup in the render/hover hot path.
  • Self-contained perf probe runs engine-core directly, tops up to 200 live enemies each tick, and also spawns projectiles + touches nearestInRange to keep the full API exercised under load.

Brief 05 β€” Pipeline Tool Β· πŸ₯‡ CLAUDE

graph architecture + UX

A visual node/dataflow editor: drag-connect nodes, prevent cycles, evaluate the graph live.

CLAUDE

CLAUDE

9.22
Anthropic Opus 4.8 Β· Claude Code
build βœ“πŸŸ’ rendersfeat 89.3%60 fps7 KB gz10 MBpivots 2.0⏱ 12m50sbaseline/v1-baseline
Tier A Β· objective  9.69
Correct9.68
Clean10.0
Struct8.92
Speed10.0
Memory10.0
Tier B Β· panel  8.65
UniquePlanAdhereReasonResearchOverall
Unique insights
  • Strict modelβ†’topoβ†’view layering: dag.ts is fully DOM-free and the UI re-uses the exact same cycle detector for wire refusal (wouldCreateCycle clones + appends edge + hasCycle) so editor and evaluator can never disagree.
  • hasCycle uses iterative white/grey/black DFS with an explicit frame stack specifically to avoid stack overflow on deep graphs β€” and topoSort de-dups inputs so a node wired twice to one source counts as a single edge.
  • layout.ts is a single source of truth for node/port geometry shared by both renderer and hit-testing, preventing visual/interaction drift.
  • Renderer emits a wide invisible 'wire-hit' stroke for forgiving wire clicks plus zoom-aware PORT_SNAP with snap highlight β€” strong legibility polish.
CODEX

CODEX

8.61
OpenAI gpt-5.5 Β· Codex CLI
build βœ“πŸŸ’ rendersfeat 96.4%60 fps8 KB gz10 MBpivots 1.0⏱ 16m19sbaseline/v1-baseline
Tier A Β· objective  9.36
Correct9.89
Clean8.0
Struct8.95
Speed10.0
Memory10.0
Tier B Β· panel  7.7
UniquePlanAdhereReasonResearchOverall
Unique insights
  • checkConnection rebuilds the graph WITHOUT the target input slot before calling wouldCreateCycle, so re-wiring an already-occupied port is correctly evaluated against the post-replacement graph rather than spuriously refused.
  • Defense-in-depth on cycles: UI refuses via wouldCreateCycle AND import intentionally allows cyclic JSON through so the evaluator surfaces the cycle instead of silently rewriting user data (evaluateDocument catches the topoSort throw and returns an error string).
  • graphToNodeDefs maps unconnected input slots to sentinel ids (`__missing_<node>_<idx>`) so positional argument order is preserved for ops like sub/clamp instead of inputs collapsing/shifting.
  • buildIndexes dedupes repeated input edges (seenInputs) so a node wired twice from the same source doesn't double-count indegree and corrupt Kahn's topo sort.
build failed

OPENCODE

5.85
minimax-m3 Β· OpenCode (isolated PTY)
build βœ—πŸ”΄ crashπŸ”₯ crashes tabfeat 85.7%fps β€”9 KB gzheap β€”pivots 2.0⏱ β€”baseline/v1-baseline
Tier A Β· objective  5.89
Correct0.0
Clean10.0
Struct4.47
Speed10.0
Memory10.0
Tier B Β· panel  5.8
UniquePlanAdhereReasonResearchOverall
Unique insights
  • dag.ts uses an iterative DFS with explicit color map for hasCycle to avoid stack overflow on large graphs, and Kahn's algorithm for topoSort (tested on a 300-node chain).
  • wouldCreateCycle is reachability-from-`to` instead of a full re-cycle-check, an efficient O(V+E) UI guard against closing loops.
  • Wire hit-testing samples the cubic bezier (24 segments) and computes point-to-segment distance, giving accurate click targets plus a separate wide invisible hit-path overlay.
  • Zoom is cursor-anchored: screenToWorld is used to keep the point under the pointer fixed while scaling, and the grid dot pattern rescales with zoom to stay legible.

Brief 06 β€” Falling Leaves Β· πŸ₯‡ CLAUDE

ambient background

Self-playing autumn tree with drifting leaves β€” a beautiful, lightweight, autoplaying site background.

CLAUDE

CLAUDE

9.11
Anthropic Opus 4.8 Β· Claude Code
build βœ“πŸŸ’ rendersfeat 96.4%58 fps5 KB gz10 MBpivots 2.0⏱ 10m38sbaseline/v1-baseline
Tier A Β· objective  9.53
Correct9.89
Clean8.0
Struct10.0
Speed9.74
Memory10.0
Tier B Β· panel  8.59
UniquePlanAdhereReasonResearchOverall
Unique insights
  • Dedicated RNG streams per subsystem (seed^0x51ed for wind, seed^0x9e37 for tree, seed for leaves) so a resize rebuilds an identical tree without desyncing the leaf pool β€” clean separation of determinism (app.ts buildTree).
  • Quantized + cached leafColor strings (64x64 grid of rgba arrays) so hot frames never allocate color strings β€” pairs with the typed-array pool to genuinely hit zero per-frame allocation (palette.ts).
  • Atmospheric perspective done via per-layer desaturation toward a HAZE tone + alpha + size instead of real blur β€” a deliberate, documented perf trade that still reads as depth (config LAYERS, palette HAZE).
  • Depth sorting splits the draw into back-layer leaves behind the blitted tree and mid/front leaves over it, giving real occlusion against the tree (renderer.frame drawLeaves with want predicate).
does not render

CODEX

6.74
OpenAI gpt-5.5 Β· Codex CLI
build βœ“πŸ”΄ crashπŸ”₯ crashes tabfeat 96.4%0 fps6 KB gz10 MBpivots 2.0⏱ 34m00sbaseline/v1-baseline
Tier A Β· objective  6.24
Correct0.99
Clean10.0
Struct9.23
Speed4.0
Memory10.0
Tier B Β· panel  7.35
UniquePlanAdhereReasonResearchOverall
Unique insights
  • Two-canvas architecture: the sky/tree/vignette backdrop is rasterized once into an offscreen canvas and blitted via drawImage each frame (render()), so only leaves are re-drawn per frame β€” big per-frame cost savings.
  • ParticlePool is a true slab allocator: SoA Float32Arrays plus an Int32Array free-stack (free/freeTop) giving O(1) spawn/kill with zero per-frame allocation.
  • Settled leaves use a fixed-size ring buffer (settledCursor % SETTLED_CAPACITY, capped settledCount) so ground accumulation never grows unbounded.
  • Fake-3D tumble via ctx.scale(scale*wobble, scale) horizontal wobble combined with rotation+spin, not just rotation.
OPENCODE

OPENCODE

8.73
minimax-m3 Β· OpenCode (isolated PTY)
build βœ“πŸŸ’ rendersfeat 96.4%59 fps5 KB gz10 MBpivots 3.0⏱ 12m02sbaseline/v1-baseline
Tier A Β· objective  9.47
Correct9.89
Clean8.0
Struct9.59
Speed9.89
Memory10.0
Tier B Β· panel  7.82
UniquePlanAdhereReasonResearchOverall
Unique insights
  • ParticlePool uses an intrusive free-list (Int16Array freeNext + freeHead, ALIVE=-2 sentinel) so spawn()/kill() are O(1) with zero scanning β€” more sophisticated than a simple aliveCount cursor.
  • Per-channel clamp added inside cosinePalette() after discovering autumn coefficients exceeded 1.0 in red at tβ‰ˆ0 (documented as a deviation), preventing rgba() overflow artifacts.
  • Tree leaves are pre-baked into an offscreen canvas (220 static cluster leaves + tapered branches via per-end lineWidth) so the tree costs one drawImage per frame β€” true zero per-frame allocation.
  • speedFactor uses pow(depth,1.5) (0.4^1.5 back, 1.8^1.5 front) for perceptually-tuned non-linear parallax rather than naive linear depth scaling.

Brief 07 β€” Lava Flow Β· πŸ₯‡ CLAUDE

ambient background

Self-playing molten lava β€” domain-warped FBM, hot color ramp, additive embers, glowing on dark.

CLAUDE

CLAUDE

9.21
Anthropic Opus 4.8 Β· Claude Code
build βœ“πŸŸ’ rendersfeat 100.0%44 fps7 KB gz10 MBpivots 2.0⏱ 18m13sbaseline/v1-baseline
Tier A Β· objective  9.65
Correct10.0
Clean10.0
Struct10.0
Speed7.64
Memory10.0
Tier B Β· panel  8.67
UniquePlanAdhereReasonResearchOverall
Unique insights
  • Dual-renderer: raw-WebGL1 GLSL lava with a CPU Canvas2D FBM fallback (createGlLavaβ†’createCanvasLava) so it never blanks on missing WebGL.
  • Embers spawn from CPU-sampled HOT regions of the SAME heat() field the shader draws (shared scene-core math), so embers rise out of bright lava rather than at random.
  • SoA ParticlePool: 8 parallel Float32Arrays + Uint8Array alive flags + Int32Array free-list stack β€” allocation-free spawn/kill recycling.
  • Crust cracks reuse the already-computed slow flow field for the iso-line (no extra FBM evaluation) β€” perf-aware shader budgeting (2+4 octaves, single-octave shimmer).
build failed

CODEX

2.47
OpenAI gpt-5.5 Β· Codex CLI
build βœ—πŸŸ’ rendersπŸ”§ 1 fix-it roundfeat 42.9%fps β€”β€” gzheap β€”pivots 0.0⏱ 62m21sbaseline/v1-baseline
Tier A Β· objective  0.16
Correct0.0
Clean0.0
Struct0.56
Speedβ€”
Memoryβ€”
Tier B Β· panel  6.4
UniquePlanAdhereReasonResearchOverall
Unique insights
  • Nearest-sample field reconstruction: renderer scatters ~180 samples then rebuilds the coarse pixel grid via per-cell nearest-neighbor reduce over the sample list (renderer.ts) β€” distinctive, though O(cells Γ— samples) and inefficient.
  • Domain warping done explicitly: detail turbulence is offset by wideFlow (`turbulence(nx*2.4 + wideFlow, ...)`), feeding low-frequency flow into high-frequency detail.
  • Triangle-wave 'folded river' veins via 1-abs(sin(...)) plus pow shaping create sharp incandescent lava channels rather than blobby noise.
  • Three hand-tuned palettes (magma/ember/sulfur) wired to a segmented control, plus a live heat-intensity slider and pause toggle with aria labels.
OPENCODE

OPENCODE

8.08
minimax-m3 Β· OpenCode (isolated PTY)
build βœ“πŸŸ’ rendersfeat 96.4%26 fps6 KB gz10 MBpivots 2.0⏱ 15m38sbaseline/v1-baseline
Tier A Β· objective  7.99
Correct5.89
Clean10.0
Struct9.9
Speed4.95
Memory10.0
Tier B Β· panel  8.19
UniquePlanAdhereReasonResearchOverall
Unique insights
  • CPU/GPU heat-field parity: lava-ramp.ts deliberately mirrors the GLSL warpedFbm (same 4 octaves, lacunarity 2.0, gain 0.5) so embers spawn exactly where the GPU renders hot regions (sampleHeat -> ember-spawn).
  • ParticlePool is a true SoA ring buffer (8 parallel Float32Arrays, head index overwrites oldest), with a fixed-substep update (<=1/30s, capped at 1s) so a long/restored frame can't teleport particles.
  • Two-pass WebGL2 with offscreen FBO -> 5-tap cross-blur bloom program with a highlight mask (max(sum-threshold,0)), giving real additive glow rather than just a per-pixel power curve.
  • Big Triangle fullscreen trick (FULLSCREEN_TRIANGLE, single draw, no index buffer) plus DPR cap at 2 for a lean GPU path.

Brief 08 β€” Geometric Flow Β· πŸ₯‡ CLAUDE

ambient background

Self-playing generative geometric art driven by a noise flow field β€” slow, hypnotic, designed.

CLAUDE

CLAUDE

8.67
Anthropic Opus 4.8 Β· Claude Code
build βœ“πŸŸ’ rendersfeat 100.0%60 fps4 KB gz10 MBpivots 4.0⏱ 10m34sbaseline/v1-baseline
Tier A Β· objective  8.74
Correct6.0
Clean10.0
Struct9.71
Speed10.0
Memory10.0
Tier B Β· panel  8.59
UniquePlanAdhereReasonResearchOverall
Unique insights
  • RenderTarget interface (narrow CanvasRenderingContext2D subset) lets the entire renderer be unit-tested with a fake ctx β€” no jsdom; clean separation of sim from DOM.
  • Strength scalar sampled from a SECOND, slower/decorrelated potential read (0.5x scale, 0.6x time) so color/weight aren't perfectly locked to direction β€” avoids a mechanical look.
  • Structure-of-Arrays ParticlePool with explicit free-list stack (Int32Array) ordered so spawn() yields 0,1,2... β€” GC-quiet, cache-friendly, capacity-validated.
  • 24-step prime before first paint so the autoplay gate sees an already-composed, moving frame instead of an empty canvas.
build failed

CODEX

2.81
OpenAI gpt-5.5 Β· Codex CLI
build βœ—πŸ”΄ crashπŸ”§ 1 fix-it roundfeat 96.4%fps β€”β€” gzheap β€”pivots 2.0⏱ 83m31sbaseline/v1-baseline
Tier A Β· objective  0.16
Correct0.0
Clean0.0
Struct0.56
Speedβ€”
Memoryβ€”
Tier B Β· panel  7.17
UniquePlanAdhereReasonResearchOverall
Unique insights
  • ParticlePool uses an O(1) free-list stack allocator (Int32Array `free` + `freeTop`) for spawn/kill β€” real pooling, not just a sized array.
  • 4-fold rotational kaleidoscope via mirroredPoint() with quarter-turn rotation, drawn under 'lighter' (additive) composite for glow β€” F11 satisfied two ways (curl noise + symmetry).
  • Fixed-timestep accumulator loop with spiral-of-death guard (steps<4, then accumulator=0) keeps sim stable under frame drops.
  • Reduced-motion path prewarms 180 frames before the single static render, producing a fully-evolved composition rather than a sparse first frame.
OPENCODE

OPENCODE

8.23
minimax-m3 Β· OpenCode (isolated PTY)
build βœ“πŸŸ’ rendersfeat 100.0%52 fps4 KB gz10 MBpivots 3.0⏱ 8m15sbaseline/v1-baseline
Tier A Β· objective  8.34
Correct6.0
Clean10.0
Struct8.58
Speed8.8
Memory10.0
Tier B Β· panel  8.09
UniquePlanAdhereReasonResearchOverall
Unique insights
  • Ribbon trails are faked without per-particle history: each frame draws a short tangent segment along the local angle under additive 'lighter' blend, so silky streaks emerge from the composite instead of storing position arrays β€” cheap memory, good look.
  • Kaleidoscope sample point is rotated about center by slowRotation (sin-based, non-accumulating) so the mandala wanders but never spins fully around β€” deliberate 'ambient not frantic' design (F10).
  • Center pull force (pull = 6/(1+d*0.012)) keeps particles composed toward the middle with rim respawn, making negative space structural rather than accidental.
  • ParticlePool is genuine SoA with a ring-buffer spawn index and O(1) aliveCount maintenance; forEachAlive/update are allocation-free, and pre-allocated scratchX/scratchY Float64Arrays sized to sector count avoid per-particle GC in the mirror loop.

Brief 09 β€” Open-Ended 3D Β· πŸ₯‡ CLAUDE

freeform 3D showcase

Total creative freedom: build the most impressive self-playing 3D scene you can imagine.

CLAUDE

CLAUDE

9.4
Anthropic Opus 4.8 Β· Claude Code
build βœ“πŸŸ’ rendersπŸŒ‘ too darkfeat 92.9%47 fps6 KB gz10 MBpivots 2.0⏱ 18m21sbaseline/v1-baseline
Tier A Β· objective  9.64
Correct9.79
Clean10.0
Struct10.0
Speed8.02
Memory10.0
Tier B Β· panel  9.1
UniquePlanAdhereReasonResearchOverall
Unique insights
  • Stars stored as orbital params (radius, baseAngle, angularSpeed) and spun entirely GPU-side in STAR_VS from a single u_time clock β€” zero per-frame CPU buffer churn; differential rotation (orbitalSpeed = 1.6/(r+core)) gives a real flat rotation curve.
  • Nebula baked once into a 384x384 MIRRORED_REPEAT texture and UV-scrolled, instead of a per-frame fbm fragment shader β€” deliberate perf pivot documented in SELF.md (~15fps win on software raster).
  • Comets reuse the exact star attribute layout (STAR_STRIDE) and the star program via packComets() β€” speed=0 packs them as fixed-position points, no second pipeline.
  • Real separable bloom chain (bright-pass -> 9-tap Gaussian -> Reinhard-ish tone map + vignette) with a graceful direct-to-screen fallback if FBOs are incomplete (createRenderTarget completeness check) so it can never blank.
CODEX

CODEX

8.45
OpenAI gpt-5.5 Β· Codex CLI
build βœ“πŸŸ’ rendersfeat 92.9%21 fps7 KB gz10 MBpivots 2.0⏱ 16m54sbaseline/v1-baseline
Tier A Β· objective  8.92
Correct9.79
Clean10.0
Struct9.28
Speed4.2
Memory10.0
Tier B Β· panel  7.88
UniquePlanAdhereReasonResearchOverall
Unique insights
  • True hardware instancing: createCrystalInstances packs a 19-float per-instance record (center, 3 orthonormal basis vectors, scale, color, phase) and resources.ts wires it with vertexAttribDivisor(1) β€” one mesh, 740 oriented shards in a single draw.
  • Fibonacci-sphere distribution (golden-angle theta + per-instance jitter) for organic, non-gridded shard placement, with a per-instance orthonormal basis built via cross products so each crystal points outward.
  • Offscreen framebuffer render target (renderTarget.ts) with depth renderbuffer, then a fullscreen post pass β€” proper two-stage pipeline, not draw-to-default-buffer.
  • Renders the post pass with a single oversized triangle (a_position [-1,-1, 3,-1, -1,3]) instead of a quad β€” classic fullscreen-triangle trick, avoids the diagonal seam.
build failed

OPENCODE

3.67
minimax-m3 Β· OpenCode (isolated PTY)
build βœ—πŸ”΄ crashπŸ”§ 1 fix-it roundfeat 0.0%fps β€”β€” gzheap β€”pivots —⏱ 134m58sbaseline/v1-baseline
Tier A Β· objective  5.54
Correct0.0
Clean8.0
Struct4.69
Speed10.0
Memory10.0
Tier B Β· panel  2.5
UniquePlanAdhereReasonResearchOverall
Unique insights
  • Build aborted before any code: agent auto-rejected the external_directory permission needed to read shared-configs, so no PLAN.md, src/, or package.json was ever created.
  • Failure mode is a tooling/permission policy issue, not a coding deficiency β€” the agent never reached the implementation stage.
  • Empty source means zero evidence for every checklist item; all 14 features are misses by absence rather than by attempt.

Brief 10 β€” Aurora Sky Β· πŸ₯‡ CLAUDE

ambient background

Self-playing aurora curtains β€” anisotropic FBM, OKLCH color drift, breathing light over a night horizon.

CLAUDE

CLAUDE

9.07
Anthropic Opus 4.8 Β· Claude Code
build βœ“πŸŸ’ rendersfeat 100.0%35 fps7 KB gz10 MBpivots 3.0⏱ 17m14sbaseline/v1-baseline
Tier A Β· objective  9.41
Correct10.0
Clean10.0
Struct9.84
Speed6.31
Memory10.0
Tier B Β· panel  8.65
UniquePlanAdhereReasonResearchOverall
Unique insights
  • ParticlePool is a real struct-of-arrays typed-array pool with free-list recycling (Int32 stack, Uint8 alive flags) β€” zero per-frame allocation on the meteor render path, not just a contract stub.
  • Documented perf deviation for the scored env: resolution cap (<=680px backing, CSS-upscaled) + octaves 5->3 took software WebGL from ~5.5fps to ~50fps with no visible quality loss.
  • Full-screen triangle generated from gl_VertexID (no vertex buffers / VAO data) β€” minimal GL state, lean bundle (~7KB gzip).
  • Canvas2D renderer serves double duty: the prefers-reduced-motion static frame AND the no-WebGL fallback, and it genuinely exercises core noise2D/cosinePalette on a 150px buffer.
CODEX

CODEX

8.45
OpenAI gpt-5.5 Β· Codex CLI
build βœ“πŸŸ’ rendersπŸŒ‘ too darkfeat 100.0%6 fps6 KB gz10 MBpivots 4.0⏱ 17m29sbaseline/v1-baseline
Tier A Β· objective  8.96
Correct10.0
Clean10.0
Struct9.31
Speed4.0
Memory10.0
Tier B Β· panel  7.83
UniquePlanAdhereReasonResearchOverall
Unique insights
  • Dual OKLCH color pipeline: full oklch->linear->sRGB matrix implemented in BOTH scene-core.ts and GLSL, keeping CPU core and GPU shader in perceptual parity
  • mixOklch takes the shortest hue path via wrapped hueDelta (mod(b.z-a.z+PI,TAU)-PI) in both TS and shader, avoiding hue-wrap artifacts in the gradient
  • Below-horizon reflection field: mirrors curtains with rippled UV (rippledUv) and widened bands (width*1.22) gated by a reflectionMask for a water-like emissive reflection
  • Seed accepts numeric OR string: parseSeed falls back to FNV-1a hashSeed for non-numeric ?seed= values
build failed

OPENCODE

3.12
minimax-m3 Β· OpenCode (isolated PTY)
build βœ—πŸŸ’ rendersπŸŒ‘ too darkfeat 100.0%fps β€”β€” gzheap β€”pivots 3.0⏱ 12m46sbaseline/v1-baseline
Tier A Β· objective  0.0
Correct0.0
Cleanβ€”
Structβ€”
Speedβ€”
Memoryβ€”
Tier B Β· panel  6.93
UniquePlanAdhereReasonResearchOverall
Unique insights
  • scene-core.ts ships a full ParticlePool slab allocator (Float32Array data + Uint8Array alive + life array, spawn/release/aliveCount) to avoid per-frame GC churn β€” far beyond the contract minimum.
  • Renderer returns null when WebGL2 is unavailable and main.ts degrades to a CSS night-sky fallback while still installing a no-op __buildoff hook so the harness probe never crashes.
  • TIME_STEP_CLAMP_S (1/30) caps per-frame dt so a backgrounded-then-resumed tab can't jump the animation forward, plus an EMA-smoothed FPS counter.
  • Per-octave 2x2 rotation matrix in fbm() to hide axis-aligned lattice bias, and pow(noise,2) ribbon sharpening for crisper curtain streaks.

Brief 11 β€” Flow Field Β· πŸ₯‡ CODEX

ambient background

Self-playing particle flow field / fireflies β€” curl-noise streams, trails, disciplined palette.

CLAUDE

CLAUDE

8.43
Anthropic Opus 4.8 Β· Claude Code
build βœ“πŸŸ’ rendersπŸŒ‘ too darkfeat 100.0%60 fps4 KB gz10 MBpivots 1.0⏱ 9m12sbaseline/v1-baseline
Tier A Β· objective  8.66
Correct6.0
Clean10.0
Struct9.28
Speed10.0
Memory10.0
Tier B Β· panel  8.15
UniquePlanAdhereReasonResearchOverall
Unique insights
  • Documented velocity-normalization pivot: differentiates ψ in NOISE space (eps/scale) so gradient is order-1 (~36px/s) instead of microscopic ~0.06px/s β€” caught via the pixel-based autoplay gate, not just a passing float test.
  • Ships a divergence() probe used in tests to prove incompressibility (2e-14), turning a visual claim into a verified invariant.
  • Velocity inertia easing (k=0.08 toward field) yields silky non-jittery paths instead of instantaneous field-snapping.
  • Per-frame rebuilt color LUT (48 entries) keyed by speed avoids per-particle cosinePalette calls; halo bloom gated to depth>0.62 keeps the costly second arc off most particles.
CODEX

CODEX

9.02
OpenAI gpt-5.5 Β· Codex CLI
build βœ“πŸŸ’ rendersπŸŒ‘ too darkfeat 100.0%60 fps5 KB gz10 MBpivots 4.0⏱ 14m01sbaseline/v1-baseline
Tier A Β· objective  9.76
Correct10.0
Clean10.0
Struct8.79
Speed10.0
Memory10.0
Tier B Β· panel  8.12
UniquePlanAdhereReasonResearchOverall
Unique insights
  • Fixed-timestep accumulator loop (FIXED_DT 1/60, max 4 substeps, MAX_FRAME_DT clamp) decouples simulation from framerate for stable motion after tab-switch stalls.
  • Cached/quantized palette: updatePalette only rebuilds the 56-step LUT when Math.floor(time*9) ticks, then strokes index into precomputed rgb strings β€” avoids per-particle color math.
  • True curl noise: scalarPotential layered at 3 octaves, then central-difference partials give a divergence-free field (no sources/sinks), the correct flow-field formulation many agents fake.
  • ParticlePool uses a separate free-index stack (Uint32Array + freeTop) for O(1) spawn/kill with no scan, plus capacity-integer validation in the constructor.
OPENCODE

OPENCODE

7.62
minimax-m3 Β· OpenCode (isolated PTY)
build βœ“πŸŸ’ rendersfeat 96.4%7 fps4 KB gz10 MBpivots 5.0⏱ 8m11sbaseline/v1-baseline
Tier A Β· objective  7.39
Correct5.89
Clean10.0
Struct7.62
Speed4.0
Memory10.0
Tier B Β· panel  7.9
UniquePlanAdhereReasonResearchOverall
Unique insights
  • curlNoise2D writes velocity into a caller-supplied out:[number,number] array β€” zero allocation in the per-frame integration loop
  • Perlin permutation table is cached per-seed in a Map (permCache) and reused; exposes resetNoiseCache() purely for test isolation
  • Fixed-timestep accumulator integration (FIXED_DT 1/60) with MAX_STEPS_PER_FRAME=5 spiral-of-death guard and accumulator reset on overflow
  • Pre-rendered 32px glow sprite (buildGlowSprite) for cheap additive-glow trails instead of per-particle radial gradients

Brief 12 β€” Procedural Planet Β· πŸ₯‡ CLAUDE

freeform 3D

A self-rotating procedural planet/solar system β€” lit geometry, atmosphere, stars.

CLAUDE

CLAUDE

8.79
Anthropic Opus 4.8 Β· Claude Code
build βœ“πŸŸ’ rendersfeat 100.0%44 fps9 KB gz10 MBpivots 4.0⏱ 31m08sbaseline/v1-baseline
Tier A Β· objective  9.5
Correct10.0
Clean10.0
Struct9.29
Speed7.64
Memory10.0
Tier B Β· panel  7.92
UniquePlanAdhereReasonResearchOverall
Unique insights
  • Entire scene is one fragment shader analytically ray-tracing spheres over a single fullscreen triangle synthesized from gl_VertexID β€” no meshes, no buffers, no post passes, one draw call.
  • Pixel-budget downscale (MAX_PIXELS ~280k) + dynamic octave caps so even SwiftShader/software-GL holds ~60fps; CSS upscales the smooth result invisibly β€” a perf concern most agents ignore.
  • Moon is ray-intersected in its own local frame with correct front/occlusion ordering against the planet, and it occludes the atmospheric halo (tMoon checks in the halo pass).
  • Documented deviation from PLAN.md: dropped a gl_VertexID varying that mis-interpolated on SwiftShader in favor of gl_FragCoord/uResolution β€” caught and fixed a real backend portability bug.
CODEX

CODEX

8.08
OpenAI gpt-5.5 Β· Codex CLI
build βœ“πŸŸ’ rendersπŸŒ‘ too darkfeat 96.4%17 fps7 KB gz10 MBpivots 2.0⏱ 13m34sbaseline/v1-baseline
Tier A Β· objective  8.74
Correct9.89
Clean10.0
Struct8.36
Speed4.0
Memory10.0
Tier B Β· panel  7.27
UniquePlanAdhereReasonResearchOverall
Unique insights
  • Day-side ocean specular highlight masked to ocean only via reflect()/pow(92.0) and oceanMask β€” gives a believable sun-glint on water (planetFragmentShader).
  • Night side renders procedural city lights: cityMask = smoothstep(cityNoise) * smoothstep(roadNoise), gated to land and away from high latitudes, blended into nightColor β€” a polish detail most agents skip.
  • Biome ramp blends 8 named colors (deepOceanβ†’shelfβ†’coastβ†’forest/moss/dry/rockβ†’ice) driven by height, latitude AND a separate moisture fbm channel for varied lowland terrain.
  • Ridge noise via 1-abs(2*fbm-1) for mountain ridges plus a small time-perturbed detail octave, layered over continental fbm.
OPENCODE

OPENCODE

7.43
minimax-m3 Β· OpenCode (isolated PTY)
build βœ“πŸŸ’ rendersπŸŒ‘ too darkfeat 96.4%60 fps0 KB gz10 MBpivots 3.0⏱ β€”baseline/v1-baseline
Tier A Β· objective  8.08
Correct4.95
Clean8.0
Struct9.96
Speed10.0
Memory10.0
Tier B Β· panel  6.64
UniquePlanAdhereReasonResearchOverall
Unique insights
  • Moon is a full second CelestialBody reusing the planet pipeline with a forced rocky-gray palette and real orbital motion (orbitRadius/orbitSpeed/orbitPhase), not a flat disc.
  • Seed parsing is robust: numeric seeds coerced to uint32, non-numeric strings FNV-1a hashed into a 32-bit seed, with fallback to avoid zero seed.
  • Dual pause mechanism β€” both visibilitychange AND IntersectionObserver(threshold 0.01) pause/resume the RAF loop independently.
  • Always paints one frame before deciding on RAF, so reduced-motion path still shows a real rendered planet (probe.markPainted) rather than a blank canvas.

Brief 13 β€” Raymarched SDF Β· πŸ₯‡ CLAUDE

freeform 3D

A raymarched SDF scene β€” signed-distance fields, soft shadows, ambient occlusion, slow camera.

CLAUDE

CLAUDE

8.95
Anthropic Opus 4.8 Β· Claude Code
build βœ“πŸŸ’ rendersfeat 100.0%29 fps6 KB gz10 MBpivots 5.0⏱ 29m47sbaseline/v1-baseline
Tier A Β· objective  9.31
Correct10.0
Clean10.0
Struct10.0
Speed5.37
Memory10.0
Tier B Β· panel  8.52
UniquePlanAdhereReasonResearchOverall
Unique insights
  • CPU camera (camera.ts) is built entirely from the math3d contract and exposes a view-projection matrix purely so it stays unit-testable without a GL context β€” the contract is genuinely used, not just present to satisfy F7.
  • Wall warp samples noise on the circle direction vector (a closed loop in noise space) instead of atan angle, deliberately avoiding the Β±Ο€ seam crack β€” a subtle artifact most would ship.
  • Morphing pods are offset into wall niches (poff) off the tunnel axis the camera rides, so the camera flies past rather than through them β€” caught via a headless screenshot per SELF/REASONING.
  • Full-screen triangle generated from gl_VertexID with an empty VAO β€” zero vertex buffers, zero runtime deps, ~6.6KB gzip bundle.
CODEX

CODEX

8.12
OpenAI gpt-5.5 Β· Codex CLI
build βœ“πŸŸ’ rendersπŸ”₯ crashes tabπŸ”§ 1 fix-it roundfeat 100.0%3 fps6 KB gz10 MBpivots 3.0⏱ 15m56sbaseline/v1-baseline
Tier A Β· objective  8.97
Correct10.0
Clean10.0
Struct9.34
Speed4.0
Memory10.0
Tier B Β· panel  8.19
UniquePlanAdhereReasonResearchOverall
Unique insights
  • softShadow gates specular: highlights multiplied by shadow so they vanish in shade -- physically coherent, easy to miss.
  • Adaptive render scale uses an EMA of frame time (frameMsEma) with a 90-frame warmup before adjusting, avoiding jittery resolution flips.
  • renderAt accepts forcedTimeSeconds; warmup()/workload() drive deterministic probes feeding window.__buildoff metrics (objects/avgFrameMs/renderScale).
  • Domain warp (warpedPoint) + z-axis mod repetition build a real tunnel of distinct material IDs (rings/walls/columns/pearls/strands/shelves), not a lone sphere.
OPENCODE

OPENCODE

7.29
minimax-m3 Β· OpenCode (isolated PTY)
build βœ“πŸŸ’ rendersπŸŒ‘ too darkfeat 96.4%1 fps6 KB gz10 MBpivots 2.0⏱ 9m52sbaseline/v1-baseline
Tier A Β· objective  7.27
Correct5.89
Clean8.0
Struct9.04
Speed4.0
Memory10.0
Tier B Β· panel  7.32
UniquePlanAdhereReasonResearchOverall
Unique insights
  • Fullscreen coverage via gl_VertexID bit-twiddling in the vertex shader (single triangle, zero VBO/VAO attributes) β€” quad.ts even defensively disables attrib arrays.
  • Probe is a live mutable object updated in place, so window.__buildoff.fps always reflects the latest value without re-installing the global.
  • Hardened init: try/catch around Scene boot paints a CSS radial-gradient fallback so a context-loss/shader failure never leaves a blank page (main.ts).
  • Adaptive resolution with hysteresis (drop <45fps for 2s, raise >55fps for 2s, clamped [0.5,1.0]) plus a 0.85 default render-scale and DPR cap.

Brief 14 β€” Mandelbrot Deep-Zoom Β· πŸ₯‡ CLAUDE

algorithm showcase

An autoplaying deep-zoom Mandelbrot explorer β€” smooth iteration coloring, continuous descent into the set.

CLAUDE

CLAUDE

9.26
Anthropic Opus 4.8 Β· Claude Code
build βœ“πŸŸ’ rendersπŸŒ‘ too darkfeat 85.7%55 fps5 KB gz10 MBpivots 2.0⏱ 15m53sbaseline/v1-baseline
Tier A Β· objective  9.75
Correct9.57
Clean10.0
Struct9.94
Speed9.25
Memory10.0
Tier B Β· panel  8.65
UniquePlanAdhereReasonResearchOverall
Unique insights
  • AdaptiveResolution controller (adaptiveResolution.ts) β€” EMA-smoothed, damped, 5-frame-gated dynamic render-scale (min 0.32) that sheds pixels to hold 60fps on software WebGL and claws back to full-res on real GPUs; raised headless fps ~29β†’56.
  • True single-source-of-truth math: CPU reference fractal.ts, the GLSL shader, and palettes.ts/targets.ts data are all shared so 'correct in unit tests' equals 'correct on screen' β€” CPU fallback literally calls smoothIterations/cosinePalette per pixel.
  • dt clamp (Math.min(dt,0.1)) in ZoomController.update prevents a backgrounded tab from fast-forwarding the tour on resume; explicitly unit-tested.
  • Full-screen triangle synthesized from gl_VertexID with an empty VAO β€” no vertex/index buffers, minimal JS-side state per frame.
CODEX

CODEX

8.05
OpenAI gpt-5.5 Β· Codex CLI
build βœ“πŸŸ’ rendersπŸŒ‘ too darkfeat 89.3%3 fps4 KB gz10 MBpivots 2.0⏱ 7m59sbaseline/v1-baseline
Tier A Β· objective  8.67
Correct9.68
Clean10.0
Struct8.31
Speed4.0
Memory10.0
Tier B Β· panel  7.3
UniquePlanAdhereReasonResearchOverall
Unique insights
  • Wall-clock decoupled from rAF: elapsedSeconds + runStartedAt accumulator (cancel/schedule) freezes the tour clock on pause so resuming continues from the same zoom depth rather than jumping.
  • Zoom interpolated in log-space (mix of log(startScale)..log(endScale) then exp) for true constant-rate exponential deep-zoom, with maxIterations scaled by computed zoomDepth (170 + depth*22, capped 620).
  • Dual pause gating: document visibilitychange AND IntersectionObserver both feed one isPaused() predicate that also folds in reduced-motion and user pause.
  • Reduced-motion renders a curated static frame at a fixed mid-tour time (23.5s) instead of the bland t=0 entry view.
build failed

OPENCODE

6.94
minimax-m3 Β· OpenCode (isolated PTY)
build βœ—πŸŸ’ rendersfeat 89.3%fps β€”44 KB gzheap β€”pivots 4.0⏱ 12m05sbaseline/v1-baseline
Tier A Β· objective  6.19
Correct0.0
Clean8.0
Struct7.95
Speed10.0
Memory10.0
Tier B Β· panel  7.85
UniquePlanAdhereReasonResearchOverall
Unique insights
  • Crossfade tour handoff in zoom.ts: blends center+scale to the next leg during the last 30% so the camera never bottoms out into flat color at deep zoom β€” sidesteps the highp float precision wall (~1.5e-6) by moving on before banding shows.
  • isInKnownBulb() cardioid/period-2 bulb test in fractal.ts (cheap early in/out classification) β€” included for unit-testable correctness beyond the required contract.
  • Shader mirrors CPU math line-by-line and adds perceptual polish: sqrt-stretch of low-iter detail, gamma 0.92 tone-map, and 1.06 contrast lift so deep zooms stay readable.
  • Single full-screen triangle (not a quad) with uniforms-only per-frame state β€” minimal draw call, clean separation where the renderer knows nothing about zoom/palette.

Brief 15 β€” Boids Flocking Β· πŸ₯‡ CLAUDE

algorithm showcase

A self-playing boids flock β€” separation/alignment/cohesion, spatial bins, thousands of agents at 60fps.

CLAUDE

CLAUDE

9.42
Anthropic Opus 4.8 Β· Claude Code
build βœ“πŸŸ’ rendersfeat 96.4%60 fps6 KB gz10 MBpivots 3.0⏱ 22m26sbaseline/v1-baseline
Tier A Β· objective  9.86
Correct9.89
Clean10.0
Struct9.5
Speed9.97
Memory10.0
Tier B Β· panel  8.88
UniquePlanAdhereReasonResearchOverall
Unique insights
  • Renderer deliberately rejects WebGL2 instancing for a flat non-instanced 'triangle soup' (3 verts/boid expanded into one dynamic buffer, single drawArrays) because the harness's SwiftShader software backend collapses instancing to ~13fps β€” isSoftwareRenderer() detection plus a measured tradeoff, a backend-aware optimization most agents would miss.
  • Spatial grid uses a counting-sort rebuild into typed-array SoA (cellStart prefix-sum + cursor scatter) with allocation only on growth, so steady-state frames produce zero GC garbage.
  • step() inlines the three Reynolds rules into a SINGLE closure visitor over the 3Γ—3 cell block β€” one neighbor pass accumulates separation, alignment and cohesion at once instead of three separate sweeps, avoiding any Boid[] allocation per frame while keeping lib/flock.ts as the canonical tested spec.
  • Soft boundary uses a margin-proportional steer-back force (organic) rather than a hard wrap/clamp pop; dt is clamped to 2.5 so returning from a backgrounded tab can't explode the sim.
CODEX

CODEX

8.65
OpenAI gpt-5.5 Β· Codex CLI
build βœ“πŸŸ’ rendersfeat 92.9%60 fps4 KB gz10 MBpivots 2.0⏱ 13m08sbaseline/v1-baseline
Tier A Β· objective  9.65
Correct9.79
Clean10.0
Struct8.55
Speed10.0
Memory10.0
Tier B Β· panel  7.42
UniquePlanAdhereReasonResearchOverall
Unique insights
  • Spatial grid uses integer linked-list buckets (heads/next Int32Array) instead of per-cell arrays β€” zero allocation per rebuild, just heads.fill(-1) and pointer rewires.
  • computeSteering fuses separation/alignment/cohesion into a single neighbor pass with nested radius-squared gates (separationβŠ‚alignmentβŠ‚cohesion), avoiding three separate scans.
  • Precomputed cosine color palette (SPEED_BANDSΓ—HEADING_STEPS = 4Γ—128 = 512 strings) keyed by quantized heading+speed band β€” no per-frame color string construction.
  • Trails via translucent TRAIL_FILL overlay + 'lighter' composite for additive glow, with depth-varying length/width/alpha for a fake-3D layered flock.
OPENCODE

OPENCODE

8.82
minimax-m3 Β· OpenCode (isolated PTY)
build βœ“πŸŸ’ rendersπŸŒ‘ too darkfeat 85.7%60 fps4 KB gz10 MBpivots 4.0⏱ 12m18sbaseline/v1-baseline
Tier A Β· objective  9.81
Correct9.57
Clean10.0
Struct9.71
Speed10.0
Memory10.0
Tier B Β· panel  7.62
UniquePlanAdhereReasonResearchOverall
Unique insights
  • Spatial grid uses a linked-list-in-array (Int32Array head/next) instead of per-cell arrays β€” zero per-frame allocation, cleared with a single head.fill(-1).
  • SoA ParticlePool (4 Float32Arrays) + reused BoidView scratch buffer (writeView) gives the contract's Boid[] shape with no per-frame object churn.
  • Anti-stall edge case: when velocity collapses below minSpeed (m2<1e-6) it restarts at a deterministic golden-ratio angle rather than leaving a frozen boid.
  • dt clamped to <0.05s (with 0.016 fallback) in both main.ts and world.step to avoid integration explosions after tab-resume.

Brief 16 β€” Lofi Video Loop Β· πŸ₯‡ CODEX

Remotion pre-rendered video

A seamlessly-looping relaxing lofi video (Remotion) β€” drifting clouds at dawn/dusk. Reference template provided.

build failed

CLAUDE

7.5
Anthropic Opus 4.8 Β· Claude Code
build βœ—πŸ”΄ crashfeat 89.3%30 fps3699 KB gzheap β€”pivots 3.0⏱ 22m50sbaseline/v1-baseline
Tier A Β· objective  6.93
Correct0.0
Clean10.0
Struct9.89
Speed10.0
Memory9.71
Tier B Β· panel  8.2
UniquePlanAdhereReasonResearchOverall
Unique insights
  • Hand-rolled OKLab color math (color.ts) with full sRGB<->OKLab matrices; sky gradient mixed perceptually so dawn/dusk hues stay luminous instead of muddy through grey β€” genuinely beyond 'lerp two CSS colors'.
  • Seamlessness is structural, not tuned: scrolling layers use INTEGER spatial frequencies scrolled by integerLoops*loopPhase (exact wrap) and breathing uses cos(2pi*phase) (zero slope at the seam, hiding even the 1-frame gap).
  • Reflection layer derives water tint from the LIVE sky horizon stop (skyGradient()[3]) then deepens it in OKLab, so the lake automatically stays in-palette as the sky breathes β€” sun pillar, mirrored+blurred ridges, ripple highlights, warm waterline, depth fade.
  • Objective self-verification with ffmpeg PSNR (0vs299=33.7dB seamless, 0vs150=16.3dB animating) plus a 29-test suite including OKLab round-trips and a per-driver loop-seam suite importing the exact scene-core path the hidden tests use.
CODEX

CODEX

8.68
OpenAI gpt-5.5 Β· Codex CLI
build βœ“πŸŸ’ rendersfeat 89.3%30 fps3891 KB gzheap β€”pivots 3.0⏱ 19m24sbaseline/v1-baseline
Tier A Β· objective  9.13
Correct9.68
Clean8.0
Struct8.95
Speed9.3
Memory9.64
Tier B Β· panel  8.12
UniquePlanAdhereReasonResearchOverall
Unique insights
  • sceneLoopPhase() is a clever seam-aware phase mapping over total-1 but pinning frames 0, total-1 AND total to phase 0, so the rendered last frame equals the first β€” verified byte-identical, not just approximately.
  • Parallax is encoded structurally: cloud drift spread is 340/260/190px for near/mid/far layers, giving depth-scaled motion rather than a uniform pan.
  • All randomness is deterministic via mulberry32(seed); every generator (makeClouds/makeRain/makeSparks/makeBirds) takes seed, so defaultProps.seed yields reproducible variants.
  • Documented a real engineering pivot in SELF.md: default 8x Remotion concurrency OOM'd, dropped to 2x + JPEG frame buffers (q84) to keep renders stable.
OPENCODE

OPENCODE

7.97
minimax-m3 Β· OpenCode (isolated PTY)
build βœ“πŸŸ’ rendersfeat 85.7%30 fps4140 KB gzheap β€”pivots 0.0⏱ β€”baseline/v1-baseline
Tier A Β· objective  9.38
Correct9.57
Clean8.0
Struct9.93
Speed9.96
Memory9.54
Tier B Β· panel  6.24
UniquePlanAdhereReasonResearchOverall
Unique insights
  • Strong unifying conceit: the whole scene is 'sitting inside a cozy room looking out a rain-streaked window' β€” WindowFrame mullions + near-glass Droplets + top-right interior-lamp WarmGlow + Vignette, far more cohesive than a bare gradient.
  • Seamlessness enforced at the type level: RainStreak.speedMul and Droplet.wobbleFreq are 1|2|3 literal types, with comments noting only integer phase multipliers make sin(k*2pi+x)===sin(x) wrap cleanly.
  • Clean contract split: scene-core.ts is a React/DOM-free 'Acceptance Contract' (mulberry32, lerp, clamp, smoothstep, loopPhase, skyGradient) with documented invariants like loopPhase(T,T)===0, separate from scene-utils.ts per-layer helpers.
  • Per-layer deterministic fields via mulberry32 with offset seeds (Bokeh=seed, RainFar=seed+1, RainNear=seed+2, Droplets=seed+3) β€” independent layers, fully reproducible from one prop.

βš–οΈ Judge profiles

Generosity = avg blind score each agent gave across briefs.

RaterGenerosityBallots
CLAUDE7.353
CODEX7.1753
OPENCODE6.7753

πŸ‘₯ Named reputation matrix rows rate cols, by name

rater \ targetCLAUDECODEXOPENCODE
CLAUDE8.87.87.4
CODEX8.758.256.75
OPENCODE8.87.47.6

Blind vs Named β€” bias delta

bias Ξ” = named βˆ’ blind. Positive = build scores higher when judges know who made it (name halo).

builderblindnamedbias Ξ”
CLAUDE8.98.84-0.06
CODEX7.57.79+0.29
OPENCODE7.357.26-0.09

πŸ§ͺ Experiment ledger prompts Γ— methodologies Γ— outcomes

We vary how agents are asked (prompt variant) and the process harness (methodology: baseline / gsd / ralph-loop / skills), then watch what lifts composite, feature coverage, and plan-adherence. Build time = launch β†’ SELF.md.

briefagentpromptmethodcompositefeat%pivotsbuild time
01-graph-explorerCLAUDEv1-baselinebaseline9.46100.03.035m29s
01-graph-explorerCODEXv1-baselinebaseline8.61100.04.020m46s
01-graph-explorerOPENCODEv1-baselinebaseline8.56100.02.012m46s
02-sheet-engineCLAUDEv1-baselinebaseline9.3492.92.013m51s
02-sheet-engineCODEXv1-baselinebaseline8.7292.93.012m54s
02-sheet-engineOPENCODEv1-baselinebaseline8.5192.95.015m56s
03-notes-appCLAUDEv1-baselinebaseline9.29100.03.044m55s
03-notes-appCODEXv1-baselinebaseline8.81100.03.017m16s
03-notes-appOPENCODEv1-baselinebaseline8.79100.04.09m11s
04-tower-defenseCLAUDEv1-baselinebaseline9.24100.03.016m17s
04-tower-defenseCODEXv1-baselinebaseline8.896.43.017m48s
04-tower-defenseOPENCODEv1-baselinebaseline9.0592.93.08m42s
05-pipeline-toolCLAUDEv1-baselinebaseline9.2289.32.012m50s
05-pipeline-toolCODEXv1-baselinebaseline8.6196.41.016m19s
05-pipeline-toolOPENCODEv1-baselinebaseline5.8585.72.0β€”
06-bg-falling-leavesCLAUDEv1-baselinebaseline9.1196.42.010m38s
06-bg-falling-leavesCODEXv1-baselinebaseline6.7496.42.034m00s
06-bg-falling-leavesOPENCODEv1-baselinebaseline8.7396.43.012m02s
07-bg-lava-flowCLAUDEv1-baselinebaseline9.21100.02.018m13s
07-bg-lava-flowCODEXv1-baselinebaseline2.4742.90.062m21s
07-bg-lava-flowOPENCODEv1-baselinebaseline8.0896.42.015m38s
08-bg-geometric-flowCLAUDEv1-baselinebaseline8.67100.04.010m34s
08-bg-geometric-flowCODEXv1-baselinebaseline2.8196.42.083m31s
08-bg-geometric-flowOPENCODEv1-baselinebaseline8.23100.03.08m15s
09-3d-freeformCLAUDEv1-baselinebaseline9.492.92.018m21s
09-3d-freeformCODEXv1-baselinebaseline8.4592.92.016m54s
09-3d-freeformOPENCODEv1-baselinebaseline3.670.0β€”134m58s
10-bg-auroraCLAUDEv1-baselinebaseline9.07100.03.017m14s
10-bg-auroraCODEXv1-baselinebaseline8.45100.04.017m29s
10-bg-auroraOPENCODEv1-baselinebaseline3.12100.03.012m46s
11-bg-flow-fieldCLAUDEv1-baselinebaseline8.43100.01.09m12s
11-bg-flow-fieldCODEXv1-baselinebaseline9.02100.04.014m01s
11-bg-flow-fieldOPENCODEv1-baselinebaseline7.6296.45.08m11s
12-3d-procedural-planetCLAUDEv1-baselinebaseline8.79100.04.031m08s
12-3d-procedural-planetCODEXv1-baselinebaseline8.0896.42.013m34s
12-3d-procedural-planetOPENCODEv1-baselinebaseline7.4396.43.0β€”
13-3d-raymarched-sdfCLAUDEv1-baselinebaseline8.95100.05.029m47s
13-3d-raymarched-sdfCODEXv1-baselinebaseline8.12100.03.015m56s
13-3d-raymarched-sdfOPENCODEv1-baselinebaseline7.2996.42.09m52s
14-mandelbrotCLAUDEv1-baselinebaseline9.2685.72.015m53s
14-mandelbrotCODEXv1-baselinebaseline8.0589.32.07m59s
14-mandelbrotOPENCODEv1-baselinebaseline6.9489.34.012m05s
15-boidsCLAUDEv1-baselinebaseline9.4296.43.022m26s
15-boidsCODEXv1-baselinebaseline8.6592.92.013m08s
15-boidsOPENCODEv1-baselinebaseline8.8285.74.012m18s
16-lofi-videoCLAUDEv1-baselinebaseline7.589.33.022m50s
16-lofi-videoCODEXv1-baselinebaseline8.6889.33.019m24s
16-lofi-videoOPENCODEv1-baselinebaseline7.9785.70.0β€”