Autonomous Build Stack

DUSK × Ghost × MIRROR

Three tools. One loop. Build until reality reflects the design.

🌆
DUSK
The Builder

Autonomous coding loop. Runs through the night. Takes a task file, iterates with Codex CLI (GPT-5.4), checks for interrupts, auto-opens PRs when done.

dusk.sh
👻
Ghost
The Tester

Ghost runs the app as a human user. No mocks. No seeds. Real AI calls, real infra. Playwright drives every flow end-to-end. Screenshots every step.

dusk-ghost.sh
🪞
MIRROR
The Orchestrator

Design-to-reality loop. Takes a North Star (PDF/screenshot), runs Ghost to score fidelity, feeds gaps back to DUSK. Repeats until the app matches the design.

mirror.sh
DUSK builds → Ghost tests → MIRROR scores → DUSK gets the delta → repeat
The MIRROR Loop — Step by Step
0
You provide a North Star

Drop in a design PDF, a Figma screenshot, or an HTML mockup. This is the definition of "done." No North Star = no direction. Design IS the spec.

INPUT
1
MIRROR extracts the delta

On first run: the whole design is the task. On subsequent runs: only the gaps identified by Ghost. Feeds DUSK a precise task — not "build the app," but exactly what's wrong and where.

MIRROR
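A minimal sketch of the delta step, assuming gaps arrive as simple records. The names `Gap` and `next_task` are hypothetical and not taken from mirror.sh; the task wording is illustrative only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Gap:
    step: str    # name of the flow step Ghost visited (hypothetical field)
    detail: str  # what differs from the North Star (hypothetical field)


def next_task(north_star: str, gaps: Optional[list[Gap]]) -> str:
    """Return the task text for DUSK's next cycle."""
    if gaps is None:
        # First run: nothing has been scored yet, so the whole design is the task.
        return f"Build the app to match the design in {north_star}."
    # Later runs: only the gaps Ghost reported, one line per gap.
    lines = [f"Fix these gaps against {north_star}:"]
    lines += [f"- {g.step}: {g.detail}" for g in gaps]
    return "\n".join(lines)
```

The point of the shape: DUSK never receives "build the app" twice. After the first cycle it only sees what is wrong and where.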
2
DUSK builds toward the design

Codex CLI (GPT-5.4) works through the task in iterations. Each iteration: read current state → plan one step → execute → commit → check DONE/BLOCKED. Interrupts checked every 3 iterations. Auto-PR when done.

DUSK
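The iteration rhythm described above can be sketched roughly as follows. This is not the contents of dusk.sh. `run_dusk`, the `step` and `interrupted` callbacks, the `max_iterations` budget, and the `INTERRUPTED` and `MAX_ITERATIONS` return values are all assumptions made for illustration; only the DONE/BLOCKED check and the every-3-iterations interrupt check come from the text.

```python
from typing import Callable


def run_dusk(step: Callable[[int], str],
             interrupted: Callable[[], bool],
             max_iterations: int = 50,   # assumed budget, not from the source
             interrupt_every: int = 3) -> str:
    """Iterate until a step reports DONE or BLOCKED, an interrupt arrives,
    or the iteration budget runs out."""
    for i in range(1, max_iterations + 1):
        # One iteration: read current state, plan one step, execute, commit.
        status = step(i)
        if status in ("DONE", "BLOCKED"):
            return status
        # Interrupts are only polled on every third iteration.
        if i % interrupt_every == 0 and interrupted():
            return "INTERRUPTED"
    return "MAX_ITERATIONS"
```

In a real run, `step` would wrap the Codex CLI call and the commit, and a `DONE` result would trigger the auto-PR.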
3
Ghost drives the real app as a human user

No mocks. Playwright navigates your running app with a real test user, goes through the configured flow end-to-end, captures screenshots at every step, and compares them to the North Star.

GHOST
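A rough sketch of the screenshot-per-step idea using Playwright's Python sync API. The function names, the `(name, path)` step format, and the file naming scheme are assumptions, not Ghost's actual configuration. A real flow would also click, type, and log in rather than only visit URLs.

```python
from pathlib import Path


def screenshot_path(out_dir: str, index: int, name: str) -> Path:
    """Zero-padded file name so screenshots sort in flow order."""
    return Path(out_dir) / f"{index:02d}-{name}.png"


def run_flow(base_url: str, steps: list[tuple[str, str]], out_dir: str) -> list[Path]:
    """Visit each (name, path) step in a real browser and screenshot it."""
    # Imported here so screenshot_path works without Playwright installed.
    from playwright.sync_api import sync_playwright

    Path(out_dir).mkdir(parents=True, exist_ok=True)
    shots: list[Path] = []
    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        for i, (name, path) in enumerate(steps, start=1):
            page.goto(base_url + path)  # the real running app, no mocks
            shot = screenshot_path(out_dir, i, name)
            page.screenshot(path=str(shot), full_page=True)
            shots.append(shot)
        browser.close()
    return shots
```

Usage would look something like `run_flow("http://localhost:3000", [("home", "/"), ("checkout", "/checkout")], "shots")`, where the URL and step list are placeholders.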
4
Gemini Vision scores fidelity (0–100)

Each step gets a score: ✅ MATCH / ⚠️ CLOSE / ❌ MISS / 💀 BROKEN. Misses become the next DUSK task. If score ≥ 80 on all steps: loop exits with PASS. Otherwise: back to step 1.

MIRROR
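The pass gate can be sketched as below, assuming the vision scorer has already returned one 0–100 score per step. `evaluate` and the dictionary shape are hypothetical; the threshold of 80 is the one stated in the text.

```python
PASS_THRESHOLD = 80  # from the text: every step must score at least 80


def evaluate(scores: dict[str, int],
             threshold: int = PASS_THRESHOLD) -> tuple[bool, list[str]]:
    """Return (passed, failing_steps) for one Ghost run."""
    failing = [step for step, score in scores.items() if score < threshold]
    # PASS only when no step falls below the threshold.
    return (not failing, failing)
```

The failing steps are what would be handed back to DUSK as the next task.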
Repeat until fidelity ≥ 80 (PASS)

Default: max 5 cycles, configurable. Each cycle tightens the gap. DUSK knows exactly what Ghost found wrong. Ghost knows exactly what DUSK changed. MIRROR holds the score and decides when it's done.

LOOP
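Put together, the outer loop might look like this sketch. `mirror_loop`, its `build` and `score` callbacks, and the `MAX_CYCLES` result are assumptions for illustration; the default of 5 cycles and the ≥ 80 exit condition come from the text.

```python
from typing import Callable


def mirror_loop(build: Callable[[list[str]], None],
                score: Callable[[], dict[str, int]],
                max_cycles: int = 5,
                threshold: int = 80) -> tuple[str, int]:
    """Run build -> score cycles until every step passes or cycles run out.

    Returns (result, cycles_used).
    """
    gaps: list[str] = []  # empty on the first cycle: the whole design is the task
    for cycle in range(1, max_cycles + 1):
        build(gaps)           # DUSK works on the current delta
        scores = score()      # Ghost drives the app, the scorer rates each step
        gaps = [s for s, v in scores.items() if v < threshold]
        if not gaps:
            return ("PASS", cycle)
    return ("MAX_CYCLES", max_cycles)
```

Keeping the score and the exit decision in one place is the design choice here: the builder and the tester stay simple, and only the orchestrator knows when to stop.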
Fidelity Score Tiers
✅
MATCH

Visually and functionally matches the North Star. No action needed.

⚠️
CLOSE

Minor differences — layout, copy, color. Loop exits if all steps are CLOSE or better.

❌
MISS

Major divergence — missing elements, broken flow. Becomes the next DUSK task.

💀
BROKEN

Flow can't complete. Highest priority — stops the loop and alerts immediately.
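One way to express the tier rules is to act on the worst tier seen in a run. This is a sketch under assumptions: the `Tier` enum, `decide`, and the action names are hypothetical, and how a 0–100 score maps onto a tier is not specified here, so the sketch starts from tiers that have already been assigned.

```python
from enum import IntEnum


class Tier(IntEnum):
    # Ordered worst to best so min() finds the worst tier in a run.
    BROKEN = 0
    MISS = 1
    CLOSE = 2
    MATCH = 3


def decide(tiers: dict[str, Tier]) -> str:
    """Pick the loop's next action from the per-step tiers."""
    if not tiers:
        raise ValueError("no steps were scored")
    worst = min(tiers.values())
    if worst == Tier.BROKEN:
        return "HALT_AND_ALERT"  # flow can't complete: stop and notify
    if worst == Tier.MISS:
        return "NEXT_TASK"       # misses become the next DUSK task
    return "EXIT"                # every step is CLOSE or better
```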

"Design IS the spec. Pictures of done are requirements. Build until reality reflects the design."
// MIRROR philosophy
Under the Hood
Build Engine
Codex CLI · GPT-5.4
Browser Automation
Playwright · real flows
Vision Scorer
Gemini Vision API
North Star Format
PDF · PNG · HTML · MD
Notifications
Discord · Slack · webhook
Target
Any running web app