A playbook for AI-assisted design systems
AI defaults to generic output when it works from training data alone. Sistema gives it better source material.

TL;DR
When AI agents tackle design system work without grounded references, they pattern-match from training data, and the output looks like it. Sistema is a knowledge base and playbook tool that gives agents access to curated documentation from real design systems, so every generation starts from the same references a senior designer would consult. I built a knowledge base and a playbook of prompts for design system work, then used them to design Sistema itself.
My Role
Sole designer and developer
Outcome
Public beta — 142+ pages, 25 plays, 2 campaigns, full Style Dictionary token pipeline
The Generic Output Problem
When you ask an AI agent to generate a color palette or type scale without any grounding, it produces something that passes every technical test and signals no design intent. Medium-blue primary. Near-white surface with no hue temperature. Border-radius consistent across every component. Type scale differentiated by size alone. The outputs are correct. They're just not designed.
The problem isn't the AI. It's the brief. Without access to real reference material — how Carbon structures its two-tier token architecture, how Material 3 handles elevation, what WCAG 2.2 actually requires for UI components versus body text — the agent defaults to the statistical center of design system patterns it's seen. Safe, generic, forgettable.
This pattern kept surfacing as I was experimenting with AI-assisted design systems work. Every time I started a new design system project with Claude, the first output looked roughly the same. I'd spend most of my time correcting generic defaults rather than making intentional decisions. I was sure that design system foundations were ripe territory for automation — the agent just needed better guidance.
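The WCAG distinction mentioned above can be checked mechanically. WCAG 2.x sets a minimum contrast ratio of 4.5:1 for normal-size text and 3:1 for large text and UI components. The sketch below is a minimal TypeScript version of the standard relative-luminance and contrast-ratio formulas. The function names are illustrative, and the two blues are the primary values from Sistema's DESIGN.md shown later on this page.

```typescript
// Minimal WCAG 2.x contrast check. Function names are illustrative;
// the hex values are taken from the DESIGN.md later in this case study.

/** Parse "#RRGGBB" into [r, g, b] channel values in the range 0..255. */
function parseHex(hex: string): [number, number, number] {
  if (!/^#[0-9a-fA-F]{6}$/.test(hex)) {
    throw new Error(`Expected a #RRGGBB color, got ${hex}`);
  }
  const value = parseInt(hex.slice(1), 16);
  return [(value >> 16) & 0xff, (value >> 8) & 0xff, value & 0xff];
}

/** Convert one 8-bit sRGB channel to linear light. */
function linearize(channel: number): number {
  const s = channel / 255;
  return s <= 0.04045 ? s / 12.92 : Math.pow((s + 0.055) / 1.055, 2.4);
}

/** WCAG relative luminance: 0 for black, 1 for white. */
function relativeLuminance(hex: string): number {
  const [r, g, b] = parseHex(hex);
  return 0.2126 * linearize(r) + 0.7152 * linearize(g) + 0.0722 * linearize(b);
}

/** WCAG contrast ratio between two colors, from 1 to 21. */
function contrastRatio(a: string, b: string): number {
  const la = relativeLuminance(a);
  const lb = relativeLuminance(b);
  return (Math.max(la, lb) + 0.05) / (Math.min(la, lb) + 0.05);
}

const AA_NORMAL_TEXT = 4.5; // normal-size text
const AA_LARGE_OR_UI = 3; // large text and UI components

for (const color of ["#0070FF", "#005CE6"]) {
  const ratio = contrastRatio(color, "#FFFFFF");
  console.log(
    `${color} on white: ${ratio.toFixed(2)}:1`,
    `UI/large text: ${ratio >= AA_LARGE_OR_UI ? "pass" : "fail"}`,
    `body text: ${ratio >= AA_NORMAL_TEXT ? "pass" : "fail"}`,
  );
}
```

Against white, #0070FF comes out at about 4.4:1, which is enough for UI components and large text but not for body text, while #005CE6 reaches about 5.7:1. That matches the spec's guidance to reserve the darker blue for small text.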
Better Reference, Better Output
The idea behind Sistema is simple: the reference material is the constraint. Give an agent a brief and ask it to generate color tokens, and it produces plausible output. Give it the same brief after it reads Carbon's two-tier token architecture and Material 3's elevation model, and it produces something specific — with decisions it can justify by reference to how real systems solve the same problem.
Sistema is built around that idea. The knowledge base draws on crawled documentation from real design systems and UI libraries (Material Design 3, Carbon, Atlassian, Primer, Ant Design, Radix) and serves it at stable endpoints that agents can fetch. Each play in the playbook links to that material as markdown directly in the prompt, so the agent reads the references before generating.
I didn't write the knowledge base content — it's a curated synthesis. That curation is the feature: knowing which parts of Carbon's color architecture are specific to IBM's brand and which patterns generalize, knowing where the W3C standards and the popular frameworks disagree and why, knowing what questions a designer should be answering at each phase of a token system build.
Plays & Campaigns
A play is a structured prompt with fetch instructions baked in. The generate-color-scheme play reads Sistema's color architecture synthesis before producing token files — so the agent sees how Material, Carbon, and Atlassian each solve the same problem before proposing a solution for your project. The result is token output that reflects real design systems knowledge, not statistical averages.
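To make the idea of "fetch instructions baked in" concrete, here is a rough sketch of how a play could be represented and rendered into a prompt. This is not Sistema's actual schema: the `Play` fields, the `renderPlayPrompt` function, the example.com URL, and the output filename are all assumptions made for illustration. Only the play name generate-color-scheme comes from the text above.

```typescript
// Hypothetical play shape. Field names, URL, and output path are
// placeholders, not Sistema's real schema or endpoints.
interface Play {
  id: string;
  goal: string;
  references: string[]; // markdown endpoints the agent must read first
  outputs: string[]; // files the agent is expected to produce
}

/** Render a play into a prompt that puts the reading list before the task. */
function renderPlayPrompt(play: Play, projectBrief: string): string {
  const fetchSteps = play.references.map(
    (url, i) => `${i + 1}. Fetch and read ${url}`,
  );
  return [
    `# Play: ${play.id}`,
    "",
    "## Read these references before generating anything",
    ...fetchSteps,
    "",
    "## Project brief",
    projectBrief,
    "",
    "## Task",
    play.goal,
    "",
    "## Expected outputs",
    ...play.outputs.map((file) => `- ${file}`),
  ].join("\n");
}

const generateColorScheme: Play = {
  id: "generate-color-scheme",
  goal: "Propose color tokens and justify each decision by citing the references.",
  references: ["https://example.com/kb/color-architecture.md"], // placeholder URL
  outputs: ["tokens/semantic/color.json"], // placeholder filename
};

console.log(renderPlayPrompt(generateColorScheme, "A calm dashboard for a small finance team."));
```

Putting the reading list ahead of the task is the design choice that matters here: the agent meets the references before it meets the request.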
Campaigns compose plays into multi-step sequences with human review gates between phases. The Bootstrap a Design System campaign is the main one: 4 phases, 11 steps, from blank repo to deployed components. It pauses after each phase for approval before advancing — you stay in control of the decisions while the agent handles the generation work.
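The phase-and-gate structure can be sketched as a small runner. This is a simplified illustration rather than Sistema's implementation: the `Phase` and `CampaignResult` types, the `runCampaign` function, and the phase names are hypothetical, and the real campaign has 4 phases and 11 steps rather than the two shown here.

```typescript
// Hypothetical campaign runner. Type names and phase names are placeholders.
interface Phase {
  name: string;
  steps: string[]; // play ids, run in order
}

interface CampaignResult {
  completedSteps: string[];
  stoppedAfter: string | null; // phase whose review was rejected, if any
}

function runCampaign(
  phases: Phase[],
  runStep: (playId: string) => void,
  approve: (phase: Phase) => boolean,
): CampaignResult {
  const completedSteps: string[] = [];
  for (const phase of phases) {
    for (const step of phase.steps) {
      runStep(step);
      completedSteps.push(step);
    }
    // Human review gate: later phases do not start until this one is approved.
    if (!approve(phase)) {
      return { completedSteps, stoppedAfter: phase.name };
    }
  }
  return { completedSteps, stoppedAfter: null };
}

const demoPhases: Phase[] = [
  { name: "context", steps: ["establish-context"] },
  { name: "tokens", steps: ["generate-color-scheme"] },
];

// Approve the first phase, reject the second.
const demoResult = runCampaign(
  demoPhases,
  (playId) => {
    console.log(`running ${playId}`);
  },
  (phase) => phase.name !== "tokens",
);
console.log(demoResult);
```

The gate sits after the steps of a phase rather than before them, so the reviewer always judges real output instead of a plan.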
Proof: Sistema Designed Itself
The cleanest test of whether the tool works is to run it on itself. Late in the build, I ran the Bootstrap campaign on Sistema's own repository. The agent scanned the existing codebase, worked through the establish-context step, generated a visual direction brief, produced a complete DESIGN.md, then generated color, typography, shape, and spacing token files, configured the Style Dictionary v5 pipeline, and applied a design pass across 24 files.
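For readers unfamiliar with Style Dictionary, the pipeline step amounts to a small configuration object. The sketch below is an approximation, not the file the agent generated. The source directory and the output path come from the DESIGN.md on this page, while the `.json` glob and the local type names are assumptions. `transformGroup: "css"` and `format: "css/variables"` are Style Dictionary's built-in CSS options.

```typescript
// Local stand-in types so the sketch is self-contained; a real project
// would rely on the types shipped with Style Dictionary instead.
interface FileConfig {
  destination: string;
  format: string;
}

interface PlatformConfig {
  transformGroup: string;
  buildPath: string;
  files: FileConfig[];
}

interface TokenBuildConfig {
  source: string[];
  platforms: { css: PlatformConfig };
}

const tokenBuildConfig: TokenBuildConfig = {
  // Assumption: semantic tokens are stored as JSON files.
  source: ["tokens/semantic/**/*.json"],
  platforms: {
    css: {
      transformGroup: "css", // built-in transforms for CSS output
      buildPath: "src/styles/tokens/",
      files: [
        {
          destination: "generated.css",
          format: "css/variables", // emits CSS custom properties
        },
      ],
    },
  },
};

// Resolve the files this configuration would write.
const outputPaths = tokenBuildConfig.platforms.css.files.map(
  (file) => tokenBuildConfig.platforms.css.buildPath + file.destination,
);
console.log(outputPaths);
```

In a real setup an object like this would be passed to Style Dictionary, for example via `new StyleDictionary(config)` followed by `buildAllPlatforms()`, so that Tailwind and the components consume the generated custom properties rather than hardcoded values.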
The DESIGN.md it produced became the spec. Sistema's current visual design — the electric blue primary, the three-tier dark mode surface stack, the typography scale — came from human design input meeting Sistema's own playbook and knowledge base. The first end-to-end proof that the campaign works is the product itself.
version: "1.1"
name: Sistema
description: >
  Design system knowledge base and playbook tool for designers and developers
  building, auditing, or maintaining design systems. Bold, typographically driven,
  utilitarian with strong brand expression. Light primary; dark follows
  prefers-color-scheme by default.
stack:
  framework: Next.js 15 (App Router)
  styling: Tailwind CSS v4 + CSS custom properties
  tokens: Style Dictionary v5; source in tokens/semantic/, output to src/styles/tokens/generated.css
  fonts: next/font/google (Inter, Fraunces, JetBrains Mono)
  language: TypeScript
# ─── Colors ──────────────────────────────────────────────────────────────────
colors:
  canvas: "#FFFFFF"  # page background: pure white, deliberate flat-surface aesthetic
  surface: "#FFFFFF"
  surface-raised: "#FFFFFF"  # cards, panels
  surface-sunken: "#F7F6F2"  # input backgrounds, code blocks
  on-surface: "#0E1116"
  on-surface-muted: "#5B6470"
  on-surface-subtle: "#8A929C"
  border: "#E4E7EB"
  border-strong: "#C9CFD6"
  border-focus: "#0070FF"
  primary: "#0070FF"  # UI components, large text (3:1 on white); use #005CE6 for small text
  on-primary: "#FFFFFF"
  primary-container: "#E8F1FF"
  on-primary-container: "#003A9E"
  secondary: "#FFCC33"  # accent only on light surfaces; non-text or large text
  on-secondary: "#1A1200"
  brand-red: "#E60026"  # logo and deliberate brand moments only; not for error states
  brand-yellow: "#FFCC33"
  error: "#B91C1C"
  on-error: "#FFFFFF"
  success: "#15803D"
  on-success: "#FFFFFF"
  warning: "#B45309"
  on-warning: "#FFFFFF"
  # Dark mode: applied via [data-theme="dark"] (set by prefers-color-scheme)
  dark-canvas: "#0D0D0D"
  dark-surface: "#111111"
  dark-surface-raised: "#1C1C1C"
  dark-surface-overlay: "#252525"
  dark-surface-sunken: "#0A0A0A"
  dark-on-surface: "#F3F4F6"
  dark-on-surface-muted: "#9CA3AF"
  dark-border: "#2D2D2D"
  dark-primary: "#4D9FFF"
  dark-secondary: "#FFCC33"
  dark-error: "#FCA5A5"
  dark-success: "#4ADE80"
  dark-warning: "#FCD34D"
# ─── Typography ──────────────────────────────────────────────────────────────
fonts:
  sans: "'Inter', system-ui, -apple-system, sans-serif"
  serif: "'Fraunces', Georgia, 'Times New Roman', serif"  # body text; variable weight + optical sizing
  mono: "'JetBrains Mono', 'Cascadia Code', 'Fira Mono', monospace"
typography:
  display:
    fontFamily: sans
    fontSize: 56px
    fontWeight: 800
    lineHeight: 1.0
    letterSpacing: -0.025em
  heading-xl:
    fontFamily: sans
    fontSize: 40px
    fontWeight: 700
    lineHeight: 1.1
    letterSpacing: -0.02em
  heading-lg:
    fontFamily: sans
    fontSize: 32px
    fontWeight: 700
    lineHeight: 1.15
    letterSpacing: -0.015em
  heading-md:
    fontFamily: sans
    fontSize: 24px
    fontWeight: 600
    lineHeight: 1.2
    letterSpacing: -0.01em
  heading-sm:
    fontFamily: sans
    fontSize: 20px
    fontWeight: 600
    lineHeight: 1.3
    letterSpacing: 0em
  body-lg:
    fontFamily: serif
    fontSize: 18px
    fontWeight: 400
    lineHeight: 1.75
    letterSpacing: 0em
  body-md:
    fontFamily: serif
    fontSize: 16px
    fontWeight: 400
    lineHeight: 1.65
    letterSpacing: 0em
  label:
    fontFamily: sans
    fontSize: 12px
    fontWeight: 500
    lineHeight: 1.35
    letterSpacing: 0.02em
  code:
    fontFamily: mono
    fontSize: 14px
    fontWeight: 400
    lineHeight: 1.7
    letterSpacing: 0em
# ─── Shape ───────────────────────────────────────────────────────────────────
radii:
  none: 0px
  sm: 6px  # tooltips, tags
  md: 10px  # buttons, inputs, form controls
  lg: 16px  # cards, panels
  xl: 22px  # prompt box, large featured surfaces
  full: 9999px  # pills, chips, badges
shadows:
  sm: "0 1px 2px rgba(14,17,22,0.05)"
  md: "0 4px 14px rgba(14,17,22,0.06), 0 1px 2px rgba(14,17,22,0.04)"
# ─── Do's & Don'ts ───────────────────────────────────────────────────────────
dos:
  - Use Fraunces for all body copy; Inter for headings, labels, and UI text
  - Use primary (#0070FF) for interactive elements and large text; use #005CE6 for small text links
  - Use brand-red (#E60026) only for the logo and intentional brand moments; never for errors
  - Use error/success/warning semantic tokens for all feedback states
donts:
  - Don't use brand-red (#E60026) for error states; use --color-error (#B91C1C)
  - Don't use secondary yellow (#FFCC33) as body text on light backgrounds
  - Don't use Inter for body prose; Fraunces is the designated body typeface
  - Don't hardcode dark mode colors inline; use semantic token variables that resolve per theme

Where Things Stand
Sistema is in public beta. Built in roughly seven days of intensive Claude Code sessions, it's a working demonstration of the core hypothesis: grounding AI design work in real reference material produces better output than prompting from a blank slate.
What's been proven: the tooling works, the knowledge base is queryable, the Bootstrap campaign runs end-to-end, and the dogfooding produced a real design system. What hasn't been proven yet: whether it's useful to designers other than me, at what scale the reference material starts to degrade in quality, and which kinds of design system problems benefit most from structured plays versus open-ended prompting. Those are the open questions — and the interesting ones.
