How We Generate Art

Building consistent AI-generated assets at scale with Gemini

596+ game assets, all visually coherent, fully AI-generated. This page explains the system that makes it possible.

The Core Problem

Without explicit guidance, AI image models produce wildly inconsistent styles. Here's what happens when you just ask for "cyberpunk character portrait":

Without references: photorealistic, anime, oil painting, 3D render. Same prompt, four different styles. Unusable for a game.

With the reference system: four different characters, same style. Game-ready.

💡
The Insight: Treat AI image generation as a style transfer problem, not a pure generation problem. Every image we generate receives explicit visual references that define what "correct" looks like.

The Reference Frame Budget

Gemini accepts up to 14 reference images per generation request. This isn't a design choice—it's an API constraint that forces strategic allocation.

Why 14? This is a hard limit in the Gemini image generation API. More references = better consistency but less creative freedom for the model. We treat it as a budget to allocate wisely.
14 reference slots, typical allocation for a story scene:
  • Style: gold standard (1 slot)
  • Scene: scene type (1 slot)
  • Character 1: portrait, face, body, action (4 slots)
  • Character 2: portrait, face, body (3 slots)
  • Remaining: 5 slots free

A two-character scene with full references uses 9 of 14 slots. Single-character scenes have more headroom for additional style or environmental references.
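The slot arithmetic above can be expressed as a small allocator. This is a minimal sketch, not the pipeline's actual code: the function and reference names are illustrative, while the per-tier ref counts (4/3/2/1) follow the character tiers described later on this page.

```python
# Illustrative 14-slot budget allocator; names are assumptions,
# per-tier ref counts (4/3/2/1) are from this article.
MAX_REF_SLOTS = 14
TIER_REFS = {"major": 4, "supporting": 3, "minor": 2, "background": 1}
PARTS = ("portrait", "face", "body", "action")

def allocate_refs(scene_type, characters):
    """Build the ordered reference list for one generation request."""
    refs = ["gold_standard", scene_type]  # style anchor + composition anchor
    for name, tier in characters:
        refs += [f"{name}_{part}" for part in PARTS[: TIER_REFS[tier]]]
    if len(refs) > MAX_REF_SLOTS:
        raise ValueError(f"{len(refs)} refs exceed the {MAX_REF_SLOTS}-slot budget")
    return refs

# A two-character scene: 2 anchors + 4 (major) + 3 (supporting) = 9 slots
refs = allocate_refs("terminal_scene", [("gg", "major"), ("cipher", "supporting")])
```

Raising on overrun (rather than silently dropping refs) makes a scene that blows the budget fail loudly, so the allocation can be rethought instead of degraded.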

Three-Layer Reference System

1

Gold Standard (Style)

Gold standard reference: age1_b_rooftop.png, selected through style calibration.
Palette anchors: #0f0f23, #1a1a2e, #00FFFF, #FF0080, #f59e0b

Always included. This specific image was selected through our style calibration process—a 120-image interview that determined exactly which visual characteristics define "CyberIdle style."

Without this anchor, Gemini drifts toward whatever style it "feels like" generating. With it, every image inherits the same DNA: NES-era pixel art aesthetic, limited color palette, clean linework.

"Use this reference image as a style guide. Match its pixel art style, color palette (magenta #FF0080, cyan #00FFFF, dark teal #1a1a2e), pixel density, and overall aesthetic exactly."
2

Scene-Type References (Composition)

Different scene types need different composition rules. A cityscape wide shot has completely different framing than a character portrait or an action sequence. We calibrated 12 scene categories through 120-image interviews.

  • Cityscape: wide establishing shots
  • Alley: street-level perspective
  • Terminal: interior tech spaces
  • Action: dynamic sequences
  • Portrait Mood: character emotion focus
  • Cyberspace: digital environments
  • Interior: indoor spaces
  • Rooftop: elevated views
  • Tech Detail: close-up technology
  • Dramatic: high-contrast lighting
  • Rain: weather effects
  • Warmth: amber-lit scenes

Each reference was selected from 4 interview options based on composition quality, style consistency, and how well it guides generation for that scene category.

3

Character References (Identity)

Major characters get a 4-reference set. Here's GG's complete reference allocation:

GG's 4-Reference Set
  • Portrait: primary reference
  • Face: close-up details
  • Body: full figure
  • Action: dynamic pose

Reference counts by character tier:
  • Major characters (4 refs): Portrait + Face + Body + Action (e.g. Cyber Chomp)
  • Supporting (3 refs): Portrait + Face + Body (e.g. Cipher)
  • Minor (2 refs): Portrait + Face (e.g. Ghost)
  • Background (1 ref): primary portrait only

Real Generation Examples

Here's exactly what goes into generating different types of images—the input scene, all reference images, the complete prompt, and the output.

1

Age 1 Building: Street Hacker Setup

Building sprite • No character refs needed
Scene Description

"A cramped street-level hacker den with exposed wiring, multiple flickering monitors, and makeshift tech equipment. Age 1 starter building vibe."

References Used (2 of 14 slots)
  • Gold Standard (style)
  • Terminal (scene type)
Complete Prompt to Gemini
Use this reference image as a style guide. Match its pixel art style, color palette (magenta #FF0080, cyan #00FFFF, dark teal #1a1a2e), pixel density, and overall aesthetic exactly.

SCENE: A cramped street-level hacker den with exposed wiring, multiple flickering monitors, and makeshift tech equipment. Age 1 starter building vibe.

FRAMING: Square composition for building sprite. Clean edges, no character subjects. Focus on environmental detail.

STYLE CONSTRAINTS: NES-era pixel art. No photorealism. No gradients. Clean pixel edges. Dark background for transparency cropping.
Generated Output
Resolution: 2K • Aspect: 1:1 • Cost: ~$0.04
2

Story Hero: The Sick Day

Story banner • Atmospheric scene with silhouette
Scene Description

"The Sprawl during a brutal storm - lightning cracks across the skyline while GG's silhouette watches from a window. Cinematic ultra-wide composition."

References Used (4 of 14 slots)
  • Gold Standard (style)
  • Rain (scene type)
  • Cityscape (scene type)
  • Dramatic (scene type)
Generated Output
Resolution: 2K • Aspect: 21:9 • Cost: ~$0.04

Queue-Based Generation System

At scale, we don't generate images one at a time. Jobs are defined in JSON queues and processed in batches—either sequentially for safety or in parallel for speed.

📋

1. Queue Definition

Jobs defined in JSON with prompt, output path, references, aspect ratio, and resolution.

⚙️

2. Generation Mode

Sequential (~25s/image, safe) or parallel with 4 workers (~6-7s/image, 4x speedup).

3. Checkpoint & Upload

Status tracked per job. Git commit every 3 images. Automatic CDN upload.

Queue job format (generation-queue.json)
{
  "jobs": [
    {
      "id": "age1_building_01",
      "prompt": "Cramped street-level hacker den...",
      "output": "docs-site/public/images/buildings/age1_hacker.png",
      "refs": ["gold_standard", "terminal_scene"],
      "aspect": "1:1",
      "resolution": "2K",
      "status": "pending"
    }
  ]
}

Sequential Mode

./generate-from-queue.sh --all
  • ~25 seconds per image
  • Safe for long batches
  • Checkpoint commits every 3
  • Best for overnight runs

Parallel Mode

./generate-parallel.sh --workers 4
  • ~6-7 seconds per image
  • 4x throughput speedup
  • Respects API rate limits
  • Best for batch sprints

Art Interview Philosophy

Why do we use multi-round interviews to calibrate art style instead of just writing detailed prompts? Because visual art direction exists in a space that language can't fully capture.

"Humans cannot successfully communicate visual art guidance through language alone. Art direction exists in a high-dimensional space that gets lossy-compressed when projected into the low-dimensional space of English tokens."

The Problem with Text-Only Prompts

"Make it more cyberpunk"
→ Infinite valid interpretations
"Better composition"
→ Countless framing options
"More dynamic"
→ Motion blur? Pose? Angle?
"Darker mood"
→ Lighting? Palette? Expression?

The Solution: Selection Over Description

Instead of trying to describe what you want, select from options. This is the same approach we use to prompt Gemini (reference images) and to prompt the human art director (interview choices).

1
Generate 4 dramatically different options (A/B/C/D)

Each should explore a different direction. Include at least one you think they'll hate—negative feedback is valuable.

2
Designer selects and explains WHY

Not just "I like B" but "B because the lighting feels more intimate and the pose suggests confidence not aggression."

3
Extract dimensional preferences

Convert subjective feedback into concrete rules: "Prefer warm accent lighting over cool. Neutral poses over aggressive."

4
Update guides and gold standards

Selected images become future references. Rules get added to style guides. The system learns.
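The four steps above can be captured as a small record per round. Every field name here is a hypothetical illustration of what such a system might store, not the project's actual schema:

```python
from dataclasses import dataclass, field

@dataclass
class InterviewRound:
    """One A/B/C/D selection round; field names are illustrative."""
    subject: str
    options: list            # 4 candidate image paths, labeled A-D
    selected: str            # e.g. "B"
    reason: str              # the designer's WHY, kept verbatim
    rules: list = field(default_factory=list)  # extracted dimensional prefs

r = InterviewRound(
    subject="gg_portrait",
    options=["a.png", "b.png", "c.png", "d.png"],
    selected="B",
    reason="the lighting feels more intimate; the pose suggests confidence",
    rules=["Prefer warm accent lighting over cool.",
           "Neutral poses over aggressive."],
)
```

Keeping the verbatim `reason` alongside the extracted `rules` preserves the raw signal in case a rule later turns out to be the wrong generalization.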

🎯
Discovery Example: Cyber Chomp
Through the interview process, we discovered that Cyber Chomp should appear "zoned out while chaos happens around him"—a personality trait that emerged from selection, not description. We never would have written that in a prompt.

Prompt Construction Pipeline

User scene descriptions get transformed through multiple enrichment stages before hitting the API.

User Input
"GG confronts Helena Voss in Nexus lobby"
+ Style Instruction
"Match pixel art style, color palette..."
+ Framing Hint
"Wide cinematic composition with horizontal emphasis..."
+ Character Note
"Match distinctive visual features EXACTLY..."
Final Prompt
~400 tokens + 6 reference images
Prompt transformation (simplified from generate-from-queue.sh)
STYLE_INSTRUCTION = (
    "Use this reference image as a style guide. Match its pixel art style, "
    "color palette (magenta #FF0080, cyan #00FFFF, dark teal #1a1a2e), "
    "pixel density, and overall aesthetic exactly."
)

prompt = STYLE_INSTRUCTION

# Add framing based on aspect ratio
if aspect_ratio in ("21:9", "16:9"):
    prompt += """
FRAMING: Wide cinematic composition with horizontal emphasis.
Key subjects should have breathing room. Use the full width."""

# Add character consistency instruction when character refs are present
if char_refs:
    prompt += """
CHARACTER CONSISTENCY: Match their distinctive visual
features EXACTLY as shown - same face structure, hair,
clothing, augmentations, and distinctive features."""

The Iteration Process

Major characters go through extensive calibration. Here's Cyber Chomp's reference development:

Round 1: Face variants (A/B/C/D). Selected: Option C, best eye rendering.

Round 2: Body variants (A/B/C/D). Selected: Option B, proportions match the face.

Round 3: Action variants (A/B/C/D). Selected: Option A, most dynamic pose.

Final reference set: 4 references locked for all future generations.

The interview process took 52 iterations for Cyber Chomp alone. The investment pays off: every future image of this character will be consistent.

Strategies by Art Type

  • Character Portrait: 0-1 refs, 1:1. NES pixel style locked, contextual background.
  • Story Scene: 3-5 refs, 16:9. Multi-ref: style + characters + location.
  • Hero Image: 2-3 refs, 21:9. Ultra-wide; subjects must be centered.
  • Building Sprite: 1 ref, aspect varies. Transparent background, isometric, no rain.
  • Resource Icon: 1 ref, 1:1. Readable at 64px, dark teal background.

Color Palette & Style Rules

  • Deep Navy #0f0f23: backgrounds, void spaces
  • Dark Teal #1a1a2e: structures, panels, tech
  • Cyan #00FFFF: tech highlights, data, UI
  • Magenta #FF0080: neon accents, danger, energy
  • Amber #f59e0b: warm light, fire, hope
  • Neon Green #00ff88: success states, growth, nature
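For tooling or automated palette checks, the colors above can live as constants. A small sketch; the key names are ours, the hex values are from the table:

```python
# Palette constants; key names are illustrative, hex values are from the guide.
PALETTE = {
    "deep_navy": "#0f0f23",   # backgrounds, void spaces
    "dark_teal": "#1a1a2e",   # structures, panels, tech
    "cyan": "#00FFFF",        # tech highlights, data, UI
    "magenta": "#FF0080",     # neon accents, danger, energy
    "amber": "#f59e0b",       # warm light, fire, hope
    "neon_green": "#00ff88",  # success states, growth, nature
}

def hex_to_rgb(hex_color):
    """'#FF0080' -> (255, 0, 128); useful for pixel-level palette checks."""
    h = hex_color.lstrip("#")
    return tuple(int(h[i:i + 2], 16) for i in (0, 2, 4))
```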

Hard Rejections (Never Use)

  • Photorealistic rendering or textures
  • Gradients that break pixel aesthetic
  • Pure white backgrounds (#FFFFFF)
  • Pastel or washed-out colors
  • 3D perspective inconsistency
  • Text/watermarks in generated images

Technical Implementation

For practitioners building similar systems:

API Details

  • Model: gemini-3-pro-image-preview
  • Cost: ~$0.04 per image at 2K
  • Max refs: 14 images per request
  • Response: Base64-encoded PNG

Resolution Options

  • 1K (~1024px): quick iterations
  • 2K (~2048px): production default
  • 4K (~4096px): hero images only

Aspect Ratios

  • 21:9: ultra-wide banners
  • 16:9: story scenes, standard
  • 3:4 / 9:16: vertical portraits
  • 1:1: icons, avatars

Queue-Based Workflow

Jobs defined in JSON, processed sequentially with checkpointing:

./generate-from-queue.sh --all --changelog
Request payload structure
{
  "contents": [{
    "parts": [
      {"inlineData": {"mimeType": "image/png", "data": "...base64..."}},
      {"inlineData": {"mimeType": "image/png", "data": "...base64..."}},
      {"text": "Style instruction + scene prompt + framing + character notes"}
    ]
  }],
  "generationConfig": {
    "responseModalities": ["TEXT", "IMAGE"],
    "imageConfig": {
      "aspectRatio": "16:9",
      "imageSize": "2K"
    }
  }
}
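Assembling that payload and decoding the response can be sketched in a few lines of Python. The helper names are ours; the response handling assumes the standard candidates/parts structure in which the image arrives as a base64-encoded inlineData part.

```python
import base64

def build_payload(prompt, ref_blobs, aspect="16:9", size="2K"):
    """Reference images go first as inlineData parts; the text prompt goes last."""
    parts = [{"inlineData": {"mimeType": "image/png",
                             "data": base64.b64encode(blob).decode()}}
             for blob in ref_blobs]
    parts.append({"text": prompt})
    return {
        "contents": [{"parts": parts}],
        "generationConfig": {
            "responseModalities": ["TEXT", "IMAGE"],
            "imageConfig": {"aspectRatio": aspect, "imageSize": size},
        },
    }

def extract_png(response):
    """Return the decoded PNG bytes from the first candidate's parts."""
    for part in response["candidates"][0]["content"]["parts"]:
        if "inlineData" in part:
            return base64.b64decode(part["inlineData"]["data"])
    raise RuntimeError("no image part in response")
```

Putting the text part last mirrors the payload structure shown above, so the style references are established before the scene instruction.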

Results Gallery

A sample of outputs demonstrating style consistency across different subjects and scenes.

Key Takeaways

1

Always include a gold standard style reference. Without it, you get random styles.

2

Budget your 14 reference slots strategically. More character refs = better consistency.

3

Use selection over description. Art interviews extract preferences that language can't capture.

4

Invest in character calibration upfront. 52 iterations saves thousands of inconsistent outputs later.

Questions about our art pipeline?

Open an Issue