Skill

banana-pro-director-kie-2.0

kie.ai image generation director. Generates photorealistic still image prompts and executes them via the kie.ai API. Three model paths: Nano Banana Pro (balanced, default — face locks, outfit plates, 6-panel sheets, scene plates), GPT Image 2 (highest fidelity — chest-up portraits and detail shots), and composite (Nano Banana Pro with multiple reference images — two-step outfit compositing). All paths use mid-gray seamless as the locked default backdrop. Requires KIE_API_KEY environment variable. After prompt approval, Claude executes the API call, polls for completion, and returns the output image URL. Use for new character builds, character/outfit refs, character sheets, scene plates, environment plates, detail shots, or any photorealistic still going into a kie.ai video workflow.

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/oblique-cowork:banana-pro-director-kie-2.0

SKILL.md

552 lines · ~9.5k tokens (exceeds 5k compaction limit)

Stats

Language: Python
Parent stars: 0
Maintenance: Excellent
Last commit: Jun 26, 2026

Banana Pro Director KIE — Image Asset Builder

The locked image prompt grammar for great kie.ai image assets. Six modes in total. The four core modes, in strict order:

Face lock (new characters only) — for any character being developed from scratch. Tool fork: Nano Banana Pro single-pass (default, balanced), GPT Image 2 single-pass (highest fidelity, higher credits, chest-up only), or Nano Banana Pro two-pass (iteration path — loose face plate then locked 3:4). All paths use mid-gray seamless (locked default), soft light from camera-left or camera-right, and a locked neutral top (plain black camisole for women, plain black ribbed tank for men).
Single-image character outfit — mid-gray seamless studio, full styling locked, base reference for that character/outfit. Two paths: Nano Banana Pro (full styling from prompt — best for simpler outfits) or Nano Banana Pro two-step (outfit built on a neutral model first, then composited onto the locked character — best for complex custom fits).
6-panel character sheet — built ONLY after a single-image base exists. One 16:9 frame, 3×2 grid: front body, back body, two side-profile headshots, front face headshot, detail shot.
Scene plates — character(s) in a fully realized environment, OR pure environment plates with no characters. Never proposed proactively — only built when the user asks.

Plus two optional capabilities:

GPT Image 2 detail mode — used only for detail face shots and chest-up portraits when the user explicitly asks. Highest fidelity, higher cost.
Outfit replacement — two-reference composite that puts the character from one image into the outfit from another. Used only when the user explicitly asks.

Photoreal is the universal default. Every prompt describes a real human (or real environment) in a real frame.

MODEL ROUTING TABLE

Mode	kie.ai model	input.image_input
Mode 0A (Nano Banana face lock)	`nano-banana-pro`	`[]` (text-only)
Mode 0B (GPT Image 2 face lock)	`gpt-image-2-text-to-image`	n/a
Mode 0.1 (loose face plate)	`nano-banana-pro`	`[]`
Mode 0.2 (locked headshot from 0.1 plate)	`nano-banana-pro`	`[step0.1_url]`
Mode 1A (Nano Banana outfit, ref present)	`nano-banana-pro`	`[char_ref_url]`
Mode 1A (Nano Banana outfit, no ref)	`nano-banana-pro`	`[]`
Mode 1B.1 (neutral model outfit plate)	`nano-banana-pro`	`[]`
Mode 1B.2 (composite char + outfit)	`nano-banana-pro`	`[char_ref_url, outfit_ref_url]`
Mode 2 (6-panel sheet)	`nano-banana-pro`	`[char_ref_url]` or `[char_ref_url, outfit_ref_url]`
Mode 3A (char-in-scene)	`nano-banana-pro`	`[char_ref_url]`
Mode 3B (pure environment)	`nano-banana-pro`	`[]` or `[env_ref_url]`
Mode 4 (GPT Image 2 detail, with ref)	`gpt-image-2-image-to-image`	via `input_urls`
Mode 4 (GPT Image 2 detail, text-only)	`gpt-image-2-text-to-image`	n/a
Mode 5 (outfit replacement)	`nano-banana-pro`	`[face_body_ref_url, outfit_ref_url]`
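The routing table can be sketched as a small lookup that also checks how many reference URLs a mode expects. This is a minimal illustration, not part of the skill: the mode keys ("0A", "4-ref", and so on), the `Route` type, and the `route` helper are hypothetical names, and the allowed reference counts are read directly off the table (one reference for Mode 4 image-to-image is an assumption).

```python
from typing import NamedTuple, Optional, Sequence, Tuple


class Route(NamedTuple):
    model: str                      # kie.ai model name from the table
    ref_field: Optional[str]        # JSON field carrying reference URLs, None if n/a
    allowed_refs: Tuple[int, ...]   # reference counts the table allows


# Keys are this sketch's own shorthand for the table rows.
ROUTES = {
    "0A": Route("nano-banana-pro", "image_input", (0,)),
    "0B": Route("gpt-image-2-text-to-image", None, (0,)),
    "0.1": Route("nano-banana-pro", "image_input", (0,)),
    "0.2": Route("nano-banana-pro", "image_input", (1,)),
    "1A": Route("nano-banana-pro", "image_input", (0, 1)),
    "1B.1": Route("nano-banana-pro", "image_input", (0,)),
    "1B.2": Route("nano-banana-pro", "image_input", (2,)),
    "2": Route("nano-banana-pro", "image_input", (1, 2)),
    "3A": Route("nano-banana-pro", "image_input", (1,)),
    "3B": Route("nano-banana-pro", "image_input", (0, 1)),
    "4-ref": Route("gpt-image-2-image-to-image", "input_urls", (1,)),  # count assumed
    "4-text": Route("gpt-image-2-text-to-image", None, (0,)),
    "5": Route("nano-banana-pro", "image_input", (2,)),
}


def route(mode: str, refs: Sequence[str]) -> Route:
    """Return the route for a mode, rejecting a reference count the table does not list."""
    chosen = ROUTES[mode]
    if len(refs) not in chosen.allowed_refs:
        raise ValueError(
            f"Mode {mode} expects {chosen.allowed_refs} reference image(s), got {len(refs)}"
        )
    return chosen
```

For the composite modes the order of `refs` matters: the character reference comes first and the outfit reference second, as in the table.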

Aspect ratio defaults by mode:

Face locks and headshots: 2:3 (portrait)
Outfit plates (full body): 2:3
6-panel sheet: 16:9
Scene plates: 16:9 (default), or as specified by user
GPT Image 2 detail (chest-up): 2:3

Resolution defaults: 1K for Nano Banana Pro. GPT Image 2 has no resolution param (fixed by model).
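As a rough sketch, the aspect-ratio and resolution defaults above could be resolved like this. The `kind` labels and the `generation_defaults` helper are hypothetical names chosen for illustration; only the ratios, the "1K" value, and the model names come from this document.

```python
from typing import Dict, Optional

# Aspect-ratio defaults by asset kind, copied from the list above.
ASPECT_DEFAULTS = {
    "face_lock": "2:3",
    "outfit_plate": "2:3",
    "sheet": "16:9",
    "scene_plate": "16:9",
    "gpt_detail": "2:3",
}


def generation_defaults(kind: str, model: str, aspect_ratio: Optional[str] = None) -> Dict[str, str]:
    """Resolve aspect ratio (user override wins) and add resolution only where it applies."""
    params = {"aspect_ratio": aspect_ratio or ASPECT_DEFAULTS[kind]}
    if model == "nano-banana-pro":
        # GPT Image 2 takes no resolution param, so it is set for Nano Banana Pro only.
        params["resolution"] = "1K"
    return params
```

A scene plate with a user-specified ratio would be `generation_defaults("scene_plate", "nano-banana-pro", aspect_ratio="21:9")`, which keeps the override and still adds the 1K resolution.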

THE WORKFLOW — STRICT ORDER

Step 0 — Is the character already built?

Before anything else, ask the user: does the character already exist, or are we developing them?

If the character exists: ask the user to drop the reference image URL(s) or attach the images. Study and lock — face, bone structure, skin tone, hair color and texture, identity markers, body proportions. Mirror back the locked spec in plain language so the user can confirm or correct before any prompt is built. Wait for confirmation, then proceed to Mode 1 or whichever mode the user asked for.

If the character is new: development happens in two stages — first a text spec, then a face-lock build via Mode 0.

Stage 1 — text spec: let the user describe the character. Mirror back a locked spec covering:

Approximate apparent age register (described by build, not number)
Face: bone structure, eye shape and color, brow shape, nose, lip shape, skin tone and finish
Hair: color, length, texture, style
Body: build, proportions, posture, distinguishing markers
Default makeup register (if any)
Default expression and energy
Key identity markers — piercings, scars, beauty marks, tattoos, signature jewelry

Wait for confirmation or correction. Then move to Stage 2 — Mode 0 face lock.

Mode 0 — Face lock (new characters only)

Three tool options (ask first):

Want to build this in Nano Banana Pro, GPT Image 2, or Nano Banana Pro two-pass?

Nano Banana Pro (default): balanced fidelity, single-pass. Works for most characters.
GPT Image 2 (highest fidelity): chest-up only, sharpest detail. Higher kie.ai cost. Best for tricky identity markers.
Nano Banana Pro two-pass (iteration path): loose plate first to explore the face register, then a second pass to lock finer detail. Slowest but allows iteration before committing.

Mention GPT Image 2 cost once per conversation, then drop it.

Mode 1 — Single-image character outfit (the base outfit reference)

Once the character is locked, the FIRST image for any new outfit is a single-image base on mid-gray seamless. No 6-panel sheet before a base outfit reference exists.

Ask the user to describe the outfit. If they have wardrobe reference images, ask for the URLs.

Then ask which path:

Nano Banana Pro (prompt-driven, single pass) or Nano Banana Pro two-step (outfit on a neutral model first, then composite)?

Mode 2 — 6-panel character sheet

Only after a single-image base reference has been generated. One prompt, one 16:9 image, 3×2 grid.

Mode 3 — Scene plates

Always available. Never proposed proactively. Only when the user asks for a scene, environment, plate, or setting.

Mode 4 — GPT Image 2 detail (gated)

Only for chest-up portraits or detail face shots, and only when explicitly asked. Ask first: "want to run this on GPT Image 2 for the higher-fidelity face read? heads-up — GPT Image 2 costs more on kie.ai than Nano Banana Pro." Mention cost once per conversation. Wait for confirmation.
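The ordering and gating rules of the workflow can be summarised as a small check. This is an illustrative sketch under stated assumptions: the `Session` fields and the `blocker` function are hypothetical, and the rule for Mode 5 is taken from the "Outfit replacement" capability described earlier (explicit request only).

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass
class Session:
    character_locked: bool = False          # a confirmed character reference exists
    base_outfit_exists: bool = False        # a Mode 1 single-image base was generated
    requested: FrozenSet[int] = frozenset() # modes the user explicitly asked for


def blocker(mode: int, session: Session) -> Optional[str]:
    """Return the reason a mode may not run yet, or None if it is allowed."""
    if mode == 0 and session.character_locked:
        return "face lock is for new characters only"
    if mode == 1 and not session.character_locked:
        return "lock the character before building an outfit base"
    if mode == 2 and not session.base_outfit_exists:
        return "a single-image base must exist before the 6-panel sheet"
    if mode in (3, 4, 5) and mode not in session.requested:
        return "only built when the user explicitly asks"
    return None
```

For example, `blocker(2, Session(character_locked=True))` reports that the sheet is blocked until a base outfit image exists.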

THE PRE-PROMPT CONFIRMATION RULE (UNIVERSAL)

Every prompt — single image, 6-panel, scene plate, GPT Image 2 — gets a short "here's what I'm about to prompt, sound good?" check before the full prompt is written. Not optional.

Exception: minor iteration on a just-delivered prompt (pose tweak, lighting nudge, swap one wardrobe element) — skip the check and deliver the revised prompt directly.

Format: clean bullet points only. References listed first, always.

Pre-prompt check:

References provided: [list every reference image URL or attachment by short visual descriptor — or "none — pure text composition"]
Character (one bullet — hair, skin, identity markers, expression)
Outfit / styling (one bullet — wardrobe head-to-toe, jewelry)
Backdrop or environment (one bullet)
Framing (one bullet)

Close with a short question line. Wait for the green light. Then drop the full prompt in a fenced code block, then execute the API call.
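A minimal sketch of how the pre-prompt check might be assembled, references first. The function name, argument names, and the closing question are illustrative assumptions; the bullet order follows the format above.

```python
from typing import Sequence


def pre_prompt_check(
    references: Sequence[str],
    character: str,
    outfit: str,
    backdrop: str,
    framing: str,
    question: str = "Sound good?",
) -> str:
    """Build the short bullet check that precedes every full prompt."""
    refs = "; ".join(references) if references else "none - pure text composition"
    lines = [
        "Pre-prompt check:",
        f"- References provided: {refs}",   # references are always listed first
        f"- Character: {character}",
        f"- Outfit / styling: {outfit}",
        f"- Backdrop or environment: {backdrop}",
        f"- Framing: {framing}",
        "",
        question,
    ]
    return "\n".join(lines)
```

Each argument is expected to be a single bullet's worth of text, in line with the one-bullet-per-topic format.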

CORE PHILOSOPHY

No plastic. No CGI sheen. No 3D-render look. No commercial gloss.

Every image should read as a photograph — real camera, real subject. The character should look lived-in: real pore texture, peach fuzz, strand-by-strand hair, fabric with weight, jewelry with surface detail, eyes with reflection and depth.

The flattering-realism ceiling (LOCKED): Full skin realism always on — visible pore texture, peach fuzz, subsurface scattering, flyaways. But realism never means unflattering. No acne, no blemishes, no harsh pores, no scarring, no aggressive skin detail that reads as ugly or clinical. Texture is fine, soft, even, and natural. When realism and flattering seem to pull against each other, resolve toward fine-even-flattering.

UNIVERSAL RENDER RULES — FIGHTING THE AI AESTHETIC

Baked into every Nano Banana Pro and GPT Image 2 prompt.

1. Real human skin.

Real natural pore texture visible — soft, fine, even, never blemishes, never enlarged/cratered/rough
Real peach fuzz catching light along jawline, hairline, temples, upper lip
Real subsurface scattering — semi-translucent biology, not opaque plastic
Skin tone at the character's natural register — never washed out, never cool-shifted
No retouching, no skin smoothing, no porcelain plastic, no waxy AI render
Flattering ceiling locked: fine, soft, even texture under the key; resolve any tension toward flattering

2. Real hair physics.

Strand by strand with realistic flyaways, baby hairs, separation, light transmission
Hair responds to scene environment: still interior = settled, wind = drift, wet = damp matte clumping
Register defaults matte — fine diffuse fiber, never glossy shine unless user requests it

3. Real lens character.

Wide-latitude digital cinema capture as default
Character canonicals and seamless stills: clean fast normal prime around 50mm full-frame at wide aperture — natural round bokeh, gentle background separation
Scene plates: vintage 2x anamorphic — oval bokeh, gentle horizontal squeeze, soft frame-edge falloff, light diffusion bloom lifting highlights

4. Real light physics.

Atmospheric depth default-on — visible haze and air density between planes; distant elements softer, desaturated, lower contrast
Shadow falloff with physically accurate wrap — soft transitions, never hard edges
Subsurface scattering at ear edges, nostrils, eye sockets with warm undertone bleed
Highlights rolled off gently, never clipping to pure white
Lifted blacks, wide dynamic range

5. Real grain.

Color-negative motion-picture film look baked in
Fine theatrical 35mm film grain across the full frame

NIGHT CINEMA REGISTER (FOR NIGHT SCENES)

Night target: Justin Lin / James Wan / Greig Fraser — The Batman, Tokyo Drift, Furious 7, John Wick. Mostly dark, hard punchy practicals cutting through. NOT saturated-teal-everywhere, NOT bright-night.

A. EXTERIOR CANYON / OPEN NIGHT: Light exclusively from practical sources. Sky committed to deep crushed near-black. Faint horizon glow at very deep distance only. Atmospheric haze catches headlight beams as visible warm volumetric god rays. Everything outside the headlight throws falls into deep near-black shadow.

B. INTERIOR / URBAN / LIT NIGHT: Practical sources drive the look — sodium-vapor, fluorescent, neon, dash glow, brake lights. Teal-amber color split where motivated. Atmospheric haze gives light volumetric body.

Universal night rules: Deep cinematic contrast, practicals punch hard, atmospheric haze on, rim and edge light define subjects, skin tone preserved under practical key.

MID-GRAY SEAMLESS BACKDROP (LOCKED DEFAULT)

Mid-gray seamless is the locked default for all character work. Pure white only when the user explicitly asks.

Why gray: Lower subject-to-background contrast means cleaner edge extraction and less inherited contrast when the still seeds downstream video. Same principle as atmospheric perspective.

The background stays neutral; the character does not. Skin renders at its true natural skin tone, wardrobe at its true natural color — never cooled, never washed-out.

Lighting close for a mid-gray plate (lean soft Rembrandt grade):

Mid-gray seamless studio background — even neutral mid-gray, no seam line, no gradient, no falloff to black or white. Relight from scratch overriding any reference lighting: one broad diffused source from camera-[left/right] and slightly above, a soft triangle of light on the shadow cheek, gentle wrap onto the face, no hard shadow edges, no rim light, no hair light, no kicker. Skin reads matte and velvety — zero shine on forehead, nose bridge, cheekbones, temples, and chin, no oily T-zone — in a low-contrast milky look. Real peach fuzz at the jaw and hairline, real soft fine even pore texture, subsurface scattering reading as semi-translucent biology, warmth preserved and natural, never pale or washed-out or cool-shifted, never plastic, never waxy AI render, never glass-skin, never harsh — fine flattering texture that keeps the face looking good, no acne, no blemishes, no rough pores. Photographed on a 50mm prime at a wide aperture, natural round bokeh, even sharpness, soft natural film grain. Photographed not generated.

READING REFERENCE IMAGES

When the user provides reference images (as attachments or URLs), extract everything visible by visual description only — never use names, never invent details not in the image.

For each character, capture: hair (color, length, texture, styling, accessories), makeup (finish, brow, eye treatment, lashes, lip, cheek), wardrobe (every garment top to bottom — fabric, color, fit, structural details), jewelry (earrings, necklaces, rings, bracelets, bags), body markers (piercings, tattoos, nails — only if visible), pose and energy.

Naming rule (CRITICAL). Never use proper names in the prompt output. Refer by visual description: "the rose-pink haired woman in the cropped white ribbed tank." Visual descriptors survive across prompts; names do not.

Brand name rule (CRITICAL). No real brand names in prompt output. Use generic visual descriptors — "black three-stripe athletic sneakers" not brand names.

Age-blind rule. Never describe characters by age. Describe by role, build, and clothing.

Lean prompt rule. When reference images are provided, they carry the visual identity load. The prompt does NOT need to re-describe what the references already show. One distinguishing visual handle per subject is enough — the reference carries the rest. Put the load on what the prompt uniquely needs: composition, pose, light direction, wardrobe specific to this plate.

MODE 0A — NANO BANANA PRO FACE LOCK (DEFAULT)

Single-pass Nano Banana Pro generation. No reference image input for a pure text-only build.

Canonical Step 0A prompt structure:

A clean cinema-character-reference 3:4 headshot, framed from forehead to upper chest with the face filling most of the frame. [Identity essentials (heritage, build, skin tone and finish, hair color, length, and texture, eye shape and color) and any key identity markers: piercings with exact position and metal, scars with placement and size, beauty marks with placement]. [She wears a plain black thin-strap camisole / He wears a plain black ribbed tank], no jewelry, no logos, no graphics. Body squared to camera, head level, neutral relaxed expression, eyes to camera, lips closed and relaxed, subtle controlled energy.

Mid-gray seamless studio background — even neutral mid-gray, no seam line, no gradient, no falloff to black or white. Relight from scratch overriding any reference lighting: one broad diffused source from camera-[left/right] and slightly above, a soft triangle of light on the shadow cheek, gentle wrap onto the face, no hard shadow edges, no rim light, no hair light, no kicker. Skin reads matte and velvety — zero shine on forehead, nose bridge, cheekbones, temples, and chin, no oily T-zone — in a low-contrast milky look. Skin renders at its true natural skin tone and wardrobe at its true natural color, warmth preserved and natural against the neutral gray, never pale or washed-out or cool-shifted by the background. Real peach fuzz at the jaw and hairline, real soft fine even pore texture, subsurface scattering reading as semi-translucent biology, never plastic, never waxy AI render, never glass-skin, never harsh — fine flattering texture that keeps the face looking good, no acne, no blemishes, no rough pores. Photographed on a 50mm prime at a wide aperture, natural round bokeh, even sharpness, soft natural film grain. Photographed not generated.

API params: model nano-banana-pro, image_input: [], aspect_ratio: "2:3", resolution: "1K"

MODE 0B — GPT IMAGE 2 FACE LOCK (HIGHEST FIDELITY)

Single-pass GPT Image 2 generation. Chest-up framing only. Same identity essentials, wardrobe lock, mid-gray seamless, and soft lighting as Mode 0A — but routed through GPT Image 2.

API params: model gpt-image-2-text-to-image (no refs) or gpt-image-2-image-to-image (if user provides a rough reference), aspect_ratio: "2:3"

MODE 0.1 + 0.2 — NANO BANANA PRO TWO-PASS FACE LOCK

Step 0.1 — Loose Nano Banana Pro face plate (exploration):

Lean prompt, identity essentials only. No full cinema stack. Goal is to get a face candidate the user can select from.

A [heritage] [woman / man] with a [build], [skin tone and finish], [hair color, length, texture]. [Eye shape and color]. [Large/obvious identity markers only]. [She wears a plain black thin-strap camisole / He wears a plain black ribbed tank], no jewelry, no logos. Body squared to camera, neutral expression, eyes to camera.

Mid-gray seamless studio background, even neutral mid-gray, no seam line. Soft natural light from camera-[left/right], very diffused, no hard shadow edges. Skin renders at its true natural skin tone, warmth preserved and natural against the neutral gray. Skin reads matte and slightly diffused, clean and even. Chest-up framing. Real skin pore texture, fine peach fuzz, subtle subsurface scattering. Fine cinema grain. Photographic, not rendered.

API params: model nano-banana-pro, image_input: [], aspect_ratio: "2:3", resolution: "1K"

Once the Step 0.1 task completes and the user has picked a face candidate, its output URL becomes the reference input for Step 0.2.

Step 0.2 — Nano Banana Pro 3:4 headshot lock:

Second-pass Nano Banana Pro using the Step 0.1 output URL as a reference image. Full identity lock including fine markers.

A clean cinema-character-reference 3:4 headshot of the same character as the reference image, framed from forehead to upper chest with the face filling most of the frame. [Full character descriptor: heritage, build, skin tone and finish, hair, face register (jaw, chin, lips, cheekbones, brow), eye shape and color, all identity markers (piercings with exact position and metal, scars with placement and size, beauty marks with placement), makeup register]. [She wears a plain black thin-strap camisole / He wears a plain black ribbed tank], no jewelry, no logos. Body squared to camera, head level, neutral relaxed expression, eyes to camera.

[Full lean Rembrandt grade close from the mid-gray seamless section above.]

API params: model nano-banana-pro, image_input: ["[step_0.1_output_url]"], aspect_ratio: "2:3", resolution: "1K"
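The two-pass chain can be sketched as below: the URL returned by the loose plate becomes the single reference for the locked headshot. `generate` is a hypothetical callable standing in for whatever submits a kie.ai task and returns the finished image URL; it is not a real library function.

```python
from typing import Callable, List


def two_pass_face_lock(
    generate: Callable[..., str],
    loose_prompt: str,
    lock_prompt: str,
) -> List[str]:
    """Run Step 0.1, then feed its output URL into Step 0.2 as the reference."""
    common = {"model": "nano-banana-pro", "aspect_ratio": "2:3", "resolution": "1K"}
    # Step 0.1: text-only exploration plate, no reference input.
    plate_url = generate(prompt=loose_prompt, image_input=[], **common)
    # Step 0.2: lock pass, with the Step 0.1 output as the only reference.
    locked_url = generate(prompt=lock_prompt, image_input=[plate_url], **common)
    return [plate_url, locked_url]
```

In practice the two calls would be separated by the user's review of the Step 0.1 plate, so a real flow would pause between them rather than run straight through.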

MODE 1A — SINGLE-IMAGE CHARACTER OUTFIT, NANO BANANA PRO PATH

Best for outfits where full prompt control gets the result in one clean shot.

Canonical Mode 1A prompt structure:

[Visual descriptor of the character — hair, makeup, full wardrobe head-to-toe, jewelry, body markers, extracted from references or locked from development]. [Pose direction — body angle, weight distribution, hand position, expression].

Mid-gray seamless studio background — even neutral mid-gray, no seam line, no gradient, no falloff to black or white. Relight from scratch: one broad diffused source from camera-[left/right] and slightly above, gentle wrap onto the figure, no harsh shadows, no rim light, no hair light, no kicker. Skin and fabric read matte and velvety in a low-contrast milky look, no shine. Skin renders at its true natural skin tone and the outfit at its true natural color, warmth preserved and natural against the neutral gray, never pale or washed-out or cool-shifted. Real peach fuzz at the jaw and hairline, real fine even pore texture, subsurface scattering as semi-translucent biology, real fabric weave and drape, never plastic, never waxy. Photographed on a 50mm prime at a wide aperture, natural round bokeh, even sharpness, soft natural film grain. Photographed not generated. [Framing — full body / waist-up / head-to-shoulders].

API params: model nano-banana-pro, image_input: ["[char_ref_url]"] (or [] if text-only build), aspect_ratio: "2:3", resolution: "1K"

MODE 1B — SINGLE-IMAGE CHARACTER OUTFIT, TWO-STEP PATH

Best for complex custom fits where the outfit should be designed separately from the character casting.

Step 1B.1 — Build the outfit on a neutral model

Canonical Step 1B.1 prompt:

A slim [woman / man] standing straight-on to camera in a relaxed neutral stance, weight evenly distributed, arms relaxed at sides, body squared to camera. [Medium-length natural medium-brown hair, simple straight or slight wave / short clean haircut, natural medium-brown color]. Clean even features, neutral natural skin tone, [light natural makeup / no makeup], neutral expression, eyes directly to camera. Slim model build with refined proportions. The figure wears [full outfit description — every garment top to bottom with fabric, color, fit, structural details, layering, hem positions, footwear, jewelry].

Mid-gray seamless studio background, even neutral mid-gray, no shadow falloff, no visible seam line. Soft natural light from camera-[left/right], very diffused, gentle wrap, no harsh shadows, no dramatic rim light. Skin and fabric read matte and slightly diffused, the outfit fully readable at its true natural color against the neutral gray. Full body framing from head to just below the footwear. Real fabric texture with visible weave detail, real weight, real drape. Fine cinema grain. Photographic, not rendered.

API params: model nano-banana-pro, image_input: [], aspect_ratio: "2:3", resolution: "1K"

User saves the output URL. That URL is the outfit reference for Step 1B.2.

Step 1B.2 — Composite the outfit onto the locked character

Two reference images: the locked character reference URL and the outfit reference URL from Step 1B.1.

Canonical Step 1B.2 prompt:

Place the face and body from the first reference image onto the outfit from the second reference image. Mid-gray seamless studio background, even neutral mid-gray, skin and outfit at their true natural tone. Soft studio lighting.

API params: model nano-banana-pro, image_input: ["[char_ref_url]", "[outfit_ref_url]"], aspect_ratio: "2:3", resolution: "1K"

MODE 2 — 6-PANEL CHARACTER SHEET (SINGLE 16:9 FRAME)

Only after a single-image base reference has been generated and approved. One prompt, one image, six panels in a 3×2 grid.

6-panel layout (3×2 grid):

Top-left: Full body front
Top-center: Side profile close headshot (left side)
Top-right: Full body back
Bottom-left: Side profile close headshot (right side)
Bottom-center: Front face close headshot
Bottom-right: Detail shot (nails / jewelry / piercing / tattoo / held prop)
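The 3×2 layout maps cleanly onto panel numbers counted left to right, top to bottom. The sketch below is only an illustration of that mapping; `panel_slot` and the constant names are hypothetical.

```python
from typing import Tuple

# Panel contents in reading order, copied from the layout above.
PANELS = [
    "Full body front",
    "Side profile close headshot (left side)",
    "Full body back",
    "Side profile close headshot (right side)",
    "Front face close headshot",
    "Detail shot",
]
ROWS = ("top", "bottom")
COLS = ("left", "center", "right")


def panel_slot(number: int) -> Tuple[str, str]:
    """Map a 1-based panel number to its grid position label and content."""
    if not 1 <= number <= len(PANELS):
        raise ValueError("panel number must be between 1 and 6")
    row, col = divmod(number - 1, len(COLS))  # 3 columns per row
    return f"{ROWS[row]}-{COLS[col]}", PANELS[number - 1]
```

`panel_slot(5)` resolves to the bottom-center cell, the front face close headshot.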

Canonical Mode 2 prompt structure:

A 6-panel character reference sheet arranged as a 3-column by 2-row grid in a single horizontal frame, separated by thin clean white gutters between panels. Each panel shows the same single character — [full visual descriptor: build, face, hair, makeup, full wardrobe head-to-toe, all accessories, jewelry, body markers].

Panel 1 (top-left), full body front: [stance, framing, what's readable].
Panel 2 (top-center), side profile close headshot, left side: [tight crop from collarbone up, character's left profile with the face pointing screen-left, hair and ear and jaw geometry visible].
Panel 3 (top-right), full body back: [stance, what's visible from behind].
Panel 4 (bottom-left), side profile close headshot, right side: [tight crop from collarbone up, character's right profile with the face pointing screen-right, mirror of Panel 2].
Panel 5 (bottom-center), front face close headshot: [tight crop from collarbone up, body squared to camera, face filling the frame, eyes to camera].
Panel 6 (bottom-right), detail shot: [the locked detail close-up: nails / specific jewelry piece / piercing / tattoo / held prop].

Mid-gray seamless studio backdrop applied uniformly across all six panels — even neutral mid-gray, no seam line, no gradient. Relight from scratch uniformly: one broad diffused source from camera-left and slightly above, gentle wrap, no harsh shadows, no rim light, no hair light, no kicker. Skin and fabric read matte and velvety in a low-contrast milky look, at their true natural tone, warmth preserved and natural, never cool-shifted. Sharp focus across every panel. Real fine even pore texture, peach fuzz at the hairline, subsurface scattering, real fabric weave, soft natural film grain. Identical character identity locked across all six panels — same face, same skin, same hair, same wardrobe, same accessories in every cell. Photographed not generated.

API params: model nano-banana-pro, image_input: ["[char_ref_url]"], aspect_ratio: "16:9", resolution: "1K"

MODE 3 — CINEMATIC SCENE PLATE

Two flavors — character-in-environment (3A) or pure environment (3B). Same five cinema modes as cinema-worldbuilder:

Scene type	Cinema mode
Real-world dramatic (street, kitchen, car, bar, interior/exterior)	M1 — Narrative
Studio / editorial / clean set / fashion film	M2 — Studio / Editorial
Action / combat / chase / high-energy	M3 — Action / Combat
Performance / concert / stage	M4 — Performance / Concert
Atmospheric / empty / no-humans / weather plate	M5 — Atmospheric / Empty

Mode 3 prompts are written in cinema-prose register — five paragraphs, no labeled headers, woven into continuous observational prose. See the Mode 3 — Cinema-Prose Register section below.

API params (Mode 3A): model nano-banana-pro, image_input: ["[char_ref_url]"], aspect_ratio: "16:9", resolution: "1K"
API params (Mode 3B): model nano-banana-pro, image_input: [], aspect_ratio: "16:9", resolution: "1K"

MODE 3 — CINEMA-PROSE REGISTER

Five-paragraph prose structure (woven, no labeled headers in the output):

Paragraph 1 — Opening shot description. One long sentence: medium ("a cinematic anamorphic still"), framing register, subject identification at high level, camera position and angle in prose, mood/intent.

Paragraph 2 — Character block. Identity markers from the reference written as visible facts ("dark layered mid-length tousled fringe falling across the back of his head, small silver hoop earrings catching faint warm spill, warm fair matte skin"). Pose, attention, held props woven in naturally.

Paragraph 3 — World/environment block. Location as ambience, not architecture. The space's register. Anchor to provided reference if one exists. Background subjects get positional language, not coordinates.

Paragraph 4 — Subject anchor block. The focal anchor of the shot — TV broadcast, second car in BG, dawn horizon, signage. If no focal anchor beyond the character, fold into Paragraph 3.

Paragraph 5 — Camera spec + finish. Full cinema look in one continuous descriptive paragraph: capture register, lens character, diffusion/filtration, film-stock rendition, grain, grade, color cast, optical character — all in plain-language look terms, never brand names — plus the closing realism clause ("Real photographic frame captured on a real cinema camera... no CGI, no rendered look, no plastic surfaces, no AI smoothness, no skin smoothing").

Resolution-aware detail rule: Describe only what the camera at this distance, lens, and motion register would physically resolve. A car shot from 200 feet up → reads as silhouette + color blocks + headlights. A person at 50 yards → reads as silhouette + hair color + wardrobe color blocks. Detail is earned by proximity, lens length, motion stillness, and lighting intensity.

MODE 4 — GPT IMAGE 2 DETAIL MODE (GATED)

Chest-up portraits and detail face shots only. User must explicitly ask.

Use the GPT Image 2 prompt grammar: same identity essentials, wardrobe lock, mid-gray seamless, and soft lighting as Mode 0A. Chest-up framing. Full cinema stack appended.

API params (with reference image): model gpt-image-2-image-to-image, input_urls: ["[char_ref_url]"], aspect_ratio: "2:3"
API params (text-only): model gpt-image-2-text-to-image, aspect_ratio: "2:3"

MODE 5 — OUTFIT REPLACEMENT (TWO-REFERENCE COMPOSITE)

Two reference images: the face/body reference and the outfit reference. Single locked prompt.

Canonical Mode 5 prompt:

Place the face and body from the first reference image onto the outfit and pose from the second reference image. Mid-gray seamless studio background, even neutral mid-gray, skin and outfit at their true natural tone. Soft studio lighting. Skin reads matte, fine and even, real peach fuzz, no plastic, no AI render. Photographed not generated.

API params: model nano-banana-pro, image_input: ["[face_body_ref_url]", "[outfit_ref_url]"], aspect_ratio: "2:3", resolution: "1K"

THE CINEMA STACK (LOCKED — APPENDS TO MOST PROMPTS)

Unless a step defines its own lighter close (see Modal application below), every non-Mode-3 prompt ends with this single merged cinema stack:

Real human skin captured on a real cinema camera — refined and real, peach fuzz catching light along the jawline and hairline, real natural pore texture soft fine and even, subsurface scattering at ear edges, nostrils, and around the eye sockets with warm undertone bleed reading as semi-translucent biology never opaque plastic. No retouching, no skin smoothing, no porcelain plastic look, no waxy AI render, no blemishes, no acne, no marks, no enlarged or rough pores, no harsh clinical texture — fine flattering even skin that always looks good, no dewy wet finish, no glass-skin, no highlighter glow. Hair rendered strand by strand with realistic flyaways and baby hairs at the hairline, hair physics responding to the actual environment of the scene. Fabric with real weave detail, real weight, real drape. Captured with a wide-latitude cinema look, lens character matched to the shot — a clean fast normal prime around a 50mm full-frame field of view at a wide aperture for portraits and character canonicals giving natural round bokeh and even sharpness, OR a vintage 2x anamorphic character for scene plates giving oval bokeh, a gentle horizontal squeeze on out-of-focus highlights, soft frame-edge falloff, organic optical imperfection toward the edges, a light diffusion bloom lifting highlights into a soft halation, and subtle horizontal streak flares on point light sources. Shallow depth of field with strong foreground-to-background separation. True atmospheric perspective with visible haze and air density between planes — distant elements rendered softer, desaturated, and lower contrast than foreground. Key light wrapping around subjects with physically accurate shadow falloff into the neck, jawline, ear shadow, nostril shadow, lip shadow — soft transitions never hard edges. Highlights rolled off gently in a filmic curve, never clipping to pure white. Lifted blacks, wide dynamic range. 
Color-negative motion-picture film look — daylight-balanced for day registers, tungsten-balanced and pushed for night, fine theatrical 35mm film grain across the entire frame. No HDR overprocessing, no digital oversharpening, no plastic skin rendering — photographed not generated, captured on a real camera by a real cinematographer on a real set.

Modal application:

Modes 0, 1, 2, 4, 5: append the full stack as the closing block. Exceptions: Step 1B.1 and Step 1B.2 have their own lighter closes.
Mode 3: the cinema stack language is folded into the closing camera-spec paragraph in the prose register, not appended separately.
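
The per-mode rule above can be sketched as a small helper. This is only an illustration under stated assumptions: the function name `apply_cinema_stack`, the mode labels passed as strings, and the `CINEMA_STACK` placeholder are hypothetical and not part of the skill.

```python
# Hypothetical helper: names and mode labels are illustrative assumptions.
CINEMA_STACK = "<full cinema stack text>"  # placeholder for the locked block

# Steps that carry their own lighter close instead of the full stack.
LIGHTER_CLOSE_STEPS = {"1B.1", "1B.2"}


def apply_cinema_stack(mode: str, prompt: str, stack: str = CINEMA_STACK) -> str:
    """Return the prompt with the cinema stack appended where the rule calls for it."""
    if mode in LIGHTER_CLOSE_STEPS:
        # These steps supply their own lighter close; nothing is appended here.
        return prompt
    if mode.startswith("3"):
        # Mode 3 folds the stack language into its camera-spec paragraph instead.
        return prompt
    return prompt.rstrip() + "\n\n" + stack
```

In this sketch the lighter closes and the Mode 3 prose are left to the caller, since their text lives in the mode sections rather than here.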

KIE.AI API EXECUTION

After the prompt is approved, Claude executes the generation via kie.ai API using Bash.

Requirement: KIE_API_KEY must be set as an environment variable. If missing, prompt Sean to run: export KIE_API_KEY="your_key_here" (key available at kie.ai/api-key).

Step 1 — Check API key:

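# Stop here if the key is missing so later steps do not send an unauthenticated request.
if [ -z "${KIE_API_KEY:-}" ]; then echo "KIE_API_KEY not set. Run: export KIE_API_KEY=your_key" >&2; exit 1; fi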
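
If the call is driven from Python rather than Bash, the same check might look like the sketch below. The function name `get_api_key` and the optional `env` argument (used here so the check can be exercised without touching the real environment) are assumptions for illustration.

```python
import os


def get_api_key(env=None) -> str:
    """Return KIE_API_KEY, or raise with the export hint from the requirement above."""
    env = os.environ if env is None else env
    key = env.get("KIE_API_KEY", "").strip()
    if not key:
        raise RuntimeError('KIE_API_KEY not set. Run: export KIE_API_KEY="your_key_here"')
    return key
```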

Step 2 — Show the API call being made. Before executing, display to Sean:

Model being used
Prompt (truncated to first 200 chars if long)
Reference images (URLs listed)
Key params (aspect_ratio, plus resolution and output_format for Nano Banana Pro)

Ask: "Ready to fire this at kie.ai?" Execute only after Sean confirms.
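
One way to assemble that pre-flight summary is sketched below. The function name `summarize_call`, its arguments, and the output layout are hypothetical; only the four items and the 200-character truncation come from the step above.

```python
def summarize_call(model, prompt, reference_urls, params, limit=200):
    """Build the pre-flight summary text shown before the request is sent."""
    # Truncate long prompts to the first `limit` characters.
    shown = prompt if len(prompt) <= limit else prompt[:limit] + "..."
    refs = ", ".join(reference_urls) if reference_urls else "none"
    key_params = ", ".join(f"{k}={v}" for k, v in params.items())
    return "\n".join([
        f"Model: {model}",
        f"Prompt: {shown}",
        f"Reference images: {refs}",
        f"Key params: {key_params}",
    ])
```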

Step 3 — Create task (Nano Banana Pro):

KIE_RESPONSE=$(curl -s -X POST "https://api.kie.ai/api/v1/jobs/createTask" \
  -H "Authorization: Bearer $KIE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "nano-banana-pro",
    "input": {
      "prompt": "PROMPT_HERE",
      "image_input": [],
      "aspect_ratio": "2:3",
      "resolution": "1K",
      "output_format": "png"
    }
  }')
echo "$KIE_RESPONSE"
TASK_ID=$(echo "$KIE_RESPONSE" | python3 -c "import sys,json; d=json.load(sys.stdin); print(d.get('data',{}).get('taskId',''))")
echo "Task ID: $TASK_ID"
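
Pasting a long prompt into the single-quoted JSON above can break the request if the prompt contains double quotes, apostrophes, or line breaks. A minimal sketch of building the same body with proper escaping follows. The field names mirror the request above; the function name `build_nano_banana_payload` and its defaults are assumptions.

```python
import json


def build_nano_banana_payload(prompt, image_input=None, aspect_ratio="2:3",
                              resolution="1K", output_format="png"):
    """Return the createTask request body as a JSON string."""
    body = {
        "model": "nano-banana-pro",
        "input": {
            "prompt": prompt,
            "image_input": list(image_input or []),
            "aspect_ratio": aspect_ratio,
            "resolution": resolution,
            "output_format": output_format,
        },
    }
    # json.dumps escapes quotes, backslashes, and newlines inside the prompt.
    return json.dumps(body, ensure_ascii=False)
```

The resulting string could be written to a file such as payload.json (a hypothetical name) and sent with curl's -d @payload.json in place of the inline body.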

Step 3 alt — GPT Image 2 (text-to-image):

KIE_RESPONSE=$(curl -s -X POST "https://api.kie.ai/api/v1/jobs/createTask" \
  -H "Authorization: Bearer $KIE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-image-2-text-to-image",
    "input": {
      "prompt": "PROMPT_HERE",
      "aspect_ratio": "2:3"
    }
  }')
TASK_ID=$(echo "$KIE_RESPONSE" | python3 -c "import sys,json; d=json.load(sys.stdin); print(d.get('data',{}).get('taskId',''))")

Step 3 alt — GPT Image 2 (image-to-image):

KIE_RESPONSE=$(curl -s -X POST "https://api.kie.ai/api/v1/jobs/createTask" \
  -H "Authorization: Bearer $KIE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-image-2-image-to-image",
    "input": {
      "prompt": "PROMPT_HERE",
      "input_urls": ["REF_URL_HERE"],
      "aspect_ratio": "2:3"
    }
  }')
TASK_ID=$(echo "$KIE_RESPONSE" | python3 -c "import sys,json; d=json.load(sys.stdin); print(d.get('data',{}).get('taskId',''))")
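
The two GPT Image 2 requests differ only in the model name and the presence of input_urls, so the choice can be made from whether references exist. The sketch below assumes exactly that; the function name `build_gpt_image_2_payload` is hypothetical, while the model and field names are taken from the requests above.

```python
import json


def build_gpt_image_2_payload(prompt, input_urls=None, aspect_ratio="2:3"):
    """Pick text-to-image or image-to-image based on whether references exist."""
    input_urls = list(input_urls or [])
    body = {"prompt": prompt, "aspect_ratio": aspect_ratio}
    if input_urls:
        model = "gpt-image-2-image-to-image"
        body["input_urls"] = input_urls
    else:
        model = "gpt-image-2-text-to-image"
    return json.dumps({"model": model, "input": body}, ensure_ascii=False)
```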

Step 4 — Poll for completion:

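# Stop if task creation did not return an ID; polling would never succeed.
if [ -z "$TASK_ID" ]; then echo "No task ID returned. Check the createTask response." >&2; exit 1; fi
echo "Polling task $TASK_ID..."
# Poll every 10 seconds, up to 60 attempts (about 10 minutes).
for i in $(seq 1 60); do
  POLL=$(curl -s "https://api.kie.ai/api/v1/jobs/recordInfo?taskId=$TASK_ID" \
    -H "Authorization: Bearer $KIE_API_KEY")
  STATE=$(echo "$POLL" | python3 -c "import sys,json; d=json.load(sys.stdin); print(d.get('data',{}).get('state','waiting'))" 2>/dev/null)
  echo "[$i] State: $STATE"
  if [ "$STATE" = "success" ]; then
    # resultJson is a JSON-encoded string, so it is decoded a second time.
    OUTPUT_URL=$(echo "$POLL" | python3 -c "
import sys, json
d = json.load(sys.stdin)
result = json.loads(d['data']['resultJson'])
print(result['resultUrls'][0])
")
    echo "Done! Image URL: $OUTPUT_URL"
    break
  elif [ "$STATE" = "fail" ]; then
    FAIL_MSG=$(echo "$POLL" | python3 -c "import sys,json; d=json.load(sys.stdin); print(d.get('data',{}).get('failMsg','unknown error'))")
    echo "Generation failed: $FAIL_MSG"
    break
  fi
  sleep 10
done
# Report a timeout if the loop ended without reaching success or fail.
if [ "$STATE" != "success" ] && [ "$STATE" != "fail" ]; then echo "Timed out waiting for task $TASK_ID."; fi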
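
The response handling in the loop can also be kept in one place. The sketch below assumes the response shape used by the loop above (data.state, data.resultJson as a JSON-encoded string holding resultUrls, and data.failMsg); the function name `parse_record_info` and its tuple return value are hypothetical.

```python
import json


def parse_record_info(response_text):
    """Return (state, output_url, fail_msg) from a recordInfo response body."""
    try:
        data = json.loads(response_text).get("data") or {}
    except (ValueError, AttributeError):
        return "waiting", None, None  # unreadable body: treat as still pending
    state = data.get("state", "waiting")
    if state == "success":
        # resultJson is itself a JSON-encoded string, so decode it a second time.
        result = json.loads(data["resultJson"])
        return state, result["resultUrls"][0], None
    if state == "fail":
        return state, None, data.get("failMsg", "unknown error")
    return state, None, None
```

Treating an unreadable body as still pending matches the loop's behaviour of simply trying again on the next pass.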

Step 5 — Return the result. After polling completes, tell Sean:

The output image URL
Which mode and model were used
A reminder that kie.ai output URLs expire after 24 hours — save the image locally if needed
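
Because the URL expires, saving a local copy right away is the safe default. A minimal sketch using only the Python standard library follows; the function names `local_name_for` and `save_image`, the fallback filename, and the 60-second timeout are assumptions.

```python
import os
import shutil
import urllib.request
from urllib.parse import urlparse


def local_name_for(url, default="kie_output.png"):
    """Derive a local filename from the URL path, falling back to a default."""
    name = os.path.basename(urlparse(url).path)
    return name or default


def save_image(url, dest_dir="."):
    """Download the image into dest_dir and return the saved path."""
    path = os.path.join(dest_dir, local_name_for(url))
    with urllib.request.urlopen(url, timeout=60) as resp, open(path, "wb") as out:
        shutil.copyfileobj(resp, out)
    return path
```

Calling save_image with the output URL from Step 4 would write the file into the current directory under the name taken from the URL path.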

For multi-step modes (0.1/0.2, 1B.1/1B.2): after Step 0.1 or 1B.1 completes, present the output URL to Sean and ask: "Happy with this? If so, I'll use it as the reference for Step 0.2 / Step 1B.2." Proceed to the second step only with Sean's confirmation.

REFERENCE IMAGE HANDLING

Input format: Reference images must be provided as publicly accessible URLs (not local file paths). This is the natural format since kie.ai output URLs are publicly accessible for 24 hours after generation.

First generation (no existing references): Run as a text-only generation (image_input: []). The output URL becomes the character reference for subsequent generations.

Subsequent generations: Use the output URLs from previous kie.ai generations as reference image inputs — they're already in the right format.

If Sean has a local image to use as a reference: Ask him to attach it in the Claude conversation and read it. Then explain that the API call needs the image at a publicly accessible URL, and recommend uploading it somewhere that provides one (a Dropbox or Google Drive public link, for example) and sending that URL. Alternatively, offer to skip the reference and build from the text spec instead.

Nano Banana Pro: image_input accepts up to 8 URLs. For compositing (Modes 1B.2, 5): image_input: ["char_url", "outfit_url"].

GPT Image 2 i2i: input_urls accepts up to 16 URLs.
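
The reference rules above can be checked before a request is built. The sketch below covers only what can be verified locally: that each reference looks like an http(s) URL rather than a local path, and that the count stays within the stated limit. It cannot confirm that a URL is publicly reachable. The names `REFERENCE_RULES` and `check_references` are hypothetical.

```python
from urllib.parse import urlparse

# Field name and maximum reference count per model, as stated above.
REFERENCE_RULES = {
    "nano-banana-pro": ("image_input", 8),
    "gpt-image-2-image-to-image": ("input_urls", 16),
}


def check_references(model, urls):
    """Return (field_name, urls) if the references are usable, else raise ValueError."""
    if model not in REFERENCE_RULES:
        raise ValueError(f"{model} does not take reference images in this sketch")
    field, limit = REFERENCE_RULES[model]
    if len(urls) > limit:
        raise ValueError(f"{model} accepts at most {limit} reference URLs, got {len(urls)}")
    for url in urls:
        parts = urlparse(url)
        if parts.scheme not in ("http", "https") or not parts.netloc:
            raise ValueError(f"Not a public URL (local path?): {url}")
    return field, list(urls)
```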

OPTIONAL HANDOFF — CINEMA WORLDBUILDER KIE

After generating a still image via this skill, the output URL can be used directly as a reference image in cinema-worldbuilder-kie-2.0 for Seedance video generation. The two skills share the same five cinema modes — when paired, the still and the video share visual DNA.

If Sean mentions wanting to animate a still or create a video of a character, suggest handing off to cinema-worldbuilder-kie-2.0 and note that the kie.ai output URL from this generation can be used directly as the first frame or character reference.
