From oblique-cowork
kie.ai image generation director. Generates photorealistic still image prompts and executes them via the kie.ai API. Three model paths: Nano Banana Pro (balanced, default — face locks, outfit plates, 6-panel sheets, scene plates), GPT Image 2 (highest fidelity — chest-up portraits and detail shots), and composite (Nano Banana Pro with multiple reference images — two-step outfit compositing). All paths use mid-gray seamless as the locked default backdrop. Requires KIE_API_KEY environment variable. After prompt approval, Claude executes the API call, polls for completion, and returns the output image URL. Use for new character builds, character/outfit refs, character sheets, scene plates, environment plates, detail shots, or any photorealistic still going into a kie.ai video workflow.
Slash command
/oblique-cowork:banana-pro-director-kie-2.0
The locked image prompt grammar for great kie.ai image assets. Six modes, in strict order:
Plus two optional capabilities:
Photoreal is the universal default. Every prompt describes a real human (or real environment) in a real frame.
| Mode | kie.ai model | input.image_input |
|---|---|---|
| Mode 0A (Nano Banana face lock) | nano-banana-pro | [] (text-only) |
| Mode 0B (GPT Image 2 face lock) | gpt-image-2-text-to-image | n/a |
| Mode 0.1 (loose face plate) | nano-banana-pro | [] |
| Mode 0.2 (locked headshot from 0.1 plate) | nano-banana-pro | [step0.1_url] |
| Mode 1A (Nano Banana outfit, ref present) | nano-banana-pro | [char_ref_url] |
| Mode 1A (Nano Banana outfit, no ref) | nano-banana-pro | [] |
| Mode 1B.1 (neutral model outfit plate) | nano-banana-pro | [] |
| Mode 1B.2 (composite char + outfit) | nano-banana-pro | [char_ref_url, outfit_ref_url] |
| Mode 2 (6-panel sheet) | nano-banana-pro | [char_ref_url] or [char_ref_url, outfit_ref_url] |
| Mode 3A (char-in-scene) | nano-banana-pro | [char_ref_url] |
| Mode 3B (pure environment) | nano-banana-pro | [] or [env_ref_url] |
| Mode 4 (GPT Image 2 detail, with ref) | gpt-image-2-image-to-image | via input_urls |
| Mode 4 (GPT Image 2 detail, text-only) | gpt-image-2-text-to-image | n/a |
| Mode 5 (outfit replacement) | nano-banana-pro | [face_body_ref_url, outfit_ref_url] |
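The routing in the table above can be sketched as a lookup. This is a minimal illustration, not the skill's actual implementation: the mode keys (`"1B.2"` and so on) and the helper names `model_for` and `ref_field` are hypothetical, and only the model ids and the `image_input` / `input_urls` field names come from the table.

```python
# Hypothetical lookup mirroring the mode table above.
# Keys and function names are illustrative; model ids and field names come from the table.
MODE_MODELS = {
    "0A": "nano-banana-pro",
    "0B": "gpt-image-2-text-to-image",
    "0.1": "nano-banana-pro",
    "0.2": "nano-banana-pro",
    "1A": "nano-banana-pro",
    "1B.1": "nano-banana-pro",
    "1B.2": "nano-banana-pro",
    "2": "nano-banana-pro",
    "3A": "nano-banana-pro",
    "3B": "nano-banana-pro",
    "5": "nano-banana-pro",
}


def model_for(mode, has_ref=False):
    """Return the kie.ai model id for a mode. Mode 4 depends on whether a reference exists."""
    if mode == "4":
        return "gpt-image-2-image-to-image" if has_ref else "gpt-image-2-text-to-image"
    return MODE_MODELS[mode]


def ref_field(model, refs):
    """Return the reference-image fragment of the `input` object for a model."""
    if model == "nano-banana-pro":
        return {"image_input": list(refs)}  # an empty list means a text-only build
    if model == "gpt-image-2-image-to-image":
        return {"input_urls": list(refs)}
    return {}  # text-to-image takes no reference field
```

For example, `ref_field(model_for("1B.2"), [char_ref_url, outfit_ref_url])` yields the two-reference `image_input` list used for compositing.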
Aspect ratio defaults by mode:
- Mode 0: 2:3 (portrait)
- Mode 1: 2:3
- Mode 2: 16:9
- Mode 3: 16:9 (default), or as specified by user
- Mode 4: 2:3

Resolution defaults: 1K for Nano Banana Pro. GPT Image 2 has no resolution param (fixed by model).
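These defaults can be captured in a small helper. A sketch only, assuming the per-mode ratios match the API params listed for each mode later in this document; `DEFAULT_ASPECT` and `default_params` are hypothetical names.

```python
# Default aspect ratio per mode family, taken from the per-mode API params in this document.
DEFAULT_ASPECT = {"0": "2:3", "1": "2:3", "2": "16:9", "3": "16:9", "4": "2:3", "5": "2:3"}


def default_params(mode, model, aspect_ratio=None):
    """Return aspect_ratio (and resolution where the model accepts it) for a mode.

    `mode` is a string such as "0A", "1B.2" or "3A"; its first character is the mode family.
    """
    params = {"aspect_ratio": aspect_ratio or DEFAULT_ASPECT[mode[0]]}
    if model == "nano-banana-pro":
        params["resolution"] = "1K"  # GPT Image 2 has no resolution param
    return params
```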
Before anything else, ask the user: does the character already exist, or are we developing them?
If the character exists: ask the user to drop the reference image URL(s) or attach the images. Study and lock — face, bone structure, skin tone, hair color and texture, identity markers, body proportions. Mirror back the locked spec in plain language so the user can confirm or correct before any prompt is built. Wait for confirmation, then proceed to Mode 1 or whichever mode the user asked for.
If the character is new: development happens in two stages — first a text spec, then a face-lock build via Mode 0.
Stage 1 — text spec: let the user describe the character. Mirror back a locked spec covering:
Wait for confirmation or correction. Then move to Stage 2 — Mode 0 face lock.
Three tool options (ask first):
Want to build this in Nano Banana Pro, GPT Image 2, or Nano Banana Pro two-pass?
- Nano Banana Pro (default): balanced fidelity, single-pass. Works for most characters.
- GPT Image 2 (highest fidelity): chest-up only, sharpest detail. Higher kie.ai cost. Best for tricky identity markers.
- Nano Banana Pro two-pass (iteration path): loose plate first to explore the face register, then a second pass to lock finer detail. Slowest but allows iteration before committing.
Mention GPT Image 2 cost once per conversation, then drop it.
Once the character is locked, the FIRST image for any new outfit is a single-image base on mid-gray seamless. No 6-panel sheet before a base outfit reference exists.
Ask the user to describe the outfit. If they have wardrobe reference images, ask for the URLs.
Then ask which path:
Nano Banana Pro (prompt-driven, single pass) or Nano Banana Pro two-step (outfit on a neutral model first, then composite)?
Only after a single-image base reference has been generated. One prompt, one 16:9 image, 3×2 grid.
Always available. Never proposed proactively. Only when the user asks for a scene, environment, plate, or setting.
Only for chest-up portraits or detail face shots, and only when explicitly asked. Ask first: "want to run this on GPT Image 2 for the higher-fidelity face read? heads-up — GPT Image 2 costs more on kie.ai than Nano Banana Pro." Mention cost once per conversation. Wait for confirmation.
Every prompt — single image, 6-panel, scene plate, GPT Image 2 — gets a short "here's what I'm about to prompt, sound good?" check before the full prompt is written. Not optional.
Exception: minor iteration on a just-delivered prompt (pose tweak, lighting nudge, swap one wardrobe element) — skip the check and deliver the revised prompt directly.
Format: clean bullet points only. References listed first, always.
Pre-prompt check:
Close with a short question line. Wait for the green light. Then drop the full prompt in a fenced code block, then execute the API call.
No plastic. No CGI sheen. No 3D-render look. No commercial gloss.
Every image should read as a photograph — real camera, real subject. The character should look lived-in: real pore texture, peach fuzz, strand-by-strand hair, fabric with weight, jewelry with surface detail, eyes with reflection and depth.
The flattering-realism ceiling (LOCKED): Full skin realism always on — visible pore texture, peach fuzz, subsurface scattering, flyaways. But realism never means unflattering. No acne, no blemishes, no harsh pores, no scarring, no aggressive skin detail that reads as ugly or clinical. Texture is fine, soft, even, and natural. When realism and flattering seem to pull against each other, resolve toward fine-even-flattering.
Baked into every Nano Banana Pro and GPT Image 2 prompt.
1. Real human skin.
2. Real hair physics.
3. Real lens character.
4. Real light physics.
5. Real grain.
Night target: Justin Lin / James Wan / Greig Fraser — The Batman, Tokyo Drift, Furious 7, John Wick. Mostly dark, hard punchy practicals cutting through. NOT saturated-teal-everywhere, NOT bright-night.
A. EXTERIOR CANYON / OPEN NIGHT: Light exclusively from practical sources. Sky committed to deep crushed near-black. Faint horizon glow at very deep distance only. Atmospheric haze catches headlight beams as visible warm volumetric god rays. Everything outside the headlight throws falls into deep near-black shadow.
B. INTERIOR / URBAN / LIT NIGHT: Practical sources drive the look — sodium-vapor, fluorescent, neon, dash glow, brake lights. Teal-amber color split where motivated. Atmospheric haze gives light volumetric body.
Universal night rules: Deep cinematic contrast, practicals punch hard, atmospheric haze on, rim and edge light define subjects, skin tone preserved under practical key.
Mid-gray seamless is the locked default for all character work. Pure white only when the user explicitly asks.
Why gray: Lower subject-to-background contrast means cleaner edge extraction and less inherited contrast when the still seeds downstream video. Same principle as atmospheric perspective.
The background stays neutral; the character does not. Skin renders at its true natural skin tone, wardrobe at its true natural color — never cooled, never washed-out.
Lighting close for a mid-gray plate (lean soft Rembrandt grade):
Mid-gray seamless studio background — even neutral mid-gray, no seam line, no gradient, no falloff to black or white. Relight from scratch overriding any reference lighting: one broad diffused source from camera-[left/right] and slightly above, a soft triangle of light on the shadow cheek, gentle wrap onto the face, no hard shadow edges, no rim light, no hair light, no kicker. Skin reads matte and velvety — zero shine on forehead, nose bridge, cheekbones, temples, and chin, no oily T-zone — in a low-contrast milky look. Real peach fuzz at the jaw and hairline, real soft fine even pore texture, subsurface scattering reading as semi-translucent biology, warmth preserved and natural, never pale or washed-out or cool-shifted, never plastic, never waxy AI render, never glass-skin, never harsh — fine flattering texture that keeps the face looking good, no acne, no blemishes, no rough pores. Photographed on a 50mm prime at a wide aperture, natural round bokeh, even sharpness, soft natural film grain. Photographed not generated.
When the user provides reference images (as attachments or URLs), extract everything visible by visual description only — never use names, never invent details not in the image.
For each character, capture: hair (color, length, texture, styling, accessories), makeup (finish, brow, eye treatment, lashes, lip, cheek), wardrobe (every garment top to bottom — fabric, color, fit, structural details), jewelry (earrings, necklaces, rings, bracelets, bags), body markers (piercings, tattoos, nails — only if visible), pose and energy.
Naming rule (CRITICAL). Never use proper names in the prompt output. Refer by visual description: "the rose-pink haired woman in the cropped white ribbed tank." Visual descriptors survive across prompts; names do not.
Brand name rule (CRITICAL). No real brand names in prompt output. Use generic visual descriptors — "black three-stripe athletic sneakers" not brand names.
Age-blind rule. Never describe characters by age. Describe by role, build, and clothing.
Lean prompt rule. When reference images are provided, they carry the visual identity load. The prompt does NOT need to re-describe what the references already show. One distinguishing visual handle per subject is enough — the reference carries the rest. Put the load on what the prompt uniquely needs: composition, pose, light direction, wardrobe specific to this plate.
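The naming, brand, and age-blind rules lend themselves to a quick pre-flight check on the finished prompt. The sketch below is an assumption-heavy illustration: `lint_prompt` is a hypothetical helper, the caller supplies the names and brands to screen for, and the age pattern is a rough heuristic rather than a complete detector.

```python
import re

# Rough heuristic for explicit ages such as "25-year-old" or "30 years old".
AGE_PATTERN = re.compile(r"\b\d{1,3}[- ]years?[- ]old\b", re.IGNORECASE)


def lint_prompt(prompt, proper_names=(), brand_names=()):
    """Return a list of rule violations found in a prompt (empty list means clean)."""
    issues = []
    for name in proper_names:
        if re.search(rf"\b{re.escape(name)}\b", prompt, re.IGNORECASE):
            issues.append(f"proper name: {name}")
    for brand in brand_names:
        if re.search(rf"\b{re.escape(brand)}\b", prompt, re.IGNORECASE):
            issues.append(f"brand name: {brand}")
    if AGE_PATTERN.search(prompt):
        issues.append("age descriptor")
    return issues
```

A clean result does not prove the prompt follows the rules; it only catches the terms the caller thought to list.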
Single-pass Nano Banana Pro generation. No reference image input for a pure text-only build.
Canonical Step 0A prompt structure:
A clean cinema-character-reference 3:4 headshot, framed from forehead to upper chest with the face filling most of the frame. [Identity essentials — heritage, build, skin tone and finish, hair (color, length, texture), eye shape and color, any key identity markers: piercings with exact position and metal, scars with placement and size, beauty marks with placement]. She wears [a plain black thin-strap camisole / he wears a plain black ribbed tank], no jewelry, no logos, no graphics. Body squared to camera, head level, neutral relaxed expression, eyes to camera, lips closed and relaxed, subtle controlled energy.
Mid-gray seamless studio background — even neutral mid-gray, no seam line, no gradient, no falloff to black or white. Relight from scratch overriding any reference lighting: one broad diffused source from camera-[left/right] and slightly above, a soft triangle of light on the shadow cheek, gentle wrap onto the face, no hard shadow edges, no rim light, no hair light, no kicker. Skin reads matte and velvety — zero shine on forehead, nose bridge, cheekbones, temples, and chin, no oily T-zone — in a low-contrast milky look. Skin renders at its true natural skin tone and wardrobe at its true natural color, warmth preserved and natural against the neutral gray, never pale or washed-out or cool-shifted by the background. Real peach fuzz at the jaw and hairline, real soft fine even pore texture, subsurface scattering reading as semi-translucent biology, never plastic, never waxy AI render, never glass-skin, never harsh — fine flattering texture that keeps the face looking good, no acne, no blemishes, no rough pores. Photographed on a 50mm prime at a wide aperture, natural round bokeh, even sharpness, soft natural film grain. Photographed not generated.
API params: model nano-banana-pro, image_input: [], aspect_ratio: "2:3", resolution: "1K"
Single-pass GPT Image 2 generation. Chest-up framing only. Same identity essentials, wardrobe lock, mid-gray seamless, and soft lighting as Mode 0A — but routed through GPT Image 2.
API params: model gpt-image-2-text-to-image (no refs) or gpt-image-2-image-to-image (if user provides a rough reference), aspect_ratio: "2:3"
Step 0.1 — Loose Nano Banana Pro face plate (exploration):
Lean prompt, identity essentials only. No full cinema stack. Goal is to get a face candidate the user can select from.
A [heritage] [woman / man] with a [build], [skin tone and finish], [hair color, length, texture]. [Eye shape and color]. [Large/obvious identity markers only]. [She wears a plain black thin-strap camisole / He wears a plain black ribbed tank], no jewelry, no logos. Body squared to camera, neutral expression, eyes to camera.
Mid-gray seamless studio background, even neutral mid-gray, no seam line. Soft natural light from camera-[left/right], very diffused, no hard shadow edges. Skin renders at its true natural skin tone, warmth preserved and natural against the neutral gray. Skin reads matte and slightly diffused, clean and even. Chest-up framing. Real skin pore texture, fine peach fuzz, subtle subsurface scattering. Fine cinema grain. Photographic, not rendered.
API params: model nano-banana-pro, image_input: [], aspect_ratio: "2:3", resolution: "1K"
After the Step 0.1 generation completes, its output URL becomes the input for Step 0.2.
Step 0.2 — Nano Banana Pro 3:4 headshot lock:
Second-pass Nano Banana Pro using the Step 0.1 output URL as a reference image. Full identity lock including fine markers.
A clean cinema-character-reference 3:4 headshot of the same character as the reference image, framed from forehead to upper chest with the face filling most of the frame. [Full character descriptor — heritage, build, skin tone and finish, hair, face register (jaw, chin, lips, cheekbones, brow), eye shape and color, all identity markers: piercings with exact position and metal, scars with placement and size, beauty marks with placement, makeup register]. She wears [a plain black thin-strap camisole / he wears a plain black ribbed tank], no jewelry, no logos. Body squared to camera, head level, neutral relaxed expression, eyes to camera.
[Full lean Rembrandt grade close from the mid-gray seamless section above.]
API params: model nano-banana-pro, image_input: ["[step_0.1_output_url]"], aspect_ratio: "2:3", resolution: "1K"
Best for outfits where full prompt control gets the result in one clean shot.
Canonical Mode 1A prompt structure:
[Visual descriptor of the character — hair, makeup, full wardrobe head-to-toe, jewelry, body markers, extracted from references or locked from development]. [Pose direction — body angle, weight distribution, hand position, expression].
Mid-gray seamless studio background — even neutral mid-gray, no seam line, no gradient, no falloff to black or white. Relight from scratch: one broad diffused source from camera-[left/right] and slightly above, gentle wrap onto the figure, no harsh shadows, no rim light, no hair light, no kicker. Skin and fabric read matte and velvety in a low-contrast milky look, no shine. Skin renders at its true natural skin tone and the outfit at its true natural color, warmth preserved and natural against the neutral gray, never pale or washed-out or cool-shifted. Real peach fuzz at the jaw and hairline, real fine even pore texture, subsurface scattering as semi-translucent biology, real fabric weave and drape, never plastic, never waxy. Photographed on a 50mm prime at a wide aperture, natural round bokeh, even sharpness, soft natural film grain. Photographed not generated. [Framing — full body / waist-up / head-to-shoulders].
API params: model nano-banana-pro, image_input: ["[char_ref_url]"] (or [] if text-only build), aspect_ratio: "2:3", resolution: "1K"
Best for complex custom fits where the outfit should be designed separately from the character casting.
Canonical Step 1B.1 prompt:
A slim [woman / man] standing straight-on to camera in a relaxed neutral stance, weight evenly distributed, arms relaxed at sides, body squared to camera. [Medium-length natural medium-brown hair, simple straight or slight wave / short clean haircut, natural medium-brown color]. Clean even features, neutral natural skin tone, [light natural makeup / no makeup], neutral expression, eyes directly to camera. Slim model build with refined proportions. The figure wears [full outfit description — every garment top to bottom with fabric, color, fit, structural details, layering, hem positions, footwear, jewelry].
Mid-gray seamless studio background, even neutral mid-gray, no shadow falloff, no visible seam line. Soft natural light from camera-[left/right], very diffused, gentle wrap, no harsh shadows, no dramatic rim light. Skin and fabric read matte and slightly diffused, the outfit fully readable at its true natural color against the neutral gray. Full body framing from head to just below the footwear. Real fabric texture with visible weave detail, real weight, real drape. Fine cinema grain. Photographic, not rendered.
API params: model nano-banana-pro, image_input: [], aspect_ratio: "2:3", resolution: "1K"
User saves the output URL. That URL is the outfit reference for Step 1B.2.
Two reference images: the locked character reference URL and the outfit reference URL from Step 1B.1.
Canonical Step 1B.2 prompt:
Place the face and body from the first reference image onto the outfit from the second reference image. Mid-gray seamless studio background, even neutral mid-gray, skin and outfit at their true natural tone. Soft studio lighting.
API params: model nano-banana-pro, image_input: ["[char_ref_url]", "[outfit_ref_url]"], aspect_ratio: "2:3", resolution: "1K"
Only after a single-image base reference has been generated and approved. One prompt, one image, six panels in a 3×2 grid.
6-panel layout (3×2 grid):
Canonical Mode 2 prompt structure:
A 6-panel character reference sheet arranged as a 3-column by 2-row grid in a single horizontal frame, separated by thin clean white gutters between panels. Each panel shows the same single character — [full visual descriptor: build, face, hair, makeup, full wardrobe head-to-toe, all accessories, jewelry, body markers].
Panel 1 (top-left): Full body front — [stance, framing, what's readable].
Panel 2 (top-center): Side profile close headshot, left side — [tight crop from collarbone up, character's left profile facing screen-right, hair and ear and jaw geometry visible].
Panel 3 (top-right): Full body back — [stance, what's visible from behind].
Panel 4 (bottom-left): Side profile close headshot, right side — [tight crop from collarbone up, character's right profile facing screen-left, mirror of Panel 2].
Panel 5 (bottom-center): Front face close headshot — [tight crop from collarbone up, body squared to camera, face filling the frame, eyes to camera].
Panel 6 (bottom-right): Detail shot — [the locked detail close-up: nails / specific jewelry piece / piercing / tattoo / held prop].
Mid-gray seamless studio backdrop applied uniformly across all six panels — even neutral mid-gray, no seam line, no gradient. Relight from scratch uniformly: one broad diffused source from camera-left and slightly above, gentle wrap, no harsh shadows, no rim light, no hair light, no kicker. Skin and fabric read matte and velvety in a low-contrast milky look, at their true natural tone, warmth preserved and natural, never cool-shifted. Sharp focus across every panel. Real fine even pore texture, peach fuzz at the hairline, subsurface scattering, real fabric weave, soft natural film grain. Identical character identity locked across all six panels — same face, same skin, same hair, same wardrobe, same accessories in every cell. Photographed not generated.
API params: model nano-banana-pro, image_input: ["[char_ref_url]"], aspect_ratio: "16:9", resolution: "1K"
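The panel numbering in the Mode 2 prompt follows a row-major walk of the 3-column by 2-row grid. A small sketch of that mapping, with hypothetical names (`PANELS`, `panel_position`, `panel_headers`); the panel titles are the ones listed in the prompt structure above.

```python
PANELS = [
    "Full body front",
    "Side profile close headshot, left side",
    "Full body back",
    "Side profile close headshot, right side",
    "Front face close headshot",
    "Detail shot",
]
ROWS = ["top", "bottom"]
COLS = ["left", "center", "right"]


def panel_position(n):
    """Map a 1-based panel number to its cell in the 3-column by 2-row grid."""
    row, col = divmod(n - 1, len(COLS))
    return f"{ROWS[row]}-{COLS[col]}"


def panel_headers():
    """Build the 'Panel N (position): title' prefixes used in the Mode 2 prompt."""
    return [f"Panel {i} ({panel_position(i)}): {title}" for i, title in enumerate(PANELS, start=1)]
```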
Two flavors — character-in-environment (3A) or pure environment (3B). Same five cinema modes as cinema-worldbuilder:
| Scene type | Cinema mode |
|---|---|
| Real-world dramatic (street, kitchen, car, bar, interior/exterior) | M1 — Narrative |
| Studio / editorial / clean set / fashion film | M2 — Studio / Editorial |
| Action / combat / chase / high-energy | M3 — Action / Combat |
| Performance / concert / stage | M4 — Performance / Concert |
| Atmospheric / empty / no-humans / weather plate | M5 — Atmospheric / Empty |
Mode 3 prompts are written in cinema-prose register — five paragraphs, no labeled headers, woven into continuous observational prose. See the Mode 3 — Cinema-Prose Register section below.
API params (Mode 3A): model nano-banana-pro, image_input: ["[char_ref_url]"], aspect_ratio: "16:9", resolution: "1K"
API params (Mode 3B): model nano-banana-pro, image_input: [], aspect_ratio: "16:9", resolution: "1K"
Five-paragraph prose structure (woven, no labeled headers in the output):
Paragraph 1 — Opening shot description. One long sentence: medium ("a cinematic anamorphic still"), framing register, subject identification at high level, camera position and angle in prose, mood/intent.
Paragraph 2 — Character block. Identity markers from the reference written as visible facts ("dark layered mid-length tousled fringe falling across the back of his head, small silver hoop earrings catching faint warm spill, warm fair matte skin"). Pose, attention, held props woven in naturally.
Paragraph 3 — World/environment block. Location as ambience, not architecture. The space's register. Anchor to provided reference if one exists. Background subjects get positional language, not coordinates.
Paragraph 4 — Subject anchor block. The focal anchor of the shot — TV broadcast, second car in BG, dawn horizon, signage. If no focal anchor beyond the character, fold into Paragraph 3.
Paragraph 5 — Camera spec + finish. Full cinema look in one continuous descriptive paragraph: capture register, lens character, diffusion/filtration, film-stock rendition, grain, grade, color cast, optical character — all in plain-language look terms, never brand names — plus the closing realism clause ("Real photographic frame captured on a real cinema camera... no CGI, no rendered look, no plastic surfaces, no AI smoothness, no skin smoothing").
Resolution-aware detail rule: Describe only what the camera at this distance, lens, and motion register would physically resolve. A car shot from 200 feet up → reads as silhouette + color blocks + headlights. A person at 50 yards → reads as silhouette + hair color + wardrobe color blocks. Detail is earned by proximity, lens length, motion stillness, and lighting intensity.
Chest-up portraits and detail face shots only. User must explicitly ask.
Use the GPT Image 2 prompt grammar: same identity essentials, wardrobe lock, mid-gray seamless, and soft lighting as Mode 0A. Chest-up framing. Full cinema stack appended.
API params (with reference image): model gpt-image-2-image-to-image, input_urls: ["[char_ref_url]"], aspect_ratio: "2:3"
API params (text-only): model gpt-image-2-text-to-image, aspect_ratio: "2:3"
Two reference images: the face/body reference and the outfit reference. Single locked prompt.
Canonical Mode 5 prompt:
Place the face and body from the first reference image onto the outfit and pose from the second reference image. Mid-gray seamless studio background, even neutral mid-gray, skin and outfit at their true natural tone. Soft studio lighting. Skin reads matte, fine and even, real peach fuzz, no plastic, no AI render. Photographed not generated.
API params: model nano-banana-pro, image_input: ["[face_body_ref_url]", "[outfit_ref_url]"], aspect_ratio: "2:3", resolution: "1K"
Every non-Mode-3 prompt ends with this single merged cinema stack:
Real human skin captured on a real cinema camera — refined and real, peach fuzz catching light along the jawline and hairline, real natural pore texture soft fine and even, subsurface scattering at ear edges, nostrils, and around the eye sockets with warm undertone bleed reading as semi-translucent biology never opaque plastic. No retouching, no skin smoothing, no porcelain plastic look, no waxy AI render, no blemishes, no acne, no marks, no enlarged or rough pores, no harsh clinical texture — fine flattering even skin that always looks good, no dewy wet finish, no glass-skin, no highlighter glow. Hair rendered strand by strand with realistic flyaways and baby hairs at the hairline, hair physics responding to the actual environment of the scene. Fabric with real weave detail, real weight, real drape. Captured with a wide-latitude cinema look, lens character matched to the shot — a clean fast normal prime around a 50mm full-frame field of view at a wide aperture for portraits and character canonicals giving natural round bokeh and even sharpness, OR a vintage 2x anamorphic character for scene plates giving oval bokeh, a gentle horizontal squeeze on out-of-focus highlights, soft frame-edge falloff, organic optical imperfection toward the edges, a light diffusion bloom lifting highlights into a soft halation, and subtle horizontal streak flares on point light sources. Shallow depth of field with strong foreground-to-background separation. True atmospheric perspective with visible haze and air density between planes — distant elements rendered softer, desaturated, and lower contrast than foreground. Key light wrapping around subjects with physically accurate shadow falloff into the neck, jawline, ear shadow, nostril shadow, lip shadow — soft transitions never hard edges. Highlights rolled off gently in a filmic curve, never clipping to pure white. Lifted blacks, wide dynamic range. 
Color-negative motion-picture film look — daylight-balanced for day registers, tungsten-balanced and pushed for night, fine theatrical 35mm film grain across the entire frame. No HDR overprocessing, no digital oversharpening, no plastic skin rendering — photographed not generated, captured on a real camera by a real cinematographer on a real set.
Modal application:
After the prompt is approved, Claude executes the generation via kie.ai API using Bash.
Requirement: KIE_API_KEY must be set as an environment variable. If missing, prompt Sean to run: export KIE_API_KEY="your_key_here" (key available at kie.ai/api-key).
Step 1 — Check API key:
```bash
if [ -z "$KIE_API_KEY" ]; then
  echo "KIE_API_KEY not set. Run: export KIE_API_KEY=your_key"
fi
```
Step 2 — Show the API call being made. Before executing, display to Sean:
Ask: "Ready to fire this at kie.ai?" — then execute.
Step 3 — Create task (Nano Banana Pro):
```bash
KIE_RESPONSE=$(curl -s -X POST "https://api.kie.ai/api/v1/jobs/createTask" \
  -H "Authorization: Bearer $KIE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "nano-banana-pro",
    "input": {
      "prompt": "PROMPT_HERE",
      "image_input": [],
      "aspect_ratio": "2:3",
      "resolution": "1K",
      "output_format": "png"
    }
  }')
echo "$KIE_RESPONSE"
# Pull data.taskId out of the response; tolerate a missing or null data object
TASK_ID=$(echo "$KIE_RESPONSE" | python3 -c "import sys,json; d=json.load(sys.stdin); print((d.get('data') or {}).get('taskId',''))")
echo "Task ID: $TASK_ID"
```
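Pasting a long prompt into the single-quoted JSON body by hand is fragile, because any double quote or newline inside the prompt has to be escaped. A minimal sketch that builds the same request body with Python's `json` module instead; `build_nano_banana_payload` is a hypothetical helper, and the field names are the ones used in the curl call above.

```python
import json


def build_nano_banana_payload(prompt, image_input=(), aspect_ratio="2:3", resolution="1K"):
    """Return the createTask request body for Nano Banana Pro as a JSON string."""
    body = {
        "model": "nano-banana-pro",
        "input": {
            "prompt": prompt,                  # json.dumps handles quotes and newlines
            "image_input": list(image_input),  # empty list means text-only
            "aspect_ratio": aspect_ratio,
            "resolution": resolution,
            "output_format": "png",
        },
    }
    return json.dumps(body)
```

The resulting string can be written to a file and passed to curl with `-d @payload.json`, which avoids shell quoting entirely.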
Step 3 alt — GPT Image 2 (text-to-image):
```bash
KIE_RESPONSE=$(curl -s -X POST "https://api.kie.ai/api/v1/jobs/createTask" \
  -H "Authorization: Bearer $KIE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-image-2-text-to-image",
    "input": {
      "prompt": "PROMPT_HERE",
      "aspect_ratio": "2:3"
    }
  }')
# Pull data.taskId out of the response; tolerate a missing or null data object
TASK_ID=$(echo "$KIE_RESPONSE" | python3 -c "import sys,json; d=json.load(sys.stdin); print((d.get('data') or {}).get('taskId',''))")
```
Step 3 alt — GPT Image 2 (image-to-image):
```bash
KIE_RESPONSE=$(curl -s -X POST "https://api.kie.ai/api/v1/jobs/createTask" \
  -H "Authorization: Bearer $KIE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-image-2-image-to-image",
    "input": {
      "prompt": "PROMPT_HERE",
      "input_urls": ["REF_URL_HERE"],
      "aspect_ratio": "2:3"
    }
  }')
# Pull data.taskId out of the response; tolerate a missing or null data object
TASK_ID=$(echo "$KIE_RESPONSE" | python3 -c "import sys,json; d=json.load(sys.stdin); print((d.get('data') or {}).get('taskId',''))")
```
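All three create-task variants extract `data.taskId` the same way, and an empty task ID would otherwise only surface later as a poll that never succeeds. A hedged sketch of a stricter extraction step; `extract_task_id` is a hypothetical helper, and it assumes nothing about the response beyond the `data.taskId` path used above.

```python
import json


def extract_task_id(response_text):
    """Return data.taskId from a createTask response, or raise ValueError if it is missing."""
    try:
        payload = json.loads(response_text)
    except json.JSONDecodeError as exc:
        raise ValueError(f"createTask response is not JSON: {exc}") from exc
    if not isinstance(payload, dict):
        raise ValueError("createTask response is not a JSON object")
    data = payload.get("data") or {}  # tolerate a missing or null data object
    task_id = data.get("taskId", "")
    if not task_id:
        raise ValueError(f"no taskId in createTask response: {response_text[:200]}")
    return task_id
```

Failing here, before polling starts, keeps the error message next to the raw response that caused it.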
Step 4 — Poll for completion:
```bash
echo "Polling task $TASK_ID..."
# Up to 60 polls, 10 seconds apart (about 10 minutes total)
for i in $(seq 1 60); do
  POLL=$(curl -s "https://api.kie.ai/api/v1/jobs/recordInfo?taskId=$TASK_ID" \
    -H "Authorization: Bearer $KIE_API_KEY")
  STATE=$(echo "$POLL" | python3 -c "import sys,json; d=json.load(sys.stdin); print((d.get('data') or {}).get('state','waiting'))" 2>/dev/null)
  echo "[$i] State: $STATE"
  if [ "$STATE" = "success" ]; then
    # resultJson is a JSON string nested inside the response, so it is decoded a second time
    OUTPUT_URL=$(echo "$POLL" | python3 -c "
import sys, json
d = json.load(sys.stdin)
result = json.loads(d['data']['resultJson'])
print(result['resultUrls'][0])
")
    echo "Done! Image URL: $OUTPUT_URL"
    break
  elif [ "$STATE" = "fail" ]; then
    FAIL_MSG=$(echo "$POLL" | python3 -c "import sys,json; d=json.load(sys.stdin); print((d.get('data') or {}).get('failMsg','unknown error'))")
    echo "Generation failed: $FAIL_MSG"
    break
  fi
  sleep 10
done
```
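The decision logic inside the poll loop can be isolated into one pure function, which makes the double JSON decode of `resultJson` easier to see. This is a sketch under the response shape the loop above already relies on (`data.state`, `data.resultJson`, `data.failMsg`); `parse_record_info` is a hypothetical name.

```python
import json


def parse_record_info(poll_text):
    """Return (state, detail) for one recordInfo response.

    detail is the first result URL on success, the failure message on fail, else None.
    """
    data = json.loads(poll_text).get("data") or {}
    state = data.get("state", "waiting")
    if state == "success":
        result = json.loads(data["resultJson"])  # resultJson is itself a JSON string
        return state, result["resultUrls"][0]
    if state == "fail":
        return state, data.get("failMsg", "unknown error")
    return state, None
```

Any state other than `success` or `fail` is treated as still in progress, matching the loop's behavior of sleeping and polling again.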
Step 5 — Return the result. After polling completes, tell Sean:
For multi-step modes (0.1/0.2, 1B.1/1B.2): after Step 0.1 or 1B.1 completes, present the output URL to Sean and ask: "happy with this? If so, I'll use it as the reference for Step 0.2 / Step 1B.2." Only proceed to the second step with Sean's confirmation.
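The hand-off between the two steps can be sketched as a function that refuses to build the second-step references until confirmation is given. The name `second_step_image_input` and its arguments are hypothetical; the reference ordering (character first, outfit second) follows the Step 1B.2 API params in this document.

```python
def second_step_image_input(first_step, first_step_url, char_ref_url=None, confirmed=False):
    """Return the image_input list for Step 0.2 or Step 1B.2, given the first step's output URL."""
    if not confirmed:
        raise RuntimeError("wait for confirmation before running the second step")
    if first_step == "0.1":
        # Step 0.2 uses the loose face plate as its only reference.
        return [first_step_url]
    if first_step == "1B.1":
        # Step 1B.2 composites the locked character onto the outfit plate.
        if not char_ref_url:
            raise ValueError("Step 1B.2 needs the locked character reference URL")
        return [char_ref_url, first_step_url]
    raise ValueError(f"not a two-step mode: {first_step}")
```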
Input format: Reference images must be provided as publicly accessible URLs (not local file paths). This is the natural format since kie.ai output URLs are publicly accessible for 24 hours after generation.
First generation (no existing references): Run as a text-only generation (image_input: []). The output URL becomes the character reference for subsequent generations.
Subsequent generations: Use the output URLs from previous kie.ai generations as reference image inputs — they're already in the right format.
If Sean has a local image to use as a reference: Ask him to provide it as an attachment in the Claude conversation. Read it, then note that for the actual API call, it needs to be accessible as a URL. Recommend saving it somewhere accessible (Dropbox public link, Google Drive public link, etc.) and providing that URL. Or offer to skip using it as a reference and build from the text spec instead.
Nano Banana Pro: image_input accepts up to 8 URLs. For compositing (Modes 1B.2, 5): image_input: ["char_url", "outfit_url"].
GPT Image 2 i2i: input_urls accepts up to 16 URLs.
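The reference constraints above (public URLs only, at most 8 for Nano Banana Pro and 16 for GPT Image 2 image-to-image) can be checked before the API call. A minimal sketch with hypothetical names (`MAX_REFS`, `validate_refs`); it only checks that each entry looks like an http(s) URL and cannot confirm the URL is actually reachable.

```python
from urllib.parse import urlparse

# Reference limits stated in this document.
MAX_REFS = {"nano-banana-pro": 8, "gpt-image-2-image-to-image": 16}


def validate_refs(model, urls):
    """Return the URL list unchanged if it satisfies the documented limits, else raise ValueError."""
    limit = MAX_REFS.get(model, 0)  # models not listed take no reference images
    if len(urls) > limit:
        raise ValueError(f"{model} accepts at most {limit} reference URLs, got {len(urls)}")
    for url in urls:
        parsed = urlparse(url)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError(f"not a public URL (local path?): {url}")
    return list(urls)
```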
After generating a still image via this skill, the output URL can be used directly as a reference image in cinema-worldbuilder-kie-2.0 for Seedance video generation. The two skills share the same five cinema modes — when paired, the still and the video share visual DNA.
If Sean mentions wanting to animate a still or create a video of a character, suggest handing off to cinema-worldbuilder-kie-2.0 and note that the kie.ai output URL from this generation can be used directly as the first frame or character reference.
npx claudepluginhub seanng23/oblique-power-skills --plugin oblique-cowork