From pika
Edits any part of a video (background, outfit, lighting, weather) from a free-form prompt while preserving original face, motion, speech, and audio using gpt-image-2 and Kling reference-video.
Slash command: /pika:fix-my-look
Edit the source's first usable frame with gpt-image-2 from the user's prompt,
then propagate that look across the clip with Kling reference-video while
locking the original face, motion and audio via the original video + audio as
references. All prep happens in one mcp__plugin_pika_pika__normalize_video
call for short clips, or one normalize call per segment for longer clips. The
output ratio uses the normalized clip's closest supported output ratio; this
skill does NOT reframe the source video.
<source>: path or URL to a video file with audio
<change_prompt>: what to change (e.g. "make it night with neon lights", "change my shirt to a leather jacket", "put me on a beach in Hawaii")
Working dir: ~/Downloads/fix-my-look/<run-id>/.
Use the mcp__plugin_pika_pika__* names below as the canonical plugin
namespace. If the host exposes the same tools under a local namespace such as
mcp__pika-mcp__* or mcp__pika-prod__*, map by tool suffix and keep the same
arguments.
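The suffix mapping described above can be sketched as a small lookup. This is only an illustration: `resolve_tool` and the `available` list are hypothetical names, and how a host actually exposes its tool list is an assumption.

```python
CANONICAL_PREFIX = "mcp__plugin_pika_pika__"


def resolve_tool(canonical_name: str, available: list[str]) -> str:
    """Map a canonical tool name onto whatever namespace the host exposes."""
    if canonical_name in available:
        return canonical_name
    # Tool suffix, e.g. "normalize_video".
    suffix = canonical_name.removeprefix(CANONICAL_PREFIX)
    for name in available:
        # Require the "__" separator so "task_status" never matches "status".
        if name.endswith("__" + suffix):
            return name
    raise LookupError(f"no host tool matches suffix {suffix!r}")
```

The arguments stay the same after mapping; only the tool name changes.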
Start a timer when the source and change prompt are known. Before paid
generation, call mcp__plugin_pika_pika__estimate_cost for the planned
mcp__plugin_pika_pika__generate_image,
mcp__plugin_pika_pika__generate_reference_video, any multi-segment
mcp__plugin_pika_pika__edit_concat, and any optional audio/lipsync repair
call. If cost is not surfaced by the host, say
"Cost not surfaced by this harness" in the final report instead of guessing.
When any tool returns a
task_id, copy the exact value into the run notes and reuse it verbatim; do not
hand-type long JWT-style task IDs.
If the source is a local file, upload it with mcp__plugin_pika_pika__upload_asset first; an HTTPS media URL
passes directly. Decide the source windows before normalizing: use one 14.8s
window for sources <=15s, and split longer sources into ordered 14.8s windows.
Call
mcp__plugin_pika_pika__normalize_video(video_url=<source>, start_s=<offset>, max_duration_s=14.8, extract_audio=true, extract_face_frame=true)
once per window. Use the first window's face_frame_url for the edited still;
use each window's video_url as that segment's motion/identity reference. For
multi-window clips, also call
mcp__plugin_pika_pika__extract_audio_from_video(video_url=<source>) so the
final merged output can be restored to one continuous source audio track.
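The windowing rule above can be sketched as a planner that returns the per-window normalize arguments. It assumes the caller already knows the source duration in seconds; `plan_windows` is a hypothetical helper, not part of the plugin.

```python
import math

WINDOW_S = 14.8  # window length stated in the text above


def plan_windows(source_duration_s: float, window_s: float = WINDOW_S) -> list[dict]:
    """Return ordered normalize_video arguments, one dict per window."""
    if source_duration_s <= 0:
        raise ValueError("source duration must be positive")
    if source_duration_s <= 15:
        count = 1  # short sources use a single window
    else:
        # The small epsilon keeps float noise from adding an empty last window.
        count = math.ceil(source_duration_s / window_s - 1e-9)
    return [
        {
            "start_s": round(i * window_s, 3),
            "max_duration_s": window_s,
            "extract_audio": True,
            "extract_face_frame": True,
        }
        for i in range(count)
    ]
```

Each dict is merged with video_url=<source> for one normalize call; the first entry's result supplies the face frame.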
Wire the result into the rest: face_frame_url is the Step 2 edit target;
each normalized video_url is Kling's reference for that segment in Step 4;
set aspect_ratio = result.aspect_ratio ?? result.closest_aspect_ratio for
each normalize result, then carry that local aspect_ratio through the image
and video calls. If neither field is present, stop and report that normalization
output is missing an aspect label. Compute
duration = max(4, min(15, round(duration_s))) per segment, and use
resolution="720p"
unless the user asked for high res. If face_found is false, no clear face was
found and face_frame_url fell back to the t=0 frame — proceed but warn
identity may drift, or re-run with a start_s at a section where the subject
faces camera.
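A minimal sketch of the aspect fallback and the duration clamp described above, assuming the normalize result arrives as a plain dict with the field names used in the text. Both function names are hypothetical.

```python
def resolve_aspect_ratio(result: dict) -> str:
    """Prefer aspect_ratio, fall back to closest_aspect_ratio, else stop."""
    aspect = result.get("aspect_ratio")
    if aspect is None:
        aspect = result.get("closest_aspect_ratio")
    if aspect is None:
        raise ValueError("normalization output is missing an aspect label")
    return aspect


def segment_duration(duration_s: float) -> int:
    """Clamp the rounded segment length to the 4..15 second range."""
    # Note: Python's round() rounds exact halves to the nearest even integer.
    return max(4, min(15, round(duration_s)))
```

With these, a 14.8s window yields duration 15, and a 2.2s tail segment is raised to the 4s minimum.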
Reference-video providers can reject oversized reference assets. If the
normalize result or the downstream provider error shows a normalized video is
over the provider limit, retry mcp__plugin_pika_pika__normalize_video once
with crf=28 and the same start_s, max_duration_s, extract_audio, and
extract_face_frame values. If the reference is still too large, stop before
another paid video attempt and report that
mcp__plugin_pika_pika__normalize_video needs a worker-side 1080-edge /
reference-size cap. Do not patch this with local shell media commands.
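The retry-once rule above could look roughly like this. The `normalize` callable stands in for the normalize tool call and `too_large` for whatever size check the host can make; both are hypothetical, and this sketch does not cover the case where the oversize is only reported later by a provider error.

```python
from typing import Callable


def normalize_with_size_retry(
    normalize: Callable[..., dict],
    too_large: Callable[[dict], bool],
    **args,
) -> dict:
    """Normalize once, retry once with crf=28, then stop."""
    result = normalize(**args)
    if not too_large(result):
        return result
    # Same start_s, max_duration_s, extract_audio, extract_face_frame; only crf is added.
    retry = normalize(**args, crf=28)
    if too_large(retry):
        raise RuntimeError(
            "normalize_video needs a worker-side 1080-edge / reference-size cap"
        )
    return retry
```

Raising here stops the run before another paid video attempt is made.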
Call mcp__plugin_pika_pika__generate_image with provider="gpt-image-2",
aspect_ratio=<aspect_ratio>, resolution="2K",
reference_images=[<face_frame_url>], quality="high", prompt:
"Modify the reference photograph as follows:
<change_prompt>. Keep the person's face, identity, hair, body and pose EXACTLY as in the reference. CRITICAL: preserve every object the subject is holding or touching — phones, products, drinks, bags, props, jewelry — in the exact same hand, position, orientation and scale; never remove, replace or restyle them. Change only the requested scene, background, clothing, lighting or environment, not who the person is."
Keep the "preserve held objects" clause verbatim on every re-render — without it gpt-image-2 silently drops products/phones the subject is holding.
If gpt-image-2 returns a content-policy false positive for fashion, glam, or beauty prompts, retry once with the same intent but a modest / editorial wording such as "polished event styling, opaque clothing, natural pose, non-sexual fashion portrait". For makeup prompts, explicitly preserve the original eye shape, eyelids, iris color and gaze; heavy eyeliner/eye shadow is a high-risk identity-drift source.
Surface the edited frame and STOP. Ask "Approve for video generation, or tweak and re-render?" Do NOT call video generation until approved. For tweaks, re-run Step 2 (locked clauses verbatim) and loop.
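The approval gate can be sketched as a loop that only returns once the user approves. Here `render` is a hypothetical wrapper that runs Step 2 with the locked clauses kept verbatim, `ask` is a hypothetical prompt to the user that returns None on approval or a revised change prompt, and the `max_rounds` cap is an assumption, not something the skill specifies.

```python
from typing import Callable, Optional


def approve_frame(
    render: Callable[[str], str],
    ask: Callable[[str], Optional[str]],
    change_prompt: str,
    max_rounds: int = 5,
) -> str:
    """Re-render until the user approves; return the approved frame URL."""
    prompt = change_prompt
    for _ in range(max_rounds):
        frame_url = render(prompt)   # Step 2 with the locked clauses intact
        tweak = ask(frame_url)       # surface the frame and wait for a decision
        if tweak is None:
            return frame_url         # approved: video generation may start
        prompt = tweak               # tweak requested: loop and re-render
    raise RuntimeError("frame not approved within the allowed rounds")
```

No video generation call appears inside the loop, which mirrors the rule that video generation waits for approval.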
For each normalized segment, call mcp__plugin_pika_pika__generate_reference_video
with provider="kling", reference_videos=[<segment video_url>],
reference_images=[<edited_frame_url>], aspect_ratio=<aspect_ratio>,
duration=<segment duration>, sound=false, video_keep_sounds=[true],
prompt:
"Apply the change shown in <<<image_1>>> to <<<video_1>>>. Keep the person in <<<video_1>>> with the EXACT same face, identity, expressions, motion and timing; preserve the original video's kept sound track. The new scene/background/clothing/lighting should match <<<image_1>>>. CRITICAL: preserve every object the subject is holding or touching in <<<video_1>>> — phones, products, drinks, bags, props — in the same hand and orientation every frame. Keep mouth motion active through the final frame when the person is speaking. Do not alter the person's identity."
Append any extra creative direction (e.g. "very cinematic, soft golden light") after the locked text — never replace it.
Do not pass sound=true to Kling with a video input. Kling rejects that
combination with error:1201 sound on is not supported with video input; use
sound=false plus video_keep_sounds=[true] to keep the source video's audio.
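A minimal sketch of assembling the per-segment Kling arguments so the sound flags cannot be set the wrong way. `kling_reference_video_args` is a hypothetical helper; `locked_prompt` is the locked text above, passed in unchanged.

```python
def kling_reference_video_args(
    segment_video_url: str,
    edited_frame_url: str,
    aspect_ratio: str,
    duration: int,
    locked_prompt: str,
    extra_direction: str = "",
) -> dict:
    """Build generate_reference_video arguments for one normalized segment."""
    # Extra creative direction is appended after the locked text, never in place of it.
    prompt = f"{locked_prompt} {extra_direction}".strip() if extra_direction else locked_prompt
    return {
        "provider": "kling",
        "reference_videos": [segment_video_url],
        "reference_images": [edited_frame_url],
        "aspect_ratio": aspect_ratio,
        "duration": duration,
        "sound": False,               # sound=true is rejected with a video input
        "video_keep_sounds": [True],  # keeps the source video's audio
        "prompt": prompt,
    }
```

Hard-coding the two sound fields means a caller cannot trigger error:1201 by accident.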
If the source was split into multiple windows, call
mcp__plugin_pika_pika__edit_concat(video_urls=[<segment outputs in order>]).
After concat, run
mcp__plugin_pika_pika__edit_audio_replace(video_url=<concat_url>, audio_url=<full_source_audio_url>, duration_policy="video")
when the merged output audio is missing, drifted, or discontinuous.
Only try Seedance if the user explicitly asks for it, or if Kling fails and a second provider attempt is useful. Use the same segmenting rule and record the provider error plainly if Seedance rejects the input or drops speech/action.
Async handling: if any call returns a {task_id, status} envelope, poll
mcp__plugin_pika_pika__task_status({task_id}) in a tight loop until terminal.
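The polling loop might be sketched as below. The terminal status names, the interval, and the timeout are all assumptions, since the text does not list them; `get_status` is a hypothetical wrapper around the task_status tool.

```python
import time
from typing import Callable

# Assumed terminal states; check the real envelope values before relying on these.
TERMINAL_STATUSES = frozenset({"completed", "failed", "cancelled"})


def poll_until_terminal(
    get_status: Callable[[str], dict],
    task_id: str,
    interval_s: float = 1.0,
    timeout_s: float = 600.0,
) -> dict:
    """Poll task status until it reaches a terminal state or the timeout passes."""
    deadline = time.monotonic() + timeout_s
    while True:
        envelope = get_status(task_id)  # reuse the exact task_id string
        if envelope.get("status") in TERMINAL_STATUSES:
            return envelope
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} did not finish in {timeout_s}s")
        time.sleep(interval_s)
```

The task_id is passed through untouched, in line with the rule about not re-typing long IDs.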
Before reporting success, verify the generated video against the source:
PASS.
If the video is visually acceptable but speech audio is missing, incomplete, or drifted, offer one paid repair pass:
mcp__plugin_pika_pika__edit_audio_replace(video_url=<generated_video_url>, audio_url=<full_source_audio_url or segment_audio_url>, duration_policy="video")
mcp__plugin_pika_pika__edit_lipsync(video_url=<audio_restored_url>, audio_url=<full_source_audio_url or segment_audio_url>, variant="v2-pro")
If the model froze the mouth near the end, do not keep escalating to sync-3
automatically; lip-sync cannot reliably recover a face track with no mouth motion.
Offer trim / regenerate instead.
Download the result to ~/Downloads/fix-my-look/<run-id>/result.mp4 and return
that path plus the final report fields: source, edited frame URL, final video
URL, provider, job/task IDs, cost estimate or not surfaced, elapsed
time, QA notes, and follow-up issue.
| Symptom | Cause | Fix |
|---|---|---|
| Output face drifts from the original | gpt-image-2 over-edited the face OR the provider under-weighted the source video | Re-run Step 2 with a stronger "keep the face the same" clause; soften change_prompt. |
| Output looks like the original (no change) | Edited image too similar, OR you passed the raw frame not the edited output | Re-run Step 2 with a more dramatic prompt; confirm the edited frame URL. |
| Output aspect doesn't match source | Source aspect not in {16:9, 9:16, 1:1, 4:3, 3:4} | Step 1 returns aspect_ratio, or closest_aspect_ratio on older worker payloads; use it as the closest supported output label and ask the user for exotic aspects. |
| Provider rejects the normalized video as too large | normalize output can remain too large for 4K/iPhone sources | Retry normalize once with crf=28; if still too large, stop and file worker follow-up for a 1080-edge / reference-size cap. |
| Long source only returns the first short window | The caller normalized once with max_duration_s=14.8 and skipped segmenting | Split into 14.8s windows, generate each segment, then mcp__plugin_pika_pika__edit_concat in order and restore full source audio if needed. |
| Speaking clip loses sound, drops words, or freezes mouth at the tail | Provider regenerated speech/audio instead of preserving the source, or the face track has no mouth motion to drive | Mark as not pass. Offer one mcp__plugin_pika_pika__edit_audio_replace + mcp__plugin_pika_pika__edit_lipsync repair pass; if tail mouth motion is frozen, offer trim/regenerate instead. |
| Approved frame fix disappears in the video | Provider propagation reintroduced the original artifact | Re-render from a stronger approved frame or mark provider propagation caveat; do not claim the frame correction shipped. |
| Kling rejects with error:1201 sound on is not supported with video input | sound=true was passed with a video reference | Retry the Kling call with sound=false and video_keep_sounds=[true]; do not use reference_audio for Kling video input. |
| Kling output is shorter than the normalized source | Provider returned a shorter render, or the caller accidentally passed a trimmed reference | Do not mark pass. Compare output duration to the normalized source, then regenerate that segment or ask the user for a shorter window. |
npx claudepluginhub pika-labs/pika-plugins --plugin pika