From agent-media
Lip-sync a face image or video clip to a user-uploaded audio track, producing a 9:16 talking-head video. No TTS or voice cloning.
Slash command
/agent-media:make-lip-sync
Bring your own audio: lip-sync a face (an R2-hosted image / character sheet, OR an existing clip) to a provided audio track. No text-to-speech or voice cloning — the character speaks your uploaded recording. Output is a 9:16 talking-head video.
Call this skill when the user asks for the outcome described above. It runs on the agent-media vNext primitive runtime via the mcp__agent-media__make_lip_sync MCP tool. Authentication is the user's existing agent-media Bearer token (issued by agent-media login).
Preferred path: MCP tool mcp__agent-media__make_lip_sync. Schema is auto-published via tools/list against the same MCP server, so don't restate the schema here — trust the server's response.
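As a rough illustration of the preferred path, the sketch below builds the JSON-RPC 2.0 messages an MCP client would send for tools/list and tools/call, and reads the published input schema back from a tools/list result. It assumes the server-side tool name is make_lip_sync (the mcp__agent-media__ prefix being the client-side namespace); that name and the helper function names are assumptions, not taken from the registry. Transport and the MCP initialize handshake are left out.

```python
import itertools

# Monotonic request ids, as JSON-RPC 2.0 expects a unique id per request.
_ids = itertools.count(1)


def jsonrpc_request(method, params=None):
    """Build a JSON-RPC 2.0 request object (not sent anywhere)."""
    msg = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        msg["params"] = params
    return msg


def list_tools_request():
    """Ask the MCP server for its tools and their input schemas."""
    return jsonrpc_request("tools/list")


def find_input_schema(list_result, tool_name="make_lip_sync"):
    """Return the inputSchema the server published for tool_name, or None.

    ASSUMPTION: tool_name is the server-side name of the lip-sync tool.
    """
    for tool in list_result.get("tools", []):
        if tool.get("name") == tool_name:
            return tool.get("inputSchema")
    return None


def call_tool_request(arguments, tool_name="make_lip_sync"):
    """Build a tools/call request; arguments should follow the schema
    returned by the server rather than anything hard-coded here."""
    return jsonrpc_request(
        "tools/call", {"name": tool_name, "arguments": arguments}
    )
```

Reading the schema from the server at call time, instead of hard-coding field names, matches the advice above to trust the server's response.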
Fallback path: REST.
POST https://api.agent-media.ai/v1/skills/make_lip_sync/run
Authorization: Bearer $AGENT_MEDIA_API_KEY
Content-Type: application/json
Idempotency-Key: <any unique string per intent>
{
  "image_url": "https://pub-...r2.dev/vnext/primitive-runs/<id>/character-sheet.png",
  "audio_url": "https://pub-...r2.dev/vnext/<your-uploaded-audio>.mp3",
  "duration": 10,
  "aspect_ratio": "9:16"
}
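The REST fallback above could be assembled from Python roughly as follows. This is a minimal sketch using only the standard library; the function name build_lip_sync_request and its parameters are hypothetical, and the response shape is not documented here, so the sketch stops at building the request.

```python
import json
import os
import urllib.request
import uuid

API_URL = "https://api.agent-media.ai/v1/skills/make_lip_sync/run"


def build_lip_sync_request(image_url, audio_url, duration=10,
                           api_key=None, idempotency_key=None):
    """Build (but do not send) the POST request shown above.

    Hypothetical helper: the field names mirror the example body; the
    function itself is not part of agent-media.
    """
    api_key = api_key or os.environ.get("AGENT_MEDIA_API_KEY", "")
    payload = {
        "image_url": image_url,
        "audio_url": audio_url,
        "duration": duration,
        "aspect_ratio": "9:16",
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
            # One key per intent: pass the same value again when retrying
            # the same request; a fresh UUID is generated otherwise.
            "Idempotency-Key": idempotency_key or str(uuid.uuid4()),
        },
    )
```

To send it, pass the result to urllib.request.urlopen. Generating the idempotency key once and reusing it across retries of the same intent is the usual way to keep a retried POST from starting a second run.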
140/280/420 (5s/10s/15s)
420–480s
GET https://api.agent-media.ai/v1/primitives/runs/<run_id>
Authorization: Bearer $AGENT_MEDIA_API_KEY
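A polling loop around the GET endpoint above might look like the sketch below. The response shape is not documented here, so the "status" field and its terminal values are assumptions, as are the helper names fetch_run and wait_for_run and the default interval and timeout. The fetch function is injectable so the loop can be exercised without network access.

```python
import json
import os
import time
import urllib.request

RUNS_URL = "https://api.agent-media.ai/v1/primitives/runs/{run_id}"

# ASSUMPTION: the run JSON carries a "status" field and these values mean
# the run has finished. Check the real response before relying on this.
TERMINAL_STATUSES = {"succeeded", "failed", "cancelled"}


def fetch_run(run_id, api_key=None):
    """GET the run record with the Bearer token and decode the JSON body."""
    api_key = api_key or os.environ.get("AGENT_MEDIA_API_KEY", "")
    req = urllib.request.Request(
        RUNS_URL.format(run_id=run_id),
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def wait_for_run(run_id, fetch=fetch_run, interval=15.0, timeout=900.0,
                 sleep=time.sleep, clock=time.monotonic):
    """Poll until the assumed status is terminal or the timeout elapses."""
    deadline = clock() + timeout
    while True:
        run = fetch(run_id)
        if run.get("status") in TERMINAL_STATUSES:
            return run
        if clock() >= deadline:
            raise TimeoutError(f"run {run_id} not finished after {timeout}s")
        sleep(interval)
```

The default timeout is deliberately longer than the 420–480s figure listed above, on the assumption that it describes typical run time.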
agent-media login.
This file is auto-generated by scripts/generate-public-skill.ts from the registry at services/api-v2/src/skills/registry.ts. Do not hand-edit; CI rejects drift.
npx claudepluginhub gitroomhq/agent-media --plugin agent-media