From creative-media-generation
Use for any request to create, generate, or edit media — images, product photos, social ads, A/B ad variants, short videos, talking-head / avatar / presenter videos, video messages, translating or dubbing an existing video, music, jingles, sound effects, ambience, reusable visual styles, or face/voice identities. Use even when the user explicitly names HeyGen or Higgsfield. Just ask in plain language. This is the single entry point for making media: it understands the request, grounds it in the project's brand and design, makes the actual prompt or script, confirms before spending any credits, creates the asset, and saves the result. For web design systems use site-designer; for slide or carousel layouts use the template-mediated visual-asset bridge; for page or offer strategy use conversion-designer or funnel-strategist; for copy use content-creation-framework.
Slash command: /creative-media-generation:creative-media-generation
Files: MANIFEST.yaml, README.md, knowledge/content-shapes.md, knowledge/format-specs.md, knowledge/lever-map.md, knowledge/prompt-craft.md, knowledge/shot-planning.md, knowledge/spend-safety.md, knowledge/style-pack-schema.md, references/heygen-provider.md, references/higgsfield-provider.md, references/onboarding.md, references/project-workspace-contract-v2.md, references/provider-routing.md, templates/identity-ref.md, templates/learning-log.md, templates/optimized-prompt.md, templates/style-pack.md

Just ask in plain language.
You do not need to name a provider, pick a tool, or use any commands — describe what you want and this skill takes it from there. It always tells you what it is about to make and which wallet it will spend from, and asks for a yes before spending any credits.
The brain and router for the iurFriend media-asset production layer. This skill
understands what is being produced and for whom, grounds it in the ecosystem's
brand / design / conversion artifacts, routes to the right provider sub-skill,
discovers and applies that provider's levers, runs a lightweight conversational
spend check, and normalizes the result into the slim durable media/ artifact
contract. It does not execute raw generation itself — that is delegated to the
vendored provider skills (HeyGen, Higgsfield) inside this plugin.
Two core jobs:
The advertised front door, the spend-safety prose, billing-pool disclosure, lever discovery, and artifact normalization live only here. The vendored provider skills can execute spend; they hold none of that. The no-bypass rule is a convention, not hard enforcement:
There is no cost_authorization_token a vendored skill must hold — metering and
token-minting are exactly the engineered governance surface we deliberately dropped
(see knowledge/spend-safety.md). A determined direct invocation of a vendored
sub-skill could still spend; the protection is advertised/social, not mechanical.
That is the right weight for a single-user conversational tool.
Startup preflight = read MANIFEST.md + run detect-only provider discovery, before
routing or asking. The two preflight reads are unordered with respect to each other —
both complete before any routing decision or user question — and are detailed below as
Phase 0 (provider discovery) and Phase 0.5 (manifest-first grounding).
references/provider-routing.md states the same ordering.
Before routing anything, discover which providers are configured this session.
Each probe is detect-only — no spend, no value printed — and resolves to
usable | not-usable. The three HeyGen signals are independent — do not conflate
them:
| Signal | Probe | What it means |
|---|---|---|
| heygen_mcp_usable | mcp__heygen__* tools present (e.g. mcp__heygen__get_current_user callable) | the HeyGen MCP route is usable now |
| heygen_cli_usable | the heygen binary is installed and authed: heygen --version exits 0 and heygen auth status reports authenticated | the HeyGen CLI route is usable now |
| api_key_present | HEYGEN_API_KEY present ([ -n "${HEYGEN_API_KEY+x}" ]; value never printed) | footgun signal ONLY: triggers the API-key footgun stop. NOT a "configured"/usable signal on its own. |
| Higgsfield · CLI | higgsfield --version exits 0 and higgsfield auth status reports authenticated | the Higgsfield CLI route is usable now |
HeyGen is "configured/usable" only when heygen_mcp_usable OR heygen_cli_usable
is true — never on api_key_present alone. A bare HEYGEN_API_KEY with no usable
MCP surface and no installed+authed CLI does not make HeyGen usable; it only arms
the footgun stop. Discovery never authenticates on the user's behalf and never stores
anything (R50). If a HeyGen route is usable and api_key_present is also true, do
not pick silently — that feeds the API-key footgun stop (see Spend safety).
Full discovery + onboarding-fallback detail: references/provider-routing.md and
references/onboarding.md.
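The signal logic above can be sketched in a few lines. This is an illustrative sketch, not the skill's implementation: `cli_usable`, `mcp_usable`, `Discovery`, and `discover` are hypothetical names, and treating exit code 0 of `auth status` as "authenticated" is an assumption about the CLIs. It spends nothing, stores nothing, and never reads the key's value.

```python
import os
import shutil
import subprocess
from dataclasses import dataclass
from typing import Iterable


def cli_usable(binary: str) -> bool:
    """Detect-only probe: binary on PATH, `--version` and `auth status` both exit 0."""
    if shutil.which(binary) is None:
        return False
    try:
        for args in (["--version"], ["auth", "status"]):
            result = subprocess.run([binary, *args], capture_output=True, timeout=15)
            # Assumption: a non-zero exit means "broken install" or "not authenticated".
            if result.returncode != 0:
                return False
    except (OSError, subprocess.TimeoutExpired):
        return False
    return True


def mcp_usable(tool_names: Iterable[str]) -> bool:
    """The MCP route counts as usable when any mcp__heygen__* tool is present."""
    return any(name.startswith("mcp__heygen__") for name in tool_names)


@dataclass(frozen=True)
class Discovery:
    heygen_mcp_usable: bool
    heygen_cli_usable: bool
    api_key_present: bool
    higgsfield_cli_usable: bool

    @property
    def heygen_usable(self) -> bool:
        # Usable only via MCP or CLI, never on key presence alone.
        return self.heygen_mcp_usable or self.heygen_cli_usable

    @property
    def footgun_stop_armed(self) -> bool:
        # Key presence only arms the conversational stop; it grants nothing.
        return self.api_key_present


def discover(tool_names: Iterable[str]) -> Discovery:
    return Discovery(
        heygen_mcp_usable=mcp_usable(tool_names),
        heygen_cli_usable=cli_usable("heygen"),
        # Presence-only check, like [ -n "${HEYGEN_API_KEY+x}" ]; the value is never read.
        api_key_present="HEYGEN_API_KEY" in os.environ,
        higgsfield_cli_usable=cli_usable("higgsfield"),
    )
```

Keeping the three HeyGen signals as separate fields makes the "do not conflate" rule hard to break by accident: a bare key can set `footgun_stop_armed` without ever making `heygen_usable` true.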
- MANIFEST.md at project root. Parse the entries: block (manifest-first).
- brand/<brand>/ → BRAND.md, VOICE.md, AUDIENCE.md (subject, tone, who it's for).
- design/<instance>/ → DESIGN.md + design-tokens.json (palette, mood, type feel).
- conversion/<instance>/ → CTA / proof / objections (ad creative + page heroes).
- media/_styles/ style packs and media/_identities/ identities to reuse.
- media/<instance>/ write location. Single-instance projects may use media/default/.

No MANIFEST.md yet

A fresh project may have no MANIFEST.md and no media/ zone. This is not an
error; proceed gracefully:
- brand/, design/, or conversion/ to read): just ask the user the few things you need instead (P6).
- media/<instance>/ instance directory (default media/default/) and create a minimal MANIFEST.md at the project root if none exists, adding a single entry for the new media instance (the learning log is the per-instance handle). Keep it minimal (one entries: row), not a full project scaffold.

Reading the project root (MANIFEST.md, brand/, design/, conversion/, media/) is a project-artifact read via the sanctioned project surfaces; it is conformant with R63, not a violation. Skill-resource reads (this skill's own references/, knowledge/, templates/) resolve inside the plugin dir.
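A minimal sketch of the manifest-first grounding read, assuming only the directory names given above. `grounding_preflight` and `Grounding` are hypothetical names, and the sketch only looks at what exists; it does not parse the entries: block, whose schema is not shown here.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import List

GROUNDING_ZONES = ("brand", "design", "conversion")


@dataclass(frozen=True)
class Grounding:
    manifest_present: bool   # False on a fresh project; not an error
    zones: List[str]         # grounding zones that exist and can be read
    ask_user: bool           # True when there is nothing to ground on
    media_instance_dir: Path # write location, default media/default/


def grounding_preflight(project_root: Path, instance: str = "default") -> Grounding:
    root = Path(project_root)
    zones = [zone for zone in GROUNDING_ZONES if (root / zone).is_dir()]
    return Grounding(
        manifest_present=(root / "MANIFEST.md").is_file(),
        zones=zones,
        ask_user=not zones,
        media_instance_dir=root / "media" / instance,
    )
```

A fresh project simply comes back with `manifest_present=False` and `ask_user=True`, which matches the "proceed gracefully" rule: the missing files change what the skill asks, not whether it runs.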
After discovery, route. The ladder, in one line: discover → degrade-to-available → honour-explicit-if-capable → classify-by-content-shape → ask-if-ambiguous.
references/onboarding.md.

Content-shape → provider (full lever map in knowledge/lever-map.md):
| Content shape | Provider |
|---|---|
| Presenter-led / talking-head / avatar / video message | HeyGen |
| Translate / dub / lip-sync of an existing video | HeyGen |
| Brand / product / editorial image · product shot · visual variant | Higgsfield |
| Non-presenter short video | Higgsfield |
| Reusable image style · face-consistent generative identity | Higgsfield |
| Original music / jingle / SFX / ambience (audio generation) | Higgsfield (higgsfield-generate → sonilo_music music · mirelo_text_to_audio SFX) |
| Animated explainer · kinetic typography · lower-third · title / chapter card · motion data-viz · 3D spin · Lottie clip (motion graphics / animation) | Remotion (remotion-motion — local studio) |
If the user wants a new video in another language (not a translation of an existing one), generate the original presenter video in that language via HeyGen rather than translating an English one.
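The routing table and ladder can be read as a lookup with two "ask" exits. The sketch below is an illustration under stated assumptions: the shape keys and the `route` / `Decision` names are hypothetical, and "capable" is simplified to "is the best-fit provider for that shape".

```python
from dataclasses import dataclass
from typing import Optional, Set

# Content shape -> best-fit provider, transcribed from the table above.
# The shape keys are hypothetical labels chosen for this sketch.
ROUTES = {
    "presenter_video": "heygen",
    "translate_or_dub": "heygen",
    "image": "higgsfield",
    "non_presenter_video": "higgsfield",
    "style_or_identity": "higgsfield",
    "audio": "higgsfield",
    "motion_graphics": "remotion",
}


@dataclass(frozen=True)
class Decision:
    action: str               # "use" or "ask"
    provider: Optional[str]   # best-fit provider, if one could be classified
    reason: str


def route(shape: Optional[str], usable: Set[str], explicit: Optional[str] = None) -> Decision:
    # ask-if-ambiguous: no classified shape means a question, not a guess.
    if shape not in ROUTES:
        return Decision("ask", None, "content shape is ambiguous")
    best = ROUTES[shape]
    # honour-explicit-if-capable: a named provider that does not fit is raised with the user.
    if explicit is not None and explicit != best:
        return Decision("ask", best, f"{explicit} is not the capable provider for {shape}")
    # degrade-to-available, but never silently: an unavailable route becomes a question.
    if best not in usable:
        return Decision("ask", best, f"{best} is unavailable; explain the fallback and billing difference")
    return Decision("use", best, "best-fit provider is usable")
```

Every path that is not a clean match returns "ask" rather than picking a substitute, which is the point of the never-silent rule.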
Presenter vs ad-creative overlap (the product-demo trap). Both providers can make "a video about a product", so disambiguate by whether a specific human presenter is the primary asset:
Motion graphics vs the cloud siblings (the animation lane). When the request is an
animation / motion-graphics piece — an animated explainer, kinetic-typography intro,
lower-third, title or chapter card, motion data-viz, 3D spin, or Lottie clip — route to
remotion-motion. The three siblings differ by how the asset is made:
remotion-motion = deterministic, code-defined, brand-exact motion graphics +
animated / data-driven overlays (Claude writes and renders code; the same source
reproduces the video exactly). Reach for it when the user wants precise, on-brand,
text- or data-driven motion rather than generative footage. The user never says
"Remotion"; route by intent and keep the "just ask, don't pick an engine" UX.

Unlike the cloud executors, remotion-motion runs a local Bash/Node pipeline: on
first use it runs an environment preflight + guided onboarding (Node / git checks; it
auto-runs what it can and provides OS-specific copy-paste commands for the rest) and a
one-time license acknowledgment, then scaffolds + renders locally. The orchestrator
still owns the durable artifact write and MANIFEST.md indexing (P11; see the artifact
contract below); the executor produces the composition source + render.
Never silent: never silently switch providers. If the preferred route is unavailable, explain the unavailable route, the proposed fallback, and the billing difference, then ask. If the user says "always use X for this project/session", honour it until unavailable/unsafe/superseded and record it on the optimized-prompt surface or the learning log — not a hidden global setting.
Provider not set up yet (needs-setup state). If the best-fit provider's CLI is not
installed/authed — including when a vendored skill reports back a "Higgsfield CLI not
set up" needs-setup state instead of running — the orchestrator owns the install
decision. Do not let a vendored skill silently auto-install anything. Instead,
explain in plain language that the provider tool needs a one-time setup, show the
recommended setup step from references/onboarding.md, and ask the user for
permission before any install runs. Proceed only on a yes.
Routing by content shape is necessary but insufficient. The value is knowing what levers each provider exposes for that shape — a style reference, a brand kit, a brand glossary, an identity/character, a voice, an aspect/format, a quality tier — then discovering the user's available levers and applying the right one. "Discover and apply the right levers" is a first-class orchestrator job, not an afterthought.
- list_* calls: list_brand_kits, list_brand_glossaries, list_avatar_groups / list_avatar_looks, list_voices. Apply by passing the resolved brand_kit / glossary / avatar_id / voice_id into the HeyGen call.
- brand_kit_id / ad_reference_id / soul_id and passing aspect / quality.

The style-pack and identity artifacts ARE the provider-neutral abstraction over these
levers — captured once, mapped into each provider's native knobs, reused many times.
This is the deeper reason they survive the slim artifact set. Where a provider lacks an
equivalent lever, say so and do not invent parity (HeyGen has no Higgsfield-style
soul_id image-style lever; Higgsfield has no presenter avatar/voice identity). The full
discover→apply procedure is in knowledge/lever-map.md.
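One way to picture "apply the right levers without inventing parity" is a split of the requested levers into those the provider natively exposes and those it does not. The lever names come from the text above; `NATIVE_LEVERS` and `apply_levers` are hypothetical names for this sketch, and the real discover→apply procedure lives in knowledge/lever-map.md.

```python
from typing import Dict, List, Tuple

# Native knobs per provider, as named in the text above.
NATIVE_LEVERS = {
    "heygen": {"brand_kit", "glossary", "avatar_id", "voice_id"},
    "higgsfield": {"brand_kit_id", "ad_reference_id", "soul_id", "aspect", "quality"},
}


def apply_levers(provider: str, resolved: Dict[str, str]) -> Tuple[Dict[str, str], List[str]]:
    """Return (levers to pass to the provider, levers it has no equivalent for)."""
    supported = NATIVE_LEVERS[provider]
    applied = {name: value for name, value in resolved.items() if name in supported}
    # Unsupported levers are reported to the user, never mapped to a look-alike knob.
    unsupported = sorted(name for name in resolved if name not in supported)
    return applied, unsupported
```

For example, asking HeyGen for a `soul_id` yields that lever in the unsupported list, so the orchestrator can say so instead of silently dropping or approximating it.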
Spend safety is two plain-language checks, detailed in knowledge/spend-safety.md.
There is no engineered cost gate, no cost_authorization_token, no metering, no
budgets, no reconciliation, and no tracking integration — programmatic cost metering is
infeasible over an MCP/CLI (we never see the provider API call) and would only ever help
the one route (direct API) v1 does not use.
- style extract is no-spend but still needs rights/approval + consent for identifiable people.
- HEYGEN_API_KEY is detected
(presence-only via [ -n "${HEYGEN_API_KEY+x}" ]; value never printed, echoed,
hashed, logged, or inspected), the orchestrator does not auto-proceed on the CLI/API
route the way the bare HeyGen skill would (its rule is "API-key presence is an explicit
signal … No question asked" — the orchestrator overrides that). Instead it STOPS
and: (a) discloses the billing pool — HeyGen API balance (pay-as-you-go) on the
CLI/API route vs plan credits via MCP OAuth; (b) asks for an explicit choice —
authorise API-balance spend on this route OR clear/ignore the key and route through MCP
plan credits; (c) only then proceeds. Key presence is never an auto-green-light. This
is a question, not code. This runs FIRST, before the HeyGen transport ladder.

Everything else is plain prose:
- userConfig sensitive: true keychain key for the bundled connector, never hardcoded); HeyGen CLI = HEYGEN_API_KEY detected-only, never printed; Higgsfield = delegated CLI OAuth. No ~/.skills/creative-media-generation/credentials.json in v1; no secrets in chat; no project .env walk-up; no secret in the bundled MCP config source.
- brand/, design/, conversion/ (P3: search before asking).

Do not just pass the user's words to a model. Make it good by construction, then hand off:
- knowledge/format-specs.md; script/pacing formulas in knowledge/content-shapes.md.
- style capture first.
- knowledge/prompt-craft.md. This is judgment over a rubric, never a quality score or PASS/FAIL gate. Show the user what is going out and why (the optimized-prompt surface).
- media/<instance>/assets/.
- .meta.yaml sidecars.
- artifact-reviewer with the generated_media_reviewer briefing. Surface findings; never self-greenlight.

The concrete seam between governing (here) and executing (the vendored skill).
Inputs the orchestrator passes to the vendored skill (per run):
| Input | Meaning |
|---|---|
| objective | what is being produced, for which brand / channel / campaign |
| grounded_prompt / script | the brand-grounded, prompt-crafted instruction (orchestrator-authored; R32) |
| output_dir | the raw-asset write target: media/<instance>/assets/ ONLY |
| provider · transport | the resolved route (heygen_mcp / heygen_cli / higgsfield_cli) |
| style_id / identity_id + resolved levers | provider-neutral handles + the discovered provider levers mapped into native knobs |
There is no cost_authorization_token — no-bypass is a convention. The vendored
skill's preflight redirect note points back here; nothing mints or checks a token.
Output-directory contract. The vendored skill writes raw provider assets under
media/<instance>/assets/ only. It does not write optimized-prompt.md,
learning-log.md, _styles/, or _identities/ — those durable handles are the
orchestrator's exclusive writes.
Fields captured back: job_id / session_id / asset_id(s); asset_paths;
provider · transport · billing_pool · levers_applied.
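The handoff inputs and the output-directory contract can be sketched as a small validated record. This is a hedged illustration: `Handoff` is a hypothetical name, the field names follow the input table, and the path check assumes project-relative POSIX-style paths.

```python
from dataclasses import dataclass, field
from pathlib import PurePosixPath
from typing import Dict

ROUTES = {"heygen_mcp", "heygen_cli", "higgsfield_cli"}


@dataclass(frozen=True)
class Handoff:
    objective: str
    grounded_prompt: str
    output_dir: str                      # must be media/<instance>/assets/
    route: str                           # provider · transport, resolved
    levers: Dict[str, str] = field(default_factory=dict)

    def __post_init__(self) -> None:
        parts = PurePosixPath(self.output_dir).parts
        is_assets_dir = (
            len(parts) == 3
            and parts[0] == "media"
            and parts[2] == "assets"
            # _styles/ and _identities/ are orchestrator-only; ".." would escape the zone.
            and not parts[1].startswith("_")
            and parts[1] != ".."
        )
        if not is_assets_dir:
            raise ValueError(f"vendored skills write only under media/<instance>/assets/: {self.output_dir}")
        if self.route not in ROUTES:
            raise ValueError(f"unknown route: {self.route}")
```

There is deliberately no token field here. The record documents the seam; it does not enforce the no-bypass convention.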
Failure handling. If the vendored run fails (auth lost, provider error, partial
output): record the failure + error class in the learning log (a learning signal), do
not write success artifacts, leave any partial files under assets/ flagged as
partial, and surface the failure with next-step options (retry gets a fresh pre-spend
confirmation; switch route per never-silent). No autonomous retry that re-spends.
These are orchestrator-level, provider-neutral capabilities (capture-once / reuse-many) — the provider-neutral abstraction over the per-provider levers. They MUST stay here, never owned by a vendored skill.
- media/_styles/<style_id>/, never under brand/<brand>/. Operations: derive (from brand + design), elicit (short interview), promote (fold a winning run in), extract (reverse-capture from approved reference images, no credit spend). Concept + capture detail: knowledge/style-pack-schema.md; template templates/style-pack.md. For extract, confirm every reference image is approved and any identifiable person has consent; refuse protected-IP / living-artist / unconsented-likeness imitation; capture observable visual traits, not proprietary identity; cite provenance (P7).
- soul-id) or a HeyGen avatar/voice identity. Home: media/_identities/<id>.md with a consent note. Requires the user's own photos or documented consent.

media/ zone, project-workspace-contract@2)

The orchestrator writes the slim durable handle set; vendored skills write only raw
assets under assets/. Rationale inline (so the contract's value is legible, not
assumed): the only hard requirement is manifest-first discoverability — other skills,
future sessions, and the user must be able to find prior work, so exactly one durable
per-instance handle is indexed in MANIFEST.md (the learning log). Beyond that, each kept
artifact earns its place: the learning log is where learning compounds run over run
(R26/P15), so it doubles as the manifest handle; style-packs and identities are
reusable levers (the abstraction over the per-provider levers); the optimized prompt is
a value-add prompt-optimization surface (it shows the orchestrator's prompt-craft). The cut
artifacts were bureaucracy: the generation-log existed largely for cost reconciliation,
whose value died with cost-tracking; the per-asset .meta.yaml sidecars were
bureaucratic provenance for binaries nobody indexes individually. Both are cut.
The type: column is the frontmatter type: value each artifact carries — always the
media/<kind> form (see the frontmatter note below). Raw rendered assets are binaries and
carry no Markdown frontmatter, so they have no type:.
| Artifact | Path | type: (frontmatter) | Notes |
|---|---|---|---|
| Optimized prompt | media/<instance>/optimized-prompt.md | media/prompt-brief | what is going out + why; carries objective, channel, levers_applied, resolved provider/transport, billing_pool, format, source assets, rights/consent. Template templates/optimized-prompt.md. |
| Learning log | media/<instance>/learning-log.md | media/learning-log | the self-improvement engine AND the manifest-first handle. What prompt/lever combos worked, practical cost observations, reusable patterns. Template templates/learning-log.md. |
| Rendered assets | media/<instance>/assets/… | (raw binaries; no frontmatter) | binaries NOT individually indexed — the learning log is the instance handle. No per-asset sidecars. |
| Style pack | media/_styles/<style_id>/style-pack.md | media/style-pack | plugin-scoped reusable lever; never under brand/<brand>/. |
| Identity ref | media/_identities/<id>.md | media/identity-ref | plugin-scoped reusable lever; source-photo consent note. |
| Motion brief | media/motion/briefs/<slug>.md | media/motion-brief | per-instance handle for a remotion-motion animation — recipe, library, composition id, format, brand grounding, storyboard. Orchestrator-owned durable write, indexed in MANIFEST.md. Lives in the media/motion/ Remotion app (composition source + public/ versioned; node_modules/ + out/ gitignored; the rendered MP4 is a regenerable, surfaced deliverable). Template skills/remotion-motion/templates/motion-brief.md. |
Normalization fields are recorded on the optimized-prompt surface (and what-worked
signals on the learning log) — not a separate generation log: provider:
(higgsfield_cli / heygen_mcp / heygen_cli); billing_pool: (higgsfield_credits /
heygen_plan_credits / heygen_api_balance); transport: (HeyGen only: mcp or cli);
levers_applied: (the resolved provider levers + the style-pack/identity they map from);
cost_estimate: (plain context only when the provider exposes one; otherwise omitted —
there is no cost_actual reconciliation field).
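As a rough sketch of the normalization record, the fields can be derived from the resolved route. The route-to-pool pairing is read from the lists above and from the spend-safety text (MCP spends plan credits, the CLI/API route spends the API balance); `normalization_fields` is a hypothetical helper, not part of the skill.

```python
from typing import Dict, List, Optional

BILLING_POOL = {
    "higgsfield_cli": "higgsfield_credits",
    "heygen_mcp": "heygen_plan_credits",
    "heygen_cli": "heygen_api_balance",
}


def normalization_fields(
    provider: str,
    levers_applied: List[str],
    cost_estimate: Optional[str] = None,
) -> Dict[str, object]:
    fields: Dict[str, object] = {
        "provider": provider,
        "billing_pool": BILLING_POOL[provider],
        "levers_applied": list(levers_applied),
    }
    # transport is recorded for HeyGen only: "mcp" or "cli".
    if provider.startswith("heygen_"):
        fields["transport"] = provider.split("_", 1)[1]
    # cost_estimate is plain context and is omitted when the provider exposes none.
    if cost_estimate is not None:
        fields["cost_estimate"] = cost_estimate
    return fields
```

The record has no `cost_actual` key by construction, in line with the decision to drop reconciliation.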
Every Markdown output carries the ecosystem minimum frontmatter (id, title, type,
status, scope, brand, updated, produced_by, references). Use type: media/<kind>
(the type: column above gives the exact value per artifact) so artifact-reviewer derives
generated_media_reviewer when no explicit role is passed.
The orchestrator owns initial status: draft writes and may propose draft → review; it
never writes status: greenlit, published, or archived without explicit user
direction (P8/R25). When writing a MANIFEST.md entry, translate the artifact's internal
status: to the manifest-entry routing vocabulary (draft|review|approved|superseded) per
the contract §2 table. Contract child copy: references/project-workspace-contract-v2.md.
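The frontmatter and status rules above lend themselves to a simple pre-write check. The following is a sketch under assumptions: `frontmatter_problems` is a hypothetical helper, the key list is the ecosystem minimum named above, and the manifest status translation is left out because the contract §2 table is not reproduced here.

```python
from typing import Dict, List

REQUIRED_KEYS = ("id", "title", "type", "status", "scope", "brand", "updated", "produced_by", "references")
# Statuses the orchestrator never writes without explicit user direction.
GATED_STATUSES = {"greenlit", "published", "archived"}


def frontmatter_problems(frontmatter: Dict[str, object], user_directed: bool = False) -> List[str]:
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in frontmatter]
    if not str(frontmatter.get("type", "")).startswith("media/"):
        problems.append("type must use the media/<kind> form")
    if frontmatter.get("status") in GATED_STATUSES and not user_directed:
        problems.append("status requires explicit user direction")
    return problems
```

An empty list means the artifact can be written; anything else is surfaced to the user rather than auto-corrected.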
Operate in the user's working language. Elicitation questions and briefs follow the
conversation language (or MANIFEST.md output_language:). Prompts sent to image/video
models are most reliable in English — translate the user's intent into an English
optimized prompt while preserving native terms that carry meaning (brand names, culturally
specific concepts). Show the user the optimized prompt so they confirm intent survived
translation. For presenter videos, the viewer-facing script matches the user's requested
video language; English technical directives (style blocks, motion, provider flags) guide the
renderer.
- higgsfield / heygen CLI flags, MCP tool names, culturally specific concepts.
- ue/oe/ss).
- kind:/status: enum values stay English (downstream tooling interoperability); free-text frontmatter values may follow the content language.

Web design systems/tokens → site-designer. Brand canonicals → brand-dna.
Page/CTA/offer strategy → conversion-designer/funnel-strategist. Template-mediated layout
(slides, carousels, one-pagers) → the visual-asset bridge. Copywriting →
content-creation-framework. Publishing → the publish routine (R64). Review judgment →
artifact-reviewer (generated_media_reviewer briefing, R52). Raw provider execution → the
vendored provider skills inside this plugin (not advertised directly).
Standard runs end normally — surface the artifact + the next step, nothing more. But this skill sits on a deviation-rich surface (prompt iteration, lever choice, routing, per-model conventions, cost variance), so when a run reveals something the skill itself should learn, say so in one line and offer to fold it back. Raise these only on a genuine deviation — not as per-run ceremony:
- knowledge/ convention, may be wrong for this brand.
- prompt-craft.md per-model convention produced an off-brand result → flag the convention for revision.
- style capture should have caught that descriptor by default.
- knowledge/lever-map.md.
- cost_estimate diverged materially from observed spend → tighten the guidance or warn earlier.

- references/provider-routing.md: intent→provider routing table, selection steps, billing / credential posture.
- references/heygen-provider.md: HeyGen MCP-first→CLI ladder, capability map, lever discovery, style mapping, source-input quality.
- references/higgsfield-provider.md: Higgsfield CLI surface, levers, native style mapping.
- references/onboarding.md: first-time provider setup walkthrough (plain language).
- references/project-workspace-contract-v2.md: the workspace/zone contract (child copy).
- knowledge/lever-map.md: the per-provider lever map; discover→apply; style-packs/identities as the abstraction over levers.
- knowledge/spend-safety.md: the two lightweight conversational checks.
- knowledge/prompt-craft.md: per-model prompt optimization + cinematography vocabulary (R32 inputs to judgment).
- knowledge/format-specs.md: platform aspect/duration/safe-zone specs + AIDA timing.
- knowledge/content-shapes.md: script formulas, WPM pacing, scene/transition tables, ad beat structure.
- knowledge/style-pack-schema.md: style-pack concept + capture operations.
- knowledge/shot-planning.md: continuity rules (180°, match-on-action, eyeline, screen direction) + shot-list→generate→stitch.
- templates/: optimized-prompt.md, learning-log.md, style-pack.md, identity-ref.md.

npx claudepluginhub cmgramse/skill-development --plugin creative-media-generation