From podcast-creator
Mix multi-voice TTS speech with a brand music kit into the final podcast episode file. Invoked by the podcast-studio orchestrator after the TTS step. Supports kit mode (contract-driven music kit with mood-matched transitions, bookend ducking, optional voice-over-opening and backchannel overlays) and a speech-only legacy fallback when no kit is supplied. Writes {workspace}/audio/final/episode.mp3.
Slash command: /podcast-creator:audio-mixing
Combine TTS speech and brand music into the final podcast episode file. This
producer skill is invoked by the podcast-studio orchestrator; it reads speech
from the run's --workspace and the music kit from the --kit directory the
orchestrator passes in. Supports two modes: kit mode (preferred,
contract-driven) and legacy mode (speech-only / single-overlay fallback).
The kit directory is not this skill's concern to locate. The orchestrator
resolves it from the credential store's music_kit_path or the active show
profile's music_kit reference and passes it as --kit <dir>. The script never
hardcodes a kit path. If no kit is supplied (or the supplied kit has no
kit.yaml), the mixer runs speech-only via legacy mode — the orchestrator
should mention that in the final message.
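The fallback rule above (no kit supplied, or a kit directory without kit.yaml, means legacy mode) can be sketched as below. This is a minimal illustration, not the code in mix_audio.py; the function name select_mode and the returned strings are assumptions made for the example.

```python
from pathlib import Path
from typing import Optional


def select_mode(kit_dir: Optional[str]) -> str:
    """Return "kit" when a usable kit was supplied, otherwise "legacy".

    Hypothetical helper: the real script may structure this check differently.
    """
    if kit_dir is None:
        return "legacy"  # the orchestrator passed no --kit
    if not (Path(kit_dir) / "kit.yaml").is_file():
        return "legacy"  # a directory was given, but it has no kit.yaml
    return "kit"
```

A caller could use the return value to decide whether to tell the user that the episode was mixed speech-only.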
Interpreter (R70). mix_audio.py imports pyyaml + pydub, which live in the orchestrator's uv-managed venv. Run it through the venv interpreter the orchestrator resolved in Step 0 ("$PODCAST_PY" <script>), not bare python3. The python3 … in the examples is shorthand; substitute the resolved venv interpreter. (pydub also needs ffmpeg on PATH for non-wav formats; that is a separate system binary, unrelated to the venv.) See podcast-studio SKILL.md Step 0.
Kit mode uses a structured music kit (contract C1) and a speech manifest (contract C4) to assemble episodes with mood-matched transitions and bookend ducking.
python3 "${CLAUDE_SKILL_DIR}/scripts/mix_audio.py" --workspace <run-dir> --kit <path-to-music-kit>
<path-to-music-kit> is supplied by the orchestrator (resolved from
music_kit_path / the profile's music_kit). Do not hardcode it.
| Argument | Default | Description |
|---|---|---|
| --workspace | workspace | Root run directory (the orchestrator's --workspace <run-dir>) |
| --kit | None | Path to music-kit directory containing kit.yaml (supplied by orchestrator) |
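For illustration, the two arguments in the table could be declared with argparse roughly as follows. This is a sketch based only on the documented names and defaults; the actual parser in mix_audio.py may differ, and build_parser is a hypothetical name.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the two documented flags with their documented defaults."""
    parser = argparse.ArgumentParser(prog="mix_audio.py")
    parser.add_argument("--workspace", default="workspace",
                        help="Root run directory")
    parser.add_argument("--kit", default=None,
                        help="Music-kit directory containing kit.yaml")
    return parser


# Parse an explicit argument list so the example does not depend on sys.argv.
args = build_parser().parse_args(["--workspace", "runs/ep01"])
print(args.workspace, args.kit)  # runs/ep01 None
```

Leaving --kit at its default of None is what routes the run to the speech-only fallback.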
Inputs:
- {workspace}/audio/speech/manifest.json: turn list with file, duration_ms, followed_by_transition per turn (contract C4).
- {kit}/kit.yaml: kit descriptor with roles (opening, bed, transitions, outro) and mix config.
- {kit}/clips/: WAV clip files referenced in kit.yaml.

Assembly order:
1. The opening clip plays before turn 1. (This changes when opening_voice_overlap is set; see opening_voice_overlap below.)
2. The bed plays under turn 1 at opening_bed.gain_db and fades out over opening_bed.fade_out_ms after the turn ends. (Skipped in overlap mode: the fading opening tail serves as the musical bed.)
3. After each turn with followed_by_transition set, a mood-matched transition clip (from roles.transitions) is inserted with silence padding on both sides (pause_around_transition_ms). Unknown/null moods cycle round-robin through all available clips.
4. The outro starts with outro_overlap_ms of overlap at the end of speech; the overlap section is ducked to outro_duck_gain_db. The remainder of the outro appends after.
5. Global fades fade_ms.in / fade_ms.out are applied to the finished timeline.

Mix config keys:

| Key | Purpose |
|---|---|
| opening_bed.gain_db | Bed attenuation under turn 1 |
| opening_bed.fade_out_ms | Bed fade-out duration after turn 1 |
| outro_overlap_ms | How far the outro starts before the end of speech |
| outro_duck_gain_db | Outro gain during the overlap window |
| pause_around_transition_ms | Silence gap before and after each transition clip |
| fade_ms.in / fade_ms.out | Global episode fades |
| opening_voice_overlap | (optional) Voice-over-opening mode; see below |
| backchannel_overlap | (optional) Overlay backchannel affirmations onto the previous turn's tail; see below |
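The transition rule (a mood-matched clip where one exists, round-robin through all clips for unknown or null moods) can be sketched in plain Python. The clip-list shape ({"mood", "file"} dicts) and the name make_transition_picker are assumptions for the example; the real kit.yaml schema and how the mood reaches the mixer are not specified here. The sketch also assumes a non-empty clip list.

```python
from itertools import count


def make_transition_picker(transitions):
    """transitions: non-empty list of {"mood": str, "file": str} dicts (hypothetical shape)."""
    by_mood = {t["mood"]: t for t in transitions}
    fallback_index = count()  # advances only when a mood is unknown or None

    def pick(mood):
        if mood in by_mood:
            return by_mood[mood]["file"]  # mood-matched clip
        # Unknown/null mood: cycle round-robin through every available clip.
        return transitions[next(fallback_index) % len(transitions)]["file"]

    return pick


pick = make_transition_picker([
    {"mood": "upbeat", "file": "t_upbeat.wav"},
    {"mood": "calm", "file": "t_calm.wav"},
])
print(pick("calm"))     # t_calm.wav
print(pick(None))       # t_upbeat.wav (first fallback)
print(pick("unknown"))  # t_calm.wav (second fallback)
```

Keeping the fallback counter separate from matched picks means matched moods do not disturb the round-robin order.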
When the backchannel_overlap key is present, any turn whose manifest entry has "backchannel": true is overlaid onto the tail of the immediately preceding turn instead of being appended. If the turn also has "backchannel_at": NN (0–100), the overlay starts at that percentage into the preceding turn (mid-turn placement) rather than using lead_ms.

    mix:
      backchannel_overlap:
        lead_ms: 1200   # ms back from the current timeline end (note: after turn 1 this includes the bed-fade tail when a bed is active)
        gain_db: -3     # attenuation applied to the backchannel segment

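The lead_ms placement arithmetic for this key can be sketched with plain millisecond integers. This is an illustration of the documented clamping and padding rules only; place_backchannel is a hypothetical name, the backchannel_at (percentage) placement is not modelled, and no audio is touched.

```python
def place_backchannel(timeline_len, prev_turn_len, bc_len, lead_ms):
    """All arguments are lengths in milliseconds.

    Returns (pos, new_timeline_len): where the backchannel is overlaid and
    how long the timeline is afterwards.
    """
    lead = min(lead_ms, prev_turn_len)         # never start before the previous turn began
    pos = max(0, timeline_len - lead)          # overlay position on the timeline
    new_len = max(timeline_len, pos + bc_len)  # pad with silence if the overlay overruns
    return pos, new_len


print(place_backchannel(10_000, 4_000, 800, 1_200))  # (8800, 10000)
print(place_backchannel(10_000, 500, 800, 1_200))    # (9500, 10300)
```

In the second call the previous turn is shorter than lead_ms, so the lead is clamped to 500 ms and the timeline grows by 300 ms to keep the backchannel tail audible.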
Behaviour:
- lead_ms is clamped to len(previous_turn_audio) so the overlay never starts before that turn began.
- pos = max(0, len(timeline) - lead); the backchannel (bc) is overlaid at that position.
- If the backchannel runs past the end of the timeline (pos + len(bc) > len(timeline)), the timeline is extended with silence so the tail is audible.
- The backchannel segment receives the gain_db reduction.

The overlay is not applied, and the turn is appended as usual, when any of the following holds:
- The backchannel_overlap key is absent or null in kit.yaml (backward-compatible default).
- The turn's "backchannel" field is false or absent.
- The previous turn carries a followed_by_transition marker (a transition was just inserted; overlaying onto a pause+clip+pause block would be wrong).
- The turn's "speaker" equals the previous turn's "speaker" (a speaker never backchannels themselves).

When the opening_voice_overlap key is present, the host's first turn enters mid-opening rather than after it:

    mix:
      opening_voice_overlap:
        entry_ms: 13100    # voice enters this many ms into the opening
        duck_gain_db: -2   # opening remainder attenuated by this amount (~80% ≈ -2 dB)

Both entry_ms and duck_gain_db are required when opening_voice_overlap is present. Omitting either key raises a KeyError at mix time (by design: partial config is rejected loudly).
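The head/tail split and the strict config lookup for this mode can be sketched as below, again with millisecond integers rather than audio. plan_opening_overlap and the keys of the returned dict are hypothetical names chosen for the example; only entry_ms, duck_gain_db, the clamp range, and the max() rule come from this document.

```python
def plan_opening_overlap(cfg, opening_len, turn1_len):
    """cfg is the opening_voice_overlap mapping; lengths are in milliseconds."""
    entry_ms = cfg["entry_ms"]          # plain indexing: a missing key raises KeyError
    duck_gain_db = cfg["duck_gain_db"]  # same here, so partial config fails loudly
    entry = max(0, min(entry_ms, opening_len))  # clamp to [0, len(opening)]
    tail_len = opening_len - entry
    return {
        "head_len": entry,                        # opening[:entry] at full volume
        "tail_len": tail_len,                     # opening[entry:], ducked and faded
        "tail_gain_db": duck_gain_db,
        "tail_fade_out_ms": tail_len,             # fade spans the whole tail
        "segment_len": max(turn1_len, tail_len),  # the longer of voice and tail wins
    }


plan = plan_opening_overlap({"entry_ms": 13100, "duck_gain_db": -2}, 20_000, 9_000)
print(plan["head_len"], plan["tail_len"], plan["segment_len"])  # 13100 6900 9000
```

Using dictionary indexing instead of .get() is what makes an incomplete opening_voice_overlap block fail immediately rather than fall back silently.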
Behaviour:
- opening[:entry_ms] plays alone at full volume ("head").
- opening[entry_ms:] ("tail") receives duck_gain_db attenuation and a slow fade-out across its entire length (fade_out(len(tail))).
- Turn 1 is overlaid on the tail; the combined segment has length max(len(turn_1), len(tail)), so whichever is longer wins the timeline.
- The bed role is skipped in this mode: the opening tail already provides the musical bed, so the separate bed clip would be redundant.
- entry_ms is clamped to [0, len(opening)]; values outside that range do not raise.
- Omitting the key (or setting it to null) leaves behaviour identical to today (backward compatible).

Legacy mode: speech-only / single-overlay fallback (no --kit flag, or kit not found). Uses a single concatenated speech.wav with an optional background music overlay.
python3 "${CLAUDE_SKILL_DIR}/scripts/mix_audio.py" --workspace <run-dir>
Inputs:
- {workspace}/audio/speech/speech.wav
- {workspace}/audio/music/background.mp3 (if present)

If no background music exists, legacy mode produces speech-only output with fades. This is the fallback path when the orchestrator supplies no --kit.
| File | Path | Format |
|---|---|---|
| MP3 (distribution) | {workspace}/audio/final/episode.mp3 | MP3, 192 kbps |
Dependencies:
- pydub
- pyyaml
- ffmpeg (system)

npx claudepluginhub cmgramse/skill-development --plugin podcast-creator