Skill

audio-mixing

Mix multi-voice TTS speech with a brand music kit into the final podcast episode file. Invoked by the podcast-studio orchestrator after the TTS step. Supports kit mode (contract-driven music kit with mood-matched transitions, bookend ducking, optional voice-over-opening and backchannel overlays) and a speech-only legacy fallback when no kit is supplied. Writes {workspace}/audio/final/episode.mp3.

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/podcast-creator:audio-mixing

User invocable

Model invocable

Inline context

Default effort

Supporting Files

MANIFEST.yaml, README.md, scripts/mix_audio.py

SKILL.md

159 lines · ~2.1k tokens

Stats

Language: Python

Parent stars: 0

Maintenance: Excellent

Last Commit: Jun 26, 2026


Audio Mixing (v2)

Combine TTS speech and brand music into the final podcast episode file. This producer skill is invoked by the podcast-studio orchestrator; it reads speech from the run's --workspace and the music kit from the --kit directory the orchestrator passes in. Supports two modes: kit mode (preferred, contract-driven) and legacy mode (speech-only / single-overlay fallback).

The kit directory is not this skill's concern to locate. The orchestrator resolves it from the credential store's music_kit_path or the active show profile's music_kit reference and passes it as --kit <dir>. The script never hardcodes a kit path. If no kit is supplied (or the supplied kit has no kit.yaml), the mixer runs speech-only via legacy mode — the orchestrator should mention that in the final message.
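The fallback rule above can be sketched as a small decision function. This is a minimal illustration and not the script's actual code: `select_mode` is a hypothetical name, and the only facts taken from the text are that a missing `--kit`, or a kit directory without `kit.yaml`, leads to legacy mode.

```python
from pathlib import Path
from typing import Optional


def select_mode(kit_dir: Optional[str]) -> str:
    """Return "kit" when a usable kit was supplied, otherwise "legacy"."""
    if kit_dir is None:
        return "legacy"  # the orchestrator passed no --kit
    if not (Path(kit_dir) / "kit.yaml").is_file():
        return "legacy"  # directory is missing or has no kit.yaml
    return "kit"
```

A caller could use the returned value to decide whether to mention the speech-only fallback in the final message.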

Interpreter (R70). mix_audio.py depends on pyyaml and pydub, which live in the orchestrator's uv-managed venv. Run it through the venv interpreter the orchestrator resolved in Step 0 ("$PODCAST_PY" <script>), not bare python3. The python3 … in the examples is shorthand; substitute the resolved venv interpreter. pydub also needs ffmpeg on PATH for non-WAV formats; that is a separate system binary, unrelated to the venv. See podcast-studio SKILL.md Step 0.

Kit Mode (preferred)

Uses a structured music kit (contract C1) and a speech manifest (contract C4) to assemble episodes with mood-matched transitions and bookend ducking.

python3 "${CLAUDE_SKILL_DIR}/scripts/mix_audio.py" --workspace <run-dir> --kit <path-to-music-kit>

<path-to-music-kit> is supplied by the orchestrator (resolved from music_kit_path / the profile's music_kit). Do not hardcode it.

Arguments

| Argument | Default | Description |
|---|---|---|
| `--workspace` | `workspace` | Root run directory (the orchestrator's `--workspace <run-dir>`) |
| `--kit` | `None` | Path to music-kit directory containing `kit.yaml` (supplied by orchestrator) |
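For orientation, the two arguments and their defaults could be declared with `argparse` roughly as follows. This is a sketch under the assumption that the script uses `argparse`; `build_parser` and the help strings are illustrative and not taken from mix_audio.py.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Mix speech and music into an episode.")
    # Defaults mirror the table above.
    parser.add_argument("--workspace", default="workspace", help="Root run directory")
    parser.add_argument("--kit", default=None, help="Music-kit directory containing kit.yaml")
    return parser


args = build_parser().parse_args(["--workspace", "runs/ep01"])
print(args.workspace, args.kit)  # runs/ep01 None
```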

Required inputs

- {workspace}/audio/speech/manifest.json: turn list with file, duration_ms, followed_by_transition per turn (contract C4).
- {kit}/kit.yaml: kit descriptor with roles (opening, bed, transitions, outro) and mix config.
- {kit}/clips/: WAV clip files referenced in kit.yaml.
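A pre-flight check on the per-turn fields might look like the sketch below. It works on an already-loaded list of turn dicts, because where that list sits inside manifest.json is not specified here. `missing_turn_keys` and the sample file names are hypothetical.

```python
REQUIRED_TURN_KEYS = ("file", "duration_ms", "followed_by_transition")


def missing_turn_keys(turns: list[dict]) -> list[tuple[int, str]]:
    """Return (turn_index, key) pairs for every required key that is absent."""
    problems = []
    for index, turn in enumerate(turns):
        for key in REQUIRED_TURN_KEYS:
            if key not in turn:
                problems.append((index, key))
    return problems


turns = [
    {"file": "turn_000.wav", "duration_ms": 4200, "followed_by_transition": None},
    {"file": "turn_001.wav", "duration_ms": 3100},  # followed_by_transition missing
]
print(missing_turn_keys(turns))  # [(1, 'followed_by_transition')]
```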

Assembly sequence

1. The opening clip plays at full length. (Replaced by overlap mode when opening_voice_overlap is set; see opening_voice_overlap below.)
2. Turn 1 is overlaid with the ducked bed clip; the bed fades out over opening_bed.fade_out_ms after the turn ends. (Skipped in overlap mode, where the fading opening tail serves as the musical bed.)
3. Subsequent turns are appended in order. Wherever a turn has followed_by_transition set, a mood-matched transition clip (from roles.transitions) is inserted with silence padding on both sides (pause_around_transition_ms). Unknown/null moods cycle round-robin through all available clips.
4. The outro is overlaid so that it overlaps the end of the speech by outro_overlap_ms; the overlapping section is ducked to outro_duck_gain_db, and the remainder of the outro is appended after it.
5. Global fade_ms.in / fade_ms.out are applied to the finished timeline.
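The transition step can be illustrated with a small picker. This is a sketch with assumptions: each clip is given as a dict with hypothetical `file` and `mood` keys, the first clip registered for a mood wins, and the round-robin cursor only advances on unknown or null moods. The real layout of roles.transitions in kit.yaml may differ.

```python
from itertools import cycle
from typing import Optional


class TransitionPicker:
    """Pick a transition clip by mood, cycling through all clips when no mood matches."""

    def __init__(self, clips: list[dict]):
        # First clip registered for a mood wins (an assumption of this sketch).
        self._by_mood: dict[str, str] = {}
        for clip in clips:
            self._by_mood.setdefault(clip["mood"], clip["file"])
        self._round_robin = cycle([clip["file"] for clip in clips])

    def pick(self, mood: Optional[str]) -> str:
        if mood in self._by_mood:
            return self._by_mood[mood]
        return next(self._round_robin)  # unknown or null mood


def transition_block_ms(clip_ms: int, pause_ms: int) -> int:
    """Length of the inserted block: pause + clip + pause."""
    return clip_ms + 2 * pause_ms


picker = TransitionPicker([
    {"file": "calm.wav", "mood": "calm"},
    {"file": "upbeat.wav", "mood": "upbeat"},
])
print(picker.pick("upbeat"))   # upbeat.wav
print(picker.pick(None))       # calm.wav
print(picker.pick("mystery"))  # upbeat.wav
```

With an empty clip list, `pick` on an unmatched mood would raise `StopIteration`; a fuller implementation would presumably skip the transition instead.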

mix config keys (kit.yaml)

| Key | Purpose |
|---|---|
| `opening_bed.gain_db` | Bed attenuation under turn 1 |
| `opening_bed.fade_out_ms` | Bed fade-out duration after turn 1 |
| `outro_overlap_ms` | How far the outro starts before the end of speech |
| `outro_duck_gain_db` | Outro gain during the overlap window |
| `pause_around_transition_ms` | Silence gap before and after each transition clip |
| `fade_ms.in` / `fade_ms.out` | Global episode fades |
| `opening_voice_overlap` | (optional) Voice-over-opening mode (see below) |
| `backchannel_overlap` | (optional) Overlay backchannel affirmations onto the previous turn's tail (see below) |
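To make the outro keys concrete, the sketch below works through the timing arithmetic for `outro_overlap_ms` and converts a dB gain such as `outro_duck_gain_db` to a linear amplitude factor. The function names are hypothetical, and clamping the overlap to the available audio is an assumption of this sketch, not documented behaviour.

```python
def db_to_amplitude(gain_db: float) -> float:
    """Convert a gain in dB to a linear amplitude factor."""
    return 10 ** (gain_db / 20)


def outro_layout(speech_ms: int, outro_ms: int, overlap_ms: int) -> dict:
    """Where the outro starts and how much of it is ducked vs. appended."""
    # Clamping is an assumption: never overlap more than either clip contains.
    overlap = max(0, min(overlap_ms, speech_ms, outro_ms))
    return {
        "outro_start_ms": speech_ms - overlap,
        "ducked_ms": overlap,
        "appended_ms": outro_ms - overlap,
        "total_ms": speech_ms + outro_ms - overlap,
    }


print(outro_layout(60_000, 20_000, 4_000))
# {'outro_start_ms': 56000, 'ducked_ms': 4000, 'appended_ms': 16000, 'total_ms': 76000}
```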

backchannel_overlap (optional)

When this key is present, any turn whose manifest entry has "backchannel": true is overlaid onto the tail of the immediately preceding turn instead of being appended. If the turn also has "backchannel_at": NN (0–100), the overlay starts at that percentage into the preceding turn (mid-turn placement) rather than using lead_ms.

mix:
  backchannel_overlap:
    lead_ms: 1200     # ms back from the current timeline end (note: after turn 1 this includes the bed-fade tail when a bed is active)
    gain_db: -3       # attenuation applied to the backchannel segment

Behaviour:

- lead_ms is clamped to len(previous_turn_audio), so the overlay never starts before that turn began.
- pos = max(0, len(timeline) - lead); the backchannel (bc) is overlaid at that position.
- If the bc spills past the current timeline end (pos + len(bc) > len(timeline)), the timeline is extended with silence so the tail is audible.
- The speaking voice is never ducked or faded; the bc rides underneath it, attenuated by gain_db.
- Guards: the overlay is skipped (bc appended normally) when any of the following hold:
  - backchannel_overlap key is absent or null in kit.yaml (backward-compatible default).
  - The turn's "backchannel" field is false or absent.
  - The preceding turn had a followed_by_transition marker (a transition was just inserted; overlaying onto a pause+clip+pause block would be wrong).
  - The backchannel turn's "speaker" equals the previous turn's "speaker" (a speaker never backchannels themselves).
- The first turn (index 0) never overlays.
- Outro anchoring uses the actual timeline end after all bc overlays; if the last turn is a backchannel that fits entirely inside the preceding turn, the outro may slightly overlap the previous speaker's tail. This is acceptable.
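The placement rules and guards above can be sketched with plain millisecond arithmetic, without touching audio. Both function names are hypothetical. Treating `prev_start_ms` as a tracked value and clamping `backchannel_at` to 0–100 are assumptions of this sketch.

```python
from typing import Optional


def should_overlay(index: int, turn: dict, prev_turn: Optional[dict],
                   overlap_cfg: Optional[dict]) -> bool:
    """Apply the guards: True only when the backchannel may be overlaid."""
    if overlap_cfg is None or index == 0 or prev_turn is None:
        return False
    if not turn.get("backchannel"):
        return False
    if prev_turn.get("followed_by_transition"):
        return False
    if turn.get("speaker") == prev_turn.get("speaker"):
        return False
    return True


def backchannel_position(timeline_ms: int, prev_start_ms: int, prev_len_ms: int,
                         bc_len_ms: int, lead_ms: int,
                         backchannel_at: Optional[float] = None) -> tuple[int, int]:
    """Return (overlay_position_ms, new_timeline_length_ms)."""
    if backchannel_at is not None:
        pct = min(100.0, max(0.0, backchannel_at))  # clamping is an assumption
        pos = prev_start_ms + int(prev_len_ms * pct / 100)
    else:
        lead = min(lead_ms, prev_len_ms)  # never start before the previous turn
        pos = max(0, timeline_ms - lead)
    # Extend the timeline (with silence) if the backchannel spills past its end.
    return pos, max(timeline_ms, pos + bc_len_ms)


# 10 s timeline whose last turn spans 6000..10000 ms; lead_ms = 1200.
print(backchannel_position(10_000, 6_000, 4_000, 800, 1_200))    # (8800, 10000)
print(backchannel_position(10_000, 6_000, 4_000, 1_500, 1_200))  # (8800, 10300)
print(backchannel_position(10_000, 6_000, 4_000, 800, 1_200, backchannel_at=50))  # (8000, 10000)
```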

opening_voice_overlap (optional)

When this key is present the host's first turn enters mid-opening rather than after it:

mix:
  opening_voice_overlap:
    entry_ms: 13100      # voice enters this many ms into the opening
    duck_gain_db: -2     # opening remainder attenuated by this amount (~80% ≈ -2 dB)

Both entry_ms and duck_gain_db are required when opening_voice_overlap is present. Omitting either key raises a KeyError at mix time (by design — partial config is rejected loudly).

Behaviour:

- opening[:entry_ms] plays alone at full volume ("head").
- opening[entry_ms:] ("tail") receives duck_gain_db attenuation and a slow fade-out across its entire length (fade_out(len(tail))).
- The tail is overlaid under turn 1, starting exactly at the voice-entry point. The block duration is max(len(turn_1), len(tail)), so whichever is longer determines the block length.
- The bed role is skipped in this mode: the opening tail already provides the musical bed, so the separate bed clip would be redundant.
- entry_ms is clamped to [0, len(opening)]; values outside that range do not raise.
- Absence of this key (or null) leaves behaviour identical to the default assembly sequence (backward compatible).
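The head/tail split can be worked through numerically. The sketch below is illustrative only: `opening_overlap_layout` is a hypothetical name, and it computes durations and the linear gain implied by `duck_gain_db` without processing audio.

```python
def opening_overlap_layout(opening_ms: int, turn1_ms: int, cfg: dict) -> dict:
    """Durations for the voice-over-opening block, in milliseconds."""
    # Direct indexing: a missing key raises KeyError, matching the rule above.
    entry = cfg["entry_ms"]
    duck_db = cfg["duck_gain_db"]
    entry = max(0, min(entry, opening_ms))  # clamp to [0, len(opening)]
    tail_ms = opening_ms - entry
    block_ms = max(turn1_ms, tail_ms)       # the longer of voice and tail
    return {
        "head_ms": entry,
        "tail_ms": tail_ms,
        "block_ms": block_ms,
        "total_ms": entry + block_ms,
        "tail_amplitude": 10 ** (duck_db / 20),  # linear gain for the ducked tail
    }


layout = opening_overlap_layout(30_000, 12_000, {"entry_ms": 13_100, "duck_gain_db": -2})
print(layout["head_ms"], layout["tail_ms"], layout["block_ms"], layout["total_ms"])
# 13100 16900 16900 30000
print(round(layout["tail_amplitude"], 3))  # 0.794
```

In this example the tail outlasts turn 1, so the block ends when the opening fades out; with a longer first turn the voice would set the block length instead.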

Legacy Mode

Speech-only / single-overlay fallback (no --kit flag, or kit not found). Uses a single concatenated speech.wav with an optional background music overlay.

python3 "${CLAUDE_SKILL_DIR}/scripts/mix_audio.py" --workspace <run-dir>

What it does

1. Loads speech from {workspace}/audio/speech/speech.wav.
2. Adds 3 seconds of silence padding.
3. Loads background music from {workspace}/audio/music/background.mp3 (if present).
4. Applies -15 dB attenuation to music; overlays 15 s intro and 15 s outro sections.
5. Adds global fade-in (500 ms) / fade-out (2 s).
6. Exports as MP3.

If no background music exists, produces speech-only output with fades. This is the fallback path when the orchestrator supplies no --kit.

Output

| File | Path | Format |
|---|---|---|
| MP3 (distribution) | `{workspace}/audio/final/episode.mp3` | MP3, 192 kbps |

Dependencies

- pydub
- pyyaml
- ffmpeg (system)
