From phantom
Pre-mix audio analysis and problem detection for audio engineering. Runs Phantom MCP diagnostic tools on stems, catalogs issues by severity (dealbreaker/significant/moderate/minor), identifies frequency masking between stems, and produces a structured mix brief. Use this skill whenever the user wants to analyze audio stems or files before mixing, diagnose audio problems (phase issues, clipping, noise, hum, mud, harshness), assess recording quality, prepare a mix session overview, check if a mix is ready for mastering, or investigate why something "sounds wrong." Also use when the user provides WAV file paths and asks for analysis, quality checks, or problem identification -- even if they don't explicitly mention "diagnostics."
npx claudepluginhub fadelabs/phantom --plugin phantom
Slash command
/phantom:audio-diagnostician
> **Workflow:** diagnostician -> session-architect -> mix-engineer -> effects-engineer -> mastering-engineer. Always analyze first.
Collect all stem file paths (WAV, mono or stereo). If given a directory, list WAV files.
If genre or reference track is unknown, ask now -- every downstream skill needs genre context.
Alternate takes: If filenames suggest multiple takes, ask which to use. Exclude unused takes -- they corrupt masking analysis.
multi_stem_masking accepts a maximum of 20 stems. If the session has more, group stems by instrument family and run multiple passes.
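The 20-stem limit can be handled by grouping stems before calling the tool. The sketch below is illustrative only: the `FAMILY_KEYWORDS` map, the `family_of` helper, and the filename conventions are assumptions, not part of Phantom. It only builds the batches and does not call `multi_stem_masking` itself.

```python
from collections import defaultdict
from pathlib import Path

MAX_STEMS = 20  # multi_stem_masking limit stated above

# Hypothetical keyword-to-family map; adjust to the session's naming scheme.
FAMILY_KEYWORDS = {
    "drums": ("kick", "snare", "tom", "hihat", "overhead"),
    "bass": ("bass",),
    "guitars": ("gtr", "guitar"),
    "keys": ("piano", "keys", "synth", "pad", "organ"),
    "vocals": ("vox", "vocal", "bgv"),
}


def family_of(path):
    """Guess an instrument family from the filename (assumed convention)."""
    name = Path(path).stem.lower()
    for family, keywords in FAMILY_KEYWORDS.items():
        if any(keyword in name for keyword in keywords):
            return family
    return "other"


def masking_batches(paths, limit=MAX_STEMS):
    """Return lists of stem paths, each small enough for one masking pass."""
    if len(paths) <= limit:
        return [list(paths)]
    groups = defaultdict(list)
    for path in paths:
        groups[family_of(path)].append(path)
    batches = []
    for family in sorted(groups):
        stems = groups[family]
        # Split a family that is itself larger than the limit.
        for start in range(0, len(stems), limit):
            batches.append(stems[start:start + limit])
    return batches
```

Per-family passes cannot report conflicts between families (kick against bass, for example), so an extra pass over the most important stem from each family may be worth running.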
Call batch_diagnostic with every stem path in one shot. This runs all six analysis types (spectrum, loudness, dynamics, stereo, phase, problems) on every stem simultaneously. One call, complete picture.
Work through in order -- each can stop the session:
Sample rate mismatch. Different rates = dealbreaker. Stop, flag which stems mismatch and what to convert to.
Bit depth and dynamic range. Note the bit depth of each stem. If the session combines different bit depths (16-bit, 24-bit, 32-bit float), flag it. The session-architect needs the highest bit depth to set the project. Bit depth determines theoretical dynamic range: 16-bit = 96 dB, 24-bit = 144 dB, 32-bit float = ~1500 dB. Downconverting 32-bit float to 16-bit without dither introduces quantization distortion, because the truncation error is correlated with the signal.
Silent or near-silent stems. Check the integrated LUFS of every stem. If a stem reads below -70 LUFS or the loudness analysis returns None (near-silent), flag it immediately -- it's likely an export mistake (empty track, muted channel accidentally bounced) or an unintended room tone track. Ask the user before including it in further analysis.
Duration alignment. If stems differ by more than a few seconds, note it -- may indicate different export sections or excessive head/tail silence.
Loudness level spread. Compare integrated LUFS across all stems. If the loudest and quietest differ by more than 20 LU (excluding intentionally quiet elements like room mics or pads), flag a gain staging problem. Stems tracked at wildly different levels suggest a multi-session recording or inconsistent preamp gain.
Pre-printed effects detection. If a stem shows very low crest factor (<6 dB) on a source that should be dynamic (acoustic guitar, vocals), or if the spectral balance shows steep filter slopes that look like intentional EQ rather than mic character, flag it as "possibly pre-printed effects." Pre-printed reverb, compression, or EQ limits mixing options. Ask the user if effects were intentionally printed.
Timing drift between stems. If stems were recorded in separate sessions, check for timing drift: compare transient alignment at the start and end of the files. Even 5-10ms of drift over a 4-minute song creates audible flamming and phase smear.
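Several of the session-level checks above reduce to simple comparisons. The sketch below is a rough illustration, not the `batch_diagnostic` output format: the per-stem dictionary keys (`sample_rate`, `integrated_lufs`, `duration_s`), the severity labels other than "dealbreaker", and the 3-second duration tolerance are all assumptions. It also does not exclude intentionally quiet elements from the loudness spread, which the check above asks for.

```python
def session_checks(stems, silent_lufs=-70.0, max_spread_lu=20.0,
                   max_duration_gap_s=3.0):
    """stems: dict of name -> {"sample_rate", "integrated_lufs", "duration_s"}.

    Returns a list of (severity, message) tuples in check order.
    """
    flags = []
    if not stems:
        return flags

    # Sample rate mismatch stops the session.
    rates = sorted({s["sample_rate"] for s in stems.values()})
    if len(rates) > 1:
        flags.append(("dealbreaker", f"sample rate mismatch: {rates}"))

    # Silent or near-silent stems: below the threshold, or no reading at all.
    silent = sorted(
        name for name, s in stems.items()
        if s["integrated_lufs"] is None or s["integrated_lufs"] < silent_lufs
    )
    for name in silent:
        flags.append(("ask-user", f"{name}: silent or near-silent"))

    # Loudness spread across the remaining stems.
    levels = [s["integrated_lufs"] for name, s in stems.items()
              if name not in silent]
    if levels:
        spread = max(levels) - min(levels)
        if spread > max_spread_lu:
            flags.append(("flag", f"loudness spread {spread:.1f} LU"))

    # Duration alignment ("more than a few seconds"; 3 s is an assumed cutoff).
    durations = [s["duration_s"] for s in stems.values()]
    gap = max(durations) - min(durations)
    if gap > max_duration_gap_s:
        flags.append(("note", f"durations differ by {gap:.1f} s"))
    return flags
```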
Phase problems require specific tools (polarity flip, time alignment, phase rotation plugins like Waves InPhase or Sound Radix Auto-Align). Standard EQ/compression worsens them. Find them now.
Run analyze_phase on every stereo stem. Flag any polarity_inverted: true immediately.
For multi-mic recordings (drums, guitar cabs), run compare_phase between close and room/overhead mics:
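Sound takes roughly 1 ms to travel one foot (closer to 0.9 ms at 343 m/s). A room mic 10 feet farther from the source than the close mic arrives about 9-10 ms late, and summing the two signals causes comb filtering.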
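The delay and comb-filter relationship above can be checked numerically. This is a minimal sketch assuming a speed of sound of about 1125 ft/s (343 m/s) and two equal-polarity signals summed at similar level; the function names are illustrative and not Phantom tools.

```python
SPEED_OF_SOUND_FT_PER_S = 1125.0  # approximate, room temperature


def mic_delay_ms(extra_distance_ft):
    """Arrival delay of the far mic relative to the close mic, in ms."""
    return extra_distance_ft / SPEED_OF_SOUND_FT_PER_S * 1000.0


def comb_notches_hz(delay_ms, count=4):
    """First `count` cancellation frequencies for a signal summed with a
    delayed copy of itself: f_k = (2k + 1) / (2 * delay)."""
    tau = delay_ms / 1000.0
    return [(2 * k + 1) / (2.0 * tau) for k in range(count)]


delay = mic_delay_ms(10)          # a little under 9 ms for 10 feet
notches = comb_notches_hz(10.0)   # roughly 50, 150, 250, 350 Hz at 10 ms
```

With a delay near 10 ms the first notches fall in the low end, which is why uncorrected room mics tend to hollow out kick and bass.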
Interpreting phase results (instrument-aware):
compare_phase between the problem stem and everything it's layered with.
Mid-side encoded files. Very low/negative correlation across all bands with one channel sounding "hollow" = possibly M/S encoded. Confirm with analyze_stereo (distinctive width/balance patterns). Ask user before proceeding.
Review detect_problems results (11 detectors: clipping, DC offset, inter-sample peaks (ISPs), noise floor, SNR, hum, sibilance, mud, harshness, resonant peaks, lossy artifacts). Four severity tiers -- dealbreakers first.
Dealbreaker -- fix before mixing, no exceptions:
Significant -- address early, before you start building the mix:
floor + 10*log10(N) dB. Examples (N stems @ per-stem floor in dBFS): 4 @ -55 = -49 dBFS; 6 @ -58 = -50.2 dBFS; 8 @ -58 = -49 dBFS; 16 @ -60 = -48 dBFS. Calculate and report the actual summed floor.
Moderate -- address during mixing:
Minor -- optional: slight spectral imbalances, minor noise bursts, low-level clicks.
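The summed noise floor formula from the Significant tier above, floor + 10*log10(N), can be sketched as follows. The formula assumes N stems with equal, uncorrelated noise floors. The second function is a generalization that is not in the original text: it power-sums unequal floors under the same uncorrelated-noise assumption.

```python
import math


def summed_noise_floor_dbfs(per_stem_floor_dbfs, n_stems):
    """Summed floor for n_stems stems that share the same noise floor."""
    return per_stem_floor_dbfs + 10.0 * math.log10(n_stems)


def summed_noise_floor_mixed(floors_dbfs):
    """Summed floor for stems with different floors (power summation)."""
    total_power = sum(10.0 ** (floor / 10.0) for floor in floors_dbfs)
    return 10.0 * math.log10(total_power)


# The four worked examples from the text, rounded to one decimal place.
examples = [(-55, 4), (-58, 6), (-58, 8), (-60, 16)]
summed = [round(summed_noise_floor_dbfs(f, n), 1) for f, n in examples]
```

When per-stem floors differ, the loudest floor dominates the sum, so the mixed version is the safer one to report in the brief.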
Run multi_stem_masking with all stems. Key conflict pairs:
High masking at 200-500 Hz = mud. EQ prescription: Cut 2-4 dB at Q 1.5-3 on the less important stem in each pair. Boost the other 1-2 dB only if needed.
Masking vs bleed (live recordings). Bleed causes overreported masking. Ask about recording setup, discount severity in bleed frequencies, focus on each instrument's direct signal energy.
When masking is actually an arrangement problem. If 4+ stems show high masking in the same band, EQ carving won't fix it. Flag as an arrangement issue -- recommend thinning simultaneous elements.
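The "4+ stems in the same band" rule can be applied mechanically once masking results are in hand. The sketch below assumes conflicts arrive as `(stem_a, stem_b, band, severity)` tuples; that shape is hypothetical and is not the documented `multi_stem_masking` output.

```python
from collections import defaultdict


def arrangement_issues(conflicts, min_stems=4):
    """Return bands where at least `min_stems` distinct stems show high masking.

    conflicts: iterable of (stem_a, stem_b, band, severity) tuples
    (an assumed shape, for illustration only).
    """
    stems_by_band = defaultdict(set)
    for stem_a, stem_b, band, severity in conflicts:
        if severity == "high":
            stems_by_band[band].update((stem_a, stem_b))
    return {
        band: sorted(stems)
        for band, stems in stems_by_band.items()
        if len(stems) >= min_stems
    }
```

Bands returned here are candidates for an arrangement note in the brief rather than an EQ prescription.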
If most stems peak above -6 dBTP, flag that gain staging is critical. Note aggregate headroom in the mix brief.
If genre is known: list_profiles -> load_profile -> compare_to_profile. Available: ambient, edm, electronic, hip-hop, lo-fi, metal, pop, rock, rock-metal. If user has a reference WAV: compare_to_reference (spectrum, loudness, dynamics, stereo width deviations).
Genre context matters. Lo-fi at -55 dBFS noise with rolled-off highs = aesthetic, not a problem. Always interpret through genre intent.
Fill in mix-brief-template.md. The brief must include:
Processing order checklist (include in brief with stem-specific details):
sox in.wav -r 48000 out.wav)
compare_phase)
These thresholds drive your triage decisions. For the complete measurement-to-action translation tables, see measurement-actions.md.
| Measurement | Condition | What it means |
|---|---|---|
| Crest factor | > 15 dB | Highly dynamic, likely uncompressed (note: source-dependent -- drums and transient material naturally sit higher; a legato string section may be 6-8 dB without any compression) |
| Crest factor | 8-12 dB | Well-recorded, normal range for mixing |
| Crest factor | < 6 dB | Over-compressed -- do not add more compression |
| True peak | > 0 dBTP | Hard clipping -- dealbreaker |
| True peak | > -1 dBTP | Exceeds EBU R128 / streaming limits -- significant |
| True peak | > -3 dBTP | Tight headroom for mastering |
| Phase correlation | > +0.8 | Excellent mono compatibility |
| Phase correlation | +0.5 to +0.8 | Good, normal stereo content |
| Phase correlation | +0.3 to +0.5 | Instrument-dependent -- problem for bass/kick/vocal; normal for overheads/rooms/stereo keys |
| Phase correlation | < +0.3 | Problem for any source |
| Phase correlation | sustained negative | Possible polarity inversion (near -1 = definite) |
| SNR | > 70 dB | Professional recording quality |
| SNR | 60-70 dB | Good -- acceptable, gate during silence if needed |
| SNR | 50-60 dB | Acceptable, may need noise treatment |
| SNR | 40-50 dB | Poor -- dedicated noise reduction required |
| SNR | < 40 dB | Unacceptable -- re-record if possible |
| Noise floor | below -70 dBFS | Clean, no treatment needed |
| Noise floor | -60 to -50 dBFS | Audible in quiet passages -- gate or treat quiet sections |
| Noise floor | above -50 dBFS | Significant noise, address before mixing |
| Masking severity | "high" at 200-500 Hz | Mud -- complementary EQ needed between conflicting stems |
| Spectral centroid | below 1 kHz (non-bass) | Dark/muddy recording, may need high-end lift |
| Spectral centroid | above 5 kHz | Bright/thin recording, check for missing body |
| LRA | > 15 LU | Very dynamic -- classical, ambient, some jazz |
| LRA | 7-12 LU | Normal range for pop/rock |
| LRA | < 5 LU | Heavily compressed -- expected for EDM/hip-hop |
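Two rows of the table above, SNR and phase correlation, can be expressed as lookups. This is a sketch under assumptions: the function names and return labels are invented for illustration, values that fall exactly on a boundary are assigned to the better tier (the table does not say which), and a single averaged correlation value stands in for "sustained" behavior.

```python
# Sources the table treats as sensitive in the +0.3 to +0.5 range.
PHASE_SENSITIVE = {"bass", "kick", "vocal"}


def classify_snr(snr_db):
    """Map an SNR reading in dB to the tiers in the threshold table."""
    if snr_db > 70:
        return "professional"
    if snr_db >= 60:
        return "good"
    if snr_db >= 50:
        return "acceptable"
    if snr_db >= 40:
        return "poor"
    return "unacceptable"


def classify_phase(correlation, source):
    """Instrument-aware reading of an averaged phase correlation value."""
    if correlation > 0.8:
        return "excellent"
    if correlation >= 0.5:
        return "good"
    if correlation >= 0.3:
        # Instrument-dependent band: fine for overheads, rooms, stereo keys.
        return "problem" if source in PHASE_SENSITIVE else "normal"
    if correlation < 0:
        return "problem, check polarity"
    return "problem"
```

For example, `classify_phase(0.4, "bass")` reports a problem while `classify_phase(0.4, "overheads")` does not, matching the instrument-dependent row.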
Spectral centroid reference ranges by instrument:
| Instrument | Expected range | Edge cases |
|---|---|---|
| Vocals | 1-3 kHz | Falsetto/whisper can push to 4+ kHz |
| Bass guitar (fingerstyle) | 200-600 Hz | Normal -- low centroid expected |
| Bass guitar (pick/aggressive) | 400 Hz-1.2 kHz | Higher centroid is normal, not a mislabel |
| Distorted/overdriven bass | 800 Hz-2 kHz | Harmonics shift centroid up -- expected with distortion |
| Acoustic guitar | 1-3 kHz | Nylon-string sits lower (800 Hz-2 kHz) |
| Electric guitar (clean) | 1-3 kHz | |
| Electric guitar (distorted) | 2-5 kHz | Distortion adds upper harmonics |
| Drums (overhead) | 3-6 kHz | |
| Drums (kick) | 60-150 Hz | A "kick" at 2 kHz = mislabeled stem |
| Synth pad | 500 Hz-3 kHz | Extremely variable by design |
| Full mix | 2-4 kHz | Genre-dependent: metal 3-5 kHz, lo-fi 1-2 kHz |
If a stem's centroid falls far outside its expected range, investigate: possible naming mismatch, heavy filtering, or unusual recording technique.
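A centroid check against the table above might look like the following. Only a subset of the table is encoded, the dictionary keys are illustrative labels, and "far outside" is defined here as more than a factor of two (one octave) beyond the expected range, which is an assumption rather than a stated threshold.

```python
# Expected spectral centroid ranges in Hz, taken from the table above.
EXPECTED_CENTROID_HZ = {
    "vocals": (1000, 3000),
    "bass_fingerstyle": (200, 600),
    "bass_pick": (400, 1200),
    "kick": (60, 150),
    "overheads": (3000, 6000),
    "full_mix": (2000, 4000),
}


def centroid_check(label, centroid_hz, tolerance=2.0):
    """Compare a measured centroid with the expected range for `label`.

    `tolerance` is the assumed factor beyond the range that still counts
    as an edge case rather than a likely mislabel.
    """
    low, high = EXPECTED_CENTROID_HZ[label]
    if low <= centroid_hz <= high:
        return "in range"
    if low / tolerance <= centroid_hz <= high * tolerance:
        return "outside range, note it"
    return "far outside range, ask user to confirm the label"
```

A "kick" measured at 2 kHz lands in the last branch, while a falsetto vocal at 4 kHz is only noted.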
Mono in stereo containers (correlation=1.0, width=0) is normal. Never conclude stems are AI-separated or pre-processed unless the user says so. Note unusual measurements neutrally; don't speculate on provenance.
Treat stems as AI-separated only when the user confirms it. In that case, expect: bleed between stems (higher masking than real recordings), phase anomalies from the separation algorithm, and possible false stereo. Check phase coherence between all stems.
To check whether a mix is ready for mastering, run full_diagnostic on the mix file, then compare_to_profile for the genre. Check these criteria:
| # | Check | Pass | Borderline | Fail |
|---|---|---|---|---|
| 1 | Corrective EQ needed | < 2 dB anywhere | 2-3 dB in 1-2 bands | > 3 dB or multiple bands |
| 2 | Fundamental balance | Vocals, bass, drums sit naturally | One element slightly off | Vocals buried, bass overwhelming, or drums dominating |
| 3 | Phase (mix bus correlation) | > +0.5 | +0.3 to +0.5 | < +0.3 |
| 4 | True peak | < -1 dBTP | -1 to 0 dBTP (tight but workable) | > 0 dBTP (baked clipping) |
| 5 | Noise floor | < -60 dBFS | -60 to -50 dBFS | > -50 dBFS |
| 6 | Dynamic range (LRA) | Within genre norms +/- 3 LU | 3-5 LU outside norms | > 5 LU outside genre norms |
Borderline handling: If all checks pass but 1-2 are borderline, the mix can go to mastering with a note. If 3+ are borderline, recommend another mix pass. Any single fail = send back with specific fix instructions.
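The borderline-handling rule above can be written as a small decision function. The input shape (a dictionary of check name to "pass", "borderline", or "fail") and the verdict strings are assumptions made for this sketch.

```python
VALID_RESULTS = {"pass", "borderline", "fail"}


def mastering_verdict(results):
    """Apply the borderline rule to six-check results.

    results: dict of check name -> "pass" | "borderline" | "fail".
    Returns (verdict, checks_to_mention).
    """
    unknown = set(results.values()) - VALID_RESULTS
    if unknown:
        raise ValueError(f"unexpected result values: {sorted(unknown)}")

    # Any single fail sends the mix back.
    fails = [name for name, result in results.items() if result == "fail"]
    if fails:
        return "send back with fix instructions", fails

    borderline = [name for name, result in results.items()
                  if result == "borderline"]
    if len(borderline) >= 3:
        return "recommend another mix pass", borderline
    if borderline:
        return "ready for mastering with a note", borderline
    return "ready for mastering", []
```

Returning the affected check names alongside the verdict keeps the note or fix instructions specific.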
For a single stem with a specific complaint ("this vocal sounds weird"), run full_diagnostic on that one file. Don't overcomplicate it. But always check phase if the complaint is about how the stem sounds in context with other stems -- a stem that sounds fine soloed but wrong in the mix is the signature of phase cancellation.
If you notice a stem's spectral characteristics don't match its filename -- a "kick.wav" with a spectral centroid at 2 kHz, or a "bass.wav" with most energy above 1 kHz -- flag it. A mislabeled stem will silently corrupt every downstream decision: the masking map, bus routing, and processing choices. When in doubt, note the anomaly and ask the user to confirm.
If a stem shows perfectly regular transient peaks at a consistent frequency (often around 1 kHz or 2-3 kHz for woodblock samples), it may contain click track bleed or an accidentally exported metronome. Flag it for the user -- a click track in the masking analysis would produce misleading results.