Transcribes audio sessions from aside recorder or files, aligns transcripts with real-time memos, and distills into structured Obsidian vault notes using Enzyme. For processing calls, lectures, or interviews.
Install:

```shell
npx claudepluginhub jshph/aside --plugin aside
```

This skill is limited to using the following tools:
Take an aside session end-to-end: transcribe audio, align the transcript with the user's real-time memo, distill into a structured vault note connected to existing thinking via Enzyme.
Environment: $OBSIDIAN_VAULT refers to the Obsidian vault root (the additional working directory configured for this project).
Requires:

- `fluidaudiocli` binary in `$PATH` — built from FluidInference/FluidAudio (Parakeet TDT v3 via CoreML)
- `pip install diarize` — for mono/pre-recorded audio diarization (WeSpeaker + Silero VAD, no HF gating)

Produces the aligned timeline (`.aside/<session>_aligned.md`) — interleaved transcript + memo on a shared timeline. If `--align-only` is passed, only the aligned timeline (step 1) is produced.
`$ARGUMENTS` format: `<session-name> [--align-only] [--audio <file>] [--num-speakers N]`

`<session-name>` — the session name (e.g. `my-call`). Used to find/create:
- `<session-name>.md` (in the aside working directory)
- `.aside/<session-name>_seg*.wav` (aside recorder) or the `--audio` file
- `.aside/<session-name>.meta.json` (for segment offsets and durations)

Before transcribing, the skill should infer the transcription mode and confirm with the user:
- Read the memo — look for cues about the recording type.
- Recorder segments present (`.aside/<name>_seg*.wav`) → stereo (mode is automatic).
- Check the audio — stereo vs mono.
- Ask the user (for mono audio only):
How should I transcribe this?
- Single speaker — lecture, voice memo, sermon (fastest)
- Two speakers — conversation, interview, phone call
- Multiple speakers — meeting, group discussion (specify count)
Skip asking if `--num-speakers` was provided or if the cues are unambiguous.
```
$ARGUMENTS = "standup"
-> session: "standup"
-> memo: "standup.md"
-> audio: ".aside/standup_seg*.wav" (stereo, auto)
-> output: ".aside/standup_aligned.md"

$ARGUMENTS = "standup --align-only"
-> session: "standup"
-> output: ".aside/standup_aligned.md"
-> STOP after Phase I

$ARGUMENTS = "coffee-chat --audio ~/Downloads/recording.m4a"
-> session: "coffee-chat"
-> audio: ~/Downloads/recording.m4a
-> infer mode from memo + ask user

$ARGUMENTS = "coffee-chat --audio ~/Downloads/recording.m4a --num-speakers 3"
-> session: "coffee-chat"
-> audio: ~/Downloads/recording.m4a, 3 speakers (no need to ask)
```
`<session-name>.md` — contains lines like:
[00:05] discussing API redesign
[01:30 ~02:15] revisited auth approach — decided on JWT
[05:00] action item: draft RFC by Friday
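The timestamp prefix on memo lines can be parsed mechanically. A minimal sketch, assuming the `[MM:SS]` and `[MM:SS ~MM:SS]` forms shown above are the whole grammar (the skill's actual parser may differ):

```python
import re

# Matches "[MM:SS] text" or "[MM:SS ~MM:SS] text" (assumed memo grammar).
MEMO_LINE = re.compile(r"^\[(\d+):(\d{2})(?:\s*~(\d+):(\d{2}))?\]\s*(.*)$")

def parse_memo_line(line: str):
    """Return (start_secs, end_secs_or_None, text), or None for non-memo lines."""
    m = MEMO_LINE.match(line.strip())
    if not m:
        return None
    start = int(m.group(1)) * 60 + int(m.group(2))
    end = int(m.group(3)) * 60 + int(m.group(4)) if m.group(3) else None
    return start, end, m.group(5)
```

The range form (`~`) marks a topic the user revisited, which matters later for weighting.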
- `.aside/<session-name>.meta.json` — segment info (`segment_index`, `wav_path`, `offset_ms`, `duration_secs`) and `vault_note_path` (if the session was published to the vault on quit)
- `.aside/<session-name>_seg*.wav`

If the memo file doesn't exist, ask the user for the correct session name.
Run transcription using the unified CLI:
```shell
python3 ${CLAUDE_PLUGIN_ROOT}/skills/aside/scripts/aside.py transcribe "<audio-file>" --output "<output-path>" [--num-speakers N]
```
Modes (auto-detected from input format + --num-speakers):
| Input | --num-speakers | Behavior |
|---|---|---|
| Stereo WAV | (ignored) | Splits channels. Ch0 = mic, Ch1 = system audio. |
| Mono/any format | 1 (default) | Plain transcription, single channel. |
| Mono/any format | 2+ | Diarize (WeSpeaker + spectral clustering), then transcribe. |
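The dispatch in this table can be approximated with the stdlib `wave` module. A sketch of the detection logic only, not the CLI's actual internals:

```python
import wave

def wav_channels(path: str) -> int:
    """Channel count of a PCM WAV file (stereo aside segments report 2)."""
    with wave.open(path, "rb") as w:
        return w.getnchannels()

def pick_mode(path: str, num_speakers: int = 1) -> str:
    # Mirrors the table: stereo WAV -> channel split; mono with N >= 2 ->
    # diarize-then-transcribe; otherwise plain single-channel transcription.
    if path.lower().endswith(".wav") and wav_channels(path) == 2:
        return "stereo-split"
    return "diarize" if num_speakers >= 2 else "plain"
```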
For aside recorder sessions (stereo segments):
```shell
python3 ${CLAUDE_PLUGIN_ROOT}/skills/aside/scripts/aside.py transcribe ".aside/<session-name>_seg<N>.wav" --output "/tmp/<session-name>_seg<N>_transcript.json"
```
When there are multiple segments, adjust transcript timestamps by each segment's offset_ms from meta so they align to the session's global timeline.
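The offset adjustment amounts to a shift-and-merge. A sketch, assuming transcript entries carry `start`/`end` in seconds and the meta file lists segments in order — both are assumptions about the JSON layout, not a documented schema:

```python
import json

def merge_segments(transcript_paths, meta_path):
    """Shift each segment transcript by its offset_ms from the session meta,
    then merge everything onto the session's global timeline."""
    with open(meta_path) as f:
        segments = json.load(f)["segments"]
    merged = []
    for seg, path in zip(segments, transcript_paths):
        offset = seg["offset_ms"] / 1000.0  # ms -> seconds
        with open(path) as f:
            for entry in json.load(f):
                entry["start"] += offset
                entry["end"] += offset
                merged.append(entry)
    return sorted(merged, key=lambda e: e["start"])
```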
For mono recordings (voice memos, external audio):
```shell
python3 ${CLAUDE_PLUGIN_ROOT}/skills/aside/scripts/aside.py transcribe "<audio-file>" --output ".aside/<session-name>_transcript.json" --num-speakers 2
```
Accepts any ffmpeg-readable format (m4a, wav, mp3, etc.). With `--num-speakers` > 1, speakers are labeled `SPEAKER_00`, `SPEAKER_01`, etc.

Speaker identification: when diarization is used, the skill should ask the user to identify which speaker is which during distillation (Step 4). Present a short sample from each speaker and ask the user to label them.
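Once the user supplies names, applying them to the transcript is a simple relabel. A sketch assuming entries carry a `speaker` field (an assumption about the transcript JSON, not a documented shape):

```python
def relabel_speakers(entries, names):
    """Swap diarization labels (SPEAKER_00, ...) for user-supplied names,
    leaving unrecognized labels untouched."""
    return [dict(e, speaker=names.get(e.get("speaker"), e.get("speaker")))
            for e in entries]
```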
Run:

```shell
python3 ${CLAUDE_PLUGIN_ROOT}/skills/aside/scripts/aside.py align \
  --memo "<session-name>.md" \
  --transcripts /tmp/<session-name>_seg0_transcript.json [seg1...] \
  --meta ".aside/<session-name>.meta.json" \
  --output ".aside/<session-name>_aligned.md"
```
This produces the aligned timeline. In the output, ch0 = mic (the local user), ch1 = system audio (the remote participant). Report to the user:
Aligned memo lines with transcript entries. Output: `.aside/<session-name>_aligned.md`
If `--align-only` was passed, stop here.
The memo is the user's real-time attention signal — what they found important enough to write down during the conversation, not in hindsight. Use it to drive analysis priority.
Extract topics from memo lines first. Each memo line marks a moment the user chose to note. Classify each line (decision, action item, insight, tension, question, or observation).
Edited memos signal reconsideration. A memo line followed by *[edited at MM:SS]* means the user came back to revise it at that later timestamp. This indicates the topic was important enough to revisit. Weight these higher.
Extract topics from un-noted transcript. Scan the transcript for significant topics that the user did not memo. These are secondary — the user may have chosen not to note them for a reason, or they may have been too absorbed to write.
Build a prioritized topic list. Memo-marked topics first (ordered by classification weight: decisions > action items > insights > tensions > questions > observations), then significant un-noted topics.
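The ordering rule above can be expressed as a sort key. The numeric weights here are illustrative; only their relative order comes from the text, and the input shape is a modeling assumption:

```python
# Relative order from the text: decisions > action items > insights >
# tensions > questions > observations. The numbers themselves are arbitrary.
WEIGHT = {"decision": 6, "action_item": 5, "insight": 4,
          "tension": 3, "question": 2, "observation": 1}

def prioritize(topics):
    """topics: dicts with 'kind', 'memo_marked' (bool), and 'edited' (bool,
    the *[edited at MM:SS]* signal). Memo-marked topics come first, edited
    ones are bumped higher, then classification weight decides the rest."""
    return sorted(
        topics,
        key=lambda t: (t.get("memo_marked", False),
                       t.get("edited", False),
                       WEIGHT.get(t.get("kind"), 0)),
        reverse=True,
    )
```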
Connect the conversation to existing vault thinking.
Run `mcp__enzyme__start_exploring_vault` first. This returns the slate — trending entities with catalysts that represent where the vault has already found language for things. Use the slate to calibrate search queries: if the transcript discusses "knowledge management tools" but the vault uses "pkm" or "tool-thinking", reach for the vault's language.
Use two search strategies:
Structured search (Grep) — for concrete anchors that exist verbatim in the vault:

- wikilinks: `[[Person Name]]`
- tags: `#pkm`, `#ai-ux`

Run Grep for each concrete anchor. Prioritize anchors that appear near memo-marked topics.
Semantic search — for themes and concepts without a concrete anchor:
Good queries (drawn from specific themes, calibrated to vault language):
Bad queries (generic):
For each query, run `mcp__enzyme__semantic_search` with `result_limit: 5`.
After both structured and semantic results come back, note the concrete anchors (`[[Person Name]]`) that appear in the results.

Select and load a template from `$CLAUDE_PLUGIN_ROOT/skills/aside/templates/`:
- `1on1-idea-exchange.md` — default for most 1:1 conversations (idea exchange, catch-ups, brainstorms)
- `discovery-call.md` — client/prospect conversations focused on needs, fit, and next steps
- `group-conversation.md` — 3+ participants where tracking who thinks what matters
- `talk-reflection.md` — sermons, lectures, talks, or any session where the user is listening and reflecting, not conversing. The memo captures their thinking in response to a speaker.

Choose based on the memo and transcript content. If unclear, default to `1on1-idea-exchange`.
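The choice reads as a small decision tree. A heuristic sketch; the skill itself judges from memo and transcript content, modeled here as boolean inputs for illustration:

```python
def pick_template(participants: int, listening_only: bool, discovery: bool) -> str:
    """Mirror the template guidance above; inputs are judgments the skill
    would make from the memo and transcript, not real flags."""
    if listening_only:        # sermon, lecture, talk
        return "talk-reflection.md"
    if discovery:             # client/prospect needs-and-fit call
        return "discovery-call.md"
    if participants >= 3:     # tracking who thinks what
        return "group-conversation.md"
    return "1on1-idea-exchange.md"  # default
```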
Generate draft following these principles:
Topic ordering: Use the memo-weighted priority from Step 4. Decisions and action items surface first, then insights and tensions, then un-noted topics. This reflects what the user actually cared about during the conversation.
Content principles:
- `[[wikilink]]` citations where connections exist
- `people:` field using `[[Name]]` format

Writing style:
Citation integration:
- inline: `as explored in [[note title]]` or `connects to [[note title]]`
- embeds: `![[file#^block-id]]` only when the source has explicit block IDs and the quote is concise and directly relevant

Present the complete draft to the user and ask for feedback.
Apply revisions if requested. Iterate until the user is satisfied.
Once approved:
- If `vault_note_path` exists in the session metadata (from Step 1), the note was already created on session quit. Read the existing vault note, replace everything after the frontmatter with the approved distilled content, and update the frontmatter tags/people fields from the Enzyme results.
- If `vault_note_path` is absent, fall back to creating the note via `./scripts/new-note.sh` (run from `$OBSIDIAN_VAULT/`), then populate it with the approved content using the Edit tool.

Rename the note descriptively (`[timestamp] chat with [person] about [topic].md`):

```shell
mv "$OBSIDIAN_VAULT/inbox/[old-filename].md" "$OBSIDIAN_VAULT/inbox/[old-filename-prefix] [descriptive name].md"
```
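When the fallback path creates a new note, its location gets recorded in the session meta. A minimal sketch of that round-trip, assuming the meta file is plain JSON:

```python
import json

def record_vault_note_path(meta_path: str, note_path: str) -> None:
    """Read the session meta JSON, set vault_note_path, write it back."""
    with open(meta_path) as f:
        meta = json.load(f)
    meta["vault_note_path"] = note_path
    with open(meta_path, "w") as f:
        json.dump(meta, f, indent=2)
```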
Record `vault_note_path` in the session metadata: read `.aside/<session-name>.meta.json`, set `vault_note_path` to the new path, and write it back.

Many transcripts come from speech-to-text and contain fragmented, garbled text. When you encounter this:
- Mark garbled passages `[unclear]` rather than guessing.

Examples:

```
/aside standup
# Full pipeline: transcribe stereo audio, align, distill into vault note

/aside standup --align-only
# Only produce the aligned timeline, skip distillation

/aside coffee-chat --audio ~/Downloads/recording.m4a --num-speakers 2
# Diarize mono audio with 2 speakers, then distill

/aside group-call --audio ~/Downloads/meeting.wav --num-speakers 4
# Diarize with 4 expected speakers
```