Transcribes audio sessions from aside recorder or files, aligns transcripts with real-time memos, and distills into structured Obsidian vault notes using Enzyme. For processing calls, lectures, or interviews.
Install:

```shell
npx claudepluginhub jshph/aside --plugin aside
```

This skill is limited to using the following tools:
Take an aside session end-to-end: transcribe audio, align the transcript with the user's real-time memo, distill into a structured vault note connected to existing thinking via Enzyme.
Environment: $OBSIDIAN_VAULT refers to the Obsidian vault root (the additional working directory configured for this project).
Requires:

- `fluidaudiocli` binary in `$PATH` — built from FluidInference/FluidAudio (Parakeet TDT v3 via CoreML)
- `pip install diarize` — for mono/pre-recorded audio diarization (WeSpeaker + Silero VAD, no HF gating)

Produces the aligned timeline (`.aside/<session>_aligned.md`) — interleaved transcript + memo on a shared timeline. If `--align-only` is passed, only the aligned timeline (step 1) is produced.
`$ARGUMENTS` format: `<session-name> [--align-only] [--audio <file>] [--num-speakers N]`

`<session-name>` — the session name (e.g. `my-call`). Used to find/create:
- `<session-name>.md` (in the aside working directory)
- `.aside/<session-name>_seg*.wav` (aside recorder) or the `--audio` file
- `.aside/<session-name>.meta.json` (for segment offsets and durations)

Before transcribing, the skill should infer the transcription mode and confirm with the user:
- Read the memo — look for cues about the recording type.
- Recorder segments present (`.aside/<name>_seg*.wav`) → stereo (mode is automatic).
- Check the audio — stereo vs mono.
- Ask the user (for mono audio only):
How should I transcribe this?
- Single speaker — lecture, voice memo, sermon (fastest)
- Two speakers — conversation, interview, phone call
- Multiple speakers — meeting, group discussion (specify count)
Skip asking if `--num-speakers` was provided or if the cues are unambiguous.
```
$ARGUMENTS = "standup"
-> session: "standup"
-> memo: "standup.md"
-> audio: ".aside/standup_seg*.wav" (stereo, auto)
-> output: ".aside/standup_aligned.md"

$ARGUMENTS = "standup --align-only"
-> session: "standup"
-> output: ".aside/standup_aligned.md"
-> STOP after Phase I

$ARGUMENTS = "coffee-chat --audio ~/Downloads/recording.m4a"
-> session: "coffee-chat"
-> audio: ~/Downloads/recording.m4a
-> infer mode from memo + ask user

$ARGUMENTS = "coffee-chat --audio ~/Downloads/recording.m4a --num-speakers 3"
-> session: "coffee-chat"
-> audio: ~/Downloads/recording.m4a, 3 speakers (no need to ask)
```
`<session-name>.md` — contains lines like:
[00:05] discussing API redesign
[01:30 ~02:15] revisited auth approach — decided on JWT
[05:00] action item: draft RFC by Friday
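The timestamp prefix on memo lines can be parsed mechanically. A minimal sketch, assuming the `[MM:SS]` and `[MM:SS ~MM:SS]` forms shown above are the whole grammar (the skill's actual parser may differ):

```python
import re

# Matches "[MM:SS] text" or "[MM:SS ~MM:SS] text" (assumed memo grammar).
MEMO_LINE = re.compile(r"^\[(\d+):(\d{2})(?:\s*~(\d+):(\d{2}))?\]\s*(.*)$")

def parse_memo_line(line: str):
    """Return (start_secs, end_secs_or_None, text), or None for non-memo lines."""
    m = MEMO_LINE.match(line.strip())
    if not m:
        return None
    start = int(m.group(1)) * 60 + int(m.group(2))
    end = int(m.group(3)) * 60 + int(m.group(4)) if m.group(3) else None
    return start, end, m.group(5)
```

The range form (`~`) marks a topic the user revisited, which matters later for weighting.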
- `.aside/<session-name>.meta.json` — segment info (`segment_index`, `wav_path`, `offset_ms`, `duration_secs`) and `vault_note_path` (if the session was published to the vault on quit)
- `.aside/<session-name>_seg*.wav`

If the memo file doesn't exist, ask the user for the correct session name.
Run transcription using the unified CLI:
```shell
python3 ${CLAUDE_PLUGIN_ROOT}/skills/aside/scripts/aside.py transcribe "<audio-file>" --output "<output-path>" [--num-speakers N]
```
Modes (auto-detected from input format + --num-speakers):
| Input | --num-speakers | Behavior |
|---|---|---|
| Stereo WAV | (ignored) | Splits channels. Ch0 = mic, Ch1 = system audio. |
| Mono/any format | 1 (default) | Plain transcription, single channel. |
| Mono/any format | 2+ | Diarize (WeSpeaker + spectral clustering), then transcribe. |
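The dispatch in this table can be approximated with the stdlib `wave` module. A sketch of the detection logic only, not the CLI's actual internals:

```python
import wave

def wav_channels(path: str) -> int:
    """Channel count of a PCM WAV file (stereo aside segments report 2)."""
    with wave.open(path, "rb") as w:
        return w.getnchannels()

def pick_mode(path: str, num_speakers: int = 1) -> str:
    # Mirrors the table: stereo WAV -> channel split; mono with N >= 2 ->
    # diarize-then-transcribe; otherwise plain single-channel transcription.
    if path.lower().endswith(".wav") and wav_channels(path) == 2:
        return "stereo-split"
    return "diarize" if num_speakers >= 2 else "plain"
```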
For aside recorder sessions (stereo segments):
```shell
python3 ${CLAUDE_PLUGIN_ROOT}/skills/aside/scripts/aside.py transcribe ".aside/<session-name>_seg<N>.wav" --output "/tmp/<session-name>_seg<N>_transcript.json"
```
When there are multiple segments, adjust transcript timestamps by each segment's offset_ms from meta so they align to the session's global timeline.
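The offset adjustment amounts to a shift-and-merge. A sketch, assuming transcript entries carry `start`/`end` in seconds and the meta file lists segments in order — both are assumptions about the JSON layout, not a documented schema:

```python
import json

def merge_segments(transcript_paths, meta_path):
    """Shift each segment transcript by its offset_ms from the session meta,
    then merge everything onto the session's global timeline."""
    with open(meta_path) as f:
        segments = json.load(f)["segments"]
    merged = []
    for seg, path in zip(segments, transcript_paths):
        offset = seg["offset_ms"] / 1000.0  # ms -> seconds
        with open(path) as f:
            for entry in json.load(f):
                entry["start"] += offset
                entry["end"] += offset
                merged.append(entry)
    return sorted(merged, key=lambda e: e["start"])
```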
For mono recordings (voice memos, external audio):
```shell
python3 ${CLAUDE_PLUGIN_ROOT}/skills/aside/scripts/aside.py transcribe "<audio-file>" --output ".aside/<session-name>_transcript.json" --num-speakers 2
```
Accepts any ffmpeg-readable format (m4a, wav, mp3, etc.). With `--num-speakers` > 1, speakers are labeled `SPEAKER_00`, `SPEAKER_01`, etc.

Speaker identification: when diarization is used, the skill should ask the user to identify which speaker is which during distillation (Step 4). Present a short sample from each speaker and ask the user to label them.
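Once the user supplies names, applying them to the transcript is a simple relabel. A sketch assuming entries carry a `speaker` field (an assumption about the transcript JSON, not a documented shape):

```python
def relabel_speakers(entries, names):
    """Swap diarization labels (SPEAKER_00, ...) for user-supplied names,
    leaving unrecognized labels untouched."""
    return [dict(e, speaker=names.get(e.get("speaker"), e.get("speaker")))
            for e in entries]
```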
Run:

```shell
python3 ${CLAUDE_PLUGIN_ROOT}/skills/aside/scripts/aside.py align \
  --memo "<session-name>.md" \
  --transcripts /tmp/<session-name>_seg0_transcript.json [seg1...] \
  --meta ".aside/<session-name>.meta.json" \
  --output ".aside/<session-name>_aligned.md"
```
This produces the aligned timeline. In the output, ch0 = mic (the local user), ch1 = system audio (the remote participant). Report to the user:
Aligned memo lines with transcript entries. Output: `.aside/<session-name>_aligned.md`
If `--align-only` was passed, stop here.
The memo is the user's real-time attention signal — what they found important enough to write down during the conversation, not in hindsight. Use it to drive analysis priority.
Extract topics from memo lines first. Each memo line marks a moment the user chose to note. Classify each line (decision, action item, insight, tension, question, or observation).
Edited memos signal reconsideration. A memo line followed by *[edited at MM:SS]* means the user came back to revise it at that later timestamp. This indicates the topic was important enough to revisit. Weight these higher.
Extract topics from un-noted transcript. Scan the transcript for significant topics that the user did not memo. These are secondary — the user may have chosen not to note them for a reason, or they may have been too absorbed to write.
Build a prioritized topic list. Memo-marked topics first (ordered by classification weight: decisions > action items > insights > tensions > questions > observations), then significant un-noted topics.
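The ordering rule above can be expressed as a sort key. The numeric weights here are illustrative; only their relative order comes from the text, and the input shape is a modeling assumption:

```python
# Relative order from the text: decisions > action items > insights >
# tensions > questions > observations. The numbers themselves are arbitrary.
WEIGHT = {"decision": 6, "action_item": 5, "insight": 4,
          "tension": 3, "question": 2, "observation": 1}

def prioritize(topics):
    """topics: dicts with 'kind', 'memo_marked' (bool), and 'edited' (bool,
    the *[edited at MM:SS]* signal). Memo-marked topics come first, edited
    ones are bumped higher, then classification weight decides the rest."""
    return sorted(
        topics,
        key=lambda t: (t.get("memo_marked", False),
                       t.get("edited", False),
                       WEIGHT.get(t.get("kind"), 0)),
        reverse=True,
    )
```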
Connect the conversation to existing vault thinking.
Run `mcp__enzyme__start_exploring_vault` first. This returns the slate — trending entities with catalysts that represent where the vault has already found language for things. Use the slate to calibrate search queries: if the transcript discusses "knowledge management tools" but the vault uses "pkm" or "tool-thinking", reach for the vault's language.
Use two search strategies:
Structured search (Grep) — for concrete anchors that exist verbatim in the vault:

- wikilinks: `[[Person Name]]`
- tags: `#pkm`, `#ai-ux`

Run Grep for each concrete anchor. Prioritize anchors that appear near memo-marked topics.
Semantic search — for themes and concepts without a concrete anchor:
Good queries (drawn from specific themes, calibrated to vault language):
Bad queries (generic):
For each query, run `mcp__enzyme__semantic_search` with `result_limit: 5`.
After both structured and semantic results come back, note the concrete anchors (`[[Person Name]]`) that appear in the results.

Select and load a template from `$CLAUDE_PLUGIN_ROOT/skills/aside/templates/`:
- `1on1-idea-exchange.md` — default for most 1:1 conversations (idea exchange, catch-ups, brainstorms)
- `discovery-call.md` — client/prospect conversations focused on needs, fit, and next steps
- `group-conversation.md` — 3+ participants where tracking who thinks what matters
- `talk-reflection.md` — sermons, lectures, talks, or any session where the user is listening and reflecting, not conversing. The memo captures their thinking in response to a speaker.

Choose based on the memo and transcript content. If unclear, default to `1on1-idea-exchange`.
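The choice reads as a small decision tree. A heuristic sketch; the skill itself judges from memo and transcript content, modeled here as boolean inputs for illustration:

```python
def pick_template(participants: int, listening_only: bool, discovery: bool) -> str:
    """Mirror the template guidance above; inputs are judgments the skill
    would make from the memo and transcript, not real flags."""
    if listening_only:        # sermon, lecture, talk
        return "talk-reflection.md"
    if discovery:             # client/prospect needs-and-fit call
        return "discovery-call.md"
    if participants >= 3:     # tracking who thinks what
        return "group-conversation.md"
    return "1on1-idea-exchange.md"  # default
```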
Generate draft following these principles:
Topic ordering: Use the memo-weighted priority from Step 4. Decisions and action items surface first, then insights and tensions, then un-noted topics. This reflects what the user actually cared about during the conversation.
Content principles:
- `[[wikilink]]` citations where connections exist
- `people:` field using `[[Name]]` format

Writing style:
Citation integration:
- inline: `as explored in [[note title]]` or `connects to [[note title]]`
- embeds: `![[file#^block-id]]` only when the source has explicit block IDs and the quote is concise and directly relevant

Present the complete draft to the user and ask for feedback.
Apply revisions if requested. Iterate until the user is satisfied.
Once approved:
- If `vault_note_path` exists in the session metadata (from Step 1), the note was already created on session quit. Read the existing vault note, replace everything after the frontmatter with the approved distilled content, and update the frontmatter tags/people fields from the Enzyme results.
- If `vault_note_path` is absent, fall back to creating the note via `./scripts/new-note.sh` (run from `$OBSIDIAN_VAULT/`), then populate it with the approved content using the Edit tool.

Rename the note descriptively (`[timestamp] chat with [person] about [topic].md`):

```shell
mv "$OBSIDIAN_VAULT/inbox/[old-filename].md" "$OBSIDIAN_VAULT/inbox/[old-filename-prefix] [descriptive name].md"
```
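When the fallback path creates a new note, its location gets recorded in the session meta. A minimal sketch of that round-trip, assuming the meta file is plain JSON:

```python
import json

def record_vault_note_path(meta_path: str, note_path: str) -> None:
    """Read the session meta JSON, set vault_note_path, write it back."""
    with open(meta_path) as f:
        meta = json.load(f)
    meta["vault_note_path"] = note_path
    with open(meta_path, "w") as f:
        json.dump(meta, f, indent=2)
```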
Record `vault_note_path` in the session metadata: read `.aside/<session-name>.meta.json`, set `vault_note_path` to the new path, and write it back.

Many transcripts come from speech-to-text and contain fragmented, garbled text. When you encounter this:
- Mark garbled passages `[unclear]` rather than guessing.

Examples:

```
/aside standup
# Full pipeline: transcribe stereo audio, align, distill into vault note

/aside standup --align-only
# Only produce the aligned timeline, skip distillation

/aside coffee-chat --audio ~/Downloads/recording.m4a --num-speakers 2
# Diarize mono audio with 2 speakers, then distill

/aside group-call --audio ~/Downloads/meeting.wav --num-speakers 4
# Diarize with 4 expected speakers
```