Orchestrates multimodal AI interactions across text, image, voice, code, tools, and structured data. Covers modality selection, seamless transitions, conflict resolution, and design artifacts.
```sh
npx claudepluginhub owl-listener/ai-design-skills --plugin model-interaction-design
```

This skill uses the workspace's default tool permissions.
Related skills:

- Provides patterns for building multimodal AI applications that combine text, images, audio, and video. Covers vision APIs, audio transcription, Whisper, and unified pipelines.
- Designs human-AI conversations with patterns for turn-taking, repair sequences, grounding, and structures such as interviews, co-creation, and guided workflows.
- Provides patterns for multimodal LLM integration: vision (image analysis, document understanding), audio (STT, TTS), and video generation (Kling, Sora, Veo, Runway). Use for AI pipelines involving images, audio, or video.
AI interactions increasingly span multiple modalities — text, images, voice, code, tools, and structured data. Designing how these modalities work together is orchestration.
Each modality has strengths:

- Text carries nuance, explanation, and ambiguity well.
- Images convey spatial and visual information that prose describes poorly.
- Voice enables hands-free, low-friction interaction.
- Code is precise and executable, so behavior can be verified by running it.
- Tools act on external systems rather than merely describing actions.
- Structured data gives downstream consumers consistent, machine-readable output.
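These strengths can be encoded as simple selection rules. The sketch below is illustrative only: the `Modality` enum, the `Task` flags, and the precedence of the checks are assumptions for this example, not part of any specific API.

```python
# A minimal sketch of rule-based modality selection. Modality, Task,
# and the heuristic ordering are hypothetical, for illustration.
from dataclasses import dataclass
from enum import Enum, auto


class Modality(Enum):
    TEXT = auto()
    IMAGE = auto()
    VOICE = auto()
    CODE = auto()
    TOOL = auto()
    STRUCTURED = auto()


@dataclass
class Task:
    needs_nuance: bool = False      # explanation, persuasion, ambiguity
    is_spatial: bool = False        # layouts, diagrams, photos
    hands_free: bool = False        # user cannot look at or touch a screen
    is_executable: bool = False     # behavior must be precise and runnable
    has_side_effects: bool = False  # must act on external systems
    machine_readable: bool = False  # downstream consumers parse the output


def select_modality(task: Task) -> Modality:
    """Pick a primary output modality; the order of checks encodes precedence."""
    if task.has_side_effects:
        return Modality.TOOL        # acting beats describing an action
    if task.is_executable:
        return Modality.CODE
    if task.machine_readable:
        return Modality.STRUCTURED
    if task.is_spatial:
        return Modality.IMAGE
    if task.hands_free:
        return Modality.VOICE
    return Modality.TEXT            # default: prose handles nuance best
```

In practice the flags would be derived from the request itself; the point of the sketch is that selection is a small, inspectable policy rather than an ad hoc choice.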
When the interaction switches modalities, design the transition:

- Carry context forward so the user never has to repeat themselves.
- Announce the switch explicitly instead of changing channels silently.
- Provide a fallback if the target modality is unavailable or fails.
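One way to make this concrete is a small handoff record that travels with the switch. The `Handoff` dataclass and `announce_switch` helper below are hypothetical names used only to sketch the idea.

```python
# A minimal sketch of a modality handoff that preserves context.
# Handoff and announce_switch are illustrative, not a real API.
from dataclasses import dataclass, field


@dataclass
class Handoff:
    from_modality: str
    to_modality: str
    # Context that must survive the switch: what has been established
    # so far, plus any questions still open.
    summary: str = ""
    pending: list[str] = field(default_factory=list)


def announce_switch(handoff: Handoff) -> str:
    """Make the transition explicit instead of switching silently."""
    note = (
        f"Switching from {handoff.from_modality} to {handoff.to_modality}. "
        f"So far: {handoff.summary}"
    )
    if handoff.pending:
        note += f" Still open: {', '.join(handoff.pending)}."
    return note


# Example: a voice session moves to text so the user can review details.
msg = announce_switch(Handoff(
    from_modality="voice",
    to_modality="text",
    summary="we compared three pricing plans",
    pending=["confirm the annual discount"],
))
print(msg)
```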
Sometimes modalities compete:

- Voice and screen can present different information at the same moment.
- A tool result can contradict what the model said in prose.
- A generated image can diverge from the text that describes it.

Give conflicts an explicit precedence order, and surface disagreements instead of hiding them.
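A precedence table makes the resolution policy explicit. The sketch below is one possible ordering, grounded tool output over structured data over generated prose; the `Claim` type and the specific ranks are assumptions a real system would tune to its domain.

```python
# A minimal sketch of precedence-based conflict resolution between
# modalities. PRECEDENCE values and the Claim type are illustrative.
from dataclasses import dataclass

# Higher number wins: grounded tool output outranks generated prose,
# and structured data outranks a natural-language paraphrase of it.
PRECEDENCE = {"tool": 3, "structured": 2, "image": 1, "text": 0, "voice": 0}


@dataclass
class Claim:
    modality: str
    content: str


def resolve(claims: list[Claim]) -> Claim:
    """Keep the claim from the most authoritative modality and
    surface the disagreement rather than hiding it."""
    winner = max(claims, key=lambda c: PRECEDENCE.get(c.modality, 0))
    losers = [c for c in claims if c is not winner]
    if losers:
        print(f"Note: {winner.modality} output overrides "
              f"{', '.join(c.modality for c in losers)}.")
    return winner


# Example: the model's prose says "$42" but the tool returned $40.
best = resolve([
    Claim("text", "The total is $42."),
    Claim("tool", "total=40.00 USD"),
])
print(best.content)  # the tool result wins
```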