Vision, audio, and multimodal models. Includes CLIP (image-text matching), Whisper (speech recognition), LLaVA (visual chat), BLIP-2 (vision-language), Segment Anything (image segmentation), Stable Diffusion (text-to-image), and AudioCraft (music/audio generation). Use when working with images, audio, or multimodal tasks.
/plugin marketplace add zechenzhangAGI/AI-research-SKILLs
/plugin install multimodal@ai-research-skills

Expert guidance for Next.js Cache Components and Partial Prerendering (PPR). Proactively activates in projects with cacheComponents enabled.
Adds educational insights about implementation choices and codebase patterns (mimics the deprecated Explanatory output style)
Create hooks that prevent unwanted behaviors by analyzing conversation patterns
Frontend design skill for UI/UX implementation