Vision, audio, and multimodal models. Includes CLIP (image-text matching), Whisper (speech recognition), LLaVA (visual chat), BLIP-2 (vision-language), Segment Anything (image segmentation), Stable Diffusion (text-to-image), and AudioCraft (music/audio generation). Use when working with images, audio, or multimodal tasks.
/plugin marketplace add zechenzhangAGI/AI-research-SKILLs
/plugin install multimodal@ai-research-skills

Expert guidance for Next.js Cache Components and Partial Prerendering (PPR). Proactively activates in projects with cacheComponents enabled.
Adds educational insights about implementation choices and codebase patterns (mimics the deprecated Explanatory output style)
Create hooks that prevent unwanted behaviors by analyzing conversation patterns
Frontend design skill for UI/UX implementation