Multi-modal prompting with vision, audio, and document understanding
Enables processing of images, documents, audio, and video for analysis, extraction, and QA tasks. Claude uses this when users upload files or request content analysis from visual/audio sources.
/plugin marketplace add pluginagentmarketplace/custom-plugin-prompt-engineering/plugin install prompt-engineering-assistant@pluginagentmarketplace-prompt-engineeringThis skill inherits all available tools. When active, it can use any tool Claude has access to.
assets/config.yamlassets/schema.jsonreferences/GUIDE.mdreferences/PATTERNS.mdscripts/validate.pyBonded to: advanced-techniques-agent
Skill("custom-plugin-prompt-engineering:multi-modal")
parameters:
modality:
type: enum
values: [vision, document, audio, video]
required: true
task_type:
type: enum
values: [analysis, extraction, generation, qa]
default: analysis
detail_level:
type: enum
values: [low, medium, high]
default: medium
Analyze this image and provide:
1. Main subjects and objects
2. Actions or activities
3. Setting and context
4. Notable details
5. Overall interpretation
Be specific and descriptive.
Look at the image carefully.
Question: {question}
Provide a detailed answer based only on what you can see in the image.
Analyze this chart/graph:
1. Type of visualization
2. Axes and labels
3. Key data points
4. Trends or patterns
5. Main insights
6. Limitations or caveats
Extract the following from this document:
- Title and headers
- Key information: {fields}
- Tables (if any)
- Important dates/numbers
Output as structured JSON.
extraction_schema:
document_type: "invoice|form|contract"
fields:
- name: vendor
type: string
- name: date
type: date
- name: total
type: currency
- name: line_items
type: array
Transcribe and enhance:
1. Accurate transcription
2. Speaker identification
3. Timestamps for key points
4. Summary of main topics
5. Action items (if applicable)
best_practices:
image_prompts:
- Be specific about what to look for
- Request structured output
- Ask for confidence levels
document_prompts:
- Define extraction schema
- Handle multi-page documents
- Validate extracted data
audio_prompts:
- Specify language if known
- Request speaker diarization
- Ask for timestamps
| Issue | Cause | Solution |
|---|---|---|
| Hallucinated details | Over-interpretation | Ask for visible-only info |
| Missed text in images | Low resolution | Request higher detail |
| Wrong document parsing | Complex layout | Break into sections |
| Inaccurate transcription | Audio quality | Acknowledge limitations |
See: GPT-4V documentation, Claude Vision Guide
Creating algorithmic art using p5.js with seeded randomness and interactive parameter exploration. Use this when users request creating art using code, generative art, algorithmic art, flow fields, or particle systems. Create original algorithmic art rather than copying existing artists' work to avoid copyright violations.
Applies Anthropic's official brand colors and typography to any sort of artifact that may benefit from having Anthropic's look-and-feel. Use it when brand colors or style guidelines, visual formatting, or company design standards apply.
Create beautiful visual art in .png and .pdf documents using design philosophy. You should use this skill when the user asks to create a poster, piece of art, design, or other static piece. Create original visual designs, never copying existing artists' work to avoid copyright violations.