From zai-glm
Use this skill when the user asks about GLM models, GLM-5, GLM-4.7, GLM-4.6, GLM-4.5, GLM-4V, ChatGLM, CogView, CogVideoX, z.ai model capabilities, model selection for different tasks, or comparing GLM models.
Install: npx claudepluginhub nsheaps/ai-mktpl --plugin zai-glm
Slash command: /zai-glm:glm-models
The GLM (General Language Model) family is developed by z.ai (formerly Zhipu AI / 智谱AI). These models support text generation, vision, code, embeddings, image generation, and video generation. All recent models are open-weight under MIT license.
Base URL: https://api.z.ai/api/paas/v4/

| Model | Architecture | Context | Key Features |
|---|---|---|---|
| glm-5 | ~745B MoE (44B active) | 200K in / 128K out | Agentic engineering, tool streaming, long-horizon tasks, MIT |
| glm-5-turbo | Same, optimized | 200K in / 128K out | Improved stability for long-chain agent tasks |
| glm-4.7 | ~400B MoE | 200K in / 128K out | Coding-focused, Preserved Thinking, Turn-level Thinking, MIT |
| glm-4.7-flash | Lightweight | Reduced | Free tier, lighter capability |
| glm-4.6 | 355B total | 200K | Strong code benchmarks, agent frameworks, MIT |
| glm-4.5 | 355B / 32B active | 128K | Hybrid reasoning (thinking/non-thinking modes), deep thinking |
| glm-4.5-x | Premium tier | 128K | Higher capability, premium pricing |
| glm-4.5-air | 106B / 12B active | 128K | Compact variant of GLM-4.5 |
| glm-4.5-flash | Lightweight | 128K | Free tier |

GLM-4.5+ models support hybrid reasoning: the thinking field in the request toggles between deep thinking and instant response.
```json
{
  "model": "glm-4.7",
  "messages": [{ "role": "user", "content": "Solve this step by step" }],
  "thinking": { "type": "enabled" }
}
```
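The request body above can also be assembled programmatically. The following is a minimal sketch, not an official client: the helper name `build_chat_payload` is hypothetical, and the `"disabled"` value is an assumption, since the source only shows `"enabled"`.

```python
import json


def build_chat_payload(model: str, prompt: str, deep_thinking: bool = True) -> dict:
    """Build a chat request body with the thinking toggle set explicitly."""
    # Assumption: "disabled" is the counterpart of "enabled"; only "enabled"
    # appears in the example above.
    mode = "enabled" if deep_thinking else "disabled"
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "thinking": {"type": mode},
    }


if __name__ == "__main__":
    # Print the JSON that would be sent as the request body.
    print(json.dumps(build_chat_payload("glm-4.7", "Solve this step by step"), indent=2))
```

Keeping the toggle as a boolean argument makes it easy to switch a single call site between deep thinking and instant response without editing the JSON by hand.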
Tool streaming is requested with tool_stream: true.

| Model | Parameters | Context | Description |
|---|---|---|---|
| glm-4.6v | 106B / 12B active | 128K | Vision understanding, function calling |
| glm-4.6v-flash | 9B | - | Free, open weights, commercial license |
| glm-4.5v | 106B VLM | - | Vision-language model |

```bash
curl "https://api.z.ai/api/paas/v4/chat/completions" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "glm-4.6v",
    "messages": [{
      "role": "user",
      "content": [
        {"type": "text", "text": "What is in this image?"},
        {"type": "image_url", "image_url": {"url": "https://example.com/image.jpg"}}
      ]
    }]
  }'
```
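The same vision request body can be built in code before it is sent. This is a rough sketch under stated assumptions: the helper names are hypothetical, and passing more than one image in a single message is an assumption that the example above does not confirm.

```python
def build_vision_message(question: str, image_urls: list[str]) -> dict:
    """Build one user message that mixes a text part with image_url parts."""
    content = [{"type": "text", "text": question}]
    # Assumption: several image_url parts may share one message; the curl
    # example above only demonstrates a single image.
    content += [{"type": "image_url", "image_url": {"url": url}} for url in image_urls]
    return {"role": "user", "content": content}


def build_vision_payload(question: str, image_urls: list[str], model: str = "glm-4.6v") -> dict:
    """Wrap the message in a chat request body for a vision model."""
    if not image_urls:
        raise ValueError("at least one image URL is required")
    return {"model": model, "messages": [build_vision_message(question, image_urls)]}
```

Calling `build_vision_payload("What is in this image?", ["https://example.com/image.jpg"])` yields the same structure as the JSON passed to curl above.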
| Model | Category | Description |
|---|---|---|
| glm-image | Image generation | Text-to-image (Jan 2026) |
| glm-ocr | OCR | Document and image OCR |
| cogview-3-plus | Image generation | High-quality text-to-image |
| cogvideox | Video generation | Text-to-video generation |
| cogvideox-flash | Video generation | Fast video generation |

| Model | Dimensions | Description |
|---|---|---|
| embedding-3 | 2048 | General-purpose text embeddings |
| embedding-2 | 1024 | Previous generation embeddings |

```bash
curl "https://api.z.ai/api/paas/v4/embeddings" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "embedding-3",
    "input": "What is machine learning?"
  }'
```
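Once embedding vectors have been retrieved, a typical search step ranks stored documents by cosine similarity to the query vector. The sketch below illustrates that step only; it makes no API call, the function names are hypothetical, and the short vectors in the usage note stand in for real 2048-dimensional embedding-3 output.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec: list[float], doc_vecs: dict[str, list[float]]) -> list[tuple[str, float]]:
    """Return (doc_id, score) pairs sorted from most to least similar."""
    scored = [(doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

For example, `rank_by_similarity([1.0, 0.0], {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]})` orders the documents as a, c, b.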
| Use Case | Recommended Model | Why |
|---|---|---|
| Agentic tasks | glm-5 | Tool streaming, long-horizon planning |
| Coding | glm-4.7 | Coding-focused, Preserved Thinking |
| Complex reasoning | glm-4.5 | Hybrid reasoning with deep thinking |
| General chat | glm-4.5-flash | Free, good quality |
| High throughput | glm-4.5-air | Compact, fast inference |
| Image understanding | glm-4.6v | Best vision model with function calling |
| Embeddings/search | embedding-3 | Latest generation |
| Image creation | glm-image | Latest generation (Jan 2026) |
| Budget-conscious | glm-4.5-flash | Free tier available |
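The recommendations in the table above can be encoded as a simple lookup for programmatic model selection. This is only an illustrative sketch: the use-case keys and the `recommend_model` helper are hypothetical names, and the fallback to the free glm-4.5-flash tier is an assumption rather than something the table prescribes.

```python
# Use-case keys are hypothetical labels; the model IDs come from the table above.
RECOMMENDED_MODEL = {
    "agentic": "glm-5",
    "coding": "glm-4.7",
    "reasoning": "glm-4.5",
    "chat": "glm-4.5-flash",
    "throughput": "glm-4.5-air",
    "vision": "glm-4.6v",
    "embeddings": "embedding-3",
    "image-generation": "glm-image",
    "budget": "glm-4.5-flash",
}


def recommend_model(use_case: str, default: str = "glm-4.5-flash") -> str:
    """Return the recommended model ID, falling back to `default` for unknown use cases."""
    return RECOMMENDED_MODEL.get(use_case.strip().lower(), default)
```

For instance, `recommend_model("Coding")` returns `"glm-4.7"`, and an unrecognized use case returns the default.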
When using z.ai's Anthropic-compatible endpoint with Claude Code, map models to slots:
| Claude Code Slot | Recommended GLM Model | Rationale |
|---|---|---|
| Opus | glm-5 | Most capable, agentic |
| Sonnet | glm-4.7 | Strong coding, balanced cost |
| Haiku | glm-4.5-air | Fast, cost-effective |
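One way to apply this mapping is through environment variables set before launching Claude Code. The sketch below generates such a set of variables. Everything beyond the slot-to-model mapping is an assumption not stated in this document: the variable names (`ANTHROPIC_BASE_URL`, `ANTHROPIC_AUTH_TOKEN`, `ANTHROPIC_DEFAULT_*_MODEL`) and the base URL `https://api.z.ai/api/anthropic` should be checked against the current Claude Code and z.ai documentation.

```python
# Slot-to-model mapping taken from the table above.
SLOT_TO_MODEL = {"opus": "glm-5", "sonnet": "glm-4.7", "haiku": "glm-4.5-air"}


def claude_code_env(api_key: str, base_url: str = "https://api.z.ai/api/anthropic") -> dict[str, str]:
    """Build environment variables that point Claude Code at GLM models.

    Assumption: the variable names and the default base URL are not given in
    this document; verify them before relying on this.
    """
    env = {"ANTHROPIC_BASE_URL": base_url, "ANTHROPIC_AUTH_TOKEN": api_key}
    for slot, model in SLOT_TO_MODEL.items():
        env[f"ANTHROPIC_DEFAULT_{slot.upper()}_MODEL"] = model
    return env


if __name__ == "__main__":
    # Emit shell export lines that can be pasted into a profile or script.
    for name, value in claude_code_env("YOUR_API_KEY").items():
        print(f'export {name}="{value}"')
```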
| Model | Input (USD per 1M tokens) | Output (USD per 1M tokens) |
|---|---|---|
| glm-5 | ~$1.00 | ~$3.20 |
| glm-4.7 | $0.60 | $2.20 |
| glm-4.7-flash | Free | Free |
| glm-4.5 | ~$0.20 | ~$1.10 |
| glm-4.5-x | - | $8.90 |
| glm-4.5-flash | Free | Free |
| glm-4.6v | ~$0.14 | ~$0.41 |
| glm-4.6v-flash | Free | Free |

Prices are approximate; see docs.z.ai/guides/overview/pricing for current rates. The Batch API is available at 50% of the standard cost.
glm-4.5-flash, glm-4.7-flash, and glm-4.6v-flash are free.
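To compare models on cost, the listed prices can be turned into a rough estimator. This sketch assumes the prices are in USD per one million tokens and that the Batch API discount applies uniformly to input and output. The figures are copied from the approximate table above and are not authoritative, and glm-4.5-x is omitted because its input price is not listed.

```python
# (input, output) prices, assumed to be USD per 1M tokens; approximate values
# copied from the pricing table. glm-4.5-x is left out: no input price is listed.
PRICES = {
    "glm-5": (1.00, 3.20),
    "glm-4.7": (0.60, 2.20),
    "glm-4.7-flash": (0.0, 0.0),
    "glm-4.5": (0.20, 1.10),
    "glm-4.5-flash": (0.0, 0.0),
    "glm-4.6v": (0.14, 0.41),
    "glm-4.6v-flash": (0.0, 0.0),
}

BATCH_FACTOR = 0.5  # Batch API is described as 50% of the standard cost


def estimate_cost(model: str, input_tokens: int, output_tokens: int, batch: bool = False) -> float:
    """Estimate the USD cost of a request, or a batch of requests, for a model."""
    if model not in PRICES:
        raise KeyError(f"no price listed for {model!r}")
    input_price, output_price = PRICES[model]
    cost = (input_tokens * input_price + output_tokens * output_price) / 1_000_000
    return cost * BATCH_FACTOR if batch else cost
```

Under these assumptions, 2M input tokens and 500K output tokens on glm-4.7 come to roughly $2.30, or about $1.15 through the Batch API.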