Provides LLM fundamentals: transformer architecture, tokenization, and inference optimization. Use when discussing model capabilities, optimizing prompts, or comparing LLMs for specific tasks.
```
/plugin marketplace add pluginagentmarketplace/custom-plugin-ai-engineer
/plugin install pluginagentmarketplace-ai-engineer-plugin@pluginagentmarketplace/custom-plugin-ai-engineer
```

This skill inherits all available tools. When active, it can use any tool Claude has access to.
Bundled resources:
- assets/model_config.yaml
- references/MODEL_SELECTION_GUIDE.md
- scripts/llm_client.py

Master the fundamentals of Large Language Models.
A basic chat completion with the OpenAI Python SDK:

```python
from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain transformers briefly."},
    ],
    temperature=0.7,
    max_tokens=500,
)
print(response.choices[0].message.content)
```
Loading and generating with an open-weights model via Hugging Face Transformers:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_name = "meta-llama/Llama-2-7b-hf"
tokenizer = AutoTokenizer.from_pretrained(model_name)
# On a GPU, pass torch_dtype=torch.float16 to roughly halve memory use
model = AutoModelForCausalLM.from_pretrained(model_name)

inputs = tokenizer("Hello, how are", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0]))
```
The high-level data flow, and the layout of a single block:

```
Input → Embedding → [N × Transformer Block] → Output

Transformer Block:
┌───────────────────────────┐
│ Multi-Head Self-Attention │
├───────────────────────────┤
│ Layer Normalization       │
├───────────────────────────┤
│ Feed-Forward Network      │
├───────────────────────────┤
│ Layer Normalization       │
└───────────────────────────┘
```

Residual connections around the attention and feed-forward sublayers are omitted for clarity.
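A minimal sketch of how such a block maps to code, following the post-norm layout in the diagram (dimensions and layer choices are illustrative, not any specific model's):

```python
import torch.nn as nn

class TransformerBlock(nn.Module):
    def __init__(self, d_model=512, n_heads=8, d_ff=2048):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model)
        )
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x):
        # Attention sublayer: residual add, then normalize
        attn_out, _ = self.attn(x, x, x, need_weights=False)
        x = self.norm1(x + attn_out)
        # Feed-forward sublayer: residual add, then normalize
        return self.norm2(x + self.ffn(x))
```

Real decoder-only LMs also apply a causal attention mask, and most modern ones place LayerNorm before each sublayer (pre-norm) rather than after, as shown here.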
Tokenization converts text to integer IDs and back:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
text = "Hello, world!"

# Encode
tokens = tokenizer.encode(text)
print(tokens)  # [15496, 11, 995, 0]

# Decode
decoded = tokenizer.decode(tokens)
print(decoded)  # "Hello, world!"
```
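Token counts drive both cost and context-window usage. For OpenAI models, the same counting can be done client-side with tiktoken (a small sketch; the library handles the model-to-encoding mapping):

```python
import tiktoken

enc = tiktoken.encoding_for_model("gpt-4")
text = "Hello, world!"
print(len(enc.encode(text)))  # tokens this text consumes in the context window
```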
```python
# Generation parameters
params = {
    'temperature': 0.7,      # Randomness (0-2)
    'max_tokens': 1000,      # Output length limit
    'top_p': 0.9,            # Nucleus sampling
    'top_k': 50,             # Top-k sampling
    'frequency_penalty': 0,  # Reduce repetition
    'presence_penalty': 0,   # Encourage new topics
}
```
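These knobs map directly onto Hugging Face's `generate` (a sketch reusing `model` and `inputs` from the local-inference example above; note the OpenAI chat API accepts temperature, top_p, max_tokens, and the penalties, but not top_k):

```python
# do_sample=True is required for temperature/top-p/top-k to take effect
outputs = model.generate(
    **inputs,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
    top_k=50,
    max_new_tokens=100,
    repetition_penalty=1.1,  # transformers' closest analogue to a frequency penalty
)
```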
| Model | Parameters | Context Window | Best For |
|---|---|---|---|
| GPT-4 | ~1.7T (rumored) | 128K | Complex reasoning |
| GPT-3.5 | 175B | 16K | General tasks |
| Claude 3 | Undisclosed | 200K | Long context |
| Llama 2 | 7B-70B | 4K | Open source |
| Mistral 7B | 7B | 32K | Efficient inference |
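One way to put such a table to work is a simple routing helper (entirely hypothetical: the names, preference order, and reserved-output budget are illustrative):

```python
# Models in preference order (e.g., by cost); pick the first whose
# context window fits the prompt plus room for the reply
MODELS = [
    ("gpt-3.5-turbo", 16_000),
    ("mistral-7b", 32_000),
    ("gpt-4-turbo", 128_000),
    ("claude-3", 200_000),
]

def pick_model(prompt_tokens: int, reserved_output: int = 1_000) -> str:
    needed = prompt_tokens + reserved_output
    for name, window in MODELS:
        if window >= needed:
            return name
    raise ValueError(f"No model fits {needed} tokens")
```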
Serving models locally with Ollama:

```bash
# Install Ollama
curl -fsSL https://ollama.ai/install.sh | sh

# Run a model
ollama run llama2

# API usage
curl http://localhost:11434/api/generate -d '{
  "model": "llama2",
  "prompt": "Why is the sky blue?"
}'
```
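The same endpoint from Python (a minimal sketch; `/api/generate` streams JSON lines by default, so `"stream": False` requests a single JSON response):

```python
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "llama2", "prompt": "Why is the sky blue?", "stream": False},
)
print(resp.json()["response"])
```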
For high-throughput batch inference, vLLM:

```python
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-2-7b-hf")
sampling = SamplingParams(temperature=0.8, max_tokens=100)
outputs = llm.generate(["Hello, my name is"], sampling)
print(outputs[0].outputs[0].text)  # generated continuation
```
Wrap API calls with retries and exponential backoff to ride out transient failures and rate limits (body filled in with the same call as the basic example above):

```python
from tenacity import retry, stop_after_attempt, wait_exponential

@retry(stop=stop_after_attempt(3), wait=wait_exponential(min=1, max=10))
def call_llm_with_retry(prompt: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```
| Symptom | Cause | Solution |
|---|---|---|
| Rate limit errors | Too many requests | Add exponential backoff |
| Empty response | max_tokens=0 | Check parameter values |
| High latency | Large model | Use smaller model |
| Timeout | Prompt too long | Reduce input size |
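Streaming does not make generation faster, but it greatly improves perceived latency (a sketch using the OpenAI client from the first example):

```python
stream = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Explain transformers briefly."}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:  # some chunks carry no content
        print(delta, end="", flush=True)
```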
A minimal smoke test (assuming a `call_llm` helper that wraps the client call):

```python
def test_llm_completion():
    response = call_llm("Hello")
    assert response is not None
    assert len(response) > 0
```

Because outputs vary from run to run, assert on structure rather than exact text, or set temperature=0 for more reproducible behavior.