Production-grade caching strategies skill for Redis patterns, CDN configuration, cache invalidation, and performance optimization
Designs production caching layers with Redis/CDN patterns, invalidation strategies, and performance metrics. Used when you need to optimize data access patterns, reduce database load, or implement cache-aside/write-through architectures.
/plugin marketplace add pluginagentmarketplace/custom-plugin-system-design/plugin install custom-plugin-system-design@pluginagentmarketplace-system-designThis skill inherits all available tools. When active, it can use any tool Claude has access to.
assets/config.yamlassets/schema.jsonreferences/GUIDE.mdreferences/PATTERNS.mdscripts/validate.pyPurpose: Atomic skill for caching architecture with comprehensive invalidation patterns and performance metrics.
| Attribute | Value |
|---|---|
| Scope | Redis, CDN, Cache Invalidation |
| Responsibility | Single: Caching layer design and optimization |
| Invocation | Skill("caching-strategies") |
parameters:
caching_context:
type: object
required: true
properties:
use_case:
type: string
enum: [session, api_response, database, static_assets, compute]
required: true
data_profile:
type: object
required: true
properties:
size_per_item: { type: string, pattern: "^\\d+[KMGB]?B?$" }
total_items: { type: integer, minimum: 1 }
update_frequency: { type: string, enum: [real_time, seconds, minutes, hours, days] }
access_pattern: { type: string, enum: [uniform, hot_cold, temporal] }
requirements:
type: object
properties:
hit_rate_target: { type: number, minimum: 0, maximum: 100 }
max_latency_ms: { type: integer, minimum: 1 }
consistency: { type: string, enum: [strict, eventual] }
budget_monthly: { type: string }
validation_rules:
- name: "hit_rate_feasibility"
rule: "hit_rate_target <= 99.9"
error: "100% hit rate is not achievable in practice"
- name: "memory_estimate"
rule: "size_per_item * total_items <= available_memory"
warning: "May require cache eviction or sharding"
output:
type: object
properties:
architecture:
type: object
properties:
layers: { type: array }
technology: { type: string }
topology: { type: string }
configuration:
type: object
properties:
memory_allocation: { type: string }
eviction_policy: { type: string }
ttl_strategy: { type: object }
connection_pool: { type: object }
invalidation:
type: object
properties:
strategy: { type: string }
triggers: { type: array }
implementation: { type: string }
metrics:
type: object
properties:
expected_hit_rate: { type: number }
memory_usage: { type: string }
latency_p99: { type: string }
L1: Application Memory
├── Technology: Caffeine, Guava
├── Latency: ~0.1ms
├── Size: MB range
├── TTL: Seconds
└── Use: Hot data, thread-local
L2: Distributed Cache
├── Technology: Redis, Memcached
├── Latency: 1-5ms
├── Size: GB-TB range
├── TTL: Minutes to hours
└── Use: Shared state, sessions
L3: CDN Edge
├── Technology: CloudFront, Fastly
├── Latency: 5-50ms (network)
├── Size: Unlimited
├── TTL: Hours to days
└── Use: Static assets, API responses
L4: Database Cache
├── Technology: Query cache, buffer pool
├── Latency: ~1ms
├── Size: GB range
├── TTL: Until invalidated
└── Use: Query results
Cache-Aside (Lazy Loading):
├── Read: Check cache → Miss → DB → Store → Return
├── Write: Update DB → Invalidate cache
├── Pros: Simple, resilient to cache failure
├── Cons: Cache miss penalty, stale on DB update
└── Use: Read-heavy, tolerance for staleness
Write-Through:
├── Write: Update cache + DB atomically
├── Read: Always from cache
├── Pros: Cache always fresh
├── Cons: Write latency, complexity
└── Use: Read-after-write needed
Write-Behind:
├── Write: Update cache → Async DB write
├── Pros: Low write latency
├── Cons: Data loss risk, complexity
└── Use: Write-heavy, acceptable loss
Read-Through:
├── Read: Cache handles DB fetch on miss
├── Pros: Simplified application
├── Cons: Cache dependency
└── Use: Predictable access patterns
TTL-Based:
├── Simple time expiry
├── Formula: TTL = max_acceptable_staleness
├── Jitter: TTL * (1 + random(-0.1, 0.1))
└── Prevents: Thundering herd
Event-Based:
├── Invalidate on data change
├── Implementation: CDC, Pub/Sub
├── Latency: Near real-time
└── Complexity: Event system required
Version-Based:
├── Key: user:{id}:v{version}
├── Bump version on change
├── Old versions expire naturally
└── Benefit: No explicit invalidation
Tag-Based:
├── Associate keys with tags
├── Invalidate by tag
├── Example: Tag "product:123" on all related
└── Use: Related data groups
retry_config:
cache_read:
max_attempts: 2
timeout_ms: 50
on_failure: proceed_without_cache
cache_write:
max_attempts: 3
timeout_ms: 100
on_failure: log_and_continue
redis_connection:
max_attempts: 5
initial_delay_ms: 100
max_delay_ms: 5000
multiplier: 2.0
circuit_breaker:
failure_threshold: 5
reset_timeout_seconds: 30
half_open_requests: 1
log_schema:
level: { type: string }
timestamp: { type: string, format: ISO8601 }
skill: { type: string, value: "caching-strategies" }
event:
type: string
enum:
- cache_hit
- cache_miss
- cache_set
- cache_invalidate
- cache_evict
- ttl_expired
- circuit_open
context:
type: object
properties:
key: { type: string }
ttl_seconds: { type: integer }
latency_ms: { type: number }
size_bytes: { type: integer }
example:
level: INFO
event: cache_hit
context:
key: "user:123:profile"
latency_ms: 0.5
size_bytes: 1024
metrics:
- name: cache_requests_total
type: counter
labels: [operation, result] # hit, miss, error
- name: cache_latency_seconds
type: histogram
labels: [operation]
buckets: [0.0001, 0.0005, 0.001, 0.005, 0.01, 0.05]
- name: cache_memory_bytes
type: gauge
labels: [cache_name]
- name: cache_evictions_total
type: counter
labels: [policy]
- name: cache_hit_ratio
type: gauge
labels: [cache_name]
| Issue | Cause | Resolution |
|---|---|---|
| Low hit rate | TTL too short | Increase TTL, analyze patterns |
| High latency | Cache miss + DB | Warm cache, optimize DB |
| Memory pressure | Too much data | Increase memory, evict |
| Stale data | TTL mismatch | Event-based invalidation |
| Thundering herd | Mass expiry | Jittered TTL, singleflight |
| Hot key | Popularity skew | Replicate, local cache |
□ Hit rate measured (>90% target)?
□ Memory usage within limits?
□ Eviction rate acceptable?
□ Latency p99 within SLA?
□ Invalidation working?
□ Cluster health OK?
□ Connection pool sized right?
# test_caching_strategies.py
def test_valid_caching_context():
params = {
"caching_context": {
"use_case": "session",
"data_profile": {
"size_per_item": "1KB",
"total_items": 1000000,
"update_frequency": "minutes",
"access_pattern": "hot_cold"
},
"requirements": {
"hit_rate_target": 99,
"max_latency_ms": 5,
"consistency": "eventual"
}
}
}
result = validate_parameters(params)
assert result.valid == True
def test_memory_estimation():
result = estimate_memory(
size_per_item="1KB",
total_items=1000000,
overhead_factor=1.5 # Redis overhead
)
assert result.total == "1.5GB"
assert result.recommended_allocation == "2GB" # 25% buffer
def test_ttl_with_jitter():
base_ttl = 3600
jittered = apply_jitter(base_ttl, factor=0.1)
assert 3240 <= jittered <= 3960 # ±10%
def test_hit_rate_infeasibility():
params = {
"caching_context": {
"requirements": {
"hit_rate_target": 100 # Impossible
}
}
}
result = validate_parameters(params)
assert result.valid == False
assert "not achievable" in result.errors[0]
def test_ttl_invalidation():
cache = MockCache()
cache.set("key", "value", ttl=60)
assert cache.get("key") == "value"
advance_time(61)
assert cache.get("key") is None
def test_event_based_invalidation():
cache = MockCache()
cache.set("user:123", {"name": "Alice"})
# Simulate update event
event = {"type": "user_updated", "id": 123}
process_invalidation_event(cache, event)
assert cache.get("user:123") is None
def test_tag_based_invalidation():
cache = MockCache()
cache.set("product:123:details", data, tags=["product:123"])
cache.set("product:123:reviews", reviews, tags=["product:123"])
cache.invalidate_by_tag("product:123")
assert cache.get("product:123:details") is None
assert cache.get("product:123:reviews") is None
def test_thundering_herd_prevention():
cache = MockCache()
# Simulate concurrent requests for same key
requests = [fetch_with_singleflight(cache, "hot_key") for _ in range(100)]
results = await asyncio.gather(*requests)
# Only one DB call should have been made
assert cache.db_calls == 1
assert all(r == results[0] for r in results)
| Version | Date | Changes |
|---|---|---|
| 2.0.0 | 2025-01 | Production-grade rewrite with invalidation patterns |
| 1.0.0 | 2024-12 | Initial release |
This skill should be used when the user asks to "create a slash command", "add a command", "write a custom command", "define command arguments", "use command frontmatter", "organize commands", "create command with file references", "interactive command", "use AskUserQuestion in command", or needs guidance on slash command structure, YAML frontmatter fields, dynamic arguments, bash execution in commands, user interaction patterns, or command development best practices for Claude Code.
This skill should be used when the user asks to "create an agent", "add an agent", "write a subagent", "agent frontmatter", "when to use description", "agent examples", "agent tools", "agent colors", "autonomous agent", or needs guidance on agent structure, system prompts, triggering conditions, or agent development best practices for Claude Code plugins.
This skill should be used when the user asks to "create a hook", "add a PreToolUse/PostToolUse/Stop hook", "validate tool use", "implement prompt-based hooks", "use ${CLAUDE_PLUGIN_ROOT}", "set up event-driven automation", "block dangerous commands", or mentions hook events (PreToolUse, PostToolUse, Stop, SubagentStop, SessionStart, SessionEnd, UserPromptSubmit, PreCompact, Notification). Provides comprehensive guidance for creating and implementing Claude Code plugin hooks with focus on advanced prompt-based hooks API.