Apply quantization to reduce memory by 4-32x. Enable HNSW indexing for 150x faster search. Configure caching strategies and implement batch operations. Use when optimizing memory usage, improving search speed, or scaling to millions of vectors. Deploy these optimizations to achieve 12,500x performance gains.
/plugin marketplace add DNYoussef/context-cascade/plugin install dnyoussef-context-cascade@DNYoussef/context-cascadeThis skill inherits all available tools. When active, it can use any tool Claude has access to.
SKILL-meta.yamlexamples/example-1-quantization.mdexamples/example-2-hnsw-tuning.mdexamples/example-3-batching.mdgraphviz/workflow.dotreadme.mdreferences/hnsw-parameters.mdreferences/quantization-techniques.mdreferences/readme-gold.mdresources/scripts/batch_ops.pyresources/scripts/cache_optimize.pyresources/scripts/hnsw_tuning.shresources/scripts/quantize_vectors.pyresources/templates/cache-config.jsonresources/templates/hnsw-params.jsonresources/templates/quantization-config.yamltests/test-1-quantization-4x.mdtests/test-2-hnsw-tuning.mdtests/test-3-batch-ops.mdBefore writing ANY code, you MUST check:
.claude/library/catalog.json.claude/docs/inventories/LIBRARY-PATTERNS-GUIDE.mdD:\Projects\*| Match | Action |
|---|---|
| Library >90% | REUSE directly |
| Library 70-90% | ADAPT minimally |
| Pattern exists | FOLLOW pattern |
| In project | EXTRACT |
| No match | BUILD (add to library after) |
Use this skill to apply comprehensive performance optimization techniques for AgentDB vector databases. Implement quantization strategies (binary, scalar, product) to achieve 4-32x memory reduction. Enable HNSW indexing for 150x-12,500x performance improvements. Configure caching strategies and deploy batch operations to reduce memory usage while maintaining accuracy.
Performance: <100µs vector search, <1ms pattern retrieval, 2ms batch insert for 100 vectors.
Install Node.js 18+ and AgentDB v1.0.7+ via agentic-flow. Verify you have an existing AgentDB database or application ready for optimization.
Execute these steps to measure and optimize your AgentDB performance.
Execute benchmarks to establish baseline performance:
# Comprehensive performance benchmarking
npx agentdb@latest benchmark
# Results show:
# ✅ Pattern Search: 150x faster (100µs vs 15ms)
# ✅ Batch Insert: 500x faster (2ms vs 1s for 100 vectors)
# ✅ Large-scale Query: 12,500x faster (8ms vs 100s at 1M vectors)
# ✅ Memory Efficiency: 4-32x reduction with quantization
import { createAgentDBAdapter } from 'agentic-flow/reasoningbank';
// Optimized configuration
const adapter = await createAgentDBAdapter({
dbPath: '.agentdb/optimized.db',
quantizationType: 'binary', // 32x memory reduction
cacheSize: 1000, // In-memory cache
enableLearning: true,
enableReasoning: true,
});
Select the appropriate quantization strategy based on your memory and accuracy requirements.
Apply binary quantization for maximum memory reduction:
Best For: Large-scale deployments (1M+ vectors), memory-constrained environments Trade-off: ~2-5% accuracy loss, 32x memory reduction, 10x faster
const adapter = await createAgentDBAdapter({
quantizationType: 'binary',
// 768-dim float32 (3072 bytes) → 96 bytes binary
// 1M vectors: 3GB → 96MB
});
Use Cases:
Performance:
Best For: Balanced performance/accuracy, moderate datasets Trade-off: ~1-2% accuracy loss, 4x memory reduction, 3x faster
const adapter = await createAgentDBAdapter({
quantizationType: 'scalar',
// 768-dim float32 (3072 bytes) → 768 bytes (uint8)
// 1M vectors: 3GB → 768MB
});
Use Cases:
Performance:
Best For: High-dimensional vectors, balanced compression Trade-off: ~3-7% accuracy loss, 8-16x memory reduction, 5x faster
const adapter = await createAgentDBAdapter({
quantizationType: 'product',
// 768-dim float32 (3072 bytes) → 48-96 bytes
// 1M vectors: 3GB → 192MB
});
Use Cases:
Performance:
Best For: Maximum accuracy, small datasets Trade-off: No accuracy loss, full memory usage
const adapter = await createAgentDBAdapter({
quantizationType: 'none',
// Full float32 precision
});
Hierarchical Navigable Small World - O(log n) search complexity
AgentDB automatically builds HNSW indices:
const adapter = await createAgentDBAdapter({
dbPath: '.agentdb/vectors.db',
// HNSW automatically enabled
});
// Search with HNSW (100µs vs 15ms linear scan)
const results = await adapter.retrieveWithReasoning(queryEmbedding, {
k: 10,
});
// Advanced HNSW configuration
const adapter = await createAgentDBAdapter({
dbPath: '.agentdb/vectors.db',
hnswM: 16, // Connections per layer (default: 16)
hnswEfConstruction: 200, // Build quality (default: 200)
hnswEfSearch: 100, // Search quality (default: 100)
});
Parameter Tuning:
const adapter = await createAgentDBAdapter({
cacheSize: 1000, // Cache 1000 most-used patterns
});
// First retrieval: ~2ms (database)
// Subsequent: <1ms (cache hit)
const result = await adapter.retrieveWithReasoning(queryEmbedding, {
k: 10,
});
Cache Tuning:
// Cache automatically evicts least-recently-used patterns
// Most frequently accessed patterns stay in cache
// Monitor cache performance
const stats = await adapter.getStats();
console.log('Cache Hit Rate:', stats.cacheHitRate);
// Aim for >80% hit rate
// ❌ SLOW: Individual inserts
for (const doc of documents) {
await adapter.insertPattern({ /* ... */ }); // 1s for 100 docs
}
// ✅ FAST: Batch insert
const patterns = documents.map(doc => ({
id: '',
type: 'document',
domain: 'knowledge',
pattern_data: JSON.stringify({
embedding: doc.embedding,
text: doc.text,
}),
confidence: 1.0,
usage_count: 0,
success_count: 0,
created_at: Date.now(),
last_used: Date.now(),
}));
// Insert all at once (2ms for 100 docs)
for (const pattern of patterns) {
await adapter.insertPattern(pattern);
}
// Retrieve multiple queries efficiently
const queries = [queryEmbedding1, queryEmbedding2, queryEmbedding3];
// Parallel retrieval
const results = await Promise.all(
queries.map(q => adapter.retrieveWithReasoning(q, { k: 5 }))
);
// Enable automatic pattern consolidation
const result = await adapter.retrieveWithReasoning(queryEmbedding, {
domain: 'documents',
optimizeMemory: true, // Consolidate similar patterns
k: 10,
});
console.log('Optimizations:', result.optimizations);
// {
// consolidated: 15, // Merged 15 similar patterns
// pruned: 3, // Removed 3 low-quality patterns
// improved_quality: 0.12 // 12% quality improvement
// }
// Manually trigger optimization
await adapter.optimize();
// Get statistics
const stats = await adapter.getStats();
console.log('Before:', stats.totalPatterns);
console.log('After:', stats.totalPatterns); // Reduced by ~10-30%
// Prune low-confidence patterns
await adapter.prune({
minConfidence: 0.5, // Remove confidence < 0.5
minUsageCount: 2, // Remove usage_count < 2
maxAge: 30 * 24 * 3600, // Remove >30 days old
});
# Get comprehensive stats
npx agentdb@latest stats .agentdb/vectors.db
# Output:
# Total Patterns: 125,430
# Database Size: 47.2 MB (with binary quantization)
# Avg Confidence: 0.87
# Domains: 15
# Cache Hit Rate: 84%
# Index Type: HNSW
const stats = await adapter.getStats();
console.log('Performance Metrics:');
console.log('Total Patterns:', stats.totalPatterns);
console.log('Database Size:', stats.dbSize);
console.log('Avg Confidence:', stats.avgConfidence);
console.log('Cache Hit Rate:', stats.cacheHitRate);
console.log('Search Latency (avg):', stats.avgSearchLatency);
console.log('Insert Latency (avg):', stats.avgInsertLatency);
const adapter = await createAgentDBAdapter({
quantizationType: 'binary', // 32x memory reduction
cacheSize: 5000, // Large cache
hnswM: 8, // Fewer connections = faster
hnswEfSearch: 50, // Low search quality = faster
});
// Expected: <50µs search, 90-95% accuracy
const adapter = await createAgentDBAdapter({
quantizationType: 'scalar', // 4x memory reduction
cacheSize: 1000, // Standard cache
hnswM: 16, // Balanced connections
hnswEfSearch: 100, // Balanced quality
});
// Expected: <100µs search, 98-99% accuracy
const adapter = await createAgentDBAdapter({
quantizationType: 'none', // No quantization
cacheSize: 2000, // Large cache
hnswM: 32, // Many connections
hnswEfSearch: 200, // High search quality
});
// Expected: <200µs search, 100% accuracy
const adapter = await createAgentDBAdapter({
quantizationType: 'binary', // 32x memory reduction
cacheSize: 100, // Small cache
hnswM: 8, // Minimal connections
});
// Expected: <100µs search, ~10MB for 100K vectors
const adapter = await createAgentDBAdapter({
quantizationType: 'none', // Full precision
cacheSize: 500,
hnswM: 8,
});
const adapter = await createAgentDBAdapter({
quantizationType: 'scalar', // 4x reduction
cacheSize: 1000,
hnswM: 16,
});
const adapter = await createAgentDBAdapter({
quantizationType: 'binary', // 32x reduction
cacheSize: 2000,
hnswM: 32,
});
const adapter = await createAgentDBAdapter({
quantizationType: 'product', // 8-16x reduction
cacheSize: 5000,
hnswM: 48,
hnswEfConstruction: 400,
});
# Check database size
npx agentdb@latest stats .agentdb/vectors.db
# Enable quantization
# Use 'binary' for 32x reduction
// Increase cache size
const adapter = await createAgentDBAdapter({
cacheSize: 2000, // Increase from 1000
});
// Reduce search quality (faster)
const result = await adapter.retrieveWithReasoning(queryEmbedding, {
k: 5, // Reduce from 10
});
// Disable or use lighter quantization
const adapter = await createAgentDBAdapter({
quantizationType: 'scalar', // Instead of 'binary'
hnswEfSearch: 200, // Higher search quality
});
Test System: AMD Ryzen 9 5950X, 64GB RAM
| Operation | Vector Count | No Optimization | Optimized | Improvement |
|---|---|---|---|---|
| Search | 10K | 15ms | 100µs | 150x |
| Search | 100K | 150ms | 120µs | 1,250x |
| Search | 1M | 100s | 8ms | 12,500x |
| Batch Insert (100) | - | 1s | 2ms | 500x |
| Memory Usage | 1M | 3GB | 96MB | 32x (binary) |
Category: Performance / Optimization Difficulty: Intermediate Estimated Time: 20-30 minutes
AgentDB Performance Optimization operates on 3 fundamental principles:
Compress vectors by 4-32x with minimal accuracy loss (1-5%) using binary, scalar, or product quantization strategies.
In practice:
Replace O(n) linear scans with hierarchical navigable small world graphs for 150-12,500x performance improvements.
In practice:
Aggregate operations and cache frequent patterns to achieve 500x faster batch inserts and <1ms cache hits.
In practice:
| Anti-Pattern | Problem | Solution |
|---|---|---|
| Sequential Inserts | 1s for 100 vectors due to individual database writes and index updates | Use batch insert pattern: collect all patterns, insert in single transaction (2ms for 100 vectors) |
| Full Precision Everywhere | 3GB memory for 1M vectors causes OOM on mobile/edge devices | Apply binary quantization (96MB, 32x reduction) with <5% accuracy loss for memory-constrained environments |
| Ignoring Cache Tuning | Cache too small = low hit rate, too large = memory waste and eviction overhead | Set cacheSize based on workload: 100-500 (small), 500-2000 (medium), 2000-5000 (large). Monitor hit rate >80% |
AgentDB Performance Optimization transforms vector search from memory-intensive, slow operations into production-ready systems capable of handling millions of vectors with sub-millisecond latency. By applying quantization strategies tailored to your accuracy requirements, enabling HNSW indexing for logarithmic search complexity, and implementing intelligent caching and batch operations, you achieve 150-12,500x performance improvements while reducing memory footprint by 4-32x.
Use this skill when scaling to large vector datasets (>10K vectors), deploying to memory-constrained environments (mobile, edge devices), or optimizing production systems requiring <10ms p99 latency. The key insight is strategic trade-offs: quantization trades minimal accuracy for massive memory savings, HNSW trades insertion time for exponentially faster search, and caching trades memory for latency reduction. Start with balanced configurations (scalar quantization, M=16, cacheSize=1000) and tune based on benchmarks for your specific workload.
This skill should be used when the user asks to "create a hookify rule", "write a hook rule", "configure hookify", "add a hookify rule", or needs guidance on hookify rule syntax and patterns.
Create distinctive, production-grade frontend interfaces with high design quality. Use this skill when the user asks to build web components, pages, or applications. Generates creative, polished code that avoids generic AI aesthetics.