Optimize SQL/JPA query performance
Analyzes database queries and suggests performance optimizations with measurable improvements.
/plugin marketplace add xinqilin/claude-dev-toolkit-marketplace/plugin install bill-java-developer@bill-lin-dev-toolkitfile-or-querysonnetHelp optimize database queries (SQL/JPA) for better performance in production environments. Focus on real-world performance bottlenecks with measurable improvements.
Gather baseline metrics:
-- Execution time
EXPLAIN ANALYZE SELECT ...;
-- Row counts
SELECT COUNT(*) FROM orders;
SELECT COUNT(*) FROM order_items;
-- Table size
SELECT
table_name,
table_rows,
data_length / 1024 / 1024 AS size_mb
FROM information_schema.tables
WHERE table_schema = 'your_database';
Required Information:
Symptom: Multiple queries when one would suffice
// Bad: N+1 Problem (1 + N queries)
List<Order> orders = orderRepository.findAll(); // 1 query
for (Order order : orders) {
List<OrderItem> items = itemRepository.findByOrderId(order.getId()); // N queries
order.setItems(items);
}
Impact: If you have 1000 orders, this executes 1001 queries!
Solution:
// Good: Single query with JOIN FETCH
@Query("SELECT DISTINCT o FROM Order o LEFT JOIN FETCH o.items")
List<Order> findAllWithItems();
// Alternative: Batch loading
@EntityGraph(attributePaths = {"items"})
List<Order> findAll();
Performance Gain: 1001 queries → 1 query (99.9% reduction)
Symptom: Full table scan on WHERE clause
-- EXPLAIN output shows: type=ALL, rows=5000000
EXPLAIN SELECT * FROM orders WHERE customer_id = 12345;
Solution:
-- Create index on commonly queried column
CREATE INDEX idx_orders_customer_id ON orders(customer_id);
-- Verify index usage
EXPLAIN SELECT * FROM orders WHERE customer_id = 12345;
-- Should now show: type=ref, key=idx_orders_customer_id, rows=~50
Performance Gain: Full scan of 5M rows → Index lookup of ~50 rows
// Bad: OFFSET grows, performance degrades linearly
@Query("SELECT o FROM Order o ORDER BY o.createdAt DESC")
Page<Order> findAll(Pageable pageable); // OFFSET 10000 LIMIT 20
Problem: Database must scan 10,020 rows to return 20 rows.
Solution: Keyset pagination
// Good: Use last seen ID
@Query("SELECT o FROM Order o WHERE o.id < :lastId ORDER BY o.id DESC")
List<Order> findNextPage(@Param("lastId") Long lastId, Pageable pageable);
// Bad: Fetching unnecessary data
@Query("SELECT o FROM Order o WHERE o.status = :status")
List<Order> findByStatus(String status); // Loads all columns
// Good: Projection for specific fields
@Query("SELECT new com.example.dto.OrderSummary(o.id, o.totalAmount, o.status) " +
"FROM Order o WHERE o.status = :status")
List<OrderSummary> findSummaryByStatus(String status);
-- Bad: Cartesian product risk
SELECT *
FROM orders o
JOIN order_items oi
WHERE o.customer_id = 123; -- Missing join condition!
-- Good: Proper join condition
SELECT o.*, oi.*
FROM orders o
JOIN order_items oi ON o.id = oi.order_id
WHERE o.customer_id = 123;
Create index when:
-- High selectivity: Good candidate
SELECT * FROM users WHERE email = 'user@example.com'; -- Returns 1 row
-- Low selectivity: Poor candidate
SELECT * FROM users WHERE is_active = true; -- Returns 90% of rows
Rule: Most selective column first, then by query frequency
-- Query pattern
WHERE status = 'PENDING' AND customer_id = 123 AND created_at > '2024-01-01'
-- Analyze selectivity
SELECT
COUNT(DISTINCT status) as status_values, -- Result: 5
COUNT(DISTINCT customer_id) as customers, -- Result: 50000
COUNT(DISTINCT DATE(created_at)) as dates -- Result: 365
FROM orders;
-- Optimal index: customer_id (most selective) first
CREATE INDEX idx_orders_lookup ON orders(customer_id, status, created_at);
Index contains all columns needed by query:
-- Query needs: id, customer_id, total_amount
SELECT id, total_amount
FROM orders
WHERE customer_id = 123;
-- Covering index (no table access needed)
CREATE INDEX idx_orders_covering ON orders(customer_id, id, total_amount);
Benefit: Index-only scan - no table access needed
// Default LAZY loading causes N+1
@Entity
public class Order {
@OneToMany(mappedBy = "order") // Default: LAZY
private List<OrderItem> items;
}
// Solution 1: EAGER (use cautiously)
@OneToMany(mappedBy = "order", fetch = FetchType.EAGER)
private List<OrderItem> items;
// Solution 2: Query-specific fetch (preferred)
@Query("SELECT o FROM Order o LEFT JOIN FETCH o.items WHERE o.id = :id")
Optional<Order> findByIdWithItems(@Param("id") Long id);
// Solution 3: EntityGraph
@EntityGraph(attributePaths = {"items", "customer"})
Optional<Order> findById(Long id);
// Configure in application.properties
spring.jpa.properties.hibernate.default_batch_fetch_size=10
// Or per-entity
@Entity
@BatchSize(size = 10)
public class Order {
@OneToMany
@BatchSize(size = 10)
private List<OrderItem> items;
}
Impact: Fetches associations in batches instead of one-by-one
// Read-only query optimization
@QueryHints(@QueryHint(name = "org.hibernate.readOnly", value = "true"))
List<Order> findByStatus(String status);
// Cache query results
@QueryHints(@QueryHint(name = "org.hibernate.cacheable", value = "true"))
List<Product> findAllProducts();
// Before: Join on every query
SELECT o.id, c.name, c.email
FROM orders o
JOIN customers c ON o.customer_id = c.id;
// After: Denormalize frequently accessed data
@Entity
public class Order {
private Long customerId;
private String customerName; // Denormalized
private String customerEmail; // Denormalized
}
Trade-off: Faster reads, slower writes, data duplication
MySQL:
-- InnoDB buffer pool sizing
SET GLOBAL innodb_buffer_pool_size = 4G;
-- Query cache (MySQL 5.7 and earlier)
SET GLOBAL query_cache_size = 256M;
SET GLOBAL query_cache_type = 1;
PostgreSQL:
-- Shared buffers
shared_buffers = 4GB
-- Work memory for sorts
work_mem = 64MB
-- Expensive aggregation query
SELECT
customer_id,
COUNT(*) as order_count,
SUM(total_amount) as total_spent
FROM orders
GROUP BY customer_id;
-- Create materialized view
CREATE MATERIALIZED VIEW customer_stats AS
SELECT
customer_id,
COUNT(*) as order_count,
SUM(total_amount) as total_spent
FROM orders
GROUP BY customer_id;
-- Refresh periodically
REFRESH MATERIALIZED VIEW customer_stats;
For each optimization, provide:
Execution Time: 3.5 seconds
Rows Examined: 5,000,000
Rows Returned: 150
Index Used: NONE
Problem: Full table scan on orders table
The query uses WHERE customer_id = 123 but there's no index
on customer_id column. MySQL performs a full table scan,
examining all 5M rows to find matching records.
EXPLAIN output:
type: ALL
rows: 5000000
Extra: Using where
SQL Changes:
CREATE INDEX idx_orders_customer_id ON orders(customer_id);
JPA Changes:
@Entity
@Table(name = "orders", indexes = {
@Index(name = "idx_orders_customer_id", columnList = "customer_id")
})
public class Order {
// entity fields
}
Execution Time: 3.5s → 45ms (98.7% faster)
Rows Examined: 5,000,000 → 150 (99.997% reduction)
Index Used: idx_orders_customer_id
EXPLAIN output after:
type: ref
key: idx_orders_customer_id
rows: 150
-- Test query performance
SET profiling = 1;
SELECT * FROM orders WHERE customer_id = 123;
SHOW PROFILES;
-- Verify index usage
EXPLAIN SELECT * FROM orders WHERE customer_id = 123;
-- Check index statistics
SHOW INDEX FROM orders;
Index Overhead:
Monitoring:
-- Check index usage over time
SELECT
index_name,
rows_read,
rows_read / (SELECT SUM(rows_read) FROM sys.schema_index_statistics) * 100 as usage_pct
FROM sys.schema_index_statistics
WHERE table_name = 'orders';
-- Bad: Prevents index usage
SELECT * FROM orders WHERE YEAR(created_at) = 2024;
-- Good: Index-friendly
SELECT * FROM orders
WHERE created_at >= '2024-01-01' AND created_at < '2025-01-01';
-- Bad: Can't use composite index efficiently
SELECT * FROM orders WHERE customer_id = 123 OR status = 'PENDING';
-- Good: Use UNION
SELECT * FROM orders WHERE customer_id = 123
UNION
SELECT * FROM orders WHERE status = 'PENDING';
-- Bad: customer_id is INT, but querying with STRING
SELECT * FROM orders WHERE customer_id = '123'; -- Index not used!
-- Good: Match column type
SELECT * FROM orders WHERE customer_id = 123;
Before optimizing:
After optimizing:
IMPORTANT: All output must be in Traditional Chinese (繁體中文)