Master SQL querying, database design, and data retrieval, and become proficient with relational and modern databases
Write efficient SQL queries, design database schemas, and optimize performance for large datasets. Use this when you need to retrieve data from relational databases, join multiple tables, or analyze millions of records beyond spreadsheet limits.
The SQL & Databases Expert role equips you with the technical skills to retrieve, manipulate, and analyze data directly from databases. Unlike spreadsheets, databases handle millions of records efficiently, enable real-time data access, and support multiple concurrent users. This role bridges the gap between raw data and analysis, teaching you how data is actually stored, organized, and accessed in enterprise environments.
Why This Matters: The vast majority of enterprise data lives in databases. SQL proficiency is among the most in-demand skills in data analytics and opens doors to advanced roles including analytics engineering, data engineering, and business intelligence.
This learning journey transforms you from a spreadsheet analyst into a database-fluent professional who can query, model, and optimize data at enterprise scale.
Timeline: 12-16 weeks of focused learning | Skill Level: Intermediate Developer
-- SQL Statement Structure
SELECT column1, column2 FROM table_name WHERE condition ORDER BY column1;
-- Basic Data Types
INT / BIGINT -- Integer numbers
DECIMAL(10,2) -- Numbers with decimals
VARCHAR(255) -- Text, variable length
CHAR(10) -- Text, fixed length
DATE -- Date only (YYYY-MM-DD)
DATETIME/TIMESTAMP -- Date and time
BOOLEAN -- TRUE/FALSE
-- Database Creation Example
CREATE DATABASE company_analytics;
USE company_analytics;
CREATE TABLE employees (
employee_id INT PRIMARY KEY AUTO_INCREMENT,
first_name VARCHAR(50) NOT NULL,
last_name VARCHAR(50) NOT NULL,
department VARCHAR(50),
salary DECIMAL(10,2),
hire_date DATE,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
-- Basic SELECT
SELECT employee_id, first_name, last_name FROM employees;
-- Select all columns
SELECT * FROM employees;
-- WHERE conditions
SELECT * FROM employees WHERE department = 'Sales';
SELECT * FROM employees WHERE salary > 50000;
SELECT * FROM employees WHERE hire_date >= '2023-01-01';
-- Multiple conditions
SELECT * FROM employees
WHERE department = 'Sales' AND salary > 50000;
SELECT * FROM employees
WHERE department = 'Sales' OR department = 'Marketing';
SELECT * FROM employees
WHERE salary BETWEEN 50000 AND 100000;
SELECT * FROM employees
WHERE first_name LIKE 'J%'; -- Names starting with J
SELECT * FROM employees
WHERE department IS NULL; -- Find missing values
SELECT * FROM employees
WHERE employee_id IN (1, 5, 10);
-- ORDER BY
SELECT * FROM employees ORDER BY salary DESC; -- Highest to lowest
SELECT * FROM employees ORDER BY last_name ASC, first_name ASC;
-- LIMIT results
SELECT * FROM employees LIMIT 10; -- First 10 records
SELECT * FROM employees LIMIT 10 OFFSET 20; -- Skip 20, get next 10 (pagination)
-- Combining all
SELECT employee_id, first_name, salary
FROM employees
WHERE department = 'Sales'
ORDER BY salary DESC
LIMIT 5; -- Top 5 sales people by salary
-- Aggregate Functions
SELECT COUNT(*) FROM employees; -- Number of records
SELECT COUNT(DISTINCT department) FROM employees; -- Unique values
SELECT SUM(salary) FROM employees;
SELECT AVG(salary) FROM employees;
SELECT MIN(salary) FROM employees;
SELECT MAX(salary) FROM employees;
-- GROUP BY
SELECT department, COUNT(*) as employee_count
FROM employees
GROUP BY department;
SELECT department, AVG(salary) as avg_salary
FROM employees
GROUP BY department
ORDER BY avg_salary DESC;
-- HAVING (filter groups after aggregation)
SELECT department, COUNT(*) as count
FROM employees
GROUP BY department
HAVING COUNT(*) > 5; -- Departments with more than 5 employees
-- String Functions
SELECT UPPER(first_name) FROM employees;
SELECT LOWER(last_name) FROM employees;
SELECT CONCAT(first_name, ' ', last_name) as full_name FROM employees;
SELECT SUBSTRING(first_name, 1, 3) FROM employees; -- First 3 characters
SELECT LENGTH(first_name) FROM employees;
SELECT TRIM(department) FROM employees; -- Remove spaces
-- Date Functions
SELECT CURDATE(); -- Today's date
SELECT NOW(); -- Current date and time
SELECT DATE_FORMAT(hire_date, '%Y-%m-%d') FROM employees;
SELECT YEAR(hire_date) FROM employees;
SELECT MONTH(hire_date) FROM employees;
SELECT DAY(hire_date) FROM employees;
SELECT DATEDIFF(CURDATE(), hire_date) as days_employed FROM employees;
-- Approximate years employed (YEAR subtraction ignores month and day)
SELECT
first_name,
hire_date,
YEAR(CURDATE()) - YEAR(hire_date) as years_employed
FROM employees;
Database Schema Example:
EMPLOYEES Table:
├── employee_id (Primary Key)
├── first_name
├── last_name
├── department_id (Foreign Key -> DEPARTMENTS)
└── salary
DEPARTMENTS Table:
├── department_id (Primary Key)
├── department_name
└── budget
PROJECTS Table:
├── project_id (Primary Key)
├── project_name
├── department_id (Foreign Key -> DEPARTMENTS)
└── budget
ASSIGNMENTS Table:
├── assignment_id (Primary Key)
├── employee_id (Foreign Key -> EMPLOYEES)
├── project_id (Foreign Key -> PROJECTS)
└── hours_allocated
-- INNER JOIN (most common)
-- Returns only matching records from both tables
SELECT
e.first_name,
e.last_name,
d.department_name,
e.salary
FROM employees e
INNER JOIN departments d ON e.department_id = d.department_id;
-- LEFT JOIN (keep all from left table)
-- Keep all employees even if no department assigned
SELECT
e.first_name,
d.department_name
FROM employees e
LEFT JOIN departments d ON e.department_id = d.department_id;
-- RIGHT JOIN (keep all from right table)
SELECT
e.first_name,
d.department_name
FROM employees e
RIGHT JOIN departments d ON e.department_id = d.department_id;
-- FULL OUTER JOIN (all records from both; PostgreSQL/SQL Server.
-- MySQL lacks it: emulate with a UNION of LEFT and RIGHT JOINs)
SELECT
e.first_name,
d.department_name
FROM employees e
FULL OUTER JOIN departments d ON e.department_id = d.department_id;
-- CROSS JOIN (Cartesian product)
SELECT COUNT(*) FROM employees CROSS JOIN departments; -- Every employee x every department
-- Multiple JOINs
SELECT
e.first_name,
d.department_name,
p.project_name,
a.hours_allocated
FROM employees e
INNER JOIN departments d ON e.department_id = d.department_id
INNER JOIN assignments a ON e.employee_id = a.employee_id
INNER JOIN projects p ON a.project_id = p.project_id;
-- Subquery in WHERE clause
SELECT * FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees);
-- Find all employees earning more than average
-- Subquery in SELECT clause
SELECT
department_id,
(SELECT COUNT(*) FROM employees e2 WHERE e2.department_id = e1.department_id) as emp_count
FROM employees e1;
-- Common Table Expression (CTE) - more readable
WITH avg_salary AS (
SELECT AVG(salary) as avg_sal FROM employees
)
SELECT * FROM employees, avg_salary
WHERE salary > avg_salary.avg_sal;
-- Multiple CTEs
WITH department_stats AS (
SELECT
department_id,
COUNT(*) as emp_count,
AVG(salary) as avg_salary
FROM employees
GROUP BY department_id
),
high_salary_depts AS (
SELECT * FROM department_stats
WHERE avg_salary > 75000
)
SELECT * FROM high_salary_depts;
-- GROUP BY with multiple columns
SELECT
department_id,
YEAR(hire_date) as hire_year,
COUNT(*) as count,
AVG(salary) as avg_salary
FROM employees
GROUP BY department_id, YEAR(hire_date)
ORDER BY department_id, hire_year;
-- HAVING clause for group filtering
SELECT
department_id,
COUNT(*) as emp_count,
AVG(salary) as avg_salary
FROM employees
GROUP BY department_id
HAVING COUNT(*) > 5 AND AVG(salary) > 60000
ORDER BY emp_count DESC;
-- Window Functions (available in MySQL 8.0+, PostgreSQL, SQL Server)
SELECT
first_name,
salary,
department_id,
ROW_NUMBER() OVER (PARTITION BY department_id ORDER BY salary DESC) as rank_in_dept,
RANK() OVER (PARTITION BY department_id ORDER BY salary DESC) as salary_rank, -- RANK() leaves gaps after ties; DENSE_RANK() does not
AVG(salary) OVER (PARTITION BY department_id) as dept_avg_salary
FROM employees;
-- Running totals
SELECT
employee_id,
salary,
SUM(salary) OVER (ORDER BY employee_id) as running_total
FROM employees;
-- UNION (removes duplicates)
SELECT first_name, 'Employee' as type FROM employees
UNION
SELECT customer_name, 'Customer' as type FROM customers;
-- UNION ALL (keeps duplicates)
SELECT salary FROM employees
UNION ALL
SELECT contract_rate FROM contractors;
-- INTERSECT (common values; PostgreSQL, SQL Server, MySQL 8.0.31+)
SELECT employee_id FROM employees
INTERSECT
SELECT employee_id FROM project_assignments;
-- EXCEPT (in first but not second; same dialect support as INTERSECT)
SELECT employee_id FROM employees
EXCEPT
SELECT employee_id FROM project_assignments;
-- View execution plan (varies by database)
EXPLAIN SELECT * FROM employees WHERE salary > 100000;
-- More detailed output (MySQL 8.0+; EXPLAIN EXTENDED is deprecated)
EXPLAIN FORMAT=JSON SELECT * FROM employees WHERE salary > 100000;
-- PostgreSQL EXPLAIN ANALYZE
EXPLAIN ANALYZE SELECT * FROM employees WHERE salary > 100000;
-- Key metrics to understand:
-- - Query type (SIMPLE, PRIMARY, SUBQUERY, UNION)
-- - Table access method (seq scan vs index scan)
-- - Rows examined vs rows returned
-- - Filter cost
-- Create index
CREATE INDEX idx_department ON employees(department_id);
CREATE INDEX idx_salary ON employees(salary);
-- Composite index (for multiple conditions)
CREATE INDEX idx_dept_salary ON employees(department_id, salary);
-- When to use indexes:
✓ Columns used in WHERE clause frequently
✓ Columns used in JOIN conditions
✓ Columns used in ORDER BY
✓ Columns used for filtering in large tables
✗ Columns with low selectivity (few unique values)
✗ Small tables (full table scan faster)
✗ Columns with many NULL values
✗ Columns modified frequently
-- View indexes
SHOW INDEX FROM employees; -- MySQL (SHOW INDEXES is a synonym)
-- Drop index
DROP INDEX idx_salary ON employees;
-- BAD: SELECT * (retrieves unnecessary columns)
SELECT * FROM employees WHERE department_id = 1;
-- GOOD: Select only needed columns
SELECT employee_id, first_name, salary FROM employees WHERE department_id = 1;
-- BAD: Multiple subqueries
SELECT * FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees)
AND department_id IN (SELECT department_id FROM departments WHERE budget > 1000000);
-- GOOD: Use CTE instead
WITH dept_budgets AS (
SELECT department_id FROM departments WHERE budget > 1000000
),
avg_salary AS (
SELECT AVG(salary) as avg_sal FROM employees
)
SELECT e.* FROM employees e, avg_salary
WHERE e.salary > avg_salary.avg_sal
AND e.department_id IN (SELECT department_id FROM dept_budgets);
-- BAD: Function in WHERE clause (prevents index use)
SELECT * FROM employees WHERE YEAR(hire_date) = 2023;
-- GOOD: Date range instead
SELECT * FROM employees
WHERE hire_date >= '2023-01-01' AND hire_date < '2024-01-01';
-- BAD: LIKE with leading wildcard (slow)
SELECT * FROM employees WHERE first_name LIKE '%john%';
-- BETTER: Use FULLTEXT search or limit wildcard position
SELECT * FROM employees WHERE first_name LIKE 'john%';
-- Use appropriate data types
✓ INT for IDs (not VARCHAR)
✓ DECIMAL(10,2) for money (not FLOAT)
✓ DATE for dates (not VARCHAR)
✓ SMALLINT for small ranges
✓ VARCHAR(n) with appropriate length
-- Avoid data type conversions
-- BAD: Implicit conversion wastes CPU
SELECT * FROM employees WHERE employee_id = '123'; -- String to INT
-- GOOD: Explicit matching types
SELECT * FROM employees WHERE employee_id = 123;
-- Denormalization for analytics (only after other optimizations are exhausted)
-- Sometimes calculated fields speed up analysis
ALTER TABLE employees ADD COLUMN dept_name VARCHAR(50);
UPDATE employees e
SET dept_name = (SELECT department_name FROM departments d WHERE d.department_id = e.department_id);
Normalization Levels:
0NF (Unnormalized): Repeating groups in table
EMPLOYEES table with multiple phone numbers in same row
1NF (First Normal Form): Remove repeating groups
✓ Atomic values only
✓ No repeating columns
✓ Unique row identifier (Primary Key)
EMPLOYEES Table:
├── employee_id (PK)
├── first_name
└── last_name
PHONE_NUMBERS Table:
├── phone_id (PK)
├── employee_id (FK)
└── phone_number
2NF (Second Normal Form): Remove partial dependencies
✓ 1NF requirements
✓ Non-key attributes depend on entire primary key
✓ Separate tables for composite key dependencies
3NF (Third Normal Form): Remove transitive dependencies
✓ 2NF requirements
✓ Non-key attributes depend only on primary key
✓ No dependencies between non-key attributes
BCNF (Boyce-Codd): Every determinant is a candidate key
(Most data should be 3NF; BCNF for complex scenarios)
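As a concrete sketch, here is a transitive dependency being removed to reach 3NF (the table and column names are illustrative):

```sql
-- Before 3NF: department_name depends on department_id, not on the
-- key employee_id (a transitive dependency)
-- employees(employee_id, first_name, department_id, department_name)

-- After 3NF: department attributes move to their own table
CREATE TABLE departments (
    department_id INT PRIMARY KEY,
    department_name VARCHAR(50) NOT NULL
);

CREATE TABLE employees (
    employee_id INT PRIMARY KEY,
    first_name VARCHAR(50) NOT NULL,
    department_id INT,
    FOREIGN KEY (department_id) REFERENCES departments(department_id)
);
```

Each department name is now stored exactly once, so renaming a department is a single-row update instead of a scan across every employee.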
Analytical Database Design:
FACTS DIMENSIONS
sales_fact_table date_dimension
├── fact_id (PK) ←→ ├── date_id
├── date_id (FK) ├── date
├── product_id (FK) ├── year
├── customer_id (FK) ├── quarter
├── amount ├── month
└── quantity └── day_name
product_dimension
├── product_id (PK)
├── product_name
├── category
└── price
customer_dimension
├── customer_id (PK)
├── customer_name
├── segment
└── region
Benefits:
✓ Optimized for analytical queries
✓ Easy to understand and maintain
✓ Fast aggregations
✓ Handles slowly changing dimensions
-- Date Dimension (common in all data warehouses)
CREATE TABLE date_dimension (
date_id INT PRIMARY KEY,
date DATE UNIQUE,
year INT,
quarter INT,
month INT,
day INT,
day_name VARCHAR(20),
month_name VARCHAR(20),
is_weekend BOOLEAN,
fiscal_year INT
);
-- Product Dimension
CREATE TABLE product_dimension (
product_id INT PRIMARY KEY,
product_code VARCHAR(50) UNIQUE,
product_name VARCHAR(255),
category VARCHAR(100),
subcategory VARCHAR(100),
price DECIMAL(10,2),
cost DECIMAL(10,2),
supplier_id INT,
is_active BOOLEAN,
created_date DATE,
updated_date DATE
);
-- Sales Fact Table
CREATE TABLE sales_fact (
fact_id BIGINT PRIMARY KEY AUTO_INCREMENT,
date_id INT,
product_id INT,
customer_id INT,
store_id INT,
quantity INT,
unit_price DECIMAL(10,2),
amount DECIMAL(12,2),
discount_amount DECIMAL(10,2),
net_amount DECIMAL(12,2),
created_timestamp TIMESTAMP,
FOREIGN KEY (date_id) REFERENCES date_dimension(date_id),
FOREIGN KEY (product_id) REFERENCES product_dimension(product_id),
FOREIGN KEY (customer_id) REFERENCES customer_dimension(customer_id),
FOREIGN KEY (store_id) REFERENCES store_dimension(store_id)
);
-- Create indexes on fact table for analytical queries
CREATE INDEX idx_fact_date ON sales_fact(date_id);
CREATE INDEX idx_fact_product ON sales_fact(product_id);
CREATE INDEX idx_fact_customer ON sales_fact(customer_id);
CREATE INDEX idx_fact_store ON sales_fact(store_id);
-- Type 1: Overwrite (lose history)
UPDATE product_dimension
SET category = 'Electronics'
WHERE product_id = 100;
-- Type 2: Add new row (keep history)
-- Assumes the dimension has a surrogate key plus effective_date/end_date
-- columns; values below are illustrative
UPDATE product_dimension
SET end_date = '2024-01-01'
WHERE product_id = 100 AND end_date = '9999-12-31';
INSERT INTO product_dimension
    (product_id, product_name, category, effective_date, end_date)
VALUES
    (100, 'Widget Pro', 'Electronics', '2024-01-01', '9999-12-31');
-- Type 3: Add column for previous value
ALTER TABLE product_dimension ADD COLUMN previous_category VARCHAR(100);
UPDATE product_dimension SET previous_category = category WHERE product_id = 100;
UPDATE product_dimension SET category = 'Electronics' WHERE product_id = 100;
-- Sales by product and month
SELECT
d.year,
d.month_name,
p.product_name,
SUM(sf.quantity) as total_quantity,
SUM(sf.amount) as total_amount
FROM sales_fact sf
JOIN date_dimension d ON sf.date_id = d.date_id
JOIN product_dimension p ON sf.product_id = p.product_id
GROUP BY d.year, d.month, d.month_name, p.product_name
ORDER BY d.year, d.month, p.product_name;
-- Sales trend over time
SELECT
d.date,
SUM(sf.amount) as daily_sales,
AVG(SUM(sf.amount)) OVER (ORDER BY d.date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) as avg_7day
FROM sales_fact sf
JOIN date_dimension d ON sf.date_id = d.date_id
GROUP BY d.date
ORDER BY d.date;
-- Top products by revenue
SELECT
p.product_name,
p.category,
SUM(sf.amount) as total_revenue,
SUM(sf.quantity) as total_units,
COUNT(DISTINCT sf.customer_id) as unique_customers
FROM sales_fact sf
JOIN product_dimension p ON sf.product_id = p.product_id
GROUP BY p.product_id, p.product_name, p.category
ORDER BY total_revenue DESC
LIMIT 10;
Scenario: Build analytics database from transactional e-commerce system.
Objectives:
Deliverables:
Skills Applied: Schema design, normalization, star schemas, ETL, query optimization
Scenario: Create data warehouse for HR analytics across company.
Objectives:
Deliverables:
Skills Applied: Multi-source integration, dimensional modeling, slowly changing dimensions, data governance
Scenario: Optimize poorly performing analytics database.
Objectives:
Deliverables:
Skills Applied: Query optimization, indexing, execution plan analysis, performance monitoring
Months 1-3: Basic Competency
├── Write simple SELECT queries
├── Understand JOIN basics
├── Work with single tables effectively
└── Understand basic database concepts
Months 4-6: Intermediate Competency
├── Master complex JOINs and subqueries
├── Optimize simple queries
├── Design basic schemas
├── Work with multiple tables confidently
└── Understand indexes and performance
Months 7-10: Advanced Competency
├── Design normalized schemas
├── Optimize complex queries
├── Understand star schemas
├── Build data warehouses
└── Work with multiple database systems
Months 11-16: Expert Competency
├── Lead database design initiatives
├── Architect enterprise data systems
├── Mentor others on best practices
├── Implement advanced optimization
└── Ready for engineering roles
Entry Level (0-2 years): $60,000 - $85,000
Mid Level (2-5 years): $85,000 - $115,000
Advanced (5+ years): $115,000 - $150,000
Senior/Lead (8+ years): $150,000 - $200,000+
-- Use consistent formatting
-- Capitalize keywords
-- Indent sub-clauses
-- Include comments for complex logic
SELECT
e.employee_id,
e.first_name,
d.department_name,
-- Calculate tenure in years
YEAR(CURDATE()) - YEAR(e.hire_date) as tenure_years
FROM employees e
LEFT JOIN departments d ON e.department_id = d.department_id
WHERE e.salary > 75000
ORDER BY e.last_name;
Security Principles:
├── Principle of Least Privilege
│ └── Users get minimum permissions needed
├── Role-Based Access Control
│ └── Assign permissions by role, not individual
├── Encryption
│ └── Encrypt sensitive data in transit and at rest
├── Audit Logging
│ └── Track who accessed what and when
├── Regular Backups
│ └── Test restore procedures regularly
└── Access Review
└── Quarterly review of permissions
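The first two principles translate directly into SQL. A minimal sketch using MySQL 8.0+ role syntax (the role, user, and database names are examples):

```sql
-- Create a role and grant it the minimum permissions needed
CREATE ROLE analyst_read;
GRANT SELECT ON company_analytics.* TO analyst_read;

-- Assign the role to a user instead of granting privileges individually
CREATE USER 'jsmith'@'%' IDENTIFIED BY 'use_a_strong_password';
GRANT analyst_read TO 'jsmith'@'%';
SET DEFAULT ROLE analyst_read TO 'jsmith'@'%';

-- For quarterly access reviews: inspect what a user can do
SHOW GRANTS FOR 'jsmith'@'%';
```

Granting through roles means an access review only has to audit a handful of roles rather than every individual user.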
Regular Maintenance Tasks:
Daily:
- Monitor query performance
- Check error logs
- Verify backups completed
Weekly:
- Update table statistics
- Review slow query logs
- Check disk space
Monthly:
- Analyze index usage
- Rebuild fragmented indexes
- Review security access
- Performance baseline comparison
Quarterly:
- Full backup verification
- Disaster recovery testing
- Capacity planning
- Schema optimization review
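Several of these tasks map directly to SQL commands. A MySQL-flavored sketch (PostgreSQL has ANALYZE and VACUUM equivalents):

```sql
-- Weekly: refresh optimizer statistics so execution plans stay accurate
ANALYZE TABLE employees;
ANALYZE TABLE sales_fact;

-- Monthly: rebuild fragmented tables and their indexes (InnoDB)
OPTIMIZE TABLE sales_fact;

-- Monthly: check index usage before dropping unused indexes
-- (MySQL sys schema; PostgreSQL equivalent: pg_stat_user_indexes)
SELECT * FROM sys.schema_unused_indexes;
```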
Directory Structure:
├── /schemas/
│ ├── 01_initial_schema.sql
│ └── 02_add_new_tables.sql
├── /migrations/
│ ├── 001_create_employees.sql
│ └── 002_add_indexes.sql
├── /procedures/
│ └── sp_daily_reporting.sql
├── /views/
│ └── vw_sales_summary.sql
└── README.md
Use version control:
- Git for SQL scripts
- Migration frameworks (Flyway, Liquibase)
- Deployment automation
- Rollback procedures
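A file under /migrations/ might look like the sketch below (the filename and column are illustrative); keeping the rollback alongside the forward change makes the procedure repeatable:

```sql
-- migrations/003_add_employee_email.sql
-- Forward migration
ALTER TABLE employees ADD COLUMN email VARCHAR(255);
CREATE UNIQUE INDEX idx_employee_email ON employees(email);

-- Rollback (run only to undo this migration)
-- DROP INDEX idx_employee_email ON employees;
-- ALTER TABLE employees DROP COLUMN email;
```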
Document:
1. Purpose of each table
2. Business rules and logic
3. Expected data ranges
4. Update frequency
5. Data refresh schedule
6. Access requirements
7. Known limitations
8. Contact person for questions
Template:
TABLE: sales_fact
Purpose: Records individual sales transactions
Source: Point of sale system, synced nightly
Refresh: Nightly at 2 AM
Key Fields:
- date_id: Reference to date dimension
- product_id: Reference to product dimension
- amount: Gross sales amount
Owner: Analytics Team
Before deploying to production:
1. Syntax check
- Query runs without errors
- Schema changes valid
2. Logic validation
- Results match expected output
- Edge cases handled
3. Performance testing
- Execution plan reviewed
- Acceptable performance
4. Data quality check
- Row counts match expectations
- No unexpected NULL values
- Referential integrity maintained
5. Backup & rollback
- Backup created before change
- Rollback procedure documented
- Rollback tested (if possible)
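The data quality step can be scripted as a handful of queries against the warehouse tables defined earlier (the date value is an example):

```sql
-- Row counts match expectations for the latest load
SELECT COUNT(*) AS rows_loaded
FROM sales_fact
WHERE date_id = 20240101;

-- No unexpected NULLs in key columns
SELECT COUNT(*) AS null_keys
FROM sales_fact
WHERE product_id IS NULL OR customer_id IS NULL OR date_id IS NULL;

-- Referential integrity: fact rows with no matching dimension row
SELECT sf.fact_id
FROM sales_fact sf
LEFT JOIN product_dimension p ON sf.product_id = p.product_id
WHERE p.product_id IS NULL;
```

All three queries should return zero problem rows (or the expected count) before a change is signed off.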
Set up learning environment
Establish daily practice
Understand your company's databases
Master all JOIN types
Optimize queries
Design first schema
Prepare for advanced roles
Lead database initiatives
Current Role: SQL & Databases Expert ✓ (You are here)
↓
Option A: Deepen SQL expertise
↓
Option B: Move to Phase 3 - Statistics Specialist
↓
Option C: Move to Phase 5 - Programming Expert
↓
Multiple Advanced Roles (4, 6)
↓
Career Leadership Roles (7 - Career Coach)
As a SQL & Databases Expert, you'll find that SQL expertise opens doors to advanced analytics roles and makes you valuable in any data organization.
Q: How long should I spend learning SQL before moving to advanced roles? A: Plan for 12-16 weeks of focused learning at minimum; many analysts benefit from up to six months of practice.
Q: Which database should I learn first? A: Start with MySQL or PostgreSQL (open source, widely used). SQL concepts transfer between systems.
Q: Should I memorize all SQL functions? A: No. Learn common functions well, know reference resources for others.
Q: How do I know a query needs optimization? A: If execution takes more than a couple of seconds, review the execution plan and check whether an index could help.
Q: What's the difference between HAVING and WHERE? A: WHERE filters before grouping. HAVING filters after grouping (use HAVING with GROUP BY).
Q: Can I optimize a query that takes 5 minutes? A: Usually yes. Start with execution plan analysis. Check for missing indexes, unnecessary columns, or redesign opportunities.
Last Updated: November 2024 Difficulty Level: Intermediate Estimated Time to Completion: 12-16 weeks