Define data quality rules, profiling strategies, validation frameworks, and quality metrics.
Define data quality rules, profiling strategies, validation frameworks, and quality metrics. Use when planning data quality checks, creating validation rules, or setting up monitoring for data pipelines.
/plugin marketplace add melodic-software/claude-code-plugins/plugin install data-architecture@melodic-softwareThis skill is limited to using the following tools:
Use this skill when:
Data quality planning establishes rules, processes, and metrics to ensure data is fit for its intended purpose. Quality dimensions help categorize and measure data health.
| Dimension | Definition | Example Metric |
|---|---|---|
| Accuracy | Data correctly represents reality | % records matching source of truth |
| Completeness | All required data is present | % non-null required fields |
| Consistency | Data agrees across systems | % matching cross-system values |
| Timeliness | Data is available when needed | Avg latency from source to target |
| Uniqueness | No duplicate records | % unique on key columns |
| Validity | Data conforms to rules | % records passing validation |
# Quality Assessment: Customer Domain
| Dimension | Weight | Target | Current | Gap |
|-----------|--------|--------|---------|-----|
| Accuracy | 25% | 99% | 97% | -2% |
| Completeness | 25% | 98% | 95% | -3% |
| Consistency | 20% | 99% | 94% | -5% |
| Timeliness | 10% | 99% | 99% | 0% |
| Uniqueness | 15% | 100% | 98% | -2% |
| Validity | 5% | 99% | 96% | -3% |
| **Overall** | 100% | 98.6% | 96.2% | -2.4% |
| Type | Purpose | Output |
|---|---|---|
| Column | Understand data distribution | Min, Max, Null %, Distinct count |
| Cross-column | Find relationships | Correlations, functional dependencies |
| Cross-table | Validate referential integrity | Orphan records, FK violations |
| Cross-system | Compare across sources | Discrepancies, sync issues |
# Column Profile: customers.email
## Statistics
| Metric | Value |
|--------|-------|
| Total Rows | 1,250,000 |
| Distinct Values | 1,180,000 |
| Null Count | 12,500 (1%) |
| Empty String | 2,340 (0.2%) |
| Min Length | 5 |
| Max Length | 254 |
| Avg Length | 24.3 |
## Pattern Analysis
| Pattern | Count | Example |
|---------|-------|---------|
| \w+@\w+\.\w+ | 1,185,000 | user@domain.com |
| NULL | 12,500 | NULL |
| Invalid format | 50,160 | user@domain, @domain.com |
## Value Distribution
| Domain | Count | % |
|--------|-------|---|
| gmail.com | 312,000 | 26% |
| outlook.com | 187,500 | 15% |
| company.com | 125,000 | 10% |
| Other | 612,500 | 49% |
## Issues Found
- 4% invalid email format
- 1% null values (required field)
- 5.6% duplicate emails
| Category | Description | Example |
|---|---|---|
| Format | Pattern matching | Email regex, phone format |
| Range | Value boundaries | Age 0-120, price > 0 |
| Referential | FK constraints | Order.customer_id exists |
| Business | Domain logic | Discount <= 50% |
| Cross-field | Field relationships | End date >= Start date |
| Aggregate | Group-level | Daily sales > $1000 |
# Validation Rule: VR-CUST-001
## Metadata
- Name: Valid Email Format
- Domain: Customer
- Severity: Error
- Owner: Customer Data Steward
## Rule Definition
- Field: email
- Type: Format
- Pattern: ^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$
## Threshold
- Error: < 99% compliance
- Warning: < 99.5% compliance
- Target: 99.9% compliance
## Remediation
1. Flag invalid emails for review
2. Attempt auto-correction (lowercase, trim)
3. Queue for customer contact if unresolvable
-- Completeness check
SELECT
'email' AS column_name,
COUNT(*) AS total_rows,
COUNT(email) AS non_null_rows,
CAST(COUNT(email) * 100.0 / COUNT(*) AS DECIMAL(5,2)) AS completeness_pct
FROM customers;
-- Uniqueness check
SELECT
'customer_id' AS column_name,
COUNT(*) AS total_rows,
COUNT(DISTINCT customer_id) AS unique_values,
CAST(COUNT(DISTINCT customer_id) * 100.0 / COUNT(*) AS DECIMAL(5,2)) AS uniqueness_pct
FROM customers;
-- Format validation
SELECT
COUNT(*) AS total_rows,
SUM(CASE WHEN email LIKE '%_@_%.__%' THEN 1 ELSE 0 END) AS valid_format,
SUM(CASE WHEN email NOT LIKE '%_@_%.__%' THEN 1 ELSE 0 END) AS invalid_format
FROM customers;
-- Referential integrity
SELECT
o.order_id,
o.customer_id
FROM orders o
LEFT JOIN customers c ON o.customer_id = c.customer_id
WHERE c.customer_id IS NULL;
-- Business rule
SELECT *
FROM orders
WHERE discount_percentage > 50
OR total_amount < 0
OR order_date > GETDATE();
public interface IValidationRule<T>
{
string RuleName { get; }
ValidationSeverity Severity { get; }
ValidationResult Validate(T entity);
}
public class EmailFormatRule : IValidationRule<Customer>
{
private static readonly Regex EmailPattern = new(
@"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$",
RegexOptions.Compiled);
public string RuleName => "VR-CUST-001";
public ValidationSeverity Severity => ValidationSeverity.Error;
public ValidationResult Validate(Customer entity)
{
if (string.IsNullOrEmpty(entity.Email))
{
return ValidationResult.Fail(RuleName, "Email is required");
}
if (!EmailPattern.IsMatch(entity.Email))
{
return ValidationResult.Fail(RuleName,
$"Invalid email format: {entity.Email}");
}
return ValidationResult.Pass(RuleName);
}
}
public class DataQualityValidator<T>
{
private readonly IEnumerable<IValidationRule<T>> _rules;
private readonly ILogger _logger;
public async Task<QualityReport> ValidateAsync(
IEnumerable<T> records,
CancellationToken ct)
{
var report = new QualityReport
{
StartTime = DateTime.UtcNow,
TotalRecords = 0
};
var ruleResults = _rules.ToDictionary(
r => r.RuleName,
r => new RuleResult { RuleName = r.RuleName });
foreach (var record in records)
{
report.TotalRecords++;
foreach (var rule in _rules)
{
var result = rule.Validate(record);
var ruleResult = ruleResults[rule.RuleName];
if (result.IsValid)
{
ruleResult.PassCount++;
}
else
{
ruleResult.FailCount++;
ruleResult.Failures.Add(result);
}
}
}
report.RuleResults = ruleResults.Values.ToList();
report.EndTime = DateTime.UtcNow;
report.OverallScore = CalculateOverallScore(ruleResults.Values);
return report;
}
}
# Data Quality Dashboard
## Overall Score
| Domain | Score | Trend | Status |
|--------|-------|-------|--------|
| Customer | 96.2% | ↑ +0.5% | 🟡 Warning |
| Product | 98.5% | ↑ +0.1% | 🟢 Healthy |
| Order | 99.1% | → 0% | 🟢 Healthy |
| Inventory | 94.8% | ↓ -1.2% | 🔴 Critical |
## Rule Compliance
| Rule ID | Rule Name | Target | Actual | Status |
|---------|-----------|--------|--------|--------|
| VR-CUST-001 | Valid Email | 99% | 95.4% | 🔴 |
| VR-CUST-002 | Unique SSN | 100% | 99.9% | 🟢 |
| VR-ORD-001 | Valid Total | 100% | 100% | 🟢 |
| VR-INV-001 | Positive Qty | 100% | 97.2% | 🔴 |
## Trend (Last 30 Days)
| Week | Customer | Product | Order | Inventory |
|------|----------|---------|-------|-----------|
| W1 | 95.7% | 98.4% | 99.1% | 96.0% |
| W2 | 95.9% | 98.5% | 99.1% | 95.5% |
| W3 | 96.0% | 98.5% | 99.1% | 95.0% |
| W4 | 96.2% | 98.5% | 99.1% | 94.8% |
| Operation | Description | Example |
|---|---|---|
| Standardization | Consistent format | "USA" → "United States" |
| Deduplication | Remove duplicates | Merge customer records |
| Enrichment | Add missing data | Append geocodes |
| Correction | Fix known errors | Typo correction |
| Imputation | Fill missing values | Mean/median substitution |
public class CustomerCleansingPipeline
{
public Customer Cleanse(Customer raw)
{
var cleansed = new Customer
{
CustomerId = raw.CustomerId,
// Standardization
Email = raw.Email?.Trim().ToLowerInvariant(),
Phone = StandardizePhone(raw.Phone),
// Formatting
FirstName = ToTitleCase(raw.FirstName?.Trim()),
LastName = ToTitleCase(raw.LastName?.Trim()),
// Enrichment
State = LookupStateFromZip(raw.PostalCode) ?? raw.State,
// Validation
PostalCode = ValidatePostalCode(raw.PostalCode)
? raw.PostalCode
: null // Mark for review
};
return cleansed;
}
private string StandardizePhone(string phone)
{
if (string.IsNullOrEmpty(phone)) return null;
var digits = new string(phone.Where(char.IsDigit).ToArray());
return digits.Length switch
{
10 => $"+1-{digits[..3]}-{digits[3..6]}-{digits[6..]}",
11 when digits[0] == '1' => $"+{digits[0]}-{digits[1..4]}-{digits[4..7]}-{digits[7..]}",
_ => phone // Return original if non-standard
};
}
}
# Quality Alert Configuration
## Critical Alerts (Immediate)
| Condition | Action |
|-----------|--------|
| Overall score < 90% | Page on-call, Slack #data-critical |
| Rule failure > 10% | Email data steward, create incident |
| New duplicate rate > 1% | Slack #data-ops |
## Warning Alerts (Daily Digest)
| Condition | Action |
|-----------|--------|
| Score decrease > 2% | Include in daily report |
| Approaching threshold | Email data steward |
| Trend declining 3+ days | Escalate to owner |
## Informational (Weekly Report)
| Condition | Action |
|-----------|--------|
| All scores stable | Include in weekly summary |
| Improvements noted | Celebrate in report |
Inputs from:
data-governance skill → Quality standardsconceptual-modeling skill → Business ruleser-modeling skill → Constraints and relationshipsOutputs to:
migration-planning skill → Validation stepsThis skill should be used when the user asks to "create a slash command", "add a command", "write a custom command", "define command arguments", "use command frontmatter", "organize commands", "create command with file references", "interactive command", "use AskUserQuestion in command", or needs guidance on slash command structure, YAML frontmatter fields, dynamic arguments, bash execution in commands, user interaction patterns, or command development best practices for Claude Code.
This skill should be used when the user asks to "create an agent", "add an agent", "write a subagent", "agent frontmatter", "when to use description", "agent examples", "agent tools", "agent colors", "autonomous agent", or needs guidance on agent structure, system prompts, triggering conditions, or agent development best practices for Claude Code plugins.
This skill should be used when the user asks to "create a hook", "add a PreToolUse/PostToolUse/Stop hook", "validate tool use", "implement prompt-based hooks", "use ${CLAUDE_PLUGIN_ROOT}", "set up event-driven automation", "block dangerous commands", or mentions hook events (PreToolUse, PostToolUse, Stop, SubagentStop, SessionStart, SessionEnd, UserPromptSubmit, PreCompact, Notification). Provides comprehensive guidance for creating and implementing Claude Code plugin hooks with focus on advanced prompt-based hooks API.