Implements AWS Macie to automatically discover, classify, and protect sensitive data (PII, financial information, credentials) in S3 buckets, using machine-learning detection, CLI-driven jobs, Terraform configuration, and custom data identifiers.
Amazon Macie is a fully managed data security and privacy service that uses machine learning and pattern matching to discover and protect sensitive data in Amazon S3. Macie automatically evaluates your S3 bucket inventory every day, identifying objects that contain PII (personally identifiable information), financial information, credentials, and other sensitive data types. It offers two discovery modes: automated sensitive data discovery for broad visibility, and targeted discovery jobs for deep analysis.
```bash
# Enable Macie in the current account/region
aws macie2 enable-macie

# Verify that Macie is enabled
aws macie2 get-macie-session

# Enable automated sensitive data discovery
aws macie2 update-automated-discovery-configuration \
  --status ENABLED
```
```hcl
resource "aws_macie2_account" "main" {}

resource "aws_macie2_classification_export_configuration" "main" {
  depends_on = [aws_macie2_account.main]

  s3_destination {
    bucket_name = aws_s3_bucket.macie_results.id
    key_prefix  = "macie-findings/"
    kms_key_arn = aws_kms_key.macie.arn
  }
}
```
```bash
aws macie2 create-classification-job \
  --job-type ONE_TIME \
  --name "pii-scan-production-buckets" \
  --s3-job-definition '{
    "bucketDefinitions": [{
      "accountId": "123456789012",
      "buckets": [
        "production-data-bucket",
        "customer-records-bucket"
      ]
    }]
  }' \
  --managed-data-identifier-selector ALL
```
```bash
aws macie2 create-classification-job \
  --job-type SCHEDULED \
  --name "weekly-sensitive-data-scan" \
  --schedule-frequency-details '{
    "weekly": {
      "dayOfWeek": "MONDAY"
    }
  }' \
  --s3-job-definition '{
    "bucketDefinitions": [{
      "accountId": "123456789012",
      "buckets": ["all-data-bucket"]
    }],
    "scoping": {
      "includes": {
        "and": [{
          "simpleScopeTerm": {
            "comparator": "STARTS_WITH",
            "key": "OBJECT_KEY",
            "values": ["uploads/", "documents/"]
          }
        }]
      }
    }
  }'
```
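The scoping block above restricts the job to objects whose keys start with the listed prefixes. The STARTS_WITH logic can be mirrored locally; a minimal sketch (the `in_scope` helper is ours for illustration, not part of any AWS SDK):

```python
# Sketch: mimic the OBJECT_KEY STARTS_WITH scoping term locally.
def in_scope(object_key: str, prefixes: list[str]) -> bool:
    """Return True if the key matches any STARTS_WITH include term."""
    return any(object_key.startswith(p) for p in prefixes)

prefixes = ["uploads/", "documents/"]
print(in_scope("uploads/report.pdf", prefixes))  # True
print(in_scope("logs/app.log", prefixes))        # False
```

Only objects under `uploads/` or `documents/` would be analyzed by the scheduled job; everything else in `all-data-bucket` is skipped.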
```bash
aws macie2 create-custom-data-identifier \
  --name "internal-employee-id" \
  --description "Matches internal employee IDs in the format EMP-XXXXXX" \
  --regex "EMP-[0-9]{6}" \
  --severity-levels '[
    {"occurrencesThreshold": 1, "severity": "LOW"},
    {"occurrencesThreshold": 10, "severity": "MEDIUM"},
    {"occurrencesThreshold": 50, "severity": "HIGH"}
  ]'
```
```bash
aws macie2 create-custom-data-identifier \
  --name "project-code-identifier" \
  --description "Matches project codes in the format PRJ-XXXX-XX" \
  --regex "PRJ-[A-Z]{4}-[0-9]{2}" \
  --keywords '["project", "code", "initiative"]' \
  --maximum-match-distance 50
```
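With `--keywords` and `--maximum-match-distance`, a regex hit only counts when a keyword appears near it. A simplified local sketch of that proximity rule (we only look in the 50 characters before the match, which is a simplification of Macie's actual behavior; the helper name is ours):

```python
import re

# Sketch: count a regex match only if a keyword appears within
# `max_distance` characters before it (simplified proximity check).
PATTERN = re.compile(r"PRJ-[A-Z]{4}-[0-9]{2}")
KEYWORDS = ("project", "code", "initiative")

def proximity_matches(text: str, max_distance: int = 50) -> list[str]:
    hits = []
    lowered = text.lower()
    for m in PATTERN.finditer(text):
        window = lowered[max(0, m.start() - max_distance):m.start()]
        if any(k in window for k in KEYWORDS):
            hits.append(m.group())
    return hits

print(proximity_matches("project code: PRJ-ABCD-42"))  # ['PRJ-ABCD-42']
```

A code like `PRJ-ABCD-42` with no nearby keyword is ignored, which cuts false positives on strings that merely look like project codes.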
```bash
aws macie2 create-allow-list \
  --name "test-data-exclusions" \
  --description "Excludes known test data patterns" \
  --criteria '{
    "regex": "TEST-[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{4}"
  }'
```
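An allow list suppresses findings for text that matches its regex. A sketch of that suppression applied to a list of candidate matches (hypothetical sample values, illustrative helper):

```python
import re

# Sketch: drop candidate matches that the allow list marks as known
# test data, keeping only real hits.
ALLOW = re.compile(r"TEST-[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{4}")

def filter_allowed(matches: list[str]) -> list[str]:
    return [m for m in matches if not ALLOW.fullmatch(m)]

print(filter_allowed(["TEST-0000-0000-0000-0000", "4111-1111-1111-1111"]))
```

The synthetic `TEST-…` value is suppressed while the other candidate survives to become a finding.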
Macie provides more than 300 managed data identifiers, covering the following categories:

| Category | Examples |
|---|---|
| PII | Social Security numbers, passport numbers, driver's licenses, dates of birth, names, addresses |
| Financial | credit card numbers, bank account numbers, SWIFT codes |
| Credentials | AWS keys, API keys, SSH private keys, OAuth tokens |
| Medical | HIPAA identifiers, health insurance claim numbers |
| Legal | tax identification numbers, national ID numbers |
```bash
# Retrieve sensitive data discovery findings
aws macie2 list-findings \
  --finding-criteria '{
    "criterion": {
      "severity.description": {
        "eq": ["High"]
      },
      "category": {
        "eq": ["CLASSIFICATION"]
      }
    }
  }' \
  --sort-criteria '{"attributeName": "updatedAt", "orderBy": "DESC"}' \
  --max-results 25

aws macie2 get-findings \
  --finding-ids '["finding-id-1", "finding-id-2"]'
```
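The same severity/category criteria can be applied to finding dicts that have already been fetched; a sketch mirroring the `list-findings` filter above (the field layout follows the Macie finding shape, the helper is ours):

```python
# Sketch: apply the list-findings criteria (severity High, category
# CLASSIFICATION) locally to finding dicts.
def matches_criteria(finding: dict) -> bool:
    return (finding.get("severity", {}).get("description") == "High"
            and finding.get("category") == "CLASSIFICATION")

findings = [
    {"id": "f-1", "severity": {"description": "High"}, "category": "CLASSIFICATION"},
    {"id": "f-2", "severity": {"description": "Low"},  "category": "CLASSIFICATION"},
    {"id": "f-3", "severity": {"description": "High"}, "category": "POLICY"},
]
print([f["id"] for f in findings if matches_criteria(f)])  # ['f-1']
```

Only `f-1` passes both tests, which matches what the CLI filter would return for the same data.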
```bash
# Macie publishes findings to Security Hub automatically.
# Verify the integration status:
aws macie2 get-macie-session --query 'findingPublishingFrequency'
```
EventBridge event pattern that matches high-severity Macie findings:

```json
{
  "source": ["aws.macie"],
  "detail-type": ["Macie Finding"],
  "detail": {
    "severity": {
      "description": ["High", "Critical"]
    }
  }
}
```
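Whether a given event would trigger the rule can be checked locally; a simplified matcher covering only the fields the pattern uses (this is a sketch of EventBridge's matching semantics, not the real evaluator):

```python
# Sketch: evaluate a Macie event against the pattern's three fields.
PATTERN = {
    "source": ["aws.macie"],
    "detail-type": ["Macie Finding"],
    "severities": ["High", "Critical"],
}

def event_matches(event: dict) -> bool:
    return (event.get("source") in PATTERN["source"]
            and event.get("detail-type") in PATTERN["detail-type"]
            and event.get("detail", {}).get("severity", {}).get("description")
                in PATTERN["severities"])

sample = {"source": "aws.macie", "detail-type": "Macie Finding",
          "detail": {"severity": {"description": "High"}}}
print(event_matches(sample))  # True
```

A `Low`-severity finding from the same source would not match, so the downstream Lambda is only invoked for findings worth acting on.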
```python
import boto3
import json

s3 = boto3.client('s3')
sns = boto3.client('sns')

def lambda_handler(event, context):
    finding = event['detail']
    severity = finding['severity']['description']
    bucket = finding['resourcesAffected']['s3Bucket']['name']
    key = finding['resourcesAffected']['s3Object']['key']
    sensitive_types = [
        d['type']
        for d in finding.get('classificationDetails', {})
                        .get('result', {})
                        .get('sensitiveData', [])
    ]

    if severity in ['High', 'Critical']:
        # Tag the object so it can be reviewed
        s3.put_object_tagging(
            Bucket=bucket,
            Key=key,
            Tagging={
                'TagSet': [
                    {'Key': 'macie-finding', 'Value': severity},
                    {'Key': 'sensitive-data', 'Value': ','.join(sensitive_types)},
                    {'Key': 'requires-review', 'Value': 'true'}
                ]
            }
        )
        # Notify the security team
        sns.publish(
            TopicArn='arn:aws:sns:us-east-1:123456789012:security-alerts',
            Subject=f'Macie {severity} finding: {bucket}/{key}',
            Message=json.dumps({
                'bucket': bucket,
                'key': key,
                'severity': severity,
                'sensitive_data_types': sensitive_types,
                'finding_id': finding['id']
            }, indent=2)
        )

    return {'statusCode': 200}
```
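The handler's field extraction can be exercised locally with a synthetic event (the shape below mirrors only the fields the handler reads; the values are made up, and no AWS calls are made):

```python
# Sketch: replicate the handler's field extraction on a synthetic
# Macie finding event, without touching S3 or SNS.
sample_event = {
    "detail": {
        "id": "finding-123",
        "severity": {"description": "High"},
        "resourcesAffected": {
            "s3Bucket": {"name": "production-data-bucket"},
            "s3Object": {"key": "uploads/customers.csv"},
        },
        "classificationDetails": {
            "result": {"sensitiveData": [
                {"type": "CREDIT_CARD_NUMBER"},
                {"type": "USA_SOCIAL_SECURITY_NUMBER"},
            ]}
        },
    }
}

finding = sample_event["detail"]
sensitive_types = [d["type"] for d in
                   finding["classificationDetails"]["result"]["sensitiveData"]]
print(sensitive_types)
```

Feeding an event like this to the handler in a unit test (with `s3` and `sns` mocked) confirms the tagging and notification payloads before the Lambda ever sees a real finding.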
```bash
# Run from the organization's management account
aws macie2 enable-organization-admin-account \
  --admin-account-id 111111111111

# Run from the delegated administrator account
aws macie2 create-member \
  --account '{"accountId": "222222222222", "email": "security@example.com"}'
```
```bash
# Review per-account Macie usage statistics
aws macie2 get-usage-statistics \
  --filter-by '[{"comparator": "GT", "key": "accountId", "values": []}]' \
  --sort-by '{"key": "accountId", "orderBy": "ASC"}'

# List currently running classification jobs
aws macie2 list-classification-jobs \
  --filter-criteria '{"includes": [{"comparator": "EQ", "key": "jobStatus", "values": ["RUNNING"]}]}'
```