Skill

performing-malware-triage-with-yara

Triages and classifies malware samples using YARA rules to match strings, byte sequences, file patterns, and structures. Guides rule creation, scanning, and workflow integration for signature-based detection.

Python

Bash

security

Slash command

/cybersecurity-skills-zh:performing-malware-triage-with-yara

Supporting Files

LICENSE, references/api-reference.md, scripts/agent.py

SKILL.md

362 lines · ~2.2k tokens

Stats

Language: Python

Stars: 13

Forks: 2

Maintenance: Excellent

Last Commit: Apr 28, 2026

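Malware Triage with YARA

When to Use

Quickly classifying large batches of malware samples against known family signatures
Writing detection rules for a newly analyzed malware family based on unique byte patterns
Scanning file shares, endpoints, or memory dumps for indicators of a specific threat
Building an automated triage pipeline that classifies samples before manual analysis
Hunting for variants of known threats across the enterprise with YARA scans

Do not use as the sole analysis method: YARA triage identifies known patterns but cannot reveal the behavior of novel or unknown malware.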

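Prerequisites

YARA 4.x command-line tools (apt install yara); pip install yara-python provides the Python bindings, not the yara and yarac commands
YARA rule repositories (YARA-Rules, awesome-yara, Malpedia rules, Florian Roth's signature-base)
Python 3.8+ with yara-python for scripted scanning
A sample collection organized in a directory structure for batch scanning
Familiarity with the PE file format, hex patterns, and regular expressions for rule writing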
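
Before starting, it can help to confirm which of these components are actually present. The following is a minimal sketch using only the standard library; the function name check_yara_environment is hypothetical, and it only checks that the yara and yarac commands are on PATH and that a module named yara is importable, not which version is installed.

```python
import importlib.util
import shutil


def check_yara_environment():
    """Report which YARA components are discoverable from this interpreter."""
    return {
        # Command-line scanner and rule compiler (installed by the OS package)
        "yara_cli": shutil.which("yara") is not None,
        "yarac_cli": shutil.which("yarac") is not None,
        # Python bindings (installed by pip install yara-python)
        "yara_python": importlib.util.find_spec("yara") is not None,
    }


if __name__ == "__main__":
    for component, present in check_yara_environment().items():
        print(f"{component}: {'found' if present else 'missing'}")
```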

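Workflow

Step 1: Scan Samples with Existing Rule Sets

Apply community and commercial YARA rules to classify samples:

# Scan a single file and print the matching strings
yara -s malware_rules.yar suspect.exe

# Scan a sample directory recursively
yara -r malware_rules.yar /path/to/samples/

# Scan with multiple rule files
yara -r rules/apt_rules.yar rules/ransomware_rules.yar rules/trojan_rules.yar suspect.exe

# Set a 30-second timeout so large files cannot stall the scan
# (the timeout flag is -a / --timeout; -t filters by tag)
yara -a 30 malware_rules.yar suspect.exe

# Scan a directory recursively and print the matching strings
yara -s -r malware_rules.yar /path/to/samples/

# Scan with compiled rules (faster to load on repeated scans)
# -C tells yara that the rules file was already compiled by yarac
yarac malware_rules.yar compiled_rules.yarc
yara -C compiled_rules.yarc suspect.exe

# Download community rule sets
git clone https://github.com/Yara-Rules/rules.git yara-community-rules
git clone https://github.com/Neo23x0/signature-base.git signature-base

# Scan with signature-base (some of its rules expect external variables;
# define them with -d or exclude those files if compilation fails)
yara -r signature-base/yara/*.yar suspect.exe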
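
To feed the scanner's output into later triage steps, its text can be grouped by file. The sketch below assumes the default output layout of one "RULE FILE" pair per line, plus the extra "0x...:$id: data" lines that -s adds; it does not handle the bracketed tags or metadata printed by other flags. The function name group_matches_by_file is hypothetical.

```python
from collections import defaultdict


def group_matches_by_file(yara_output: str) -> dict:
    """Group rule names by scanned file from default `yara` output lines."""
    matches = defaultdict(list)
    for line in yara_output.splitlines():
        line = line.strip()
        # Skip blank lines and the "0x...:$id: data" detail lines from -s;
        # rule identifiers cannot start with a digit, so "0x" is unambiguous.
        if not line or line.startswith("0x"):
            continue
        rule, _, path = line.partition(" ")
        if path:
            matches[path].append(rule)
    return dict(matches)


if __name__ == "__main__":
    sample_output = (
        "MalwareX_Strings /samples/a.exe\n"
        "0x4a20:$url1: /gate.php?id=\n"
        "Emotet_Loader /samples/b.dll\n"
    )
    for path, rule_names in group_matches_by_file(sample_output).items():
        print(path, rule_names)
```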

Step 2: Write Rules Based on Unique String Patterns

Create YARA rules from strings extracted during malware analysis:

rule MalwareX_Strings {
    meta:
        description = "Detects MalwareX based on unique strings"
        author = "analyst"
        date = "2025-09-15"
        reference = "Internal Analysis Report #1547"
        hash = "e3b0c44298fc1c149afbf4c8996fb924"
        tlp = "WHITE"

    strings:
        // C2 URL patterns
        $url1 = "/gate.php?id=" ascii
        $url2 = "/panel/connect.php" ascii

        // Unique mutex name
        $mutex = "Global\\CryptLocker_2025" ascii wide

        // User-Agent string
        $ua = "Mozilla/5.0 (compatible; MSIE 10.0)" ascii

        // Registry persistence path
        $reg = "Software\\Microsoft\\Windows\\CurrentVersion\\Run\\WindowsUpdate" ascii

        // Campaign identifier
        $campaign = "campaign_2025_q3" ascii

    condition:
        uint16(0) == 0x5A4D and            // PE file (MZ header)
        filesize < 500KB and               // size limit
        ($url1 or $url2) and               // at least one C2 URL
        ($mutex or $campaign or $reg) and  // mutex, campaign ID, or persistence key
                                           // ($reg must be referenced, or YARA rejects the rule)
        $ua                                // specific User-Agent
}
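
Candidate strings for a rule like this usually come from a strings-style pass over the unpacked sample. The following is a rough sketch of such a pass, assuming printable ASCII and UTF-16LE ("wide") strings are what is wanted; the function name extract_strings and the default minimum length of 6 are illustrative choices, not part of YARA.

```python
import re


def extract_strings(data: bytes, min_len: int = 6):
    """Return printable ASCII and UTF-16LE ("wide") strings found in data."""
    # Runs of printable ASCII bytes, as matched by the `ascii` modifier
    ascii_hits = re.findall(rb"[\x20-\x7e]{%d,}" % min_len, data)
    # Printable characters interleaved with NUL bytes, as matched by `wide`
    wide_hits = re.findall(rb"(?:[\x20-\x7e]\x00){%d,}" % min_len, data)
    found = [s.decode("ascii") for s in ascii_hits]
    found += [s.decode("utf-16-le") for s in wide_hits]
    return found


if __name__ == "__main__":
    blob = b"\x00\x01/gate.php?id=\x00" + "campaign_2025_q3".encode("utf-16-le")
    print(extract_strings(blob))
```

Strings found this way still need manual review: only values that are unlikely to appear in legitimate software are worth putting in a rule.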

Step 3: Write Rules Based on Byte Patterns

Create rules that match specific code sequences:

rule MalwareX_Decryptor {
    meta:
        description = "Detects MalwareX XOR decryption routine"
        author = "analyst"
        date = "2025-09-15"

    strings:
        // XOR decryption loop (x86 assembly)
        // mov al, [esi+ecx]
        // xor al, [edi+ecx]
        // mov [esi+ecx], al
        // inc ecx
        // cmp ecx, edx
        // jl loop             (rel8 = -14, back to the first mov)
        $xor_loop = { 8A 04 0E 32 04 0F 88 04 0E 41 3B CA 7C F2 }

        // RC4 KSA initialization (256-byte loop)
        $rc4_ksa = { 33 C0 88 04 ?8 40 3D 00 01 00 00 7? }

        // Embedded RSA public key marker
        $rsa_key = { 06 02 00 00 00 A4 00 00 52 53 41 31 }  // PUBLICKEYBLOB header followed by "RSA1"

    condition:
        uint16(0) == 0x5A4D and
        ($xor_loop or $rc4_ksa) and
        $rsa_key
}
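
To see what the nibble wildcards in a hex string such as $rc4_ksa actually accept, the pattern can be translated into a byte-level regular expression. This is only an illustrative sketch: the helper name hex_pattern_to_regex is hypothetical, and it handles plain bytes and ? nibbles but not jumps ([N-M]) or alternatives, which YARA also supports.

```python
import re


def hex_pattern_to_regex(pattern: str):
    """Translate a simple YARA hex string (bytes and ? nibbles) to a regex."""
    parts = []
    for token in pattern.strip("{} \n").split():
        if len(token) != 2:
            raise ValueError(f"unsupported token: {token!r}")
        hi, lo = token[0], token[1]
        if hi == "?" and lo == "?":
            allowed = range(256)                       # any byte
        elif hi == "?":
            allowed = [(h << 4) | int(lo, 16) for h in range(16)]  # fixed low nibble
        elif lo == "?":
            allowed = [(int(hi, 16) << 4) | l for l in range(16)]  # fixed high nibble
        else:
            allowed = [int(token, 16)]                 # exact byte
        # One character class per pattern byte
        parts.append("[" + "".join("\\x%02x" % b for b in allowed) + "]")
    return re.compile("".join(parts).encode("ascii"), re.DOTALL)


if __name__ == "__main__":
    rx = hex_pattern_to_regex("{ 33 C0 88 04 ?8 40 3D 00 01 00 00 7? }")
    code = bytes.fromhex("9033C0880408403D0001000072F6C3")
    print(rx.search(code))
```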

Step 4: Write Rules with the PE Module

Use YARA's PE module for structural detection:

import "pe"
import "hash"
import "math"

rule MalwareX_PE_Characteristics {
    meta:
        description = "Detects MalwareX by PE structure and imports"
        author = "analyst"

    condition:
        pe.is_pe and

        // Compiled within a specific time window
        pe.timestamp > 1693526400 and   // after 2023-09-01
        pe.timestamp < 1727740800 and   // before 2024-10-01

        // "and" binds tighter than "or", so the alternatives are grouped explicitly
        (
            // Specific import hash
            pe.imphash() == "a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6" or

            // Suspicious import combination
            (
                pe.imports("kernel32.dll", "VirtualAllocEx") and
                pe.imports("kernel32.dll", "WriteProcessMemory") and
                pe.imports("kernel32.dll", "CreateRemoteThread") and
                pe.imports("wininet.dll", "InternetOpenA")
            ) or

            // High-entropy .text section (likely packed)
            (
                for any section in pe.sections : (
                    section.name == ".text" and
                    math.entropy(section.raw_data_offset, section.raw_data_size) > 7.0
                )
            )
        )
}

rule MalwareX_Rich_Header {
    meta:
        description = "Detects MalwareX by Rich header hash"

    condition:
        pe.is_pe and
        hash.md5(pe.rich_signature.clear_data) == "abc123def456abc123def456abc123de"
}
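
Two numbers in the first rule are easier to choose with a little arithmetic: the entropy threshold and the epoch bounds for pe.timestamp. The sketch below assumes entropy is measured as Shannon entropy in bits per byte (so the scale runs from 0 to 8) and that compile timestamps are compared as UTC epoch seconds; the helper names shannon_entropy and utc_epoch are hypothetical.

```python
import math
from collections import Counter
from datetime import datetime, timezone


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte: 0.0 (constant) up to 8.0 (uniform)."""
    if not data:
        return 0.0
    total = len(data)
    return -sum((n / total) * math.log2(n / total) for n in Counter(data).values())


def utc_epoch(year: int, month: int, day: int) -> int:
    """Midnight UTC on the given date as epoch seconds."""
    return int(datetime(year, month, day, tzinfo=timezone.utc).timestamp())


if __name__ == "__main__":
    print(shannon_entropy(b"A" * 64))           # constant data
    print(shannon_entropy(bytes(range(256))))   # every byte value once
    print(utc_epoch(2023, 9, 1), utc_epoch(2024, 10, 1))
```

Compressed or encrypted sections sit close to the top of the scale, which is why a cutoff around 7.0 is a common heuristic for packing; it is a heuristic, and legitimate installers or compressed resources can exceed it too.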

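Step 5: Batch Triage with Python

Automate scanning of a sample collection:

import yara
import os
import json
import hashlib

# Compile all rule files; each dict key becomes the namespace of its rules
rule_files = {
    "apt": "rules/apt_rules.yar",
    "ransomware": "rules/ransomware_rules.yar",
    "trojan": "rules/trojan_rules.yar",
    "custom": "rules/custom_rules.yar",
}
rules = yara.compile(filepaths=rule_files)


def describe_strings(match):
    """Return (offset, identifier, data) triples across yara-python 4.x releases."""
    described = []
    for s in match.strings:
        if hasattr(s, "instances"):
            # yara-python 4.3+ returns StringMatch objects
            hits = [(i.offset, s.identifier, i.matched_data) for i in s.instances]
        else:
            # Older releases return (offset, identifier, data) tuples
            hits = [tuple(s)]
        for offset, identifier, data in hits:
            text = data.decode("utf-8", errors="replace")[:100]
            described.append((hex(offset), identifier, text))
    return described


# Scan the sample directory
results = []
sample_dir = "/path/to/samples"

for filename in sorted(os.listdir(sample_dir)):
    filepath = os.path.join(sample_dir, filename)
    if not os.path.isfile(filepath):
        continue

    with open(filepath, "rb") as f:
        data = f.read()
    sha256 = hashlib.sha256(data).hexdigest()

    # Match the buffer already in memory instead of reading the file again
    matches = rules.match(data=data)

    result = {
        "filename": filename,
        "sha256": sha256,
        "size": len(data),
        "matches": [],
        "classification": "UNKNOWN",
    }

    for match in matches:
        result["matches"].append({
            "rule": match.rule,
            "namespace": match.namespace,
            "tags": match.tags,
            "strings": describe_strings(match),
        })

    # The namespace of the first matching rule becomes the classification
    if result["matches"]:
        result["classification"] = result["matches"][0]["namespace"].upper()

    results.append(result)

# Summary (guard against an empty sample directory)
total = len(results)
classified = sum(1 for r in results if r["classification"] != "UNKNOWN")
percent = classified / total * 100 if total else 0.0
print(f"Scanned:    {total} samples")
print(f"Classified: {classified} ({percent:.1f}%)")
print(f"Unknown:    {total - classified}")

# Export results
with open("triage_results.json", "w", encoding="utf-8") as f:
    json.dump(results, f, indent=2)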
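
Taking the classification from the first match makes the label depend on the order in which rules happen to fire. One possible alternative, sketched here, is an explicit priority list over the rule namespaces. The ordering in NAMESPACE_PRIORITY and the function name classify are assumptions for illustration, not something the script above defines.

```python
# Hypothetical priority order: earlier namespaces win when several match.
NAMESPACE_PRIORITY = ["custom", "apt", "ransomware", "trojan"]


def classify(match_records, priority=NAMESPACE_PRIORITY):
    """Pick a classification label from a result's "matches" list."""
    namespaces = {m["namespace"] for m in match_records}
    for namespace in priority:
        if namespace in namespaces:
            return namespace.upper()
    # Fall back to any namespace missing from the priority list, else UNKNOWN
    return sorted(namespaces)[0].upper() if namespaces else "UNKNOWN"


if __name__ == "__main__":
    records = [
        {"rule": "Generic_Trojan", "namespace": "trojan"},
        {"rule": "APT_Loader", "namespace": "apt"},
    ]
    print(classify(records))
```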

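Step 6: Validate and Tune Rules

Test rules for false positives and performance:

# Check rule syntax by compiling the rules and discarding the output
# (yara -C loads already-compiled rules; it does not validate source files)
yarac custom_rules.yar /dev/null

# Scan a known-clean directory to check for false positives
yara -r custom_rules.yar /path/to/clean_files/ > false_positives.txt
wc -l false_positives.txt

# Benchmark rule performance
time yara -r custom_rules.yar /path/to/large_sample_collection/

# Print rule statistics (-p sets the thread count, not profiling;
# per-rule timing requires a YARA build with profiling enabled)
yara -S custom_rules.yar suspect.exe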
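
A line count of false_positives.txt says how many hits occurred but not which rules caused them. A small sketch of a per-rule breakdown follows, assuming the default "RULE FILE" output layout and a known number of clean files; the function name false_positive_report is hypothetical.

```python
from collections import Counter


def false_positive_report(scan_output: str, clean_file_count: int):
    """Return [(rule, hits, hit_rate)] for a clean-corpus scan, noisiest first."""
    if clean_file_count <= 0:
        raise ValueError("clean_file_count must be positive")
    hits = Counter(
        line.split(" ", 1)[0]
        for line in scan_output.splitlines()
        # Ignore blank lines and any "0x...:$id: data" lines produced by -s
        if line.strip() and not line.startswith("0x")
    )
    return [(rule, n, n / clean_file_count) for rule, n in hits.most_common()]


if __name__ == "__main__":
    output = (
        "Generic_Packer /clean/a.exe\n"
        "Generic_Packer /clean/b.dll\n"
        "MalwareX_Strings /clean/c.exe\n"
    )
    for rule, count, rate in false_positive_report(output, 1000):
        print(f"{rule}: {count} hits ({rate:.2%} of clean files)")
```

Any rule that appears in this report at all deserves a second look before deployment, since every hit on the clean corpus is by definition a false positive.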

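Key Concepts

Term	Definition
YARA rule	A pattern-matching rule that defines strings, byte sequences, and a condition used to identify a specific file or malware family
Condition	A Boolean expression combining string matches, file properties, and module functions that decides whether the rule matches
Hex string	A byte pattern with optional wildcards (??) and jumps ([N-M]) for matching machine code or binary data
PE module	A YARA module exposing PE file attributes (imports, sections, timestamps, resources) for structural matching
Imphash	An MD5 hash computed over a PE file's imported library and function names; samples from the same family often share an import hash
Rich header	An undocumented PE structure holding compiler and linker metadata; it tends to stay consistent across samples built in the same environment
Compiled rules (YARA-C)	The binary rule format produced by yarac; precompiling speeds up repeated scans by skipping rule parsing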
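
To make the imphash entry concrete, here is a simplified sketch of how such a hash is derived from an ordered import list: names are lowercased, the library extension is dropped, the "library.function" entries are joined with commas, and the result is hashed with MD5. It is an approximation for illustration only. It does not resolve imports by ordinal, and the function name imphash_from_imports is hypothetical; in practice the value should come from pe.imphash() in YARA or an equivalent PE-parsing library.

```python
import hashlib


def imphash_from_imports(imports):
    """Imphash-style MD5 over ordered (library, function) name pairs."""
    entries = []
    for library, function in imports:
        lib = library.lower()
        # The library extension is dropped before hashing
        for ext in (".dll", ".ocx", ".sys"):
            if lib.endswith(ext):
                lib = lib[: -len(ext)]
                break
        entries.append(f"{lib}.{function.lower()}")
    # Import order is preserved, so the same imports in a different order
    # produce a different hash
    return hashlib.md5(",".join(entries).encode("ascii")).hexdigest()


if __name__ == "__main__":
    print(imphash_from_imports([
        ("kernel32.dll", "VirtualAllocEx"),
        ("kernel32.dll", "WriteProcessMemory"),
        ("wininet.dll", "InternetOpenA"),
    ]))
```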

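Tools and Systems

YARA: Pattern-matching engine that identifies and classifies malware based on text, hex, and structural patterns
yara-python: Python bindings for YARA, supporting scripted scanning, rule compilation, and integration with analysis pipelines
yarGen: Automatic YARA rule generator that builds rules from unique strings and opcodes found in malware samples
YARA-Rules (GitHub): Community-maintained YARA rule repository covering malware families, exploits, and suspicious indicators
Malpedia YARA: Curated YARA rules from the Malpedia malware encyclopedia, offering high-quality family-specific rules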

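Common Scenarios

Scenario: Creating Detection Rules for a New Malware Family

Context: Reverse engineering of a new malware sample has identified unique strings, byte patterns, and PE characteristics. YARA rules are needed for enterprise-wide hunting and ongoing detection.

Approach:

Extract unique strings from the unpacked binary (C2 URLs, mutex names, registry paths)
Identify unique byte sequences in the encryption routines or C2 protocol (from Ghidra analysis)
Record PE characteristics (imphash, Rich header hash, section names, compile timestamp range)
Write YARA rules that combine strings, byte patterns, and PE module conditions
Test against known malware samples to confirm true-positive detection
Test against a clean-file corpus (Windows system files, common applications) to verify zero false positives
Deploy to the enterprise scanning infrastructure and threat intelligence platform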
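
The rule-writing step in this approach can be bootstrapped by generating a draft rule from the extracted strings and then tightening it by hand. The sketch below is one way to do that, under stated assumptions: it handles printable strings only, escapes backslashes and double quotes, and emits a deliberately strict "all of them" condition. The function name build_rule and the 500 KB default size limit are illustrative.

```python
def build_rule(name, strings, max_size_kb=500):
    """Render a draft YARA rule that requires every supplied string."""
    if not strings:
        raise ValueError("at least one string is required")
    lines = [f"rule {name} {{", "    strings:"]
    for i, value in enumerate(strings):
        # Escape backslashes first, then double quotes, for a YARA text string
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'        $s{i} = "{escaped}" ascii wide')
    lines += [
        "    condition:",
        f"        uint16(0) == 0x5A4D and filesize < {max_size_kb}KB and all of them",
        "}",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_rule("MalwareX_Draft", ["/gate.php?id=", "Global\\CryptLocker_2025"]))
```

A generated draft like this is a starting point only; it still has to go through the true-positive and clean-corpus tests listed above before deployment.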

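Pitfalls:

Writing rules too specific to a single sample (variants with minor changes go undetected)
Writing rules too broad (they may match legitimate software and cause false positives)
Using strings that appear in common libraries or frameworks (such as OpenSSL strings)
Deploying without first testing against a sufficiently large clean corpus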

Output Format

YARA TRIAGE RESULTS
=====================
Scan Date:        2025-09-15
Rule Sets:        apt_rules (847 rules), ransomware_rules (312 rules),
                  trojan_rules (1,204 rules), custom_rules (45 rules)
Samples Scanned:  2,500
Processing Time:  47 seconds

CLASSIFICATION SUMMARY
APT:              12 samples (0.5%)
Ransomware:       187 samples (7.5%)
Trojan:           423 samples (16.9%)
Unknown:          1,878 samples (75.1%)

TOP MATCHING RULES
Rule                         Hits    Family
MalwareX_C2_Beacon           45      MalwareX
LockBit3_Ransom_Note         38      LockBit 3.0
Emotet_Epoch5_Loader         32      Emotet
CobaltStrike_Beacon_Config   28      Cobalt Strike
QakBot_DLL_Loader            25      QakBot

SAMPLE DETAILS
File:     suspect.exe
SHA-256:  e3b0c44298fc1c149afbf4c8996fb924...
Matched Rules:
  [1] MalwareX_Strings (custom)
      - $url1 at 0x4A20: "/gate.php?id="
      - $mutex at 0x5100: "Global\\CryptLocker_2025"
  [2] MalwareX_Decryptor (custom)
      - $xor_loop at 0x401200: { 8A 04 0E 32 04 0F ... }
  [3] MalwareX_PE_Characteristics (custom)
      - PE import combination matched
Classification: MALWAREX (high confidence)
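
The classification summary portion of this report can be derived directly from the per-sample labels. A minimal sketch follows, assuming each sample already carries a single label string; the function name classification_summary is hypothetical, and the column alignment of the report is left out for brevity.

```python
from collections import Counter


def classification_summary(labels):
    """Return one 'LABEL: N samples (P%)' line per class, most common first."""
    total = len(labels)
    counts = Counter(labels)
    return [
        f"{label}: {count:,} samples ({count / total * 100:.1f}%)"
        for label, count in counts.most_common()
    ]


if __name__ == "__main__":
    labels = ["APT"] * 12 + ["RANSOMWARE"] * 187 + ["TROJAN"] * 423 + ["UNKNOWN"] * 1878
    print("\n".join(classification_summary(labels)))
```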
