Skill

homer-motif-analysis

Performs de novo and known TF motif enrichment in ChIP-seq/ATAC-seq peaks using HOMER's findMotifsGenome.pl. Annotates peaks with nearest genes, TSS distances, and features via annotatePeaks.pl. Use after MACS3 for TF identification and validation.

Python

Bash

cli-tools

data-engineering

Popularity

Stars

200

Forks

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/sciagent-skills:homer-motif-analysis

User invocable

Model invocable

Inline context

Default effort

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

HOMER (Hypergeometric Optimization of Motif EnRichment) is a suite of Perl/C++ tools for analyzing genomic regulatory elements. Its two primary commands are `findMotifsGenome.pl`, which performs de novo motif discovery and known motif enrichment against JASPAR/HOMER databases, and `annotatePeaks.pl`, which maps each peak to the nearest gene, distance to TSS, and genomic feature class (promoter,...

SKILL.md

582 lines · ~5.9k tokens(exceeds 5k compaction limit)

Stats

LanguagePython

Stars200

Forks21

MaintenanceExcellent

Last CommitJun 15, 2026

Actions

View Source View Plugin View on GitHub View README

HOMER — Motif Analysis and Peak Annotation

Overview

HOMER (Hypergeometric Optimization of Motif EnRichment) is a suite of Perl/C++ tools for analyzing genomic regulatory elements. Its two primary commands are findMotifsGenome.pl, which performs de novo motif discovery and known motif enrichment against JASPAR/HOMER databases, and annotatePeaks.pl, which maps each peak to the nearest gene, distance to TSS, and genomic feature class (promoter, intron, intergenic, repeat). HOMER takes BED-format peak files from MACS3 or similar peak callers and a reference genome assembly as input, and outputs HTML/text reports ranking enriched motifs by p-value and fold enrichment over a matched background.

When to Use

Identifying which transcription factors are bound in a ChIP-seq peak set by enriching their known motifs from JASPAR or the HOMER motif library
Discovering novel sequence motifs de novo in open chromatin regions from ATAC-seq without prior knowledge of the binding TF
Comparing motif landscapes between two conditions (e.g., treated vs. untreated peak sets) by running HOMER with one set as target and the other as background
Annotating genomic peaks with nearest genes and distance to TSS for downstream functional analysis or integration with DESeq2 results
Validating ChIP-seq experiment quality: a successful pull-down should show the target TF's canonical motif as the top hit
Use macs3-peak-calling first to generate the peak BED files that serve as input to HOMER
Use jaspar-database to cross-reference HOMER-discovered motifs with JASPAR IDs and additional TF metadata
Use MEME-CHIP (web or local) when you need a more probabilistic ZOOPS/TCM model or the MEME Suite ecosystem
Use AME (part of MEME Suite) as a faster alternative for known motif scanning without de novo discovery

Prerequisites

Software: HOMER (Perl + compiled binaries), conda or manual install
Genomes: must download genome sequence via installGenome.pl after HOMER install
Input: BED file of peaks (at minimum: chr, start, end columns); ideally summit-centered peaks from MACS3
Python packages (for parsing/visualization): pandas, matplotlib, seaborn

Check before installing: The tool may already be available in the current environment (e.g., inside a pixi / conda env). Run command -v findMotifsGenome.pl first and skip the install commands below if it returns a path. When running inside a pixi project, invoke the tool via pixi run findMotifsGenome.pl rather than bare findMotifsGenome.pl.

# Install HOMER via conda (recommended — handles Perl dependencies)
conda install -c bioconda homer

# Verify installation
findMotifsGenome.pl 2>&1 | head -3
# Usage: findMotifsGenome.pl <peak/BED file> <genome> <output directory> [options]

annotatePeaks.pl 2>&1 | head -3
# Usage: annotatePeaks.pl <peak/BED file> <genome> [options]

# Install reference genomes (downloads 2-way masker + sequence; ~3–10 GB each)
installGenome.pl hg38
installGenome.pl mm10

# Install Python parsing dependencies
pip install pandas matplotlib seaborn

Quick Start

# Run de novo + known motif enrichment on TF ChIP-seq peaks (hg38, 200 bp window)
findMotifsGenome.pl peaks/tf_chip_summits.bed hg38 motif_output/ \
    -size 200 -mask -p 4

# Annotate peaks with nearest genes and genomic features
annotatePeaks.pl peaks/tf_chip_peaks.narrowPeak hg38 > annotated_peaks.txt

echo "Top known motif:"
head -2 motif_output/knownResults.txt | tail -1 | cut -f1-4

echo "Annotated peaks: $(wc -l < annotated_peaks.txt) lines"

Workflow

Step 1: Installation and Genome Setup

Install HOMER and download the reference genome sequence required for motif analysis.

# Activate conda environment (or use existing env)
conda create -n homer_env -c bioconda homer python=3.10 -y
conda activate homer_env

# List available genomes
installGenome.pl list

# Install human (hg38) and mouse (mm10) genomes
# Downloads masked genome sequence and annotation files
installGenome.pl hg38
# Output: Installing hg38... Done. (3-5 min, ~3 GB)

installGenome.pl mm10
# Output: Installing mm10... Done. (3-5 min, ~2.5 GB)

# Verify genome is installed
ls ~/.homer/data/genomes/hg38/
# genome.fa  chrom.sizes  ...

# Check HOMER motif database
ls ~/.homer/data/knownTFs/
# vertebrates.motifs  jaspar.motifs  ...

Step 2: Prepare Input Peak File

Prepare a summit-centered BED file from MACS3 output for optimal motif resolution.

# Option A: Use MACS3 summit file directly (already 1 bp summit positions)
# Expand summits to ±100 bp (200 bp total) centered on summit
awk 'BEGIN{OFS="\t"} {
    start = ($2 - 100 < 0) ? 0 : $2 - 100;
    print $1, start, $2 + 100, $4, $5
}' peaks/tf_chip_summits.bed > peaks/tf_chip_200bp.bed

echo "Summit-centered peaks: $(wc -l < peaks/tf_chip_200bp.bed)"
# Summit-centered peaks: 12453

# Option B: Use narrowPeak file directly (HOMER accepts multi-column BED)
# HOMER uses columns 1-3 (chr, start, end) and centers internally with -size
cp peaks/tf_chip_peaks.narrowPeak peaks/input_peaks.bed

# Option C: Prepare a custom background region file (matched GC content)
# HOMER auto-generates background if not provided, but explicit background
# is recommended when comparing two peak sets
# Use the control peak set or random genomic regions as background:
bedtools shuffle -i peaks/tf_chip_peaks.narrowPeak \
    -g ~/.homer/data/genomes/hg38/chrom.sizes \
    -excl peaks/tf_chip_peaks.narrowPeak > peaks/background_regions.bed

echo "Background regions: $(wc -l < peaks/background_regions.bed)"
# Background regions: 12453

Step 3: De Novo Motif Discovery

Run findMotifsGenome.pl for de novo motif discovery and known motif enrichment simultaneously.

mkdir -p motif_output/

# Full run: de novo + known motif enrichment
# -size 200: use 200 bp window centered on peak midpoint
# -mask: mask repetitive elements (recommended for clean motifs)
# -p 4: use 4 CPU threads
# -S 25: find top 25 de novo motifs (default)
findMotifsGenome.pl peaks/tf_chip_200bp.bed hg38 motif_output/ \
    -size 200 \
    -mask \
    -p 4 \
    -S 25

# Check progress output:
# Reading genome sizes for hg38 ...
# Scanning for motifs...
# Optimizing 25 motifs...
# Done! Output in motif_output/

echo "Known results: $(wc -l < motif_output/knownResults.txt) motifs"
echo "De novo motifs: $(ls motif_output/homerResults/*.motif 2>/dev/null | wc -l) motifs"
# Known results: 392 motifs
# De novo motifs: 25 motifs

# For mouse peaks (mm10)
# findMotifsGenome.pl peaks/atac_peaks.bed mm10 motif_output_mm10/ \
#     -size 200 -mask -p 4

Step 4: Known Motif Enrichment Only

Scan peaks for occurrences of a specific known motif or skip de novo discovery for speed.

# Skip de novo discovery (faster when you only need known motifs)
findMotifsGenome.pl peaks/tf_chip_200bp.bed hg38 motif_output_known/ \
    -size 200 \
    -mask \
    -p 4 \
    -nomotif

echo "Known motif results: $(wc -l < motif_output_known/knownResults.txt)"
# Known motif results: 392

# Find occurrences of a specific motif across peaks (outputs peak-level annotation)
# Extract the motif matrix file for the TF of interest from homerResults/
findMotifsGenome.pl peaks/tf_chip_200bp.bed hg38 motif_scan_out/ \
    -size 200 \
    -mask \
    -find motif_output/homerResults/motif1.motif \
    > peaks_with_motif1.txt

echo "Peaks containing motif1: $(wc -l < peaks_with_motif1.txt)"
# Peaks containing motif1: 8941

# Custom background: compare treated vs. control peak sets
findMotifsGenome.pl peaks/treated_peaks.bed hg38 motif_treated_vs_ctrl/ \
    -size 200 \
    -mask \
    -p 4 \
    -bg peaks/control_peaks.bed

Step 5: Peak Annotation

Use annotatePeaks.pl to assign each peak to a genomic feature and nearest gene.

# Annotate peaks with nearest gene and TSS distance
# Outputs a tab-delimited file with genomic context for each peak
annotatePeaks.pl peaks/tf_chip_peaks.narrowPeak hg38 \
    > annotated_peaks.txt

echo "Annotated peaks: $(($(wc -l < annotated_peaks.txt) - 1)) peaks"
# Annotated peaks: 12453 peaks

# Preview column headers and first peak
head -2 annotated_peaks.txt | cut -f1-10

# Annotate ATAC-seq peaks (same command, different input)
annotatePeaks.pl peaks/atac_sample_peaks.narrowPeak hg38 \
    > annotated_atac.txt

# Generate TSS-distance histogram (for tag density plots)
# annotatePeaks.pl can compute read density around peaks with -d flag
# annotatePeaks.pl tss hg38 -size 4000 -hist 10 \
#     -d chip_tagdir/ > tss_histogram.txt

Step 6: Parse HOMER Results with Python

Read knownResults.txt and de novo motif files into pandas for downstream analysis.

import pandas as pd
import subprocess
import io

# --- Parse known motif enrichment results ---
# knownResults.txt columns:
# Motif Name | Consensus | P-value | Log P-value | q-value | # Target Seqs w/ motif | % Target | # Bg Seqs w/ motif | % Bg
known_cols = [
    "motif_name", "consensus", "pvalue", "log_pvalue",
    "qvalue", "n_target_seqs", "pct_target",
    "n_bg_seqs", "pct_bg"
]
known = pd.read_csv(
    "motif_output/knownResults.txt",
    sep="\t", header=0, names=known_cols
)

# Convert string percentages to floats
known["pct_target"] = known["pct_target"].str.rstrip("%").astype(float)
known["pct_bg"] = known["pct_bg"].str.rstrip("%").astype(float)
known["fold_enrichment"] = known["pct_target"] / known["pct_bg"].replace(0, 0.001)
known["-log10_pvalue"] = -known["log_pvalue"] / 2.303  # log10 from natural log

print(f"Total known motifs tested: {len(known)}")
print(f"Significant (p < 1e-5): {(known['pvalue'].astype(float) < 1e-5).sum()}")
print("\nTop 5 enriched motifs:")
print(
    known[["motif_name", "pct_target", "pct_bg", "fold_enrichment", "-log10_pvalue"]]
    .head(5)
    .to_string(index=False)
)
# Total known motifs tested: 391
# Significant (p < 1e-5): 47
# Top 5 enriched motifs:
#  motif_name  pct_target  pct_bg  fold_enrichment  -log10_pvalue
#  CTCF(Zf)/GM12878-CTCF-ChIP-Seq...   78.21    8.32           9.40         312.4
#  ZNF143(Zf)/...                        42.11    5.10           8.26         188.7
#  ...

Step 7: Parse Peak Annotations and Visualize

Read annotated peak output, summarize genomic feature distribution, and plot motif enrichments.

import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
import seaborn as sns

# --- Parse annotated peaks ---
# annotatePeaks.pl output header: PeakID chr start end strand score ...
# Key columns: "Annotation", "Distance to TSS", "Nearest RefSeq", "Gene Name"
annot = pd.read_csv("annotated_peaks.txt", sep="\t", header=0, low_memory=False)
annot.columns = annot.columns.str.strip()

# Simplify annotation categories
def simplify_annotation(ann):
    if pd.isna(ann):
        return "Other"
    ann = str(ann)
    if "promoter" in ann.lower():
        return "Promoter (<2 kb)"
    elif "exon" in ann.lower():
        return "Exon"
    elif "intron" in ann.lower():
        return "Intron"
    elif "tts" in ann.lower() or "downstream" in ann.lower():
        return "Downstream"
    elif "intergenic" in ann.lower():
        return "Intergenic"
    else:
        return "Other"

annot["simple_ann"] = annot["Annotation"].apply(simplify_annotation)
ann_counts = annot["simple_ann"].value_counts()

print("Peak annotation summary:")
print(ann_counts.to_string())
# Peak annotation summary:
# Intron             5832
# Intergenic         3102
# Promoter (<2 kb)   2218
# Exon                891
# Downstream          287
# Other               123

# --- Plot 1: Annotation pie chart ---
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

colors = ["#E41A1C", "#377EB8", "#4DAF4A", "#984EA3", "#FF7F00", "#A65628"]
axes[0].pie(
    ann_counts.values,
    labels=ann_counts.index,
    colors=colors[:len(ann_counts)],
    autopct="%1.1f%%",
    startangle=140,
    textprops={"fontsize": 9}
)
axes[0].set_title("Peak Genomic Annotation\n(n=12,453 peaks)", fontsize=11)

# --- Plot 2: Top known motif enrichments ---
known = pd.read_csv("motif_output/knownResults.txt", sep="\t", header=0)
known.columns = [
    "motif_name", "consensus", "pvalue", "log_pvalue",
    "qvalue", "n_target_seqs", "pct_target", "n_bg_seqs", "pct_bg"
]
known["pct_target"] = known["pct_target"].str.rstrip("%").astype(float)
known["pct_bg"] = known["pct_bg"].str.rstrip("%").astype(float)
known["-log10_p"] = (-known["log_pvalue"].astype(float)) / 2.303
known["short_name"] = known["motif_name"].str.split("/").str[0]

top_motifs = known.head(15).copy()
sns.barplot(
    data=top_motifs,
    x="-log10_p",
    y="short_name",
    palette="Blues_r",
    ax=axes[1]
)
axes[1].set_xlabel("-log10(p-value)", fontsize=10)
axes[1].set_ylabel("Motif", fontsize=10)
axes[1].set_title("Top 15 Enriched Known Motifs\n(HOMER/JASPAR database)", fontsize=11)
axes[1].axvline(x=5, color="red", linestyle="--", linewidth=0.8, label="p=1e-5")
axes[1].legend(fontsize=8)

plt.tight_layout()
plt.savefig("homer_summary.png", dpi=150, bbox_inches="tight")
plt.close()
print("Saved: homer_summary.png")

Key Parameters

Parameter	Default	Range / Options	Effect
`-size`	`200`	50–1000	Window size (bp) centered on peak midpoint for motif search; 200 for TF ChIP-seq, 150 for ATAC-seq
`-mask`	off	flag	Mask repetitive elements (N-mask); strongly recommended to reduce false positives
`-p`	`1`	1–64	Number of parallel CPU threads; use 4–8 for typical datasets
`-S`	`25`	1–100	Number of de novo motifs to find; 25 is usually sufficient
`-nomotif`	off	flag	Skip de novo motif discovery; only perform known motif enrichment (10× faster)
`-bg`	auto-generated	BED file path	Custom background region file; auto-background is GC-matched if omitted
`-find`	—	`.motif` file path	Scan peaks for occurrences of a specific motif matrix; outputs per-peak annotation
`-mis`	`2`	0–4	Maximum mismatches allowed when matching known motifs
`-len`	`8,10,12`	comma-separated integers	Motif lengths to try for de novo search; adding `6` finds shorter motifs
`genome`	required	`hg38`, `mm10`, `hg19`, `dm6`, `ce11`, etc.	Reference genome assembly; must be installed with `installGenome.pl`

Common Recipes

Recipe 1: Python subprocess Wrapper for findMotifsGenome.pl

Run HOMER from a Python script and capture completion status.

import subprocess
import sys
from pathlib import Path

def run_homer_motifs(
    peaks_bed: str,
    genome: str,
    output_dir: str,
    size: int = 200,
    mask: bool = True,
    cpus: int = 4,
    n_motifs: int = 25,
    denovo: bool = True,
    bg_bed: str = None
) -> int:
    """Run findMotifsGenome.pl and return exit code."""
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    cmd = [
        "findMotifsGenome.pl", peaks_bed, genome, output_dir,
        "-size", str(size),
        "-p", str(cpus),
        "-S", str(n_motifs),
    ]
    if mask:
        cmd.append("-mask")
    if not denovo:
        cmd.append("-nomotif")
    if bg_bed:
        cmd.extend(["-bg", bg_bed])

    print(f"Running: {' '.join(cmd)}")
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        print(f"HOMER stderr:\n{result.stderr[-2000:]}", file=sys.stderr)
        return result.returncode
    print(f"Done. Output in {output_dir}/")
    return 0

# Usage
exit_code = run_homer_motifs(
    peaks_bed="peaks/tf_chip_200bp.bed",
    genome="hg38",
    output_dir="motif_output/",
    size=200, mask=True, cpus=4, n_motifs=25
)
print(f"Exit code: {exit_code}")
# Running: findMotifsGenome.pl peaks/tf_chip_200bp.bed hg38 motif_output/ -size 200 -p 4 -S 25 -mask
# Done. Output in motif_output/
# Exit code: 0

Recipe 2: Batch Motif Analysis Across Multiple Peak Sets

Run HOMER on multiple ChIP-seq samples in a loop.

#!/bin/bash
# Batch motif analysis for several ChIP-seq experiments (same genome)
GENOME="hg38"
PEAK_DIR="peaks"
MOTIF_DIR="motif_results"
mkdir -p "$MOTIF_DIR"

SAMPLES=(CTCF H3K4me3 FOXA2 RUNX1)

for sample in "${SAMPLES[@]}"; do
    echo "=== Processing $sample ==="
    peak_file="${PEAK_DIR}/${sample}_summits.bed"

    if [ ! -f "$peak_file" ]; then
        echo "  Skipping $sample: $peak_file not found"
        continue
    fi

    # Center on summit ±100 bp
    awk 'BEGIN{OFS="\t"} {s=$2-100<0?0:$2-100; print $1,s,$2+100,$4,$5}' \
        "$peak_file" > "${PEAK_DIR}/${sample}_200bp.bed"

    findMotifsGenome.pl "${PEAK_DIR}/${sample}_200bp.bed" "$GENOME" \
        "${MOTIF_DIR}/${sample}/" \
        -size 200 -mask -p 4 -nomotif \
        2> "${MOTIF_DIR}/${sample}.log"

    n=$(wc -l < "${MOTIF_DIR}/${sample}/knownResults.txt")
    echo "  $sample: $n motifs tested. Top hit: $(sed -n '2p' "${MOTIF_DIR}/${sample}/knownResults.txt" | cut -f1)"
done
# === Processing CTCF ===
#   CTCF: 392 motifs tested. Top hit: CTCF(Zf)/GM12878-CTCF-ChIP-Seq(GSE32465)/Homer
# === Processing FOXA2 ===
#   FOXA2: 392 motifs tested. Top hit: Foxa2(Forkhead)/Liver-Foxa2-ChIP-Seq(GSE25694)/Homer

Recipe 3: Parse De Novo Motif Matrices from homerResults/

Load de novo motif PWM matrices for downstream comparison or plotting.

import os
import re
import pandas as pd

def parse_homer_motif(motif_file: str) -> dict:
    """Parse a HOMER .motif file into name, log_odds_threshold, and PWM."""
    with open(motif_file) as f:
        header = f.readline().strip()  # >motif_name\tlog_odds\tlog_p-value\t0\tnucs
        rows = []
        for line in f:
            line = line.strip()
            if line:
                rows.append([float(x) for x in line.split("\t")])
    parts = header.lstrip(">").split("\t")
    name = parts[0]
    log_odds = float(parts[1]) if len(parts) > 1 else 0.0
    log_p = float(parts[2]) if len(parts) > 2 else 0.0
    pwm = pd.DataFrame(rows, columns=["A", "C", "G", "T"])
    return {"name": name, "log_odds": log_odds, "log_p": log_p, "pwm": pwm}

# Load all de novo motifs
motif_dir = "motif_output/homerResults/"
motifs = []
for fn in sorted(os.listdir(motif_dir)):
    if fn.endswith(".motif"):
        motif = parse_homer_motif(os.path.join(motif_dir, fn))
        motifs.append(motif)
        print(f"{fn}: {motif['name']} ({len(motif['pwm'])} positions, log_p={motif['log_p']:.1f})")

print(f"\nLoaded {len(motifs)} de novo motifs")
# motif1.motif: CTCF-motif (19 positions, log_p=-8234.1)
# motif2.motif: CTCFL-motif (17 positions, log_p=-3421.7)
# ...
# Loaded 25 de novo motifs

Recipe 4: Annotate Peaks and Join with Differential Expression Results

Combine peak annotations with RNA-seq DE results to find regulated genes near peaks.

import pandas as pd

# Load annotated peaks
annot = pd.read_csv("annotated_peaks.txt", sep="\t", header=0, low_memory=False)
annot.columns = annot.columns.str.strip()

# Key columns from HOMER annotation
peak_genes = annot[["PeakID (cmd=annotatePeaks.pl peaks.bed hg38)",
                     "Chr", "Start", "End",
                     "Annotation", "Distance to TSS",
                     "Nearest RefSeq", "Gene Name"]].copy()
peak_genes.columns = ["peak_id", "chr", "start", "end",
                      "annotation", "tss_dist", "refseq", "gene_name"]

# Filter promoter-proximal peaks (within 2 kb of TSS)
promoter_peaks = peak_genes[peak_genes["tss_dist"].abs() < 2000].copy()
print(f"Promoter-proximal peaks (|TSS| < 2kb): {len(promoter_peaks)}")
# Promoter-proximal peaks (|TSS| < 2kb): 2218

# Load DESeq2 results (gene_name, log2FC, padj)
de_results = pd.read_csv("deseq2_results.csv")
de_sig = de_results[de_results["padj"] < 0.05].copy()

# Merge: find DE genes with a nearby peak
merged = promoter_peaks.merge(de_sig, on="gene_name", how="inner")
print(f"DE genes with promoter-proximal peak: {len(merged)}")
print(merged[["gene_name", "tss_dist", "annotation", "log2FC", "padj"]].head())
# DE genes with promoter-proximal peak: 347

Expected Outputs

Output	Format	Description
`motif_output/knownResults.txt`	TSV	All known motifs tested: name, p-value, q-value, % target, % background; primary result file
`motif_output/knownResults.html`	HTML	Interactive HTML report with motif logos, statistics, and links
`motif_output/homerResults/`	Directory	De novo motifs: `motifN.motif` (PWM matrix), `motifN.logo.png`, `similar.motifs.txt`
`motif_output/homerResults.html`	HTML	Interactive HTML report for de novo motifs
`annotated_peaks.txt`	TSV	One row per peak: chr, start, end, annotation, TSS distance, nearest gene, RefSeq ID
`peaks_with_motif1.txt`	TSV	Per-peak motif occurrence scores (from `-find` mode)

Troubleshooting

Problem	Cause	Solution
`ERROR: Genome not found`	Genome not installed via `installGenome.pl`	Run `installGenome.pl hg38`; confirm `~/.homer/data/genomes/hg38/` exists
`command not found: findMotifsGenome.pl`	HOMER binaries not in `$PATH`	Add HOMER bin directory to PATH: `export PATH=~/.homer/bin:$PATH` (or activate conda env)
Poor de novo motif quality (GC-rich artifacts)	Unmasked repetitive elements	Always use `-mask`; also try increasing `-size` to 300 or reducing to 150
No significant known motifs (all p > 0.01)	Too few peaks or wrong peak size	Ensure ≥500 peaks; check `-size` matches expected binding footprint (150–200 for TFs)
Conda HOMER conflicts (Perl module errors)	HOMER's Perl scripts conflict with conda environment Perl	Install HOMER in a dedicated conda env: `conda create -n homer_only -c bioconda homer`
`annotatePeaks.pl` gives all "Intergenic"	Genome annotation not installed	HOMER annotation requires `installGenome.pl` which downloads GTF; check `~/.homer/data/genomes/hg38/` contains `*.ann` files
Very slow run time (>2 hr)	Single-threaded or large peak set	Add `-p 8`; reduce `-S` to 10 for speed; use `-nomotif` for known-only runs
`-bg` custom background gives poor motifs	Background regions have very different GC content	Use a GC-matched background; run `bedtools shuffle` with `-noOverlapping` to generate random same-size regions

References

HOMER official documentation — comprehensive guide to all HOMER commands, parameters, and output formats
Heinz S et al. (2010) "Simple Combinations of Lineage-Determining Transcription Factors Prime cis-Regulatory Elements Required for Macrophage and B Cell Identities." Molecular Cell 38(4):576–589. DOI:10.1016/j.molcel.2010.05.004
HOMER GitHub: samtools/homer — source code and issue tracker
JASPAR 2024 database — curated TF binding motif database used by HOMER for known motif enrichment

homer-motif-analysis

Popularity

Invocation

Context Preview

SKILL.md

homer-motif-analysis

Popularity

Invocation

Context Preview

SKILL.md

HOMER — Motif Analysis and Peak Annotation

Overview

When to Use

Prerequisites

Quick Start

Workflow

Step 1: Installation and Genome Setup

Step 2: Prepare Input Peak File

Step 3: De Novo Motif Discovery

Step 4: Known Motif Enrichment Only

Step 5: Peak Annotation

Step 6: Parse HOMER Results with Python

Step 7: Parse Peak Annotations and Visualize

Key Parameters

Common Recipes

Recipe 1: Python subprocess Wrapper for findMotifsGenome.pl

Recipe 2: Batch Motif Analysis Across Multiple Peak Sets

Recipe 3: Parse De Novo Motif Matrices from homerResults/

Recipe 4: Annotate Peaks and Join with Differential Expression Results

Expected Outputs

Troubleshooting

References

Similar Skills

HOMER — Motif Analysis and Peak Annotation

Overview

When to Use

Prerequisites

Quick Start

Workflow

Step 1: Installation and Genome Setup

Step 2: Prepare Input Peak File

Step 3: De Novo Motif Discovery

Step 4: Known Motif Enrichment Only

Step 5: Peak Annotation

Step 6: Parse HOMER Results with Python

Step 7: Parse Peak Annotations and Visualize

Key Parameters

Common Recipes

Recipe 1: Python subprocess Wrapper for findMotifsGenome.pl

Recipe 2: Batch Motif Analysis Across Multiple Peak Sets

Recipe 3: Parse De Novo Motif Matrices from homerResults/

Recipe 4: Annotate Peaks and Join with Differential Expression Results

Expected Outputs

Troubleshooting

References

Similar Skills