Guides integration of JASPAR PWMs with ENCODE ChIP-seq peaks to validate TF binding targets, discover co-factors, and scan motifs in regulatory regions.
Install: npx claudepluginhub ammawla/encode-toolkit --plugin encode-toolkit
Slash command: /encode-toolkit:jaspar-motifs
Integrate JASPAR position weight matrices (PWMs) with ENCODE ChIP-seq peaks to validate TF binding targets, discover co-binding partners, and scan regulatory elements for TF binding potential.
The question: "Does the expected TF binding motif appear in my ENCODE ChIP-seq peaks, and what other TF motifs are enriched?"
ENCODE TF ChIP-seq experiments identify where a transcription factor binds in the genome, but the peak coordinates alone do not confirm direct DNA binding or reveal the binding sequence specificity. JASPAR provides curated position weight matrices (PWMs) — mathematical representations of TF binding preferences — that enable two critical analyses:
Target validation: If CTCF ChIP-seq peaks are enriched for the CTCF motif (JASPAR MA0139.1), the experiment most likely captured direct CTCF binding. If they are not enriched, suspect the antibody, the crosslinking, or the peak calling.
Co-factor discovery: Motif enrichment analysis in ChIP-seq peaks often reveals motifs for co-binding TFs that were not the ChIP target, uncovering regulatory complexes.
| ENCODE provides | JASPAR provides | Together |
|---|---|---|
| Where a TF binds (peak coordinates) | How a TF recognizes DNA (binding motif) | Validated binding sites with sequence specificity |
| TF binding in specific tissues | Universal binding preferences | Tissue-specific motif usage |
| Co-occupancy data (multiple ChIP-seq) | Co-factor motif profiles | Regulatory complex architecture |
| Chromatin context (accessibility, marks) | Motif sequence requirements | Context-dependent binding rules |
| Scenario | How JASPAR Helps |
|---|---|
| Validating ENCODE TF ChIP-seq | Check if target TF motif is enriched in peaks |
| Finding co-binding TFs | Scan peaks for additional enriched motifs |
| Interpreting ENCODE enhancers | Identify which TFs can bind enhancer sequences |
| Variant in TF binding site | Check if variant disrupts a JASPAR motif |
| Comparing TF binding across tissues | Determine if same motif is used in different contexts |
| Planning CRISPR validation | Identify core motif bases to mutate |
Base URL: https://jaspar.genereg.net/api/v1/
No authentication required. Responses are JSON.
| Endpoint | Purpose | Key Parameters |
|---|---|---|
| `/matrix/` | List/search all profiles | name, collection, tax_group, tf_class |
| `/matrix/{id}/` | Get specific profile | Matrix ID (e.g., MA0139.1) |
| `/matrix/{id}/?format=pfm` | Get PFM (counts) | - |
| `/matrix/{id}/?format=pwm` | Get PWM (log-odds) | - |
| `/matrix/{id}/?format=jaspar` | Get JASPAR format | - |
| `/matrix/{id}/?format=meme` | Get MEME format | Ready for FIMO scanning |
| `/taxon/` | List taxonomic groups | - |
| `/tfclass/` | List TF structural classes | - |
| TF | JASPAR ID | Class | Notes |
|---|---|---|---|
| CTCF | MA0139.1 | C2H2 zinc finger | Most common ENCODE TF ChIP-seq target |
| TP53 (p53) | MA0106.3 | p53 family | Tumor suppressor |
| SP1 | MA0079.5 | C2H2 zinc finger | GC-rich promoter binding |
| FOXA1 | MA0148.4 | Forkhead | Pioneer factor |
| FOXA2 | MA0047.3 | Forkhead | Liver, pancreas |
| HNF4A | MA0114.4 | Nuclear receptor | Hepatocyte-enriched |
| NRF1 | MA0506.2 | bZIP | Mitochondrial regulation |
| REST (NRSF) | MA0138.2 | C2H2 zinc finger | Neuronal gene repressor |
| MYC | MA0147.3 | bHLH | Oncogene, E-box binding |
| JUN (AP-1) | MA0488.1 | bZIP | Immediate early response |
| GATA4 | MA0482.2 | GATA | Cardiac, endoderm |
| PAX6 | MA0069.1 | Paired box | Eye, brain development |
# Find TF ChIP-seq experiments
encode_search_experiments(
    assay_title="TF ChIP-seq",
    target="CTCF",
    organ="pancreas",
    biosample_type="tissue"
)
# Get IDR thresholded peaks (highest confidence)
encode_list_files(
    experiment_accession="ENCSR...",
    file_format="bed",
    output_type="IDR thresholded peaks",
    assembly="GRCh38",
    preferred_default=True
)
Track the experiment:
encode_track_experiment(accession="ENCSR...", notes="CTCF ChIP-seq for motif analysis")
import requests

def get_jaspar_matrix(tf_name, tax_group="vertebrates", collection="CORE"):
    """Get the highest-version JASPAR profile for a TF, or None if nothing matches."""
    url = "https://jaspar.genereg.net/api/v1/matrix/"
    params = {
        "name": tf_name,
        "tax_group": tax_group,
        "collection": collection,
        "format": "json"
    }
    response = requests.get(url, params=params, timeout=30)
    response.raise_for_status()  # Fail loudly on HTTP errors
    results = response.json()["results"]
    if results:
        # Return the highest-version profile (int() so "10" sorts after "2")
        return max(results, key=lambda x: int(x["version"]))
    return None

ctcf_profile = get_jaspar_matrix("CTCF")
if ctcf_profile is not None:
    print(f"ID: {ctcf_profile['matrix_id']}, Version: {ctcf_profile['version']}")
def get_pfm(matrix_id):
    """Get Position Frequency Matrix from JASPAR."""
    url = f"https://jaspar.genereg.net/api/v1/matrix/{matrix_id}/"
    params = {"format": "json"}
    response = requests.get(url, params=params, timeout=30)
    response.raise_for_status()
    data = response.json()
    # pfm is a dict with keys A, C, G, T, each a list of counts per position
    return data["pfm"]

pfm = get_pfm("MA0139.1")
print(f"Motif length: {len(pfm['A'])} bp")
for base in ["A", "C", "G", "T"]:
    print(f"{base}: {pfm[base]}")
def get_meme_format(matrix_id):
    """Get motif in MEME format for use with FIMO."""
    url = f"https://jaspar.genereg.net/api/v1/matrix/{matrix_id}/"
    params = {"format": "meme"}
    response = requests.get(url, params=params, timeout=30)
    response.raise_for_status()
    return response.text

meme_motif = get_meme_format("MA0139.1")
# Save to file for FIMO input
with open("ctcf_motif.meme", "w") as f:
    f.write(meme_motif)
Before scanning for motifs, extract the DNA sequences underlying ENCODE peaks.
# Download reference genome (if not available)
# GRCh38: https://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/hg38.fa.gz
# Extract sequences from ENCODE peak regions
# Use peak midpoint +/- 100bp here (start clamped at 0); centering on the called summit gives better motif enrichment for narrow peaks
awk 'BEGIN{OFS="\t"} {mid=int(($2+$3)/2); s=mid-100; if (s<0) s=0; print $1, s, mid+100, $4}' \
  encode_peaks.bed > peak_summits_200bp.bed
bedtools getfasta \
  -fi hg38.fa \
  -bed peak_summits_200bp.bed \
  -fo peak_sequences.fa
For motif analysis, center on peak summits (column 10 in narrowPeak format):
# narrowPeak summit is relative to peak start (column 10)
awk 'BEGIN{OFS="\t"} {summit=$2+$10; print $1, summit-100, summit+100, $4}' \
  encode_narrowpeak.bed > summit_regions.bed
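The same summit-centering step can be sketched in Python when extra guards are wanted. This is a minimal sketch, not part of the toolkit: the function names `summit_window` and `narrowpeak_to_windows` are hypothetical, and it assumes standard 10-column narrowPeak lines in which column 10 is a 0-based offset from the peak start and `-1` means no summit was called.

```python
def summit_window(chrom, start, summit_offset, flank=100):
    """Return (chrom, start, end) for a window of +/- flank bp around the summit."""
    summit = start + summit_offset
    # Clamp at 0 so peaks near a chromosome start never yield negative coordinates
    return chrom, max(0, summit - flank), summit + flank


def narrowpeak_to_windows(lines, flank=100):
    """Yield 4-column BED lines centered on each narrowPeak summit."""
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.rstrip("\n").split("\t")
        offset = int(fields[9])
        if offset < 0:
            # narrowPeak uses -1 when no summit was called; skip those peaks
            continue
        chrom, win_start, win_end = summit_window(fields[0], int(fields[1]), offset, flank)
        yield f"{chrom}\t{win_start}\t{win_end}\t{fields[3]}"
```

To use it, iterate over an open narrowPeak file and write the yielded lines to a BED file that `bedtools getfasta` can read.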
FIMO scans sequences for individual occurrences of a given motif.
# Scan peak sequences for CTCF motif
fimo --thresh 1e-4 \
  --oc fimo_output/ \
  ctcf_motif.meme \
  peak_sequences.fa
# Output: fimo_output/fimo.tsv with columns:
# motif_id, motif_alt_id, sequence_name, start, stop, strand, score, p-value, q-value, matched_sequence
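One way to turn the FIMO table into a validation number is to count how many peaks carry at least one hit. The sketch below assumes the column names listed above and that `fimo.tsv` may end with `#` comment lines; the function name `peaks_with_motif` and the `max_q` filter are illustrative choices, not part of FIMO or the toolkit.

```python
import csv
import io


def peaks_with_motif(fimo_tsv_text, total_peaks, max_q=None):
    """Return (number of peaks with >= 1 hit, fraction of all peaks)."""
    if total_peaks <= 0:
        raise ValueError("total_peaks must be positive")
    # Drop blank lines and trailing '#' comment lines before parsing
    data_lines = (
        ln for ln in io.StringIO(fimo_tsv_text)
        if ln.strip() and not ln.startswith("#")
    )
    hit_peaks = set()
    for row in csv.DictReader(data_lines, delimiter="\t"):
        q = row.get("q-value")
        if max_q is not None and q and float(q) > max_q:
            continue  # Optional: keep only hits passing a q-value cutoff
        hit_peaks.add(row["sequence_name"])
    return len(hit_peaks), len(hit_peaks) / total_peaks
```

In practice `fimo_tsv_text` would come from `open("fimo_output/fimo.tsv").read()`, and `total_peaks` from the number of sequences in `peak_sequences.fa`.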
HOMER findMotifsGenome.pl is a widely used tool for ChIP-seq motif analysis.
# Known motif enrichment
findMotifsGenome.pl \
  encode_peaks.bed \
  hg38 \
  homer_output/ \
  -size 200 \
  -mask \
  -p 4
# Output includes:
# knownResults.html — enrichment of known motifs (including JASPAR)
# homerResults.html — de novo discovered motifs
# Full motif analysis pipeline
meme-chip \
  -oc meme_output/ \
  -db JASPAR2022_CORE_vertebrates.meme \
  peak_sequences.fa
| Enrichment Result | Interpretation |
|---|---|
| Target TF motif ranked #1 with p < 1e-100 | Excellent — ChIP-seq validated |
| Target TF motif ranked #1 but p only 1e-5 | Moderate — may indicate indirect binding or weak motif |
| Target TF motif NOT in top 10 | Concerning — may indicate antibody cross-reactivity, indirect binding, or wrong motif version |
| Unexpected TF motif ranked #1 | May indicate co-factor or pioneer factor binding |
| Multiple TF motifs highly enriched | Regulatory hub — multiple TFs co-bind |
When a variant falls within an ENCODE TF ChIP-seq peak, check whether it disrupts the underlying motif.
def score_sequence(sequence, pwm):
    """Score a sequence against a log-odds PWM (dict with keys A, C, G, T).

    The PWM must hold log-odds values, not the raw counts of a PFM.
    """
    if len(sequence) != len(pwm["A"]):
        raise ValueError("Sequence length must match motif length")
    score = 0.0
    for i, base in enumerate(sequence.upper()):
        # Ambiguous bases such as N contribute nothing to the score
        if base in pwm:
            score += pwm[base][i]
    return score

def variant_motif_impact(ref_seq, alt_seq, pwm):
    """Calculate motif score change from variant."""
    ref_score = score_sequence(ref_seq, pwm)
    alt_score = score_sequence(alt_seq, pwm)
    delta = alt_score - ref_score
    return {
        "ref_score": ref_score,
        "alt_score": alt_score,
        "delta_score": delta,
        "disrupted": delta < -2  # Threshold: >2 log-odds decrease
    }
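The scoring function above expects log-odds values, whereas the JASPAR `pfm` field holds raw counts, so a conversion step is needed in between. The following is a minimal sketch of one common conversion. The pseudocount of 0.8 and the uniform background are assumptions chosen for illustration, and `pfm_to_pwm` is a hypothetical helper name.

```python
import math


def pfm_to_pwm(pfm, pseudocount=0.8, background=None):
    """Convert a count matrix (dict A/C/G/T -> list of counts) to log2-odds."""
    bases = "ACGT"
    if background is None:
        background = {b: 0.25 for b in bases}  # Assumed uniform background
    pwm = {b: [] for b in bases}
    for i in range(len(pfm["A"])):
        column_total = sum(pfm[b][i] for b in bases)
        for b in bases:
            # The pseudocount keeps zero counts from producing log(0)
            freq = (pfm[b][i] + pseudocount * background[b]) / (column_total + pseudocount)
            pwm[b].append(math.log2(freq / background[b]))
    return pwm
```

With this in place, `pfm_to_pwm(get_pfm("MA0139.1"))` would give a matrix in the shape `score_sequence` expects. A genome-derived background (for example, GC content of the peak set) may be more appropriate than the uniform default.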
library(motifbreakR)
library(MotifDb)
library(BSgenome.Hsapiens.UCSC.hg38)
library(SNPlocs.Hsapiens.dbSNP155.GRCh38)
# Define variant
variant <- snps.from.rsid(
  rsid = "rs7903146",
  dbSNP = SNPlocs.Hsapiens.dbSNP155.GRCh38,
  search.genome = BSgenome.Hsapiens.UCSC.hg38
)
# Scan against all motifs in MotifDb; subset MotifDb to its JASPAR entries to restrict the scan
results <- motifbreakR(
  snpList = variant,
  filterp = TRUE,
  pwmList = MotifDb,
  threshold = 1e-4,
  method = "log",
  bkg = c(A=0.25, C=0.25, G=0.25, T=0.25),
  show.neutral = FALSE
)
ENCODE often has ChIP-seq for multiple TFs in the same biosample. JASPAR motifs can reveal co-binding logic.
encode_search_experiments(
    assay_title="TF ChIP-seq",
    biosample_term_name="K562"
)
TF cooperativity often requires specific motif spacing:
# Find GATA and TAL1 motifs in co-occupied peaks
# --text writes hits to stdout; without it FIMO writes to an output directory and the redirect captures nothing useful
fimo --text --thresh 1e-4 gata_motif.meme cooccupied_peaks.fa > gata_hits.tsv
fimo --text --thresh 1e-4 tal1_motif.meme cooccupied_peaks.fa > tal1_hits.tsv
# Analyze spacing between GATA and TAL1 motifs in same peaks
# Characteristic spacing indicates cooperative binding
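The spacing analysis mentioned in the comments can be sketched as a histogram of offsets between the two motifs within each peak. This is an illustrative sketch only: `motif_spacing` is a hypothetical helper, hits are assumed to be `(sequence_name, start, stop)` tuples parsed from the two FIMO tables, and strand is ignored for simplicity.

```python
from collections import Counter, defaultdict


def motif_spacing(hits_a, hits_b, max_dist=50):
    """Count signed start-to-start offsets (b minus a) for hits in the same peak."""
    b_by_peak = defaultdict(list)
    for name, start, _stop in hits_b:
        b_by_peak[name].append(start)
    offsets = Counter()
    for name, a_start, _a_stop in hits_a:
        for b_start in b_by_peak.get(name, []):
            offset = b_start - a_start
            if abs(offset) <= max_dist:
                offsets[offset] += 1
    return offsets
```

A single offset that dominates `motif_spacing(gata_hits, tal1_hits).most_common(5)` would be consistent with a fixed spacing constraint, while a flat distribution suggests the two motifs co-occur without a preferred arrangement.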
| TF ChIP-seq | Target Motif | JASPAR ID | Enrichment p-value | % Peaks with Motif | Validation |
|---|---|---|---|---|---|
| CTCF | CTCF | MA0139.1 | 1e-2456 | 78% | Strong |
| FOXA2 | FOXA2 | MA0047.3 | 1e-345 | 45% | Strong |
| HNF4A | HNF4A | MA0114.4 | 1e-189 | 52% | Strong |
| EP300 | No specific motif | — | — | — | Expected (coactivator) |
| TF ChIP-seq Target | Unexpected Enriched Motif | JASPAR ID | p-value | Interpretation |
|---|---|---|---|---|
| HNF4A in liver | FOXA2 | MA0047.3 | 1e-78 | Known co-binding at liver enhancers |
| GATA1 in K562 | TAL1 | MA0140.2 | 1e-234 | Erythroid TF complex |
| TP53 | SP1 | MA0079.5 | 1e-12 | Co-regulation at GC-rich promoters |
Goal: Scan ENCODE-defined enhancer peaks for enriched transcription factor binding motifs using the JASPAR database to predict which TFs regulate enhancer activity.
Context: ENCODE H3K27ac peaks mark active enhancers, but don't reveal which TFs bind there. JASPAR motif scanning predicts TF occupancy.
encode_search_experiments(assay_title="Histone ChIP-seq", organ="liver", target="H3K27ac", organism="Homo sapiens")
Expected output:
{
"total": 6,
"results": [
{"accession": "ENCSR100LIV", "assay_title": "Histone ChIP-seq", "target": "H3K27ac", "biosample_summary": "liver"}
]
}
encode_list_files(accession="ENCSR100LIV", file_format="bed", output_type="IDR thresholded peaks", assembly="GRCh38")
Expected output:
{
"files": [
{"accession": "ENCFF150ENH", "output_type": "IDR thresholded peaks", "file_format": "bed narrowPeak", "file_size_mb": 1.1}
]
}
Using JASPAR REST API (via skill guidance):
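GET https://jaspar.elixir.no/api/v1/matrix/?tax_id=9606&collection=CORE&name=HNF4A
JASPAR profiles are not annotated by tissue, so query each liver TF by name.
Key liver TFs and their JASPAR matrix IDs: HNF4A (MA0114.4), FOXA2 (MA0047.3), FOXA1 (MA0148.4).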
# Extract sequences under peaks
bedtools getfasta -fi GRCh38.fa -bed ENCFF150ENH.bed -fo enhancer_seqs.fa
# Scan with FIMO using JASPAR motifs (--text writes hits to stdout so the redirect captures them)
fimo --text --thresh 1e-4 JASPAR_liver_motifs.meme enhancer_seqs.fa > motif_hits.tsv
Interpretation: If HNF4A motifs are enriched 5× over background in liver enhancers, this confirms HNF4A as a key driver. Unexpected motifs (e.g., TP53) may suggest stress response enhancers.
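A fold-enrichment figure like the 5× mentioned above can be computed from hit counts in the enhancer set and in a background set. The sketch below uses only the standard library. The function names are hypothetical, and the choice of a hypergeometric tail as the significance test is an assumption (dedicated tools such as HOMER apply their own statistics and background selection).

```python
from math import comb


def fold_enrichment(fg_hits, fg_total, bg_hits, bg_total):
    """Ratio of the motif-containing fraction in foreground vs background."""
    if bg_hits == 0:
        return float("inf")
    return (fg_hits / fg_total) / (bg_hits / bg_total)


def hypergeom_pvalue(fg_hits, fg_total, bg_hits, bg_total):
    """P(at least fg_hits motif sequences in the foreground) under random sampling."""
    pooled = fg_total + bg_total
    with_motif = fg_hits + bg_hits
    upper = min(with_motif, fg_total)
    tail = sum(
        comb(with_motif, k) * comb(pooled - with_motif, fg_total - k)
        for k in range(fg_hits, upper + 1)
    )
    return tail / comb(pooled, fg_total)
```

The background set matters as much as the statistic: sequences matched for length and GC content (for example, flanking regions or shuffled peaks) usually give a fairer comparison than random genomic intervals.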
encode_search_experiments(assay_title="TF ChIP-seq", organ="liver", target="HNF4A", organism="Homo sapiens")
Expected output:
{
"total": 3,
"results": [
{"accession": "ENCSR200HNF", "assay_title": "TF ChIP-seq", "target": "HNF4A", "biosample_summary": "liver"}
]
}
Interpretation: If JASPAR-predicted HNF4A motif sites overlap actual HNF4A ChIP-seq peaks → validated motif prediction.
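The overlap check described above can be sketched as an interval comparison between predicted motif sites and ChIP-seq peaks. This is a minimal sketch under stated assumptions: both inputs are lists of `(chrom, start, end)` tuples in 0-based half-open BED coordinates, and the function names are hypothetical. For genome-scale data, `bedtools intersect` does the same job.

```python
from bisect import bisect_right
from collections import defaultdict


def merge_intervals(intervals):
    """Merge overlapping or abutting (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def fraction_sites_in_peaks(sites, peaks):
    """Fraction of predicted sites overlapping at least one peak by >= 1 bp."""
    by_chrom = defaultdict(list)
    for chrom, start, end in peaks:
        by_chrom[chrom].append((start, end))
    index = {}
    for chrom, intervals in by_chrom.items():
        merged = merge_intervals(intervals)
        index[chrom] = ([s for s, _ in merged], merged)
    overlapping = 0
    for chrom, start, end in sites:
        if chrom not in index:
            continue
        starts, merged = index[chrom]
        # Last merged peak that begins before the site ends
        i = bisect_right(starts, end - 1) - 1
        if i >= 0 and merged[i][1] > start:
            overlapping += 1
    return overlapping / len(sites) if sites else 0.0
```

A high fraction supports the motif predictions, while sites outside peaks may be unbound in this tissue, for instance because the surrounding chromatin is closed.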
encode_get_facets(assay_title="TF ChIP-seq", facet_field="target.label", organ="liver", organism="Homo sapiens")
Expected output:
{
"facets": {
"target.label": {"HNF4A": 3, "CEBPA": 2, "FOXA2": 2, "RXRA": 2, "TP53": 1}
}
}
encode_compare_experiments(accession_1="ENCSR100LIV", accession_2="ENCSR200HNF")
Expected output:
{
"comparison": {
"shared": {"organ": "liver", "organism": "Homo sapiens", "assembly": "GRCh38"},
"differences": {
"assay": ["Histone ChIP-seq", "TF ChIP-seq"],
"target": ["H3K27ac", "HNF4A"]
}
}
}
encode_track_experiment(accession="ENCSR100LIV", notes="Liver H3K27ac for JASPAR motif scanning - HNF4A/CEBPA/FOXA2")
Expected output:
{
"status": "tracked",
"accession": "ENCSR100LIV",
"notes": "Liver H3K27ac for JASPAR motif scanning - HNF4A/CEBPA/FOXA2"
}
| This skill produces... | Feed into... | Purpose |
|---|---|---|
| TF motif enrichment scores | motif-analysis | Compare JASPAR database motifs with de novo discovered motifs |
| Predicted TF binding sites | peak-annotation | Annotate enhancer peaks with predicted TF regulators |
| TF-enhancer regulatory links | regulatory-elements | Classify enhancers by predicted TF driver identity |
| Motif-disrupting variant positions | variant-annotation | Identify SNPs that alter TF binding motifs |
| Tissue-specific TF motif profiles | compare-biosamples | Compare TF regulatory programs between tissues |
| TF binding predictions | disease-research | Connect TF motif disruption to disease mechanisms |
| Motif scanning results | visualization-workflow | Generate motif logos and enrichment heatmaps |
| Validated TF-target pairs | gtex-expression | Check TF expression in tissue via GTEx |
When reporting JASPAR motif scanning results:
motif-analysis for de novo motif discovery with HOMER/MEME, or regulatory-elements to characterize the ENCODE regulatory elements containing the motif hits
- regulatory-elements: Characterizing ENCODE regulatory elements that motifs help annotate
- epigenome-profiling: Building tissue epigenomic profiles including TF binding landscapes
- variant-annotation: Assessing whether variants disrupt TF binding motifs in ENCODE peaks
- compare-biosamples: Comparing TF binding and motif usage across ENCODE biosamples
- quality-assessment: Using motif enrichment as a quality control metric for TF ChIP-seq
- gwas-catalog: GWAS variants that disrupt TF motifs in ENCODE peaks
- publication-trust: Verify literature claims backing analytical decisions