From sciagent-skills
Queries ReMap 2022 TF ChIP-seq peak database via REST API: peaks overlapping regions/genes, by species/biotype, BED downloads for TF-cell pairs. For TF co-occupancy and regulatory annotation.
npx claudepluginhub jaechang-hits/sciagent-skills --plugin sciagent-skillsThis skill uses the workspace's default tool permissions.
ReMap 2022 is an integrative database of transcription factor (TF), cofactor, and chromatin regulator binding sites derived from uniformly reprocessed ChIP-seq experiments. The 2022 release catalogs 165 million non-redundant peaks from 8,113 ChIP-seq datasets covering 1,210 TFs across human (hg38/hg19), mouse (mm10), Drosophila, and Arabidopsis genomes. All peaks are called with a consistent pi...
Queries ENCODE Portal REST API for TF ChIP-seq, ATAC-seq/DNase-seq peaks, histone marks, RNA-seq across 1000+ cell types. Downloads BED/bigWig files; retrieves SCREEN cCREs by region/gene for variant annotation and regulatory analysis.
Investigates transcription factor binding, cis-regulatory elements, chromatin accessibility, and regulatory variants using JASPAR, ENCODE, RegulomeDB, UCSC cCREs. For TF sites, enhancers, ChIP-seq, ATAC-seq, variant impacts.
Provides fast CLI and Python queries to 20+ bioinformatics databases for gene info, BLAST searches, AlphaFold structures, enrichment analysis, and reference genomes. For interactive exploration and simple queries.
Share bugs, ideas, or general feedback.
ReMap 2022 is an integrative database of transcription factor (TF), cofactor, and chromatin regulator binding sites derived from uniformly reprocessed ChIP-seq experiments. The 2022 release catalogs 165 million non-redundant peaks from 8,113 ChIP-seq datasets covering 1,210 TFs across human (hg38/hg19), mouse (mm10), Drosophila, and Arabidopsis genomes. All peaks are called with a consistent pipeline from public GEO/ArrayExpress experiments. Access is via the ReMap 2022 REST API at https://remap2022.univ-amu.fr/api/ and bulk BED file downloads; no authentication required.
jaspar-database instead when you need PWM/PFM sequence models of TF binding specificity rather than ChIP-seq peak locationsencode-database; ReMap aggregates TF binding peaks from many sources including ENCODErequests, pandas, matplotlibtime.sleep(0.5) between batch requests to avoid server overloadpip install requests pandas matplotlib
import requests
REMAP_API = "https://remap2022.univ-amu.fr/api/v1"
# Query TF peaks overlapping a genomic region
r = requests.get(f"{REMAP_API}/peaks/overlap/", params={
"chr": "chr17",
"start": 7_670_000,
"end": 7_690_000,
"assembly": "hg38"
}, timeout=30)
r.raise_for_status()
peaks = r.json()
print(f"Peaks overlapping TP53 locus: {len(peaks)}")
tfs = set(p.get("name", "").split(":")[0] for p in peaks)
print(f"Unique TFs: {len(tfs)}")
print(f"TF names (first 10): {sorted(tfs)[:10]}")
Find all TF ChIP-seq peaks overlapping a specified genomic window. Returns peak records including TF name, cell type, coordinates, and score.
import requests, time, pandas as pd
REMAP_API = "https://remap2022.univ-amu.fr/api/v1"
def query_region(chrom, start, end, assembly="hg38", timeout=30):
"""Return all ReMap peaks overlapping [chrom:start-end]."""
r = requests.get(f"{REMAP_API}/peaks/overlap/", params={
"chr": chrom, "start": start, "end": end, "assembly": assembly
}, timeout=timeout)
r.raise_for_status()
return r.json()
# Query 100 kb window on chr17 around TP53
peaks = query_region("chr17", 7_670_000, 7_690_000, assembly="hg38")
print(f"Total peaks: {len(peaks)}")
# Parse name field: format is "TF:experiment_id:cell_type"
rows = []
for p in peaks:
parts = p.get("name", "::").split(":")
tf = parts[0] if len(parts) > 0 else ""
exp = parts[1] if len(parts) > 1 else ""
cell = parts[2] if len(parts) > 2 else ""
rows.append({
"chr": p.get("chr", p.get("chrom", "")),
"start": p.get("start", 0),
"end": p.get("end", 0),
"tf_name": tf,
"experiment_id": exp,
"cell_type": cell,
"score": p.get("score", 0),
})
df = pd.DataFrame(rows)
print(f"\nUnique TFs: {df['tf_name'].nunique()}")
print(f"Top TFs by peak count:\n{df['tf_name'].value_counts().head(10).to_string()}")
# Fallback: if API is unavailable, use a locally downloaded BED file
# Download from: https://remap2022.univ-amu.fr/download_page
# e.g., remap2022_all_macs2_hg38_v1_0.bed.gz
import pandas as pd
def query_region_from_bed(bed_file, chrom, start, end):
"""Filter a ReMap BED file for overlapping peaks."""
cols = ["chr", "start", "end", "name", "score", "strand",
"thick_start", "thick_end", "color"]
df = pd.read_csv(bed_file, sep="\t", header=None, names=cols,
compression="infer")
mask = (df["chr"] == chrom) & (df["end"] > start) & (df["start"] < end)
return df[mask].reset_index(drop=True)
# Usage (requires downloaded BED):
# df = query_region_from_bed("remap2022_all_macs2_hg38_v1_0.bed.gz",
# "chr17", 7_670_000, 7_690_000)
Retrieve all TF ChIP-seq peaks near a gene's TSS, providing a promoter-proximal regulatory landscape for the gene.
import requests, time, pandas as pd
REMAP_API = "https://remap2022.univ-amu.fr/api/v1"
def query_gene_peaks(gene_name, assembly="hg38", timeout=30):
"""Return all ReMap peaks near a gene TSS."""
r = requests.get(f"{REMAP_API}/peaks/gene/", params={
"gene": gene_name, "assembly": assembly
}, timeout=timeout)
r.raise_for_status()
return r.json()
peaks = query_gene_peaks("MYC", assembly="hg38")
print(f"Peaks near MYC TSS: {len(peaks)}")
rows = []
for p in peaks:
parts = p.get("name", "::").split(":")
rows.append({
"tf_name": parts[0] if parts else "",
"cell_type": parts[2] if len(parts) > 2 else "",
"chr": p.get("chr", p.get("chrom", "")),
"start": p.get("start", 0),
"end": p.get("end", 0),
"score": p.get("score", 0),
"biotype": p.get("biotype", ""),
})
df = pd.DataFrame(rows)
print(f"\nTFs near MYC TSS ({df['tf_name'].nunique()} unique):")
print(df["tf_name"].value_counts().head(10).to_string())
print(f"\nCell types represented: {df['cell_type'].nunique()}")
List all TFs available in ReMap for a given genome assembly, with peak and experiment counts.
import requests, time, pandas as pd
REMAP_API = "https://remap2022.univ-amu.fr/api/v1"
def list_tfs(assembly="hg38", timeout=30):
"""Return all TFs in ReMap for the given assembly with statistics."""
r = requests.get(f"{REMAP_API}/tfbs/list/", params={"assembly": assembly}, timeout=timeout)
r.raise_for_status()
return r.json()
def get_database_stats(assembly="hg38", timeout=30):
"""Return overall database statistics for the assembly."""
r = requests.get(f"{REMAP_API}/stats/", params={"assembly": assembly}, timeout=timeout)
r.raise_for_status()
return r.json()
# Database overview
try:
stats = get_database_stats("hg38")
print(f"ReMap 2022 hg38 statistics:")
for k, v in stats.items():
print(f" {k}: {v}")
except Exception as e:
print(f"Stats endpoint unavailable: {e}")
print("ReMap 2022 hg38: 165M peaks, 1,210 TFs, 8,113 datasets (from publication)")
# TF list
try:
tfs = list_tfs("hg38")
df_tfs = pd.DataFrame(tfs)
print(f"\nTFs available (hg38): {len(df_tfs)}")
if "peak_count" in df_tfs.columns:
top = df_tfs.nlargest(10, "peak_count")[["name", "peak_count", "dataset_count"]]
print("Top 10 TFs by peak count:")
print(top.to_string(index=False))
except Exception as e:
print(f"TF list endpoint unavailable: {e}")
print("Use TF name queries directly (Query 4) or download TF-specific BED files.")
Retrieve all peaks for a named TF in a given assembly, optionally filtered by cell type.
import requests, time, pandas as pd
REMAP_API = "https://remap2022.univ-amu.fr/api/v1"
def query_tf_peaks(tf_name, assembly="hg38", timeout=30):
"""Return all ChIP-seq peaks for a TF across all cell types."""
r = requests.get(f"{REMAP_API}/tfbs/name/", params={
"name": tf_name, "assembly": assembly
}, timeout=timeout)
r.raise_for_status()
return r.json()
peaks = query_tf_peaks("CTCF", assembly="hg38")
print(f"CTCF peaks (all cell types): {len(peaks)}")
# Parse and summarize
rows = []
for p in peaks:
parts = p.get("name", "::").split(":")
rows.append({
"tf_name": parts[0] if parts else "",
"cell_type": parts[2] if len(parts) > 2 else "",
"chr": p.get("chr", p.get("chrom", "")),
"start": p.get("start", 0),
"end": p.get("end", 0),
"score": p.get("score", 0),
"biotype": p.get("biotype", ""),
})
df = pd.DataFrame(rows)
print(f"Cell types: {df['cell_type'].nunique()}")
print(f"Chromosomes: {df['chr'].nunique()}")
print(f"Peak width stats (bp):")
df["width"] = df["end"] - df["start"]
print(f" Median: {df['width'].median():.0f} Mean: {df['width'].mean():.0f} "
f"Min: {df['width'].min()} Max: {df['width'].max()}")
Filter peaks by regulatory biotype annotation to identify binding at promoters, enhancers, or intergenic regions.
import requests, pandas as pd, matplotlib.pyplot as plt
REMAP_API = "https://remap2022.univ-amu.fr/api/v1"
def get_biotypes(assembly="hg38", timeout=30):
"""List all regulatory biotype categories available."""
r = requests.get(f"{REMAP_API}/biotypes/", params={"assembly": assembly}, timeout=timeout)
r.raise_for_status()
return r.json()
def query_tf_by_biotype(tf_name, biotype, assembly="hg38", timeout=30):
"""Retrieve TF peaks filtered by regulatory biotype."""
r = requests.get(f"{REMAP_API}/peaks/biotype/", params={
"name": tf_name, "biotype": biotype, "assembly": assembly
}, timeout=timeout)
r.raise_for_status()
return r.json()
# List available biotypes
try:
biotypes = get_biotypes("hg38")
print(f"Available biotypes: {biotypes}")
except Exception:
biotypes = ["promoter", "enhancer", "exon", "intron", "intergenic", "UTR"]
print(f"Using known biotypes: {biotypes}")
# Query CTCF peaks and plot biotype distribution
peaks = query_tf_peaks("CTCF", assembly="hg38") # from Query 4 function above
def query_tf_peaks(tf_name, assembly="hg38", timeout=30):
r = requests.get(f"https://remap2022.univ-amu.fr/api/v1/tfbs/name/",
params={"name": tf_name, "assembly": assembly}, timeout=timeout)
r.raise_for_status()
return r.json()
peaks = query_tf_peaks("CTCF")
rows = [{"biotype": p.get("biotype", "unknown"),
"cell_type": p.get("name", "::").split(":")[2] if len(p.get("name","").split(":")) > 2 else ""}
for p in peaks]
df = pd.DataFrame(rows)
biotype_counts = df["biotype"].value_counts()
biotype_counts = biotype_counts[biotype_counts > 0]
print(f"\nCTCF peak biotype distribution:")
print(biotype_counts.to_string())
# Stacked bar chart across top 5 cell types
top_cells = df["cell_type"].value_counts().head(5).index.tolist()
pivot = (df[df["cell_type"].isin(top_cells)]
.groupby(["cell_type", "biotype"])
.size()
.unstack(fill_value=0))
fig, ax = plt.subplots(figsize=(9, 5))
pivot.plot(kind="bar", stacked=True, ax=ax, colormap="tab10", edgecolor="white")
ax.set_xlabel("Cell Type")
ax.set_ylabel("Peak Count")
ax.set_title("CTCF ChIP-seq Peak Biotype Distribution by Cell Type (ReMap 2022, hg38)")
ax.legend(title="Biotype", bbox_to_anchor=(1.01, 1), loc="upper left", fontsize=8)
plt.tight_layout()
plt.savefig("CTCF_biotype_distribution.png", dpi=150, bbox_inches="tight")
print("Saved CTCF_biotype_distribution.png")
The name field in every ReMap peak record encodes three pieces of information as a colon-separated string:
TF_NAME:EXPERIMENT_ID:CELL_TYPE
For example: CTCF:GSE30263.SRX028592:GM12878
Always parse with .split(":") and guard against missing parts. Some records may have fewer than three components if metadata is incomplete.
| Assembly code | Organism | Notes |
|---|---|---|
hg38 | Homo sapiens (GRCh38) | Primary human assembly in ReMap 2022 |
hg19 | Homo sapiens (GRCh37) | Legacy human assembly; fewer datasets |
mm10 | Mus musculus | Primary mouse assembly |
dm6 | Drosophila melanogaster | Smaller dataset collection |
tair10 | Arabidopsis thaliana | Plant TF dataset |
When the REST API is unavailable or for offline bulk analysis, ReMap provides pre-built BED files at https://remap2022.univ-amu.fr/download_page. Key files:
remap2022_all_macs2_hg38_v1_0.bed.gz — all peaks, hg38 (large, ~5 GB)remap2022_{TF}_macs2_hg38_v1_0.bed.gz — per-TF peak filesremap2022_crm_macs2_hg38_v1_0.bed.gz — cis-regulatory modules (merged peaks)import pandas as pd
def load_remap_bed(bed_path, chrom=None, start=None, end=None):
"""
Load a ReMap BED file with optional region filter.
Columns: chr, start, end, name (TF:exp:cell), score, strand,
thick_start, thick_end, itemRgb
"""
cols = ["chr", "start", "end", "name", "score", "strand",
"thick_start", "thick_end", "itemRgb"]
df = pd.read_csv(bed_path, sep="\t", header=None, names=cols,
compression="infer", low_memory=False)
if chrom:
df = df[df["chr"] == chrom]
if start is not None and end is not None:
df = df[(df["end"] > start) & (df["start"] < end)]
# Parse name field
parts = df["name"].str.split(":", expand=True)
df["tf_name"] = parts[0]
df["experiment_id"] = parts[1] if 1 in parts.columns else ""
df["cell_type"] = parts[2] if 2 in parts.columns else ""
return df.reset_index(drop=True)
# Usage example (offline):
# df = load_remap_bed("remap2022_CTCF_macs2_hg38_v1_0.bed.gz",
# chrom="chr17", start=7_670_000, end=7_690_000)
# print(df.head())
Goal: Identify all TFs with ChIP-seq evidence at a genomic locus and rank by peak count, then export a co-occupancy matrix.
import requests, time, pandas as pd, matplotlib.pyplot as plt
REMAP_API = "https://remap2022.univ-amu.fr/api/v1"
def query_region(chrom, start, end, assembly="hg38", timeout=30):
r = requests.get(f"{REMAP_API}/peaks/overlap/", params={
"chr": chrom, "start": start, "end": end, "assembly": assembly
}, timeout=timeout)
r.raise_for_status()
return r.json()
def parse_peaks(peaks):
rows = []
for p in peaks:
parts = p.get("name", "::").split(":")
rows.append({
"tf_name": parts[0] if len(parts) > 0 else "unknown",
"cell_type": parts[2] if len(parts) > 2 else "unknown",
"chr": p.get("chr", p.get("chrom", "")),
"start": p.get("start", 0),
"end": p.get("end", 0),
"score": p.get("score", 0),
})
return pd.DataFrame(rows)
# BRCA1 promoter region (GRCh38)
peaks = query_region("chr17", 43_044_000, 43_050_000, assembly="hg38")
df = parse_peaks(peaks)
print(f"Peaks at BRCA1 promoter: {len(df)}")
# TF occupancy summary
tf_summary = (df.groupby("tf_name")
.agg(peak_count=("tf_name", "count"),
cell_types=("cell_type", "nunique"),
mean_score=("score", "mean"))
.sort_values("peak_count", ascending=False))
print(f"\nTop TFs at BRCA1 promoter:")
print(tf_summary.head(15).to_string())
tf_summary.to_csv("BRCA1_promoter_TF_occupancy.csv")
# Horizontal bar chart
top = tf_summary.head(20)
fig, ax = plt.subplots(figsize=(8, 6))
ax.barh(top.index[::-1], top["peak_count"][::-1], color="#1f77b4", edgecolor="white")
ax.set_xlabel("Number of ChIP-seq Peaks")
ax.set_title("TF Co-occupancy at BRCA1 Promoter (ReMap 2022, hg38)")
plt.tight_layout()
plt.savefig("BRCA1_promoter_TF_cooccupancy.png", dpi=150, bbox_inches="tight")
print("Saved BRCA1_promoter_TF_cooccupancy.png")
Goal: For a list of genes, retrieve their promoter-proximal TF binding profiles and compare the TF repertoires across genes.
import requests, time, pandas as pd
REMAP_API = "https://remap2022.univ-amu.fr/api/v1"
def query_gene_peaks(gene_name, assembly="hg38", timeout=30):
try:
r = requests.get(f"{REMAP_API}/peaks/gene/", params={
"gene": gene_name, "assembly": assembly
}, timeout=timeout)
r.raise_for_status()
return r.json()
except Exception as e:
print(f" Warning: {gene_name} failed — {e}")
return []
genes_of_interest = ["MYC", "TP53", "BRCA1", "EGFR", "CDK4"]
gene_tf_profiles = {}
for gene in genes_of_interest:
peaks = query_gene_peaks(gene, assembly="hg38")
if peaks:
tfs = set()
for p in peaks:
parts = p.get("name", "").split(":")
if parts:
tfs.add(parts[0])
gene_tf_profiles[gene] = tfs
print(f"{gene}: {len(peaks)} peaks, {len(tfs)} unique TFs")
time.sleep(0.5)
# Build binary TF presence matrix
all_tfs = sorted(set().union(*gene_tf_profiles.values()))
matrix = pd.DataFrame(
{gene: [1 if tf in gene_tf_profiles.get(gene, set()) else 0 for tf in all_tfs]
for gene in genes_of_interest},
index=all_tfs
)
print(f"\nTF × Gene matrix: {matrix.shape}")
print(f"TFs shared by all genes: {(matrix.sum(axis=1) == len(genes_of_interest)).sum()}")
matrix.to_csv("gene_TF_binding_atlas.csv")
print("Saved gene_TF_binding_atlas.csv")
Goal: Download a TF-specific ReMap BED file and analyze its genomic distribution with pandas.
import requests, gzip, io, pandas as pd, time
# ReMap provides per-TF BED files. For large-scale offline analysis:
REMAP_DOWNLOAD_BASE = "https://remap2022.univ-amu.fr/storage/remap2022/hg38/MACS2"
def download_tf_bed(tf_name, assembly="hg38", save_path=None):
"""
Attempt to download TF-specific BED file from ReMap.
Falls back to API region query if download unavailable.
"""
filename = f"remap2022_{tf_name}_macs2_{assembly}_v1_0.bed.gz"
url = f"{REMAP_DOWNLOAD_BASE}/{filename}"
print(f"Attempting download: {url}")
r = requests.get(url, stream=True, timeout=60)
if r.status_code == 200:
if save_path:
with open(save_path, "wb") as f:
for chunk in r.iter_content(chunk_size=8192):
f.write(chunk)
print(f"Saved: {save_path}")
return save_path
else:
# Read directly into DataFrame
content = b"".join(r.iter_content(chunk_size=8192))
cols = ["chr", "start", "end", "name", "score", "strand",
"thick_start", "thick_end", "itemRgb"]
with gzip.open(io.BytesIO(content), "rt") as gz:
df = pd.read_csv(gz, sep="\t", header=None, names=cols)
return df
else:
print(f"Download returned {r.status_code}; use API query as fallback")
return None
# Analyze a downloaded BED file
def analyze_remap_bed(df):
"""Compute summary statistics for a ReMap peak DataFrame."""
parts = df["name"].str.split(":", expand=True)
df = df.copy()
df["tf_name"] = parts[0]
df["cell_type"] = parts[2] if 2 in parts.columns else "unknown"
df["width"] = df["end"] - df["start"]
print(f"Total peaks: {len(df):,}")
print(f"Unique TFs: {df['tf_name'].nunique()}")
print(f"Unique cell types: {df['cell_type'].nunique()}")
print(f"\nPeak width (bp): median={df['width'].median():.0f} "
f"mean={df['width'].mean():.0f} range=[{df['width'].min()}, {df['width'].max()}]")
print(f"\nChromosome distribution:")
chr_counts = df["chr"].value_counts().head(5)
print(chr_counts.to_string())
return df
# Example usage (requires BED download or substitute with API results):
# df_raw = download_tf_bed("CTCF", save_path="CTCF_hg38.bed.gz")
# if df_raw is not None:
# df_analyzed = analyze_remap_bed(df_raw)
| Parameter | Endpoint | Default | Range / Options | Effect |
|---|---|---|---|---|
chr | /peaks/overlap/ | — | chr1–chrX, chrY, chrM | Chromosome for region query (include chr prefix) |
start | /peaks/overlap/ | — | Integer genomic coordinate | Region start (0-based) |
end | /peaks/overlap/ | — | Integer genomic coordinate | Region end (exclusive) |
assembly | All endpoints | — | hg38, hg19, mm10, dm6, tair10 | Genome assembly for coordinates and peak lookup |
gene | /peaks/gene/ | — | HGNC gene symbol (e.g., TP53, MYC) | Queries peaks near the gene's annotated TSS |
name | /tfbs/name/ | — | TF name as in ReMap (e.g., CTCF, SP1) | TF name is case-sensitive; match ReMap TF naming |
biotype | /peaks/biotype/ | — | promoter, enhancer, exon, intron, intergenic, UTR | Filters peaks by Ensembl regulatory biotype |
timeout | All requests | 30 | Integer seconds | Increase to 60–120 for large gene/TF queries |
Parse the name field defensively: The TF:experiment:cell_type format may have fewer than three components for some records. Always guard with parts[n] if len(parts) > n else "".
Use BED downloads for genome-wide analyses: Querying large genomic regions or all peaks for a TF via the REST API can time out. For whole-genome or per-chromosome scans, download the per-TF or per-assembly BED files from the ReMap download page and filter locally with pandas or bedtools.
Cross-reference with JASPAR for sequence evidence: ReMap peaks show where TF binding was detected by ChIP-seq (positional evidence); JASPAR PWMs show what sequence the TF prefers (motif evidence). For robust regulatory annotation, require both: a ReMap peak in the region AND a JASPAR motif hit within the peak.
Use time.sleep(0.5) in batch loops: The ReMap API serves a research community; polite request pacing prevents throttling.
Validate assembly coordinates: ReMap 2022 hg38 peaks use 0-based half-open BED coordinates ([start, end)). When comparing with VCF or 1-based GFF coordinates, add 1 to start.
When to use: Prioritize functional candidates from a GWAS hit by identifying which TFs bind at the SNP location.
import requests
REMAP_API = "https://remap2022.univ-amu.fr/api/v1"
def tfs_at_snp(chrom, pos, window=500, assembly="hg38"):
"""Find TFs with ChIP-seq peaks overlapping a SNP position ± window bp."""
r = requests.get(f"{REMAP_API}/peaks/overlap/", params={
"chr": chrom, "start": pos - window, "end": pos + window,
"assembly": assembly
}, timeout=30)
r.raise_for_status()
peaks = r.json()
tfs = {}
for p in peaks:
parts = p.get("name", "::").split(":")
tf = parts[0] if parts else "unknown"
tfs[tf] = tfs.get(tf, 0) + 1
return dict(sorted(tfs.items(), key=lambda x: -x[1]))
# Example: rs2736100 (TERT locus, chr5:1,286,401)
snp_tfs = tfs_at_snp("chr5", 1_286_401, window=500, assembly="hg38")
print(f"TFs at TERT GWAS SNP (±500 bp): {len(snp_tfs)}")
for tf, count in list(snp_tfs.items())[:10]:
print(f" {tf:<20s} {count:3d} peaks")
When to use: Check whether two co-regulated genes share the same upstream TF binding landscape.
import requests, time
REMAP_API = "https://remap2022.univ-amu.fr/api/v1"
def get_gene_tfs(gene, assembly="hg38"):
try:
r = requests.get(f"{REMAP_API}/peaks/gene/", params={"gene": gene, "assembly": assembly}, timeout=30)
r.raise_for_status()
peaks = r.json()
return set(p.get("name", "").split(":")[0] for p in peaks if p.get("name", ""))
except Exception as e:
print(f"Warning: {gene} → {e}")
return set()
gene_a, gene_b = "MYC", "MYCN"
tfs_a = get_gene_tfs(gene_a)
time.sleep(0.5)
tfs_b = get_gene_tfs(gene_b)
shared = tfs_a & tfs_b
only_a = tfs_a - tfs_b
only_b = tfs_b - tfs_a
print(f"{gene_a} TFs: {len(tfs_a)} | {gene_b} TFs: {len(tfs_b)}")
print(f"Shared: {len(shared)} | {gene_a}-only: {len(only_a)} | {gene_b}-only: {len(only_b)}")
print(f"\nShared TFs (first 15): {sorted(shared)[:15]}")
print(f"\n{gene_a}-only (first 10): {sorted(only_a)[:10]}")
When to use: Export ReMap query results to BED format for downstream bedtools intersection or IGV visualization.
import requests, pandas as pd
REMAP_API = "https://remap2022.univ-amu.fr/api/v1"
def export_region_as_bed(chrom, start, end, outfile, assembly="hg38"):
"""Query ReMap region and save as 6-column BED file."""
r = requests.get(f"{REMAP_API}/peaks/overlap/", params={
"chr": chrom, "start": start, "end": end, "assembly": assembly
}, timeout=30)
r.raise_for_status()
peaks = r.json()
rows = [{
"chr": p.get("chr", p.get("chrom", "")),
"start": p.get("start", 0),
"end": p.get("end", 0),
"name": p.get("name", "."),
"score": p.get("score", 0),
"strand": p.get("strand", "."),
} for p in peaks]
df = pd.DataFrame(rows)
df = df.sort_values(["chr", "start"])
df.to_csv(outfile, sep="\t", header=False, index=False)
print(f"Saved {len(df)} peaks to {outfile}")
return df
export_region_as_bed("chr17", 7_670_000, 7_690_000, "TP53_locus_remap.bed")
| Problem | Cause | Solution |
|---|---|---|
404 Not Found from API | Endpoint path changed or unavailable | Check https://remap2022.univ-amu.fr/api/ for current endpoint list; fall back to BED download |
Empty JSON list [] from region query | No peaks in region, or assembly mismatch | Verify coordinates are on the correct assembly; try a wider window (±10 kb) |
| Gene query returns empty | Gene symbol not recognized by ReMap | Try Ensembl gene symbol; some aliases are not mapped — verify with HGNC |
requests.exceptions.Timeout | Large region or slow server | Increase timeout=60; for regions >1 Mb use BED file download instead |
name field has only one component | Incomplete metadata in ReMap for that experiment | Guard with parts[n] if len(parts) > n else "unknown" |
| BED download 404 | Per-TF files use exact ReMap TF naming | Check TF name case and spelling at https://remap2022.univ-amu.fr/download_page |
| Duplicate peaks for same TF | Multiple experiments per TF in a cell type | Group by tf_name and count unique experiments; deduplicate peaks with bedtools merge |
jaspar-database — TF binding motif matrices (PWMs/PFMs); use alongside ReMap peak evidence for sequence-level validationencode-database — ENCODE regulatory tracks including TF ChIP-seq, DNase-seq, and ATAC-seq; partially overlaps with ReMaphomer-motif-analysis — de novo motif discovery in ChIP-seq peak sets from ReMap or MACS3macs3-peak-calling — call peaks from raw ChIP-seq BAM files; ReMap provides pre-called peaks from the same approachregulomedb-database — regulatory variant scoring that integrates TF binding evidence similar to ReMap