From sciagent-skills
Searches EMDB cryo-EM density maps, fitted atomic models, and metadata via REST API. Query by keyword, resolution, method, or organism; fetch map URLs, PDB links, publications. No auth required.
npx claudepluginhub jaechang-hits/sciagent-skills --plugin sciagent-skillsThis skill uses the workspace's default tool permissions.
The Electron Microscopy Data Bank (EMDB) at EBI archives 3D electron microscopy density maps — primarily cryo-EM and cryo-ET maps — for macromolecular assemblies. It holds 30,000+ entries including ribosomes, membrane proteins, viruses, and large complexes not tractable by X-ray crystallography. The EMDB REST API at `https://www.ebi.ac.uk/emdb/api/` provides JSON responses for entry metadata, m...
Search and analyze cryo-EM maps, single particle structures, tomography datasets, and raw micrographs from EMDB, EMPIAR, CryoET. Cross-reference PDB structures and AlphaFold predictions for resolution-aware structural biology analysis.
Queries RCSB PDB (200K+ protein structures) via rcsb-api Python SDK. Performs text/attribute/sequence/3D similarity searches, retrieves metadata via GraphQL, and downloads PDB/mmCIF files for structural biology and drug discovery.
Accesses RCSB PDB to search 3D protein/nucleic acid structures by text, sequence, or structure similarity. Downloads PDB/mmCIF coordinates and retrieves metadata for structural biology and drug discovery workflows.
Share bugs, ideas, or general feedback.
The Electron Microscopy Data Bank (EMDB) at EBI archives 3D electron microscopy density maps — primarily cryo-EM and cryo-ET maps — for macromolecular assemblies. It holds 30,000+ entries including ribosomes, membrane proteins, viruses, and large complexes not tractable by X-ray crystallography. The EMDB REST API at https://www.ebi.ac.uk/emdb/api/ provides JSON responses for entry metadata, map download info, fitted atomic models (PDB IDs), and publications. No authentication or API key is required.
.map.gz density file to use in local structure visualization (UCSF ChimeraX, PyMOL)pdb-database instead when you need experimentally determined atomic coordinates (X-ray, NMR, or cryo-EM deposited with coordinates); EMDB provides the raw density map, PDB provides the atom positionsalphafold-database-access; EMDB is for experimental EM maps onlyrequests, pandas, matplotlibEMD-XXXX), keyword search strings, or PDB IDs for cross-referencingtime.sleep(0.2) between requests in batch loops for polite accesspip install requests pandas matplotlib
import requests
EMDB_API = "https://www.ebi.ac.uk/emdb/api"
# Search for cryo-EM maps of the SARS-CoV-2 spike protein
response = requests.get(f"{EMDB_API}/search/", params={"q": "spike protein SARS-CoV-2"}, timeout=30)
response.raise_for_status()
results = response.json()
hits = results.get("results", [])
print(f"Total hits: {results.get('numFound', 0)}")
for entry in hits[:5]:
emdb_id = entry.get("emdbId", "")
title = entry.get("title", "")
resol = entry.get("resolution", "?")
print(f" {emdb_id}: {title[:60]} ({resol} Å)")
# EMD-30210: SARS-CoV-2 spike protein in the prefusion conf... (3.46 Å)
# EMD-22221: SARS-CoV-2 spike protein glycoprotein structure... (2.8 Å)
Search EMDB entries by keyword. Returns a paginated result list with basic metadata for each hit.
import requests
import pandas as pd
EMDB_API = "https://www.ebi.ac.uk/emdb/api"
def emdb_search(query: str, rows: int = 20, start: int = 0) -> dict:
"""Full-text search of EMDB entries. Returns JSON response."""
params = {"q": query, "rows": rows, "start": start}
r = requests.get(f"{EMDB_API}/search/", params=params, timeout=30)
r.raise_for_status()
return r.json()
data = emdb_search("ribosome 70S bacterial", rows=10)
print(f"Total entries found: {data.get('numFound', 0)}")
rows = []
for entry in data.get("results", []):
rows.append({
"emdb_id": entry.get("emdbId"),
"title": entry.get("title", "")[:80],
"resolution": entry.get("resolution"),
"method": entry.get("imageAcquisition", {}).get("imagingMethod", ""),
"organism": entry.get("organism", ""),
})
df = pd.DataFrame(rows)
print(df.to_string(index=False))
# Search with resolution filter using pandas post-filtering
data = emdb_search("membrane protein", rows=50)
rows = []
for entry in data.get("results", []):
resol = entry.get("resolution")
if resol is not None and resol <= 3.0:
rows.append({
"emdb_id": entry.get("emdbId"),
"title": entry.get("title", "")[:70],
"resolution": resol,
})
df_highres = pd.DataFrame(rows).sort_values("resolution") if rows else pd.DataFrame()
print(f"High-resolution membrane protein maps (≤3.0 Å): {len(df_highres)}")
if not df_highres.empty:
print(df_highres.head(5).to_string(index=False))
Retrieve full metadata for a single EMDB entry by its ID (e.g., EMD-1234).
import requests
EMDB_API = "https://www.ebi.ac.uk/emdb/api"
def get_entry(emdb_id: str) -> dict:
"""Fetch full metadata for a single EMDB entry. emdb_id e.g. 'EMD-1234'."""
r = requests.get(f"{EMDB_API}/entry/{emdb_id}", timeout=30)
r.raise_for_status()
return r.json()
entry = get_entry("EMD-30210") # SARS-CoV-2 spike
# Navigate the nested JSON
header = entry.get("map", {}).get("header", {})
title = header.get("title", "")
deposited = header.get("depositionDate", "")
print(f"Entry: EMD-30210")
print(f"Title: {title}")
print(f"Deposited: {deposited}")
# Resolution
resol_block = entry.get("processing", {}).get("reconstruction", {}).get("resolutionByAuthor", "")
print(f"Resolution: {resol_block}")
# Sample organism
sample = entry.get("sample", {})
name_block = sample.get("name", "")
print(f"Sample: {name_block}")
Retrieve the download URL and file format for the associated .map.gz density file.
import requests
EMDB_API = "https://www.ebi.ac.uk/emdb/api"
def get_map_info(emdb_id: str) -> dict:
"""Retrieve map file download metadata for an EMDB entry."""
r = requests.get(f"{EMDB_API}/entry/{emdb_id}/map", timeout=30)
r.raise_for_status()
return r.json()
map_info = get_map_info("EMD-30210")
print("Map download info:")
for item in map_info if isinstance(map_info, list) else [map_info]:
file_url = item.get("url", "")
file_size = item.get("size", "")
format_ = item.get("format", "")
print(f" URL: {file_url}")
print(f" Format: {format_} | Size: {file_size}")
# Construct standard download URL manually (always available)
num = "30210" # numeric part of EMD-30210
standard_url = f"https://ftp.ebi.ac.uk/pub/databases/emdb/structures/EMD-{num}/map/emd_{num}.map.gz"
print(f"\nFTP map URL: {standard_url}")
List the PDB IDs of atomic models that have been fitted into this EM map.
import requests
EMDB_API = "https://www.ebi.ac.uk/emdb/api"
def get_fitted_models(emdb_id: str) -> list:
"""Return list of PDB IDs fitted to the EMDB map."""
r = requests.get(f"{EMDB_API}/entry/{emdb_id}/fitted", timeout=30)
r.raise_for_status()
data = r.json()
if isinstance(data, list):
return data
return data.get("fittedModels", [])
models = get_fitted_models("EMD-30210")
print(f"Fitted PDB models for EMD-30210: {len(models)}")
for m in models:
pdb_id = m.get("pdbId") if isinstance(m, dict) else m
print(f" PDB: {pdb_id}")
# Reverse lookup: given a PDB ID, find associated EMDB entries via search
import requests
EMDB_API = "https://www.ebi.ac.uk/emdb/api"
def find_emdb_for_pdb(pdb_id: str) -> list:
"""Search EMDB for entries associated with a PDB ID."""
r = requests.get(f"{EMDB_API}/search/", params={"q": pdb_id, "rows": 10}, timeout=30)
r.raise_for_status()
results = r.json().get("results", [])
return [e.get("emdbId") for e in results if e.get("emdbId")]
pdb_id = "7BNM" # SARS-CoV-2 spike structure
associated = find_emdb_for_pdb(pdb_id)
print(f"EMDB entries associated with PDB {pdb_id}: {associated}")
Retrieve primary publications (citations) linked to an EMDB entry.
import requests
EMDB_API = "https://www.ebi.ac.uk/emdb/api"
def get_publications(emdb_id: str) -> list:
"""Retrieve publications associated with an EMDB entry."""
r = requests.get(f"{EMDB_API}/entry/{emdb_id}/publications", timeout=30)
r.raise_for_status()
data = r.json()
if isinstance(data, list):
return data
return data.get("publications", [])
pubs = get_publications("EMD-30210")
print(f"Publications for EMD-30210: {len(pubs)}")
for pub in pubs:
title = pub.get("title", "")
doi = pub.get("doi", "")
year = pub.get("year", "")
print(f" [{year}] {title[:70]}")
if doi:
print(f" DOI: {doi}")
Retrieve aggregate EMDB database statistics — total entry count, resolution distribution, method breakdown.
import requests
EMDB_API = "https://www.ebi.ac.uk/emdb/api"
def get_statistics() -> dict:
"""Retrieve overall EMDB database statistics."""
r = requests.get(f"{EMDB_API}/statistics/", timeout=30)
r.raise_for_status()
return r.json()
stats = get_statistics()
print("EMDB Database Statistics:")
total = stats.get("totalEntries", stats.get("total", "n/a"))
print(f" Total entries: {total}")
# Method breakdown if available
methods = stats.get("methods", stats.get("imagingMethods", {}))
if methods:
print(" By method:")
for method, count in sorted(methods.items(), key=lambda x: -x[1] if isinstance(x[1], int) else 0):
print(f" {method}: {count}")
Plot the resolution distribution of a set of EMDB search results.
import requests
import matplotlib.pyplot as plt
EMDB_API = "https://www.ebi.ac.uk/emdb/api"
# Fetch 200 entries for visualization
r = requests.get(f"{EMDB_API}/search/", params={"q": "cryo-EM", "rows": 200}, timeout=60)
r.raise_for_status()
results = r.json().get("results", [])
resolutions = [
entry["resolution"] for entry in results
if entry.get("resolution") is not None and 1.0 <= entry["resolution"] <= 10.0
]
fig, ax = plt.subplots(figsize=(8, 4))
ax.hist(resolutions, bins=30, color="#2c7fb8", edgecolor="white", alpha=0.85)
ax.axvline(x=3.0, color="#d62728", lw=1.5, ls="--", label="3 Å threshold")
ax.set_xlabel("Resolution (Å)")
ax.set_ylabel("Number of entries")
ax.set_title(f"EMDB Resolution Distribution (n={len(resolutions)})")
ax.legend()
plt.tight_layout()
plt.savefig("emdb_resolution_distribution.png", dpi=150, bbox_inches="tight")
print(f"Saved emdb_resolution_distribution.png ({len(resolutions)} entries)")
below3 = sum(1 for r in resolutions if r <= 3.0)
print(f"Entries at ≤3.0 Å: {below3}/{len(resolutions)} ({below3/len(resolutions)*100:.1f}%)")
EMDB IDs follow the pattern EMD-XXXX (e.g., EMD-1234, EMD-30210). The numeric part is used in FTP paths. The API accepts both EMD-1234 and 1234 in most endpoints. FTP download paths use zero-padded 4-digit numbers for older entries.
An EMDB entry holds the raw electron density map (.map or .map.gz, in MRC/CCP4 format) — a 3D voxel grid of electron scattering density. The fitted atomic model (PDB entry) is a separate record with ATOM/HETATM coordinates interpreted from the map. Many maps have multiple fitted models from different groups; some maps have none (primary data without model deposition).
| Resolution | Typical interpretability |
|---|---|
| < 2.5 Å | Near-atomic: side-chain positions visible |
| 2.5–3.5 Å | High-res: backbone well-resolved, some side chains |
| 3.5–5.0 Å | Medium: secondary structure clear, limited side-chain detail |
| > 5.0 Å | Low-res: domain arrangement only |
Use the resolution field from search results to filter for structures appropriate for your analysis task.
Goal: Find all EMDB maps for a protein target with resolution ≤ 3.5 Å, export to CSV with PDB model cross-references.
import requests
import time
import pandas as pd
EMDB_API = "https://www.ebi.ac.uk/emdb/api"
def emdb_search_all(query: str, rows_per_page: int = 50) -> list:
"""Paginate through all search results."""
all_results = []
start = 0
while True:
r = requests.get(f"{EMDB_API}/search/",
params={"q": query, "rows": rows_per_page, "start": start},
timeout=30)
r.raise_for_status()
data = r.json()
batch = data.get("results", [])
all_results.extend(batch)
if start + rows_per_page >= data.get("numFound", 0) or not batch:
break
start += rows_per_page
time.sleep(0.2)
return all_results
target = "ACE2"
print(f"Searching EMDB for: {target}")
entries = emdb_search_all(target)
print(f"Total entries: {len(entries)}")
rows = []
for entry in entries:
resol = entry.get("resolution")
if resol is None or resol > 3.5:
continue
emdb_id = entry.get("emdbId", "")
# Fetch fitted PDB models
try:
r2 = requests.get(f"{EMDB_API}/entry/{emdb_id}/fitted", timeout=15)
models = r2.json() if r2.status_code == 200 else []
pdb_ids = [m.get("pdbId") if isinstance(m, dict) else str(m) for m in (models if isinstance(models, list) else [])]
except Exception:
pdb_ids = []
time.sleep(0.2)
rows.append({
"emdb_id": emdb_id,
"title": entry.get("title", "")[:80],
"resolution": resol,
"pdb_models": ";".join(pdb_ids) if pdb_ids else "",
"organism": entry.get("organism", ""),
})
df = pd.DataFrame(rows).sort_values("resolution")
df.to_csv(f"{target}_emdb_highres.csv", index=False)
print(f"High-res entries (≤3.5 Å): {len(df)}")
print(df[["emdb_id", "resolution", "pdb_models", "title"]].head(8).to_string(index=False))
Goal: Given a list of EMDB IDs from a literature search, fetch structured metadata and build a summary table.
import requests
import time
import pandas as pd
EMDB_API = "https://www.ebi.ac.uk/emdb/api"
emdb_ids = ["EMD-30210", "EMD-22221", "EMD-23970", "EMD-13731", "EMD-14127"]
records = []
for emdb_id in emdb_ids:
try:
r = requests.get(f"{EMDB_API}/entry/{emdb_id}", timeout=20)
r.raise_for_status()
entry = r.json()
header = entry.get("map", {}).get("header", {})
processing = entry.get("processing", {})
recon = processing.get("reconstruction", {})
records.append({
"emdb_id": emdb_id,
"title": header.get("title", "")[:80],
"deposited": header.get("depositionDate", ""),
"resolution": recon.get("resolutionByAuthor", ""),
"software": recon.get("software", {}).get("name", "") if isinstance(recon.get("software"), dict) else "",
})
except Exception as e:
print(f"Warning: {emdb_id} failed — {e}")
time.sleep(0.2)
df = pd.DataFrame(records)
print(df.to_string(index=False))
df.to_csv("emdb_batch_metadata.csv", index=False)
print(f"\nSaved emdb_batch_metadata.csv ({len(df)} entries)")
Goal: Download an EMDB .map.gz file programmatically for use in ChimeraX or PyMOL.
import requests
from pathlib import Path
def download_emdb_map(emdb_id: str, output_dir: str = ".") -> str:
"""
Download an EMDB map file (.map.gz) via FTP.
Returns the path to the downloaded file.
"""
num = emdb_id.replace("EMD-", "").replace("emd-", "").lstrip("0") or "0"
num_padded = num.zfill(4) if len(num) < 4 else num
url = (f"https://ftp.ebi.ac.uk/pub/databases/emdb/structures/"
f"EMD-{num_padded}/map/emd_{num_padded}.map.gz")
out_path = Path(output_dir) / f"emd_{num_padded}.map.gz"
print(f"Downloading {emdb_id} map from EBI FTP...")
print(f" URL: {url}")
r = requests.get(url, stream=True, timeout=120)
r.raise_for_status()
total_mb = int(r.headers.get("content-length", 0)) / 1e6
downloaded = 0
with open(out_path, "wb") as f:
for chunk in r.iter_content(chunk_size=1024 * 1024):
f.write(chunk)
downloaded += len(chunk)
print(f" Saved: {out_path} ({downloaded/1e6:.1f} MB)")
return str(out_path)
# Example: download spike protein map
path = download_emdb_map("EMD-30210", output_dir="/tmp")
print(f"Map file: {path}")
print("Open in ChimeraX with: open /tmp/emd_30210.map.gz")
| Parameter | Function/Endpoint | Default | Range / Options | Effect |
|---|---|---|---|---|
q | /search/ | — | Any keyword string | Full-text search query; supports boolean and phrase matching |
rows | /search/ | 10 | 1–200 | Number of results per page |
start | /search/ | 0 | 0–numFound | Pagination offset for large result sets |
emdb_id | /entry/{id} | — | EMD-XXXX format | Specific entry identifier |
resolution | Result field | — | float (Å) | Filter post-query by entry["resolution"] threshold |
imagingMethod | Result field | — | "cryo EM", "cryo ET" | Method filter; applies via pandas post-fetch |
organism | Result field | — | organism name string | Organism filter; match with .str.contains() |
Use the FTP endpoint for large map files: The REST API provides metadata; the actual .map.gz files are served via the EBI FTP. Construct the FTP URL as https://ftp.ebi.ac.uk/pub/databases/emdb/structures/EMD-XXXX/map/emd_XXXX.map.gz.
num = "30210"
url = f"https://ftp.ebi.ac.uk/pub/databases/emdb/structures/EMD-{num}/map/emd_{num}.map.gz"
Add time.sleep(0.2) in batch loops: The EMDB REST API is shared infrastructure with no published rate limits. Polite delays prevent throttling.
Filter by resolution post-query: The /search/ endpoint does not support server-side numeric range filtering. Fetch a larger rows value and filter the resolution field with pandas locally.
Cross-reference via both directions: An EMDB entry can have 0–10+ fitted PDB models. Always check /entry/{emdb_id}/fitted for the definitive PDB list; keyword search alone may miss older depositions.
Check for None resolution: Some cryo-ET and subtomogram averages lack a numeric resolution estimate. Guard with if entry.get("resolution") is not None before numeric comparisons.
When to use: You have an EMDB ID and want to load all associated atomic coordinates.
import requests
def get_pdb_ids_for_emdb(emdb_id: str) -> list:
"""Return list of PDB IDs fitted into the given EMDB map."""
r = requests.get(f"https://www.ebi.ac.uk/emdb/api/entry/{emdb_id}/fitted", timeout=15)
if r.status_code != 200:
return []
data = r.json()
models = data if isinstance(data, list) else data.get("fittedModels", [])
return [m.get("pdbId") if isinstance(m, dict) else str(m) for m in models]
pdb_ids = get_pdb_ids_for_emdb("EMD-30210")
print(f"PDB models for EMD-30210: {pdb_ids}")
# PDB models for EMD-30210: ['7BNM', '7BNN']
When to use: Survey EMDB coverage and resolution for a list of protein targets.
import requests
import time
import pandas as pd
EMDB_API = "https://www.ebi.ac.uk/emdb/api"
targets = ["KRAS", "EGFR", "ACE2", "p53", "mTOR"]
rows = []
for target in targets:
r = requests.get(f"{EMDB_API}/search/",
params={"q": target, "rows": 100}, timeout=30)
r.raise_for_status()
entries = r.json().get("results", [])
resolutions = [e["resolution"] for e in entries if e.get("resolution") is not None]
rows.append({
"target": target,
"n_entries": len(entries),
"best_resol": min(resolutions) if resolutions else None,
"mean_resol": round(sum(resolutions)/len(resolutions), 2) if resolutions else None,
})
time.sleep(0.3)
df = pd.DataFrame(rows)
print(df.to_string(index=False))
# target n_entries best_resol mean_resol
# KRAS 12 2.19 3.84
# EGFR 31 2.60 4.12
# ACE2 45 2.05 3.27
When to use: Build a benchmark set of high-resolution cryo-EM structures for a specific system.
import requests
import pandas as pd
EMDB_API = "https://www.ebi.ac.uk/emdb/api"
def find_highres_entries(query: str, resolution_cutoff: float = 3.0, max_results: int = 200) -> pd.DataFrame:
r = requests.get(f"{EMDB_API}/search/",
params={"q": query, "rows": max_results}, timeout=60)
r.raise_for_status()
entries = r.json().get("results", [])
rows = []
for e in entries:
resol = e.get("resolution")
if resol is not None and resol <= resolution_cutoff:
rows.append({
"emdb_id": e.get("emdbId"),
"resolution": resol,
"title": e.get("title", "")[:70],
"organism": e.get("organism", ""),
})
return pd.DataFrame(rows).sort_values("resolution") if rows else pd.DataFrame()
df = find_highres_entries("ion channel", resolution_cutoff=3.0)
print(f"Ion channel maps at ≤3.0 Å: {len(df)}")
print(df.head(5).to_string(index=False))
df.to_csv("ion_channel_highres_emdb.csv", index=False)
| Problem | Cause | Solution |
|---|---|---|
HTTPError: 404 for /entry/{emdb_id} | Entry ID not found or wrong format | Verify format is EMD-XXXX with correct numeric suffix; confirm entry exists on https://www.ebi.ac.uk/emdb/ |
Empty results list | Query too specific or misspelled | Broaden the search term; try the gene name alone without qualifiers |
resolution field is None | Cryo-ET or subtomogram averaging entries without reported resolution | Skip with if entry.get("resolution") is not None; these are valid entries |
FTP download returns 404 | Wrong numeric padding in FTP path | Use the raw number without leading zeros for EMD-XXXX where XXXX is < 4 digits; verify the path at https://ftp.ebi.ac.uk/pub/databases/emdb/structures/ |
| Fitted models list is empty | Map has no associated deposited PDB model | Some authors deposit maps without atomic models; cross-reference by keyword search with the EMDB title |
| Slow search for common terms | Large result sets | Limit rows to a manageable number (50–200) and filter post-fetch; avoid open-ended queries like q=protein |
ConnectionError / Timeout | Network issue or server overload | Retry with exponential backoff; increase timeout to 60s for large requests |
pdb-database — RCSB PDB REST API for experimental atomic coordinates; complement to EMDB mapsalphafold-database-access — AlphaFold predicted structures (200M+ proteins), no EM mappubmed-database — Retrieve publications by DOI or PMID retrieved from EMDB publications endpointmdanalysis-trajectory — Analyze MD trajectories of structures initially determined by cryo-EM.map.gz density files and XML metadata