Skill

chembl-database-bioactivity

Queries ChEMBL (2M+ compounds, 19M+ bioactivity measurements) via public REST API using only `requests`. Search compounds, retrieve IC50/Ki/EC50, find target inhibitors, run SAR, access drug data.

Python

REST API

data-engineering

database

Popularity

Stars

200

Forks

Invocation


Slash command

/sciagent-skills:chembl-database-bioactivity

User invocable

Model invocable

Inline context

Default effort


SKILL.md

594 lines · ~6.4k tokens (exceeds 5k compaction limit)

Stats

Language: Python

Stars: 200

Forks: 21

Maintenance: Excellent

Last commit: Jun 15, 2026


ChEMBL Database — Bioactivity Queries

Why no SDK? The chembl_webresource_client package is convenient sugar over a public, no-auth REST/JSON API at https://www.ebi.ac.uk/chembl/api/data/. When the SDK is unavailable, every operation can be reproduced with plain requests and URL parameters. This SKILL.md uses the REST path throughout so the code runs in any environment with requests installed. Django-style filter syntax (field__icontains=…, field__lte=…, field__range=a,b) works as URL query parameters.

Overview

ChEMBL is EMBL-EBI's bioactive molecule database: 2M+ compounds, 19M+ bioactivity measurements (IC50, Ki, EC50, Kd, …), 13K+ targets. The REST API at https://www.ebi.ac.uk/chembl/api/data/ returns JSON (append .json) or XML/YAML, requires no authentication, and supports Django-style query filters via URL parameters plus offset-based pagination (follow page_meta.next).

When to Use

Finding compounds by name, ChEMBL ID, or physicochemical properties
Querying bioactivity data (IC50, Ki, EC50) for specific targets
Performing similarity or substructure searches using SMILES
Retrieving drug mechanisms of action and clinical indications
Identifying inhibitors, agonists, or bioactive molecules for a target
Analyzing structure-activity relationships (SAR) across compound series
Filtering molecules by Lipinski rule-of-5 or other drug-likeness criteria
For general cheminformatics (SMILES manipulation, fingerprints, descriptors) use rdkit-cheminformatics instead
For an alternative compound database (NIH, broader coverage) use pubchem-compound-search

Prerequisites

Python packages: requests (only requirement). Optional: pandas for tabular analysis.
No API key required: ChEMBL is freely accessible.
Rate limits: No published hard limit. The infrastructure is shared, so pause 0.2 to 0.5 s between requests in batch loops (e.g., time.sleep(0.3)) and back off on HTTP 429.

pip install requests
# Optional, for DataFrame work:
pip install pandas

Quick Start

import requests

BASE = "https://www.ebi.ac.uk/chembl/api/data"

# Retrieve a molecule by ChEMBL ID
r = requests.get(f"{BASE}/molecule/CHEMBL25.json", timeout=15)
r.raise_for_status()
aspirin = r.json()
print(f"{aspirin['pref_name']}: MW={aspirin['molecule_properties']['mw_freebase']}")
# ASPIRIN: MW=180.16

# Search targets by full name (acronyms like 'EGFR' don't match pref_name — use full term)
r = requests.get(
    f"{BASE}/target.json",
    params={"pref_name__icontains": "epidermal growth factor receptor",
            "target_type": "SINGLE PROTEIN", "limit": 5},
    timeout=15,
)
targets = r.json()["targets"]
print(f"EGFR-like targets: {len(targets)}, first={targets[0]['target_chembl_id']}")

# Potent bioactivities: EGFR (CHEMBL203) IC50 <= 100 nM
r = requests.get(
    f"{BASE}/activity.json",
    params={"target_chembl_id": "CHEMBL203",
            "standard_type": "IC50",
            "standard_value__lte": 100,
            "standard_units": "nM",
            "limit": 5},
    timeout=30,
)
data = r.json()
print(f"EGFR IC50 ≤ 100 nM records: {data['page_meta']['total_count']}")

Key Concepts

Filter Operators (Django-style, as URL parameters)

The SDK's field__operator=value syntax maps 1:1 to URL query parameters. Use & to combine filters.

Operator	URL pattern	Example URL fragment
`__exact`	`field=value`	`target_type=SINGLE+PROTEIN`
`__iexact`	`field__iexact=value`	`pref_name__iexact=aspirin`
`__contains` / `__icontains`	`field__icontains=value`	`pref_name__icontains=kinase`
`__startswith` / `__endswith`	`field__startswith=Epi`	`pref_name__endswith=nib`
`__gt` / `__gte` / `__lt` / `__lte`	`field__lte=100`	`standard_value__lte=100`
`__range`	`field__range=lo,hi`	`molecule_properties__mw_freebase__range=300,500`
`__in`	`field__in=a,b,c`	`standard_type__in=IC50,Ki,Kd`
`__isnull`	`field__isnull=False` (pass the string `False` or `True`)	`pchembl_value__isnull=False`
`__regex`	`field__regex=…`	`pref_name__regex=^EGF.*kinase$`
`__search`	`field__search=…`	`description__search=apoptosis`

When passed via requests.get(..., params={...}), the library handles URL encoding automatically (including the commas in __range and __in).
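To see exactly what requests will send, you can prepare a request without sending it. The following is a minimal sketch that uses only `requests.Request(...).prepare()` from the public requests API. `preview_url` is a hypothetical helper name, and no network call is made:

```python
import requests

BASE = "https://www.ebi.ac.uk/chembl/api/data"


def preview_url(endpoint: str, params: dict) -> str:
    """Return the fully encoded URL requests would send, without sending it."""
    prepared = requests.Request("GET", f"{BASE}/{endpoint}", params=params).prepare()
    return prepared.url


url = preview_url("activity.json", {
    "target_chembl_id": "CHEMBL203",
    "standard_type__in": "IC50,Ki,Kd",   # comma-separated list for __in
    "pchembl_value__isnull": "False",
})
print(url)
```

Commas in `__in` and `__range` values come out as `%2C`, and spaces come out as `+`. Both are standard form encoding, which the server decodes back.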

Core Endpoints

All data endpoints accept a .json, .xml, or .yaml suffix (except /image, which returns an image). The examples below use JSON.

Endpoint URL	Returns	Key fields
`/molecule/{chembl_id}.json`	Compound by ID	`pref_name`, `molecule_chembl_id`, `molecule_properties`, `molecule_structures`
`/molecule.json?<filters>`	Compound search	paginated `molecules[]`
`/target/{chembl_id}.json`	Target by ID	`pref_name`, `target_type`, `organism`, `target_components`
`/target.json?<filters>`	Target search	paginated `targets[]`
`/activity.json?<filters>`	Bioactivity records	paginated `activities[]`
`/assay.json?<filters>`	Assay details	paginated `assays[]`
`/drug.json?<filters>`	Approved drug info	paginated `drugs[]`; supports `/drug/{chembl_id}.json`
`/mechanism.json?<filters>`	Mechanism of action	paginated `mechanisms[]`
`/drug_indication.json?<filters>`	Therapeutic indications	paginated `drug_indications[]`
`/similarity/{smiles}/{tanimoto}.json`	Tanimoto similarity (0–100)	paginated `molecules[]` with `similarity` field
`/substructure/{smiles}.json`	Substructure search	paginated `molecules[]`
`/image/{chembl_id}.svg`	SVG structure image	SVG markup (XML text, NOT JSON)
`/molecule_form/{chembl_id}.json`	Parent/salt forms	`molecule_forms[]`
`/protein_class.json`	Protein classification hierarchy	hierarchical browse
`/document.json?<filters>`	Literature source records	paginated `documents[]`

Response Shape

{
  "page_meta": {
    "limit": 20,
    "offset": 0,
    "total_count": 12145,
    "next": "/chembl/api/data/activity.json?...&offset=20",
    "previous": null
  },
  "activities": [ /* or molecules[], targets[], etc. */ ]
}

Walk page_meta.next (a relative URL — prefix with https://www.ebi.ac.uk) until it becomes null.
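The pagination loop can be wrapped in a reusable generator. This is a sketch, not part of any ChEMBL client:

- `iter_records` and its `pause` and `max_records` arguments are illustrative names.
- It assumes `next` is a site-relative path, as shown above. `urljoin` would also pass an absolute URL through unchanged.

```python
import time
from typing import Iterator, Optional
from urllib.parse import urljoin

import requests

HOST = "https://www.ebi.ac.uk"
BASE = f"{HOST}/chembl/api/data"


def iter_records(url: str, key: str, params: Optional[dict] = None,
                 session: Optional[requests.Session] = None,
                 max_records: Optional[int] = None,
                 pause: float = 0.2, timeout: float = 30) -> Iterator[dict]:
    """Yield records from a paginated ChEMBL list endpoint.

    key is the list field of the response ("activities", "molecules", ...).
    params are sent with the first request only, because page_meta.next
    already repeats the filters together with the next offset.
    """
    session = session or requests.Session()
    yielded = 0
    while url:
        resp = session.get(url, params=params, timeout=timeout)
        resp.raise_for_status()
        page = resp.json()
        for record in page[key]:
            yield record
            yielded += 1
            if max_records is not None and yielded >= max_records:
                return
        nxt = page["page_meta"].get("next")
        # next is site-relative; urljoin leaves an absolute URL unchanged
        url = urljoin(HOST, nxt) if nxt else None
        params = None
        if url:
            time.sleep(pause)  # be polite to the shared service
```

Usage (makes network calls): `list(iter_records(f"{BASE}/activity.json", "activities", params={"target_chembl_id": "CHEMBL203", "limit": 200}, max_records=500))`.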

Molecular Properties

Properties accessible via molecule_properties on each record:

Field	Description
`mw_freebase`	Molecular weight (free base)
`full_mwt`	Full molecular weight (including salts)
`alogp`	Calculated LogP
`hba`	Hydrogen bond acceptors
`hbd`	Hydrogen bond donors
`psa`	Polar surface area
`rtb`	Rotatable bonds
`num_ro5_violations`	Lipinski rule-of-5 violations
`ro3_pass`	Rule of 3 compliance
`cx_most_apka` / `cx_most_bpka`	Most acidic / basic pKa
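To apply cutoffs client-side (for example on similarity-search hits), a small recount helper is enough. This sketch makes the following assumptions:

- It uses the classic rule-of-5 thresholds.
- It converts each value with `float()`, since decimal properties may be serialized as strings.
- Its count can differ from ChEMBL's own `num_ro5_violations`, which is computed server-side.
- `ro5_violations` is a hypothetical helper.

```python
from typing import Mapping, Optional

# Classic Lipinski cutoffs (illustrative; adjust to your own criteria)
RO5_LIMITS = {"mw_freebase": 500.0, "alogp": 5.0, "hbd": 5.0, "hba": 10.0}


def _num(value) -> Optional[float]:
    """Convert an API value (number or numeric string) to float, or None."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def ro5_violations(props: Optional[Mapping]) -> Optional[int]:
    """Count rule-of-5 violations in a molecule_properties dict.

    Returns None when properties are missing (molecule_properties can be null).
    Fields that are absent or non-numeric are not counted as violations.
    """
    if not props:
        return None
    count = 0
    for field, limit in RO5_LIMITS.items():
        value = _num(props.get(field))
        if value is not None and value > limit:
            count += 1
    return count
```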

Target Information Fields

Field	Description
`target_chembl_id`	ChEMBL target identifier
`pref_name`	Preferred (full) target name — acronyms like "EGFR" do NOT match; use the spelled-out term
`target_type`	`SINGLE PROTEIN`, `PROTEIN COMPLEX`, `ORGANISM`, …
`organism`	Target organism (e.g., `Homo sapiens`)
`tax_id`	NCBI taxonomy ID
`target_components[]`	Components (UniProt accession, sequence, …)

Bioactivity Data Fields

Field	Description
`standard_type`	Activity type: `IC50`, `Ki`, `Kd`, `EC50`, …
`standard_value`	Numerical activity value
`standard_units`	Units: `nM`, `uM`, …
`pchembl_value`	Normalized -log10(molar) activity, e.g. 100 nM → 7.0 (≥ 6 corresponds to ≤ 1 µM)
`activity_comment`	Activity annotations
`data_validity_comment`	Data quality flags (check before analysis)
`potential_duplicate`	Duplicate flag
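The pChEMBL scale is -log10 of the molar value, so for a value in nM it is 9 - log10(value). For example, 100 nM gives 7.0 and 1 µM gives 6.0.

The sketch below shows that conversion plus one possible cleaning pass over `activities[]` records, following the field notes above:

- Drop flagged and duplicate rows.
- Keep only rows with a numeric `pchembl_value`.

Both function names are hypothetical. Treating `potential_duplicate` values of `1`/`True` as duplicates is an assumption about how the flag is serialized.

```python
import math
from typing import Iterable, List


def pchembl_from_nM(value_nM: float) -> float:
    """-log10 of a molar concentration given in nM (1 nM = 1e-9 M)."""
    if value_nM <= 0:
        raise ValueError("activity value must be positive")
    return 9.0 - math.log10(value_nM)


def clean_activities(records: Iterable[dict]) -> List[dict]:
    """Drop curation-flagged and duplicate rows, and rows without a numeric pChEMBL."""
    kept = []
    for rec in records:
        if rec.get("data_validity_comment"):
            continue  # curation flag set: skip before any aggregation
        if rec.get("potential_duplicate") in (True, 1, "1", "True"):
            continue  # assumed serializations of the duplicate flag
        try:
            p = float(rec.get("pchembl_value"))
        except (TypeError, ValueError):
            continue  # missing or non-numeric pChEMBL
        kept.append({**rec, "pchembl_value": p})
    return kept
```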

Core API

1. Molecule Queries

import requests
BASE = "https://www.ebi.ac.uk/chembl/api/data"

# By ChEMBL ID
r = requests.get(f"{BASE}/molecule/CHEMBL25.json", timeout=15)
aspirin = r.json()
print(f"{aspirin['pref_name']}: MW={aspirin['molecule_properties']['mw_freebase']}")

# By name (case-insensitive substring)
r = requests.get(f"{BASE}/molecule.json",
                 params={"pref_name__icontains": "imatinib", "limit": 5},
                 timeout=15)
for mol in r.json()["molecules"]:
    print(f"  {mol['molecule_chembl_id']}  {mol.get('pref_name')!r}")

# Property-range filter with rule-of-5 cutoffs (the 300 Da lower MW bound is an extra, non-Lipinski constraint)
r = requests.get(f"{BASE}/molecule.json",
                 params={"molecule_properties__mw_freebase__range": "300,500",
                         "molecule_properties__alogp__lte": 5,
                         "molecule_properties__hba__lte": 10,
                         "molecule_properties__hbd__lte": 5,
                         "limit": 3},
                 timeout=15)
print(f"Lipinski-compliant total: {r.json()['page_meta']['total_count']}")
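For more than a handful of IDs, the `__in` operator lets one request return a batch of molecules. This is a sketch under assumptions: the batch size of 50 is an arbitrary choice to keep URLs short, and `chunked`/`fetch_molecules` are hypothetical helpers.

```python
import time
from typing import Dict, Iterable, Iterator, List

import requests

BASE = "https://www.ebi.ac.uk/chembl/api/data"


def chunked(items: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive lists of at most `size` items."""
    batch: List[str] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def fetch_molecules(ids: Iterable[str], batch_size: int = 50,
                    pause: float = 0.2) -> Dict[str, dict]:
    """Fetch molecule records for many ChEMBL IDs via molecule_chembl_id__in."""
    out: Dict[str, dict] = {}
    for batch in chunked(dict.fromkeys(ids), batch_size):  # dedupe, keep order
        r = requests.get(f"{BASE}/molecule.json",
                         params={"molecule_chembl_id__in": ",".join(batch),
                                 "limit": len(batch)},  # whole batch on one page
                         timeout=30)
        r.raise_for_status()
        for mol in r.json()["molecules"]:
            out[mol["molecule_chembl_id"]] = mol
        time.sleep(pause)
    return out
```

For example, `fetch_molecules(["CHEMBL25", "CHEMBL941"])` returns a dict keyed by ChEMBL ID. IDs that do not exist are simply absent from the result.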

2. Target Queries

import requests
BASE = "https://www.ebi.ac.uk/chembl/api/data"

# By ChEMBL ID
r = requests.get(f"{BASE}/target/CHEMBL203.json", timeout=15)
egfr = r.json()
print(f"{egfr['pref_name']} ({egfr['organism']}) — type={egfr['target_type']}")

# Search by full name (NOT acronym) + type
r = requests.get(f"{BASE}/target.json",
                 params={"pref_name__icontains": "kinase",
                         "target_type": "SINGLE PROTEIN", "limit": 5},
                 timeout=15)
d = r.json()
print(f"Kinase SINGLE PROTEIN targets: total={d['page_meta']['total_count']}")
for t in d["targets"][:5]:
    print(f"  {t['target_chembl_id']:12s} {t.get('pref_name')!r}  ({t['organism']})")

# By organism
r = requests.get(f"{BASE}/target.json",
                 params={"organism": "Homo sapiens", "limit": 3},
                 timeout=15)
print(f"Human targets: total={r.json()['page_meta']['total_count']}")
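Because acronyms do not match `pref_name`, resolving a target by UniProt accession is often more reliable. This sketch uses the `target_components__accession` filter mentioned under Best Practices; `P00533` is the UniProt accession of human EGFR.

The helper names are hypothetical. The search may return more than one target (for example, complexes or families that include the protein), so inspect `target_type` before picking one.

```python
from typing import List

import requests

BASE = "https://www.ebi.ac.uk/chembl/api/data"


def uniprot_accessions(target: dict) -> List[str]:
    """Collect UniProt accessions from a target record's target_components."""
    return [c["accession"] for c in target.get("target_components") or []
            if c.get("accession")]


def targets_for_accession(accession: str) -> List[dict]:
    """Find ChEMBL targets whose components include a UniProt accession."""
    r = requests.get(f"{BASE}/target.json",
                     params={"target_components__accession": accession,
                             "limit": 20},
                     timeout=15)
    r.raise_for_status()
    return r.json()["targets"]
```

Usage (network): `[(t["target_chembl_id"], t["target_type"]) for t in targets_for_accession("P00533")]`.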

3. Bioactivity Data

import requests
BASE = "https://www.ebi.ac.uk/chembl/api/data"

# Potent inhibitors for a target (EGFR = CHEMBL203)
r = requests.get(f"{BASE}/activity.json",
                 params={"target_chembl_id": "CHEMBL203",
                         "standard_type": "IC50",
                         "standard_value__lte": 100,
                         "standard_units": "nM",
                         "limit": 5},
                 timeout=30)
data = r.json()
print(f"EGFR IC50≤100nM: total={data['page_meta']['total_count']}")
for act in data["activities"][:5]:
    print(f"  {act['molecule_chembl_id']:14s} IC50={act['standard_value']} nM "
          f"pChEMBL={act.get('pchembl_value')}")

# All pChEMBL-tagged activities for a compound
r = requests.get(f"{BASE}/activity.json",
                 params={"molecule_chembl_id": "CHEMBL25",
                         "pchembl_value__isnull": "False",
                         "limit": 5},
                 timeout=30)
print(f"Aspirin pChEMBL activities: total={r.json()['page_meta']['total_count']}")

# Multiple activity types (CHEMBL240 = D2 dopamine receptor)
r = requests.get(f"{BASE}/activity.json",
                 params={"target_chembl_id": "CHEMBL240",
                         "standard_type__in": "IC50,Ki,Kd",
                         "limit": 5},
                 timeout=30)
print(f"D2 receptor IC50/Ki/Kd: total={r.json()['page_meta']['total_count']}")
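A single compound often has several measurements against the same target from different assays. One common way to collapse them is to take the median pChEMBL per compound. This is an analysis choice, not something the API does for you.

`median_pchembl_by_molecule` is a hypothetical helper. It works on the `activities[]` list from the queries above, ideally after walking all pages.

```python
from collections import defaultdict
from statistics import median
from typing import Dict, Iterable, List


def median_pchembl_by_molecule(activities: Iterable[dict]) -> Dict[str, float]:
    """Median pChEMBL per molecule_chembl_id; rows without a pChEMBL value are skipped."""
    values: Dict[str, List[float]] = defaultdict(list)
    for act in activities:
        p = act.get("pchembl_value")
        if p is None:
            continue
        values[act["molecule_chembl_id"]].append(float(p))  # API may return strings
    return {mol: median(ps) for mol, ps in values.items()}
```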

4. Structure-Based Search

import requests
from urllib.parse import quote
BASE = "https://www.ebi.ac.uk/chembl/api/data"

# Similarity search (Tanimoto ≥ 85%)
# Path-style endpoint: /similarity/{smiles}/{threshold}
# URL-encode the SMILES: '/' (stereo bonds) and '#' (triple bonds) would otherwise split the path or start a URL fragment
aspirin_smiles = quote("CC(=O)Oc1ccccc1C(=O)O", safe="")
r = requests.get(f"{BASE}/similarity/{aspirin_smiles}/85.json",
                 params={"limit": 5}, timeout=30)
data = r.json()
print(f"Similar to aspirin (≥85% Tanimoto): total={data['page_meta']['total_count']}")
for m in data["molecules"][:5]:
    print(f"  {m['molecule_chembl_id']}  similarity={m.get('similarity')}")

# Substructure search
benzimidazole = quote("c1ccc2[nH]cnc2c1", safe="")
r = requests.get(f"{BASE}/substructure/{benzimidazole}.json",
                 params={"limit": 3}, timeout=30)
print(f"Benzimidazole substructure total: {r.json()['page_meta']['total_count']}")
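Path-style SMILES are easy to get wrong, so a small URL builder helps. In this sketch the helper names are hypothetical. The 0 to 100 range check mirrors the Core Endpoints table; the server may enforce a narrower range of its own.

```python
from urllib.parse import quote

BASE = "https://www.ebi.ac.uk/chembl/api/data"


def similarity_url(smiles: str, threshold: int) -> str:
    """Build /similarity/{smiles}/{threshold}.json with the SMILES fully escaped."""
    if not 0 <= threshold <= 100:
        raise ValueError("threshold is a Tanimoto percentage (0-100)")
    return f"{BASE}/similarity/{quote(smiles, safe='')}/{int(threshold)}.json"


def substructure_url(smiles: str) -> str:
    """Build /substructure/{smiles}.json with the SMILES fully escaped."""
    return f"{BASE}/substructure/{quote(smiles, safe='')}.json"


print(similarity_url("F/C=C/F", 70))
print(substructure_url("C#N"))
```

`safe=''` matters here: with `quote`'s default, `/` would be left unescaped and would split the path.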

5. Drug and Mechanism Data

import requests
BASE = "https://www.ebi.ac.uk/chembl/api/data"

# Drug record (max clinical phase, ATC class, etc.)
r = requests.get(f"{BASE}/drug/CHEMBL941.json", timeout=15)   # imatinib
drug = r.json()
print(f"Imatinib max_phase={drug.get('max_phase')}")

# Mechanisms of action. Not every drug has mechanism records:
# imatinib (CHEMBL941) has returned 0 rows in some releases; sunitinib (CHEMBL535) has many.
r = requests.get(f"{BASE}/mechanism.json",
                 params={"molecule_chembl_id": "CHEMBL535"}, timeout=15)
for m in r.json()["mechanisms"]:
    print(f"  {m['mechanism_of_action']} → target {m.get('target_chembl_id')}")

# Therapeutic indications
r = requests.get(f"{BASE}/drug_indication.json",
                 params={"molecule_chembl_id": "CHEMBL941", "limit": 5},
                 timeout=15)
for ind in r.json()["drug_indications"]:
    print(f"  {ind.get('mesh_heading')!r}  max_phase_for_ind={ind.get('max_phase_for_ind')}")

# SVG molecular structure image: the response is SVG markup (XML text), NOT JSON.
# Do NOT call /image/{cid}.json; parsing that response raises JSONDecodeError.
r = requests.get(f"{BASE}/image/CHEMBL25.svg", timeout=15)
r.raise_for_status()
with open("aspirin.svg", "w", encoding="utf-8") as f:
    f.write(r.text)
print(f"Saved aspirin.svg ({len(r.content)} bytes, looks_svg={'<svg' in r.text})")
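When `mechanism.json` comes back empty for a parent ID, one thing worth checking is whether the records are attached to a salt form. This sketch does two things:

- It expands the ID through `/molecule_form`, which is listed under Core Endpoints.
- It queries all forms at once with `molecule_chembl_id__in`.

The sketch is unverified. The `molecule_chembl_id`/`parent_chembl_id` field names inside `molecule_forms[]` are assumptions to confirm against a live response, and the helper names are hypothetical.

```python
from typing import List

import requests

BASE = "https://www.ebi.ac.uk/chembl/api/data"


def form_ids(molecule_form_payload: dict) -> List[str]:
    """Collect every ChEMBL ID (parent and salt forms) from a /molecule_form response.

    ASSUMPTION: each entry in molecule_forms[] exposes molecule_chembl_id and
    parent_chembl_id; verify against a live response.
    """
    ids: List[str] = []
    for form in molecule_form_payload.get("molecule_forms", []):
        for field in ("molecule_chembl_id", "parent_chembl_id"):
            cid = form.get(field)
            if cid and cid not in ids:
                ids.append(cid)
    return ids


def mechanisms_any_form(chembl_id: str) -> List[dict]:
    """Query mechanism.json for a compound and all of its listed forms in one call."""
    forms = requests.get(f"{BASE}/molecule_form/{chembl_id}.json", timeout=15)
    forms.raise_for_status()
    ids = form_ids(forms.json())
    if chembl_id not in ids:
        ids.insert(0, chembl_id)
    r = requests.get(f"{BASE}/mechanism.json",
                     params={"molecule_chembl_id__in": ",".join(ids), "limit": 100},
                     timeout=15)
    r.raise_for_status()
    return r.json()["mechanisms"]
```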

Common Workflows

Workflow 1: Find Inhibitors for a Target

Note: pref_name__icontains matches the spelled-out name. Acronyms like 'EGFR' or 'BRAF' return 0 results — use 'epidermal growth factor receptor' or 'B-raf' (with the hyphen).

import requests, pandas as pd, time
BASE = "https://www.ebi.ac.uk/chembl/api/data"

# Step 1: Resolve the target by full name
r = requests.get(f"{BASE}/target.json",
                 params={"pref_name__icontains": "B-raf",
                         "target_type": "SINGLE PROTEIN", "limit": 5},
                 timeout=15)
targets = r.json()["targets"]
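human_braf = next((t for t in targets if t["organism"] == "Homo sapiens"), None)
if human_braf is None:
    raise SystemExit("No human B-raf target found; check the search term")
target_id = human_braf["target_chembl_id"]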
print(f"Using {target_id} — {human_braf['pref_name']}")

# Step 2: Paginate all potent IC50 activities (cap at 500 for demo)
url = (f"{BASE}/activity.json"
       f"?target_chembl_id={target_id}"
       f"&standard_type=IC50"
       f"&standard_value__lte=100"
       f"&standard_units=nM"
       f"&pchembl_value__isnull=False"
       f"&limit=200")
records = []
while url and len(records) < 500:
    r = requests.get(url, timeout=30)
    r.raise_for_status()
    data = r.json()
    records.extend(data["activities"])
    nxt = data["page_meta"].get("next")
    url = f"https://www.ebi.ac.uk{nxt}" if nxt else None
    time.sleep(0.2)

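records = records[:500]  # the last page can push the total past the cap
df = pd.DataFrame(records)
if df.empty:
    print(f"No potent IC50 records for {target_id}")
else:
    df["standard_value"] = pd.to_numeric(df["standard_value"])
    print(f"Retrieved {len(df)} potent {target_id} activity records")
    print(df[["molecule_chembl_id", "standard_value", "pchembl_value"]].head(10))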

Workflow 2: Analyze a Known Drug

import requests
BASE = "https://www.ebi.ac.uk/chembl/api/data"

# Sunitinib (CHEMBL535) — has documented mechanisms + indications.
# Imatinib (CHEMBL941) sometimes returns 0 mechanism rows depending on ChEMBL release.
chembl_id = "CHEMBL535"

# Molecule record
m = requests.get(f"{BASE}/molecule/{chembl_id}.json", timeout=15).json()
print(f"Name: {m['pref_name']}")
print(f"MW  : {m['molecule_properties']['mw_freebase']}")

# Mechanisms
mechs = requests.get(f"{BASE}/mechanism.json",
                     params={"molecule_chembl_id": chembl_id},
                     timeout=15).json()["mechanisms"]
for mc in mechs:
    print(f"  Mechanism: {mc['mechanism_of_action']}")

# Indications
inds = requests.get(f"{BASE}/drug_indication.json",
                    params={"molecule_chembl_id": chembl_id, "limit": 5},
                    timeout=15).json()["drug_indications"]
for ind in inds:
    print(f"  Indication: {ind.get('mesh_heading')}  "
          f"(Phase {ind.get('max_phase_for_ind')})")

# Bioactivity record count
total = requests.get(f"{BASE}/activity.json",
                     params={"molecule_chembl_id": chembl_id,
                             "pchembl_value__isnull": "False",
                             "limit": 1},
                     timeout=30).json()["page_meta"]["total_count"]
print(f"Total bioactivity records (pChEMBL-tagged): {total}")

Workflow 3: SAR Study

import requests, pandas as pd, time
from urllib.parse import quote
BASE = "https://www.ebi.ac.uk/chembl/api/data"

# Step 1: Similar compounds to a lead (e.g., an isoquinoline scaffold)
lead_smiles = "c1ccc2c(c1)cc(nc2N)c3ccc(cc3)NC(=O)c4ccccc4"
r = requests.get(f"{BASE}/similarity/{quote(lead_smiles, safe='')}/80.json",
                 params={"limit": 20}, timeout=30)
analogs = r.json()["molecules"]
print(f"Analogs found: {len(analogs)}")

# Step 2: Collect bioactivities for each analog
records = []
for compound in analogs[:20]:
    cid = compound["molecule_chembl_id"]
    acts = requests.get(f"{BASE}/activity.json",
                        params={"molecule_chembl_id": cid,
                                "standard_type": "IC50",
                                "pchembl_value__isnull": "False",
                                "limit": 20},
                        timeout=30).json()["activities"]
    for act in acts:
        records.append({
            "chembl_id": cid,
            "target": act.get("target_pref_name"),
            "IC50_nM": act.get("standard_value"),
            "pchembl": act.get("pchembl_value"),
            "mw":    (compound.get("molecule_properties") or {}).get("mw_freebase"),
            "alogp": (compound.get("molecule_properties") or {}).get("alogp"),
        })
    time.sleep(0.2)

df = pd.DataFrame(records)
if not df.empty:
    df["IC50_nM"] = pd.to_numeric(df["IC50_nM"])
    print(df.groupby("target")["IC50_nM"].describe())

Common Recipes

Recipe: Virtual Screening Filter (Lipinski rule-of-5)

import requests
BASE = "https://www.ebi.ac.uk/chembl/api/data"
r = requests.get(f"{BASE}/molecule.json",
                 params={"molecule_properties__mw_freebase__range": "300,500",
                         "molecule_properties__alogp__lte": 5,
                         "molecule_properties__hba__lte": 10,
                         "molecule_properties__hbd__lte": 5,
                         "molecule_properties__num_ro5_violations": 0,
                         "limit": 1},
                 timeout=15)
print(f"Drug-like candidates: {r.json()['page_meta']['total_count']}")

Recipe: Paginate Activities to CSV

import requests, pandas as pd, time
BASE = "https://www.ebi.ac.uk/chembl/api/data"

url = (f"{BASE}/activity.json"
       f"?target_chembl_id=CHEMBL203"
       f"&standard_type=IC50"
       f"&pchembl_value__isnull=False"
       f"&limit=500")
all_acts = []
while url:
    r = requests.get(url, timeout=60)
    r.raise_for_status()
    data = r.json()
    all_acts.extend(data["activities"])
    nxt = data["page_meta"].get("next")
    url = f"https://www.ebi.ac.uk{nxt}" if nxt else None
    time.sleep(0.3)

df = pd.DataFrame(all_acts)
df.to_csv("egfr_activities.csv", index=False)
print(f"Exported {len(df)} records → egfr_activities.csv")
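For large exports, writing rows as they arrive keeps memory flat and gives a stable column order. This sketch uses only the standard `csv` module. The default column list is an illustrative choice of common activity fields, and `write_activities_csv` is a hypothetical helper.

```python
import csv
from typing import Iterable, Sequence, TextIO

DEFAULT_FIELDS = ("molecule_chembl_id", "target_chembl_id", "standard_type",
                  "standard_relation", "standard_value", "standard_units",
                  "pchembl_value", "assay_chembl_id", "document_chembl_id")


def write_activities_csv(records: Iterable[dict], fh: TextIO,
                         fields: Sequence[str] = DEFAULT_FIELDS) -> int:
    """Stream activity dicts to CSV, keeping only `fields`; returns the row count."""
    # extrasaction="ignore" drops unlisted keys; missing keys become empty cells
    writer = csv.DictWriter(fh, fieldnames=list(fields), extrasaction="ignore")
    writer.writeheader()
    n = 0
    for rec in records:
        writer.writerow(rec)
        n += 1
    return n
```

For example: `with open("egfr_activities.csv", "w", newline="", encoding="utf-8") as fh: write_activities_csv(all_acts, fh)`. Because it accepts any iterable, it can also consume a generator that yields records page by page.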

Recipe: Robust Session with Retries

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

def chembl_session(retries=3, backoff=1.0):
    s = requests.Session()
    s.headers.update({"Accept": "application/json"})
    s.mount("https://", HTTPAdapter(max_retries=Retry(
        total=retries, backoff_factor=backoff,
        status_forcelist=[429, 500, 502, 503, 504],
        # allowed_methods requires urllib3 >= 1.26 (older versions call it method_whitelist)
        allowed_methods=["GET"])))
    return s

session = chembl_session()
r = session.get("https://www.ebi.ac.uk/chembl/api/data/molecule/CHEMBL25.json", timeout=15)
print(r.json()["pref_name"])

Recipe: Download SVG Structure Image

import requests
r = requests.get("https://www.ebi.ac.uk/chembl/api/data/image/CHEMBL25.svg", timeout=15)
r.raise_for_status()
with open("aspirin.svg", "w", encoding="utf-8") as f:
    f.write(r.text)

Key Parameters

Parameter	Endpoint	Default	Description
`limit`	all list endpoints	`20`	Page size; max 1000
`offset`	all list endpoints	`0`	Pagination offset (or follow `page_meta.next`)
`format`	all endpoints	`json` (via `.json` suffix)	Also `.xml`, `.yaml`
`pref_name__icontains`	`/target`, `/molecule`	—	Substring on full name; acronyms don't match, use full term
`target_chembl_id`	`/activity`	—	E.g., `CHEMBL203` (EGFR), `CHEMBL240` (D2 receptor)
`molecule_chembl_id`	`/activity`, `/mechanism`, `/drug_indication`	—	E.g., `CHEMBL25` (aspirin)
`standard_type`	`/activity`	—	`IC50`, `Ki`, `Kd`, `EC50`
`standard_value__lte`	`/activity`	—	Max activity value (paired with `standard_units`)
`pchembl_value__isnull`	`/activity`	—	`"False"` to require pChEMBL-tagged data
`target_type`	`/target`	—	`SINGLE PROTEIN`, `PROTEIN COMPLEX`, `ORGANISM`, …
`{tanimoto}` (path)	`/similarity/{smiles}/{tanimoto}`	—	`0`–`100` Tanimoto threshold
`{smiles}` (path)	`/similarity`, `/substructure`	—	URL-encoded SMILES (`urllib.parse.quote(s, safe="")`)

Troubleshooting

Problem	Cause	Solution
`pref_name__icontains=EGFR` (or `BRAF`) returns 0	ChEMBL stores spelled-out names; acronyms don't match	Use `"epidermal growth factor receptor"`; for BRAF use `"B-raf"` with the hyphen
`mechanism.json?molecule_chembl_id=CHEMBL941` returns empty	Not every drug has mechanism rows in every release (imatinib returned 0 rows when this guide was written)	Use `CHEMBL535` (sunitinib) or `CHEMBL192` (sildenafil) as known-populated examples
`JSONDecodeError` on `/image/{cid}.json`	The image endpoint returns an image (e.g., SVG markup), not JSON	Always use the `.svg` or `.png` suffix: `/image/{cid}.svg`
404 on `/molecule/{id}`	Invalid ChEMBL ID format	IDs must include the prefix: `CHEMBL25`, not `25`
400 on similarity search	Unencoded SMILES (`/` collides with URL path)	URL-encode: `urllib.parse.quote(smiles, safe="")`
Empty `next` page but `total_count` higher	Reached internal limit (typically 10000 with `offset` pagination)	Narrow filters (date range, target class) and re-paginate; or use the ChEMBL FTP downloads for >100K records
`HTTP 429 Too Many Requests`	Burst pace	Add `time.sleep(0.3)`; mount a `Retry` adapter (see Recipe)
Mixed units in `activity` records	Different assays report in nM / µM / % inhibition	Filter `standard_units="nM"` and prefer `pchembl_value` for cross-assay comparison
`data_validity_comment` is non-empty	Curation flag (e.g., "Potential transcription error", "Outside typical range")	Drop these rows before SAR/regression analysis
Duplicate activity records	Same measurement reported in multiple sources	Check `potential_duplicate=True` and dedupe

Best Practices

Use pchembl_value for cross-study comparisons — it normalizes IC50/Ki/EC50 to a comparable -log10 scale.
Always check data_validity_comment before computing aggregates — flagged rows can skew distributions.
Pin standard_units="nM" in activity queries to avoid mixing nM with µM.
Follow page_meta.next for pagination instead of incrementing offset manually; the URL already carries your filters plus the correct limit and offset.
URL-encode SMILES in path-style endpoints (/similarity/{smiles}/..., /substructure/{smiles}) with urllib.parse.quote(smi, safe="").
Use a Session with retry adapter for batch work (see Recipe) — ChEMBL handles a fair amount of traffic and occasionally returns 502/503.
For >100K records prefer the ChEMBL FTP downloads over paginated API calls.
Be deliberate about acronyms in pref_name__icontains — EGFR, BRAF, HER2 all return 0 hits. Use the spelled-out term or filter via target_components__accession=<UniProt> instead.

Related Skills

rdkit-cheminformatics — SMILES manipulation, fingerprints, descriptors
datamol-cheminformatics — molecular preprocessing & featurization
pubchem-compound-search — alternative compound database (NIH; broader coverage but less bioactivity depth)
pdb-database — 3D structures of ChEMBL targets via RCSB PDB REST API
opentargets-database — links ChEMBL drug-target evidence to disease associations

References

ChEMBL website: https://www.ebi.ac.uk/chembl/
REST API root: https://www.ebi.ac.uk/chembl/api/data/
API docs: https://www.ebi.ac.uk/chembl/api/data/docs
Interface docs (Django filter syntax): https://chembl.gitbook.io/chembl-interface-documentation/web-services
Bulk downloads (for >100K records): https://chembl.gitbook.io/chembl-interface-documentation/downloads
For SDK-based usage, see the chembl_webresource_client PyPI package; this SKILL.md uses the underlying REST API directly so no SDK install is needed.
