From sciagent-skills
Queries ChEMBL database via Python SDK for compound searches by structure/properties, bioactivity retrieval (IC50, Ki, EC50), target inhibitors, SAR analysis, and drug mechanism/indication data.
npx claudepluginhub jaechang-hits/sciagent-skills --plugin sciagent-skillsThis skill uses the workspace's default tool permissions.
Query the ChEMBL bioactive molecule database (2M+ compounds, 19M+ bioactivity measurements, 13K+ targets) using the `chembl_webresource_client` Python SDK. Covers compound search, target lookup, bioactivity retrieval, structure-based search, and drug information access.
Queries ChEMBL database for bioactive molecules, targets, and bioactivity data (IC50, Ki). Searches by structure/properties, finds inhibitors for drug discovery and medicinal chemistry.
Queries ChEMBL database for bioactive molecules, targets, bioactivity data (IC50, Ki), and drugs. Searches compounds by name, structure, properties; finds inhibitors for medicinal chemistry and SAR studies.
Finds, characterizes, and sources small molecules for drug discovery using PubChem, ChEMBL, ADMET predictors, and suppliers like eMolecules. Use for compound ID, structure search, binding affinities, drug-likeness, availability.
Share bugs, ideas, or general feedback.
Query the ChEMBL bioactive molecule database (2M+ compounds, 19M+ bioactivity measurements, 13K+ targets) using the chembl_webresource_client Python SDK. Covers compound search, target lookup, bioactivity retrieval, structure-based search, and drug information access.
uv pip install chembl_webresource_client
# Optional: pandas for tabular analysis
uv pip install pandas
Rate limiting: The SDK handles rate limiting internally via automatic retries and caching. No time.sleep() needed between queries. For large-scale data retrieval (100K+ records), consider ChEMBL bulk downloads instead of API queries.
from chembl_webresource_client.new_client import new_client
# Each entity type has its own client endpoint
molecule = new_client.molecule
target = new_client.target
activity = new_client.activity
# Retrieve a molecule by ChEMBL ID
aspirin = molecule.get('CHEMBL25')
print(f"{aspirin['pref_name']}: MW={aspirin['molecule_properties']['mw_freebase']}")
# Search targets by name
egfr_targets = target.filter(pref_name__icontains='EGFR', target_type='SINGLE PROTEIN')
print(f"Found {len(list(egfr_targets))} EGFR-related targets")
# Query bioactivities with filters
potent = activity.filter(
target_chembl_id='CHEMBL203', # EGFR
standard_type='IC50',
standard_value__lte=100, # <= 100 nM
standard_units='nM'
)
ChEMBL uses Django-style query filters on all endpoints:
| Operator | Meaning | Example |
|---|---|---|
__exact | Exact match (default) | target_type__exact='SINGLE PROTEIN' |
__iexact | Case-insensitive exact | pref_name__iexact='aspirin' |
__contains | Substring match | pref_name__contains='kinase' |
__icontains | Case-insensitive substring | pref_name__icontains='egfr' |
__startswith | Prefix match | pref_name__startswith='Epi' |
__endswith | Suffix match | pref_name__endswith='nib' |
__gt / __gte | Greater than (or equal) | standard_value__gte=10 |
__lt / __lte | Less than (or equal) | standard_value__lte=100 |
__range | Value in range | mw_freebase__range=[300, 500] |
__in | Value in list | target_chembl_id__in=['CHEMBL203', 'CHEMBL240'] |
__isnull | Null check | pchembl_value__isnull=False |
__regex | Regular expression | pref_name__regex='^EGF.*kinase$' |
__search | Full-text search | description__search='apoptosis' |
| Endpoint | Access | Description |
|---|---|---|
molecule | new_client.molecule | Compound structures, properties, synonyms |
target | new_client.target | Protein and non-protein biological targets |
activity | new_client.activity | Bioassay measurement results |
assay | new_client.assay | Experimental assay details |
drug | new_client.drug | Approved pharmaceutical information |
mechanism | new_client.mechanism | Drug mechanism of action data |
drug_indication | new_client.drug_indication | Drug therapeutic indications |
similarity | new_client.similarity | Tanimoto similarity search |
substructure | new_client.substructure | Substructure search |
image | new_client.image | SVG molecular structure images |
molecule_form | new_client.molecule_form | Parent/salt forms |
protein_class | new_client.protein_class | Protein classification hierarchy |
target_component | new_client.target_component | Target component details |
cell_line | new_client.cell_line | Cell line information |
tissue | new_client.tissue | Tissue type information |
compound_structural_alert | new_client.compound_structural_alert | Structural alerts for toxicity |
document | new_client.document | Literature source references |
Properties accessible via molecule['molecule_properties']:
| Field | Description |
|---|---|
mw_freebase | Molecular weight (free base) |
full_mwt | Full molecular weight (including salts) |
alogp | Calculated LogP |
hba | Hydrogen bond acceptors |
hbd | Hydrogen bond donors |
psa | Polar surface area |
rtb | Rotatable bonds |
num_ro5_violations | Lipinski rule-of-5 violations |
ro3_pass | Rule of 3 compliance |
cx_most_apka | Most acidic pKa |
cx_most_bpka | Most basic pKa |
Key fields in target records:
| Field | Description |
|---|---|
target_chembl_id | ChEMBL target identifier |
pref_name | Preferred target name |
target_type | Type: SINGLE PROTEIN, PROTEIN COMPLEX, ORGANISM |
organism | Target organism (e.g., Homo sapiens) |
tax_id | NCBI taxonomy ID |
target_components | Component details (UniProt accession, etc.) |
Key fields in activity records:
| Field | Description |
|---|---|
standard_type | Activity type: IC50, Ki, Kd, EC50, etc. |
standard_value | Numerical activity value |
standard_units | Units: nM, uM, etc. |
pchembl_value | Normalized activity (-log10 scale, comparable across types) |
activity_comment | Activity annotations |
data_validity_comment | Data quality flags (check before analysis) |
potential_duplicate | Duplicate flag |
molecule = new_client.molecule
# By ChEMBL ID
aspirin = molecule.get('CHEMBL25')
# By name (case-insensitive)
results = molecule.filter(pref_name__icontains='imatinib')
# By properties (Lipinski rule-of-5 compliant)
drug_like = molecule.filter(
molecule_properties__mw_freebase__lte=500,
molecule_properties__alogp__lte=5,
molecule_properties__hba__lte=10,
molecule_properties__hbd__lte=5
)
# By property range
mid_weight = molecule.filter(
molecule_properties__mw_freebase__range=[300, 500]
)
target = new_client.target
# By ChEMBL ID
egfr = target.get('CHEMBL203')
print(f"{egfr['pref_name']} ({egfr['organism']})")
# Search by name and type
kinases = target.filter(
target_type='SINGLE PROTEIN',
pref_name__icontains='kinase'
)
# By organism
human_targets = target.filter(organism='Homo sapiens')
activity = new_client.activity
# Potent inhibitors for a target
potent = activity.filter(
target_chembl_id='CHEMBL203',
standard_type='IC50',
standard_value__lte=100,
standard_units='nM'
)
# All activities for a compound (with pChEMBL values)
compound_acts = activity.filter(
molecule_chembl_id='CHEMBL25',
pchembl_value__isnull=False
)
# Multiple activity types
ki_data = activity.filter(
target_chembl_id='CHEMBL240',
standard_type__in=['IC50', 'Ki', 'Kd']
)
# Similarity search (Tanimoto)
similarity = new_client.similarity
similar = similarity.filter(
smiles='CC(=O)Oc1ccccc1C(=O)O', # aspirin
similarity=85 # >=85% similarity
)
# Substructure search
substructure = new_client.substructure
benzimidazoles = substructure.filter(smiles='c1ccc2[nH]cnc2c1')
drug = new_client.drug
mechanism = new_client.mechanism
drug_indication = new_client.drug_indication
# Drug details
drug_info = drug.get('CHEMBL25')
# Mechanisms of action
mechs = mechanism.filter(molecule_chembl_id='CHEMBL941')
for m in mechs:
print(f"{m['mechanism_of_action']} → {m.get('target_chembl_id')}")
# Therapeutic indications
indications = drug_indication.filter(molecule_chembl_id='CHEMBL941')
for ind in indications:
print(f"{ind.get('mesh_heading')} (Phase {ind.get('max_phase_for_ind')})")
# SVG molecular image
image = new_client.image
svg_data = image.get('CHEMBL25')
with open('aspirin.svg', 'w') as f:
f.write(svg_data)
from chembl_webresource_client.new_client import new_client
import pandas as pd
# Step 1: Identify the target
targets = new_client.target.filter(pref_name__icontains='BRAF', target_type='SINGLE PROTEIN')
target_id = list(targets)[0]['target_chembl_id']
# Step 2: Query potent activities
activities = new_client.activity.filter(
target_chembl_id=target_id,
standard_type='IC50',
standard_value__lte=100,
standard_units='nM',
pchembl_value__isnull=False
)
# Step 3: Convert to DataFrame for analysis
df = pd.DataFrame(list(activities))
df['standard_value'] = pd.to_numeric(df['standard_value'])
print(f"Found {len(df)} potent compounds")
print(df[['molecule_chembl_id', 'standard_value', 'pchembl_value']].head(10))
from chembl_webresource_client.new_client import new_client
chembl_id = 'CHEMBL941' # Imatinib
# Drug information
drug_info = new_client.molecule.get(chembl_id)
print(f"Name: {drug_info['pref_name']}")
print(f"MW: {drug_info['molecule_properties']['mw_freebase']}")
# Mechanisms of action
mechs = list(new_client.mechanism.filter(molecule_chembl_id=chembl_id))
for m in mechs:
print(f"Mechanism: {m['mechanism_of_action']}")
# Indications
indications = list(new_client.drug_indication.filter(molecule_chembl_id=chembl_id))
for ind in indications:
print(f"Indication: {ind.get('mesh_heading')} (Phase {ind.get('max_phase_for_ind')})")
# All bioactivity data
activities = list(new_client.activity.filter(
molecule_chembl_id=chembl_id, pchembl_value__isnull=False
))
print(f"Total bioactivity records: {len(activities)}")
from chembl_webresource_client.new_client import new_client
import pandas as pd
# Step 1: Find similar compounds to lead
similar = new_client.similarity.filter(
smiles='c1ccc2c(c1)cc(nc2N)c3ccc(cc3)NC(=O)c4ccccc4', # lead compound
similarity=80
)
analogs = list(similar)
# Step 2: Collect activities for each analog
records = []
for compound in analogs[:20]: # limit for demo
cid = compound['molecule_chembl_id']
acts = list(new_client.activity.filter(
molecule_chembl_id=cid,
standard_type='IC50',
pchembl_value__isnull=False
))
for act in acts:
records.append({
'chembl_id': cid,
'target': act.get('target_pref_name'),
'IC50_nM': act.get('standard_value'),
'pchembl': act.get('pchembl_value'),
'mw': compound.get('molecule_properties', {}).get('mw_freebase'),
'alogp': compound.get('molecule_properties', {}).get('alogp')
})
# Step 3: Analyze property-activity relationships
df = pd.DataFrame(records)
if not df.empty:
df['IC50_nM'] = pd.to_numeric(df['IC50_nM'])
print(df.groupby('target')['IC50_nM'].describe())
from chembl_webresource_client.new_client import new_client
candidates = new_client.molecule.filter(
molecule_properties__mw_freebase__range=[300, 500],
molecule_properties__alogp__lte=5,
molecule_properties__hba__lte=10,
molecule_properties__hbd__lte=5,
molecule_properties__num_ro5_violations=0
)
print(f"Drug-like candidates: {len(list(candidates))}")
from chembl_webresource_client.settings import Settings
Settings.Instance().CACHING = True # enable/disable cache
Settings.Instance().CACHE_EXPIRE = 86400 # cache duration (seconds)
Settings.Instance().TIMEOUT = 30 # request timeout (seconds)
Settings.Instance().TOTAL_RETRIES = 3 # retry count on failure
import pandas as pd
from chembl_webresource_client.new_client import new_client
activities = new_client.activity.filter(
target_chembl_id='CHEMBL203',
standard_type='IC50',
pchembl_value__isnull=False
)
df = pd.DataFrame(list(activities))
df.to_csv('egfr_activities.csv', index=False)
print(f"Exported {len(df)} records")
| Parameter | Endpoint | Default | Description |
|---|---|---|---|
similarity | similarity.filter() | — | Tanimoto threshold (0-100), typically 70-90 |
standard_type | activity.filter() | — | Activity type: IC50, Ki, Kd, EC50 |
standard_value__lte | activity.filter() | — | Max activity value (nM) |
pchembl_value | activity.filter() | — | Normalized -log10 activity (>6 = potent) |
target_type | target.filter() | — | SINGLE PROTEIN, PROTEIN COMPLEX, ORGANISM |
CACHING | Settings | True | Enable HTTP response caching |
CACHE_EXPIRE | Settings | 86400 | Cache TTL in seconds |
TIMEOUT | Settings | 30 | HTTP request timeout in seconds |
TOTAL_RETRIES | Settings | 3 | Auto-retry count on failure |
| Problem | Cause | Solution |
|---|---|---|
| Empty results from filter | No matches or too strict filters | Relax filters; verify IDs exist with .get() first |
KeyError on molecule properties | Not all molecules have full property data | Use .get('molecule_properties', {}).get('field') |
| Query returns unexpectedly few results | Lazy evaluation not consumed | Convert to list() before checking length |
| Slow queries | Large result sets paginated automatically | Add more filters to narrow results; use __range |
404 on .get() | Invalid ChEMBL ID | Verify ID format (e.g., CHEMBL25, not 25) |
| Stale data | Aggressive caching | Set Settings.Instance().CACHING = False or clear cache |
| Timeout errors | Server overload or large query | Increase TIMEOUT; split into smaller queries |
| Mixed units in activity data | Different assays use different units | Filter by standard_units='nM' or use pchembl_value |
| Duplicate activity records | Same measurement from different sources | Check potential_duplicate and data_validity_comment |
pchembl_value for cross-study comparisons — it normalizes IC50/Ki/EC50 to a comparable -log10 scaledata_validity_comment before using activity values — flags data quality issuesstandard_units to ensure consistent units across resultslist() only when you need all results in memorylist() only when neededrdkit-cheminformatics — SMILES manipulation, molecular descriptors, fingerprintsdatamol-cheminformatics — Molecular preprocessing and featurizationpubchem-compound-search — Alternative compound database (NIH)Self-contained entry (no references/ directory). Original total: 662 lines (SKILL.md 389 + api_reference.md 273). Scripts: 279 lines (example_queries.py).
Original file disposition:
SKILL.md (389 lines) → Core API, Workflows, Quick Start. "Common Use Cases" consolidated (rule 7b): Find Kinase Inhibitors → Workflow 1 pattern, Virtual Screening → Recipe, Drug Repurposing → omitted (trivial loop over drug endpoint, not a distinct analytical workflow). "Important Notes" section routed to Best Practices and Troubleshooting (rule 9)references/api_reference.md (273 lines) → Consolidated inline. Filter Operators → Key Concepts table. Core Endpoints listing → Key Concepts table (all endpoints). Molecular Properties → Key Concepts table. Bioactivity Data Fields → Key Concepts table. Target Information Fields → Key Concepts table. Configuration/Settings → Common Recipes. Error handling/rate limiting → Troubleshooting + Best Practices. Response formats (JSON/XML/YAML) → omitted (JSON is default and only format used via Python SDK). Advanced query examples already covered in Core APIscripts/example_queries.py (279 lines) → Thin-wrapper functions absorbed into Core API modules: get_molecule_info / search_molecules_by_name / find_molecules_by_properties → Module 1 (Molecule Queries); get_target_info / search_targets_by_name → Module 2 (Target Queries); get_bioactivity_data / get_compound_bioactivities → Module 3 (Bioactivity Data); find_similar_compounds / substructure_search → Module 4 (Structure-Based Search); get_drug_info → Module 5 (Drug and Mechanism Data); find_kinase_inhibitors → Workflow 1; export_to_dataframe → Workflow 1 + Recipe (Export)Retention: ~460 lines / 662 original = ~69%. Vendor metadata stripped (rule 13). Agent-behavior section stripped (rule 4).