From sciagent-skills
Queries ZINC15/ZINC22 compound libraries (1.4B compounds, 750M purchasable) for lead-like, fragment-like, drug-like sets by MW, logP, reactivity, SMILES similarity. Downloads 3D conformers for docking.
npx claudepluginhub jaechang-hits/sciagent-skills --plugin sciagent-skills
This skill uses the workspace's default tool permissions.
ZINC (ZINC Is Not Commercial) is a free database of commercially available compounds curated for virtual screening. ZINC20 lists roughly 1.4 billion compounds, and ZINC22 is larger still; both include purchasable compounds with 3D conformers. Compounds are organized by molecular property filters (lead-like, fragment-like, drug-like) and reactivity class. The REST API, which the examples here call through the ZINC15 endpoint, enables SMILES-based searches, property-filtered downloads, and compound subset exports for docking campaigns.
For bioactivity data use chembl-database-bioactivity; for approved drug structures use drugbank-database-access; for RDKit property calculation use rdkit-cheminformatics.
Requirements: requests, pandas
pip install requests pandas
import requests
# Search ZINC15 REST API for drug-like compounds
BASE = "https://zinc15.docking.org"
r = requests.get(f"{BASE}/substances.json",
                 params={"mwt__gte": 250, "mwt__lte": 350,
                         "logp__gte": 0, "logp__lte": 3,
                         "availability": "for-sale", "count": 5})
r.raise_for_status()
compounds = r.json()
print(f"Returned {len(compounds)} compounds")
for c in compounds[:3]:
    print(f" ZINC: {c['zinc_id']:20s} MW: {c['mwt']:.1f} logP: {c['logp']:.2f} SMILES: {c['smiles'][:40]}")
Search ZINC15 by molecular property ranges (Lipinski, lead-like, fragment-like criteria).
import requests, pandas as pd
BASE = "https://zinc15.docking.org"
def zinc_search(params, max_results=500):
    """Search ZINC15 with property filters. Returns a DataFrame.

    Only one request is made, so at most min(100, max_results) rows come back.
    """
    params = dict(params)  # copy so the caller's dict is not mutated
    params["count"] = min(100, max_results)
    r = requests.get(f"{BASE}/substances.json", params=params)
    r.raise_for_status()
    return pd.DataFrame(r.json())
# Lead-like set: MW 250-350, logP 1-3, HBD ≤ 3, HBA ≤ 7
df_leads = zinc_search({
    "mwt__gte": 250, "mwt__lte": 350,
    "logp__gte": 1, "logp__lte": 3,
    "hbd__lte": 3, "hba__lte": 7,
    "availability": "for-sale",
})
print(f"Lead-like compounds: {len(df_leads)}")
print(df_leads[["zinc_id", "mwt", "logp", "smiles"]].head())
# Fragment-like set: MW ≤ 300, logP ≤ 3, HBD ≤ 3 (subset of the Rule of Three)
df_frags = zinc_search({
    "mwt__lte": 300,
    "logp__lte": 3,
    "hbd__lte": 3,
    "availability": "for-sale",
})
print(f"\nFragment-like compounds: {len(df_frags)}")
print(df_frags[["zinc_id", "mwt", "logp", "smiles"]].head())
Fetch full compound data for a known ZINC identifier.
import requests
BASE = "https://zinc15.docking.org"
zinc_id = "ZINC000000029632"
r = requests.get(f"{BASE}/substances/{zinc_id}.json")
r.raise_for_status()
c = r.json()
print(f"ZINC ID : {c['zinc_id']}")
print(f"SMILES : {c['smiles']}")
print(f"MW : {c['mwt']:.2f}")
print(f"logP : {c['logp']:.2f}")
print(f"HBD : {c['hbd']}")
print(f"HBA : {c['hba']}")
print(f"TPSA : {c.get('tpsa', 'n/a')}")
print(f"Rotatable: {c.get('rotatable_bonds', 'n/a')}")
print(f"Suppliers: {len(c.get('suppliers', []))}")
ZINC organizes compounds into "tranches" by MW and logP. Download pre-built SDF/SMILES files.
import requests
# ZINC15 tranche download (MW 200-250, logP 1-2 range)
# Tranche naming: letters encode MW range (A-K) and logP range (A-J)
# See http://zinc15.docking.org/tranches/home
def download_zinc_tranche(tranche_name, dest_file, fmt="smi"):
    """Download a ZINC tranche SMILES file."""
    url = f"https://zinc15.docking.org/tranches/{tranche_name}.{fmt}"
    r = requests.get(url, stream=True)
    r.raise_for_status()
    # Stream to disk in chunks so large tranche files are not held in memory
    with open(dest_file, "wb") as f:
        for chunk in r.iter_content(chunk_size=8192):
            f.write(chunk)
    print(f"Downloaded {dest_file}")
# Download one tranche as SMILES
download_zinc_tranche("AABA", "zinc_AABA.smi", fmt="smi")
Find ZINC compounds similar to a query molecule.
import requests, pandas as pd
BASE = "https://zinc15.docking.org"
query_smiles = "c1ccc(NC(=O)c2ccccc2)cc1"  # benzanilide
r = requests.get(f"{BASE}/substances.json",
                 params={
                     "smiles": query_smiles,
                     "similarity": 0.6,  # Tanimoto similarity threshold
                     "count": 20,
                     "availability": "for-sale",
                 })
r.raise_for_status()
results = r.json()
print(f"Similar compounds found: {len(results)}")
df = pd.DataFrame(results)[["zinc_id", "smiles", "mwt", "logp"]]
print(df.head())
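To make the similarity threshold concrete, here is a minimal sketch of the Tanimoto coefficient computed on two fingerprints represented as sets of on-bit indices. The bit sets are toy values chosen for illustration, and `tanimoto` is a hypothetical helper; it does not reproduce the fingerprints ZINC computes on the server.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    a, b = set(fp_a), set(fp_b)
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(a | b)

# Toy fingerprints (illustrative bit indices, not real ZINC data)
query = {1, 4, 7, 9, 12}
candidate = {1, 4, 7, 10}

score = tanimoto(query, candidate)  # 3 shared bits / 6 distinct bits
print(f"{score:.2f}", score >= 0.6)  # 0.50 False
```

With a threshold of 0.6, this candidate would be excluded; lowering the threshold admits more distant analogs.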
Retrieve purchasability and supplier catalog data for compounds.
import requests
BASE = "https://zinc15.docking.org"
# Check purchasability and catalog info
zinc_id = "ZINC000000029632"
r = requests.get(f"{BASE}/substances/{zinc_id}/suppliers.json")
r.raise_for_status()
suppliers = r.json()
print(f"Suppliers for {zinc_id}: {len(suppliers)}")
for sup in suppliers[:5]:
    print(f" {sup.get('name', 'n/a'):30s} | Catalog: {sup.get('catalognum', 'n/a')}")
For large-scale virtual screening, download entire ZINC subsets as compressed SMILES.
import requests, gzip, pandas as pd
# ZINC15 drug-like purchasable slice (public URL pattern)
# Full drug-like: https://zinc15.docking.org/substances/subsets/drug-like.smi.gz
def download_zinc_subset(subset_name, max_lines=1000):
    """Download a ZINC subset SMILES file and return a DataFrame sample."""
    url = f"https://zinc15.docking.org/substances/subsets/{subset_name}.smi.gz"
    r = requests.get(url, stream=True)
    r.raise_for_status()
    rows = []
    # Decompress the stream on the fly and stop after max_lines lines
    with gzip.open(r.raw, "rt") as f:
        for i, line in enumerate(f):
            if i >= max_lines:
                break
            fields = line.split()
            if len(fields) >= 2:  # skip blank or malformed lines
                rows.append(fields[:2])  # keep SMILES and ZINC ID only
    return pd.DataFrame(rows, columns=["smiles", "zinc_id"])
# Load first 1000 from lead-like subset
df_sample = download_zinc_subset("lead-like", max_lines=1000)
print(f"Loaded {len(df_sample)} compounds from lead-like subset")
print(df_sample.head())
Compounds are organized into a 2D grid of "tranches" based on MW (rows A–K: <200 to >600 Da) and logP (columns A–J: <-1 to >5). Each tranche can be downloaded as a SMILES or SDF file. This tranching enables targeted downloads of specific property spaces for docking.
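As a rough sketch of how a block of this grid can be enumerated, the helper below lists two-letter codes for a rectangle of MW rows and logP columns. `tranche_prefixes` is a hypothetical name, and the letter order (MW first, then logP) is an assumption; which letter corresponds to which MW or logP bin should be read from the tranches page, and full tranche names such as "AABA" in the download example carry additional letters that this sketch does not cover.

```python
from itertools import product
from string import ascii_uppercase

def tranche_prefixes(mw_first, mw_last, logp_first, logp_last):
    """List two-letter codes (assumed order: MW letter, then logP letter)."""
    def letters(first, last):
        i, j = ascii_uppercase.index(first), ascii_uppercase.index(last)
        if i > j:
            raise ValueError(f"range {first}-{last} is inverted")
        return ascii_uppercase[i:j + 1]

    rows = letters(mw_first, mw_last)
    cols = letters(logp_first, logp_last)
    return [mw + logp for mw, logp in product(rows, cols)]

codes = tranche_prefixes("B", "C", "A", "C")
print(codes)  # ['BA', 'BB', 'BC', 'CA', 'CB', 'CC']
```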
Goal: Curate a purchasable, lead-like compound library within specific property ranges, deduplicate, and export for docking.
import requests, pandas as pd
BASE = "https://zinc15.docking.org"
# Fetch lead-like purchasable compounds with Lipinski compliance
params = {
    "mwt__gte": 200, "mwt__lte": 500,
    "logp__gte": -1, "logp__lte": 5,
    "hbd__lte": 5, "hba__lte": 10,
    "rotatable_bonds__lte": 10,
    "availability": "for-sale",
    "count": 200,
}
r = requests.get(f"{BASE}/substances.json", params=params)
r.raise_for_status()
compounds = r.json()
df = pd.DataFrame(compounds)[["zinc_id", "smiles", "mwt", "logp", "hbd", "hba"]]
df = df.drop_duplicates(subset=["smiles"])
print(f"Curated library: {len(df)} unique compounds")
# Export as SMILES for docking input
df[["smiles", "zinc_id"]].to_csv("docking_library.smi", sep=" ", index=False, header=False)
print("Saved: docking_library.smi")
print(df.head())
Goal: Download fragment-like (Rule of Three) compounds for fragment-based drug discovery.
import requests, pandas as pd
BASE = "https://zinc15.docking.org"
# Rule of Three: MW ≤ 300, logP ≤ 3, HBD ≤ 3, HBA ≤ 3, RotB ≤ 3
params = {
    "mwt__lte": 300,
    "logp__lte": 3,
    "hbd__lte": 3,
    "hba__lte": 3,
    "rotatable_bonds__lte": 3,
    "availability": "for-sale",
    "count": 200,
}
r = requests.get(f"{BASE}/substances.json", params=params)
r.raise_for_status()
fragments = r.json()
df = pd.DataFrame(fragments)[["zinc_id", "smiles", "mwt", "logp"]]
print(f"Fragment library: {len(df)} compounds (Rule of Three)")
# Write SMILES first, then the ID, matching the docking library export above
df[["smiles", "zinc_id"]].to_csv("fragment_library.smi", sep=" ", index=False, header=False)
print("Saved: fragment_library.smi")
print(df.describe())
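The same Rule of Three limits can also be checked client-side, for example to re-verify records after merging data from several requests. The following is a minimal sketch: `passes_rule_of_three` is a hypothetical helper, the field names follow the search parameters used above, and the two records are made-up placeholder values rather than real ZINC entries.

```python
# Upper limits from the Rule of Three as stated above
RULE_OF_THREE = {"mwt": 300, "logp": 3, "hbd": 3, "hba": 3, "rotatable_bonds": 3}

def passes_rule_of_three(record, limits=RULE_OF_THREE):
    """True if every listed property is present and at or below its limit."""
    return all(
        record.get(key) is not None and record[key] <= limit
        for key, limit in limits.items()
    )

# Placeholder records for illustration only
records = [
    {"zinc_id": "example-1", "mwt": 151.2, "logp": 0.9, "hbd": 2, "hba": 2, "rotatable_bonds": 1},
    {"zinc_id": "example-2", "mwt": 342.4, "logp": 2.1, "hbd": 1, "hba": 3, "rotatable_bonds": 3},
]
kept = [r["zinc_id"] for r in records if passes_rule_of_three(r)]
print(kept)  # ['example-1']
```

Records with a missing property are rejected rather than silently passed, which is the conservative choice for a screening library.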
| Parameter | Module | Default | Range / Options | Effect |
|---|---|---|---|---|
| mwt__gte / mwt__lte | Search | none | numeric (Da) | Molecular weight lower/upper bound |
| logp__gte / logp__lte | Search | none | numeric | logP (lipophilicity) range |
| hbd__lte | Search | none | integer | Max hydrogen bond donors |
| hba__lte | Search | none | integer | Max hydrogen bond acceptors |
| rotatable_bonds__lte | Search | none | integer | Max rotatable bonds |
| availability | Search | all | "for-sale", "in-stock", "on-demand" | Purchasability filter |
| count | Search | 10 | 1–1000 | Max compounds returned per request |
| similarity | Similarity | none | 0.0–1.0 | Tanimoto similarity threshold |
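To show how these parameters end up in a request, the sketch below assembles the query string with the standard library. `build_search_url` is a hypothetical helper; the parameter names are the ones listed in the table, and the base URL is the one used in the examples above. No request is sent.

```python
from urllib.parse import urlencode

BASE = "https://zinc15.docking.org"

def build_search_url(base=BASE, **filters):
    """Return the substances.json URL for a set of filter parameters.

    Filters whose value is None are dropped, so optional bounds can be omitted.
    """
    query = {k: v for k, v in filters.items() if v is not None}
    return f"{base}/substances.json?{urlencode(query)}"

url = build_search_url(mwt__gte=250, mwt__lte=350, availability="for-sale", count=5)
print(url)
# https://zinc15.docking.org/substances.json?mwt__gte=250&mwt__lte=350&availability=for-sale&count=5
```

Printing the URL before sending it is a quick way to spot a misspelled or inverted filter.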
Use tranches for large docking campaigns: Downloading entire MW/logP tranches as pre-built SDF files is faster than paginating the API. Use the ZINC tranches page to identify the subset of property space you need.
Apply reactivity filters: ZINC marks reactive compounds with "reactivity" flags. Exclude compounds with reactive groups (reactivity: "clean" filter) for cell-based assays.
Deduplicate by SMILES: API results may contain duplicates across supplier catalog entries. Canonicalize each SMILES with RDKit (Chem.MolToSmiles(Chem.MolFromSmiles(smi))) and drop duplicates before docking.
Combine with RDKit filtering: After downloading, apply additional filters (PAINS, Brenk alerts) using rdkit-cheminformatics or medchem before investing compute in docking.
Cache SMILES downloads: ZINC data is updated periodically. Cache downloads with a date-stamped filename and avoid re-downloading within a project.
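The caching advice can be sketched as follows, assuming a local cache directory and a caller-supplied download function. `cached_download` and its `fetch` argument are hypothetical names introduced for illustration; the demo uses a stand-in fetcher with a placeholder record instead of a real ZINC request.

```python
import tempfile
from datetime import date
from pathlib import Path

def cached_download(name, fetch, cache_dir="zinc_cache"):
    """Return a cached file for `name`, calling fetch() only when none exists.

    `fetch` is any zero-argument callable returning bytes (hypothetical hook).
    """
    cache = Path(cache_dir)
    cache.mkdir(parents=True, exist_ok=True)
    existing = sorted(cache.glob(f"{name}_*.smi"))
    if existing:
        return existing[-1]  # ISO dates sort chronologically; last is newest
    path = cache / f"{name}_{date.today().isoformat()}.smi"
    path.write_bytes(fetch())
    return path

# Demo with a stand-in fetcher (placeholder content, no network access)
calls = []
def fake_fetch():
    calls.append(1)
    return b"CCO placeholder-id\n"

with tempfile.TemporaryDirectory() as tmp:
    first = cached_download("lead-like", fake_fetch, cache_dir=tmp)
    second = cached_download("lead-like", fake_fetch, cache_dir=tmp)
    print(first == second, len(calls))  # True 1
```

To refresh the data deliberately, delete or rename the stamped file; the date in the filename records which ZINC snapshot a project used.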
When to use: Find the ZINC ID for a known compound to check purchasability.
import requests
BASE = "https://zinc15.docking.org"
smiles = "CC(=O)Nc1ccc(O)cc1" # paracetamol / acetaminophen
r = requests.get(f"{BASE}/substances.json",
                 params={"smiles": smiles, "count": 3})
r.raise_for_status()
for c in r.json():
    print(f"ZINC: {c['zinc_id']} | MW: {c['mwt']:.1f} | Availability: {c.get('availability')}")
When to use: Download 3D SDF conformers for a list of ZINC IDs for use in docking software.
import requests
BASE = "https://zinc15.docking.org"
zinc_ids = ["ZINC000000029632", "ZINC000001532592"]
for zid in zinc_ids:
    r = requests.get(f"{BASE}/substances/{zid}.sdf")
    if r.ok:
        with open(f"{zid}.sdf", "w") as f:
            f.write(r.text)
        print(f"Downloaded {zid}.sdf")
    else:
        print(f"Not available: {zid}")
When to use: Quickly assess the property coverage of a downloaded compound set.
import pandas as pd
df = pd.read_csv("docking_library.smi", sep=" ", names=["smiles", "zinc_id"])
print(f"Library size: {len(df)}")
# If you have the full ZINC metadata:
# df = pd.DataFrame(compounds)[["mwt", "logp", "hbd", "hba"]]
# print(df.describe())
# import matplotlib.pyplot as plt
# df[["mwt", "logp"]].hist(bins=30, figsize=(10, 4)); plt.show()
| Problem | Cause | Solution |
|---|---|---|
| HTTP 404 for compound ID | ZINC ID format incorrect | Use full 12-digit ZINC ID (e.g., ZINC000000029632) |
| Empty results for property search | Filters too restrictive | Relax ranges; check mwt__gte < mwt__lte is not inverted |
| Similarity search returns nothing | SMILES invalid or unusual scaffold | Validate SMILES with RDKit first; try lower similarity threshold |
| Tranche file download fails | Tranche code wrong | Verify tranche naming at zinc15.docking.org/tranches/home |
| API returns HTML error page | Server maintenance | Retry after a few minutes; check ZINC status |
| Slow large downloads | Large compound sets | Download tranche files via FTP/HTTP bulk download instead of API pagination |
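Two of the problems in the table, malformed ZINC IDs and inverted ranges, can be caught before any request is sent. The helpers below are a minimal sketch with hypothetical names (`normalize_zinc_id`, `inverted_ranges`); the 12-digit padding follows the ID format shown in the table.

```python
import re

def normalize_zinc_id(raw):
    """Zero-pad a ZINC identifier to 'ZINC' followed by 12 digits."""
    m = re.fullmatch(r"\s*(?:ZINC)?(\d{1,12})\s*", raw, flags=re.IGNORECASE)
    if m is None:
        raise ValueError(f"not a ZINC ID: {raw!r}")
    return "ZINC" + m.group(1).zfill(12)

def inverted_ranges(params):
    """Return property names whose __gte bound exceeds the __lte bound."""
    bad = []
    for key, low in params.items():
        if key.endswith("__gte"):
            prop = key[: -len("__gte")]
            high = params.get(f"{prop}__lte")
            if high is not None and low > high:
                bad.append(prop)
    return bad

print(normalize_zinc_id("ZINC29632"))  # ZINC000000029632
print(inverted_ranges({"mwt__gte": 350, "mwt__lte": 250,
                       "logp__gte": 0, "logp__lte": 3}))  # ['mwt']
```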
rdkit-cheminformatics: Compute additional properties and apply PAINS filters on downloaded ZINC compounds
autodock-vina-docking: Use downloaded ZINC SMILES/SDF files for molecular docking campaigns
chembl-database-bioactivity: Bioactivity data for compounds identified in ZINC virtual screens
medchem: Apply medicinal chemistry filters (Lipinski, PAINS, NIBR) on ZINC libraries