npx claudepluginhub kohulan/chemaudit
Multi-stage funnel for generative-chemistry outputs and virtual screening hits. Six sequential stages, each of which can reject a molecule independently:
1. parse: RDKit parse + sanitization.
2. valence: valence validity check.
3. alerts: structural alert catalogs (PAINS, Brenk, Kazius, NIBR; configurable).
4. property: MW, LogP, TPSA, rotatable bonds, rings within bounds.
5. sa_score: synthetic accessibility score ≤ threshold.
6. dedup: InChIKey-based deduplication.

Optional 7th stage:

7. novelty: Tanimoto similarity vs ChEMBL (gated by enable_novelty).
The response reports how many molecules enter and leave each stage, which is the data a funnel diagram needs.
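To make the per-stage bookkeeping concrete, here is a minimal sketch of a sequential funnel that records input, passed, and rejected counts for each stage. It is not the service's implementation: `run_funnel`, `make_dedup`, `StageCount`, and the toy predicates are hypothetical names, and the stages are plain string checks standing in for RDKit parsing, property bounds, and InChIKey deduplication.

```python
from dataclasses import dataclass
from typing import Callable, Hashable, Iterable


@dataclass
class StageCount:
    stage_name: str
    input_count: int
    passed_count: int
    rejected_count: int


def make_dedup(key: Callable[[str], Hashable] = lambda s: s) -> Callable[[str], bool]:
    """Return a stateful predicate that keeps only the first item per key."""
    seen = set()

    def keep(item: str) -> bool:
        k = key(item)
        if k in seen:
            return False
        seen.add(k)
        return True

    return keep


def run_funnel(items: Iterable[str], stages):
    """Apply (name, predicate) stages in order and record counts per stage."""
    survivors = list(items)
    counts = []
    for name, keep in stages:
        passed = [item for item in survivors if keep(item)]
        counts.append(
            StageCount(name, len(survivors), len(passed), len(survivors) - len(passed))
        )
        survivors = passed  # only survivors enter the next stage
    return survivors, counts


# Toy stand-ins (assumptions): real stages would call RDKit, alert catalogs, etc.
toy_stages = [
    ("parse", lambda s: bool(s)),         # reject empty strings
    ("property", lambda s: len(s) >= 2),  # stand-in for a property bound
    ("dedup", make_dedup()),              # exact-string key instead of InChIKey
]
survivors, counts = run_funnel(["CCO", "", "CCO", "c1ccccc1", "C"], toy_stages)
for c in counts:
    print(c.stage_name, c.input_count, c.passed_count, c.rejected_count)
```

Each stage's input_count equals the previous stage's passed_count, which is the invariant to check when reading the stages[] array in a response.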
Paired with a composite 0-1 scorer that blends validity / QED / alert-free / SA into a single reward, useful for generative-model reinforcement learning loops. Invalid SMILES score null, never 0.0 (D-14 / Pitfall 4) so generative agents can distinguish "unparseable" from "parseable but terrible".
REINVENT 4 Component API is supported out of the box.
See references/preset-configs.md for full threshold details.
| Preset | MW (Da) | LogP | TPSA (Å²) | RotB | Rings | SA score | Alerts |
|---|---|---|---|---|---|---|---|
| drug_like | 200–500 | -1 to 5 | ≤140 | ≤10 | - | ≤5.0 | PAINS + Brenk + Kazius |
| lead_like | 200–350 | -1 to 3.5 | ≤140 | ≤7 | - | ≤4.0 | PAINS + Brenk + Kazius + NIBR |
| fragment_like | 100–300 | -1 to 3 | ≤100 | ≤3 | ≤3 | ≤3.0 | PAINS only |
| permissive | 100–800 | -5 to 8 | ≤200 | ≤15 | - | ≤7.0 | none |
Weight vectors for the composite score differ per preset (D-15):
| Preset | validity | qed | alert_free | sa |
|---|---|---|---|---|
| drug_like | 0.3 | 0.3 | 0.2 | 0.2 |
| lead_like | 0.2 | 0.4 | 0.2 | 0.2 |
| fragment_like | 0.2 | 0.2 | 0.3 | 0.3 |
| permissive | 0.4 | 0.3 | 0.1 | 0.2 |
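The weight vectors above suggest a weighted blend of four components. The sketch below assumes a plain linear weighted sum over components already normalized to the 0-1 range. The source does not spell out the exact formula or how the SA score is normalized, so treat `composite_score` and its input format as hypothetical.

```python
from typing import Dict, Optional

# Weight vectors copied from the preset table above.
PRESET_WEIGHTS: Dict[str, Dict[str, float]] = {
    "drug_like": {"validity": 0.3, "qed": 0.3, "alert_free": 0.2, "sa": 0.2},
    "lead_like": {"validity": 0.2, "qed": 0.4, "alert_free": 0.2, "sa": 0.2},
    "fragment_like": {"validity": 0.2, "qed": 0.2, "alert_free": 0.3, "sa": 0.3},
    "permissive": {"validity": 0.4, "qed": 0.3, "alert_free": 0.1, "sa": 0.2},
}


def composite_score(
    components: Optional[Dict[str, float]], preset: str
) -> Optional[float]:
    """Blend normalized components into one 0-1 score.

    `components` is None for an unparseable SMILES; the result is then None
    rather than 0.0, mirroring the null-vs-zero rule described above.
    Assumption: a linear weighted sum; the service may blend differently.
    """
    if components is None:
        return None
    weights = PRESET_WEIGHTS[preset]
    return sum(weights[name] * components[name] for name in weights)


example = {"validity": 1.0, "qed": 0.5, "alert_free": 1.0, "sa": 0.5}
print(composite_score(example, "drug_like"))
print(composite_score(None, "drug_like"))
```

Because each weight vector sums to 1.0, a molecule with every component at 1.0 scores 1.0 under any preset in this sketch.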
- POST /structure-filter/filter with ≤ 1000 SMILES → sync FilterResponse.
- POST /structure-filter/filter with > 1000 SMILES → returns StructureFilterBatchUploadResponse with job_id; the task runs on Celery.
- POST /structure-filter/batch/upload (file upload) → always async.

curl -sS -X POST http://localhost:8000/api/v1/structure-filter/filter \
  -H 'Content-Type: application/json' \
  -d '{
    "smiles_list": ["CCO", "c1ccccc1", "CC(=O)Oc1ccccc1C(=O)O", "[Ni]"],
    "preset": "drug_like"
  }'
Response (synchronous):
{
  "input_count": 4,
  "output_count": 1,
  "stages": [
    {"stage_name": "parse", "stage_index": 1, "input_count": 4, "passed_count": 4, "rejected_count": 0, "enabled": true},
    {"stage_name": "valence", "stage_index": 2, "input_count": 4, "passed_count": 4, "rejected_count": 0, "enabled": true},
    {"stage_name": "alerts", "stage_index": 3, "input_count": 4, "passed_count": 3, "rejected_count": 1, "enabled": true},
    ...
  ],
  "molecules": [
    {"smiles": "CCO", "status": "rejected", "failed_at": "property", "rejection_reason": "MW 46.07 below minimum 200"},
    {"smiles": "CC(=O)Oc1ccccc1C(=O)O", "status": "passed", "failed_at": null, "rejection_reason": null},
    ...
  ]
}
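A response of this shape can be reduced to the passed SMILES, a tally of rejections per stage, and the funnel counts. The sketch below works on a trimmed copy of the response shown above, using only fields that appear there; `summarize` is a hypothetical helper, not part of the API.

```python
import json

# Trimmed copy of the synchronous response above (elided entries left out).
raw = """
{
  "input_count": 4,
  "output_count": 1,
  "stages": [
    {"stage_name": "parse", "stage_index": 1, "input_count": 4,
     "passed_count": 4, "rejected_count": 0, "enabled": true},
    {"stage_name": "valence", "stage_index": 2, "input_count": 4,
     "passed_count": 4, "rejected_count": 0, "enabled": true}
  ],
  "molecules": [
    {"smiles": "CCO", "status": "rejected", "failed_at": "property",
     "rejection_reason": "MW 46.07 below minimum 200"},
    {"smiles": "CC(=O)Oc1ccccc1C(=O)O", "status": "passed",
     "failed_at": null, "rejection_reason": null}
  ]
}
"""


def summarize(response: dict):
    """Return (passed SMILES, rejections per stage, funnel rows)."""
    passed = [m["smiles"] for m in response["molecules"] if m["status"] == "passed"]
    rejected_by_stage = {}
    for m in response["molecules"]:
        if m["status"] != "passed":
            stage = m["failed_at"]
            rejected_by_stage[stage] = rejected_by_stage.get(stage, 0) + 1
    # Keep only enabled stages; each row is (name, entered, left).
    funnel = [
        (s["stage_name"], s["input_count"], s["passed_count"])
        for s in response["stages"]
        if s["enabled"]
    ]
    return passed, rejected_by_stage, funnel


passed, rejected_by_stage, funnel = summarize(json.loads(raw))
print(passed)
print(rejected_by_stage)
print(funnel)
```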
curl -sS -X POST http://localhost:8000/api/v1/structure-filter/filter \
  -H 'Content-Type: application/json' \
  -d '{
    "smiles_list": ["CCO", "c1ccccc1C(=O)O"],
    "config": {
      "min_mw": 100, "max_mw": 400,
      "min_logp": -1, "max_logp": 4,
      "max_tpsa": 120, "max_rot_bonds": 8,
      "max_rings": null, "max_sa_score": 4.5,
      "use_pains": true, "use_brenk": true, "use_kazius": false, "use_nibr": false,
      "enable_novelty": false, "novelty_threshold": 0.85,
      "weight_validity": 0.25, "weight_qed": 0.35,
      "weight_alert_free": 0.2, "weight_sa": 0.2
    }
  }'
If both preset and config are given, preset wins.
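Since a preset silently overrides an explicit config, a client can guard against sending both by accident. The helper below is a hypothetical client-side sketch: `build_filter_payload` is not part of the API, and the only field names it uses are smiles_list, preset, and config as shown in the requests above.

```python
import json
from typing import Iterable, Optional


def build_filter_payload(
    smiles_list: Iterable[str],
    preset: Optional[str] = None,
    config: Optional[dict] = None,
) -> dict:
    """Build a request body, refusing the ambiguous preset-plus-config case."""
    if preset is not None and config is not None:
        # The server would use the preset and ignore the config, so fail early.
        raise ValueError("pass either preset or config, not both")
    payload = {"smiles_list": list(smiles_list)}
    if preset is not None:
        payload["preset"] = preset
    if config is not None:
        payload["config"] = dict(config)
    return payload


body = build_filter_payload(["CCO", "c1ccccc1C(=O)O"], config={"min_mw": 100, "max_mw": 400})
print(json.dumps(body))
```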
curl -sS -X POST http://localhost:8000/api/v1/structure-filter/score \
  -H 'Content-Type: application/json' \
  -d '{
    "smiles_list": ["CCO", "c1ccccc1C(=O)O", "invalid_smiles"],
    "preset": "drug_like"
  }'
Response: {"scores": [0.412, 0.871, null]}. null means unparseable — preserve that in downstream ML pipelines rather than coercing to 0.
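One way to preserve the null in a downstream pipeline is to carry a validity mask, or a NaN placeholder, alongside the scores. The sketch below uses the response shown above and only the standard library; the variable names are illustrative.

```python
import json
import math

raw = '{"scores": [0.412, 0.871, null]}'
scores = json.loads(raw)["scores"]  # JSON null becomes Python None

# Keep positions aligned with the input list via a mask.
valid_mask = [s is not None for s in scores]
valid_scores = [s for s in scores if s is not None]

# Mean over parseable molecules only.
mean_valid = sum(valid_scores) / len(valid_scores)

# What NOT to do: coercing None to 0.0 drags the mean down and hides
# the difference between "unparseable" and "parseable but terrible".
mean_coerced = sum(0.0 if s is None else s for s in scores) / len(scores)

# For array-based code, NaN keeps the slot without pretending it is a score.
as_floats = [math.nan if s is None else s for s in scores]

print(valid_mask, mean_valid, mean_coerced)
```

Here the mean over valid scores is about 0.64, while the zero-coerced mean drops to about 0.43, which shows how much a single unparseable string can distort an aggregate.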
The endpoint accepts the REINVENT 4 input contract verbatim (raw list, not wrapped in a model):
curl -sS -X POST "http://localhost:8000/api/v1/structure-filter/reinvent-score?preset=drug_like" \
  -H 'Content-Type: application/json' \
  -d '[
    {"input_string": "CCO", "query_id": "q1"},
    {"input_string": "c1ccccc1C(=O)O", "query_id": "q2"},
    {"input_string": "invalid", "query_id": "q3"}
  ]'
Response:
{
  "output": {
    "successes_list": [
      {"query_id": "q1", "output_value": 0.412},
      {"query_id": "q2", "output_value": 0.871}
    ]
  }
}
Critical invariant: invalid SMILES are omitted, not scored 0.0 (REINVENT expects this — see Pitfall 4 / D-14). Don't insert zeros yourself.
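Because omitted entries shift list positions, results should be joined back to the request by query_id. The sketch below uses the request and response shown above; `scores_by_query` is a hypothetical helper name.

```python
from typing import Dict, List, Optional

request_items = [
    {"input_string": "CCO", "query_id": "q1"},
    {"input_string": "c1ccccc1C(=O)O", "query_id": "q2"},
    {"input_string": "invalid", "query_id": "q3"},
]
response = {
    "output": {
        "successes_list": [
            {"query_id": "q1", "output_value": 0.412},
            {"query_id": "q2", "output_value": 0.871},
        ]
    }
}


def scores_by_query(items: List[dict], resp: dict) -> Dict[str, Optional[float]]:
    """Map every requested query_id to its score, or None if it was omitted."""
    returned = {
        entry["query_id"]: entry["output_value"]
        for entry in resp["output"]["successes_list"]
    }
    # dict.get yields None for ids missing from successes_list; no zeros inserted.
    return {item["query_id"]: returned.get(item["query_id"]) for item in items}


joined = scores_by_query(request_items, response)
print(joined)
```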
REINVENT 4 config.toml snippet:
[[parameters.component]]
type = "ExternalProcess"
command = "curl -X POST http://chemaudit:8000/api/v1/structure-filter/reinvent-score?preset=drug_like -H 'Content-Type: application/json' -d @-"
curl -sS -X POST http://localhost:8000/api/v1/structure-filter/batch/upload \
-F "file=@generated_library.csv" \
-F "preset=lead_like"
Or with explicit JSON config:
curl -sS -X POST http://localhost:8000/api/v1/structure-filter/batch/upload \
-F "file=@library.sdf" \
-F 'config={"min_mw":200,"max_mw":450,...}'
WebSocket:
const ws = new WebSocket(`ws://localhost:8000/ws/structure-filter/${job_id}`);
Poll:
curl -sS http://localhost:8000/api/v1/structure-filter/batch/<job_id>/status
Response: {job_id, status, progress, current_stage}.
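A polling loop around the status endpoint might look like the sketch below. The source does not list the possible status values, so the terminal states are passed in by the caller, and the strings "processing", "complete", and "failed" in the demo are placeholders. The HTTP call is injected as `fetch_status` so the loop stays testable; `wait_for_job` is a hypothetical helper.

```python
import time
from typing import Callable, Collection


def wait_for_job(
    fetch_status: Callable[[], dict],
    terminal_states: Collection[str],
    interval_s: float = 2.0,
    max_polls: int = 300,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch_status until its "status" is terminal or polls run out.

    fetch_status should GET .../batch/<job_id>/status and return the decoded
    JSON. A 2 s interval stays well under the 60/min limit on that endpoint.
    """
    for attempt in range(max_polls):
        payload = fetch_status()
        if payload["status"] in terminal_states:
            return payload
        if attempt < max_polls - 1:
            sleep(interval_s)  # wait before the next poll
    raise TimeoutError(f"job not finished after {max_polls} polls")


# Demo with canned payloads instead of real HTTP calls (status strings are
# placeholders, not confirmed values from the service).
_fake = iter([
    {"job_id": "demo", "status": "processing", "progress": 40, "current_stage": "alerts"},
    {"job_id": "demo", "status": "complete", "progress": 100, "current_stage": "dedup"},
])
final = wait_for_job(
    lambda: next(_fake),
    terminal_states={"complete", "failed"},
    sleep=lambda s: None,  # skip real waiting in the demo
)
print(final["status"], final["progress"])
```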
curl -sS http://localhost:8000/api/v1/structure-filter/batch/<job_id>/results
Returns the same FilterResponse shape wrapped in {job_id, status, result}.
# Plain text — one passed SMILES per line
curl -sS http://localhost:8000/api/v1/structure-filter/batch/<job_id>/download/passed_txt \
-o passed.txt
# Full CSV — every molecule with status, failed_at, rejection_reason
curl -sS http://localhost:8000/api/v1/structure-filter/batch/<job_id>/download/full_csv \
-o results.csv
1. POST /structure-filter/batch/upload with the file and preset=drug_like → get job_id.
2. Poll /status.
3. GET /structure-filter/batch/<job_id>/download/passed_txt → plain-text file of passed SMILES.

Wire the /reinvent-score endpoint as an ExternalProcess component. Choose the preset via query string. REINVENT's query_id round-trip is preserved, and invalid SMILES are simply omitted from the response, so agents skip them naturally.
1. POST /structure-filter/filter with smiles_list (≤1000) and preset="fragment_like".
2. From molecules[], keep those with status="passed".
3. stages[] has input_count / passed_count / rejected_count per stage.

Set enable_novelty: true and novelty_threshold: 0.85 in an explicit config (no preset enables novelty by default):
curl -sS -X POST http://localhost:8000/api/v1/structure-filter/filter \
-H 'Content-Type: application/json' \
-d '{"smiles_list": [...], "config": {"min_mw": 200, "max_mw": 500, ..., "enable_novelty": true, "novelty_threshold": 0.85}}'
Molecules with Tanimoto > 0.85 to any ChEMBL compound are rejected at the novelty stage.
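The rejection rule can be illustrated with fingerprints modeled as sets of on-bit indices. This is a sketch of the threshold logic only: the fingerprint type the service uses and how it searches ChEMBL are not stated in the source, and `tanimoto` and `is_novel` are hypothetical names.

```python
from typing import FrozenSet, Iterable


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto coefficient: shared on-bits divided by total distinct on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def is_novel(
    candidate: FrozenSet[int],
    references: Iterable[FrozenSet[int]],
    threshold: float = 0.85,
) -> bool:
    """True unless some reference has similarity strictly above the threshold."""
    return all(tanimoto(candidate, ref) <= threshold for ref in references)


candidate = frozenset({1, 2, 3, 4})
near = frozenset({1, 2, 3, 4, 5})   # similarity 4/5 = 0.8, below the cutoff
identical = frozenset({1, 2, 3, 4})  # similarity 1.0, above the cutoff

print(is_novel(candidate, [near]))
print(is_novel(candidate, [near, identical]))
```

The comparison is strict, matching the "Tanimoto > 0.85" wording: a molecule at exactly 0.85 similarity still passes in this sketch.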
- /structure-filter/filter: 20/min.
- /structure-filter/score, /structure-filter/reinvent-score: 30/min.
- /structure-filter/batch/upload: 3/min.
- /structure-filter/batch/<job_id>/status: 60/min.
- /structure-filter/batch/<job_id>/results: 30/min.
- /structure-filter/batch/<job_id>/download/<format>: 10/min.

Valid presets: drug_like, lead_like, fragment_like, permissive. Case-sensitive.
File parsed but every row errored. For CSV, ensure the column is named exactly SMILES. For SDF, ensure molecules have valid MOL blocks.
null = parse failure or SMILES rejected before scoring. 0.0 = parsed and scored but every component was 0. Critical distinction for generative models; never collapse them.
Expected — invalid SMILES are omitted from successes_list, not zero-scored. Match by query_id, not list position.
/status but no results

Celery worker crashed. Check docker compose logs worker and re-upload. Redis TTL is 1 hour on metadata.
All molecules pass the individual stages but none make it through dedup. This means the input contains only duplicates (identical InChIKeys). Deduplicate before filtering, or disable dedup by adjusting the pipeline.
alerts stage for a fragment-screen

The fragment_like preset has only PAINS enabled. If you want Brenk filtering for fragments, use an explicit config rather than the preset.
- references/preset-configs.md: exhaustive threshold tables and weight vectors.
- chemaudit-qsar-ready: different goal, namely ML-ready curation with provenance rather than pass/fail filtering.
- chemaudit-molecule-validation: single-molecule scoring when you need per-check detail.