From chemaudit
Standardizes chemical structures via ChemAudit's ChEMBL-compatible pipeline (checker, standardizer, getparent, optional tautomer) with per-stage provenance, atom-level diffs, and InChIKey comparisons. Activates on 'standardize molecule', 'canonical SMILES', 'strip salts' queries.
npx claudepluginhub kohulan/chemaudit
ChEMBL-compatible standardization pipeline with four stages:
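1. Checker: ChEMBL structure checker; detects structural problems with penalty scores (penalty > 0 means the checker flagged something).
2. Standardizer: normalizes functional groups.
3. GetParent: removes salts and counterions.
4. Tautomer (optional): canonicalization via TautomerEnumerator. Warning: may destroy E/Z double-bond stereochemistry.

Optional detailed provenance records atom-level charge/bond/ring changes per stage.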
- POST /api/v1/standardize.
- chemaudit standardize --smiles "CCO" (server) or --local (offline).
- POST /api/v1/diagnostics/cross-pipeline runs three pipelines (RDKit MolStandardize, ChEMBL-style, minimal) and compares outputs.

curl -sS -X POST http://localhost:8000/api/v1/standardize \
-H 'Content-Type: application/json' \
-d '{"molecule": "CC(=O)Oc1ccccc1C(=O)[O-].[Na+]", "format": "auto"}'
Response:
{
  "molecule_info": {
    "input_smiles": "CC(=O)Oc1ccccc1C(=O)[O-].[Na+]",
    "canonical_smiles": "...",
    "inchikey": "...",
    "molecular_formula": "C9H7NaO4",
    "molecular_weight": 202.14
  },
  "result": {
    "original_smiles": "CC(=O)Oc1ccccc1C(=O)[O-].[Na+]",
    "standardized_smiles": "CC(=O)Oc1ccccc1C(=O)O",
    "success": true,
    "steps_applied": [
      {"step_name": "checker", "applied": true, "description": "ChEMBL structure checker", "changes": "Found 1 issue"},
      {"step_name": "standardizer", "applied": true, "description": "Normalize functional groups", "changes": "No changes"},
      {"step_name": "get_parent", "applied": true, "description": "Remove salts and counterions", "changes": "Removed 1 fragment: Na+"}
    ],
    "checker_issues": [
      {"penalty_score": 5, "message": "Salt form detected"}
    ],
    "excluded_fragments": ["[Na+]"],
    "stereo_comparison": {
      "before_count": 0, "after_count": 0, "lost": 0, "gained": 0,
      "double_bond_stereo_lost": 0, "warning": null
    },
    "structure_comparison": {
      "original_atom_count": 14, "standardized_atom_count": 13,
      "original_formula": "C9H7NaO4", "standardized_formula": "C9H8O4",
      "original_mw": 202.14, "standardized_mw": 180.16,
      "mass_change_percent": -10.87, "is_identical": false,
      "diff_summary": ["Removed: Na+"]
    },
    "mass_change_percent": -10.87,
    "provenance": null
  },
  "execution_time_ms": 87
}
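The response above can be condensed into a few readable lines. The following is a minimal Python sketch, not part of ChemAudit itself: the helper name `summarize_standardization` is hypothetical, and it reads only fields that appear in the example response.

```python
def summarize_standardization(response: dict) -> list[str]:
    """Return human-readable lines describing what the pipeline changed."""
    result = response["result"]
    lines = [f"{result['original_smiles']} -> {result['standardized_smiles']}"]
    # One line per stage that actually ran.
    for step in result.get("steps_applied", []):
        if step.get("applied"):
            lines.append(f"{step['step_name']}: {step['changes']}")
    # Salts, counterions, or solvents that were stripped.
    for fragment in result.get("excluded_fragments") or []:
        lines.append(f"excluded fragment: {fragment}")
    return lines


# Trimmed copy of the example response (only the fields the helper reads).
example = {
    "result": {
        "original_smiles": "CC(=O)Oc1ccccc1C(=O)[O-].[Na+]",
        "standardized_smiles": "CC(=O)Oc1ccccc1C(=O)O",
        "steps_applied": [
            {"step_name": "checker", "applied": True, "changes": "Found 1 issue"},
            {"step_name": "standardizer", "applied": True, "changes": "No changes"},
            {"step_name": "get_parent", "applied": True, "changes": "Removed 1 fragment: Na+"},
        ],
        "excluded_fragments": ["[Na+]"],
    }
}

for line in summarize_standardization(example):
    print(line)
```

The helper uses `.get()` with defaults so that a response missing an optional list does not raise.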
curl -sS -X POST http://localhost:8000/api/v1/standardize \
-H 'Content-Type: application/json' \
-d '{
  "molecule": "Oc1nc(N)nc2c1nc[nH]2",
  "options": {
    "include_tautomer": true,
    "preserve_stereo": true
  }
}'
Adds a tautomer_canonicalization step. Watch stereo_comparison.double_bond_stereo_lost — tautomer enumeration can collapse E/Z bonds.
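For scripted use, the same request can be assembled with Python's standard library. This is a sketch under stated assumptions: the base URL is copied from the curl examples, the helper names `build_standardize_request` and `standardize` are hypothetical, and only option keys shown in this document are used.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000/api/v1"  # host and port as in the curl examples


def build_standardize_request(molecule: str, **options) -> urllib.request.Request:
    """Build (but do not send) a POST request for the /standardize endpoint."""
    payload = {"molecule": molecule}
    if options:
        payload["options"] = options
    return urllib.request.Request(
        f"{BASE_URL}/standardize",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def standardize(molecule: str, **options) -> dict:
    """Send the request and decode the JSON response (needs a running server)."""
    request = build_standardize_request(molecule, **options)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


# Inspect the request without contacting the server.
request = build_standardize_request(
    "CC(=O)Oc1ccccc1C(=O)[O-].[Na+]", include_tautomer=True, preserve_stereo=True
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Separating request construction from sending keeps the payload easy to inspect and test without a live server.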
curl -sS -X POST http://localhost:8000/api/v1/standardize \
-H 'Content-Type: application/json' \
-d '{
"molecule": "CC(=O)Oc1ccccc1C(=O)[O-].[Na+]",
"options": {
"include_provenance": true
}
}'
The result.provenance field fills with per-stage records:
{
  "stages": [
    {
      "stage_name": "standardizer",
      "input_smiles": "...", "output_smiles": "...",
      "applied": true,
      "charge_changes": [
        {"atom_idx": 11, "element": "O", "before_charge": -1, "after_charge": 0, "rule_name": "carboxylate", "smarts": "..."}
      ],
      "bond_changes": [...],
      "radical_changes": [],
      "ring_changes": [],
      "fragment_removals": [],
      "dval_cross_refs": []
    },
    {
      "stage_name": "get_parent",
      "applied": true,
      "fragment_removals": [
        {"smiles": "[Na+]", "name": "Sodium", "role": "counterion", "mw": 22.99}
      ],
      ...
    }
  ],
  "tautomer": null,
  "stereo_summary": {
    "stereo_stripped": false, "centers_lost": 0, "bonds_lost": 0,
    "per_center": [], "dval_cross_refs": []
  }
}
Cross-references to Deep Validation check IDs (DVAL-01, DVAL-03) appear in dval_cross_refs when the molecule fails those checks — useful for understanding whether standardization addressed a flagged issue.
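To get an atom-level overview, the per-stage provenance records can be flattened into simple lists. The sketch below assumes the field names shown in the provenance example; the function name `collect_provenance_changes` is hypothetical and not part of the ChemAudit API.

```python
def collect_provenance_changes(provenance: dict) -> dict:
    """Flatten per-stage provenance into lists of charge changes and removals."""
    charge_changes = []
    fragment_removals = []
    for stage in provenance.get("stages", []):
        stage_name = stage.get("stage_name")
        for change in stage.get("charge_changes", []):
            charge_changes.append(
                (stage_name, change["atom_idx"], change["element"],
                 change["before_charge"], change["after_charge"])
            )
        for fragment in stage.get("fragment_removals", []):
            fragment_removals.append(
                (stage_name, fragment["smiles"], fragment.get("role"))
            )
    return {"charge_changes": charge_changes, "fragment_removals": fragment_removals}


# Trimmed copy of the provenance example (elided fields omitted).
provenance = {
    "stages": [
        {
            "stage_name": "standardizer",
            "charge_changes": [
                {"atom_idx": 11, "element": "O", "before_charge": -1, "after_charge": 0}
            ],
            "fragment_removals": [],
        },
        {
            "stage_name": "get_parent",
            "fragment_removals": [
                {"smiles": "[Na+]", "name": "Sodium", "role": "counterion", "mw": 22.99}
            ],
        },
    ]
}

changes = collect_provenance_changes(provenance)
print(changes["charge_changes"])
print(changes["fragment_removals"])
```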
If you already ran deep validation and want the standardization provenance to cross-reference DVAL IDs:
curl -sS -X POST http://localhost:8000/api/v1/standardize \
-H 'Content-Type: application/json' \
-d '{
"molecule": "...",
"options": {
"include_provenance": true,
"dval_results": {
"undefined_stereo": {"count": 2},
"tautomer_detection": {"count": 1}
}
}
}'
curl -sS http://localhost:8000/api/v1/standardize/options
Returns the canonical list of pipeline steps, their defaults, and warnings (e.g. tautomer stereo-loss caveat).
curl -sS -X POST http://localhost:8000/api/v1/diagnostics/cross-pipeline \
-H 'Content-Type: application/json' \
-d '{"molecule": "CC(=O)Oc1ccccc1C(=O)[O-].[Na+]", "format": "auto"}'
Runs:
- RDKit MolStandardize (Cleanup + LargestFragment + Uncharger + TautomerEnumerator).
- ChEMBL-style (the /standardize pipeline).
- Minimal.

Compares SMILES, InChIKey, MW, formula, charge, and stereo count across all three and surfaces disagreements, which is critical when picking a pipeline for a new dataset.
chemaudit standardize --smiles "CC(=O)Oc1ccccc1C(=O)[O-].[Na+]"
chemaudit standardize --smiles "..." --local --format json # offline
chemaudit standardize --file compounds.csv # first line
echo "CCO" | chemaudit standardize # stdin
--local uses app.services.standardization.chembl_pipeline directly — no HTTP. Outputs a Rich table by default, JSON when piped or --format json.
result.stereo_comparison summarizes:
| Field | Meaning |
|---|---|
| before_count / after_count | Defined stereocenters before and after |
| lost / gained | Net change in defined stereocenters |
| double_bond_stereo_lost | E/Z bonds lost to tautomerization |
| warning | Populated when lost > 0 or double_bond_stereo_lost > 0 |
preserve_stereo: true (default) makes the pipeline attempt to retain stereochemistry through each stage. Tautomer canonicalization is the main offender; the warning surfaces when it strips stereo.
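A client can apply the same rule as the table when deciding whether to alert the user. This is a minimal sketch assuming the `stereo_comparison` fields listed above; the function name `stereo_loss_message` and the fallback wording are hypothetical.

```python
from typing import Optional


def stereo_loss_message(stereo_comparison: dict) -> Optional[str]:
    """Return a warning string if stereo information was lost, else None."""
    lost = stereo_comparison.get("lost", 0)
    ez_lost = stereo_comparison.get("double_bond_stereo_lost", 0)
    if lost > 0 or ez_lost > 0:
        # Prefer the server-provided warning; fall back to a local summary.
        return stereo_comparison.get("warning") or (
            f"lost {lost} stereocenter(s), {ez_lost} E/Z bond(s)"
        )
    return None


# Values from the example response: nothing lost, so no message.
clean = {"before_count": 0, "after_count": 0, "lost": 0, "gained": 0,
         "double_bond_stereo_lost": 0, "warning": None}
print(stereo_loss_message(clean))

# Hypothetical case: one E/Z bond lost and no server warning text.
lossy = {"lost": 0, "double_bond_stereo_lost": 1, "warning": None}
print(stereo_loss_message(lossy))
```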
1. POST /standardize (defaults).
2. Compare result.original_smiles vs result.standardized_smiles.
3. Walk result.steps_applied[]; print each step's description + changes.
4. Read result.excluded_fragments[] for the salt/solvent story.
5. Report structure_comparison.mass_change_percent, which is usually negative (lost a counterion).

1. POST /diagnostics/cross-pipeline for each, then read each pipeline's InChIKey.
2. Inspect disagreements and property_comparison[] to see which property differs.
3. If structural_disagreements > 0, the three standardization approaches produced different molecular graphs; investigate the per-pipeline error fields.

Defaults (include_tautomer: false) already do this. The pipeline runs Checker → Standardizer (normalizes functional groups) → GetParent (salt stripping), leaving tautomers alone. The result's canonical SMILES is safe to use as a dedup key without risking E/Z loss.
Pass {"options": {"preserve_stereo": true, "include_tautomer": false}}, which is the default. If you need both tautomer canonicalization and stereo preservation, accept that some E/Z information may be lost; the response reports exactly how much in stereo_comparison.double_bond_stereo_lost.
- /standardize: 10/min.
- /standardize/options: 30/min.
- /diagnostics/cross-pipeline: 10/min.

API-key tier raises to 300/min.
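When standardizing many molecules in a loop, a small client-side throttle helps stay under these limits. The sketch below spaces calls evenly (one every 6 seconds for 10/min), which is stricter than a per-minute quota strictly requires. The class name `MinIntervalThrottle` is hypothetical, and how the server counts its window is not stated here, so treat the spacing as a conservative assumption.

```python
import time


class MinIntervalThrottle:
    """Space calls at least 60 / per_minute seconds apart."""

    def __init__(self, per_minute: float, clock=time.monotonic, sleep=time.sleep):
        self.interval = 60.0 / per_minute
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = None  # earliest time the next call may start

    def wait(self) -> float:
        """Block until the next call is allowed; return the delay applied."""
        now = self._clock()
        start = now if self._next_allowed is None else max(now, self._next_allowed)
        delay = start - now
        if delay > 0:
            self._sleep(delay)
        self._next_allowed = start + self.interval
        return delay


# Demonstration with a frozen fake clock, so nothing actually sleeps.
slept = []
throttle = MinIntervalThrottle(per_minute=10, clock=lambda: 0.0, sleep=slept.append)
delays = [throttle.wait() for _ in range(3)]
print(delays)  # [0.0, 6.0, 12.0]
```

In real use, construct the throttle with the defaults and call `throttle.wait()` before each request.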
result.success = false
The pipeline raised an exception; result.error_message has details. Common causes:
- … format_detected).
- … complexity_flag=true.

standardized_smiles = null but success = true
Stripping all fragments left nothing. The input was entirely a salt/counterion with no parent (e.g. [Na+].[Cl-]). This is not an error; no parent structure exists.

mass_change_percent = 0 but steps_applied shows applied stages
The standardizer applied normalizations that don't change mass (e.g. resonance structure normalization, redrawing nitro groups). Check structure_comparison.diff_summary[] for the qualitative story.

double_bond_stereo_lost > 0 after enabling tautomer
Expected. Tautomer canonicalization breaks and re-forms double bonds. If E/Z geometry matters, set include_tautomer: false and accept that different tautomeric inputs will produce different canonical SMILES.

Likely tautomer canonicalization, or a charge/protonation normalization. Run with include_provenance: true and inspect charge_changes[] and bond_changes[] to see the atom-level story.
penalty_score = 0
Zero-penalty entries are informational. Non-zero penalties mean the ChEMBL checker considers the input substandard.
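Several of the cases above can be distinguished programmatically before showing a result to the user. This is a minimal sketch using only fields named in this document; the function name `classify_result` and the returned labels are hypothetical.

```python
def classify_result(result: dict) -> str:
    """Map a /standardize result object to a short outcome label."""
    if not result.get("success"):
        # The pipeline raised an exception; details are in error_message.
        return f"error: {result.get('error_message')}"
    if result.get("standardized_smiles") is None:
        # Input was entirely salt/counterion, so no parent structure exists.
        return "no parent structure"
    comparison = result.get("structure_comparison") or {}
    return "unchanged" if comparison.get("is_identical") else "changed"


print(classify_result({"success": False, "error_message": "parse failure"}))
print(classify_result({"success": True, "standardized_smiles": None}))
print(classify_result({
    "success": True,
    "standardized_smiles": "CC(=O)Oc1ccccc1C(=O)O",
    "structure_comparison": {"is_identical": False},
}))
```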
- chemaudit-qsar-ready: when you want the full 10-step ML-curation pipeline, not just the 3-4 ChEMBL steps.
- chemaudit-diagnostics: for cross-pipeline comparison and SMILES round-trip checking.
- chemaudit-molecule-validation: for deep checks that complement standardization.