From sciagent-skills
Provides protocols for Western blot quantification and analysis: band detection, loading control normalization, two-step normalization, statistical aggregation across replicates.
Short Description: Comprehensive guide for quantifying and analyzing Western blot images with multiple experimental repetitions, including intensity measurement, normalization, statistical analysis, and visualization.
Authors: Ohagent Team
Version: 1.0
Last Updated: December 2025
License: CC BY 4.0
Commercial Use: ✅ Allowed
This guide provides a standardized workflow for analyzing Western blot images, particularly for experiments with multiple repetitions and conditions. The protocol covers band intensity detection, normalization procedures, statistical aggregation, and visualization best practices.
Western blot quantification cannot use raw band intensities, because total protein loaded per lane varies between samples (pipetting error, transfer efficiency, gel artifacts). A loading control is a protein assumed to be expressed at the same level across all samples (commonly GAPDH, β-actin, α-tubulin, or a total-protein stain such as Ponceau S / stain-free imaging). Dividing the target band intensity by the loading control intensity in the same lane yields a normalized value that corrects for these per-lane technical variations. The loading control must itself be unsaturated and within the linear dynamic range of the detection system.
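Below is a minimal Python sketch of the per-lane division. It assumes background-subtracted band intensities are already available as dictionaries keyed by lane. `normalize_to_loading`, the lane labels and the numbers are illustrative only and do not come from any specific tool.

```python
def normalize_to_loading(target, loading):
    """Divide each lane's target intensity by the loading-control intensity
    from the SAME lane. Both arguments map lane label -> intensity."""
    if target.keys() != loading.keys():
        raise ValueError("target and loading control must cover the same lanes")
    normalized = {}
    for lane, t in target.items():
        lc = loading[lane]
        if lc <= 0:
            raise ValueError(f"non-positive loading control in lane {lane!r}")
        normalized[lane] = t / lc
    return normalized


# Illustrative (made-up) intensities: lane 2 was loaded about 2x heavier than lane 1.
target = {"lane1": 1000.0, "lane2": 2000.0}
gapdh = {"lane1": 500.0, "lane2": 1000.0}
print(normalize_to_loading(target, gapdh))  # {'lane1': 2.0, 'lane2': 2.0}
```

Each lane is divided by its own loading control, so the heavier loading in lane 2 cancels out. This is the per-lane correction described above.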
When two related signals are measured in the same blot — for example a total form (SMAD2) and its phosphorylated form (PSMAD2) — a two-step normalization disentangles changes in protein abundance from changes in modification state. Step A normalizes the total protein to a housekeeping control (SMAD2_norm = SMAD2 / GAPDH); Step B normalizes the modified form to that loading-corrected total (PSMAD2_target = PSMAD2 / SMAD2_norm). This isolates the modification-specific signal from changes in expression of the underlying protein.
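The two steps can be sketched for a single lane as follows. `two_step_normalize` is a hypothetical helper and the intensities are invented. Step B simplifies algebraically to Modified × LoadingControl / Total, which gives a convenient cross-check when the calculation is done in a spreadsheet.

```python
def two_step_normalize(modified, total, loading):
    """Two-step normalization for one lane.

    Step A: total_norm = total / loading          (e.g. SMAD2 / GAPDH)
    Step B: target     = modified / total_norm    (e.g. PSMAD2 / SMAD2_norm)
    """
    if loading <= 0 or total <= 0:
        raise ValueError("total and loading-control intensities must be positive")
    total_norm = total / loading      # Step A
    return modified / total_norm      # Step B


# Made-up single-lane intensities for illustration only.
psmad2, smad2, gapdh = 800.0, 400.0, 200.0
print(two_step_normalize(psmad2, smad2, gapdh))  # 400.0
```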
Each Western blot is one experimental observation; biological conclusions require biological replicates (independent experiments, not just multiple lanes from one gel). Aggregation steps: (1) normalize within each replicate, (2) compute fold-change relative to the within-replicate control (so the control is 1.0 by definition), (3) compute mean and dispersion (SD or SE) across replicates. Normalizing across replicates before computing fold-change inflates apparent effect size and confuses gel-to-gel variation with biological effect.
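The following sketch shows the aggregation order described above. It assumes each biological replicate's normalized values are stored in their own dictionary. The function name and the `"control"` key are assumptions. Fold change is computed inside each replicate before any cross-replicate statistics are calculated.

```python
from statistics import mean, stdev


def aggregate_fold_changes(replicates, control="control"):
    """replicates: list of dicts, one per biological replicate, each mapping
    condition -> normalized value from that replicate's own blot.

    Returns condition -> (mean fold change, SD) across replicates.
    """
    fold = {}
    for rep in replicates:
        ctrl = rep[control]  # within-replicate control, so control becomes 1.0
        for condition, value in rep.items():
            fold.setdefault(condition, []).append(value / ctrl)
    # Only now, after per-replicate fold changes, summarize across replicates.
    return {c: (mean(v), stdev(v)) for c, v in fold.items()}


# Made-up normalized values from three independent blots (n = 3).
reps = [
    {"control": 2.0, "treated": 4.0},
    {"control": 1.0, "treated": 2.5},
    {"control": 4.0, "treated": 6.0},
]
summary = aggregate_fold_changes(reps)
print(summary["control"])  # (1.0, 0.0)
print(summary["treated"])  # (2.0, 0.5)
```

The control condition comes out as exactly 1.0 with zero spread in every replicate, as the procedure requires.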
SD describes the spread of the underlying biological response across replicates and is appropriate when the question is "how variable is this effect?". SE (= SD / √n) describes the precision of the estimated mean and is appropriate when the question is "how confident are we in this mean value?". For a typical n = 3 Western blot experiment, SD bars are √3 ≈ 1.7 times as long as SE bars, but they communicate the underlying biology more honestly. Always state which error measure is plotted in the figure legend.
Western blot quantification decision tree
└── Single target protein measured?
    ├── Yes -> Single-step normalization: Target / LoadingControl (per lane)
    │   └── Compute fold change vs control within each replicate
    │       └── Aggregate mean +/- error across replicates
    └── No, two related signals (e.g., total + modified form)
        └── Two-step normalization
            ├── Step A: TotalForm_norm = TotalForm / LoadingControl (per lane)
            └── Step B: ModifiedForm_target = ModifiedForm / TotalForm_norm
Error bar choice:
├── Reporting biological variability of the effect? -> SD
└── Reporting precision of the mean estimate? -> SE = SD / sqrt(n)
Experimental design choice:
├── Discrete treatments (control vs conditions) -> Multi-condition design + bar graph + ANOVA / t-tests
├── Same treatment over multiple time points -> Time course design + line graph; normalize to t0 control
└── Same treatment at multiple concentrations -> Dose response design + log-x line graph; fit EC50 / IC50
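For the time-course branch, normalizing to t = 0 inside each replicate can be sketched as follows. The helper name, the minute-based time points and the values are hypothetical. Each replicate is divided by its own baseline before replicates are averaged.

```python
def normalize_to_t0(time_course, t0=0):
    """Express each time point relative to the t = 0 value of the SAME replicate.

    time_course: dict mapping time point -> normalized band value
    (Target / LoadingControl) for one biological replicate.
    """
    baseline = time_course[t0]
    if baseline <= 0:
        raise ValueError("t0 value must be positive")
    return {t: v / baseline for t, v in sorted(time_course.items())}


# Made-up replicate; keys are minutes after treatment.
rep1 = {0: 0.5, 15: 1.0, 60: 1.5}
print(normalize_to_t0(rep1))  # {0: 1.0, 15: 2.0, 60: 3.0}
```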
| Situation | Recommended choice | Rationale |
|---|---|---|
| Quantifying total protein abundance changes | Single-step normalization (Target / LoadingControl) | One measurement per lane; loading control corrects total-protein loading |
| Quantifying post-translational modification (phosphorylation, ubiquitination) | Two-step normalization (Modified / Total_norm) | Isolates modification stoichiometry from changes in total protein expression |
| n = 3 replicates, biology-focused figure | Mean ± SD | Communicates the spread of the biological response |
| n = 3 replicates, statistical-precision figure | Mean ± SE | Communicates the precision of the mean estimate |
| Small fold changes (~1.5×) on noisy blots | Increase n to ≥ 4–6 and report SE with explicit n in legend | Low effect size requires more replicates for adequate statistical power |
| Comparing 4+ discrete conditions | Multi-condition design + ANOVA with post-hoc correction | Pairwise t-tests across many conditions inflate Type I error |
| Tracking the same effect over time | Time-course design, normalize to t = 0 within each replicate | Removes baseline drift between replicates |
| Determining potency (EC50 / IC50) | Dose-response design with log-spaced concentrations | Log spacing samples the sigmoidal response uniformly; nonlinear fit gives EC50 |
| Loading control band saturated | Re-image at lower exposure or dilute the lysate | Saturated bands violate the linear dynamic range and silently bias normalization |
| One outlier replicate with unusually high variability | Document and exclude with justification (e.g., transfer artifact) | Honest exclusion is preferable to a noisy mean; never silently drop data |
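One way to catch the saturated-loading-control case is to count pixels that sit at the detector's ceiling. The sketch below assumes a raw, non-inverted integer image (brighter means more signal) held in NumPy. `saturated_fraction` is a hypothetical helper. Some systems saturate below the dtype maximum (for example, a 12-bit camera stored in a 16-bit file saturates at 4095), so pass `max_value` explicitly when the instrument's limit is known.

```python
import numpy as np


def saturated_fraction(roi, max_value=None):
    """Fraction of ROI pixels at or above the saturation value.

    Assumes `roi` is an integer array of RAW (non-inverted) signal. If
    max_value is not given, the maximum of the array's integer dtype is used
    (255 for uint8, 65535 for uint16).
    """
    roi = np.asarray(roi)
    if max_value is None:
        max_value = np.iinfo(roi.dtype).max
    return float(np.count_nonzero(roi >= max_value)) / roi.size


# Made-up 16-bit loading-control ROI with 2 of 8 pixels clipped at 65535.
band = np.array([[1200, 65535, 65535, 900], [800, 3000, 2500, 700]], dtype=np.uint16)
frac = saturated_fraction(band)
print(frac)  # 0.25
if frac > 0:
    print("Saturated loading-control band: re-image at lower exposure")
```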
Objective: Identify ROIs and isolate individual bands in the Western blot image.
Key Considerations:
Tools: analyze_pixel_distribution, find_roi_from_image
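The tools named above are not reproduced here. To illustrate the underlying idea only, the sketch below finds candidate bands in one pre-cropped lane by thresholding its row-intensity profile. `find_bands_in_lane`, its parameters and the bright-band convention are assumptions. Check the output against a grid overlay before using it.

```python
import numpy as np


def find_bands_in_lane(lane, lower_threshold, upper_threshold=None, min_rows=2):
    """Return (start_row, end_row) pairs of candidate bands in one lane.

    Hypothetical helper (NOT the find_roi_from_image tool). Assumes `lane` is
    a 2-D array cropped to a single lane, with bands BRIGHTER than background
    (invert dark-on-light scans first). Rows whose mean intensity lies within
    [lower_threshold, upper_threshold] are grouped into contiguous runs; runs
    shorter than min_rows are discarded as noise.
    """
    profile = np.asarray(lane, dtype=float).mean(axis=1)  # one value per row
    mask = profile >= lower_threshold
    if upper_threshold is not None:
        mask &= profile <= upper_threshold
    bands, start = [], None
    for row, inside in enumerate(mask):
        if inside and start is None:
            start = row
        elif not inside and start is not None:
            if row - start >= min_rows:
                bands.append((start, row - 1))
            start = None
    # Close a run that reaches the bottom edge of the lane.
    if start is not None and len(mask) - start >= min_rows:
        bands.append((start, len(mask) - 1))
    return bands


# Made-up lane: background of 10, one band spanning rows 3-5.
lane = np.full((10, 4), 10.0)
lane[3:6, :] = 120.0
print(find_bands_in_lane(lane, lower_threshold=50))  # [(3, 5)]
```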
Objective: Quantify band intensities for all detected bands.
Procedure:
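For each lane/repetition, measure the intensity of the target band (e.g., PSMAD2), the total-protein band (e.g., SMAD2), and the loading-control band (e.g., GAPDH).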
Record measurements in a structured format:
Best Practices:
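The following is a minimal sketch of a single band measurement from the procedure above. It assumes the median of a nearby band-free region serves as the local background (other background models exist) and that inputs are NumPy arrays of raw signal. `integrated_intensity` and the pixel values are illustrative.

```python
import numpy as np


def integrated_intensity(roi, background_roi):
    """Background-subtracted integrated density of one band.

    Assumes both inputs hold RAW signal (brighter = more protein). The
    background is the median of a nearby band-free region; it is subtracted
    from every ROI pixel and negative residuals are clipped to 0.
    """
    roi = np.asarray(roi, dtype=float)
    background = float(np.median(background_roi))
    return float(np.clip(roi - background, 0, None).sum())


# Made-up 2x3 band ROI on a flat background of 20.
band = np.array([[120, 220, 120], [120, 220, 120]])
bg = np.full((2, 3), 20)
print(integrated_intensity(band, bg))  # 800.0
```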
Objective: Normalize target protein intensities to account for loading variations.
Two-Step Normalization Process:
Step 1: Normalize the total protein (SMAD2) to the loading control (GAPDH):
SMAD2_norm = Intensity_SMAD2 / Intensity_GAPDH
This accounts for variations in total protein loading across samples.
Step 2: Normalize the phosphorylated target (PSMAD2) to the loading-corrected total protein:
Target_value = Intensity_PSMAD2 / SMAD2_norm
This yields the PSMAD2 signal relative to the loading-corrected SMAD2 level, separating changes in phosphorylation from changes in total SMAD2 abundance.
Alternative Normalization Methods:
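Single-step normalization to a housekeeping protein: Target_norm = Intensity_Target / Intensity_GAPDH
Total-protein stain normalization (e.g., Ponceau S or stain-free imaging): Target_norm = Intensity_Target / Intensity_TotalProtein
Objective: Express results relative to a control condition.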
Procedure:
Fold_Change = Target_value_condition / Target_value_control
Important Notes:
Objective: Combine data from multiple experimental repetitions.
Procedure:
Statistical Considerations:
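When four or more conditions are compared, the decision table earlier in this guide recommends ANOVA rather than many pairwise t-tests. The F statistic can be computed from its definition, as in the sketch below, which uses made-up fold changes. In practice, `scipy.stats.f_oneway(*groups)` returns the same F together with a p-value. A post-hoc correction is still needed before making pairwise claims.

```python
from statistics import mean


def one_way_anova_f(*groups):
    """One-way ANOVA F statistic for k groups of fold-change values.

    F = (SS_between / (k - 1)) / (SS_within / (N - k))
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand = mean(x for g in groups for x in g)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


# Made-up fold changes, n = 3 replicates per condition.
control = [1.0, 1.0, 1.0]
dose_a = [1.5, 2.0, 2.5]
dose_b = [2.5, 3.0, 3.5]
print(one_way_anova_f(control, dose_a, dose_b))  # F ≈ 18.0
```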
Objective: Create clear, publication-ready visualizations.
Bar Graph Requirements:
Visualization Best Practices:
Verification Images: save a grid overlay of the detected ROIs (e.g., wb_grid_verification.png).
Problem: Some bands not detected or incorrectly identified. Solutions:
Problem: Large standard deviations or inconsistent results. Solutions:
Problem: Normalized values don't match expected biological response. Solutions:
Problem: High background affecting intensity measurements. Solutions:
Before finalizing analysis, verify:
Required Outputs:
Quantification results: CSV or Excel file with:
Visualization: Bar graph image (e.g., psmad2_quantification.png)
Verification image (optional but recommended): wb_grid_verification.png
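The required column list for the results file is not specified here, so the names below are placeholders. The sketch writes one row per replicate and condition with Python's `csv` module. `write_results` is a hypothetical helper.

```python
import csv
import io


def write_results(rows, handle):
    """Write per-replicate results to CSV. Column names are hypothetical
    placeholders; adapt them to the required output format."""
    fields = ["replicate", "condition", "normalized_value", "fold_change"]
    writer = csv.DictWriter(handle, fieldnames=fields)
    writer.writeheader()
    writer.writerows(rows)


# Made-up rows for one replicate, written to an in-memory buffer.
rows = [
    {"replicate": 1, "condition": "control", "normalized_value": 2.0, "fold_change": 1.0},
    {"replicate": 1, "condition": "treated", "normalized_value": 4.0, "fold_change": 2.0},
]
buffer = io.StringIO()
write_results(rows, buffer)
print(buffer.getvalue())
```

To write a file instead of a buffer, open it with `open(path, "w", newline="")` and pass that handle to `write_results`.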
For a typical experiment with 3 repetitions and 4 conditions:
Total-Form Normalization (Step A):
TotalForm_norm = Intensity_TotalForm / Intensity_LoadingControl
Target Normalization (Step B):
Target_norm = Intensity_Target / TotalForm_norm
Fold Change:
Fold_Change = Target_norm_condition / Target_norm_control
Statistics:
Mean = Σ(values) / n
SD = √[Σ(value - mean)² / (n-1)]
SE = SD / √n
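The three formulas above map directly onto Python's `statistics` module, whose `stdev` uses the same n - 1 denominator. `summarize` and the input values are illustrative.

```python
import math
from statistics import mean, stdev


def summarize(values):
    """Mean, sample SD (n - 1 denominator) and SE = SD / sqrt(n)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two replicates for SD/SE")
    m = mean(values)
    sd = stdev(values)  # sample SD, matching the SD formula above
    return m, sd, sd / math.sqrt(n)


# Made-up fold changes from n = 3 biological replicates.
m, sd, se = summarize([2.0, 2.5, 1.5])
print(m, sd, round(se, 4))  # 2.0 0.5 0.2887
```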
Pitfall: Reporting raw band intensities without loading-control normalization. Differences seen on the blot can be entirely explained by per-lane loading variation.
Pitfall: Using a saturated loading control. A saturated GAPDH/β-actin band looks "even" but is outside the linear dynamic range, so normalization silently understates true differences.
Pitfall: Aggregating normalized values across replicates before computing fold change. This conflates gel-to-gel variation with biological signal and inflates the apparent effect size.
Pitfall: Plotting SE bars but labeling them SD (or vice versa). Reviewers and readers cannot interpret the figure correctly.
Pitfall: Drawing conclusions from n = 1 or n = 2 experiments. A single observation cannot distinguish biological effect from technical noise.
Pitfall: Silently excluding outlier replicates without documentation. This biases the reported mean and is irreproducible.
Pitfall: Choosing a loading control that itself responds to the treatment. Some "housekeeping" proteins (e.g., GAPDH) change under metabolic stress, hypoxia, or starvation, breaking the assumption that the loading control is constant.
Pitfall: Using fixed-threshold automatic ROI detection on every image. Different exposures, contrasts, and noise floors require different thresholds; one-size-fits-all detection misses dim bands or splits strong ones.
Instead, set lower_threshold and upper_threshold per image, manually verify the grid overlay before extracting intensities, and preserve correctly detected ROIs when adjusting parameters.
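To illustrate per-image thresholding, the sketch below derives both thresholds from percentiles of each image's own pixel distribution. The percentile rule and its defaults are assumptions, not the behavior of `analyze_pixel_distribution`. The example shows why one fixed cutoff fails when exposure changes.

```python
import numpy as np


def per_image_thresholds(image, lower_pct=90.0, upper_pct=99.9):
    """Derive lower/upper thresholds from THIS image's pixel distribution.

    Hypothetical percentile rule: pixels above the lower_pct percentile are
    band candidates; values above upper_pct are treated as possible artifacts.
    The percentiles themselves are assumptions to tune per blot.
    """
    pixels = np.asarray(image, dtype=float).ravel()
    lower, upper = np.percentile(pixels, [lower_pct, upper_pct])
    return float(lower), float(upper)


# Two made-up exposures of the same blot: a fixed cutoff cannot fit both.
rng = np.random.default_rng(0)
short_exp = rng.normal(20, 2, size=(50, 50))
long_exp = short_exp * 4  # 4x longer exposure scales every pixel
lo_s, _ = per_image_thresholds(short_exp)
lo_l, _ = per_image_thresholds(long_exp)
print(round(lo_l / lo_s, 6))  # 4.0
```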