Systematic PyTorch profiling using 4-phase framework to identify bottlenecks
/plugin marketplace add tachyon-beep/skillpacks/plugin install yzmir-pytorch-engineering@foundryside-marketplace[script_path] [--cpu|--gpu|--memory|--io]# PyTorch Profiling Command You are profiling PyTorch code to identify performance bottlenecks. Follow the 4-phase framework. ## Core Principle **Profile before optimizing. Measure, don't guess.** The bottleneck is rarely where you think it is. ## Phase 1: Establish Baseline Before profiling, establish a reproducible baseline: **Critical GPU Timing Rule**: Always use `torch.cuda.synchronize()` or CUDA events for GPU timing. `time.time()` alone is meaningless for async GPU ops. ## Phase 2: Identify Bottleneck Type Run the profiler to categorize the bottleneck: ### Bottleneck Cla...