/profile

Description

Systematic PyTorch profiling using 4-phase framework to identify bottlenecks

Install

Add the repository(one-time)

/plugin marketplace add tachyon-beep/skillpacks

Install the plugin

/plugin install yzmir-pytorch-engineering@foundryside-marketplace

Argument

[script_path] [--cpu|--gpu|--memory|--io]

Allowed Tools

ReadGrepGlobBashTaskWrite

Prompt Preview

# PyTorch Profiling Command

You are profiling PyTorch code to identify performance bottlenecks. Follow the 4-phase framework.

## Core Principle

**Profile before optimizing. Measure, don't guess.** The bottleneck is rarely where you think it is.

## Phase 1: Establish Baseline

Before profiling, establish a reproducible baseline:



**Critical GPU Timing Rule**: Always use `torch.cuda.synchronize()` or CUDA events for GPU timing. `time.time()` alone is meaningless for async GPU ops.

## Phase 2: Identify Bottleneck Type

Run the profiler to categorize the bottleneck:



### Bottleneck Cla...