🔬 nsys-ai
AI-powered analysis for NVIDIA Nsight Systems profiles
Navigate GPU kernel timelines, diagnose performance bottlenecks with AI, and explore NVTX hierarchies — from your browser or terminal.
Mission: Build an intelligent agent that truly understands GPU performance from first principles. An agent that can identify pipeline bubbles, calculate MFU, assess arithmetic intensity, and diagnose the root causes that cost millions of dollars in GPU hours — turning months of expert debugging into minutes.

⚡ Install
pip install nsys-ai
That's it. No system dependencies, no CUDA required. Just Python 3.10+.
🌐 Web UI First (Default)
nsys-ai is web-first. The default command opens the timeline UI in your browser.
# Default: open web timeline UI
nsys-ai my_training.nsys-rep
# Explicit command (same web UI)
nsys-ai timeline-web my_training.nsys-rep
Use TUI/CLI modes when you specifically want terminal workflows.
🎯 What It Does
nsys-ai reads .nsys-rep or .sqlite profile exports from NVIDIA Nsight Systems and gives you a web-first workflow plus terminal and export tools:
🌐 Web Timeline
Multi-GPU browser viewer with progressive rendering
Browser-based viewer:
• Multi-GPU stacked streams
• NVTX hierarchy bars
• Pinch-to-zoom, trackpad pan
• AI chat sidebar

🖥️ Timeline TUI
Perfetto-style horizontal timeline in your terminal
S21 ████░██████░███
S56 ██████░░░███████
S60 ░░░██████░░░░░██
| │
39.1s 39.5s

🌲 Tree TUI
Interactive NVTX hierarchy browser with kernel details
▼ Iteration (324ms)
  ▼ forward (180ms)
    ▼ Attention (89ms)
      ■ flash_fwd 26ms
      ■ flash_bwd 63ms

📄 HTML Export
Exportable interactive visualizations for sharing
Interactive HTML exports:
• NVTX stack viewer
• SQLite schema explorer
• Perfetto JSON traces
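To get a feel for what a schema explorer works with, here is a minimal sketch that lists the tables and columns of a SQLite file using only Python's standard library. It is not nsys-ai's implementation. The in-memory database is a toy stand-in, and the two table names are assumptions modeled on what Nsight Systems SQLite exports typically contain; point `sqlite3.connect` at your own `.sqlite` export to see the real schema.

```python
import sqlite3


def list_tables(conn: sqlite3.Connection) -> dict[str, list[str]]:
    """Map each table name to the list of its column names."""
    names = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )
    ]
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
    return {
        name: [col[1] for col in conn.execute(f'PRAGMA table_info("{name}")')]
        for name in names
    }


# Toy stand-in for a real export (assumed table layout, for illustration only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE StringIds (id INTEGER PRIMARY KEY, value TEXT)")
conn.execute(
    'CREATE TABLE CUPTI_ACTIVITY_KIND_KERNEL '
    '(start INTEGER, "end" INTEGER, deviceId INTEGER, '
    'streamId INTEGER, shortName INTEGER)'
)

schema = list_tables(conn)
for table, columns in schema.items():
    print(table, columns)
```

With a real profile, replace `":memory:"` with the path to the export and skip the `CREATE TABLE` statements.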
🚀 Quick Start
1. Get a profile
# Option A: Profile your own PyTorch training
nsys profile -o my_training python train.py
# → produces my_training.nsys-rep (or .sqlite via --export sqlite)
# Option B: Download an example profile
cd examples/example-20-megatron-distca
python download_data.py
# → downloads output/megatron_distca.nsys-rep
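If you already have a `.nsys-rep` file and want the `.sqlite` form for the export commands, the `nsys export` subcommand can convert it. The sketch below only builds that command line from Python; the helper name `nsys_export_command` is hypothetical, and actually running the command assumes `nsys` is on your PATH.

```python
import shlex
from pathlib import Path


def nsys_export_command(report: str) -> list[str]:
    """Build an `nsys export` command that writes a .sqlite file next to the report."""
    src = Path(report)
    dst = src.with_suffix(".sqlite")  # my_training.nsys-rep -> my_training.sqlite
    return ["nsys", "export", "--type=sqlite", f"--output={dst}", str(src)]


cmd = nsys_export_command("my_training.nsys-rep")
print(shlex.join(cmd))

# To run it for real (requires the Nsight Systems CLI):
#   import subprocess
#   subprocess.run(cmd, check=True)
```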
2. Explore it
# Start here: one command opens the web timeline in your browser
nsys-ai my_training.nsys-rep
# Or explicitly:
nsys-ai timeline-web my_training.nsys-rep
# Then use the overview and summary commands as needed
# Profile overview
nsys-ai info my_training.nsys-rep
# GPU kernel summary
nsys-ai summary my_training.nsys-rep --gpu 0
Prefer a terminal? nsys-ai also has full TUI support:
nsys-ai timeline my_training.nsys-rep --gpu 0 --trim 39 42 # horizontal timeline
nsys-ai tui my_training.nsys-rep --gpu 0 --trim 39 42 # tree browser
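For intuition about what `--gpu 0 --trim 39 42` selects, here is a rough sketch of a per-kernel summary over a time window, written directly against SQLite. It is not nsys-ai's code. It assumes the trim values are seconds, that timestamps in the export are nanoseconds, and that kernels live in a `CUPTI_ACTIVITY_KIND_KERNEL` table whose `shortName` column points into `StringIds`; the rows are toy data.

```python
import sqlite3

NS_PER_S = 1_000_000_000


def kernel_summary(conn, gpu, trim_start_s, trim_end_s):
    """Return (name, calls, total_ns) for kernels on `gpu` overlapping the window."""
    lo = int(trim_start_s * NS_PER_S)
    hi = int(trim_end_s * NS_PER_S)
    query = """
        SELECT s.value AS name, COUNT(*) AS calls, SUM(k."end" - k.start) AS total_ns
        FROM CUPTI_ACTIVITY_KIND_KERNEL AS k
        JOIN StringIds AS s ON s.id = k.shortName
        WHERE k.deviceId = ? AND k.start < ? AND k."end" > ?
        GROUP BY s.value
        ORDER BY total_ns DESC
    """
    return conn.execute(query, (gpu, hi, lo)).fetchall()


# Toy data (assumed schema, made-up timings).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE StringIds (id INTEGER PRIMARY KEY, value TEXT)")
conn.execute(
    'CREATE TABLE CUPTI_ACTIVITY_KIND_KERNEL '
    '(start INTEGER, "end" INTEGER, deviceId INTEGER, '
    'streamId INTEGER, shortName INTEGER)'
)
conn.executemany("INSERT INTO StringIds VALUES (?, ?)",
                 [(1, "flash_fwd"), (2, "flash_bwd")])
conn.executemany(
    "INSERT INTO CUPTI_ACTIVITY_KIND_KERNEL VALUES (?, ?, ?, ?, ?)",
    [
        (39_100_000_000, 39_126_000_000, 0, 21, 1),  # inside window, GPU 0
        (39_200_000_000, 39_263_000_000, 0, 21, 2),  # inside window, GPU 0
        (45_000_000_000, 45_010_000_000, 0, 21, 1),  # outside window
        (39_300_000_000, 39_326_000_000, 1, 21, 1),  # other GPU
    ],
)

rows = kernel_summary(conn, gpu=0, trim_start_s=39, trim_end_s=42)
for name, calls, total_ns in rows:
    print(f"{name}: {calls} call(s), {total_ns / 1e6:.1f} ms")
```

The overlap test (`start < window_end` and `end > window_start`) keeps kernels that straddle a window edge; a stricter filter would require both endpoints to fall inside the window.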
3. Export & share
# Perfetto JSON (open in ui.perfetto.dev)
nsys-ai export my_training.sqlite -o traces/
# Interactive HTML viewer
nsys-ai viewer my_training.sqlite --gpu 0 --trim 39 42 -o report.html
# Flat CSV/JSON for scripting
nsys-ai export-csv my_training.sqlite --gpu 0 --trim 39 42 -o kernels.csv
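The Perfetto UI can open traces in the Chrome Trace Event JSON format, where each slice is a "complete" event (`"ph": "X"`) with a start timestamp and duration in microseconds. The sketch below shows one way kernel records could be mapped onto that format. The input field names (`start_ns`, `end_ns`, `gpu`, `stream`) are hypothetical, and the exact layout nsys-ai writes may differ.

```python
import json


def to_trace_events(kernels):
    """Convert kernel records with nanosecond timestamps to Chrome trace events."""
    events = []
    for k in kernels:
        events.append({
            "name": k["name"],
            "ph": "X",                                    # complete event: ts + dur
            "ts": k["start_ns"] / 1000,                   # microseconds
            "dur": (k["end_ns"] - k["start_ns"]) / 1000,  # microseconds
            "pid": k["gpu"],                              # one "process" per GPU
            "tid": k["stream"],                           # one "thread" per stream
        })
    return {"traceEvents": events}


# Toy record with made-up timings.
kernels = [
    {"name": "flash_fwd", "start_ns": 39_100_000_000, "end_ns": 39_126_000_000,
     "gpu": 0, "stream": 21},
]
trace = to_trace_events(kernels)
text = json.dumps(trace, indent=2)
print(text)
```

Mapping GPUs to `pid` and streams to `tid` is a design choice made here so that each stream shows up as its own track under its GPU.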
🌐 Web Timeline
The web timeline is a browser-based multi-GPU viewer with progressive rendering — no --trim required. This is the default view when you run nsys-ai <profile>.
# Just give it a profile — opens in your browser
nsys-ai my_training.nsys-rep
# Or explicitly with GPU selection:
nsys-ai timeline-web my_training.nsys-rep --gpu 0 1 2 3
Features
- Multi-GPU stacked view — all GPUs shown simultaneously with color-coded separators
- Progressive rendering — pre-builds full NVTX tree at startup, then serves tiles instantly (~1ms per tile)
- NVTX hierarchy — layered bars (L0–L5) showing annotation nesting per GPU
- AI chat sidebar — press A to ask questions about the profile
- Kernel search — press / to search by kernel name
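The layered NVTX bars (L0–L5) correspond to how deeply each annotation is nested. As a rough illustration of that idea, and not the viewer's actual algorithm, nesting levels can be derived from range start and end times with a stack. The sketch assumes ranges on one GPU are properly nested (no partial overlaps); the `backward` range and all timings are made up for the example.

```python
def nesting_levels(ranges):
    """Assign a nesting level to each (name, start, end) range.

    Ranges are visited by start time, with longer ranges first on ties,
    so a parent is always seen before its children.
    """
    ordered = sorted(ranges, key=lambda r: (r[1], -r[2]))
    open_ends = []  # end times of the ranges currently enclosing the cursor
    levels = []
    for name, start, end in ordered:
        # Close every range that finished before this one starts.
        while open_ends and open_ends[-1] <= start:
            open_ends.pop()
        levels.append((name, len(open_ends)))
        open_ends.append(end)
    return levels


# Toy ranges in milliseconds.
ranges = [
    ("forward", 0, 180),
    ("Iteration", 0, 324),
    ("backward", 180, 324),
    ("Attention", 10, 99),
]
levels = nesting_levels(ranges)
for name, level in levels:
    print(f"L{level} {'  ' * level}{name}")
```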
Navigation