Complete GPU-accelerated encoding/decoding system for FFmpeg 7.1 LTS and 8.0.1 (latest stable, released 2025-11-20). PROACTIVELY activate for: (1) NVIDIA NVENC/NVDEC encoding, (2) Intel Quick Sync Video (QSV), (3) AMD AMF encoding, (4) Apple VideoToolbox, (5) Linux VAAPI setup, (6) Vulkan Video 8.0 (FFv1, AV1, VP9, ProRes RAW), (7) VVC/H.266 hardware decoding (VAAPI/QSV), (8) GPU pipeline optimization with pad_cuda, (9) Docker GPU containers, (10) Performance benchmarking. Provides: Platform-specific commands, preset comparisons, quality tuning, full GPU pipeline examples, Vulkan compute codecs, VVC decoding, troubleshooting guides. Ensures: Maximum encoding speed with optimal quality using GPU acceleration.
Provides GPU-accelerated FFmpeg encoding/decoding for NVIDIA, Intel, AMD, Apple, and Vulkan.
npx claudepluginhub josiahsiegel/claude-plugin-marketplaceThis skill inherits all available tools. When active, it can use any tool Claude has access to.
MANDATORY: Always Use Backslashes on Windows for File Paths
When using Edit or Write tools on Windows, you MUST use backslashes (\) in file paths, NOT forward slashes (/).
| Platform | Encoder | Decoder | Detect Command |
|---|---|---|---|
| NVIDIA | h264_nvenc, hevc_nvenc, av1_nvenc | h264_cuvid, hevc_cuvid | ffmpeg -encoders | grep nvenc |
| Intel QSV | h264_qsv, hevc_qsv, av1_qsv | h264_qsv, hevc_qsv | ffmpeg -encoders | grep qsv |
| AMD AMF | h264_amf, hevc_amf, av1_amf | N/A (use software) | ffmpeg -encoders | grep amf |
| Apple | h264_videotoolbox, hevc_videotoolbox | h264_videotoolbox | macOS only |
| VAAPI | h264_vaapi, hevc_vaapi, av1_vaapi | with -hwaccel vaapi | Linux only |
Use when GPU acceleration is needed:
Key decision: GPU encoding trades some quality for massive speed. Use -cq or -qp for quality control.
Comprehensive guide to GPU-accelerated encoding and decoding with NVIDIA, Intel, AMD, Apple, and Vulkan.
Current Latest: FFmpeg 8.0.1 (released 2025-11-20) - Check with ffmpeg -version
Hardware acceleration uses dedicated GPU/SoC components for video processing:
| Method | Speed | Quality | Power | Use Case |
|---|---|---|---|---|
| libx264 (CPU) | 1x | Best | High | Quality-critical |
| libx265 (CPU) | 0.3x | Best | Very High | Archival |
| h264_nvenc | 10-20x | Good | Low | Real-time, streaming |
| hevc_nvenc | 8-15x | Good | Low | 4K streaming |
| h264_qsv | 8-15x | Good | Very Low | Laptop, efficiency |
| h264_amf | 8-15x | Good | Low | AMD systems |
--enable-nvenc --enable-cuda --enable-cuvid# Check available NVIDIA codecs
ffmpeg -encoders | grep nvenc
ffmpeg -decoders | grep cuvid
# Check GPU info
nvidia-smi
# List hardware accelerators
ffmpeg -hwaccels
# H.264 NVENC encoding
ffmpeg -i input.mp4 -c:v h264_nvenc -preset p4 -b:v 5M output.mp4
# H.265/HEVC NVENC encoding
ffmpeg -i input.mp4 -c:v hevc_nvenc -preset p4 -b:v 4M output.mp4
# AV1 NVENC (RTX 40 series+)
ffmpeg -i input.mp4 -c:v av1_nvenc -preset p4 -b:v 3M output.mp4
| Preset | Speed | Quality | Use Case |
|---|---|---|---|
| p1 | Fastest | Lowest | Real-time capture |
| p2 | Faster | Low | Screen recording |
| p3 | Fast | Medium | General streaming |
| p4 | Medium | Good | Recommended |
| p5 | Slow | Better | High-quality streaming |
| p6 | Slower | Best | Offline encoding |
| p7 | Slowest | Highest | Maximum quality |
# Keep frames in GPU memory (fastest)
ffmpeg -y -vsync 0 \
-hwaccel cuda \
-hwaccel_output_format cuda \
-i input.mp4 \
-c:v h264_nvenc \
-preset p4 \
-b:v 5M \
-c:a copy \
output.mp4
# GPU-based scaling with scale_cuda
ffmpeg -y -vsync 0 \
-hwaccel cuda \
-hwaccel_output_format cuda \
-i input.mp4 \
-vf scale_cuda=1280:720 \
-c:v h264_nvenc \
-preset p4 \
output.mp4
# GPU overlay with overlay_cuda
ffmpeg -hwaccel cuda -hwaccel_output_format cuda -i main.mp4 \
-hwaccel cuda -hwaccel_output_format cuda -i overlay.mp4 \
-filter_complex "[0:v][1:v]overlay_cuda=10:10" \
-c:v h264_nvenc output.mp4
# GPU padding with pad_cuda (FFmpeg 8.0+)
ffmpeg -hwaccel cuda -hwaccel_output_format cuda -i input.mp4 \
-vf "pad_cuda=1920:1080:(ow-iw)/2:(oh-ih)/2:black" \
-c:v h264_nvenc output.mp4
# Letterbox with pad_cuda
ffmpeg -hwaccel cuda -hwaccel_output_format cuda -i input.mp4 \
-vf "scale_cuda=1920:-2,pad_cuda=1920:1080:(ow-iw)/2:(oh-ih)/2" \
-c:v h264_nvenc output.mp4
| Filter | Description | Key Parameters |
|---|---|---|
scale_cuda | GPU video scaling | w, h, format, interp_algo |
scale_npp | NPP-based scaling (higher quality) | w, h, interp_algo |
overlay_cuda | GPU overlay/compositing | x, y, eof_action |
pad_cuda | GPU padding (8.0+) | w, h, x, y, color |
chromakey_cuda | GPU chroma keying | color, similarity, blend |
colorspace_cuda | Color space conversion | all, space, trc, primaries |
bilateral_cuda | Bilateral filter | sigmaS, sigmaR, window_size |
bwdif_cuda | Deinterlacing | mode, parity, deint |
hwupload_cuda | Upload to GPU | - |
hwdownload | Download from GPU | - |
For higher-quality scaling, use scale_npp with optimized interpolation algorithms:
# High-quality downscaling with super-sampling
ffmpeg -hwaccel cuda -hwaccel_output_format cuda \
-i input.mp4 \
-vf "scale_npp=1280:720:interp_algo=super" \
-c:v h264_nvenc output.mp4
# Lanczos interpolation for upscaling
ffmpeg -hwaccel cuda -hwaccel_output_format cuda \
-i input.mp4 \
-vf "scale_npp=1920:1080:interp_algo=lanczos" \
-c:v h264_nvenc output.mp4
scale_npp Interpolation Algorithms:
| Algorithm | Quality | Speed | Best For |
|---|---|---|---|
nn | Low | Fastest | Pixel art, nearest neighbor |
linear | Medium | Fast | General use |
cubic | Good | Medium | Smooth scaling |
lanczos | High | Slow | High quality upscaling |
super | Best | Slowest | Downscaling |
# Green screen removal on GPU
ffmpeg -hwaccel cuda -hwaccel_output_format cuda -i greenscreen.mp4 \
-hwaccel cuda -hwaccel_output_format cuda -i background.mp4 \
-filter_complex "[0:v]chromakey_cuda=0x00FF00:0.3:0.1[fg];[1:v][fg]overlay_cuda" \
-c:v h264_nvenc output.mp4
When mixing CPU and GPU filters, use hwupload_cuda and hwdownload:
# CPU decode -> CPU filter -> GPU encode
ffmpeg -i input.mp4 \
-vf "fade=in:0:30,hwupload_cuda" \
-c:v h264_nvenc output.mp4
# GPU decode -> CPU filter (drawtext) -> GPU encode
ffmpeg -hwaccel cuda -hwaccel_output_format cuda \
-i input.mp4 \
-vf "hwdownload,format=nv12,drawtext=text='Hello':fontsize=48,hwupload_cuda" \
-c:v h264_nvenc output.mp4
# Complete hybrid pipeline: GPU scale -> CPU text -> GPU encode
ffmpeg -hwaccel cuda -hwaccel_output_format cuda -i input.mp4 \
-filter_complex "\
[0:v]scale_cuda=1920:1080[scaled];\
[scaled]hwdownload,format=nv12[cpu];\
[cpu]drawtext=text='Watermark':fontsize=48:x=10:y=10[text];\
[text]hwupload_cuda[gpu]" \
-map "[gpu]" \
-c:v h264_nvenc output.mp4
# High quality with lookahead
ffmpeg -i input.mp4 \
-c:v hevc_nvenc \
-preset p5 \
-tune hq \
-rc vbr \
-cq 23 \
-b:v 0 \
-rc-lookahead 32 \
-spatial-aq 1 \
-temporal-aq 1 \
-c:a copy \
output.mp4
# Constant quality mode (CQP)
ffmpeg -i input.mp4 \
-c:v h264_nvenc \
-preset p4 \
-rc constqp \
-qp 23 \
-c:a copy \
output.mp4
# Two-pass for best quality (ABR)
ffmpeg -i input.mp4 \
-c:v h264_nvenc \
-preset p5 \
-2pass 1 \
-b:v 5M \
-c:a copy \
output.mp4
# Enable B-frames for better compression
ffmpeg -i input.mp4 \
-c:v hevc_nvenc \
-preset p4 \
-bf 4 \
-b_ref_mode 2 \
-g 250 \
-c:a copy \
output.mp4
PCIe transfers between CPU and GPU memory are the primary bottleneck. Keeping frames in GPU memory throughout the pipeline is critical for performance.
Memory Flow Patterns:
Pattern 1: Full GPU Pipeline (OPTIMAL)
Input File -> GPU Decode -> GPU Filter -> GPU Encode -> Output File
| | |
v v v
[GPU Memory - No PCIe Transfer]
Pattern 2: CPU Filter Insertion (SUBOPTIMAL)
Input -> GPU Decode -> hwdownload -> CPU Filter -> hwupload -> GPU Encode -> Output
| |
v v
[PCIe Transfer] [PCIe Transfer]
Pattern 3: Software Decode + GPU Encode
Input -> CPU Decode -> hwupload -> GPU Encode -> Output
|
v
[PCIe Transfer]
# OPTIMAL: Full GPU pipeline - no CPU-GPU transfers
ffmpeg -y -vsync 0 \
-hwaccel cuda \
-hwaccel_output_format cuda \
-i input.mp4 \
-vf scale_cuda=1280:720 \
-c:v h264_nvenc \
output.mp4
# SUBOPTIMAL: GPU decode, CPU filter, GPU encode - 2 transfers
ffmpeg -hwaccel cuda -hwaccel_output_format cuda \
-i input.mp4 \
-vf "hwdownload,format=nv12,drawtext=text='Hello',hwupload_cuda" \
-c:v h264_nvenc output.mp4
# AVOID: Missing -hwaccel_output_format - frame copies to CPU after decode
ffmpeg -hwaccel cuda -i input.mp4 \
-vf scale=1280:720 \
-c:v h264_nvenc output.mp4
Why -hwaccel_output_format cuda matters: Without it, decoded frames transfer from GPU to CPU via PCIe, get processed, then transfer back to GPU for encoding. This can reduce throughput by up to 50%.
# NVIDIA GPU memory and utilization
nvidia-smi dmon -s m
# Watch GPU utilization in real-time
watch -n 1 nvidia-smi
# Intel GPU monitoring
intel_gpu_top
-hwaccel_output_format to keep decoded frames in GPU memory-vsync 0 to prevent frame duplication overhead# Specify GPU by index (parallel processing on different GPUs)
ffmpeg -hwaccel cuda \
-hwaccel_device 0 \
-hwaccel_output_format cuda \
-i input1.mp4 \
-c:v h264_nvenc output1.mp4 &
ffmpeg -hwaccel cuda \
-hwaccel_device 1 \
-hwaccel_output_format cuda \
-i input2.mp4 \
-c:v h264_nvenc output2.mp4 &
wait
# Initialize specific CUDA device for filters
ffmpeg -hwaccel cuda -hwaccel_output_format cuda -i input.mp4 \
-init_hw_device cuda=gpu1:1 \
-filter_complex "[0:v]scale_cuda=1280:720[scaled]" \
-map "[scaled]" \
-c:v h264_nvenc output.mp4
More efficient than separate processes - shares CUDA context:
# Single input to multiple resolutions (ABR ladder)
ffmpeg -y -vsync 0 \
-hwaccel cuda \
-hwaccel_output_format cuda \
-i input.mp4 \
-filter_complex "[0:v]split=3[v1][v2][v3];\
[v1]scale_cuda=1920:1080[hd];\
[v2]scale_cuda=1280:720[sd];\
[v3]scale_cuda=640:360[mobile]" \
-map "[hd]" -c:v h264_nvenc -b:v 5M output_1080p.mp4 \
-map "[sd]" -c:v h264_nvenc -b:v 2M output_720p.mp4 \
-map "[mobile]" -c:v h264_nvenc -b:v 500k output_360p.mp4
For maximum throughput:
# Environment variables for faster initialization
export CUDA_VISIBLE_DEVICES=0
export CUDA_DEVICE_MAX_CONNECTIONS=2
# Parallel encoding script
for i in {1..4}; do
ffmpeg -hwaccel cuda -hwaccel_output_format cuda \
-i input_$i.mp4 \
-c:v h264_nvenc \
output_$i.mp4 &
done
wait
--enable-libmfx or --enable-libvpl# Check available QSV codecs
ffmpeg -encoders | grep qsv
ffmpeg -decoders | grep qsv
# Verify Intel GPU access (Linux)
ls /dev/dri/
vainfo # Check VAAPI support
# Initialize QSV device
ffmpeg -init_hw_device qsv=hw \
-filter_hw_device hw \
-i input.mp4 \
-c:v h264_qsv \
-preset medium \
-b:v 5M \
output.mp4
# H.265/HEVC QSV
ffmpeg -init_hw_device qsv=hw \
-filter_hw_device hw \
-i input.mp4 \
-c:v hevc_qsv \
-preset medium \
-b:v 4M \
output.mp4
# AV1 QSV (Intel Arc, 12th gen+)
ffmpeg -init_hw_device qsv=hw \
-filter_hw_device hw \
-i input.mp4 \
-c:v av1_qsv \
-preset medium \
-b:v 3M \
output.mp4
# Decode and encode on GPU
ffmpeg -hwaccel qsv \
-hwaccel_output_format qsv \
-i input.mp4 \
-c:v h264_qsv \
-preset medium \
-b:v 5M \
output.mp4
# Constant quality (recommended)
ffmpeg -hwaccel qsv -hwaccel_output_format qsv \
-i input.mp4 \
-c:v hevc_qsv \
-preset slow \
-global_quality 22 \
-look_ahead 1 \
output.mp4
# VBR with lookahead
ffmpeg -hwaccel qsv -hwaccel_output_format qsv \
-i input.mp4 \
-c:v h264_qsv \
-preset medium \
-b:v 5M \
-maxrate 7M \
-bufsize 10M \
-look_ahead 1 \
-look_ahead_depth 40 \
output.mp4
QSV Quality Parameters:
| Parameter | Description | Range |
|---|---|---|
-global_quality | Quality level (like CRF) | 1-51 (lower = better) |
-preset | Encoding speed/quality | veryfast, faster, fast, medium, slow, slower, veryslow |
-look_ahead | Enable lookahead | 0 or 1 |
-look_ahead_depth | Lookahead frames | 10-100 |
# Select specific GPU by render device
ffmpeg -init_hw_device qsv=hw:/dev/dri/renderD129 \
-filter_hw_device hw \
-i input.mp4 \
-c:v h264_qsv output.mp4
# GPU scaling with vpp_qsv
ffmpeg -hwaccel qsv \
-hwaccel_output_format qsv \
-i input.mp4 \
-vf "vpp_qsv=w=1280:h=720" \
-c:v h264_qsv \
-preset medium \
output.mp4
# Hardware VVC decoding
ffmpeg -hwaccel qsv \
-hwaccel_output_format qsv \
-c:v vvc_qsv \
-i input.vvc \
-c:v h264_qsv \
output.mp4
--enable-amfffmpeg -encoders | grep amf
# H.264 AMF
ffmpeg -i input.mp4 \
-c:v h264_amf \
-quality balanced \
-b:v 5M \
output.mp4
# H.265/HEVC AMF
ffmpeg -i input.mp4 \
-c:v hevc_amf \
-quality balanced \
-b:v 4M \
output.mp4
# AV1 AMF (RDNA3+)
ffmpeg -i input.mp4 \
-c:v av1_amf \
-quality balanced \
-b:v 3M \
output.mp4
| Preset | Description |
|---|---|
| speed | Fastest encoding |
| balanced | Recommended |
| quality | Best quality |
# Super resolution upscaling
ffmpeg -i input.mp4 \
-vf "sr_amf=4096:2160:algorithm=sr1-1" \
-c:v hevc_amf \
output.mp4
--enable-vaapi# Check VAAPI info
vainfo
# List VAAPI codecs
ffmpeg -encoders | grep vaapi
ffmpeg -decoders | grep vaapi
# H.264 VAAPI
ffmpeg -vaapi_device /dev/dri/renderD128 \
-i input.mp4 \
-vf 'format=nv12,hwupload' \
-c:v h264_vaapi \
-b:v 5M \
output.mp4
# H.265/HEVC VAAPI
ffmpeg -vaapi_device /dev/dri/renderD128 \
-i input.mp4 \
-vf 'format=nv12,hwupload' \
-c:v hevc_vaapi \
-b:v 4M \
output.mp4
# Decode and encode on GPU
ffmpeg -hwaccel vaapi \
-hwaccel_device /dev/dri/renderD128 \
-hwaccel_output_format vaapi \
-i input.mp4 \
-c:v h264_vaapi \
-b:v 5M \
output.mp4
ffmpeg -hwaccel vaapi \
-hwaccel_device /dev/dri/renderD128 \
-hwaccel_output_format vaapi \
-i input.mp4 \
-vf 'scale_vaapi=w=1280:h=720' \
-c:v h264_vaapi \
output.mp4
FFmpeg 8.0 adds VVC/H.266 hardware decoding on Intel and AMD GPUs via VAAPI.
# Hardware VVC/H.266 decoding
ffmpeg -hwaccel vaapi \
-hwaccel_device /dev/dri/renderD128 \
-hwaccel_output_format vaapi \
-i input.vvc \
-c:v h264_vaapi \
output.mp4
# VVC decode + transcode to H.265
ffmpeg -hwaccel vaapi \
-hwaccel_device /dev/dri/renderD128 \
-hwaccel_output_format vaapi \
-i input.mkv \
-c:v hevc_vaapi \
-b:v 4M \
output.mp4
# VVC with Screen Content Coding (SCC) support
# FFmpeg 8.0 adds full SCC support including:
# - IBC (Inter Block Copy)
# - Palette Mode
# - ACT (Adaptive Color Transform)
ffmpeg -hwaccel vaapi \
-hwaccel_device /dev/dri/renderD128 \
-i screen_recording.vvc \
output.mp4
VVC VAAPI Requirements:
--enable-videotoolboxffmpeg -encoders | grep videotoolbox
ffmpeg -decoders | grep videotoolbox
# H.264 VideoToolbox
ffmpeg -i input.mp4 \
-c:v h264_videotoolbox \
-b:v 5M \
output.mp4
# H.265/HEVC VideoToolbox
ffmpeg -i input.mp4 \
-c:v hevc_videotoolbox \
-b:v 4M \
-tag:v hvc1 \
output.mp4
# ProRes VideoToolbox
ffmpeg -i input.mp4 \
-c:v prores_videotoolbox \
-profile:v 3 \
output.mov
# Quality-based encoding
ffmpeg -i input.mp4 \
-c:v h264_videotoolbox \
-q:v 65 \
output.mp4
# Hardware accelerated decode + encode
ffmpeg -hwaccel videotoolbox \
-i input.mp4 \
-c:v h264_videotoolbox \
-b:v 5M \
output.mp4
Vulkan Video provides cross-platform GPU acceleration using Vulkan compute shaders. Unlike proprietary hardware accelerators, Vulkan codecs are based on compute shaders and work on any implementation of Vulkan 1.3.
Benefits of Vulkan Compute Codecs:
# H.264 Vulkan
ffmpeg -init_hw_device vulkan \
-i input.mp4 \
-c:v h264_vulkan \
-b:v 5M \
output.mp4
# H.265/HEVC Vulkan
ffmpeg -init_hw_device vulkan \
-i input.mp4 \
-c:v hevc_vulkan \
-b:v 4M \
output.mp4
# AV1 Vulkan (FFmpeg 8.0+)
ffmpeg -init_hw_device vulkan \
-i input.mp4 \
-c:v av1_vulkan \
-b:v 3M \
output.mp4
# FFv1 Vulkan Lossless (FFmpeg 8.0+)
ffmpeg -init_hw_device vulkan \
-i input.mp4 \
-c:v ffv1_vulkan \
output.mkv
# Complete Vulkan decode-filter-encode
ffmpeg -init_hw_device vulkan=vk \
-filter_hw_device vk \
-hwaccel vulkan \
-hwaccel_output_format vulkan \
-i input.mp4 \
-vf "scale_vulkan=1280:720" \
-c:v h264_vulkan \
output.mp4
# Hardware decode VP9 with Vulkan
ffmpeg -init_hw_device vulkan \
-hwaccel vulkan \
-hwaccel_output_format vulkan \
-i input.webm \
-c:v h264_vulkan \
output.mp4
# Hardware decode ProRes RAW with Vulkan
ffmpeg -init_hw_device vulkan \
-hwaccel vulkan \
-i input.mov \
-c:v libx264 \
output.mp4
# High-throughput lossless capture (Linux X11)
ffmpeg -init_hw_device vulkan \
-f x11grab -framerate 60 -i :0.0 \
-c:v ffv1_vulkan \
screen_capture.mkv
# Lossless screen recording (Windows)
ffmpeg -init_hw_device vulkan \
-f gdigrab -framerate 60 -i desktop \
-c:v ffv1_vulkan \
screen_capture.mkv
The next minor update will add:
Vulkan filters provide cross-platform GPU acceleration for video processing.
| Filter | Description | Key Parameters |
|---|---|---|
scale_vulkan | GPU scaling | w, h, scaler |
overlay_vulkan | Compositing | x, y |
transpose_vulkan | Rotate/transpose | dir, passthrough |
flip_vulkan | Horizontal flip | - |
avgblur_vulkan | Box blur | sizeX, sizeY |
gblur_vulkan | Gaussian blur | sigma, sigmaV, planes |
chromaber_vulkan | Chromatic aberration | dist_x, dist_y |
nlmeans_vulkan | NLMeans denoising | s, p, r |
bwdif_vulkan | Deinterlacing | mode, parity, deint |
xfade_vulkan | Transitions | transition, duration, offset |
libplacebo | Libplacebo processing | w, h, colorspace |
# Basic Vulkan scaling
ffmpeg -init_hw_device vulkan=vk \
-filter_hw_device vk \
-i input.mp4 \
-vf "hwupload,scale_vulkan=1920:1080,hwdownload,format=yuv420p" \
output.mp4
# Full Vulkan pipeline with scaling
ffmpeg -init_hw_device vulkan=vk \
-filter_hw_device vk \
-hwaccel vulkan -hwaccel_output_format vulkan \
-i input.mp4 \
-vf "scale_vulkan=1280:720" \
-c:v h264_vulkan output.mp4
# Vulkan scaling with specific scaler
ffmpeg -init_hw_device vulkan \
-i input.mp4 \
-vf "hwupload,scale_vulkan=w=1920:h=1080:scaler=lanczos,hwdownload,format=yuv420p" \
output.mp4
# Picture-in-picture with Vulkan
ffmpeg -init_hw_device vulkan=vk \
-filter_hw_device vk \
-i main.mp4 -i overlay.mp4 \
-filter_complex "\
[0:v]hwupload[main];\
[1:v]hwupload,scale_vulkan=320:180[pip];\
[main][pip]overlay_vulkan=x=10:y=10[out];\
[out]hwdownload,format=yuv420p" \
-map "[out]" output.mp4
# Full Vulkan overlay pipeline
ffmpeg -init_hw_device vulkan=vk \
-filter_hw_device vk \
-hwaccel vulkan -hwaccel_output_format vulkan -i main.mp4 \
-hwaccel vulkan -hwaccel_output_format vulkan -i overlay.mp4 \
-filter_complex "[0:v][1:v]overlay_vulkan=x=W-w-10:y=10" \
-c:v h264_vulkan output.mp4
# Transpose (rotate 90° clockwise)
ffmpeg -init_hw_device vulkan \
-i input.mp4 \
-vf "hwupload,transpose_vulkan=dir=clock,hwdownload,format=yuv420p" \
output.mp4
# Horizontal flip
ffmpeg -init_hw_device vulkan \
-i input.mp4 \
-vf "hwupload,flip_vulkan,hwdownload,format=yuv420p" \
output.mp4
transpose_vulkan directions:
| dir | Rotation |
|---|---|
cclock_flip | 90° counter-clockwise + vertical flip |
clock | 90° clockwise |
cclock | 90° counter-clockwise |
clock_flip | 90° clockwise + vertical flip |
# Gaussian blur
ffmpeg -init_hw_device vulkan \
-i input.mp4 \
-vf "hwupload,gblur_vulkan=sigma=5,hwdownload,format=yuv420p" \
output.mp4
# Box blur (averaging)
ffmpeg -init_hw_device vulkan \
-i input.mp4 \
-vf "hwupload,avgblur_vulkan=sizeX=5:sizeY=5,hwdownload,format=yuv420p" \
output.mp4
# Add chromatic aberration effect
ffmpeg -init_hw_device vulkan \
-i input.mp4 \
-vf "hwupload,chromaber_vulkan=dist_x=-0.002:dist_y=0.002,hwdownload,format=yuv420p" \
output.mp4
# Non-local means denoising on GPU
ffmpeg -init_hw_device vulkan \
-i noisy.mp4 \
-vf "hwupload,nlmeans_vulkan=s=3:p=7:r=15,hwdownload,format=yuv420p" \
output.mp4
# Full Vulkan denoising pipeline
ffmpeg -init_hw_device vulkan=vk \
-filter_hw_device vk \
-hwaccel vulkan -hwaccel_output_format vulkan \
-i noisy.mp4 \
-vf "nlmeans_vulkan=s=4" \
-c:v h264_vulkan output.mp4
# Bob-Weaver deinterlacing
ffmpeg -init_hw_device vulkan \
-i interlaced.mp4 \
-vf "hwupload,bwdif_vulkan=mode=send_frame,hwdownload,format=yuv420p" \
output.mp4
# Full Vulkan deinterlace pipeline
ffmpeg -init_hw_device vulkan=vk \
-filter_hw_device vk \
-hwaccel vulkan -hwaccel_output_format vulkan \
-i interlaced.mp4 \
-vf "bwdif_vulkan=1" \
-c:v h264_vulkan output.mp4
# GPU-accelerated crossfade between two clips
ffmpeg -init_hw_device vulkan=vk \
-filter_hw_device vk \
-i clip1.mp4 -i clip2.mp4 \
-filter_complex "\
[0:v]hwupload[v1];\
[1:v]hwupload[v2];\
[v1][v2]xfade_vulkan=transition=fade:duration=1:offset=4[out];\
[out]hwdownload,format=yuv420p" \
-map "[out]" output.mp4
xfade_vulkan transitions: fade, dissolve, wipeleft, wiperight, wipeup, wipedown, slideleft, slideright, slideup, slidedown, and more.
Libplacebo provides advanced color management and processing:
# HDR tonemapping with libplacebo
ffmpeg -init_hw_device vulkan \
-i hdr_input.mp4 \
-vf "hwupload,libplacebo=tonemapping=hable:colorspace=bt709:color_primaries=bt709:color_trc=bt709,hwdownload,format=yuv420p" \
sdr_output.mp4
# Scaling with libplacebo (high quality)
ffmpeg -init_hw_device vulkan \
-i input.mp4 \
-vf "hwupload,libplacebo=w=3840:h=2160:upscaler=ewa_lanczos,hwdownload,format=yuv420p10le" \
output.mp4
OpenCL provides GPU acceleration that works across vendors (NVIDIA, AMD, Intel).
| Filter | Description | Key Parameters |
|---|---|---|
scale_opencl | GPU scaling | w, h |
overlay_opencl | Compositing | x, y |
pad_opencl | Padding | w, h, x, y, color |
colorkey_opencl | Chroma keying | color, similarity, blend |
unsharp_opencl | Sharpening | lx, ly, la, cx, cy, ca |
nlmeans_opencl | NLMeans denoising | s, p, r |
deshake_opencl | Stabilization | tripod, smooth, filename |
tonemap_opencl | HDR tonemapping | tonemap, transfer, matrix |
avgblur_opencl | Box blur | sizeX, sizeY |
convolution_opencl | Custom kernel | 0m, 1m, etc. |
# List OpenCL devices
ffmpeg -init_hw_device opencl=gpu:0.0 -filter_hw_device gpu -f lavfi -i nullsrc -vf "hwupload,avgblur_opencl=2" -t 1 -f null -
# Basic OpenCL filter pattern
ffmpeg -init_hw_device opencl=gpu \
-filter_hw_device gpu \
-i input.mp4 \
-vf "hwupload,<filter_opencl>,hwdownload,format=yuv420p" \
output.mp4
# GPU scaling with OpenCL
ffmpeg -init_hw_device opencl=gpu \
-filter_hw_device gpu \
-i input.mp4 \
-vf "hwupload,scale_opencl=1920:1080,hwdownload,format=yuv420p" \
output.mp4
# Picture-in-picture with OpenCL
ffmpeg -init_hw_device opencl=gpu \
-filter_hw_device gpu \
-i main.mp4 -i overlay.mp4 \
-filter_complex "\
[0:v]hwupload[main];\
[1:v]hwupload,scale_opencl=320:180[pip];\
[main][pip]overlay_opencl=x=10:y=10[out];\
[out]hwdownload,format=yuv420p" \
-map "[out]" output.mp4
# Green screen removal with OpenCL
ffmpeg -init_hw_device opencl=gpu \
-filter_hw_device gpu \
-i greenscreen.mp4 -i background.mp4 \
-filter_complex "\
[0:v]hwupload,colorkey_opencl=0x00FF00:0.3:0.1[fg];\
[1:v]hwupload[bg];\
[bg][fg]overlay_opencl[out];\
[out]hwdownload,format=yuv420p" \
-map "[out]" output.mp4
# Non-local means denoising on GPU
ffmpeg -init_hw_device opencl=gpu \
-filter_hw_device gpu \
-i noisy.mp4 \
-vf "hwupload,nlmeans_opencl=s=3:p=7:r=15,hwdownload,format=yuv420p" \
output.mp4
# Strong denoising
ffmpeg -init_hw_device opencl=gpu \
-filter_hw_device gpu \
-i noisy.mp4 \
-vf "hwupload,nlmeans_opencl=s=5:p=7:r=21,hwdownload,format=yuv420p" \
output.mp4
# GPU-accelerated video stabilization
ffmpeg -init_hw_device opencl=gpu \
-filter_hw_device gpu \
-i shaky.mp4 \
-vf "hwupload,deshake_opencl,hwdownload,format=yuv420p" \
output.mp4
# Tripod mode (camera stays in one place)
ffmpeg -init_hw_device opencl=gpu \
-filter_hw_device gpu \
-i shaky.mp4 \
-vf "hwupload,deshake_opencl=tripod=1,hwdownload,format=yuv420p" \
output.mp4
# Basic sharpening
ffmpeg -init_hw_device opencl=gpu \
-filter_hw_device gpu \
-i input.mp4 \
-vf "hwupload,unsharp_opencl=lx=5:ly=5:la=1.0,hwdownload,format=yuv420p" \
output.mp4
# Subtle sharpening for HD video
ffmpeg -init_hw_device opencl=gpu \
-filter_hw_device gpu \
-i input.mp4 \
-vf "hwupload,unsharp_opencl=lx=3:ly=3:la=0.5,hwdownload,format=yuv420p" \
output.mp4
# HDR to SDR conversion with GPU
ffmpeg -init_hw_device opencl=gpu \
-filter_hw_device gpu \
-i hdr_input.mp4 \
-vf "hwupload,tonemap_opencl=tonemap=hable:transfer=bt709:matrix=bt709:primaries=bt709,hwdownload,format=yuv420p" \
sdr_output.mp4
# Reinhard tonemapping
ffmpeg -init_hw_device opencl=gpu \
-filter_hw_device gpu \
-i hdr_input.mp4 \
-vf "hwupload,tonemap_opencl=tonemap=reinhard:param=0.5,hwdownload,format=yuv420p" \
sdr_output.mp4
tonemap_opencl algorithms:
| Algorithm | Description |
|---|---|
none | No tonemapping |
clip | Simple clipping |
linear | Linear scaling |
gamma | Gamma-based |
reinhard | Reinhard operator |
hable | Hable/Filmic (most popular) |
mobius | Mobius (preserves blacks) |
bt2390 | ITU-R BT.2390 (broadcast standard) |
# Box blur
ffmpeg -init_hw_device opencl=gpu \
-filter_hw_device gpu \
-i input.mp4 \
-vf "hwupload,avgblur_opencl=sizeX=5:sizeY=5,hwdownload,format=yuv420p" \
output.mp4
# Edge detection kernel
ffmpeg -init_hw_device opencl=gpu \
-filter_hw_device gpu \
-i input.mp4 \
-vf "hwupload,convolution_opencl=0m='-1 -1 -1 -1 8 -1 -1 -1 -1':0rdiv=1:0bias=128,hwdownload,format=yuv420p" \
output.mp4
# Sharpen kernel
ffmpeg -init_hw_device opencl=gpu \
-filter_hw_device gpu \
-i input.mp4 \
-vf "hwupload,convolution_opencl=0m='0 -1 0 -1 5 -1 0 -1 0':0rdiv=1:0bias=0,hwdownload,format=yuv420p" \
output.mp4
| Filter Type | CUDA | Vulkan | OpenCL | VAAPI | QSV |
|---|---|---|---|---|---|
| Scaling | ✅ scale_cuda | ✅ scale_vulkan | ✅ scale_opencl | ✅ scale_vaapi | ✅ vpp_qsv |
| Overlay | ✅ overlay_cuda | ✅ overlay_vulkan | ✅ overlay_opencl | - | - |
| Denoising | ✅ bilateral_cuda | ✅ nlmeans_vulkan | ✅ nlmeans_opencl | - | - |
| Deinterlace | ✅ bwdif_cuda | ✅ bwdif_vulkan | - | ✅ deinterlace_vaapi | ✅ vpp_qsv |
| Chromakey | ✅ chromakey_cuda | - | ✅ colorkey_opencl | - | - |
| Tonemapping | - | ✅ libplacebo | ✅ tonemap_opencl | ✅ tonemap_vaapi | - |
| Stabilization | - | - | ✅ deshake_opencl | - | - |
| Padding | ✅ pad_cuda | - | ✅ pad_opencl | - | - |
| API | Platform Support | Filter Variety | Performance |
|---|---|---|---|
| CUDA | NVIDIA only | Excellent | Best on NVIDIA |
| Vulkan | Cross-platform | Good (growing) | Very good |
| OpenCL | Cross-platform | Good | Good |
| VAAPI | Linux only | Limited | Good on Linux |
| QSV | Intel only | Limited | Good on Intel |
# Run with NVIDIA GPU support
docker run --gpus all \
--rm \
-v $(pwd):/data \
jrottenberg/ffmpeg:nvidia \
-hwaccel cuda \
-hwaccel_output_format cuda \
-i /data/input.mp4 \
-c:v h264_nvenc \
/data/output.mp4
# Run with Intel GPU access
docker run --rm \
--device=/dev/dri:/dev/dri \
-v $(pwd):/data \
jrottenberg/ffmpeg:vaapi \
-hwaccel qsv \
-i /data/input.mp4 \
-c:v h264_qsv \
/data/output.mp4
"No NVENC capable devices found"
# Check GPU support
nvidia-smi
# Verify driver version
nvidia-smi --query-gpu=driver_version --format=csv
# Check CUDA version
nvcc --version
"Cannot load libcuda.so"
# Set library path (Linux)
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
QSV "Unsupported format"
# Force pixel format conversion
ffmpeg -i input.mp4 \
-vf "format=nv12" \
-c:v h264_qsv \
output.mp4
VAAPI permission denied
# Add user to video/render group
sudo usermod -aG video $USER
sudo usermod -aG render $USER
# Re-login or use newgrp
# Benchmark encode speed
ffmpeg -benchmark -i input.mp4 -c:v h264_nvenc -f null -
# Show hardware decode stats
ffmpeg -hwaccel cuda -hwaccel_output_format cuda -benchmark -i input.mp4 -f null -
# Monitor GPU usage (NVIDIA)
nvidia-smi dmon -s u
# Monitor GPU usage (Intel)
intel_gpu_top
ffmpeg -hwaccel cuda -hwaccel_output_format cuda -i input \
-c:v h264_nvenc -preset p3 -tune ll -zerolatency 1 -b:v 6M \
-f flv rtmp://server/live/stream
ffmpeg -i input.mp4 \
-c:v hevc_nvenc -preset p6 -tune hq \
-rc vbr -cq 22 -b:v 0 \
-rc-lookahead 32 -spatial-aq 1 \
output.mp4
# Multiple parallel streams on one GPU
ffmpeg -hwaccel cuda -hwaccel_output_format cuda -i input1.mp4 -c:v h264_nvenc output1.mp4 &
ffmpeg -hwaccel cuda -hwaccel_output_format cuda -i input2.mp4 -c:v h264_nvenc output2.mp4 &
wait
This guide covers hardware acceleration fundamentals. For specific platform optimizations and advanced configurations, consult the respective vendor documentation.
Activates when the user asks about AI prompts, needs prompt templates, wants to search for prompts, or mentions prompts.chat. Use for discovering, retrieving, and improving prompts.
Search, retrieve, and install Agent Skills from the prompts.chat registry using MCP tools. Use when the user asks to find skills, browse skill catalogs, install a skill for Claude, or extend Claude's capabilities with reusable AI agent components.
Creating algorithmic art using p5.js with seeded randomness and interactive parameter exploration. Use this when users request creating art using code, generative art, algorithmic art, flow fields, or particle systems. Create original algorithmic art rather than copying existing artists' work to avoid copyright violations.