Search everything...

Skill

ffmpeg-hardware-acceleration

Provides platform-specific FFmpeg commands, presets, and guides for GPU-accelerated encoding/decoding with NVIDIA NVENC, Intel QSV, AMD AMF, Apple VideoToolbox, VAAPI, and Vulkan Video. Useful for speed-critical video processing and benchmarking.

Docker

npx claudepluginhub josiahsiegel/claude-plugin-marketplace --plugin ffmpeg-master

Tool Access

This skill uses the workspace's default tool permissions.

Preview

**MANDATORY: Always Use Backslashes on Windows for File Paths**

SKILL.md

Similar Skills

using-git-worktrees

169.2k

Creates isolated Git worktrees for feature branches with prioritized directory selection, gitignore safety checks, auto project setup for Node/Python/Rust/Go, and baseline verification.

superpowers

subagent-driven-development

169.2k

Executes implementation plans in current session by dispatching fresh subagents per independent task, with two-stage reviews: spec compliance then code quality.

3 files

superpowers

dispatching-parallel-agents

169.2k

Dispatches parallel agents to independently tackle 2+ tasks like separate test failures or subsystems without shared state or dependencies.

superpowers

Stats

Parent Repo Stars22

Parent Repo Forks3

Last CommitJan 16, 2026

Actions

View Source View Plugin View on GitHub View README

ffmpeg-hardware-acceleration

From ffmpeg-master

npx claudepluginhub josiahsiegel/claude-plugin-marketplace --plugin ffmpeg-master

Tool Access

This skill uses the workspace's default tool permissions.

Preview

**MANDATORY: Always Use Backslashes on Windows for File Paths**

SKILL.md

CRITICAL GUIDELINES

Windows File Path Requirements

MANDATORY: Always Use Backslashes on Windows for File Paths

When using Edit or Write tools on Windows, you MUST use backslashes (\) in file paths, NOT forward slashes (/).

Quick Reference

Platform	Encoder	Decoder	Detect Command
NVIDIA	`h264_nvenc`, `hevc_nvenc`, `av1_nvenc`	`h264_cuvid`, `hevc_cuvid`	`ffmpeg -encoders \| grep nvenc`
Intel QSV	`h264_qsv`, `hevc_qsv`, `av1_qsv`	`h264_qsv`, `hevc_qsv`	`ffmpeg -encoders \| grep qsv`
AMD AMF	`h264_amf`, `hevc_amf`, `av1_amf`	N/A (use software)	`ffmpeg -encoders \| grep amf`
Apple	`h264_videotoolbox`, `hevc_videotoolbox`	`h264_videotoolbox`	macOS only
VAAPI	`h264_vaapi`, `hevc_vaapi`, `av1_vaapi`	with `-hwaccel vaapi`	Linux only

When to Use This Skill

Use when GPU acceleration is needed:

Encoding speed is critical (10-30x faster than CPU)
Processing large batches of videos
Real-time encoding for streaming
Server-side transcoding at scale
Docker containers with GPU passthrough

Key decision: GPU encoding trades some quality for massive speed. Use -cq or -qp for quality control.

FFmpeg Hardware Acceleration (2025)

Comprehensive guide to GPU-accelerated encoding and decoding with NVIDIA, Intel, AMD, Apple, and Vulkan.

Current Latest: FFmpeg 8.0.1 (released 2025-11-20) - Check with ffmpeg -version

Hardware Acceleration Overview

Hardware acceleration uses dedicated GPU/SoC components for video processing:

NVENC/NVDEC (NVIDIA): Dedicated video encode/decode engines
QSV (Intel): Quick Sync Video on Intel CPUs with integrated graphics
AMF (AMD): Advanced Media Framework for AMD GPUs
VideoToolbox (Apple): macOS/iOS hardware acceleration
VAAPI (Linux): Video Acceleration API (Intel, AMD on Linux)
Vulkan Video (Cross-platform): FFmpeg 7.1+/8.0 GPU acceleration

Performance Comparison (2025 Benchmarks)

Method	Speed	Quality	Power	Use Case
libx264 (CPU)	1x	Best	High	Quality-critical
libx265 (CPU)	0.3x	Best	Very High	Archival
h264_nvenc	10-20x	Good	Low	Real-time, streaming
hevc_nvenc	8-15x	Good	Low	4K streaming
h264_qsv	8-15x	Good	Very Low	Laptop, efficiency
h264_amf	8-15x	Good	Low	AMD systems

NVIDIA NVENC/NVDEC

Requirements

NVIDIA GPU (GTX 600+ / Quadro K series+)
NVIDIA drivers 450+
FFmpeg built with --enable-nvenc --enable-cuda --enable-cuvid

Check NVIDIA Support

# Check available NVIDIA codecs
ffmpeg -encoders | grep nvenc
ffmpeg -decoders | grep cuvid

# Check GPU info
nvidia-smi

# List hardware accelerators
ffmpeg -hwaccels

Basic NVENC Encoding

# H.264 NVENC encoding
ffmpeg -i input.mp4 -c:v h264_nvenc -preset p4 -b:v 5M output.mp4

# H.265/HEVC NVENC encoding
ffmpeg -i input.mp4 -c:v hevc_nvenc -preset p4 -b:v 4M output.mp4

# AV1 NVENC (RTX 40 series+)
ffmpeg -i input.mp4 -c:v av1_nvenc -preset p4 -b:v 3M output.mp4

NVENC Presets (FFmpeg 7+)

Preset	Speed	Quality	Use Case
p1	Fastest	Lowest	Real-time capture
p2	Faster	Low	Screen recording
p3	Fast	Medium	General streaming
p4	Medium	Good	Recommended
p5	Slow	Better	High-quality streaming
p6	Slower	Best	Offline encoding
p7	Slowest	Highest	Maximum quality

Full GPU Pipeline (Decode + Encode)

# Keep frames in GPU memory (fastest)
ffmpeg -y -vsync 0 \
  -hwaccel cuda \
  -hwaccel_output_format cuda \
  -i input.mp4 \
  -c:v h264_nvenc \
  -preset p4 \
  -b:v 5M \
  -c:a copy \
  output.mp4

GPU Scaling and Filtering

# GPU-based scaling with scale_cuda
ffmpeg -y -vsync 0 \
  -hwaccel cuda \
  -hwaccel_output_format cuda \
  -i input.mp4 \
  -vf scale_cuda=1280:720 \
  -c:v h264_nvenc \
  -preset p4 \
  output.mp4

# GPU overlay with overlay_cuda
ffmpeg -hwaccel cuda -hwaccel_output_format cuda -i main.mp4 \
  -hwaccel cuda -hwaccel_output_format cuda -i overlay.mp4 \
  -filter_complex "[0:v][1:v]overlay_cuda=10:10" \
  -c:v h264_nvenc output.mp4

# GPU padding with pad_cuda (FFmpeg 8.0+)
ffmpeg -hwaccel cuda -hwaccel_output_format cuda -i input.mp4 \
  -vf "pad_cuda=1920:1080:(ow-iw)/2:(oh-ih)/2:black" \
  -c:v h264_nvenc output.mp4

# Letterbox with pad_cuda
ffmpeg -hwaccel cuda -hwaccel_output_format cuda -i input.mp4 \
  -vf "scale_cuda=1920:-2,pad_cuda=1920:1080:(ow-iw)/2:(oh-ih)/2" \
  -c:v h264_nvenc output.mp4

CUDA Filters Reference (FFmpeg 8.0+)

Filter	Description	Key Parameters
`scale_cuda`	GPU video scaling	`w`, `h`, `format`, `interp_algo`
`scale_npp`	NPP-based scaling (higher quality)	`w`, `h`, `interp_algo`
`overlay_cuda`	GPU overlay/compositing	`x`, `y`, `eof_action`
`pad_cuda`	GPU padding (8.0+)	`w`, `h`, `x`, `y`, `color`
`chromakey_cuda`	GPU chroma keying	`color`, `similarity`, `blend`
`colorspace_cuda`	Color space conversion	`all`, `space`, `trc`, `primaries`
`bilateral_cuda`	Bilateral filter	`sigmaS`, `sigmaR`, `window_size`
`bwdif_cuda`	Deinterlacing	`mode`, `parity`, `deint`
`hwupload_cuda`	Upload to GPU	-
`hwdownload`	Download from GPU	-

scale_npp (NVIDIA Performance Primitives)

For higher-quality scaling, use scale_npp with optimized interpolation algorithms:

# High-quality downscaling with super-sampling
ffmpeg -hwaccel cuda -hwaccel_output_format cuda \
  -i input.mp4 \
  -vf "scale_npp=1280:720:interp_algo=super" \
  -c:v h264_nvenc output.mp4

# Lanczos interpolation for upscaling
ffmpeg -hwaccel cuda -hwaccel_output_format cuda \
  -i input.mp4 \
  -vf "scale_npp=1920:1080:interp_algo=lanczos" \
  -c:v h264_nvenc output.mp4

scale_npp Interpolation Algorithms:

Algorithm	Quality	Speed	Best For
`nn`	Low	Fastest	Pixel art, nearest neighbor
`linear`	Medium	Fast	General use
`cubic`	Good	Medium	Smooth scaling
`lanczos`	High	Slow	High quality upscaling
`super`	Best	Slowest	Downscaling

GPU Chromakey (Green Screen)

# Green screen removal on GPU
ffmpeg -hwaccel cuda -hwaccel_output_format cuda -i greenscreen.mp4 \
  -hwaccel cuda -hwaccel_output_format cuda -i background.mp4 \
  -filter_complex "[0:v]chromakey_cuda=0x00FF00:0.3:0.1[fg];[1:v][fg]overlay_cuda" \
  -c:v h264_nvenc output.mp4

Hybrid GPU + CPU Filter Pipelines

When mixing CPU and GPU filters, use hwupload_cuda and hwdownload:

# CPU decode -> CPU filter -> GPU encode
ffmpeg -i input.mp4 \
  -vf "fade=in:0:30,hwupload_cuda" \
  -c:v h264_nvenc output.mp4

# GPU decode -> CPU filter (drawtext) -> GPU encode
ffmpeg -hwaccel cuda -hwaccel_output_format cuda \
  -i input.mp4 \
  -vf "hwdownload,format=nv12,drawtext=text='Hello':fontsize=48,hwupload_cuda" \
  -c:v h264_nvenc output.mp4

# Complete hybrid pipeline: GPU scale -> CPU text -> GPU encode
ffmpeg -hwaccel cuda -hwaccel_output_format cuda -i input.mp4 \
  -filter_complex "\
    [0:v]scale_cuda=1920:1080[scaled];\
    [scaled]hwdownload,format=nv12[cpu];\
    [cpu]drawtext=text='Watermark':fontsize=48:x=10:y=10[text];\
    [text]hwupload_cuda[gpu]" \
  -map "[gpu]" \
  -c:v h264_nvenc output.mp4

NVENC Quality Optimization

# High quality with lookahead
ffmpeg -i input.mp4 \
  -c:v hevc_nvenc \
  -preset p5 \
  -tune hq \
  -rc vbr \
  -cq 23 \
  -b:v 0 \
  -rc-lookahead 32 \
  -spatial-aq 1 \
  -temporal-aq 1 \
  -c:a copy \
  output.mp4

# Constant quality mode (CQP)
ffmpeg -i input.mp4 \
  -c:v h264_nvenc \
  -preset p4 \
  -rc constqp \
  -qp 23 \
  -c:a copy \
  output.mp4

NVENC Two-Pass Encoding

# Two-pass for best quality (ABR)
ffmpeg -i input.mp4 \
  -c:v h264_nvenc \
  -preset p5 \
  -2pass 1 \
  -b:v 5M \
  -c:a copy \
  output.mp4

NVENC B-Frames and GOP

# Enable B-frames for better compression
ffmpeg -i input.mp4 \
  -c:v hevc_nvenc \
  -preset p4 \
  -bf 4 \
  -b_ref_mode 2 \
  -g 250 \
  -c:a copy \
  output.mp4

GPU Memory Management

Critical Concepts

PCIe transfers between CPU and GPU memory are the primary bottleneck. Keeping frames in GPU memory throughout the pipeline is critical for performance.

Memory Flow Patterns:

Pattern 1: Full GPU Pipeline (OPTIMAL)
Input File -> GPU Decode -> GPU Filter -> GPU Encode -> Output File
                    |          |           |
                    v          v           v
                  [GPU Memory - No PCIe Transfer]

Pattern 2: CPU Filter Insertion (SUBOPTIMAL)
Input -> GPU Decode -> hwdownload -> CPU Filter -> hwupload -> GPU Encode -> Output
                           |                           |
                           v                           v
                     [PCIe Transfer]             [PCIe Transfer]

Pattern 3: Software Decode + GPU Encode
Input -> CPU Decode -> hwupload -> GPU Encode -> Output
                          |
                          v
                    [PCIe Transfer]

Command Patterns Comparison

# OPTIMAL: Full GPU pipeline - no CPU-GPU transfers
ffmpeg -y -vsync 0 \
  -hwaccel cuda \
  -hwaccel_output_format cuda \
  -i input.mp4 \
  -vf scale_cuda=1280:720 \
  -c:v h264_nvenc \
  output.mp4

# SUBOPTIMAL: GPU decode, CPU filter, GPU encode - 2 transfers
ffmpeg -hwaccel cuda -hwaccel_output_format cuda \
  -i input.mp4 \
  -vf "hwdownload,format=nv12,drawtext=text='Hello',hwupload_cuda" \
  -c:v h264_nvenc output.mp4

# AVOID: Missing -hwaccel_output_format - frame copies to CPU after decode
ffmpeg -hwaccel cuda -i input.mp4 \
  -vf scale=1280:720 \
  -c:v h264_nvenc output.mp4

Why -hwaccel_output_format cuda matters: Without it, decoded frames transfer from GPU to CPU via PCIe, get processed, then transfer back to GPU for encoding. This can reduce throughput by up to 50%.

Memory Monitoring

# NVIDIA GPU memory and utilization
nvidia-smi dmon -s m

# Watch GPU utilization in real-time
watch -n 1 nvidia-smi

# Intel GPU monitoring
intel_gpu_top

Best Practices

Use -hwaccel_output_format to keep decoded frames in GPU memory
Use GPU filters when available (scale_cuda, overlay_cuda, pad_cuda)
Batch similar operations to minimize filter graph complexity
Monitor GPU memory for high-resolution content
Use -vsync 0 to prevent frame duplication overhead

Multi-GPU Encoding

NVIDIA Multi-GPU

# Specify GPU by index (parallel processing on different GPUs)
ffmpeg -hwaccel cuda \
  -hwaccel_device 0 \
  -hwaccel_output_format cuda \
  -i input1.mp4 \
  -c:v h264_nvenc output1.mp4 &

ffmpeg -hwaccel cuda \
  -hwaccel_device 1 \
  -hwaccel_output_format cuda \
  -i input2.mp4 \
  -c:v h264_nvenc output2.mp4 &

wait

# Initialize specific CUDA device for filters
ffmpeg -hwaccel cuda -hwaccel_output_format cuda -i input.mp4 \
  -init_hw_device cuda=gpu1:1 \
  -filter_complex "[0:v]scale_cuda=1280:720[scaled]" \
  -map "[scaled]" \
  -c:v h264_nvenc output.mp4

1:N Encoding (Single Input, Multiple Outputs)

More efficient than separate processes - shares CUDA context:

# Single input to multiple resolutions (ABR ladder)
ffmpeg -y -vsync 0 \
  -hwaccel cuda \
  -hwaccel_output_format cuda \
  -i input.mp4 \
  -filter_complex "[0:v]split=3[v1][v2][v3];\
    [v1]scale_cuda=1920:1080[hd];\
    [v2]scale_cuda=1280:720[sd];\
    [v3]scale_cuda=640:360[mobile]" \
  -map "[hd]" -c:v h264_nvenc -b:v 5M output_1080p.mp4 \
  -map "[sd]" -c:v h264_nvenc -b:v 2M output_720p.mp4 \
  -map "[mobile]" -c:v h264_nvenc -b:v 500k output_360p.mp4

Parallel Encoding Strategy

For maximum throughput:

Run 4+ parallel FFmpeg sessions
Use inputs with 15+ seconds to amortize initialization
Measure aggregate FPS across all sessions

# Environment variables for faster initialization
export CUDA_VISIBLE_DEVICES=0
export CUDA_DEVICE_MAX_CONNECTIONS=2

# Parallel encoding script
for i in {1..4}; do
  ffmpeg -hwaccel cuda -hwaccel_output_format cuda \
    -i input_$i.mp4 \
    -c:v h264_nvenc \
    output_$i.mp4 &
done
wait

Intel Quick Sync Video (QSV)

Requirements

Intel CPU with integrated graphics (Sandy Bridge+)
Intel Media SDK or oneVPL
FFmpeg built with --enable-libmfx or --enable-libvpl

Check QSV Support

# Check available QSV codecs
ffmpeg -encoders | grep qsv
ffmpeg -decoders | grep qsv

# Verify Intel GPU access (Linux)
ls /dev/dri/
vainfo  # Check VAAPI support

Basic QSV Encoding

# Initialize QSV device
ffmpeg -init_hw_device qsv=hw \
  -filter_hw_device hw \
  -i input.mp4 \
  -c:v h264_qsv \
  -preset medium \
  -b:v 5M \
  output.mp4

# H.265/HEVC QSV
ffmpeg -init_hw_device qsv=hw \
  -filter_hw_device hw \
  -i input.mp4 \
  -c:v hevc_qsv \
  -preset medium \
  -b:v 4M \
  output.mp4

# AV1 QSV (Intel Arc, 12th gen+)
ffmpeg -init_hw_device qsv=hw \
  -filter_hw_device hw \
  -i input.mp4 \
  -c:v av1_qsv \
  -preset medium \
  -b:v 3M \
  output.mp4

Full QSV Pipeline

# Decode and encode on GPU
ffmpeg -hwaccel qsv \
  -hwaccel_output_format qsv \
  -i input.mp4 \
  -c:v h264_qsv \
  -preset medium \
  -b:v 5M \
  output.mp4

QSV Quality Settings

# Constant quality (recommended)
ffmpeg -hwaccel qsv -hwaccel_output_format qsv \
  -i input.mp4 \
  -c:v hevc_qsv \
  -preset slow \
  -global_quality 22 \
  -look_ahead 1 \
  output.mp4

# VBR with lookahead
ffmpeg -hwaccel qsv -hwaccel_output_format qsv \
  -i input.mp4 \
  -c:v h264_qsv \
  -preset medium \
  -b:v 5M \
  -maxrate 7M \
  -bufsize 10M \
  -look_ahead 1 \
  -look_ahead_depth 40 \
  output.mp4

QSV Quality Parameters:

Parameter	Description	Range
`-global_quality`	Quality level (like CRF)	1-51 (lower = better)
`-preset`	Encoding speed/quality	`veryfast`, `faster`, `fast`, `medium`, `slow`, `slower`, `veryslow`
`-look_ahead`	Enable lookahead	0 or 1
`-look_ahead_depth`	Lookahead frames	10-100

Multi-GPU QSV (Linux)

# Select specific GPU by render device
ffmpeg -init_hw_device qsv=hw:/dev/dri/renderD129 \
  -filter_hw_device hw \
  -i input.mp4 \
  -c:v h264_qsv output.mp4

QSV with Scaling

# GPU scaling with vpp_qsv
ffmpeg -hwaccel qsv \
  -hwaccel_output_format qsv \
  -i input.mp4 \
  -vf "vpp_qsv=w=1280:h=720" \
  -c:v h264_qsv \
  -preset medium \
  output.mp4

VVC/H.266 QSV Decoding (FFmpeg 7.1+)

# Hardware VVC decoding
ffmpeg -hwaccel qsv \
  -hwaccel_output_format qsv \
  -c:v vvc_qsv \
  -i input.vvc \
  -c:v h264_qsv \
  output.mp4

AMD AMF

Requirements

AMD GPU (GCN or newer)
AMD drivers with AMF support
FFmpeg built with --enable-amf

Check AMF Support

ffmpeg -encoders | grep amf

Basic AMF Encoding

# H.264 AMF
ffmpeg -i input.mp4 \
  -c:v h264_amf \
  -quality balanced \
  -b:v 5M \
  output.mp4

# H.265/HEVC AMF
ffmpeg -i input.mp4 \
  -c:v hevc_amf \
  -quality balanced \
  -b:v 4M \
  output.mp4

# AV1 AMF (RDNA3+)
ffmpeg -i input.mp4 \
  -c:v av1_amf \
  -quality balanced \
  -b:v 3M \
  output.mp4

AMF Quality Presets

Preset	Description
speed	Fastest encoding
balanced	Recommended
quality	Best quality

AMD Hardware Upscaling (FFmpeg 8.0+)

# Super resolution upscaling
ffmpeg -i input.mp4 \
  -vf "sr_amf=4096:2160:algorithm=sr1-1" \
  -c:v hevc_amf \
  output.mp4

VAAPI (Linux)

Requirements

Intel, AMD, or NVIDIA GPU on Linux
VAAPI drivers (intel-media-driver, mesa-va-drivers)
FFmpeg built with --enable-vaapi

Check VAAPI Support

# Check VAAPI info
vainfo

# List VAAPI codecs
ffmpeg -encoders | grep vaapi
ffmpeg -decoders | grep vaapi

Basic VAAPI Encoding

# H.264 VAAPI
ffmpeg -vaapi_device /dev/dri/renderD128 \
  -i input.mp4 \
  -vf 'format=nv12,hwupload' \
  -c:v h264_vaapi \
  -b:v 5M \
  output.mp4

# H.265/HEVC VAAPI
ffmpeg -vaapi_device /dev/dri/renderD128 \
  -i input.mp4 \
  -vf 'format=nv12,hwupload' \
  -c:v hevc_vaapi \
  -b:v 4M \
  output.mp4

Full VAAPI Pipeline

# Decode and encode on GPU
ffmpeg -hwaccel vaapi \
  -hwaccel_device /dev/dri/renderD128 \
  -hwaccel_output_format vaapi \
  -i input.mp4 \
  -c:v h264_vaapi \
  -b:v 5M \
  output.mp4

VAAPI Scaling

ffmpeg -hwaccel vaapi \
  -hwaccel_device /dev/dri/renderD128 \
  -hwaccel_output_format vaapi \
  -i input.mp4 \
  -vf 'scale_vaapi=w=1280:h=720' \
  -c:v h264_vaapi \
  output.mp4

VVC VAAPI Decoding (FFmpeg 8.0+)

FFmpeg 8.0 adds VVC/H.266 hardware decoding on Intel and AMD GPUs via VAAPI.

# Hardware VVC/H.266 decoding
ffmpeg -hwaccel vaapi \
  -hwaccel_device /dev/dri/renderD128 \
  -hwaccel_output_format vaapi \
  -i input.vvc \
  -c:v h264_vaapi \
  output.mp4

# VVC decode + transcode to H.265
ffmpeg -hwaccel vaapi \
  -hwaccel_device /dev/dri/renderD128 \
  -hwaccel_output_format vaapi \
  -i input.mkv \
  -c:v hevc_vaapi \
  -b:v 4M \
  output.mp4

# VVC with Screen Content Coding (SCC) support
# FFmpeg 8.0 adds full SCC support including:
# - IBC (Inter Block Copy)
# - Palette Mode
# - ACT (Adaptive Color Transform)
ffmpeg -hwaccel vaapi \
  -hwaccel_device /dev/dri/renderD128 \
  -i screen_recording.vvc \
  output.mp4

VVC VAAPI Requirements:

Intel Xe2 graphics (Lunar Lake) or newer for full VVC support
FFmpeg 8.0 or later
Intel media driver with VVC support

Apple VideoToolbox

Requirements

macOS 10.8+ or iOS 8+
FFmpeg built with --enable-videotoolbox

Check VideoToolbox Support

ffmpeg -encoders | grep videotoolbox
ffmpeg -decoders | grep videotoolbox

Basic VideoToolbox Encoding

# H.264 VideoToolbox
ffmpeg -i input.mp4 \
  -c:v h264_videotoolbox \
  -b:v 5M \
  output.mp4

# H.265/HEVC VideoToolbox
ffmpeg -i input.mp4 \
  -c:v hevc_videotoolbox \
  -b:v 4M \
  -tag:v hvc1 \
  output.mp4

# ProRes VideoToolbox
ffmpeg -i input.mp4 \
  -c:v prores_videotoolbox \
  -profile:v 3 \
  output.mov

VideoToolbox Quality Settings

# Quality-based encoding
ffmpeg -i input.mp4 \
  -c:v h264_videotoolbox \
  -q:v 65 \
  output.mp4

# Hardware accelerated decode + encode
ffmpeg -hwaccel videotoolbox \
  -i input.mp4 \
  -c:v h264_videotoolbox \
  -b:v 5M \
  output.mp4

Vulkan Video (FFmpeg 7.1+/8.0)

Overview

Vulkan Video provides cross-platform GPU acceleration using Vulkan compute shaders. Unlike proprietary hardware accelerators, Vulkan codecs are based on compute shaders and work on any implementation of Vulkan 1.3.

FFmpeg 7.1 Vulkan Features

H.264 Vulkan encoding
H.265/HEVC Vulkan encoding

FFmpeg 8.0 Vulkan Features (New)

AV1 Vulkan encoding - GPU-accelerated AV1 via compute shaders
VP9 Vulkan decoding - Hardware-accelerated VP9 decode
FFv1 Vulkan encode/decode - Lossless codec for archival/capture
ProRes RAW Vulkan decode - Apple ProRes RAW hardware decode

Benefits of Vulkan Compute Codecs:

Cross-platform: Same code works on AMD, Intel, and NVIDIA
No vendor lock-in: Works with any Vulkan 1.3 driver
Ideal for: Lossless screen capture, high-throughput archival, professional workflows

Basic Vulkan Encoding

# H.264 Vulkan
ffmpeg -init_hw_device vulkan \
  -i input.mp4 \
  -c:v h264_vulkan \
  -b:v 5M \
  output.mp4

# H.265/HEVC Vulkan
ffmpeg -init_hw_device vulkan \
  -i input.mp4 \
  -c:v hevc_vulkan \
  -b:v 4M \
  output.mp4

# AV1 Vulkan (FFmpeg 8.0+)
ffmpeg -init_hw_device vulkan \
  -i input.mp4 \
  -c:v av1_vulkan \
  -b:v 3M \
  output.mp4

# FFv1 Vulkan Lossless (FFmpeg 8.0+)
ffmpeg -init_hw_device vulkan \
  -i input.mp4 \
  -c:v ffv1_vulkan \
  output.mkv

Full Vulkan Pipeline

# Complete Vulkan decode-filter-encode
ffmpeg -init_hw_device vulkan=vk \
  -filter_hw_device vk \
  -hwaccel vulkan \
  -hwaccel_output_format vulkan \
  -i input.mp4 \
  -vf "scale_vulkan=1280:720" \
  -c:v h264_vulkan \
  output.mp4

VP9 Vulkan Decoding (FFmpeg 8.0+)

# Hardware decode VP9 with Vulkan
ffmpeg -init_hw_device vulkan \
  -hwaccel vulkan \
  -hwaccel_output_format vulkan \
  -i input.webm \
  -c:v h264_vulkan \
  output.mp4

ProRes RAW Vulkan Decoding (FFmpeg 8.0+)

# Hardware decode ProRes RAW with Vulkan
ffmpeg -init_hw_device vulkan \
  -hwaccel vulkan \
  -i input.mov \
  -c:v libx264 \
  output.mp4

Lossless Screen Capture with FFv1 Vulkan

# High-throughput lossless capture (Linux X11)
ffmpeg -init_hw_device vulkan \
  -f x11grab -framerate 60 -i :0.0 \
  -c:v ffv1_vulkan \
  screen_capture.mkv

# Lossless screen recording (Windows)
ffmpeg -init_hw_device vulkan \
  -f gdigrab -framerate 60 -i desktop \
  -c:v ffv1_vulkan \
  screen_capture.mkv

Upcoming Vulkan Codecs

The next minor update will add:

ProRes (encode and decode)
VC-2 (encode and decode)

Vulkan Filters (FFmpeg 8.0+)

Vulkan filters provide cross-platform GPU acceleration for video processing.

Vulkan Filter Reference

Filter	Description	Key Parameters
`scale_vulkan`	GPU scaling	`w`, `h`, `scaler`
`overlay_vulkan`	Compositing	`x`, `y`
`transpose_vulkan`	Rotate/transpose	`dir`, `passthrough`
`flip_vulkan`	Horizontal flip	-
`avgblur_vulkan`	Box blur	`sizeX`, `sizeY`
`gblur_vulkan`	Gaussian blur	`sigma`, `sigmaV`, `planes`
`chromaber_vulkan`	Chromatic aberration	`dist_x`, `dist_y`
`nlmeans_vulkan`	NLMeans denoising	`s`, `p`, `r`
`bwdif_vulkan`	Deinterlacing	`mode`, `parity`, `deint`
`xfade_vulkan`	Transitions	`transition`, `duration`, `offset`
`libplacebo`	Libplacebo processing	`w`, `h`, `colorspace`

Vulkan Scaling

# Basic Vulkan scaling
ffmpeg -init_hw_device vulkan=vk \
  -filter_hw_device vk \
  -i input.mp4 \
  -vf "hwupload,scale_vulkan=1920:1080,hwdownload,format=yuv420p" \
  output.mp4

# Full Vulkan pipeline with scaling
ffmpeg -init_hw_device vulkan=vk \
  -filter_hw_device vk \
  -hwaccel vulkan -hwaccel_output_format vulkan \
  -i input.mp4 \
  -vf "scale_vulkan=1280:720" \
  -c:v h264_vulkan output.mp4

# Vulkan scaling with specific scaler
ffmpeg -init_hw_device vulkan \
  -i input.mp4 \
  -vf "hwupload,scale_vulkan=w=1920:h=1080:scaler=lanczos,hwdownload,format=yuv420p" \
  output.mp4

Vulkan Compositing (overlay_vulkan)

# Picture-in-picture with Vulkan
ffmpeg -init_hw_device vulkan=vk \
  -filter_hw_device vk \
  -i main.mp4 -i overlay.mp4 \
  -filter_complex "\
    [0:v]hwupload[main];\
    [1:v]hwupload,scale_vulkan=320:180[pip];\
    [main][pip]overlay_vulkan=x=10:y=10[out];\
    [out]hwdownload,format=yuv420p" \
  -map "[out]" output.mp4

# Full Vulkan overlay pipeline
ffmpeg -init_hw_device vulkan=vk \
  -filter_hw_device vk \
  -hwaccel vulkan -hwaccel_output_format vulkan -i main.mp4 \
  -hwaccel vulkan -hwaccel_output_format vulkan -i overlay.mp4 \
  -filter_complex "[0:v][1:v]overlay_vulkan=x=W-w-10:y=10" \
  -c:v h264_vulkan output.mp4

Vulkan Rotation and Transpose

# Transpose (rotate 90° clockwise)
ffmpeg -init_hw_device vulkan \
  -i input.mp4 \
  -vf "hwupload,transpose_vulkan=dir=clock,hwdownload,format=yuv420p" \
  output.mp4

# Horizontal flip
ffmpeg -init_hw_device vulkan \
  -i input.mp4 \
  -vf "hwupload,flip_vulkan,hwdownload,format=yuv420p" \
  output.mp4

transpose_vulkan directions:

dir	Rotation
`cclock_flip`	90° counter-clockwise + vertical flip
`clock`	90° clockwise
`cclock`	90° counter-clockwise
`clock_flip`	90° clockwise + vertical flip

Vulkan Blur Effects

# Gaussian blur
ffmpeg -init_hw_device vulkan \
  -i input.mp4 \
  -vf "hwupload,gblur_vulkan=sigma=5,hwdownload,format=yuv420p" \
  output.mp4

# Box blur (averaging)
ffmpeg -init_hw_device vulkan \
  -i input.mp4 \
  -vf "hwupload,avgblur_vulkan=sizeX=5:sizeY=5,hwdownload,format=yuv420p" \
  output.mp4

Vulkan Chromatic Aberration

# Add chromatic aberration effect
ffmpeg -init_hw_device vulkan \
  -i input.mp4 \
  -vf "hwupload,chromaber_vulkan=dist_x=-0.002:dist_y=0.002,hwdownload,format=yuv420p" \
  output.mp4

Vulkan Denoising (nlmeans_vulkan)

# Non-local means denoising on GPU
ffmpeg -init_hw_device vulkan \
  -i noisy.mp4 \
  -vf "hwupload,nlmeans_vulkan=s=3:p=7:r=15,hwdownload,format=yuv420p" \
  output.mp4

# Full Vulkan denoising pipeline
ffmpeg -init_hw_device vulkan=vk \
  -filter_hw_device vk \
  -hwaccel vulkan -hwaccel_output_format vulkan \
  -i noisy.mp4 \
  -vf "nlmeans_vulkan=s=4" \
  -c:v h264_vulkan output.mp4

Vulkan Deinterlacing (bwdif_vulkan)

# Bob-Weaver deinterlacing
ffmpeg -init_hw_device vulkan \
  -i interlaced.mp4 \
  -vf "hwupload,bwdif_vulkan=mode=send_frame,hwdownload,format=yuv420p" \
  output.mp4

# Full Vulkan deinterlace pipeline
ffmpeg -init_hw_device vulkan=vk \
  -filter_hw_device vk \
  -hwaccel vulkan -hwaccel_output_format vulkan \
  -i interlaced.mp4 \
  -vf "bwdif_vulkan=1" \
  -c:v h264_vulkan output.mp4

Vulkan Transitions (xfade_vulkan)

# GPU-accelerated crossfade between two clips
ffmpeg -init_hw_device vulkan=vk \
  -filter_hw_device vk \
  -i clip1.mp4 -i clip2.mp4 \
  -filter_complex "\
    [0:v]hwupload[v1];\
    [1:v]hwupload[v2];\
    [v1][v2]xfade_vulkan=transition=fade:duration=1:offset=4[out];\
    [out]hwdownload,format=yuv420p" \
  -map "[out]" output.mp4

xfade_vulkan transitions: fade, dissolve, wipeleft, wiperight, wipeup, wipedown, slideleft, slideright, slideup, slidedown, and more.

Libplacebo Processing (Advanced)

Libplacebo provides advanced color management and processing:

# HDR tonemapping with libplacebo
ffmpeg -init_hw_device vulkan \
  -i hdr_input.mp4 \
  -vf "hwupload,libplacebo=tonemapping=hable:colorspace=bt709:color_primaries=bt709:color_trc=bt709,hwdownload,format=yuv420p" \
  sdr_output.mp4

# Scaling with libplacebo (high quality)
ffmpeg -init_hw_device vulkan \
  -i input.mp4 \
  -vf "hwupload,libplacebo=w=3840:h=2160:upscaler=ewa_lanczos,hwdownload,format=yuv420p10le" \
  output.mp4

OpenCL Filters

OpenCL provides GPU acceleration that works across vendors (NVIDIA, AMD, Intel).

OpenCL Filter Reference

Filter	Description	Key Parameters
`scale_opencl`	GPU scaling	`w`, `h`
`overlay_opencl`	Compositing	`x`, `y`
`pad_opencl`	Padding	`w`, `h`, `x`, `y`, `color`
`colorkey_opencl`	Chroma keying	`color`, `similarity`, `blend`
`unsharp_opencl`	Sharpening	`lx`, `ly`, `la`, `cx`, `cy`, `ca`
`nlmeans_opencl`	NLMeans denoising	`s`, `p`, `r`
`deshake_opencl`	Stabilization	`tripod`, `smooth`, `filename`
`tonemap_opencl`	HDR tonemapping	`tonemap`, `transfer`, `matrix`
`avgblur_opencl`	Box blur	`sizeX`, `sizeY`
`convolution_opencl`	Custom kernel	`0m`, `1m`, etc.

OpenCL Initialization

# List OpenCL devices
ffmpeg -init_hw_device opencl=gpu:0.0 -filter_hw_device gpu -f lavfi -i nullsrc -vf "hwupload,avgblur_opencl=2" -t 1 -f null -

# Basic OpenCL filter pattern
ffmpeg -init_hw_device opencl=gpu \
  -filter_hw_device gpu \
  -i input.mp4 \
  -vf "hwupload,<filter_opencl>,hwdownload,format=yuv420p" \
  output.mp4

OpenCL Scaling

# GPU scaling with OpenCL
ffmpeg -init_hw_device opencl=gpu \
  -filter_hw_device gpu \
  -i input.mp4 \
  -vf "hwupload,scale_opencl=1920:1080,hwdownload,format=yuv420p" \
  output.mp4

OpenCL Compositing

# Picture-in-picture with OpenCL
ffmpeg -init_hw_device opencl=gpu \
  -filter_hw_device gpu \
  -i main.mp4 -i overlay.mp4 \
  -filter_complex "\
    [0:v]hwupload[main];\
    [1:v]hwupload,scale_opencl=320:180[pip];\
    [main][pip]overlay_opencl=x=10:y=10[out];\
    [out]hwdownload,format=yuv420p" \
  -map "[out]" output.mp4

OpenCL Chroma Keying (colorkey_opencl)

# Green screen removal with OpenCL
ffmpeg -init_hw_device opencl=gpu \
  -filter_hw_device gpu \
  -i greenscreen.mp4 -i background.mp4 \
  -filter_complex "\
    [0:v]hwupload,colorkey_opencl=0x00FF00:0.3:0.1[fg];\
    [1:v]hwupload[bg];\
    [bg][fg]overlay_opencl[out];\
    [out]hwdownload,format=yuv420p" \
  -map "[out]" output.mp4

OpenCL Denoising (nlmeans_opencl)

# Non-local means denoising on GPU
ffmpeg -init_hw_device opencl=gpu \
  -filter_hw_device gpu \
  -i noisy.mp4 \
  -vf "hwupload,nlmeans_opencl=s=3:p=7:r=15,hwdownload,format=yuv420p" \
  output.mp4

# Strong denoising
ffmpeg -init_hw_device opencl=gpu \
  -filter_hw_device gpu \
  -i noisy.mp4 \
  -vf "hwupload,nlmeans_opencl=s=5:p=7:r=21,hwdownload,format=yuv420p" \
  output.mp4

OpenCL Stabilization (deshake_opencl)

# GPU-accelerated video stabilization
ffmpeg -init_hw_device opencl=gpu \
  -filter_hw_device gpu \
  -i shaky.mp4 \
  -vf "hwupload,deshake_opencl,hwdownload,format=yuv420p" \
  output.mp4

# Tripod mode (camera stays in one place)
ffmpeg -init_hw_device opencl=gpu \
  -filter_hw_device gpu \
  -i shaky.mp4 \
  -vf "hwupload,deshake_opencl=tripod=1,hwdownload,format=yuv420p" \
  output.mp4

OpenCL Sharpening (unsharp_opencl)

# Basic sharpening
ffmpeg -init_hw_device opencl=gpu \
  -filter_hw_device gpu \
  -i input.mp4 \
  -vf "hwupload,unsharp_opencl=lx=5:ly=5:la=1.0,hwdownload,format=yuv420p" \
  output.mp4

# Subtle sharpening for HD video
ffmpeg -init_hw_device opencl=gpu \
  -filter_hw_device gpu \
  -i input.mp4 \
  -vf "hwupload,unsharp_opencl=lx=3:ly=3:la=0.5,hwdownload,format=yuv420p" \
  output.mp4

OpenCL HDR Tonemapping (tonemap_opencl)

# HDR to SDR conversion with GPU
ffmpeg -init_hw_device opencl=gpu \
  -filter_hw_device gpu \
  -i hdr_input.mp4 \
  -vf "hwupload,tonemap_opencl=tonemap=hable:transfer=bt709:matrix=bt709:primaries=bt709,hwdownload,format=yuv420p" \
  sdr_output.mp4

# Reinhard tonemapping
ffmpeg -init_hw_device opencl=gpu \
  -filter_hw_device gpu \
  -i hdr_input.mp4 \
  -vf "hwupload,tonemap_opencl=tonemap=reinhard:param=0.5,hwdownload,format=yuv420p" \
  sdr_output.mp4

tonemap_opencl algorithms:

Algorithm	Description
`none`	No tonemapping
`clip`	Simple clipping
`linear`	Linear scaling
`gamma`	Gamma-based
`reinhard`	Reinhard operator
`hable`	Hable/Filmic (most popular)
`mobius`	Mobius (preserves blacks)
`bt2390`	ITU-R BT.2390 (broadcast standard)

OpenCL Blur

# Box blur
ffmpeg -init_hw_device opencl=gpu \
  -filter_hw_device gpu \
  -i input.mp4 \
  -vf "hwupload,avgblur_opencl=sizeX=5:sizeY=5,hwdownload,format=yuv420p" \
  output.mp4

OpenCL Custom Convolution

# Edge detection kernel
ffmpeg -init_hw_device opencl=gpu \
  -filter_hw_device gpu \
  -i input.mp4 \
  -vf "hwupload,convolution_opencl=0m='-1 -1 -1 -1 8 -1 -1 -1 -1':0rdiv=1:0bias=128,hwdownload,format=yuv420p" \
  output.mp4

# Sharpen kernel
ffmpeg -init_hw_device opencl=gpu \
  -filter_hw_device gpu \
  -i input.mp4 \
  -vf "hwupload,convolution_opencl=0m='0 -1 0 -1 5 -1 0 -1 0':0rdiv=1:0bias=0,hwdownload,format=yuv420p" \
  output.mp4

GPU Filter Comparison

Cross-Platform Filter Availability

Filter Type	CUDA	Vulkan	OpenCL	VAAPI	QSV
Scaling	✅ `scale_cuda`	✅ `scale_vulkan`	✅ `scale_opencl`	✅ `scale_vaapi`	✅ `vpp_qsv`
Overlay	✅ `overlay_cuda`	✅ `overlay_vulkan`	✅ `overlay_opencl`	-	-
Denoising	✅ `bilateral_cuda`	✅ `nlmeans_vulkan`	✅ `nlmeans_opencl`	-	-
Deinterlace	✅ `bwdif_cuda`	✅ `bwdif_vulkan`	-	✅ `deinterlace_vaapi`	✅ `vpp_qsv`
Chromakey	✅ `chromakey_cuda`	-	✅ `colorkey_opencl`	-	-
Tonemapping	-	✅ `libplacebo`	✅ `tonemap_opencl`	✅ `tonemap_vaapi`	-
Stabilization	-	-	✅ `deshake_opencl`	-	-
Padding	✅ `pad_cuda`	-	✅ `pad_opencl`	-	-

Performance Comparison

API	Platform Support	Filter Variety	Performance
CUDA	NVIDIA only	Excellent	Best on NVIDIA
Vulkan	Cross-platform	Good (growing)	Very good
OpenCL	Cross-platform	Good	Good
VAAPI	Linux only	Limited	Good on Linux
QSV	Intel only	Limited	Good on Intel

Docker with Hardware Acceleration

NVIDIA GPU in Docker

# Run with NVIDIA GPU support
docker run --gpus all \
  --rm \
  -v $(pwd):/data \
  jrottenberg/ffmpeg:nvidia \
  -hwaccel cuda \
  -hwaccel_output_format cuda \
  -i /data/input.mp4 \
  -c:v h264_nvenc \
  /data/output.mp4

Intel QSV in Docker

# Run with Intel GPU access
docker run --rm \
  --device=/dev/dri:/dev/dri \
  -v $(pwd):/data \
  jrottenberg/ffmpeg:vaapi \
  -hwaccel qsv \
  -i /data/input.mp4 \
  -c:v h264_qsv \
  /data/output.mp4

Troubleshooting

Common Issues

"No NVENC capable devices found"

# Check GPU support
nvidia-smi
# Verify driver version
nvidia-smi --query-gpu=driver_version --format=csv
# Check CUDA version
nvcc --version

"Cannot load libcuda.so"

# Set library path (Linux)
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH

QSV "Unsupported format"

# Force pixel format conversion
ffmpeg -i input.mp4 \
  -vf "format=nv12" \
  -c:v h264_qsv \
  output.mp4

VAAPI permission denied

# Add user to video/render group
sudo usermod -aG video $USER
sudo usermod -aG render $USER
# Re-login or use newgrp

Performance Debugging

# Benchmark encode speed
ffmpeg -benchmark -i input.mp4 -c:v h264_nvenc -f null -

# Show hardware decode stats
ffmpeg -hwaccel cuda -hwaccel_output_format cuda -benchmark -i input.mp4 -f null -

# Monitor GPU usage (NVIDIA)
nvidia-smi dmon -s u

# Monitor GPU usage (Intel)
intel_gpu_top

Best Practices

Use full GPU pipelines when possible to avoid CPU-GPU memory transfers
Match decode and encode hardware for best performance
Use appropriate presets - faster isn't always better for quality
Enable lookahead and AQ for quality-critical encodes
Test on target hardware - quality varies by GPU generation
Monitor GPU memory for high-resolution content
Consider power efficiency for laptops and servers
Update drivers regularly for performance and feature improvements

Recommended Settings by Use Case

Live Streaming

ffmpeg -hwaccel cuda -hwaccel_output_format cuda -i input \
  -c:v h264_nvenc -preset p3 -tune ll -zerolatency 1 -b:v 6M \
  -f flv rtmp://server/live/stream

VOD Encoding (Quality)

ffmpeg -i input.mp4 \
  -c:v hevc_nvenc -preset p6 -tune hq \
  -rc vbr -cq 22 -b:v 0 \
  -rc-lookahead 32 -spatial-aq 1 \
  output.mp4

Batch Processing

# Multiple parallel streams on one GPU
ffmpeg -hwaccel cuda -hwaccel_output_format cuda -i input1.mp4 -c:v h264_nvenc output1.mp4 &
ffmpeg -hwaccel cuda -hwaccel_output_format cuda -i input2.mp4 -c:v h264_nvenc output2.mp4 &
wait

This guide covers hardware acceleration fundamentals. For specific platform optimizations and advanced configurations, consult the respective vendor documentation.

Similar Skills

using-git-worktrees

169.2k

Creates isolated Git worktrees for feature branches with prioritized directory selection, gitignore safety checks, auto project setup for Node/Python/Rust/Go, and baseline verification.

superpowers

subagent-driven-development

169.2k

Executes implementation plans in current session by dispatching fresh subagents per independent task, with two-stage reviews: spec compliance then code quality.

3 files

superpowers

dispatching-parallel-agents

169.2k

Dispatches parallel agents to independently tackle 2+ tasks like separate test failures or subsystems without shared state or dependencies.

superpowers

Stats

Parent Repo Stars22

Parent Repo Forks3

Last CommitJan 16, 2026

Actions

View Source View Plugin View on GitHub View README

CRITICAL GUIDELINES

Windows File Path Requirements

MANDATORY: Always Use Backslashes on Windows for File Paths

When using Edit or Write tools on Windows, you MUST use backslashes (\) in file paths, NOT forward slashes (/).

Quick Reference

Platform	Encoder	Decoder	Detect Command
NVIDIA	`h264_nvenc`, `hevc_nvenc`, `av1_nvenc`	`h264_cuvid`, `hevc_cuvid`	`ffmpeg -encoders \| grep nvenc`
Intel QSV	`h264_qsv`, `hevc_qsv`, `av1_qsv`	`h264_qsv`, `hevc_qsv`	`ffmpeg -encoders \| grep qsv`
AMD AMF	`h264_amf`, `hevc_amf`, `av1_amf`	N/A (use software)	`ffmpeg -encoders \| grep amf`
Apple	`h264_videotoolbox`, `hevc_videotoolbox`	`h264_videotoolbox`	macOS only
VAAPI	`h264_vaapi`, `hevc_vaapi`, `av1_vaapi`	with `-hwaccel vaapi`	Linux only

When to Use This Skill

Use when GPU acceleration is needed:

Encoding speed is critical (10-30x faster than CPU)
Processing large batches of videos
Real-time encoding for streaming
Server-side transcoding at scale
Docker containers with GPU passthrough

Key decision: GPU encoding trades some quality for massive speed. Use -cq or -qp for quality control.

FFmpeg Hardware Acceleration (2025)

Comprehensive guide to GPU-accelerated encoding and decoding with NVIDIA, Intel, AMD, Apple, and Vulkan.

Current Latest: FFmpeg 8.0.1 (released 2025-11-20) - Check with ffmpeg -version

Hardware Acceleration Overview

Hardware acceleration uses dedicated GPU/SoC components for video processing:

NVENC/NVDEC (NVIDIA): Dedicated video encode/decode engines
QSV (Intel): Quick Sync Video on Intel CPUs with integrated graphics
AMF (AMD): Advanced Media Framework for AMD GPUs
VideoToolbox (Apple): macOS/iOS hardware acceleration
VAAPI (Linux): Video Acceleration API (Intel, AMD on Linux)
Vulkan Video (Cross-platform): FFmpeg 7.1+/8.0 GPU acceleration

Performance Comparison (2025 Benchmarks)

Method	Speed	Quality	Power	Use Case
libx264 (CPU)	1x	Best	High	Quality-critical
libx265 (CPU)	0.3x	Best	Very High	Archival
h264_nvenc	10-20x	Good	Low	Real-time, streaming
hevc_nvenc	8-15x	Good	Low	4K streaming
h264_qsv	8-15x	Good	Very Low	Laptop, efficiency
h264_amf	8-15x	Good	Low	AMD systems

NVIDIA NVENC/NVDEC

Requirements

NVIDIA GPU (GTX 600+ / Quadro K series+)
NVIDIA drivers 450+
FFmpeg built with --enable-nvenc --enable-cuda --enable-cuvid

Check NVIDIA Support

# Check available NVIDIA codecs
ffmpeg -encoders | grep nvenc
ffmpeg -decoders | grep cuvid

# Check GPU info
nvidia-smi

# List hardware accelerators
ffmpeg -hwaccels

Basic NVENC Encoding

# H.264 NVENC encoding
ffmpeg -i input.mp4 -c:v h264_nvenc -preset p4 -b:v 5M output.mp4

# H.265/HEVC NVENC encoding
ffmpeg -i input.mp4 -c:v hevc_nvenc -preset p4 -b:v 4M output.mp4

# AV1 NVENC (RTX 40 series+)
ffmpeg -i input.mp4 -c:v av1_nvenc -preset p4 -b:v 3M output.mp4

NVENC Presets (FFmpeg 7+)

Preset	Speed	Quality	Use Case
p1	Fastest	Lowest	Real-time capture
p2	Faster	Low	Screen recording
p3	Fast	Medium	General streaming
p4	Medium	Good	Recommended
p5	Slow	Better	High-quality streaming
p6	Slower	Best	Offline encoding
p7	Slowest	Highest	Maximum quality

Full GPU Pipeline (Decode + Encode)

# Keep frames in GPU memory (fastest)
ffmpeg -y -vsync 0 \
  -hwaccel cuda \
  -hwaccel_output_format cuda \
  -i input.mp4 \
  -c:v h264_nvenc \
  -preset p4 \
  -b:v 5M \
  -c:a copy \
  output.mp4

GPU Scaling and Filtering

# GPU-based scaling with scale_cuda
ffmpeg -y -vsync 0 \
  -hwaccel cuda \
  -hwaccel_output_format cuda \
  -i input.mp4 \
  -vf scale_cuda=1280:720 \
  -c:v h264_nvenc \
  -preset p4 \
  output.mp4

# GPU overlay with overlay_cuda
ffmpeg -hwaccel cuda -hwaccel_output_format cuda -i main.mp4 \
  -hwaccel cuda -hwaccel_output_format cuda -i overlay.mp4 \
  -filter_complex "[0:v][1:v]overlay_cuda=10:10" \
  -c:v h264_nvenc output.mp4

# GPU padding with pad_cuda (FFmpeg 8.0+)
ffmpeg -hwaccel cuda -hwaccel_output_format cuda -i input.mp4 \
  -vf "pad_cuda=1920:1080:(ow-iw)/2:(oh-ih)/2:black" \
  -c:v h264_nvenc output.mp4

# Letterbox with pad_cuda
ffmpeg -hwaccel cuda -hwaccel_output_format cuda -i input.mp4 \
  -vf "scale_cuda=1920:-2,pad_cuda=1920:1080:(ow-iw)/2:(oh-ih)/2" \
  -c:v h264_nvenc output.mp4

CUDA Filters Reference (FFmpeg 8.0+)

Filter	Description	Key Parameters
`scale_cuda`	GPU video scaling	`w`, `h`, `format`, `interp_algo`
`scale_npp`	NPP-based scaling (higher quality)	`w`, `h`, `interp_algo`
`overlay_cuda`	GPU overlay/compositing	`x`, `y`, `eof_action`
`pad_cuda`	GPU padding (8.0+)	`w`, `h`, `x`, `y`, `color`
`chromakey_cuda`	GPU chroma keying	`color`, `similarity`, `blend`
`colorspace_cuda`	Color space conversion	`all`, `space`, `trc`, `primaries`
`bilateral_cuda`	Bilateral filter	`sigmaS`, `sigmaR`, `window_size`
`bwdif_cuda`	Deinterlacing	`mode`, `parity`, `deint`
`hwupload_cuda`	Upload to GPU	-
`hwdownload`	Download from GPU	-

scale_npp (NVIDIA Performance Primitives)

For higher-quality scaling, use scale_npp with optimized interpolation algorithms:

# High-quality downscaling with super-sampling
ffmpeg -hwaccel cuda -hwaccel_output_format cuda \
  -i input.mp4 \
  -vf "scale_npp=1280:720:interp_algo=super" \
  -c:v h264_nvenc output.mp4

# Lanczos interpolation for upscaling
ffmpeg -hwaccel cuda -hwaccel_output_format cuda \
  -i input.mp4 \
  -vf "scale_npp=1920:1080:interp_algo=lanczos" \
  -c:v h264_nvenc output.mp4

scale_npp Interpolation Algorithms:

Algorithm	Quality	Speed	Best For
`nn`	Low	Fastest	Pixel art, nearest neighbor
`linear`	Medium	Fast	General use
`cubic`	Good	Medium	Smooth scaling
`lanczos`	High	Slow	High quality upscaling
`super`	Best	Slowest	Downscaling

GPU Chromakey (Green Screen)

# Green screen removal on GPU
ffmpeg -hwaccel cuda -hwaccel_output_format cuda -i greenscreen.mp4 \
  -hwaccel cuda -hwaccel_output_format cuda -i background.mp4 \
  -filter_complex "[0:v]chromakey_cuda=0x00FF00:0.3:0.1[fg];[1:v][fg]overlay_cuda" \
  -c:v h264_nvenc output.mp4

Hybrid GPU + CPU Filter Pipelines

When mixing CPU and GPU filters, use hwupload_cuda and hwdownload:

# CPU decode -> CPU filter -> GPU encode
ffmpeg -i input.mp4 \
  -vf "fade=in:0:30,hwupload_cuda" \
  -c:v h264_nvenc output.mp4

# GPU decode -> CPU filter (drawtext) -> GPU encode
ffmpeg -hwaccel cuda -hwaccel_output_format cuda \
  -i input.mp4 \
  -vf "hwdownload,format=nv12,drawtext=text='Hello':fontsize=48,hwupload_cuda" \
  -c:v h264_nvenc output.mp4

# Complete hybrid pipeline: GPU scale -> CPU text -> GPU encode
ffmpeg -hwaccel cuda -hwaccel_output_format cuda -i input.mp4 \
  -filter_complex "\
    [0:v]scale_cuda=1920:1080[scaled];\
    [scaled]hwdownload,format=nv12[cpu];\
    [cpu]drawtext=text='Watermark':fontsize=48:x=10:y=10[text];\
    [text]hwupload_cuda[gpu]" \
  -map "[gpu]" \
  -c:v h264_nvenc output.mp4

NVENC Quality Optimization

# High quality with lookahead
ffmpeg -i input.mp4 \
  -c:v hevc_nvenc \
  -preset p5 \
  -tune hq \
  -rc vbr \
  -cq 23 \
  -b:v 0 \
  -rc-lookahead 32 \
  -spatial-aq 1 \
  -temporal-aq 1 \
  -c:a copy \
  output.mp4

# Constant quality mode (CQP)
ffmpeg -i input.mp4 \
  -c:v h264_nvenc \
  -preset p4 \
  -rc constqp \
  -qp 23 \
  -c:a copy \
  output.mp4

NVENC Two-Pass Encoding

# Two-pass for best quality (ABR)
ffmpeg -i input.mp4 \
  -c:v h264_nvenc \
  -preset p5 \
  -2pass 1 \
  -b:v 5M \
  -c:a copy \
  output.mp4

NVENC B-Frames and GOP

# Enable B-frames for better compression
ffmpeg -i input.mp4 \
  -c:v hevc_nvenc \
  -preset p4 \
  -bf 4 \
  -b_ref_mode 2 \
  -g 250 \
  -c:a copy \
  output.mp4

GPU Memory Management

Critical Concepts

PCIe transfers between CPU and GPU memory are the primary bottleneck. Keeping frames in GPU memory throughout the pipeline is critical for performance.

Memory Flow Patterns:

Pattern 1: Full GPU Pipeline (OPTIMAL)
Input File -> GPU Decode -> GPU Filter -> GPU Encode -> Output File
                    |          |           |
                    v          v           v
                  [GPU Memory - No PCIe Transfer]

Pattern 2: CPU Filter Insertion (SUBOPTIMAL)
Input -> GPU Decode -> hwdownload -> CPU Filter -> hwupload -> GPU Encode -> Output
                           |                           |
                           v                           v
                     [PCIe Transfer]             [PCIe Transfer]

Pattern 3: Software Decode + GPU Encode
Input -> CPU Decode -> hwupload -> GPU Encode -> Output
                          |
                          v
                    [PCIe Transfer]

Command Patterns Comparison

# OPTIMAL: Full GPU pipeline - no CPU-GPU transfers
ffmpeg -y -vsync 0 \
  -hwaccel cuda \
  -hwaccel_output_format cuda \
  -i input.mp4 \
  -vf scale_cuda=1280:720 \
  -c:v h264_nvenc \
  output.mp4

# SUBOPTIMAL: GPU decode, CPU filter, GPU encode - 2 transfers
ffmpeg -hwaccel cuda -hwaccel_output_format cuda \
  -i input.mp4 \
  -vf "hwdownload,format=nv12,drawtext=text='Hello',hwupload_cuda" \
  -c:v h264_nvenc output.mp4

# AVOID: Missing -hwaccel_output_format - frame copies to CPU after decode
ffmpeg -hwaccel cuda -i input.mp4 \
  -vf scale=1280:720 \
  -c:v h264_nvenc output.mp4

Memory Monitoring

# NVIDIA GPU memory and utilization
nvidia-smi dmon -s m

# Watch GPU utilization in real-time
watch -n 1 nvidia-smi

# Intel GPU monitoring
intel_gpu_top

Best Practices

Use -hwaccel_output_format to keep decoded frames in GPU memory
Use GPU filters when available (scale_cuda, overlay_cuda, pad_cuda)
Batch similar operations to minimize filter graph complexity
Monitor GPU memory for high-resolution content
Use -vsync 0 to prevent frame duplication overhead

Multi-GPU Encoding

NVIDIA Multi-GPU

# Specify GPU by index (parallel processing on different GPUs)
ffmpeg -hwaccel cuda \
  -hwaccel_device 0 \
  -hwaccel_output_format cuda \
  -i input1.mp4 \
  -c:v h264_nvenc output1.mp4 &

ffmpeg -hwaccel cuda \
  -hwaccel_device 1 \
  -hwaccel_output_format cuda \
  -i input2.mp4 \
  -c:v h264_nvenc output2.mp4 &

wait

# Initialize specific CUDA device for filters
ffmpeg -hwaccel cuda -hwaccel_output_format cuda -i input.mp4 \
  -init_hw_device cuda=gpu1:1 \
  -filter_complex "[0:v]scale_cuda=1280:720[scaled]" \
  -map "[scaled]" \
  -c:v h264_nvenc output.mp4

1:N Encoding (Single Input, Multiple Outputs)

More efficient than separate processes - shares CUDA context:

# Single input to multiple resolutions (ABR ladder)
ffmpeg -y -vsync 0 \
  -hwaccel cuda \
  -hwaccel_output_format cuda \
  -i input.mp4 \
  -filter_complex "[0:v]split=3[v1][v2][v3];\
    [v1]scale_cuda=1920:1080[hd];\
    [v2]scale_cuda=1280:720[sd];\
    [v3]scale_cuda=640:360[mobile]" \
  -map "[hd]" -c:v h264_nvenc -b:v 5M output_1080p.mp4 \
  -map "[sd]" -c:v h264_nvenc -b:v 2M output_720p.mp4 \
  -map "[mobile]" -c:v h264_nvenc -b:v 500k output_360p.mp4

Parallel Encoding Strategy

For maximum throughput:

Run 4+ parallel FFmpeg sessions
Use inputs with 15+ seconds to amortize initialization
Measure aggregate FPS across all sessions

# Environment variables for faster initialization
export CUDA_VISIBLE_DEVICES=0
export CUDA_DEVICE_MAX_CONNECTIONS=2

# Parallel encoding script
for i in {1..4}; do
  ffmpeg -hwaccel cuda -hwaccel_output_format cuda \
    -i input_$i.mp4 \
    -c:v h264_nvenc \
    output_$i.mp4 &
done
wait

Intel Quick Sync Video (QSV)

Requirements

Intel CPU with integrated graphics (Sandy Bridge+)
Intel Media SDK or oneVPL
FFmpeg built with --enable-libmfx or --enable-libvpl

Check QSV Support

# Check available QSV codecs
ffmpeg -encoders | grep qsv
ffmpeg -decoders | grep qsv

# Verify Intel GPU access (Linux)
ls /dev/dri/
vainfo  # Check VAAPI support

Basic QSV Encoding

# Initialize QSV device
ffmpeg -init_hw_device qsv=hw \
  -filter_hw_device hw \
  -i input.mp4 \
  -c:v h264_qsv \
  -preset medium \
  -b:v 5M \
  output.mp4

# H.265/HEVC QSV
ffmpeg -init_hw_device qsv=hw \
  -filter_hw_device hw \
  -i input.mp4 \
  -c:v hevc_qsv \
  -preset medium \
  -b:v 4M \
  output.mp4

# AV1 QSV (Intel Arc, 12th gen+)
ffmpeg -init_hw_device qsv=hw \
  -filter_hw_device hw \
  -i input.mp4 \
  -c:v av1_qsv \
  -preset medium \
  -b:v 3M \
  output.mp4

Full QSV Pipeline

# Decode and encode on GPU
ffmpeg -hwaccel qsv \
  -hwaccel_output_format qsv \
  -i input.mp4 \
  -c:v h264_qsv \
  -preset medium \
  -b:v 5M \
  output.mp4

QSV Quality Settings

# Constant quality (recommended)
ffmpeg -hwaccel qsv -hwaccel_output_format qsv \
  -i input.mp4 \
  -c:v hevc_qsv \
  -preset slow \
  -global_quality 22 \
  -look_ahead 1 \
  output.mp4

# VBR with lookahead
ffmpeg -hwaccel qsv -hwaccel_output_format qsv \
  -i input.mp4 \
  -c:v h264_qsv \
  -preset medium \
  -b:v 5M \
  -maxrate 7M \
  -bufsize 10M \
  -look_ahead 1 \
  -look_ahead_depth 40 \
  output.mp4

QSV Quality Parameters:

Parameter	Description	Range
`-global_quality`	Quality level (like CRF)	1-51 (lower = better)
`-preset`	Encoding speed/quality	`veryfast`, `faster`, `fast`, `medium`, `slow`, `slower`, `veryslow`
`-look_ahead`	Enable lookahead	0 or 1
`-look_ahead_depth`	Lookahead frames	10-100

Multi-GPU QSV (Linux)

# Select specific GPU by render device
ffmpeg -init_hw_device qsv=hw:/dev/dri/renderD129 \
  -filter_hw_device hw \
  -i input.mp4 \
  -c:v h264_qsv output.mp4

QSV with Scaling

# GPU scaling with vpp_qsv
ffmpeg -hwaccel qsv \
  -hwaccel_output_format qsv \
  -i input.mp4 \
  -vf "vpp_qsv=w=1280:h=720" \
  -c:v h264_qsv \
  -preset medium \
  output.mp4

VVC/H.266 QSV Decoding (FFmpeg 7.1+)

# Hardware VVC decoding
ffmpeg -hwaccel qsv \
  -hwaccel_output_format qsv \
  -c:v vvc_qsv \
  -i input.vvc \
  -c:v h264_qsv \
  output.mp4

AMD AMF

Requirements

AMD GPU (GCN or newer)
AMD drivers with AMF support
FFmpeg built with --enable-amf

Check AMF Support

ffmpeg -encoders | grep amf

Basic AMF Encoding

# H.264 AMF
ffmpeg -i input.mp4 \
  -c:v h264_amf \
  -quality balanced \
  -b:v 5M \
  output.mp4

# H.265/HEVC AMF
ffmpeg -i input.mp4 \
  -c:v hevc_amf \
  -quality balanced \
  -b:v 4M \
  output.mp4

# AV1 AMF (RDNA3+)
ffmpeg -i input.mp4 \
  -c:v av1_amf \
  -quality balanced \
  -b:v 3M \
  output.mp4

AMF Quality Presets

Preset	Description
speed	Fastest encoding
balanced	Recommended
quality	Best quality

AMD Hardware Upscaling (FFmpeg 8.0+)

# Super resolution upscaling
ffmpeg -i input.mp4 \
  -vf "sr_amf=4096:2160:algorithm=sr1-1" \
  -c:v hevc_amf \
  output.mp4

VAAPI (Linux)

Requirements

Intel, AMD, or NVIDIA GPU on Linux
VAAPI drivers (intel-media-driver, mesa-va-drivers)
FFmpeg built with --enable-vaapi

Check VAAPI Support

# Check VAAPI info
vainfo

# List VAAPI codecs
ffmpeg -encoders | grep vaapi
ffmpeg -decoders | grep vaapi

Basic VAAPI Encoding

# H.264 VAAPI
ffmpeg -vaapi_device /dev/dri/renderD128 \
  -i input.mp4 \
  -vf 'format=nv12,hwupload' \
  -c:v h264_vaapi \
  -b:v 5M \
  output.mp4

# H.265/HEVC VAAPI
ffmpeg -vaapi_device /dev/dri/renderD128 \
  -i input.mp4 \
  -vf 'format=nv12,hwupload' \
  -c:v hevc_vaapi \
  -b:v 4M \
  output.mp4

Full VAAPI Pipeline

# Decode and encode on GPU
ffmpeg -hwaccel vaapi \
  -hwaccel_device /dev/dri/renderD128 \
  -hwaccel_output_format vaapi \
  -i input.mp4 \
  -c:v h264_vaapi \
  -b:v 5M \
  output.mp4

VAAPI Scaling

ffmpeg -hwaccel vaapi \
  -hwaccel_device /dev/dri/renderD128 \
  -hwaccel_output_format vaapi \
  -i input.mp4 \
  -vf 'scale_vaapi=w=1280:h=720' \
  -c:v h264_vaapi \
  output.mp4

VVC VAAPI Decoding (FFmpeg 8.0+)

FFmpeg 8.0 adds VVC/H.266 hardware decoding on Intel and AMD GPUs via VAAPI.

# Hardware VVC/H.266 decoding
ffmpeg -hwaccel vaapi \
  -hwaccel_device /dev/dri/renderD128 \
  -hwaccel_output_format vaapi \
  -i input.vvc \
  -c:v h264_vaapi \
  output.mp4

# VVC decode + transcode to H.265
ffmpeg -hwaccel vaapi \
  -hwaccel_device /dev/dri/renderD128 \
  -hwaccel_output_format vaapi \
  -i input.mkv \
  -c:v hevc_vaapi \
  -b:v 4M \
  output.mp4

# VVC with Screen Content Coding (SCC) support
# FFmpeg 8.0 adds full SCC support including:
# - IBC (Inter Block Copy)
# - Palette Mode
# - ACT (Adaptive Color Transform)
ffmpeg -hwaccel vaapi \
  -hwaccel_device /dev/dri/renderD128 \
  -i screen_recording.vvc \
  output.mp4

VVC VAAPI Requirements:

Intel Xe2 graphics (Lunar Lake) or newer for full VVC support
FFmpeg 8.0 or later
Intel media driver with VVC support

Apple VideoToolbox

Requirements

macOS 10.8+ or iOS 8+
FFmpeg built with --enable-videotoolbox

Check VideoToolbox Support

ffmpeg -encoders | grep videotoolbox
ffmpeg -decoders | grep videotoolbox

Basic VideoToolbox Encoding

# H.264 VideoToolbox
ffmpeg -i input.mp4 \
  -c:v h264_videotoolbox \
  -b:v 5M \
  output.mp4

# H.265/HEVC VideoToolbox
ffmpeg -i input.mp4 \
  -c:v hevc_videotoolbox \
  -b:v 4M \
  -tag:v hvc1 \
  output.mp4

# ProRes VideoToolbox
ffmpeg -i input.mp4 \
  -c:v prores_videotoolbox \
  -profile:v 3 \
  output.mov

VideoToolbox Quality Settings

# Quality-based encoding
ffmpeg -i input.mp4 \
  -c:v h264_videotoolbox \
  -q:v 65 \
  output.mp4

# Hardware accelerated decode + encode
ffmpeg -hwaccel videotoolbox \
  -i input.mp4 \
  -c:v h264_videotoolbox \
  -b:v 5M \
  output.mp4

Vulkan Video (FFmpeg 7.1+/8.0)

Overview

FFmpeg 7.1 Vulkan Features

H.264 Vulkan encoding
H.265/HEVC Vulkan encoding

FFmpeg 8.0 Vulkan Features (New)

AV1 Vulkan encoding - GPU-accelerated AV1 via compute shaders
VP9 Vulkan decoding - Hardware-accelerated VP9 decode
FFv1 Vulkan encode/decode - Lossless codec for archival/capture
ProRes RAW Vulkan decode - Apple ProRes RAW hardware decode

Benefits of Vulkan Compute Codecs:

Cross-platform: Same code works on AMD, Intel, and NVIDIA
No vendor lock-in: Works with any Vulkan 1.3 driver
Ideal for: Lossless screen capture, high-throughput archival, professional workflows

Basic Vulkan Encoding

# H.264 Vulkan
ffmpeg -init_hw_device vulkan \
  -i input.mp4 \
  -c:v h264_vulkan \
  -b:v 5M \
  output.mp4

# H.265/HEVC Vulkan
ffmpeg -init_hw_device vulkan \
  -i input.mp4 \
  -c:v hevc_vulkan \
  -b:v 4M \
  output.mp4

# AV1 Vulkan (FFmpeg 8.0+)
ffmpeg -init_hw_device vulkan \
  -i input.mp4 \
  -c:v av1_vulkan \
  -b:v 3M \
  output.mp4

# FFv1 Vulkan Lossless (FFmpeg 8.0+)
ffmpeg -init_hw_device vulkan \
  -i input.mp4 \
  -c:v ffv1_vulkan \
  output.mkv

Full Vulkan Pipeline

# Complete Vulkan decode-filter-encode
ffmpeg -init_hw_device vulkan=vk \
  -filter_hw_device vk \
  -hwaccel vulkan \
  -hwaccel_output_format vulkan \
  -i input.mp4 \
  -vf "scale_vulkan=1280:720" \
  -c:v h264_vulkan \
  output.mp4

VP9 Vulkan Decoding (FFmpeg 8.0+)

# Hardware decode VP9 with Vulkan
ffmpeg -init_hw_device vulkan \
  -hwaccel vulkan \
  -hwaccel_output_format vulkan \
  -i input.webm \
  -c:v h264_vulkan \
  output.mp4

ProRes RAW Vulkan Decoding (FFmpeg 8.0+)

# Hardware decode ProRes RAW with Vulkan
ffmpeg -init_hw_device vulkan \
  -hwaccel vulkan \
  -i input.mov \
  -c:v libx264 \
  output.mp4

Lossless Screen Capture with FFv1 Vulkan

# High-throughput lossless capture (Linux X11)
ffmpeg -init_hw_device vulkan \
  -f x11grab -framerate 60 -i :0.0 \
  -c:v ffv1_vulkan \
  screen_capture.mkv

# Lossless screen recording (Windows)
ffmpeg -init_hw_device vulkan \
  -f gdigrab -framerate 60 -i desktop \
  -c:v ffv1_vulkan \
  screen_capture.mkv

Upcoming Vulkan Codecs

The next minor update will add:

ProRes (encode and decode)
VC-2 (encode and decode)

Vulkan Filters (FFmpeg 8.0+)

Vulkan filters provide cross-platform GPU acceleration for video processing.

Vulkan Filter Reference

Filter	Description	Key Parameters
`scale_vulkan`	GPU scaling	`w`, `h`, `scaler`
`overlay_vulkan`	Compositing	`x`, `y`
`transpose_vulkan`	Rotate/transpose	`dir`, `passthrough`
`flip_vulkan`	Horizontal flip	-
`avgblur_vulkan`	Box blur	`sizeX`, `sizeY`
`gblur_vulkan`	Gaussian blur	`sigma`, `sigmaV`, `planes`
`chromaber_vulkan`	Chromatic aberration	`dist_x`, `dist_y`
`nlmeans_vulkan`	NLMeans denoising	`s`, `p`, `r`
`bwdif_vulkan`	Deinterlacing	`mode`, `parity`, `deint`
`xfade_vulkan`	Transitions	`transition`, `duration`, `offset`
`libplacebo`	Libplacebo processing	`w`, `h`, `colorspace`

Vulkan Scaling

# Basic Vulkan scaling
ffmpeg -init_hw_device vulkan=vk \
  -filter_hw_device vk \
  -i input.mp4 \
  -vf "hwupload,scale_vulkan=1920:1080,hwdownload,format=yuv420p" \
  output.mp4

# Full Vulkan pipeline with scaling
ffmpeg -init_hw_device vulkan=vk \
  -filter_hw_device vk \
  -hwaccel vulkan -hwaccel_output_format vulkan \
  -i input.mp4 \
  -vf "scale_vulkan=1280:720" \
  -c:v h264_vulkan output.mp4

# Vulkan scaling with specific scaler
ffmpeg -init_hw_device vulkan \
  -i input.mp4 \
  -vf "hwupload,scale_vulkan=w=1920:h=1080:scaler=lanczos,hwdownload,format=yuv420p" \
  output.mp4

Vulkan Compositing (overlay_vulkan)

# Picture-in-picture with Vulkan
ffmpeg -init_hw_device vulkan=vk \
  -filter_hw_device vk \
  -i main.mp4 -i overlay.mp4 \
  -filter_complex "\
    [0:v]hwupload[main];\
    [1:v]hwupload,scale_vulkan=320:180[pip];\
    [main][pip]overlay_vulkan=x=10:y=10[out];\
    [out]hwdownload,format=yuv420p" \
  -map "[out]" output.mp4

# Full Vulkan overlay pipeline
ffmpeg -init_hw_device vulkan=vk \
  -filter_hw_device vk \
  -hwaccel vulkan -hwaccel_output_format vulkan -i main.mp4 \
  -hwaccel vulkan -hwaccel_output_format vulkan -i overlay.mp4 \
  -filter_complex "[0:v][1:v]overlay_vulkan=x=W-w-10:y=10" \
  -c:v h264_vulkan output.mp4

Vulkan Rotation and Transpose

# Transpose (rotate 90° clockwise)
ffmpeg -init_hw_device vulkan \
  -i input.mp4 \
  -vf "hwupload,transpose_vulkan=dir=clock,hwdownload,format=yuv420p" \
  output.mp4

# Horizontal flip
ffmpeg -init_hw_device vulkan \
  -i input.mp4 \
  -vf "hwupload,flip_vulkan,hwdownload,format=yuv420p" \
  output.mp4

transpose_vulkan directions:

dir	Rotation
`cclock_flip`	90° counter-clockwise + vertical flip
`clock`	90° clockwise
`cclock`	90° counter-clockwise
`clock_flip`	90° clockwise + vertical flip

Vulkan Blur Effects

# Gaussian blur
ffmpeg -init_hw_device vulkan \
  -i input.mp4 \
  -vf "hwupload,gblur_vulkan=sigma=5,hwdownload,format=yuv420p" \
  output.mp4

# Box blur (averaging)
ffmpeg -init_hw_device vulkan \
  -i input.mp4 \
  -vf "hwupload,avgblur_vulkan=sizeX=5:sizeY=5,hwdownload,format=yuv420p" \
  output.mp4

Vulkan Chromatic Aberration

# Add chromatic aberration effect
ffmpeg -init_hw_device vulkan \
  -i input.mp4 \
  -vf "hwupload,chromaber_vulkan=dist_x=-0.002:dist_y=0.002,hwdownload,format=yuv420p" \
  output.mp4

Vulkan Denoising (nlmeans_vulkan)

# Non-local means denoising on GPU
ffmpeg -init_hw_device vulkan \
  -i noisy.mp4 \
  -vf "hwupload,nlmeans_vulkan=s=3:p=7:r=15,hwdownload,format=yuv420p" \
  output.mp4

# Full Vulkan denoising pipeline
ffmpeg -init_hw_device vulkan=vk \
  -filter_hw_device vk \
  -hwaccel vulkan -hwaccel_output_format vulkan \
  -i noisy.mp4 \
  -vf "nlmeans_vulkan=s=4" \
  -c:v h264_vulkan output.mp4

Vulkan Deinterlacing (bwdif_vulkan)

# Bob-Weaver deinterlacing
ffmpeg -init_hw_device vulkan \
  -i interlaced.mp4 \
  -vf "hwupload,bwdif_vulkan=mode=send_frame,hwdownload,format=yuv420p" \
  output.mp4

# Full Vulkan deinterlace pipeline
ffmpeg -init_hw_device vulkan=vk \
  -filter_hw_device vk \
  -hwaccel vulkan -hwaccel_output_format vulkan \
  -i interlaced.mp4 \
  -vf "bwdif_vulkan=1" \
  -c:v h264_vulkan output.mp4

Vulkan Transitions (xfade_vulkan)

# GPU-accelerated crossfade between two clips
ffmpeg -init_hw_device vulkan=vk \
  -filter_hw_device vk \
  -i clip1.mp4 -i clip2.mp4 \
  -filter_complex "\
    [0:v]hwupload[v1];\
    [1:v]hwupload[v2];\
    [v1][v2]xfade_vulkan=transition=fade:duration=1:offset=4[out];\
    [out]hwdownload,format=yuv420p" \
  -map "[out]" output.mp4

xfade_vulkan transitions: fade, dissolve, wipeleft, wiperight, wipeup, wipedown, slideleft, slideright, slideup, slidedown, and more.

Libplacebo Processing (Advanced)

Libplacebo provides advanced color management and processing:

# HDR tonemapping with libplacebo
ffmpeg -init_hw_device vulkan \
  -i hdr_input.mp4 \
  -vf "hwupload,libplacebo=tonemapping=hable:colorspace=bt709:color_primaries=bt709:color_trc=bt709,hwdownload,format=yuv420p" \
  sdr_output.mp4

# Scaling with libplacebo (high quality)
ffmpeg -init_hw_device vulkan \
  -i input.mp4 \
  -vf "hwupload,libplacebo=w=3840:h=2160:upscaler=ewa_lanczos,hwdownload,format=yuv420p10le" \
  output.mp4

OpenCL Filters

OpenCL provides GPU acceleration that works across vendors (NVIDIA, AMD, Intel).

OpenCL Filter Reference

Filter	Description	Key Parameters
`scale_opencl`	GPU scaling	`w`, `h`
`overlay_opencl`	Compositing	`x`, `y`
`pad_opencl`	Padding	`w`, `h`, `x`, `y`, `color`
`colorkey_opencl`	Chroma keying	`color`, `similarity`, `blend`
`unsharp_opencl`	Sharpening	`lx`, `ly`, `la`, `cx`, `cy`, `ca`
`nlmeans_opencl`	NLMeans denoising	`s`, `p`, `r`
`deshake_opencl`	Stabilization	`tripod`, `smooth`, `filename`
`tonemap_opencl`	HDR tonemapping	`tonemap`, `transfer`, `matrix`
`avgblur_opencl`	Box blur	`sizeX`, `sizeY`
`convolution_opencl`	Custom kernel	`0m`, `1m`, etc.

OpenCL Initialization

# List OpenCL devices
ffmpeg -init_hw_device opencl=gpu:0.0 -filter_hw_device gpu -f lavfi -i nullsrc -vf "hwupload,avgblur_opencl=2" -t 1 -f null -

# Basic OpenCL filter pattern
ffmpeg -init_hw_device opencl=gpu \
  -filter_hw_device gpu \
  -i input.mp4 \
  -vf "hwupload,<filter_opencl>,hwdownload,format=yuv420p" \
  output.mp4

OpenCL Scaling

# GPU scaling with OpenCL
ffmpeg -init_hw_device opencl=gpu \
  -filter_hw_device gpu \
  -i input.mp4 \
  -vf "hwupload,scale_opencl=1920:1080,hwdownload,format=yuv420p" \
  output.mp4

OpenCL Compositing

# Picture-in-picture with OpenCL
ffmpeg -init_hw_device opencl=gpu \
  -filter_hw_device gpu \
  -i main.mp4 -i overlay.mp4 \
  -filter_complex "\
    [0:v]hwupload[main];\
    [1:v]hwupload,scale_opencl=320:180[pip];\
    [main][pip]overlay_opencl=x=10:y=10[out];\
    [out]hwdownload,format=yuv420p" \
  -map "[out]" output.mp4

OpenCL Chroma Keying (colorkey_opencl)

# Green screen removal with OpenCL
ffmpeg -init_hw_device opencl=gpu \
  -filter_hw_device gpu \
  -i greenscreen.mp4 -i background.mp4 \
  -filter_complex "\
    [0:v]hwupload,colorkey_opencl=0x00FF00:0.3:0.1[fg];\
    [1:v]hwupload[bg];\
    [bg][fg]overlay_opencl[out];\
    [out]hwdownload,format=yuv420p" \
  -map "[out]" output.mp4

OpenCL Denoising (nlmeans_opencl)

# Non-local means denoising on GPU
ffmpeg -init_hw_device opencl=gpu \
  -filter_hw_device gpu \
  -i noisy.mp4 \
  -vf "hwupload,nlmeans_opencl=s=3:p=7:r=15,hwdownload,format=yuv420p" \
  output.mp4

# Strong denoising
ffmpeg -init_hw_device opencl=gpu \
  -filter_hw_device gpu \
  -i noisy.mp4 \
  -vf "hwupload,nlmeans_opencl=s=5:p=7:r=21,hwdownload,format=yuv420p" \
  output.mp4

OpenCL Stabilization (deshake_opencl)

# GPU-accelerated video stabilization
ffmpeg -init_hw_device opencl=gpu \
  -filter_hw_device gpu \
  -i shaky.mp4 \
  -vf "hwupload,deshake_opencl,hwdownload,format=yuv420p" \
  output.mp4

# Tripod mode (camera stays in one place)
ffmpeg -init_hw_device opencl=gpu \
  -filter_hw_device gpu \
  -i shaky.mp4 \
  -vf "hwupload,deshake_opencl=tripod=1,hwdownload,format=yuv420p" \
  output.mp4

OpenCL Sharpening (unsharp_opencl)

# Basic sharpening
ffmpeg -init_hw_device opencl=gpu \
  -filter_hw_device gpu \
  -i input.mp4 \
  -vf "hwupload,unsharp_opencl=lx=5:ly=5:la=1.0,hwdownload,format=yuv420p" \
  output.mp4

# Subtle sharpening for HD video
ffmpeg -init_hw_device opencl=gpu \
  -filter_hw_device gpu \
  -i input.mp4 \
  -vf "hwupload,unsharp_opencl=lx=3:ly=3:la=0.5,hwdownload,format=yuv420p" \
  output.mp4

OpenCL HDR Tonemapping (tonemap_opencl)

# HDR to SDR conversion with GPU
ffmpeg -init_hw_device opencl=gpu \
  -filter_hw_device gpu \
  -i hdr_input.mp4 \
  -vf "hwupload,tonemap_opencl=tonemap=hable:transfer=bt709:matrix=bt709:primaries=bt709,hwdownload,format=yuv420p" \
  sdr_output.mp4

# Reinhard tonemapping
ffmpeg -init_hw_device opencl=gpu \
  -filter_hw_device gpu \
  -i hdr_input.mp4 \
  -vf "hwupload,tonemap_opencl=tonemap=reinhard:param=0.5,hwdownload,format=yuv420p" \
  sdr_output.mp4

tonemap_opencl algorithms:

Algorithm	Description
`none`	No tonemapping
`clip`	Simple clipping
`linear`	Linear scaling
`gamma`	Gamma-based
`reinhard`	Reinhard operator
`hable`	Hable/Filmic (most popular)
`mobius`	Mobius (preserves blacks)
`bt2390`	ITU-R BT.2390 (broadcast standard)

OpenCL Blur

# Box blur
ffmpeg -init_hw_device opencl=gpu \
  -filter_hw_device gpu \
  -i input.mp4 \
  -vf "hwupload,avgblur_opencl=sizeX=5:sizeY=5,hwdownload,format=yuv420p" \
  output.mp4

OpenCL Custom Convolution

# Edge detection kernel
ffmpeg -init_hw_device opencl=gpu \
  -filter_hw_device gpu \
  -i input.mp4 \
  -vf "hwupload,convolution_opencl=0m='-1 -1 -1 -1 8 -1 -1 -1 -1':0rdiv=1:0bias=128,hwdownload,format=yuv420p" \
  output.mp4

# Sharpen kernel
ffmpeg -init_hw_device opencl=gpu \
  -filter_hw_device gpu \
  -i input.mp4 \
  -vf "hwupload,convolution_opencl=0m='0 -1 0 -1 5 -1 0 -1 0':0rdiv=1:0bias=0,hwdownload,format=yuv420p" \
  output.mp4

GPU Filter Comparison

Cross-Platform Filter Availability

Filter Type	CUDA	Vulkan	OpenCL	VAAPI	QSV
Scaling	✅ `scale_cuda`	✅ `scale_vulkan`	✅ `scale_opencl`	✅ `scale_vaapi`	✅ `vpp_qsv`
Overlay	✅ `overlay_cuda`	✅ `overlay_vulkan`	✅ `overlay_opencl`	-	-
Denoising	✅ `bilateral_cuda`	✅ `nlmeans_vulkan`	✅ `nlmeans_opencl`	-	-
Deinterlace	✅ `bwdif_cuda`	✅ `bwdif_vulkan`	-	✅ `deinterlace_vaapi`	✅ `vpp_qsv`
Chromakey	✅ `chromakey_cuda`	-	✅ `colorkey_opencl`	-	-
Tonemapping	-	✅ `libplacebo`	✅ `tonemap_opencl`	✅ `tonemap_vaapi`	-
Stabilization	-	-	✅ `deshake_opencl`	-	-
Padding	✅ `pad_cuda`	-	✅ `pad_opencl`	-	-

Performance Comparison

API	Platform Support	Filter Variety	Performance
CUDA	NVIDIA only	Excellent	Best on NVIDIA
Vulkan	Cross-platform	Good (growing)	Very good
OpenCL	Cross-platform	Good	Good
VAAPI	Linux only	Limited	Good on Linux
QSV	Intel only	Limited	Good on Intel

Docker with Hardware Acceleration

NVIDIA GPU in Docker

# Run with NVIDIA GPU support
docker run --gpus all \
  --rm \
  -v $(pwd):/data \
  jrottenberg/ffmpeg:nvidia \
  -hwaccel cuda \
  -hwaccel_output_format cuda \
  -i /data/input.mp4 \
  -c:v h264_nvenc \
  /data/output.mp4

Intel QSV in Docker

# Run with Intel GPU access
docker run --rm \
  --device=/dev/dri:/dev/dri \
  -v $(pwd):/data \
  jrottenberg/ffmpeg:vaapi \
  -hwaccel qsv \
  -i /data/input.mp4 \
  -c:v h264_qsv \
  /data/output.mp4

Troubleshooting

Common Issues

"No NVENC capable devices found"

# Check GPU support
nvidia-smi
# Verify driver version
nvidia-smi --query-gpu=driver_version --format=csv
# Check CUDA version
nvcc --version

"Cannot load libcuda.so"

# Set library path (Linux)
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH

QSV "Unsupported format"

# Force pixel format conversion
ffmpeg -i input.mp4 \
  -vf "format=nv12" \
  -c:v h264_qsv \
  output.mp4

VAAPI permission denied

# Add user to video/render group
sudo usermod -aG video $USER
sudo usermod -aG render $USER
# Re-login or use newgrp

Performance Debugging

# Benchmark encode speed
ffmpeg -benchmark -i input.mp4 -c:v h264_nvenc -f null -

# Show hardware decode stats
ffmpeg -hwaccel cuda -hwaccel_output_format cuda -benchmark -i input.mp4 -f null -

# Monitor GPU usage (NVIDIA)
nvidia-smi dmon -s u

# Monitor GPU usage (Intel)
intel_gpu_top

Best Practices

Use full GPU pipelines when possible to avoid CPU-GPU memory transfers
Match decode and encode hardware for best performance
Use appropriate presets - faster isn't always better for quality
Enable lookahead and AQ for quality-critical encodes
Test on target hardware - quality varies by GPU generation
Monitor GPU memory for high-resolution content
Consider power efficiency for laptops and servers
Update drivers regularly for performance and feature improvements

Recommended Settings by Use Case

Live Streaming

ffmpeg -hwaccel cuda -hwaccel_output_format cuda -i input \
  -c:v h264_nvenc -preset p3 -tune ll -zerolatency 1 -b:v 6M \
  -f flv rtmp://server/live/stream

VOD Encoding (Quality)

ffmpeg -i input.mp4 \
  -c:v hevc_nvenc -preset p6 -tune hq \
  -rc vbr -cq 22 -b:v 0 \
  -rc-lookahead 32 -spatial-aq 1 \
  output.mp4

Batch Processing

# Multiple parallel streams on one GPU
ffmpeg -hwaccel cuda -hwaccel_output_format cuda -i input1.mp4 -c:v h264_nvenc output1.mp4 &
ffmpeg -hwaccel cuda -hwaccel_output_format cuda -i input2.mp4 -c:v h264_nvenc output2.mp4 &
wait

This guide covers hardware acceleration fundamentals. For specific platform optimizations and advanced configurations, consult the respective vendor documentation.