Authoritative Python-FFmpeg parameter integration reference ensuring type safety, accurate parameter mappings, and proper unit conversions. PROACTIVELY activate for: (1) ffmpeg-python library usage, (2) Python subprocess FFmpeg calls, (3) Caption/subtitle parameter mapping (drawtext, ASS), (4) Color format conversions (BGR, RGB, ABGR, ASS &HAABBGGRR), (5) Time unit conversions (seconds, centiseconds, milliseconds), (6) Type safety validation (int, float, string), (7) Coordinate systems, (8) Parameter range enforcement, (9) Frame pipe handling, (10) Error detection for type mismatches. Provides: Complete parameter type reference, color format conversion tables, time unit conversion formulas, validation patterns, working Python examples with proper typing.
Provides type-safe Python-FFmpeg parameter mappings for color formats, time units, and validation.
/plugin marketplace add JosiahSiegel/claude-plugin-marketplace
/plugin install ffmpeg-master@claude-plugin-marketplace

This skill inherits all available tools. When active, it can use any tool Claude has access to.
MANDATORY: Always Use Backslashes on Windows for File Paths
When using Edit or Write tools on Windows, you MUST use backslashes (\) in file paths, NOT forward slashes (/).
Complete type-safe parameter mapping reference for integrating FFmpeg with Python.
| FFmpeg Parameter | Python Type | Range/Format | Common Mistake |
|---|---|---|---|
| -crf | int or str | 0-51 (H.264/H.265) | Using float: crf=18.5 ❌ |
| -b:v | str | "5M", "1000k" | Using int: b:v=5000000 ❌ |
| fontsize | int | 1-999 | Using str: fontsize="24" ❌ |
| fontcolor | str | "white", "#FFFFFF", "0xFFFFFF" | Wrong format: fontcolor="255,255,255" ❌ |
| ASS PrimaryColour | str | "&HAABBGGRR" | Using RGB order: red as &H00FF0000 ❌ (should be &H000000FF, BGR) |
| alpha | str (expression) | "0.5", "min(1,t/2)" | Using float: alpha=0.5 ❌ |
| x, y | str or int | "100", "(w-tw)/2" | Forgetting quotes on expressions ❌ |
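As a minimal sketch (file names are placeholders), the same parameters passed with the correct types through ffmpeg-python:

import ffmpeg

src = ffmpeg.input("input.mp4")
video = src.video.drawtext(
    text="Sample",
    fontsize=48,        # int, not "48"
    fontcolor="white",  # named or hex string
    x="(w-tw)/2",       # FFmpeg expression, passed as a quoted string
    y="h-th-40"
)
# crf is an int; bitrates such as "192k" or "5M" are strings with a unit suffix
out = ffmpeg.output(video, src.audio, "output.mp4", vcodec="libx264", crf=18, audio_bitrate="192k")
ffmpeg.run(out, overwrite_output=True)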
| Context | Format | Byte Order | Example | Python Type |
|---|---|---|---|---|
| FFmpeg drawtext | Named/Hex | RGB | "white", "#FFFFFF", "0xFFFFFF" | str |
| ASS PrimaryColour | &HAABBGGRR | ABGR | &H00FFFFFF (white) | str |
| ASS OutlineColour | &HAABBGGRR | ABGR | &H00000000 (black) | str |
| OpenCV cv2 | Array | BGR | [255, 255, 255] | np.ndarray |
| PIL/Pillow | Tuple/Hex | RGB | (255, 255, 255), "#FFFFFF" | tuple or str |
| NumPy | Array | RGB or BGR | Depends on source | np.ndarray |
from typing import Tuple
def rgb_to_bgr_hex(r: int, g: int, b: int) -> str:
"""
Convert RGB (0-255) to BGR hex string.
Used for OpenCV color specifications.
Args:
r, g, b: Red, Green, Blue (0-255)
Returns:
Hex string in BGR order: "0xBBGGRR"
"""
return f"0x{b:02X}{g:02X}{r:02X}"
def rgb_to_ass_color(r: int, g: int, b: int, alpha: int = 0) -> str:
"""
Convert RGB to ASS/SSA color format (&HAABBGGRR).
CRITICAL: ASS uses BGR order, not RGB!
Args:
r, g, b: Red, Green, Blue (0-255)
alpha: Alpha channel (0=opaque, 255=transparent)
Returns:
ASS color string: "&HAABBGGRR"
Examples:
>>> rgb_to_ass_color(255, 255, 255) # White
'&H00FFFFFF'
>>> rgb_to_ass_color(255, 0, 0) # Red
'&H000000FF'
>>> rgb_to_ass_color(0, 255, 0) # Green
'&H0000FF00'
>>> rgb_to_ass_color(0, 0, 255) # Blue
'&H00FF0000'
"""
return f"&H{alpha:02X}{b:02X}{g:02X}{r:02X}"
def ass_color_to_rgb(ass_color: str) -> Tuple[int, int, int, int]:
"""
Parse ASS color format (&HAABBGGRR) to RGBA.
Args:
ass_color: ASS color string like "&H00FFFFFF"
Returns:
Tuple (r, g, b, alpha)
"""
# Remove &H prefix
hex_val = ass_color.replace("&H", "").replace("&h", "")
# Pad to 8 characters if needed (some formats omit alpha)
hex_val = hex_val.zfill(8)
# Extract AABBGGRR
alpha = int(hex_val[0:2], 16)
blue = int(hex_val[2:4], 16)
green = int(hex_val[4:6], 16)
red = int(hex_val[6:8], 16)
return (red, green, blue, alpha)
def ffmpeg_color_to_rgb(color: str) -> Tuple[int, int, int]:
"""
Parse FFmpeg named or hex color to RGB.
Args:
color: Named color ("white") or hex ("#FFFFFF", "0xFFFFFF")
Returns:
Tuple (r, g, b)
"""
# Named colors (subset)
named_colors = {
"white": (255, 255, 255),
"black": (0, 0, 0),
"red": (255, 0, 0),
"green": (0, 255, 0),
"blue": (0, 0, 255),
"yellow": (255, 255, 0),
"cyan": (0, 255, 255),
"magenta": (255, 0, 255),
}
if color.lower() in named_colors:
return named_colors[color.lower()]
# Parse hex
hex_val = color.replace("#", "").replace("0x", "")
r = int(hex_val[0:2], 16)
g = int(hex_val[2:4], 16)
b = int(hex_val[4:6], 16)
return (r, g, b)
# Common color presets (ASS format)
ASS_COLORS = {
"white": "&H00FFFFFF",
"black": "&H00000000",
"red": "&H000000FF",
"green": "&H0000FF00",
"blue": "&H00FF0000",
"yellow": "&H0000FFFF",
"cyan": "&H00FFFF00",
"magenta": "&H00FF00FF",
"orange": "&H0000A5FF", # RGB(255, 165, 0)
"purple": "&H00800080", # RGB(128, 0, 128)
}
# Transparency examples (ASS alpha channel)
ASS_ALPHA = {
"opaque": 0x00, # Fully opaque (0%)
"transparent_10": 0x1A, # 10% transparent
"transparent_25": 0x40, # 25% transparent
"transparent_50": 0x80, # 50% transparent (common for shadows)
"transparent_75": 0xBF, # 75% transparent
"invisible": 0xFF, # Fully transparent (100%)
}
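A small hedged helper (the function name is illustrative) showing how these alpha presets combine with rgb_to_ass_color defined above:

def ass_color_with_alpha(r: int, g: int, b: int, alpha_name: str = "opaque") -> str:
    """Build an ASS color string using a named alpha preset from ASS_ALPHA."""
    return rgb_to_ass_color(r, g, b, alpha=ASS_ALPHA[alpha_name])

# 50% transparent black, e.g. for BackColour (shadow)
shadow_colour = ass_color_with_alpha(0, 0, 0, "transparent_50")  # '&H80000000'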
| Color | RGB | Hex (RGB) | ASS (&HAABBGGRR) | OpenCV BGR Array |
|---|---|---|---|---|
| White | (255,255,255) | #FFFFFF | &H00FFFFFF | [255,255,255] |
| Black | (0,0,0) | #000000 | &H00000000 | [0,0,0] |
| Red | (255,0,0) | #FF0000 | &H000000FF | [0,0,255] |
| Green | (0,255,0) | #00FF00 | &H0000FF00 | [0,255,0] |
| Blue | (0,0,255) | #0000FF | &H00FF0000 | [255,0,0] |
| Yellow | (255,255,0) | #FFFF00 | &H0000FFFF | [0,255,255] |
| Cyan | (0,255,255) | #00FFFF | &H00FFFF00 | [255,255,0] |
| Magenta | (255,0,255) | #FF00FF | &H00FF00FF | [255,0,255] |
| Orange | (255,165,0) | #FFA500 | &H0000A5FF | [0,165,255] |
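As a quick sanity check, the converters defined above reproduce the table entries:

# Spot-check the equivalence table with the conversion helpers defined earlier
assert rgb_to_ass_color(255, 165, 0) == "&H0000A5FF"      # Orange
assert rgb_to_bgr_hex(255, 0, 0) == "0x0000FF"            # Red as BGR hex (OpenCV-style)
assert ass_color_to_rgb("&H00FF0000") == (0, 0, 255, 0)   # ASS blue round-trips to RGB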
| Context | Unit | Python Type | Conversion Formula | Example |
|---|---|---|---|---|
| FFmpeg filters (fade, xfade) | Seconds | float or int | N/A | duration=1.5 |
| ASS karaoke (\k, \kf, \ko) | Centiseconds | int | cs = seconds * 100 | {\k50} = 0.5s |
| ASS animation (\t, \fad, \move) | Milliseconds | int | ms = seconds * 1000 | \t(0,500,...) = 0.5s |
from typing import Union
def seconds_to_centiseconds(seconds: float) -> int:
"""
Convert seconds to centiseconds for ASS karaoke tags.
Args:
seconds: Time in seconds
Returns:
Centiseconds (1/100 second)
Examples:
>>> seconds_to_centiseconds(0.5)
50
>>> seconds_to_centiseconds(1.0)
100
>>> seconds_to_centiseconds(2.5)
250
"""
return int(seconds * 100)
def seconds_to_milliseconds(seconds: float) -> int:
"""
Convert seconds to milliseconds for ASS animation tags.
Args:
seconds: Time in seconds
Returns:
Milliseconds (1/1000 second)
Examples:
>>> seconds_to_milliseconds(0.5)
500
>>> seconds_to_milliseconds(1.0)
1000
>>> seconds_to_milliseconds(0.2)
200
"""
return int(seconds * 1000)
def centiseconds_to_seconds(centiseconds: int) -> float:
"""
Convert ASS karaoke centiseconds to seconds.
Args:
centiseconds: Duration in centiseconds
Returns:
Seconds
"""
return centiseconds / 100.0
def milliseconds_to_seconds(milliseconds: int) -> float:
"""
Convert ASS animation milliseconds to seconds.
Args:
milliseconds: Duration in milliseconds
Returns:
Seconds
"""
return milliseconds / 1000.0
def format_ass_timestamp(seconds: float) -> str:
"""
Format seconds as ASS timestamp (H:MM:SS.CC).
Args:
seconds: Time in seconds
Returns:
ASS timestamp string
Examples:
>>> format_ass_timestamp(1.5)
'0:00:01.50'
>>> format_ass_timestamp(65.25)
'0:01:05.25'
>>> format_ass_timestamp(3661.0)
'1:01:01.00'
"""
hours = int(seconds // 3600)
minutes = int((seconds % 3600) // 60)
secs = int(seconds % 60)
centis = int((seconds % 1) * 100)
return f"{hours}:{minutes:02d}:{secs:02d}.{centis:02d}"
def parse_ass_timestamp(timestamp: str) -> float:
"""
Parse ASS timestamp to seconds.
Args:
timestamp: ASS timestamp like "0:00:05.50"
Returns:
Time in seconds
Examples:
>>> parse_ass_timestamp("0:00:01.50")
1.5
>>> parse_ass_timestamp("0:01:05.25")
65.25
"""
parts = timestamp.split(":")
hours = int(parts[0])
minutes = int(parts[1])
sec_parts = parts[2].split(".")
seconds = int(sec_parts[0])
centiseconds = int(sec_parts[1]) if len(sec_parts) > 1 else 0
total = hours * 3600 + minutes * 60 + seconds + centiseconds / 100.0
return total
# Quick conversion constants
SECOND_TO_CS = 100 # Centiseconds per second
SECOND_TO_MS = 1000 # Milliseconds per second
CS_TO_MS = 10 # Milliseconds per centisecond
| Human Readable | Seconds | Centiseconds (ASS \k) | Milliseconds (ASS \t) |
|---|---|---|---|
| 100ms (flash) | 0.1 | 10 | 100 |
| 250ms (quick) | 0.25 | 25 | 250 |
| 500ms (half second) | 0.5 | 50 | 500 |
| 1 second | 1.0 | 100 | 1000 |
| 1.5 seconds | 1.5 | 150 | 1500 |
| 2 seconds | 2.0 | 200 | 2000 |
| Parameter | Python Type | Description | Example |
|---|---|---|---|
| text | str | Text to display | text='Hello World' |
| textfile | str (path) | Path to text file | textfile='/path/to/text.txt' |
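When the caption contains characters that drawtext treats specially (colons, quotes, percent signs), writing it to a file and passing textfile is often the safer route; a minimal sketch using a temporary file:

import tempfile
import ffmpeg

caption = 'Price: 100% "official"'
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False, encoding="utf-8") as f:
    f.write(caption)
    caption_path = f.name

stream = ffmpeg.input("input.mp4").drawtext(
    textfile=caption_path,  # sidesteps drawtext escaping rules for ':', '%', quotes
    fontsize=36,
    fontcolor="white"
)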
| Parameter | Python Type | Range/Format | Validation | Example |
|---|---|---|---|---|
| fontfile | str (path) | Absolute path to .ttf/.otf | File exists | fontfile='/fonts/Arial.ttf' |
| fontsize | int | 1-999 (practical: 12-200) | 1 <= size <= 999 | fontsize=48 |
| fontcolor | str | Named or hex RGB | Valid color | fontcolor='white' |
| fontcolor_expr | str (expression) | Dynamic color expression | Valid FFmpeg expr | fontcolor_expr='0xFFFFFF' |
CRITICAL: fontsize MUST be int, not str. Common error:
# ❌ WRONG:
drawtext(fontsize="24")
# ✅ CORRECT:
drawtext(fontsize=24)
| Parameter | Python Type | Format | Example |
|---|---|---|---|
| x | str or int | Pixel value or expression | x=10 or x='(w-tw)/2' |
| y | str or int | Pixel value or expression | y=50 or y='h-th-20' |
Position Expression Variables:
- w: Video width (pixels)
- h: Video height (pixels)
- tw: Text width (pixels)
- th: Text height (pixels)
- t: Time in seconds

# Static position (int)
x=100, y=50
# Dynamic position (str expression)
x='(w-tw)/2' # Centered horizontally
y='h-th-20' # 20px from bottom
# Time-based animation (str expression)
x='w-mod(t*100,w+tw)' # Scrolling ticker
| Parameter | Python Type | Range | Example |
|---|---|---|---|
| borderw | int | 0-20 (practical) | borderw=3 |
| bordercolor | str | Named or hex RGB | bordercolor='black' |
| shadowx | int | -50 to 50 (practical) | shadowx=2 |
| shadowy | int | -50 to 50 (practical) | shadowy=2 |
| shadowcolor | str | Named or hex RGB | shadowcolor='black' |
| box | int | 0 (off) or 1 (on) | box=1 |
| boxcolor | str | Named/hex + alpha | boxcolor='black@0.5' |
| boxborderw | int | 0-50 | boxborderw=5 |
Alpha Transparency in Colors:
# Format: "color@opacity"
# opacity: 0.0 (transparent) to 1.0 (opaque)
boxcolor='black@0.5' # 50% transparent black
shadowcolor='red@0.3' # 30% opaque red
fontcolor='white@0.8' # 80% opaque white
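Putting the styling parameters together, a hedged sketch of a lower-third caption with border, shadow, and a semi-transparent background box (file name is a placeholder):

import ffmpeg

styled = ffmpeg.input("input.mp4").drawtext(
    text="Lower third caption",
    fontsize=42,               # int
    fontcolor="white",
    x=40,
    y="h-th-60",
    borderw=3,                 # int
    bordercolor="black",
    shadowx=2,
    shadowy=2,
    shadowcolor="black@0.5",   # color@opacity
    box=1,                     # 0 or 1
    boxcolor="black@0.4",
    boxborderw=10
)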
| Parameter | Python Type | Format | Example |
|---|---|---|---|
| enable | str (expression) | Boolean expression | enable='gte(t,2)' |
| alpha | str (expression) | 0.0-1.0 | alpha='min(1,t/2)' |
Timing Expressions:
# Show after 2 seconds
enable='gte(t,2)'
# Show between 2-5 seconds
enable='between(t,2,5)'
# Fade in over 1 second
alpha='min(1,t)'
# Fade out over last 2 seconds (10s video)
alpha='if(gt(t,8),1-(t-8)/2,1)'
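These expressions are passed verbatim as strings and FFmpeg evaluates t at render time; a minimal sketch combining enable and alpha (file name is a placeholder):

import ffmpeg

timed = ffmpeg.input("input.mp4").drawtext(
    text="Appears at 2s, fades in over 1s",
    fontsize=48,
    fontcolor="white",
    x="(w-tw)/2",
    y="(h-th)/2",
    enable="gte(t,2)",                # visible from t=2s onward
    alpha="if(lt(t,2),0,min(1,t-2))"  # alpha ramps 0 -> 1 during the second after t=2
)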
from typing import NamedTuple
class ASSStyle(NamedTuple):
"""Type-safe ASS style definition."""
name: str
fontname: str
fontsize: int
primary_colour: str # &HAABBGGRR format
secondary_colour: str # &HAABBGGRR format
outline_colour: str # &HAABBGGRR format
back_colour: str # &HAABBGGRR format (shadow)
bold: int # 0 or -1 (FFmpeg quirk: -1 for bold)
italic: int # 0 or 1
underline: int # 0 or 1
strikeout: int # 0 or 1
scale_x: int # Percentage (100 = normal)
scale_y: int # Percentage (100 = normal)
spacing: int # Letter spacing in pixels
angle: float # Rotation angle in degrees
border_style: int # 1 (outline) or 3 (opaque box)
outline: float # Outline width (0.0-4.0)
shadow: float # Shadow distance (0.0-4.0)
alignment: int # Numpad alignment (1-9)
margin_l: int # Left margin (pixels)
margin_r: int # Right margin (pixels)
margin_v: int # Vertical margin (pixels)
encoding: int # Character encoding (1=UTF-8)
def ass_style_to_string(style: ASSStyle) -> str:
"""
Convert ASSStyle to ASS format string.
Returns:
ASS Style line
"""
return (
f"Style: {style.name},"
f"{style.fontname},{style.fontsize},"
f"{style.primary_colour},{style.secondary_colour},"
f"{style.outline_colour},{style.back_colour},"
f"{style.bold},{style.italic},{style.underline},{style.strikeout},"
f"{style.scale_x},{style.scale_y},{style.spacing},{style.angle},"
f"{style.border_style},{style.outline},{style.shadow},"
f"{style.alignment},"
f"{style.margin_l},{style.margin_r},{style.margin_v},"
f"{style.encoding}"
)
# Example usage
karaoke_style = ASSStyle(
name="Karaoke",
fontname="Arial Black",
fontsize=72,
primary_colour="&H00FFFFFF", # White text (unhighlighted)
secondary_colour="&H000000FF", # Red text (highlighted)
outline_colour="&H00000000", # Black outline
back_colour="&H80000000", # 50% transparent black shadow
bold=-1, # Bold (FFmpeg uses -1)
italic=0,
underline=0,
strikeout=0,
scale_x=100,
scale_y=100,
spacing=0,
angle=0.0,
border_style=1, # Outline + shadow
outline=3.0, # 3px outline
shadow=2.0, # 2px shadow
alignment=2, # Bottom center
margin_l=10,
margin_r=10,
margin_v=50, # 50px from bottom
encoding=1 # UTF-8
)
print(ass_style_to_string(karaoke_style))
| Parameter | Python Type | Range | Unit | Notes |
|---|---|---|---|---|
| fontsize | int | 1-999 | Points | Screen-relative |
| primary_colour | str | &H00000000 - &HFFFFFFFF | ABGR hex | Text color |
| secondary_colour | str | &H00000000 - &HFFFFFFFF | ABGR hex | Karaoke fill |
| outline_colour | str | &H00000000 - &HFFFFFFFF | ABGR hex | Border color |
| back_colour | str | &H00000000 - &HFFFFFFFF | ABGR hex | Shadow color |
| bold | int | -1 (on), 0 (off) | Boolean | FFmpeg quirk: -1 for bold |
| italic | int | 0, 1 | Boolean | Standard |
| scale_x, scale_y | int | 1-1000 | Percentage | 100 = normal |
| outline | float | 0.0-4.0 | Pixels | Border width |
| shadow | float | 0.0-4.0 | Pixels | Shadow offset |
| alignment | int | 1-9 | Numpad | See alignment chart |
7 (top-left) 8 (top-center) 9 (top-right)
4 (middle-left) 5 (middle-center) 6 (middle-right)
1 (bottom-left) 2 (bottom-center) 3 (bottom-right)
ASS_ALIGNMENT = {
"bottom_left": 1,
"bottom_center": 2,
"bottom_right": 3,
"middle_left": 4,
"middle_center": 5,
"middle_right": 6,
"top_left": 7,
"top_center": 8,
"top_right": 9,
}
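A small hedged helper (the function name is illustrative) that turns a named position into an inline \an override tag for a single dialogue line:

def an_tag(position: str) -> str:
    """Return an inline alignment override such as '{\\an8}' for top-center."""
    return f"{{\\an{ASS_ALIGNMENT[position]}}}"

# Pins just this line to the top-center of the frame, overriding the style's alignment
line_text = an_tag("top_center") + "Breaking news"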
| Tag | Name | Unit | Python Type | Range | Effect |
|---|---|---|---|---|---|
| \k | Karaoke | Centiseconds | int | 0-9999 | Instant highlight |
| \kf / \K | Karaoke Fill | Centiseconds | int | 0-9999 | Progressive fill |
| \ko | Karaoke Outline | Centiseconds | int | 0-9999 | Outline sweep |
def generate_karaoke_line(
words: list[str],
durations: list[float], # In SECONDS
style: str = "Karaoke"
) -> str:
"""
Generate ASS karaoke dialogue line.
Args:
words: List of words/syllables
durations: Duration for each word IN SECONDS
style: ASS style name
Returns:
ASS dialogue line with karaoke tags
Example:
>>> generate_karaoke_line(
... ["Hello", "world"],
... [0.5, 0.6]
... )
'{\\k50}Hello {\\k60}world'
"""
# Convert seconds to centiseconds
karaoke_tags = []
for word, duration_sec in zip(words, durations):
cs = int(duration_sec * 100) # Centiseconds
karaoke_tags.append(f"{{\\k{cs}}}{word}")
return " ".join(karaoke_tags)
# Example usage
words = ["Never", "gonna", "give", "you", "up"]
durations = [0.8, 0.6, 0.6, 0.5, 0.7] # seconds
karaoke_text = generate_karaoke_line(words, durations)
print(karaoke_text)
# Output: {\k80}Never {\k60}gonna {\k60}give {\k50}you {\k70}up
| Tag | Format | Unit | Example | Description |
|---|---|---|---|---|
| \t | \t(t1,t2,tags) | Milliseconds | \t(0,500,\fscx120) | Animate over time |
| \fad | \fad(in,out) | Milliseconds | \fad(300,200) | Fade in/out |
| \move | \move(x1,y1,x2,y2,t1,t2) | Milliseconds | \move(0,0,100,100,0,1000) | Move position |
| \fscx, \fscy | \fscxN, \fscyN | Percentage | \fscx120\fscy120 | Scale X/Y |
| \frz | \frzN | Degrees | \frz45 | Rotate (Z-axis) |
| \c, \1c | \c&HBBGGRR& | BGR hex | \c&HFF0000& | Primary color |
| \3c | \3c&HBBGGRR& | BGR hex | \3c&H000000& | Outline color |
| \4c | \4c&HBBGGRR& | BGR hex | \4c&H808080& | Shadow color |
from typing import List, Tuple
def create_scale_animation(
duration_ms: int,
start_scale: int = 80,
peak_scale: int = 120,
end_scale: int = 100
) -> str:
"""
Create bounce scale animation (pop effect).
Args:
duration_ms: Total animation duration in MILLISECONDS
start_scale: Initial scale percentage
peak_scale: Peak scale (overshoot)
end_scale: Final settled scale
Returns:
ASS animation tags
Example:
>>> create_scale_animation(400)
'\\fscx80\\fscy80\\t(0,150,\\fscx120\\fscy120)\\t(150,300,\\fscx95\\fscy95)\\t(300,400,\\fscx100\\fscy100)'
"""
t1 = int(duration_ms * 0.375) # 37.5% to peak
t2 = int(duration_ms * 0.75) # 75% to settle
t3 = duration_ms
    mid_scale = end_scale - 5  # Slight undershoot before settling, as in the docstring example
return (
f"\\fscx{start_scale}\\fscy{start_scale}"
f"\\t(0,{t1},\\fscx{peak_scale}\\fscy{peak_scale})"
f"\\t({t1},{t2},\\fscx{mid_scale}\\fscy{mid_scale})"
f"\\t({t2},{t3},\\fscx{end_scale}\\fscy{end_scale})"
)
def create_fade_animation(
fade_in_ms: int,
fade_out_ms: int = 0
) -> str:
"""
Create fade in/out animation.
Args:
fade_in_ms: Fade in duration in MILLISECONDS
fade_out_ms: Fade out duration in MILLISECONDS (0 = no fade out)
Returns:
ASS fade tag
Example:
>>> create_fade_animation(300, 200)
'\\fad(300,200)'
"""
return f"\\fad({fade_in_ms},{fade_out_ms})"
def create_color_transition(
start_color_rgb: Tuple[int, int, int],
end_color_rgb: Tuple[int, int, int],
duration_ms: int
) -> str:
"""
Create smooth color transition animation.
Args:
start_color_rgb: Starting RGB color
end_color_rgb: Ending RGB color
duration_ms: Transition duration in MILLISECONDS
Returns:
ASS animation tags
Example:
>>> create_color_transition((255,255,255), (255,0,0), 1000)
'\\c&H00FFFFFF&\\t(0,1000,\\c&H000000FF&)'
"""
    # rgb_to_ass_color returns "&HAABBGGRR"; append a trailing "&" to close the inline override
    start_ass = rgb_to_ass_color(*start_color_rgb) + "&"
    end_ass = rgb_to_ass_color(*end_color_rgb) + "&"
return f"\\c{start_ass}\\t(0,{duration_ms},\\c{end_ass})"
# Example: Complete animated karaoke line
def create_animated_karaoke_word(
word: str,
karaoke_duration_sec: float,
pop_animation: bool = True
) -> str:
"""
Create word with karaoke + pop animation.
Args:
word: Word text
karaoke_duration_sec: Karaoke fill duration in SECONDS
pop_animation: Add scale pop effect
Returns:
ASS karaoke word with animations
"""
karaoke_cs = int(karaoke_duration_sec * 100) # Centiseconds
karaoke_ms = int(karaoke_duration_sec * 1000) # Milliseconds
if pop_animation:
pop = create_scale_animation(karaoke_ms, 90, 115, 100)
return f"{{\\k{karaoke_cs}{pop}}}{word}"
else:
return f"{{\\k{karaoke_cs}}}{word}"
# Usage
animated_line = " ".join([
create_animated_karaoke_word("Never", 0.8, True),
create_animated_karaoke_word("gonna", 0.6, True),
create_animated_karaoke_word("give", 0.6, True),
create_animated_karaoke_word("you", 0.5, True),
create_animated_karaoke_word("up", 0.7, True),
])
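To actually display this line, it still has to be wrapped in an ASS Dialogue event; a minimal sketch reusing format_ass_timestamp from earlier (start time and style name are illustrative):

durations = [0.8, 0.6, 0.6, 0.5, 0.7]            # same per-word durations as above
start = format_ass_timestamp(1.0)                # line starts at 1.0 s (illustrative)
end = format_ass_timestamp(1.0 + sum(durations))
dialogue = f"Dialogue: 0,{start},{end},Karaoke,,0,0,0,,{animated_line}"
# e.g. Dialogue: 0,0:00:01.00,0:00:04.20,Karaoke,,0,0,0,,{\k80\fscx90...}Never ...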
import ffmpeg
from typing import Optional, Union
def apply_drawtext_filter(
input_stream,
text: str,
fontsize: int,
fontcolor: str = "white",
x: Union[str, int] = 10,
y: Union[str, int] = 10,
fontfile: Optional[str] = None,
borderw: int = 0,
bordercolor: str = "black",
shadowx: int = 0,
shadowy: int = 0,
shadowcolor: str = "black",
box: int = 0,
boxcolor: str = "black@0.5",
boxborderw: int = 0,
enable: Optional[str] = None,
alpha: Optional[str] = None
):
"""
Apply drawtext filter with type safety.
Args:
input_stream: ffmpeg input stream
text: Text to display
fontsize: Font size in points (int, 1-999)
fontcolor: Color name or hex string
x: X position (int or expression string)
y: Y position (int or expression string)
fontfile: Path to font file (optional)
borderw: Border width (int, 0-20)
bordercolor: Border color
shadowx: Shadow X offset (int)
shadowy: Shadow Y offset (int)
shadowcolor: Shadow color
box: Enable background box (0 or 1)
boxcolor: Box color with alpha (e.g., "black@0.5")
boxborderw: Box border width
enable: Enable expression (e.g., "gte(t,2)")
alpha: Alpha expression (e.g., "min(1,t)")
Returns:
ffmpeg stream with drawtext filter applied
Raises:
TypeError: If parameters have incorrect types
ValueError: If parameters are out of valid range
"""
# Type validation
if not isinstance(fontsize, int):
raise TypeError(f"fontsize must be int, got {type(fontsize).__name__}")
if not (1 <= fontsize <= 999):
raise ValueError(f"fontsize must be 1-999, got {fontsize}")
if not isinstance(borderw, int) or borderw < 0:
raise ValueError(f"borderw must be non-negative int, got {borderw}")
if not isinstance(box, int) or box not in (0, 1):
raise ValueError(f"box must be 0 or 1, got {box}")
# Build filter parameters
params = {
"text": text,
"fontsize": fontsize,
"fontcolor": fontcolor,
"x": x,
"y": y,
"borderw": borderw,
"bordercolor": bordercolor,
"box": box,
}
# Optional parameters
if fontfile:
params["fontfile"] = fontfile
if shadowx != 0:
params["shadowx"] = shadowx
if shadowy != 0:
params["shadowy"] = shadowy
params["shadowcolor"] = shadowcolor
if box == 1:
params["boxcolor"] = boxcolor
if boxborderw > 0:
params["boxborderw"] = boxborderw
if enable:
params["enable"] = enable
if alpha:
params["alpha"] = alpha
return input_stream.drawtext(**params)
# Example usage
input_file = ffmpeg.input("input.mp4")
output = apply_drawtext_filter(
input_file,
text="Hello World",
fontsize=48,
fontcolor="white",
x="(w-tw)/2",
y="(h-th)/2",
borderw=2,
bordercolor="black",
shadowx=2,
shadowy=2,
enable="between(t,1,5)"
)
output = ffmpeg.output(output, "output.mp4")
ffmpeg.run(output)
import ffmpeg
from pathlib import Path
def add_subtitles_with_audio(
input_video: str,
output_video: str,
subtitle_text: str,
fontsize: int = 48,
crf: int = 18,
audio_codec: str = "aac",
audio_bitrate: str = "192k"
):
"""
Add burned-in subtitles while preserving audio.
CRITICAL: Always explicitly handle audio stream to prevent loss.
Args:
input_video: Input video path
output_video: Output video path
subtitle_text: Text to display
fontsize: Font size (int)
crf: Constant Rate Factor for H.264 (int, 0-51)
audio_codec: Audio codec (default: "aac")
audio_bitrate: Audio bitrate (default: "192k")
"""
# Input
input_file = ffmpeg.input(input_video)
# Video processing
video = input_file.video.drawtext(
text=subtitle_text,
fontsize=fontsize,
fontcolor="white",
x="(w-tw)/2",
y="h-th-50",
borderw=2,
bordercolor="black"
)
# Audio passthrough (CRITICAL)
audio = input_file.audio
# Output with both streams
output = ffmpeg.output(
video,
audio,
output_video,
vcodec="libx264",
crf=crf,
acodec=audio_codec,
audio_bitrate=audio_bitrate
)
# Run
output = output.overwrite_output()
ffmpeg.run(output)
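Example call (file names and text are placeholders):

add_subtitles_with_audio(
    input_video="lecture.mp4",
    output_video="lecture_subtitled.mp4",
    subtitle_text="Chapter 1: Introduction",
    fontsize=40,   # int
    crf=20,        # int, 0-51 for H.264
)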
import subprocess
import numpy as np
from typing import Callable, Generator, Tuple
def read_video_frames(
input_path: str,
width: int,
height: int,
pix_fmt: str = "rgb24"
) -> Generator[np.ndarray, None, None]:
"""
Read video frames using FFmpeg subprocess.
Args:
input_path: Input video file path
width: Frame width
height: Frame height
pix_fmt: Pixel format ("rgb24" or "bgr24")
Yields:
NumPy array frames (height, width, 3)
Example:
>>> for frame in read_video_frames("input.mp4", 1920, 1080):
... # Process frame (RGB format)
... processed = cv2.cvtColor(frame, cv2.COLOR_RGB2BGR)
"""
# FFmpeg command
cmd = [
"ffmpeg",
"-i", input_path,
"-f", "rawvideo",
"-pix_fmt", pix_fmt,
"-" # Output to stdout
]
process = subprocess.Popen(
cmd,
stdout=subprocess.PIPE,
stderr=subprocess.DEVNULL,
bufsize=10**8
)
frame_size = width * height * 3
try:
while True:
raw_frame = process.stdout.read(frame_size)
if len(raw_frame) != frame_size:
break
# Convert to NumPy array
frame = np.frombuffer(raw_frame, dtype=np.uint8)
frame = frame.reshape((height, width, 3))
yield frame
finally:
process.stdout.close()
process.wait()
def write_video_frames(
output_path: str,
width: int,
height: int,
fps: float = 30.0,
pix_fmt: str = "rgb24",
crf: int = 18
) -> subprocess.Popen:
"""
Create FFmpeg process for writing frames.
Args:
output_path: Output video file path
width: Frame width
height: Frame height
fps: Frames per second
pix_fmt: Pixel format ("rgb24" or "bgr24")
crf: Constant Rate Factor (0-51)
Returns:
subprocess.Popen instance (write frames to .stdin)
Example:
>>> writer = write_video_frames("output.mp4", 1920, 1080, 30)
>>> for frame in frames:
... writer.stdin.write(frame.tobytes())
>>> writer.stdin.close()
>>> writer.wait()
"""
cmd = [
"ffmpeg",
"-y", # Overwrite output
"-f", "rawvideo",
"-vcodec", "rawvideo",
"-s", f"{width}x{height}",
"-pix_fmt", pix_fmt,
"-r", str(fps),
"-i", "-", # Read from stdin
"-c:v", "libx264",
"-preset", "fast",
"-crf", str(crf),
"-pix_fmt", "yuv420p",
output_path
]
process = subprocess.Popen(
cmd,
stdin=subprocess.PIPE,
stderr=subprocess.DEVNULL
)
return process
# Complete pipeline example
def process_video_frames(
input_path: str,
output_path: str,
    process_fn: Callable[[np.ndarray], np.ndarray]  # per-frame transform, uint8 in/out
):
"""
Read, process, and write video frames.
Args:
input_path: Input video
output_path: Output video
process_fn: Function to process each frame
"""
# Get video dimensions
import cv2
cap = cv2.VideoCapture(input_path)
width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
fps = cap.get(cv2.CAP_PROP_FPS)
cap.release()
# Create writer
writer = write_video_frames(output_path, width, height, fps)
try:
# Process frames
for frame in read_video_frames(input_path, width, height):
processed = process_fn(frame)
writer.stdin.write(processed.tobytes())
finally:
writer.stdin.close()
writer.wait()
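A trivial process_fn for illustration only: it darkens each frame with NumPy while keeping the uint8 dtype and shape the raw-video pipe expects:

import numpy as np

def darken(frame: np.ndarray) -> np.ndarray:
    """Scale brightness to 70%; output stays uint8 with shape (height, width, 3)."""
    return (frame * 0.7).astype(np.uint8)

# process_video_frames("input.mp4", "darker.mp4", darken)  # file names are placeholders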
# ❌ WRONG - passing string for int parameter
import ffmpeg
ffmpeg.input("input.mp4").drawtext(
text="Hello",
fontsize="24" # ❌ Should be int
).output("output.mp4")
# ✅ CORRECT
ffmpeg.input("input.mp4").drawtext(
text="Hello",
fontsize=24 # ✅ int
).output("output.mp4")
# ❌ WRONG - using RGB for ASS color
def wrong_ass_color(r: int, g: int, b: int) -> str:
return f"&H00{r:02X}{g:02X}{b:02X}" # ❌ RGB order
# ✅ CORRECT - BGR order for ASS
def correct_ass_color(r: int, g: int, b: int) -> str:
return f"&H00{b:02X}{g:02X}{r:02X}" # ✅ BGR order
# Example: Pure red
print(wrong_ass_color(255, 0, 0)) # ❌ "&H00FF0000" = Blue in ASS!
print(correct_ass_color(255, 0, 0)) # ✅ "&H000000FF" = Red in ASS
# ❌ WRONG - mixing units
def wrong_karaoke_timing():
# Karaoke tag uses centiseconds
# Animation uses milliseconds - DIFFERENT!
return r"{\k100\t(0,100,\fscx120)}Word" # ❌ Mismatch!
# \k100 = 1 second
# \t(0,100,...) = 0.1 seconds (100ms)
# ✅ CORRECT - consistent timing
def correct_karaoke_timing(duration_sec: float):
cs = int(duration_sec * 100) # Karaoke: centiseconds
ms = int(duration_sec * 1000) # Animation: milliseconds
return rf"{{\k{cs}\t(0,{ms},\fscx120)}}Word"
print(correct_karaoke_timing(1.0))
# Output: {\k100\t(0,1000,\fscx120)}Word
# Both tags now represent 1 second ✅
# ❌ WRONG - expression without quotes
import ffmpeg
ffmpeg.input("input.mp4").drawtext(
text="Hello",
x=(w-tw)/2 # ❌ Python evaluates this, causes NameError
)
# ✅ CORRECT - expression as string
ffmpeg.input("input.mp4").drawtext(
text="Hello",
x="(w-tw)/2" # ✅ FFmpeg evaluates this at runtime
)
# ❌ WRONG - audio is silently dropped
import ffmpeg
input_file = ffmpeg.input("input.mp4")
(
input_file
.filter("scale", 1280, 720) # ❌ Only video stream
.output("output.mp4")
.run()
)
# ✅ CORRECT - explicitly handle audio
input_file = ffmpeg.input("input.mp4")
video = input_file.video.filter("scale", 1280, 720)
audio = input_file.audio # ✅ Preserve audio
ffmpeg.output(video, audio, "output.mp4").run()
# ❌ WRONG - using 1 for bold (works in some renderers, not all)
ass_style = "Style: Default,Arial,48,&H00FFFFFF,...,1,..."   # ❌ May not render as bold
# ✅ CORRECT - the ASS spec (and FFmpeg/libass) expects -1 for bold
ass_style = "Style: Default,Arial,48,&H00FFFFFF,...,-1,..."  # ✅ Proper bold
from typing import Union
import re
def validate_fontsize(size: int) -> int:
"""Validate fontsize parameter."""
if not isinstance(size, int):
raise TypeError(f"fontsize must be int, got {type(size).__name__}")
if not (1 <= size <= 999):
raise ValueError(f"fontsize must be 1-999, got {size}")
return size
def validate_crf(crf: int, codec: str = "h264") -> int:
"""Validate CRF parameter."""
if not isinstance(crf, int):
raise TypeError(f"crf must be int, got {type(crf).__name__}")
ranges = {
"h264": (0, 51),
"h265": (0, 51),
"hevc": (0, 51),
"vp9": (0, 63),
"av1": (0, 63),
}
min_crf, max_crf = ranges.get(codec, (0, 51))
if not (min_crf <= crf <= max_crf):
raise ValueError(f"crf for {codec} must be {min_crf}-{max_crf}, got {crf}")
return crf
def validate_ass_color(color: str) -> str:
"""Validate ASS color format."""
pattern = r"^&H[0-9A-Fa-f]{8}$"
if not re.match(pattern, color):
raise ValueError(f"Invalid ASS color format: {color} (expected &HAABBGGRR)")
return color
def validate_alignment(alignment: int) -> int:
"""Validate ASS alignment (1-9 numpad)."""
if not isinstance(alignment, int):
raise TypeError(f"alignment must be int, got {type(alignment).__name__}")
if not (1 <= alignment <= 9):
raise ValueError(f"alignment must be 1-9, got {alignment}")
return alignment
def validate_ffmpeg_color(color: str) -> str:
"""Validate FFmpeg color (named or hex)."""
named_colors = {
"white", "black", "red", "green", "blue",
"yellow", "cyan", "magenta", "orange", "purple"
}
if color.lower() in named_colors:
return color
# Validate hex format
hex_pattern = r"^(#|0x)?[0-9A-Fa-f]{6}$"
if re.match(hex_pattern, color):
return color
raise ValueError(f"Invalid FFmpeg color: {color}")
def validate_time_expression(expr: str) -> str:
"""Validate FFmpeg time expression syntax."""
    # FFmpeg evaluates expressions at render time; the main Python-side mistake
    # is passing an unquoted expression like (w-tw)/2 instead of a string.
    if not isinstance(expr, str):
        raise TypeError(f"Time expression must be str, got {type(expr).__name__}")
    return expr
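Typical usage is to validate user-supplied values once, up front, so a bad parameter fails with a clear Python error instead of a cryptic FFmpeg message; a brief sketch:

# Validate once before building any FFmpeg command or ASS style
fontsize = validate_fontsize(48)
fontcolor = validate_ffmpeg_color("white")
crf = validate_crf(18, "h264")
alignment = validate_alignment(2)                   # bottom center
primary_colour = validate_ass_color("&H00FFFFFF")   # white, ABGR order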
import ffmpeg
from typing import List, Tuple
from pathlib import Path
class KaraokeGenerator:
"""Type-safe karaoke subtitle generator."""
def __init__(
self,
video_path: str,
output_path: str,
font_size: int = 72,
font_name: str = "Arial Black",
text_color_rgb: Tuple[int, int, int] = (255, 255, 255),
highlight_color_rgb: Tuple[int, int, int] = (255, 0, 0)
):
self.video_path = video_path
self.output_path = output_path
self.font_size = validate_fontsize(font_size)
self.font_name = font_name
self.text_color = rgb_to_ass_color(*text_color_rgb)
self.highlight_color = rgb_to_ass_color(*highlight_color_rgb)
self.lyrics: List[Tuple[float, List[Tuple[str, float]]]] = []
def add_line(
self,
start_time: float,
words: List[str],
durations: List[float]
):
"""
Add karaoke line.
Args:
start_time: Line start time in SECONDS
words: List of words
durations: Duration for each word in SECONDS
"""
if len(words) != len(durations):
raise ValueError("words and durations must have same length")
self.lyrics.append((start_time, list(zip(words, durations))))
def generate_ass(self) -> str:
"""Generate complete ASS subtitle file."""
lines = [
"[Script Info]",
"Title: Karaoke Subtitles",
"ScriptType: v4.00+",
"PlayResX: 1920",
"PlayResY: 1080",
"",
"[V4+ Styles]",
"Format: Name, Fontname, Fontsize, PrimaryColour, SecondaryColour, "
"OutlineColour, BackColour, Bold, Italic, Underline, StrikeOut, "
"ScaleX, ScaleY, Spacing, Angle, BorderStyle, Outline, Shadow, "
"Alignment, MarginL, MarginR, MarginV, Encoding",
(
f"Style: Karaoke,{self.font_name},{self.font_size},"
f"{self.text_color},{self.highlight_color},"
"&H00000000,&H80000000,"
"-1,0,0,0,100,100,0,0,1,3,2,2,10,10,50,1"
),
"",
"[Events]",
"Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text"
]
# Generate dialogue lines
for start_time, word_durations in self.lyrics:
# Calculate end time
total_duration = sum(dur for _, dur in word_durations)
end_time = start_time + total_duration
# Generate karaoke tags (centiseconds)
karaoke_text = ""
for word, duration_sec in word_durations:
cs = seconds_to_centiseconds(duration_sec)
karaoke_text += f"{{\\k{cs}}}{word} "
karaoke_text = karaoke_text.strip()
# Format timestamps
start_str = format_ass_timestamp(start_time)
end_str = format_ass_timestamp(end_time)
dialogue = f"Dialogue: 0,{start_str},{end_str},Karaoke,,0,0,0,,{karaoke_text}"
lines.append(dialogue)
return "\n".join(lines)
def render(self, crf: int = 18):
"""Render video with karaoke subtitles."""
# Generate ASS file
ass_content = self.generate_ass()
ass_path = Path(self.output_path).with_suffix(".ass")
ass_path.write_text(ass_content, encoding="utf-8")
# Apply subtitles with ffmpeg-python
input_file = ffmpeg.input(self.video_path)
video = input_file.video.filter("ass", str(ass_path))
audio = input_file.audio
output = ffmpeg.output(
video,
audio,
self.output_path,
vcodec="libx264",
crf=validate_crf(crf),
acodec="aac"
)
ffmpeg.run(output.overwrite_output())
print(f"✅ Karaoke video saved: {self.output_path}")
# Usage example
karaoke = KaraokeGenerator(
"input.mp4",
"karaoke_output.mp4",
font_size=80,
text_color_rgb=(255, 255, 255), # White
highlight_color_rgb=(255, 0, 0) # Red
)
# Add lyrics
karaoke.add_line(
start_time=1.0,
words=["Never", "gonna", "give", "you", "up"],
durations=[0.8, 0.6, 0.6, 0.5, 0.7]
)
karaoke.add_line(
start_time=4.3,
words=["Never", "gonna", "let", "you", "down"],
durations=[0.8, 0.6, 0.6, 0.5, 0.7]
)
# Render
karaoke.render(crf=18)
import ffmpeg
from typing import Optional
class TextOverlay:
"""Type-safe text overlay builder."""
def __init__(self, text: str):
self.text = text
self.fontsize: int = 48
self.fontcolor: str = "white"
self.x: str = "10"
self.y: str = "10"
self.borderw: int = 0
self.bordercolor: str = "black"
        self.shadowx: int = 0
        self.shadowy: int = 0
        self.shadowcolor: str = "black"
self.enable: Optional[str] = None
self.alpha: Optional[str] = None
def set_size(self, size: int) -> 'TextOverlay':
"""Set font size (fluent API)."""
self.fontsize = validate_fontsize(size)
return self
def set_color(self, color: str) -> 'TextOverlay':
"""Set font color (fluent API)."""
self.fontcolor = validate_ffmpeg_color(color)
return self
def center(self) -> 'TextOverlay':
"""Center text horizontally and vertically."""
self.x = "(w-tw)/2"
self.y = "(h-th)/2"
return self
def set_border(self, width: int, color: str = "black") -> 'TextOverlay':
"""Add text border."""
if not isinstance(width, int) or width < 0:
raise ValueError(f"Border width must be non-negative int")
self.borderw = width
self.bordercolor = validate_ffmpeg_color(color)
return self
    def set_shadow(self, x: int, y: int, color: str = "black") -> 'TextOverlay':
        """Add text shadow."""
        self.shadowx = x
        self.shadowy = y
        self.shadowcolor = validate_ffmpeg_color(color)
        return self
def fade_in(self, duration: float) -> 'TextOverlay':
"""Add fade-in effect."""
self.alpha = f"min(1,t/{duration})"
return self
def show_between(self, start: float, end: float) -> 'TextOverlay':
"""Show text between specific times."""
self.enable = f"between(t,{start},{end})"
return self
def apply(self, stream) -> ffmpeg.Stream:
"""Apply text overlay to ffmpeg stream."""
params = {
"text": self.text,
"fontsize": self.fontsize,
"fontcolor": self.fontcolor,
"x": self.x,
"y": self.y,
}
if self.borderw > 0:
params["borderw"] = self.borderw
params["bordercolor"] = self.bordercolor
        if self.shadowx != 0 or self.shadowy != 0:
            params["shadowx"] = self.shadowx
            params["shadowy"] = self.shadowy
            params["shadowcolor"] = self.shadowcolor
if self.enable:
params["enable"] = self.enable
if self.alpha:
params["alpha"] = self.alpha
return stream.drawtext(**params)
# Usage with fluent API
input_file = ffmpeg.input("input.mp4")
overlay = (
TextOverlay("Hello World")
.set_size(72)
.set_color("white")
.center()
.set_border(3, "black")
.set_shadow(2, 2, "black")
.fade_in(1.0)
.show_between(1.0, 5.0)
)
output = overlay.apply(input_file)
output = ffmpeg.output(output, "output.mp4", vcodec="libx264", crf=18)
ffmpeg.run(output.overwrite_output())
When integrating FFmpeg with Python:
- fontsize → int (not str)
- crf → int (not float)
- b:v, audio_bitrate → str with unit ("5M", "192k")
- Colors → str (named or hex)
- Position/timing expressions → str (quoted)
- Time → float for seconds, int for frames
- Always preserve the audio stream explicitly (.audio)

This reference ensures type-safe, bug-free Python-FFmpeg integration with accurate parameter mappings and proper unit conversions.
This skill should be used when the user asks to "create a slash command", "add a command", "write a custom command", "define command arguments", "use command frontmatter", "organize commands", "create command with file references", "interactive command", "use AskUserQuestion in command", or needs guidance on slash command structure, YAML frontmatter fields, dynamic arguments, bash execution in commands, user interaction patterns, or command development best practices for Claude Code.
This skill should be used when the user asks to "create an agent", "add an agent", "write a subagent", "agent frontmatter", "when to use description", "agent examples", "agent tools", "agent colors", "autonomous agent", or needs guidance on agent structure, system prompts, triggering conditions, or agent development best practices for Claude Code plugins.
This skill should be used when the user asks to "create a hook", "add a PreToolUse/PostToolUse/Stop hook", "validate tool use", "implement prompt-based hooks", "use ${CLAUDE_PLUGIN_ROOT}", "set up event-driven automation", "block dangerous commands", or mentions hook events (PreToolUse, PostToolUse, Stop, SubagentStop, SessionStart, SessionEnd, UserPromptSubmit, PreCompact, Notification). Provides comprehensive guidance for creating and implementing Claude Code plugin hooks with focus on advanced prompt-based hooks API.