Announce: "I'm using dev-test-linux for Linux desktop automation."

Tool Availability Gate
When to Use Linux Automation
Detect Display Server
Wayland: ydotool
X11: xdotool
Screenshots
D-Bus Control
Accessibility (AT-SPI)
Complete E2E Examples

Linux Desktop Automation

<EXTREMELY-IMPORTANT> ## Tool Availability Gate

Verify automation tools are installed before proceeding.

# Detect display server
echo $XDG_SESSION_TYPE  # "wayland" or "x11"

# Wayland tools
which ydotool || echo "MISSING: ydotool"
which wtype || echo "MISSING: wtype"
which grim || echo "MISSING: grim"
which slurp || echo "MISSING: slurp"

# X11 tools
which xdotool || echo "MISSING: xdotool"
which xclip || echo "MISSING: xclip"
which scrot || echo "MISSING: scrot"

# D-Bus
which dbus-send || echo "MISSING: dbus-send"

If missing (Wayland):

STOP: Cannot proceed with Wayland automation.

Missing tools for Wayland E2E testing.

Install with:
  # Arch
  sudo pacman -S ydotool wtype grim slurp

  # Debian/Ubuntu
  sudo apt install ydotool wtype grim slurp

  # Nix
  nix-env -iA nixpkgs.ydotool nixpkgs.wtype nixpkgs.grim nixpkgs.slurp

Start ydotool daemon:
  sudo systemctl enable --now ydotool
  # Or for user service:
  systemctl --user enable --now ydotool

Reply when installed and I'll continue testing.

This gate is non-negotiable. Missing tools = full stop. </EXTREMELY-IMPORTANT>

<EXTREMELY-IMPORTANT> ## When to Use Linux Automation

USE Linux automation (ydotool/xdotool) when you need:

Linux native application automation
GTK/Qt application testing
System-wide keyboard/mouse control
Window management testing
D-Bus service interaction
Accessibility testing (AT-SPI)

DO NOT use Linux automation when:

Testing web applications (use Chrome MCP or Playwright)
macOS desktop automation (use dev-test-hammerspoon)
Cross-platform testing needed

For web testing, use:

Skill(skill="workflows:dev-test-chrome") - debugging
Skill(skill="workflows:dev-test-playwright") - CI/CD

Rationalization Prevention

Thought	Reality
"I can test the app manually"	AUTOMATE IT with ydotool/xdotool
"Web testing tools work for desktop apps"	NO. Use native Linux tools
"ydotool daemon is hard to set up"	One-time setup. Do it.
"X11 is deprecated, skip xdotool"	Many systems still use X11. Support both.
"D-Bus is too complex"	D-Bus gives precise control. Learn it.

Display Server Detection

if [ "$XDG_SESSION_TYPE" = "wayland" ]; then
    # Use ydotool, wtype, grim
else
    # Use xdotool, xclip, scrot
fi

Always detect display server before choosing tools. </EXTREMELY-IMPORTANT>

Detect Display Server

# Check display server type
if [ "$XDG_SESSION_TYPE" = "wayland" ]; then
    echo "Using Wayland tools (ydotool, wtype, grim)"
else
    echo "Using X11 tools (xdotool, xclip, scrot)"
fi

Wayland: ydotool

Requires ydotoold daemon running.

Keyboard Input

# Type text
ydotool type "hello world"

# Type with delay between keys (microseconds)
ydotool type --delay 50 "slow typing"

# Key press (using keycodes)
# Format: keycode:state (1=down, 0=up)
ydotool key 28:1 28:0           # Enter
ydotool key 1:1 1:0             # Escape
ydotool key 29:1 46:1 46:0 29:0 # Ctrl+C (29=Ctrl, 46=C)
ydotool key 29:1 47:1 47:0 29:0 # Ctrl+V (47=V)
ydotool key 56:1 15:1 15:0 56:0 # Alt+Tab

# Common keycodes:
# 1=Escape, 14=Backspace, 15=Tab, 28=Enter, 29=Ctrl, 42=LShift
# 56=Alt, 57=Space, 100=RightAlt, 125=Super/Win

Alternative: wtype (Wayland-native)

# Type text
wtype "hello world"

# Key press with modifiers
wtype -M ctrl -k c  # Ctrl+C
wtype -M ctrl -M shift -k s  # Ctrl+Shift+S
wtype -k Return     # Enter
wtype -k Escape     # Escape

# Modifier keys: -M (press and hold)
# Available: shift, ctrl, alt, logo (super)

Mouse Input

# Move mouse to absolute position
ydotool mousemove --absolute 100 200

# Relative move
ydotool mousemove 50 -30

# Click (button: 1=left, 2=middle, 3=right)
ydotool click 1        # Left click
ydotool click 3        # Right click

# Double click
ydotool click 1 1

# Click at specific position
ydotool mousemove --absolute 500 300 && ydotool click 1

# Drag
ydotool mousemove --absolute 100 100
ydotool mousedown 1
ydotool mousemove --absolute 200 200
ydotool mouseup 1

X11: xdotool

Keyboard Input

# Type text
xdotool type "hello world"

# Key press
xdotool key Return
xdotool key Escape
xdotool key ctrl+c
xdotool key ctrl+shift+s
xdotool key alt+Tab
xdotool key super+d

# Key with delay
xdotool type --delay 50 "slow typing"

# Hold key
xdotool keydown ctrl
xdotool key c
xdotool keyup ctrl

Mouse Input

# Move mouse
xdotool mousemove 100 200           # Absolute
xdotool mousemove --relative 50 30  # Relative

# Click
xdotool click 1   # Left
xdotool click 2   # Middle
xdotool click 3   # Right

# Double click
xdotool click --repeat 2 1

# Click at position
xdotool mousemove 500 300 click 1

# Drag
xdotool mousemove 100 100 mousedown 1 mousemove 200 200 mouseup 1

Window Control (X11)

# Get active window ID
xdotool getactivewindow

# Focus window by name
xdotool search --name "Firefox" windowactivate

# Focus window by class
xdotool search --class "firefox" windowactivate

# Get window title
xdotool getactivewindow getwindowname

# Move window
xdotool getactivewindow windowmove 100 100

# Resize window
xdotool getactivewindow windowsize 800 600

# Minimize/Maximize
xdotool getactivewindow windowminimize
xdotool search --name "Firefox" windowactivate --sync

Screenshots

<EXTREMELY-IMPORTANT> ### The Iron Law of Visual Verification

Every E2E test MUST include screenshot evidence.

After completing a workflow, capture a screenshot to prove success. </EXTREMELY-IMPORTANT>

Wayland: grim + slurp

# Full screen (all outputs)
grim /tmp/screenshot.png

# Specific output
grim -o DP-1 /tmp/screen.png

# Interactive region selection
grim -g "$(slurp)" /tmp/region.png

# Specific region (x,y width x height)
grim -g "100,200 800x600" /tmp/region.png

# Specific window (Hyprland)
# Get window geometry first
hyprctl clients -j | jq '.[] | select(.class=="firefox")'
grim -g "X,Y WxH" /tmp/window.png

# Sway: get focused window
grim -g "$(swaymsg -t get_tree | jq -r '.. | select(.focused?) | .rect | "\(.x),\(.y) \(.width)x\(.height)"')" /tmp/window.png

X11: scrot / import

# Full screen
scrot /tmp/screenshot.png

# Active window
scrot -u /tmp/window.png

# Interactive selection
scrot -s /tmp/selection.png

# Delay (seconds)
scrot -d 3 /tmp/delayed.png

# Using ImageMagick import
import -window root /tmp/screenshot.png
import -window "$(xdotool getactivewindow)" /tmp/window.png

Image Comparison

# Compare screenshots (ImageMagick)
compare -metric AE baseline.png current.png diff.png
# Returns number of different pixels

# Threshold comparison
compare -metric AE -fuzz 5% baseline.png current.png diff.png

D-Bus Control

Preferred for apps that expose D-Bus interfaces.

# List available services
dbus-send --session --print-reply --dest=org.freedesktop.DBus \
  /org/freedesktop/DBus org.freedesktop.DBus.ListNames

# Example: Zathura PDF viewer
# Get PID first, then use org.pwmt.zathura.PID-XXXX
dbus-send --print-reply --dest=org.pwmt.zathura.PID-12345 \
  /org/pwmt/zathura org.pwmt.zathura.OpenDocument string:"/path/to/file.pdf"

dbus-send --print-reply --dest=org.pwmt.zathura.PID-12345 \
  /org/pwmt/zathura org.pwmt.zathura.GotoPage uint32:5

# Example: GNOME apps via freedesktop.Application
dbus-send --session --dest=org.gnome.Nautilus \
  /org/gnome/Nautilus org.freedesktop.Application.Open \
  array:string:"file:///home/user" dict:string:string:""

# Introspect available methods
dbus-send --session --print-reply --dest=org.example.App \
  /org/example/App org.freedesktop.DBus.Introspectable.Introspect

Accessibility (AT-SPI)

For UI element discovery and verification.

#!/usr/bin/env python3
import pyatspi

# Find application
desktop = pyatspi.Registry.getDesktop(0)
for app in desktop:
    if "firefox" in app.name.lower():
        print(f"Found: {app.name}")

        # Traverse accessibility tree
        def dump_tree(node, indent=0):
            print("  " * indent + f"{node.getRole()}: {node.name}")
            for child in node:
                dump_tree(child, indent + 1)

        dump_tree(app)

# Find specific element
def find_button(app, name):
    for child in app:
        if child.getRole() == pyatspi.ROLE_PUSH_BUTTON:
            if name.lower() in child.name.lower():
                return child
        found = find_button(child, name)
        if found:
            return found
    return None

button = find_button(app, "Submit")
if button:
    # Click via AT-SPI
    button.queryAction().doAction(0)

Complete E2E Examples

<EXTREMELY-IMPORTANT> ### E2E Test Structure

Every Linux E2E test MUST:

Detect - Check display server (Wayland vs X11)
Launch - Start the application
Wait - Allow app to fully initialize
Interact - Perform user actions
Verify - Check expected state
Screenshot - Capture visual evidence
Cleanup - Close app, restore state </EXTREMELY-IMPORTANT>

Wayland E2E Test

#!/bin/bash
# test_workflow.sh - Wayland E2E test

set -e  # Exit on error

echo "Starting E2E test..."

# 1. Launch app
firefox &
sleep 3

# 2. Navigate to URL
wtype -M ctrl -k l  # Focus address bar
sleep 0.2
wtype "https://example.com"
wtype -k Return
sleep 2

# 3. Take screenshot
grim /tmp/test_before.png

# 4. Interact with page
ydotool mousemove --absolute 500 400
ydotool click 1
sleep 0.5

# 5. Take final screenshot
grim /tmp/test_after.png

# 6. Compare (simple size check)
SIZE_BEFORE=$(stat -c%s /tmp/test_before.png)
SIZE_AFTER=$(stat -c%s /tmp/test_after.png)

if [ "$SIZE_BEFORE" -ne "$SIZE_AFTER" ]; then
    echo "PASS: Screenshots differ (interaction worked)"
else
    echo "WARN: Screenshots identical"
fi

echo "Test complete"

X11 E2E Test

#!/bin/bash
# test_workflow_x11.sh - X11 E2E test

set -e

echo "Starting X11 E2E test..."

# 1. Launch app
gedit &
sleep 2

# 2. Focus window
xdotool search --name "gedit" windowactivate --sync

# 3. Type content
xdotool type "Hello, this is an automated test!"
sleep 0.5

# 4. Select all and copy
xdotool key ctrl+a
xdotool key ctrl+c

# 5. Verify clipboard
CLIPBOARD=$(xclip -selection clipboard -o)
if [[ "$CLIPBOARD" == *"automated test"* ]]; then
    echo "PASS: Clipboard contains expected text"
else
    echo "FAIL: Clipboard mismatch"
    exit 1
fi

# 6. Screenshot
scrot -u /tmp/test_result.png
echo "Screenshot saved"

# 7. Close without saving
xdotool key ctrl+w
sleep 0.5
xdotool key Tab key Return  # Don't save

echo "Test complete"

Output Requirements

Every test run MUST be documented in LEARNINGS.md:

## Linux E2E Test: [Description]

**Display Server:** Wayland / X11

**Tool:** ydotool / xdotool

**Script:**
```bash
./test_workflow.sh

Output:

Starting E2E test...
PASS: Screenshots differ (interaction worked)
Test complete

Result: PASS

Screenshot: /tmp/test_result.png


## Integration

This skill is referenced by `dev-test` for Linux desktop automation.

For TDD protocol, see: `Skill(skill="workflows:dev-tdd")`

dev-test-linux

Contents

Linux Desktop Automation

Rationalization Prevention

Display Server Detection

Detect Display Server

Wayland: ydotool

Keyboard Input

Alternative: wtype (Wayland-native)

Mouse Input

X11: xdotool

Keyboard Input

Mouse Input

Window Control (X11)

Screenshots

Wayland: grim + slurp

X11: scrot / import

Image Comparison

D-Bus Control

Accessibility (AT-SPI)

Complete E2E Examples

Wayland E2E Test

X11 E2E Test

Output Requirements

Similar Skills