SwiftAutoGUI

A Swift library for macOS automation — mouse, keyboard, screenshots, image recognition, and AI-powered agents.
This repository is inspired by pyautogui.
Demo
An AI agent autonomously observes the screen and executes actions to achieve a goal.
sagui agent "Open Safari and search for Swift"
Requirements
Installation
Swift Package Manager
SwiftAutoGUI is available through Swift Package Manager.
In Package.swift, add the following:
dependencies: [
    // Dependencies declare other packages that this package depends on.
    .package(url: "https://github.com/NakaokaRei/SwiftAutoGUI", branch: "master")
],
targets: [
    .target(
        name: "MyProject",
        dependencies: [..., "SwiftAutoGUI"]
    )
    ...
]
Example Usage
For more details, refer to the DocC-style documentation.
AI Agent (Autonomous Loop)
SwiftAutoGUI includes an Agent that can autonomously observe the screen, reason about what it sees, and execute actions in a loop until a goal is achieved. This follows the ReAct (Observe → Think → Act) pattern using a vision-capable LLM.
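To make the Observe → Think → Act loop concrete, here is a minimal, self-contained sketch of that control flow. It is not SwiftAutoGUI's implementation: `Observation`, `Decision`, `LoopResult`, and `runLoop` are hypothetical names invented for illustration, the "screen" is a plain string, and the loop is synchronous to keep the example short, whereas the library's `Agent` is async.

```swift
import Foundation

// Hypothetical types for illustration only; these are not SwiftAutoGUI's API.
struct Observation {
    let screenDescription: String
}

struct Decision {
    let reasoning: String
    let actions: [String]
    let goalReached: Bool
}

struct LoopResult {
    let completed: Bool
    let iterationsUsed: Int
}

/// Repeats observe -> think -> act until the goal is reached or the iteration budget is spent.
func runLoop(
    maxIterations: Int,
    observe: () -> Observation,
    think: (Observation, [Decision]) -> Decision,
    act: (String) -> Void
) -> LoopResult {
    var history: [Decision] = []
    var iterations = 0
    while iterations < maxIterations {
        iterations += 1
        let observation = observe()                 // Observe the current state
        let decision = think(observation, history)  // Think: choose the next actions
        for action in decision.actions {            // Act on each chosen action
            act(action)
        }
        history.append(decision)
        if decision.goalReached {
            return LoopResult(completed: true, iterationsUsed: iterations)
        }
    }
    return LoopResult(completed: false, iterationsUsed: iterations)
}

// Simulated environment: the "screen" is a string that one action changes.
var screen = "Desktop"
var performed: [String] = []

let result = runLoop(
    maxIterations: 5,
    observe: { Observation(screenDescription: screen) },
    think: { observation, _ in
        if observation.screenDescription == "Safari" {
            return Decision(reasoning: "Safari is open", actions: [], goalReached: true)
        }
        return Decision(reasoning: "Safari is not open yet", actions: ["open Safari"], goalReached: false)
    },
    act: { action in
        performed.append(action)
        if action == "open Safari" { screen = "Safari" }
    }
)
print("Completed: \(result.completed), Steps: \(result.iterationsUsed)")
```

The iteration cap plays the same role as `maxIterations` on the agent: it bounds the loop when the goal is never reported as reached.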
Basic Usage
import SwiftAutoGUI
let backend = OpenAIVisionBackend(apiKey: "sk-...", model: "gpt-4o")
let agent = Agent(backend: backend, maxIterations: 15)
let result = try await agent.run(goal: "Open Safari and search for Swift")
print("Completed: \(result.completed), Steps: \(result.iterationsUsed)")
With Step Callback
let result = try await agent.run(goal: "Click the Settings icon") { step in
    print("Reasoning: \(step.reasoning)")
    print("Actions: \(step.actions)")
}
Custom Backend
You can implement the VisionActionGenerating protocol to use any vision-capable LLM:
struct MyBackend: VisionActionGenerating {
    var isAvailable: Bool { true }
    var unavailableReason: String? { nil }

    func generateActions(
        goal: String,
        screenshot: Data,
        screenSize: CGSize,
        history: [AgentStep]
    ) async throws -> AgentResponse {
        // Send screenshot to your LLM and parse the response
        ...
    }
}
CLI
# Run the agent from the command line
sagui agent "Open Safari and search for Swift" --api-key sk-...
# With options
sagui agent "Click the trash icon" --model gpt-4o --max-iterations 15 --delay 2.0
# Using environment variable for the API key
export OPENAI_API_KEY=sk-...
sagui agent "Open Terminal"
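The examples above show the API key coming either from `--api-key` or from the `OPENAI_API_KEY` environment variable. One common way to resolve such a value in Swift is sketched below. The helper `resolveAPIKey` is hypothetical, and the precedence (flag first, then environment) is an assumption, not documented `sagui` behavior.

```swift
import Foundation

// Hypothetical helper; the flag-over-environment precedence is an assumption.
func resolveAPIKey(flagValue: String?, environment: [String: String]) -> String? {
    // Prefer an explicit, non-empty command-line value.
    if let flag = flagValue, !flag.isEmpty {
        return flag
    }
    // Otherwise fall back to the environment variable.
    if let envValue = environment["OPENAI_API_KEY"], !envValue.isEmpty {
        return envValue
    }
    return nil
}

// Read the real process environment; no flag was passed in this example.
let key = resolveAPIKey(flagValue: nil, environment: ProcessInfo.processInfo.environment)
print(key == nil ? "No API key found" : "API key found")
```

Passing the environment in as a dictionary keeps the helper easy to test without touching the real process environment.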
Action Pattern
SwiftAutoGUI provides an intuitive Action pattern for building and executing automation sequences.
Basic Usage
import SwiftAutoGUI
// Execute single actions
await Action.leftClick.execute()
await Action.write("Hello, World!").execute()
await Action.keyShortcut([.command, .a]).execute() // Select all
// Build and execute action sequences
let actions: [Action] = [
    .move(to: CGPoint(x: 100, y: 100)),
    .wait(0.5),
    .leftClick,
    .write("Hello, SwiftAutoGUI!"),
    .keyShortcut([.returnKey])
]
await actions.execute()
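For readers curious how an action pattern like this can be structured, the sketch below models actions as enum cases with associated values and adds an array extension that runs them in order. `DemoAction` is a hypothetical stand-in, not SwiftAutoGUI's `Action` type. It only returns a description of each step instead of driving real input, and it is synchronous, whereas the library's `execute()` is async.

```swift
import Foundation

// Hypothetical stand-in for illustration; not SwiftAutoGUI's Action type.
enum DemoAction {
    case move(x: Int, y: Int)
    case wait(Double)
    case leftClick
    case write(String)

    /// Returns a description of the step rather than performing real input.
    func execute() -> String {
        switch self {
        case .move(let x, let y):
            return "move to (\(x), \(y))"
        case .wait(let seconds):
            return "wait \(seconds)s"
        case .leftClick:
            return "left click"
        case .write(let text):
            return "write \"\(text)\""
        }
    }
}

extension Array where Element == DemoAction {
    /// Executes each action in order and collects the descriptions.
    func execute() -> [String] {
        map { $0.execute() }
    }
}

let steps: [DemoAction] = [
    .move(x: 100, y: 100),
    .wait(0.5),
    .leftClick,
    .write("Hello")
]
let log = steps.execute()
log.forEach { print($0) }
```

Because each action is a plain value, a sequence can be built, stored, and passed around before anything runs.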
Keyboard Actions
// Text input and shortcuts
let typingActions: [Action] = [
    .write("Fast typing"),
    .wait(1.0),
    .write("Slow typing", interval: 0.1), // 0.1 second between characters
    .keyShortcut([.command, .z]) // Undo
]
await typingActions.execute()
// Common shortcuts as convenience methods
await Action.copy().execute() // Cmd+C
await Action.paste().execute() // Cmd+V
await Action.cut().execute() // Cmd+X
await Action.selectAll().execute() // Cmd+A
await Action.save().execute() // Cmd+S
await Action.undo().execute() // Cmd+Z
await Action.redo().execute() // Cmd+Shift+Z
// Special keys
await Action.keyDown(.soundUp).execute()
await Action.keyUp(.soundUp).execute()
Mouse Actions