About
aws-polly-tts-tool is a comprehensive CLI tool and Python library for Amazon Polly text-to-speech synthesis. Built with a CLI-first philosophy, it provides both command-line convenience and programmatic access to AWS Polly's full feature set.
What is Amazon Polly?
Amazon Polly is AWS's fully managed text-to-speech service that converts text into lifelike speech using deep learning. It offers 60+ voices in 30+ languages with multiple quality tiers.
Why This Tool?
- Agent-Friendly: Designed for Claude Code and AI agents with self-documenting help and structured errors
- Composable: JSON output to stdout, logs to stderr - perfect for Unix piping
- Dual-Mode: Use as CLI or import as Python library
- Production-Ready: Type-safe, tested, and linted, with comprehensive error handling
- Cost-Transparent: Real-time cost estimates and AWS billing integration
Why CLI-First?
This tool prioritizes CLI design to enable:
- 🤖 AI Agent Integration: Claude Code and other AI tools can use structured commands and parse outputs
- 🔄 ReAct Loops: Clear error messages help agents self-correct and retry operations
- 🔗 Composability: Standard Unix patterns (stdin/stdout/stderr) enable piping and automation
- 🧱 Building Blocks: Commands serve as reusable components for skills, MCP servers, and scripts
- 📊 Predictability: Type-safe implementation ensures consistent behavior in automated workflows
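The stdout/stderr split described above can be sketched in Python. This is a minimal illustration of the pattern, not this tool's actual output: the child process here is a stand-in one-liner, and its JSON fields (`status`, `characters`) are hypothetical.

```python
import json
import subprocess
import sys

# Stand-in for a CLI that follows the convention described above:
# machine-readable JSON on stdout, human-readable logs on stderr.
# The JSON fields are hypothetical and only serve the illustration.
CHILD = (
    "import json, sys; "
    "print('synthesizing...', file=sys.stderr); "
    "print(json.dumps({'status': 'ok', 'characters': 11}))"
)


def run_json_command(argv: list[str]) -> tuple[dict, str]:
    """Run a command, parse its stdout as JSON, and return (result, logs)."""
    proc = subprocess.run(argv, capture_output=True, text=True, check=True)
    return json.loads(proc.stdout), proc.stderr


result, logs = run_json_command([sys.executable, "-c", CHILD])
print(result["status"])
```

Because logs never reach stdout, a caller (or an agent) can parse the result without filtering out progress messages first.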
Features
Voice Engines
- ✅ Standard - Cost-effective traditional TTS ($4/1M chars)
- ✅ Neural - Natural, human-like voices ($16/1M chars)
- ✅ Generative - Most advanced, emotionally engaged ($30/1M chars)
- ✅ Long-form - Optimized for audiobooks ($100/1M chars)
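As a rough sketch of how the per-engine prices above translate into an estimate, the helper below multiplies the character count by the listed rate. The function name `estimate_cost_usd` is hypothetical, and it counts characters with `len()`, which may differ from how AWS counts billable characters.

```python
# Prices in USD per one million characters, taken from the list above.
PRICE_PER_MILLION_USD = {
    "standard": 4.0,
    "neural": 16.0,
    "generative": 30.0,
    "long-form": 100.0,
}


def estimate_cost_usd(text: str, engine: str = "neural") -> float:
    """Estimate the synthesis cost of `text` for the given engine (hypothetical helper)."""
    try:
        price = PRICE_PER_MILLION_USD[engine]
    except KeyError:
        raise ValueError(f"unknown engine: {engine!r}") from None
    return len(text) * price / 1_000_000


# A 2,500-character article on the neural engine.
print(f"${estimate_cost_usd('x' * 2500, 'neural'):.4f}")
```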
Voice Selection
- 📢 60+ voices across 30+ languages
- 🔍 Dynamic fetching from Polly API (always up-to-date)
- 🎚️ Filter by engine, language, gender
- 🌍 Multiple accents and speaking styles
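Filtering by engine, language, and gender can be sketched over records shaped like the entries Polly's DescribeVoices API returns (`Id`, `Gender`, `LanguageCode`, `SupportedEngines`). The sample records and the `filter_voices` helper are illustrative assumptions, not live API data or this tool's implementation.

```python
from typing import Optional

# Illustrative records with placeholder IDs; real data comes from the Polly API.
SAMPLE_VOICES = [
    {"Id": "VoiceA", "Gender": "Female", "LanguageCode": "en-US",
     "SupportedEngines": ["neural", "standard"]},
    {"Id": "VoiceB", "Gender": "Male", "LanguageCode": "en-US",
     "SupportedEngines": ["standard"]},
    {"Id": "VoiceC", "Gender": "Male", "LanguageCode": "de-DE",
     "SupportedEngines": ["neural"]},
]


def filter_voices(
    voices: list[dict],
    engine: Optional[str] = None,
    language: Optional[str] = None,
    gender: Optional[str] = None,
) -> list[dict]:
    """Keep voices matching every filter that is set; None means 'any'."""
    matches = []
    for voice in voices:
        if engine and engine not in voice["SupportedEngines"]:
            continue
        if language and voice["LanguageCode"] != language:
            continue
        if gender and voice["Gender"].lower() != gender.lower():
            continue
        matches.append(voice)
    return matches


print([v["Id"] for v in filter_voices(SAMPLE_VOICES, engine="neural")])
```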
Output Options
- 🎵 mp3 - General purpose (default)
- 🎶 ogg_vorbis - Open format for web
- 🎙️ pcm - Raw audio, lowest latency
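Raw pcm output has no container, so most players cannot open it directly. A minimal sketch of wrapping it in a WAV header with the standard library follows, assuming Polly's documented PCM layout (signed 16-bit, mono, little-endian) and a 16000 Hz sample rate; the helper name `pcm_to_wav_bytes` is hypothetical, and the sample rate should match whatever was requested at synthesis time.

```python
import io
import wave


def pcm_to_wav_bytes(pcm: bytes, sample_rate: int = 16000) -> bytes:
    """Wrap raw 16-bit mono PCM in a WAV container (hypothetical helper)."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)            # assumption: mono
        wav.setsampwidth(2)            # assumption: 16-bit samples (2 bytes)
        wav.setframerate(sample_rate)  # assumption: matches the synthesis request
        wav.writeframes(pcm)
    return buffer.getvalue()


# Four frames of silence stand in for real audio.
wav_bytes = pcm_to_wav_bytes(b"\x00\x00" * 4)
print(len(wav_bytes))
```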
Advanced Features
- 📝 Full SSML support (prosody, breaks, emphasis, phonemes)
- 💰 Dual cost tracking (estimates + AWS Cost Explorer)
- 📊 Billing queries with engine breakdown
- 🔐 AWS environment variable authentication
- 📤 Stdin support for piping
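To make the SSML item above concrete, here is a small sketch that assembles an SSML document from standard tags (`speak`, `break`, `prosody`). The helper functions are hypothetical and not part of this tool's API; they only show how text should be XML-escaped before it is embedded in markup.

```python
from xml.sax.saxutils import escape, quoteattr


def text(value: str) -> str:
    """Escape plain text so characters like '&' and '<' stay valid XML."""
    return escape(value)


def pause(milliseconds: int) -> str:
    """Insert a pause of the given length."""
    return f'<break time="{milliseconds}ms"/>'


def prosody(inner: str, rate: str = "medium") -> str:
    """Wrap already-escaped content in a prosody tag that sets the speaking rate."""
    return f"<prosody rate={quoteattr(rate)}>{inner}</prosody>"


def speak(*parts: str) -> str:
    """Join fragments into a complete SSML document."""
    return "<speak>" + "".join(parts) + "</speak>"


ssml = speak(
    text("Tom & Jerry"),
    pause(300),
    prosody(text("slowly now"), rate="slow"),
)
print(ssml)
```

Which tags are honored depends on the selected engine, so it is worth checking a tag against the Polly documentation before relying on it.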
Installation
Prerequisites
- Python 3.12 (Python 3.13 and later have pydub compatibility issues; see Known Issues)
- uv package manager (recommended)
- AWS credentials configured
- ffmpeg (for audio playback - not required for file output)
Note: For a detailed explanation of how the TTS pipeline works and why these dependencies are needed, see TTS Pipeline Architecture.
Install from Source
# Clone repository
git clone https://github.com/dnvriend/aws-polly-tts-tool.git
cd aws-polly-tts-tool
# Install with uv (Python 3.12)
uv tool install . --python 3.12
# Verify installation
aws-polly-tts-tool --version
Install with mise (Development)
cd aws-polly-tts-tool
mise use python@3.12
uv sync
uv tool install .
Configuration
AWS Credentials
Configure AWS credentials using any of these methods:
# Method 1: AWS CLI configuration
aws configure
# Method 2: Environment variables
export AWS_ACCESS_KEY_ID="your-access-key"
export AWS_SECRET_ACCESS_KEY="your-secret-key"
export AWS_DEFAULT_REGION="us-east-1"