Prepare your AgentCore agent for production — security, reliability, and performance.
No arguments required. The skill reads your project config and produces a checklist with specific findings for your project.
Run `agentcore --version`. This skill requires v0.9.0 or later; if the version is older, tell the developer to run `agentcore update` before proceeding.
Read `agentcore/agentcore.json` to understand the project's configuration (runtimes, deployment type, lifecycle settings).
Work through each category and report findings specific to the project.
The auto-created execution role has broad Bedrock access (arn:aws:bedrock:*::foundation-model/*). For production, scope it to the specific models your agent uses.
Check the current execution role:
```shell
agentcore status --json | jq -r '.runtimes[0].executionRoleArn'
```
Recommended production Bedrock policy:
```json
{
  "Effect": "Allow",
  "Action": [
    "bedrock:InvokeModel",
    "bedrock:InvokeModelWithResponseStream"
  ],
  "Resource": [
    "arn:aws:bedrock:<REGION>::foundation-model/anthropic.claude-sonnet-4-5-20250929-v1:0"
  ]
}
```
Replace the resource ARN with the specific model(s) your agent uses.
ECR access: Scope to your specific repository:
```json
{
  "Effect": "Allow",
  "Action": ["ecr:BatchGetImage", "ecr:GetDownloadUrlForLayer"],
  "Resource": "arn:aws:ecr:<REGION>:<YOUR_ACCOUNT_ID>:repository/bedrock-agentcore-<AGENT_NAME>-*"
}
```
Trust policy: Verify the execution role's trust policy is scoped to your account:
```json
{
  "Principal": {"Service": "bedrock-agentcore.amazonaws.com"},
  "Action": "sts:AssumeRole",
  "Condition": {
    "StringEquals": {"aws:SourceAccount": "<YOUR_ACCOUNT_ID>"},
    "ArnLike": {"aws:SourceArn": "arn:aws:bedrock-agentcore:<REGION>:<YOUR_ACCOUNT_ID>:*"}
  }
}
```
Runtime resource-based policies (API-only): For fine-grained control over which principals can invoke your runtime — beyond what IAM roles and JWT auth provide — use PutAgentRuntimeResourcePolicy via boto3. This is not exposed in the CLI or agentcore.json. Use the awsknowledge MCP server if available to look up the current API shape.
**Audit `InvokeAgentRuntimeCommand` separately.** If your project uses InvokeAgentRuntimeCommand (see agents-build/references/integrate.md), audit its IAM permissions separately from InvokeAgentRuntime. The two actions have different blast radii: InvokeAgentRuntimeCommand is arbitrary shell execution inside a live microVM with the runtime's full execution role — callers can read/write the filesystem, reach any network resource the agent can reach, and access the execution role's credentials.
Check which principals have the permission:
```shell
# List customer-managed policies in your account, then inspect each for InvokeAgentRuntimeCommand
aws iam list-policies --scope Local \
  --query 'Policies[*].[PolicyName, Arn, DefaultVersionId]' \
  --output table

# Then for each policy of interest:
aws iam get-policy-version \
  --policy-arn <POLICY_ARN> \
  --version-id <VERSION_ID> \
  --query 'PolicyVersion.Document'
```
Alternatively, use the IAM console: IAM → Policies → Filter by type: Customer managed → search for InvokeAgentRuntimeCommand in the policy JSON editor.
Separate IAM policy for command callers — keep this distinct from the policy granting InvokeAgentRuntime:
```json
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Action": "bedrock-agentcore:InvokeAgentRuntimeCommand",
    "Resource": "arn:aws:bedrock-agentcore:<REGION>:<YOUR_ACCOUNT_ID>:runtime/<RUNTIME_NAME>-*"
  }]
}
```
Enable CloudTrail alerting. Create an EventBridge rule to notify your security team when InvokeAgentRuntimeCommand is called:
```shell
aws events put-rule \
  --name AgentCoreCommandExecution \
  --event-pattern '{"source":["aws.bedrock-agentcore"],"detail-type":["AWS API Call via CloudTrail"],"detail":{"eventName":["InvokeAgentRuntimeCommand"]}}' \
  --state ENABLED
```
If commands are constructed from user input anywhere in calling code: validate before passing — reject strings containing &&, ;, $(...), backticks, |, or other shell metacharacters.
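A minimal validation sketch for that check — the function name and the exact metacharacter set are illustrative; adapt the pattern to whatever shell syntax your calling code targets:

```python
import re

# Characters and constructs that enable shell injection: command separators,
# pipes, redirects, command substitution. Illustrative, not exhaustive.
_SHELL_META = re.compile(r"[;&|`$<>]")

def validate_command_fragment(fragment: str) -> str:
    """Reject user input containing shell metacharacters before it is
    interpolated into a command destined for InvokeAgentRuntimeCommand."""
    if _SHELL_META.search(fragment):
        raise ValueError(f"Rejected: shell metacharacters in input: {fragment!r}")
    return fragment
```

An allowlist of expected characters (e.g. `[A-Za-z0-9 ._/-]+`) is stricter than this denylist and usually the better choice when the input format is known.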
By default, agents use AWS IAM (SigV4) for inbound auth. For production, verify this is configured correctly.
Check current auth config:
```shell
agentcore status --runtime <AgentName> --json | jq '.runtimes[0].authorizerConfig'
```
Options:
- `AWS_IAM` (default) — callers must sign requests with SigV4. Good for internal services and AWS-native clients.
- `CUSTOM_JWT` — callers present a JWT from your identity provider. Good for web/mobile apps and external clients.
```shell
agentcore add agent \
  --name MyAgent \
  --authorizer-type CUSTOM_JWT \
  --discovery-url https://your-idp.example.com/.well-known/openid-configuration \
  --allowed-audience my-api \
  --allowed-clients my-client-id
```
> [!WARNING]
> Never use `--authorizer-type NONE` in production. It allows unauthenticated access to your agent — anyone with the endpoint URL can invoke it. Always use AWS_IAM or CUSTOM_JWT. If you see NONE in production, change it immediately.
**`allowedClients` vs `allowedAudience`.** This is the most common JWT misconfiguration. The right choice depends on what's inside the token your IdP issues.
Decode a sample token (at your IdP or with jwt.io) and look at the payload:
- Token has a `client_id` claim and no `aud` claim → configure `allowedClients` on the runtime.
- Token has an `aud` claim → configure `allowedAudience` on the runtime. The `aud` claim is the standard OIDC audience field; use it as the primary check.

If you pick the wrong one, invocations return 403 even with a valid token — the runtime is validating against a claim the token doesn't have.
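A small sketch of that audit step — decoding a token's payload without verification (audit only; the helper names are illustrative) and mapping its claims to the matching runtime setting:

```python
import base64
import json

def jwt_payload(token: str) -> dict:
    """Decode the (unverified) payload segment of a JWT to inspect its
    claims. For auditing only — never skip signature verification at runtime."""
    payload_b64 = token.split(".")[1]
    payload_b64 += "=" * (-len(payload_b64) % 4)  # restore stripped base64 padding
    return json.loads(base64.urlsafe_b64decode(payload_b64))

def recommended_setting(claims: dict) -> str:
    """Map the token's claims to the runtime config field to use."""
    if "aud" in claims:
        return "allowedAudience"  # standard OIDC audience claim — prefer it
    if "client_id" in claims:
        return "allowedClients"
    return "unknown — check your IdP's token documentation"
```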
AgentCore enforces the OIDC discovery spec (RFC 8414 §3): the issuer value in the discovery document must be a URL prefix of the discovery endpoint.
That means if your discovery URL is https://qa.example.com/.well-known/openid-configuration, the issuer field in that document must start with https://qa.example.com. If the document advertises an issuer like https://example.com (no subdomain), validation fails.
Some enterprise IdPs (PingFederate, Paylocity, some Keycloak setups) host the discovery endpoint on an environment-specific subdomain while advertising a production-level issuer. This pattern is incompatible with the RFC 8414 prefix rule.
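The prefix rule can be sanity-checked before deploying; a minimal sketch (the function name is illustrative, and a naive string prefix is slightly looser than a full URL comparison):

```python
def issuer_is_valid(discovery_url: str, issuer: str) -> bool:
    """RFC 8414 §3 prefix rule as described above: the advertised issuer
    must be a URL prefix of the discovery endpoint."""
    return discovery_url.startswith(issuer.rstrip("/"))

# Passes: issuer matches the environment-specific discovery host.
# Fails: production-level issuer on a qa.* discovery URL.
issuer_is_valid(
    "https://qa.example.com/.well-known/openid-configuration",
    "https://qa.example.com",
)
```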
Fix options: make the two agree — either configure the IdP to advertise an issuer that matches the discovery host, or serve the discovery document from under the advertised issuer.
When invocations fail with 403, narrow down which check is failing.
**Authorization method mismatch** — the runtime's auth type and the request's auth type don't match. Two cases:

- The runtime uses AWS_IAM (or no authorizer) but the caller is sending a Bearer token → reconfigure the runtime for CUSTOM_JWT, or have the caller use SigV4.
- The runtime uses CUSTOM_JWT but the caller's request is being SigV4-signed → likely the SDK or environment is injecting SigV4 headers alongside the Bearer token. Check for `X-Amz-Date`, `X-Amz-Security-Token`, or `Authorization: AWS4-HMAC-SHA256` in the outbound request. Remove the SigV4 path and send only the Bearer token.

**Invalid inbound token** (or similar) — the token was rejected by the JWT validator. Walk through these in order:

1. Does the `iss` claim match the discovery URL's origin?
2. `allowedClients` vs `allowedAudience` — is the runtime configured for the right claim for your token format?
3. Is the `jwks_uri` listed in the discovery document? It must be publicly reachable.
4. Check `exp` against the current time — has the token expired?

Only after ruling all of those out should you treat it as a service-side issue.
Check that your agent code handles errors without exposing internal details:
```python
from bedrock_agentcore.runtime import BedrockAgentCoreApp

app = BedrockAgentCoreApp()

@app.entrypoint
def invoke(payload, context):
    try:
        result = ...  # your agent logic
        return {"response": result}
    except Exception as e:
        # Log the full error internally
        app.logger.error(f"Agent error: {e}", exc_info=True)
        # Return a safe message to the caller
        return {"error": "An error occurred. Please try again."}

if __name__ == "__main__":
    app.run()
```
Check for: bare except blocks that swallow errors silently, error messages that expose stack traces or internal details to callers, missing error handling in tool call code.
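The first of those checks can be automated; a minimal sketch using Python's `ast` module to locate bare `except:` handlers in agent source (run it over each file under `app/`):

```python
import ast

def find_bare_excepts(source: str) -> list:
    """Return line numbers of `except:` handlers with no exception type —
    these swallow every error (including KeyboardInterrupt) silently."""
    tree = ast.parse(source)
    return [
        node.lineno
        for node in ast.walk(tree)
        if isinstance(node, ast.ExceptHandler) and node.type is None
    ]
```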
Agent entrypoints receive arbitrary payloads from callers. Validate inputs before processing:
```python
@app.entrypoint
def invoke(payload, context):
    prompt = payload.get("prompt", "")
    # Validate input
    if not prompt or not isinstance(prompt, str):
        return {"error": "Missing or invalid 'prompt' field"}
    if len(prompt) > 10000:
        return {"error": "Prompt exceeds maximum length (10,000 characters)"}
    # Sanitize — strip control characters, excessive whitespace
    prompt = " ".join(prompt.split())
    # Proceed with validated input
    result = agent(prompt)
    return {"response": str(result)}
```
What to validate: presence, type, length, and format of every payload field the agent reads.
Rate limiting: AgentCore Runtime has built-in invocation rate limits (default 25 TPS per agent — see references/limits.md). For application-level rate limiting (per-user, per-tenant), implement it in your calling application or API Gateway layer, not in the agent code itself. The agent should assume it's already been rate-limited by the time a request reaches it.
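For the calling-application layer, a per-user token bucket is a common sketch (class name, capacity, and refill rate here are illustrative, not AgentCore APIs):

```python
import time
from collections import defaultdict

class TokenBucket:
    """Per-user token bucket for the calling application — not agent code.
    Each user starts with `capacity` tokens; one is spent per request and
    tokens refill continuously at `refill_per_sec`."""

    def __init__(self, capacity: float = 5, refill_per_sec: float = 1.0):
        self.capacity = capacity
        self.refill = refill_per_sec
        # user_id -> (tokens remaining, timestamp of last update)
        self.state = defaultdict(lambda: (capacity, time.monotonic()))

    def allow(self, user_id: str) -> bool:
        tokens, last = self.state[user_id]
        now = time.monotonic()
        tokens = min(self.capacity, tokens + (now - last) * self.refill)
        if tokens < 1:
            self.state[user_id] = (tokens, now)
            return False
        self.state[user_id] = (tokens - 1, now)
        return True
```

Check `allow(user_id)` before calling InvokeAgentRuntime and return 429 to the client when it says no; in multi-instance deployments the state would need to live in a shared store instead of process memory.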
Two failure modes to check for:
```shell
# Search for common secret patterns in agent code
grep -r "sk-\|api_key\s*=\s*['\"]" app/ --include="*.py"
grep -r "password\s*=\s*['\"]" app/ --include="*.py"
```
AgentCore Runtime environment variables are not vault-backed. Anything a developer stuffs into the runtime's env (via CDK, boto3 UpdateAgentRuntime, or similar) is a plaintext config value, not a secret. Audit for the pattern:
```shell
# Flag any os.getenv / os.environ call whose name implies a secret
grep -rE "os\.(getenv|environ).*(TOKEN|SECRET|KEY|PASSWORD|CREDENTIAL)" app/ --include="*.py"
```
Non-secret identifiers injected by the platform are fine — keep an allowlist for them (e.g., MEMORY_*_ID, AGENTCORE_GATEWAY_*_URL, AWS_REGION, downstream agent ARNs). Review the remaining hits and confirm none are secrets.
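A sketch of that allowlist filter (the function name and the exact patterns are illustrative — extend them with your project's platform-injected variables):

```python
import fnmatch

# Illustrative allowlist of platform-injected, non-secret variable patterns.
ALLOWED_ENV_PATTERNS = [
    "MEMORY_*_ID",
    "AGENTCORE_GATEWAY_*_URL",
    "AWS_REGION",
]

# Name fragments that suggest a secret-bearing variable.
SECRET_HINTS = ("TOKEN", "SECRET", "KEY", "PASSWORD", "CREDENTIAL")

def flag_suspect_env_vars(names: list) -> list:
    """Return env-var names that look secret-bearing and are not on the
    allowlist — candidates for migration to a credential provider."""
    suspects = []
    for name in names:
        if any(fnmatch.fnmatch(name, p) for p in ALLOWED_ENV_PATTERNS):
            continue
        if any(hint in name.upper() for hint in SECRET_HINTS):
            suspects.append(name)
    return suspects
```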
Correct pattern: Register each outbound credential with agentcore add credential, then fetch it in code via the integrated credential providers:
```python
from bedrock_agentcore.identity.auth import requires_api_key, requires_access_token

@requires_api_key(provider_name="MyAPI")
def call_api(payload: dict, *, api_key: str) -> dict:
    ...

@requires_access_token(provider_name="MyOAuthProvider", scopes=["read"], auth_flow="M2M")
async def call_downstream(data: dict, *, access_token: str) -> dict:
    ...
```
The decorator fetches from Secrets Manager at call time and handles caching/refresh. Credentials registered this way are encrypted at rest and rotated without a redeploy.
Local dev: agentcore/.env.local (gitignored) is read by agentcore dev so the decorator resolves locally. This file is not uploaded to runtime on deploy — production credentials live in the credential provider.
A related audit — for every external service the agent calls, ask whether it should be a Gateway target instead of a direct HTTP call buried in agent code. Gateway's credential providers inject auth at the edge (so the agent process never sees the secret), the tool catalog is policy-enforceable, and a leaked traceback/log line from agent code can't exfiltrate credentials that never reached it.
```shell
# Find direct outbound HTTP calls in agent code
grep -rEn 'httpx\.|requests\.|aiohttp\.' app/ --include="*.py"
```
For each hit, decide:
| Hit looks like | Action |
|---|---|
| Calls an external REST API the agent treats as a tool | Front as a Gateway target (agentcore add gateway-target --type open-api-schema or api-gateway). Load agents-connect/SKILL.md Path C. |
| Calls an MCP server directly | Front as a Gateway target (--type mcp-server). Load agents-connect/SKILL.md Path A. |
| Calls an AWS service (S3, DynamoDB, etc.) | Migrate from requests/httpx to the boto3 client, using the runtime's execution role for IAM. No credential needed. |
| Calls a streaming service (SSE-with-live-output, WebSocket, WebRTC) | OK to keep direct — Gateway doesn't front these yet. Confirm any auth uses @requires_*, not os.getenv. |
| Calls another agent via A2A | OK to keep direct — A2A is HTTP-by-design. Confirm it uses @requires_access_token for the bearer token. |
| Calls a measured latency hot path and the team chose it | OK, but confirm measurement exists and auth uses @requires_*. |
If the hit fits none of the "OK to keep direct" rows, open a ticket to convert it to a Gateway target. Gateway targets can be added without a code change in the agent for most framework integrations (MCP tool discovery handles binding).
AgentCore enables X-Ray tracing and CloudWatch logging automatically. Verify:
```shell
agentcore status --runtime <AgentName> --json | jq '.runtimes[0].observabilityConfig'
```
CloudWatch dashboard: AWS Console → CloudWatch → GenAI Observability → Bedrock AgentCore
Log retention: By default, logs are retained indefinitely. Set a retention policy for cost control:
```shell
aws logs put-retention-policy \
  --log-group-name /aws/bedrock-agentcore/runtimes/<AGENT_ID>-DEFAULT \
  --retention-in-days 30
```
Before going to production, establish a quality baseline so you can detect regressions:
```shell
# Run a baseline eval
agentcore run eval \
  --evaluator "Builtin.Helpfulness" \
  --evaluator "Builtin.GoalSuccessRate"

# Set up continuous monitoring
agentcore add online-eval \
  --name production_monitor \
  --runtime <AgentName> \
  --evaluator "Builtin.Helpfulness" \
  --sampling-rate 5
agentcore deploy -y
```
Record the baseline scores. If scores drop significantly after a change, investigate before continuing.
If your agent accesses private AWS resources (RDS, internal APIs), configure VPC:
```shell
agentcore add agent \
  --name MyAgent \
  --network-mode VPC \
  --subnets subnet-abc,subnet-def \
  --security-groups sg-123
```
See agents-build (loads references/vpc.md) for full VPC configuration guidance.
Slow agent initialization causes timeouts, 424 errors, and poor user experience — especially on first invocation after a period of inactivity. Everything the agent does before it's ready to handle a request adds to the time users wait.
A typical cold start for a new environment takes around 20–30 seconds, split roughly across platform provisioning, image pull, and application startup. The two you control are image size and application startup — optimizing either one directly reduces time to first response.
Same-session requests route to an existing initialized environment — no cold start. The first request per session pays the cold-start cost; every subsequent request on that session is fast.
Concrete patterns:

- Reuse the same `session_id` across turns. Don't generate a new UUID per turn.
- For batch jobs, reuse one `session_id` across items in the batch.
- Cross-SDK note: if you're using MCP, pass one session identifier, not both `runtimeSessionId` and `mcpSessionId` at once. Sending both can cause the platform to bind two separate environments to the same logical session, doubling cold-start cost.
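The reuse pattern amounts to a small session registry in the calling application; a minimal sketch (the class name is illustrative — the runtime enforces a minimum session-ID length, which a UUID4 string's 36 characters satisfies):

```python
import uuid

class ConversationSessions:
    """One runtime session per conversation: hand back the same session_id
    for every turn so requests route to the already-initialized environment
    instead of paying a fresh cold start."""

    def __init__(self):
        self._sessions = {}  # conversation_id -> runtime session_id

    def session_id(self, conversation_id: str) -> str:
        if conversation_id not in self._sessions:
            self._sessions[conversation_id] = str(uuid.uuid4())
        return self._sessions[conversation_id]
```

Pass the returned value as the `runtimeSessionId` on every invocation belonging to that conversation or batch.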
Every MB of deployment package adds to cold-start time.
- Use a `.dockerignore` to keep unneeded files out of the image.
- Prune `pyproject.toml` / `requirements.txt`. Don't ship `tests/`, `docs/`, `.git/`, local caches.
- `pip list` (Python) or `npm ls` (Node) will show you what's actually installed. Remove anything you're not using.

Don't load large models, connect to databases, or initialize MCP clients at module import time. Every second spent in module import is a second the agent can't respond to requests.
```python
# ❌ Slow — runs at import time, before the agent can handle requests
import heavy_library
client = heavy_library.Client(config)

# ✅ Fast — defers until first request
_client = None

def get_client():
    global _client
    if _client is None:
        import heavy_library
        _client = heavy_library.Client(config)
    return _client
```
The skill previously recommended CodeZip over Container when possible. That's an oversimplification.
Neither wins universally. Both benefit the same way from session reuse and from keeping the package small. If your traffic pattern has lots of bursty cold sessions, invest in shrinking whichever deployment artifact you're using. If your traffic pattern reuses sessions, the deployment type matters much less.
If a Lambda function sits in your invocation path, use provisioned concurrency on it to eliminate Lambda cold starts. This is separate from Runtime initialization — it's the Lambda itself that adds latency on first invocation after going cold.
Session management is tightly linked to cost, performance, and the maxVms quota. Getting this right is often the difference between a smooth production launch and a quota-blocked one.
When a request arrives with a new session ID, the runtime initializes a fresh environment for it. That environment stays alive until one of:
- you call `StopRuntimeSession`;
- no requests arrive within `idleRuntimeSessionTimeout` (default 900 seconds);
- the environment reaches its maximum lifetime (`maxLifetime`, default 8 hours).

Idle environments count against your `maxVms` quota until they're reclaimed, even though they're not serving traffic. This is the #1 cause of unexpected `maxVms` errors.
Don't leave defaults for production. Pick values that match how your workload actually uses sessions:
| Workload | idleRuntimeSessionTimeout | maxLifetime | Reasoning |
|---|---|---|---|
| Interactive chat / support agent | 600–900s (default) | 3600–7200s | Users pause to read/think. Reclaim fast after they leave. |
| Request/reply API with no follow-up | 60–120s | 1800s | Each call is self-contained — release the VM quickly. |
| Batch processing, one session per job | 120s | match job length + buffer | Idle gap between items in the batch is small; reclaim aggressively between jobs. |
| Background / long-running tasks (use `add_async_task`) | 120–300s | up to 28800s (8h) | Async task API keeps the VM alive during tracked work; idle timeout applies between tasks. |
Trade-offs at a glance:
- Lower timeouts → faster reclamation, more headroom under `maxVms`, lower cost. Risk: reclaiming mid-conversation, causing the next turn to cold-start.
- Higher timeouts → smoother multi-turn experience. Risk: idle environments pile up and cause `maxVms` errors on bursts.

Call `StopRuntimeSession` when the work is done. If your agent finishes a task and doesn't expect more requests on that session, explicitly stop it. This releases the environment immediately instead of waiting for idle timeout.
```python
# After your invocation logic completes and you know the session is done:
client.stop_runtime_session(
    agentRuntimeArn=runtime_arn,
    runtimeSessionId=session_id,
)
```
Reuse session IDs for related work. A new session ID for every HTTP request means a new environment for every HTTP request. For multi-turn conversations, batch jobs, or user-facing interactions, use one session ID per conversation/batch/user-interaction and route all related requests to it.
Tune idleRuntimeSessionTimeout to your workload. The default 900 seconds is appropriate for interactive workloads where you expect quick follow-up requests. For request-reply workloads where sessions are short-lived, lower it.
Edit the runtime's entry in agentcore/agentcore.json:
```json
{
  "runtimes": [
    {
      "name": "MyAgent",
      "lifecycleConfiguration": {
        "idleRuntimeSessionTimeout": 120,
        "maxLifetime": 3600
      }
    }
  ]
}
```
Then run `agentcore deploy` to apply. The CLI and CDK handle the underlying UpdateAgentRuntime call for you.
If you prefer the CLI, agentcore add agent ... --idle-timeout 120 --max-lifetime 3600 writes the same fields into agentcore.json. The file is the source of truth — every field in it has IDE autocomplete via the $schema URL at the top of the file (https://schema.agentcore.aws.dev/v1/agentcore.json).
Lower timeout = faster VM reclamation = more headroom under maxVms. Too low = environments get reclaimed mid-conversation, causing the next turn to cold-start.
Don't pass both runtimeSessionId and mcpSessionId together. For MCP agents, use one. Passing both can bind two separate VMs to the same logical session.
**Diagnosing `maxVms` problems.** If you hit `ServiceQuotaExceededException: maxVms limit exceeded`, don't request a quota increase first. CloudWatch's concurrent-sessions metric is not the same as live VM count — idle environments count against the quota until reclaimed.
Work through this order:
1. Call `StopRuntimeSession` after each logical request completes.
2. Lower `idleRuntimeSessionTimeout` if your sessions are short-lived.
3. Only if the errors persist after both, request a quota increase.

See references/limits.md for the increase-request workflow (via the Service Quotas console) and the justification template.
If your agent fires off work that outlives the /invocations response — background processing, async jobs, long tool chains — a fire-and-forget pattern isn't enough. The environment can be reclaimed at idleRuntimeSessionTimeout even while your background task is still running, because the runtime considers the session idle once the invocation response is sent.
The bedrock-agentcore SDK provides task registration that keeps the environment alive while tracked work runs. In Python:
```python
from bedrock_agentcore.runtime import BedrockAgentCoreApp

app = BedrockAgentCoreApp()

@app.entrypoint
def invoke(payload, context):
    # Register the task BEFORE starting it
    task_id = app.add_async_task("background_work")
    # Kick off the work (in a thread, asyncio, etc.)
    start_background_work(task_id, payload)
    # Return the invocation response — the task is still tracked
    return {"status": "processing", "taskId": task_id}

def start_background_work(task_id, payload):
    try:
        # Long-running work here
        do_the_work(payload)
    finally:
        # Mark the task complete when done — this releases the "busy" signal
        app.complete_async_task(task_id)

if __name__ == "__main__":
    app.run()
```
While at least one registered task is active, the runtime sees the environment as busy and doesn't reclaim it at idleRuntimeSessionTimeout. maxLifetime (default 8 hours) still applies as a hard ceiling.
Check the bedrock-agentcore SDK docs for your language for the equivalent API — the TypeScript SDK has an analogous pattern.
- Set `idleRuntimeSessionTimeout` to match your expected task duration. If you know tasks run up to 10 minutes, set the timeout to 12 minutes. Keep it well under `maxLifetime`.
- Related: agents-debug/SKILL.md ("Connection drops mid-stream" section).

If you're hitting throttling, `ServiceQuotaExceededException`, or any other quota-related error — or you're about to launch and want to make sure quotas won't block you — load references/limits.md.
That reference covers the service quotas and default limits, plus the increase-request workflow and justification template.
Generate a checklist specific to the project:
Production Readiness Checklist for <AgentName>
IAM
[ ] Execution role Bedrock access scoped to specific model ARNs
[ ] ECR access scoped to specific repository
[ ] Trust policy scoped to your account ID
Authentication
[ ] Inbound auth is AWS_IAM or CUSTOM_JWT (not NONE)
[ ] If CUSTOM_JWT: discovery URL, audience, and client IDs configured
Shell Access (if using InvokeAgentRuntimeCommand)
[ ] InvokeAgentRuntimeCommand permission granted only to identities that need it
[ ] Separate IAM policy from InvokeAgentRuntime policy
[ ] CloudTrail / EventBridge alert configured for InvokeAgentRuntimeCommand calls
[ ] If commands constructed from user input: shell injection validation implemented
Code quality
[ ] Error handling wraps all agent logic
[ ] Input validation on payload fields (type, length, format)
[ ] No secrets hardcoded in agent code
[ ] Credentials registered via agentcore add credential
Observability
[ ] X-Ray tracing enabled (auto-configured)
[ ] CloudWatch log retention policy set
[ ] Eval baseline established
Performance
[ ] Agent initialization time measured and optimized
[ ] Deployment package size under 200 MB (target under 100 MB)
[ ] Dependencies audited — no unused packages
[ ] Heavy initialization deferred to request time
[ ] Session reuse strategy chosen for multi-turn / batch workloads
[ ] `StopRuntimeSession` called after work completes where applicable
[ ] `idleRuntimeSessionTimeout` tuned to workload (default 900s)
[ ] For long-running background tasks: `add_async_task` / `complete_async_task` used
Resources
[ ] Memory strategies appropriate for use case (if using memory)
[ ] Gateway auth configured (if using gateway)
[ ] Policy engine attached (if restricting tool access)
Testing
[ ] Agent tested with production-representative inputs
[ ] Error cases tested (tool failures, model errors)
[ ] Memory cross-session tested (if using LTM)