From agentforce-adlc
Writes, runs, and analyzes structured test suites for Agentforce agents using sf agent test CLI for smoke tests, batch execution, result interpretation, and CI/CD integration.
npx claudepluginhub salesforceairesearch/agentforce-adlc --plugin agentforce-adlc
Automated testing for Agentforce agents with smoke tests, batch execution, and iterative fix loops.
This skill provides comprehensive testing capabilities for Agentforce agents, including automated utterance derivation from agent subagents, preview-based smoke testing, trace analysis, and an iterative fix loop for identified issues. It bridges the gap between initial development and production deployment.
- Replace python3 with python on Windows.
- Replace /tmp/ with $env:TEMP\ (PowerShell) or %TEMP%\ (cmd).
- Replace jq with python -c "import json,sys; ..." if jq is not installed.
- Replace find ... | head -1 with Get-ChildItem -Recurse ... | Select-Object -First 1 in PowerShell.

This skill uses sf agent preview and sf agent test CLI commands directly.
There is no standalone Python script.
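Where jq is not installed, the same field extraction can be done in Python. The following is a minimal sketch, assuming a hypothetical helper named json_get (it is not part of this skill or of the sf CLI) that walks a dotted path of object keys, roughly as jq -r '.result.accessToken' would. It also strips the control characters that the CLI output may contain before parsing.

```python
import json
import re

# Control characters to strip before parsing; tab, newline, and CR are kept.
_CONTROL_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def json_get(raw: str, dotted_path: str, default=""):
    """Return the value at dotted_path in a JSON string, or default if absent.

    Hypothetical helper for illustration; handles object keys only, not arrays.
    """
    data = json.loads(_CONTROL_CHARS.sub("", raw))
    for key in dotted_path.split("."):
        if not isinstance(data, dict) or key not in data:
            return default
        data = data[key]
    return data


# Placeholder payload shaped like `sf org display --json` output (values are fake).
sample = '{"result": {"accessToken": "<token>", "instanceUrl": "https://example.my.salesforce.com"}}'
print(json_get(sample, "result.instanceUrl"))
```

The helper returns a default instead of raising when a key is missing, which keeps shell pipelines from failing on partial output.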
Quick smoke test (Mode A):
# Start preview, send utterance, end session (--authoring-bundle generates local traces)
sf agent preview start --json --authoring-bundle MyAgent -o <org-alias>
sf agent preview send --json --session-id <ID> --utterance "test" --authoring-bundle MyAgent -o <org-alias>
sf agent preview end --json --session-id <ID> --authoring-bundle MyAgent -o <org-alias>
Batch testing (Mode B):
# Deploy and run test suite
sf agent test create --json --spec test-spec.yaml --api-name MySuite -o <org-alias>
sf agent test run --json --api-name MySuite --wait 10 --result-format json -o <org-alias>
Action execution:
# Execute a Flow or Apex action directly via REST API
TOKEN=$(sf org display -o <org-alias> --json | jq -r '.result.accessToken')
INSTANCE_URL=$(sf org display -o <org-alias> --json | jq -r '.result.instanceUrl')
curl -s "$INSTANCE_URL/services/data/v63.0/actions/custom/flow/Get_Order_Status" \
-H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" \
-d '{"inputs": [{"orderId": "00190000023XXXX"}]}'
This skill supports two testing modes plus direct action execution:
- Mode A (preview smoke tests): uses sf agent preview. No test suite deployment needed (org authentication still required). Best for iterative development and fix validation.
- Mode B (batch testing): uses sf agent test. Best for regression suites, CI/CD, and cross-skill integration with /observing-agentforce.

When to use which:
| Scenario | Mode |
|---|---|
| Quick smoke test during authoring | Mode A |
| Validate a fix from /observing-agentforce | Mode A |
| Build a regression suite for CI/CD | Mode B |
| Deploy tests to share with the team | Mode B |
| Test a single Flow or Apex action in isolation | Action Execution |
Full reference:
references/preview-testing.md
If no utterances file is provided, auto-derive test cases from the .agent file:
Always present the plan first -- never silently auto-run tests without showing what will be tested. Ask the user to review/modify before executing.
Use --authoring-bundle to compile from the local .agent file (enables local trace files):
SESSION_ID=$(sf agent preview start --json \
--authoring-bundle MyAgent \
--target-org <org> 2>/dev/null \
| jq -r '.result.sessionId')
RESPONSE=$(sf agent preview send --json \
--session-id "$SESSION_ID" \
--authoring-bundle MyAgent \
--utterance "test utterance" \
--target-org <org> 2>/dev/null)
# Strip control characters (required -- CLI output contains control chars)
PLAN_ID=$(python3 -c "
import json, sys, re
raw = sys.stdin.read()
clean = re.sub(r'[\x00-\x08\x0b\x0c\x0e-\x1f]', '', raw)
d = json.loads(clean)
msgs = d.get('result', {}).get('messages', [])
print(msgs[-1].get('planId', '') if msgs else '')
" <<< "$RESPONSE")
TRACES_PATH=$(sf agent preview end --json \
--session-id "$SESSION_ID" \
--authoring-bundle MyAgent \
--target-org <org> 2>/dev/null \
| jq -r '.result.tracesPath')
Note:
--authoring-bundle must appear on all three subcommands (start, send, end).
Traces are written to: .sfdx/agents/{BundleName}/sessions/{sessionId}/traces/{planId}.json
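The trace location can be assembled from the values captured during the session. This is a small sketch, assuming a hypothetical helper named trace_path and placeholder identifiers; only the directory layout comes from the path pattern above.

```python
from pathlib import Path


def trace_path(bundle_name: str, session_id: str, plan_id: str, project_root: str = ".") -> Path:
    """Build the expected trace file location for one preview turn."""
    return (
        Path(project_root)
        / ".sfdx" / "agents" / bundle_name
        / "sessions" / session_id
        / "traces" / f"{plan_id}.json"
    )


# Placeholder identifiers, not real session data.
print(trace_path("MyAgent", "session-123", "plan-456").as_posix())
```

Checking that the resulting path exists before running jq gives a clearer error than a jq failure on a missing file.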
Key trace analysis commands:
# Topic routing
jq -r '.topic' "$TRACE"
jq -r '.plan[] | select(.type == "NodeEntryStateStep") | .data.agent_name' "$TRACE"
# Action invocation
jq -r '.plan[] | select(.type == "BeforeReasoningIterationStep") | .data.action_names[]' "$TRACE"
# Grounding check
jq -r '.plan[] | select(.type == "ReasoningStep") | {category: .category, reason: .reason}' "$TRACE"
# Safety score
jq -r '.plan[] | select(.type == "PlannerResponseStep") | .safetyScore.safetyScore.safety_score' "$TRACE"
# Tool visibility
jq -r '.plan[] | select(.type == "EnabledToolsStep") | .data.enabled_tools[]' "$TRACE"
# Response text
jq -r '.plan[] | select(.type == "PlannerResponseStep") | .message' "$TRACE"
# Variable changes
jq -r '.plan[] | select(.type == "VariableUpdateStep") | .data.variable_updates[] | "\(.variable_name): \(.variable_past_value) -> \(.variable_new_value) (\(.variable_change_reason))"' "$TRACE"
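For scripted checks, the same fields can be read in Python. The sketch below assumes a hypothetical summarize_trace function and a synthetic trace; the step types and field names are taken from the jq commands above, and a real trace may contain more steps or different nesting.

```python
from typing import Any


def summarize_trace(trace: dict[str, Any]) -> dict[str, Any]:
    """Collect routing, action, response, and safety fields from one trace document."""
    steps = trace.get("plan", [])

    def of_type(step_type: str) -> list[dict[str, Any]]:
        return [s for s in steps if s.get("type") == step_type]

    # Actions offered across all reasoning iterations, in order.
    actions: list[str] = []
    for step in of_type("BeforeReasoningIterationStep"):
        actions.extend(step.get("data", {}).get("action_names", []))

    # Use the last planner response as the final answer of the turn.
    responses = of_type("PlannerResponseStep")
    last = responses[-1] if responses else {}
    outer = last.get("safetyScore") or {}
    inner = outer.get("safetyScore") or {}

    return {
        "topic": trace.get("topic"),
        "actions": actions,
        "response": last.get("message"),
        "safety_score": inner.get("safety_score"),
    }


# Synthetic trace for illustration only.
sample_trace = {
    "topic": "order_status",
    "plan": [
        {"type": "BeforeReasoningIterationStep", "data": {"action_names": ["lookup_order"]}},
        {
            "type": "PlannerResponseStep",
            "message": "Your order has shipped.",
            "safetyScore": {"safetyScore": {"safety_score": 0.98}},
        },
    ],
}
print(summarize_trace(sample_trace))
```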
After running safety probes, produce an explicit verdict: SAFE, UNSAFE, or NEEDS_REVIEW.
If UNSAFE: display prominent warning, recommend fixes, flag as not deployment-ready, suggest Section 15 of /developing-agentforce.
Max 3 iterations. For each failure, diagnose from trace and apply targeted fix:
| Failure Type | Fix Location | Fix Strategy |
|---|---|---|
| TOPIC_NOT_MATCHED | subagent: description: | Add keywords from utterance |
| ACTION_NOT_INVOKED | available when: | Relax guard conditions |
| WRONG_ACTION | Action descriptions | Add exclusion language |
| UNGROUNDED | instructions: -> | Add {!@variables.x} references |
| LOW_SAFETY | system: instructions: | Add safety guidelines |
| DEFAULT_TOPIC | subagent: description: or start_agent: actions: | Add keywords or transition actions |
| NO_ACTIONS_IN_TOPIC | subagent: reasoning: actions: | Add reasoning: actions: block |
See references/preview-testing.md for full diagnosis table mapping trace steps to failures.
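The bounded fix loop can be sketched as follows. The table contents mirror the rows above; the function name run_fix_loop and the two callbacks (run_tests, apply_fix) are hypothetical stand-ins for however tests are rerun and fixes are applied in practice.

```python
from typing import Callable

MAX_ITERATIONS = 3  # "Max 3 iterations"

# failure type -> (fix location, fix strategy), copied from the table above
FIX_TABLE = {
    "TOPIC_NOT_MATCHED": ("subagent: description:", "Add keywords from utterance"),
    "ACTION_NOT_INVOKED": ("available when:", "Relax guard conditions"),
    "WRONG_ACTION": ("Action descriptions", "Add exclusion language"),
    "UNGROUNDED": ("instructions: ->", "Add {!@variables.x} references"),
    "LOW_SAFETY": ("system: instructions:", "Add safety guidelines"),
    "DEFAULT_TOPIC": (
        "subagent: description: or start_agent: actions:",
        "Add keywords or transition actions",
    ),
    "NO_ACTIONS_IN_TOPIC": ("subagent: reasoning: actions:", "Add reasoning: actions: block"),
}


def run_fix_loop(
    run_tests: Callable[[], list],
    apply_fix: Callable[[str, str, str], None],
    max_iterations: int = MAX_ITERATIONS,
) -> tuple:
    """Rerun tests and apply targeted fixes until clean or the iteration cap is hit.

    run_tests returns a list of failure-type strings (empty when everything passes).
    Returns (passed, fix_rounds_performed).
    """
    failures = run_tests()
    rounds = 0
    while failures and rounds < max_iterations:
        for failure in failures:
            if failure in FIX_TABLE:
                location, strategy = FIX_TABLE[failure]
                apply_fix(failure, location, strategy)
        rounds += 1
        failures = run_tests()
    return (not failures, rounds)
```

Failure types that are not in the table are skipped rather than guessed at, so they surface again on the next run and still count against the iteration cap.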
Full reference:
references/batch-testing.md
name: "OrderService Smoke Tests"
subjectType: AGENT
subjectName: OrderService # BotDefinition DeveloperName (API name)
testCases:
  - utterance: "Where is my order #12345?"
    expectedTopic: order_status
    expectedOutcome: "Agent checks order status"
  - utterance: "I want to return my order"
    expectedTopic: returns
    expectedActions:
      - lookup_order # Use Level 2 INVOCATION names, NOT Level 1 definitions
  - utterance: "What's the best recipe for chocolate cake?"
    expectedOutcome: "Agent politely declines and redirects"
Key rules:
- expectedActions is a flat string array with Level 2 invocation names (from reasoning: actions:), NOT Level 1 definition names (from subagent: actions:).
- expectedOutcome -- most reliable assertion type (LLM-as-judge).
- For tests with no expected topic (such as the off-topic utterance in the spec above), omit expectedTopic and use expectedOutcome only. Filter out topic_assertion FAILURE for these (false negatives from empty assertion XML).

# Deploy test suite
sf agent test create --json --spec /tmp/spec.yaml --api-name MySuite -o <org>
# Run and wait
sf agent test run --json --api-name MySuite --wait 10 --result-format json -o <org> | tee /tmp/run.json
# Get results (ALWAYS use --job-id, NOT --use-most-recent)
JOB_ID=$(python3 -c "import json; print(json.load(open('/tmp/run.json'))['result']['runId'])")
sf agent test results --json --job-id "$JOB_ID" --result-format json -o <org> | tee /tmp/results.json
python3 -c "
import json
data = json.load(open('/tmp/results.json'))
for tc in data['result']['testCases']:
    utterance = tc['inputs']['utterance'][:50]
    results = {r['name']: r['result'] for r in tc.get('testResults', [])}
    topic = results.get('topic_assertion', 'N/A')
    action = results.get('action_assertion', 'N/A')
    outcome = results.get('output_validation', 'N/A')
    print(f'{utterance:<50} topic={topic:<6} action={action:<6} outcome={outcome}')
"
Topic names in Testing Center may differ from .agent file names. If assertions fail on subagent routing:
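- Discover the runtime topic names: jq '.result.testCases[].generatedData.topic' /tmp/results.json
- Update the spec with those names and redeploy with --force-overwrite.
- Topic hash drift: the runtime hash suffix changes after an agent republish. Re-run discovery after each publish.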
See references/batch-testing.md for full YAML field reference, multi-turn examples, known bugs, and auto-generation from .agent files.
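The rule about ignoring topic_assertion failures for test cases without expectedTopic can be applied while parsing results. This is a sketch under assumptions: the helper name failed_assertions is hypothetical, the result shape follows the parsing script above, and "FAILURE" is the only result value taken from this document ("PASS" in the sample is assumed).

```python
def failed_assertions(test_case: dict, has_expected_topic: bool = True) -> list:
    """Return the names of failed assertions for one test case.

    When the spec entry has no expectedTopic, a topic_assertion FAILURE is
    treated as a known false negative and dropped.
    """
    failed = []
    for check in test_case.get("testResults", []):
        if check.get("result") != "FAILURE":
            continue
        if check.get("name") == "topic_assertion" and not has_expected_topic:
            continue
        failed.append(check.get("name"))
    return failed


# Synthetic result for the off-topic utterance; values are illustrative.
sample_case = {
    "inputs": {"utterance": "What's the best recipe for chocolate cake?"},
    "testResults": [
        {"name": "topic_assertion", "result": "FAILURE"},
        {"name": "output_validation", "result": "PASS"},
    ],
}
print(failed_assertions(sample_case, has_expected_topic=False))
```

Whether a case has an expected topic has to come from the spec file, since the results alone do not distinguish a real routing failure from an empty assertion.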
Full reference:
references/action-execution.md
Execute individual Flow and Apex actions directly via REST API, bypassing the agent runtime.
Before executing ANY action:
- Check the org type: sf data query -q "SELECT IsSandbox FROM Organization" -o <org> --json -- warn and require confirmation for production orgs.
- Use synthetic test data (test@example.com, 000-00-0000). Warn if the user provides real PII.

TOKEN=$(sf org display -o <org> --json | jq -r '.result.accessToken')
INSTANCE_URL=$(sf org display -o <org> --json | jq -r '.result.instanceUrl')
# Flow action
curl -s "$INSTANCE_URL/services/data/v63.0/actions/custom/flow/{flowApiName}" \
-H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" \
-d '{"inputs": [{"param": "value"}]}'
# Apex action
curl -s "$INSTANCE_URL/services/data/v63.0/actions/custom/apex/{className}" \
-H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" \
-d '{"inputs": [{"param": "value"}]}'
See references/action-execution.md for integration testing patterns, debugging, and error handling.
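The curl calls above can also be prepared in Python without extra dependencies. The sketch below only builds the request and does not send it; the function name build_action_request is hypothetical, and the URL pattern, API version, and payload shape are copied from the curl examples.

```python
import json
import urllib.request

API_VERSION = "v63.0"  # same version as in the curl examples


def build_action_request(instance_url: str, token: str, kind: str, api_name: str, inputs: dict) -> urllib.request.Request:
    """Prepare a POST for a custom Flow ("flow") or Apex ("apex") action."""
    if kind not in ("flow", "apex"):
        raise ValueError("kind must be 'flow' or 'apex'")
    url = f"{instance_url.rstrip('/')}/services/data/{API_VERSION}/actions/custom/{kind}/{api_name}"
    body = json.dumps({"inputs": [inputs]}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder org URL, token, and input; nothing is sent here.
req = build_action_request(
    "https://example.my.salesforce.com", "<token>", "flow", "Get_Order_Status", {"orderId": "<order-id>"}
)
print(req.get_method(), req.full_url)
```

Passing the prepared request to urllib.request.urlopen would execute the action, so that step belongs after the sandbox check and confirmation described above.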
Full reference:
references/test-report-format.md
Reports include: subagent routing %, action invocation %, grounding %, safety %, response quality %, overall score, and status (PASSED / PASSED WITH WARNINGS / FAILED). Safety verdict (SAFE/UNSAFE/NEEDS_REVIEW) is always included.
<project-root>/tests/
    <AgentApiName>-testing-center.yaml    # Full smoke suite (Mode B)
    <AgentApiName>-regression.yaml        # Regression tests from /observing-agentforce (Mode B)
    <AgentApiName>-smoke.yaml             # Ad-hoc smoke tests (Mode A)
Full reference:
references/troubleshooting.md
| Issue | Solution |
|---|---|
| Session timeout | Split into smaller batches |
| Trace not found | Update to sf CLI 2.121.7+ |
| jq parse error | Use Python re.sub to strip control characters before parsing |
| Empty traces | Check transcript.jsonl or use Mode B instead |
- sf CLI 2.121.7+ (for preview trace support)
- jq (system) -- JSON processing
- python3 -- for result parsing scripts

| Code | Meaning |
|---|---|
| 0 | All tests passed -- safe to deploy |
| 1 | Some tests failed -- review before deploying |
| 2 | Critical failure -- block deployment |
| 3 | Test execution error -- fix infrastructure |
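A CI step could translate these codes into a gate decision. This is a minimal sketch: the names SuiteExit and deploy_decision and the decision labels are hypothetical, and treating an unknown code as "block" is a conservative assumption rather than documented behavior.

```python
from enum import IntEnum


class SuiteExit(IntEnum):
    """Exit codes from the table above."""
    ALL_PASSED = 0
    SOME_FAILED = 1
    CRITICAL_FAILURE = 2
    EXECUTION_ERROR = 3


_DECISIONS = {
    SuiteExit.ALL_PASSED: "deploy",
    SuiteExit.SOME_FAILED: "review",
    SuiteExit.CRITICAL_FAILURE: "block",
    SuiteExit.EXECUTION_ERROR: "fix-infrastructure",
}


def deploy_decision(code: int) -> str:
    """Map an exit code to a pipeline decision; unknown codes block deployment."""
    try:
        return _DECISIONS[SuiteExit(code)]
    except ValueError:
        return "block"


for code in (0, 1, 2, 3):
    print(code, deploy_decision(code))
```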