Analyze production Agentforce agent behavior using STDM session traces in Data Cloud, plus a fallback path using sf agent test + sf agent preview --authoring-bundle when STDM is unavailable. Use when investigating production failures, regressions, or performance regressions; querying ssot__AiAgentSession__dlm; reproducing reported issues in preview; or improving the .agent file based on production evidence. Trigger phrases: 'why is my agent failing in production', 'analyze production sessions', 'investigate this Agentforce regression', 'what happened in this session', 'reproduce this production issue', 'find sessions where the agent misrouted'. Do NOT trigger for development-time iteration — use /agentforce-develop or /agentforce-test.
Slash command
/sf-compound-engineering:agentforce-observe [org alias; optional --agent-file <path> --session-id <id> --days <n>]
> **Principles enforced:** 7 (outsource thinking, not understanding), 3 (jagged intelligence in production), 5 (taste / drift detection), 1 (preserve the quality ceiling). See `PRINCIPLES.md`.
Improve a deployed Agentforce agent using session-trace evidence. Three phases: (1) Observe
— query STDM session traces from Data Cloud (or fall back to sf agent test + sf agent preview
--authoring-bundle when STDM is unavailable); (2) Reproduce — re-run problematic conversations
in sf agent preview, classify CONFIRMED / INTERMITTENT / NOT REPRODUCED across 3 runs; (3)
Improve — edit the .agent file with targeted fixes, validate, publish, activate, then verify
in preview and, 24-48 hours after activation, re-run Phase 1 against the baseline. Pass --json on every sf
CLI command that supports it (sf api request rest does not). Always re-run safety probes after any fix.
This skill is for production-grade observation — the agent has shipped, real users are hitting it, and you need evidence to drive improvement. The institutional-memory principle (7) lives here: STDM is the wiki for production agent behavior, and docs/solutions/ is the wiki for what we learned from analyzing it.
Sister skills:
/agentforce-develop — making code changes to the .agent file lives there
/agentforce-test — Mode A preview testing is the engine this skill uses for reproduction
Ask the user (or auto-detect) the following before running any query:
Org alias — required.
Agent API name — required for preview and deploy. Ask if not provided.
Agent file path — optional; default search is force-app/main/default/aiAuthoringBundles/<AgentName>/<AgentName>.agent. If not local, retrieve from org.
Session IDs — optional; if absent, query the last 7 days.
Days to look back — optional; default 7.
STDM uses MasterLabel for filtering; the CLI uses DeveloperName (without the _vN suffix). Get both:
sf data query --json \
--query "SELECT Id, MasterLabel, DeveloperName FROM GenAiPlannerDefinition WHERE MasterLabel LIKE '%<user-provided-name>%' OR DeveloperName LIKE '%<user-provided-name>%'" \
-o <org>
Store:
AGENT_MASTER_LABEL — for STDM findSessions() filter (e.g. "Order Service")
AGENT_API_NAME — DeveloperName minus the _vN suffix (e.g. OrderService)
PLANNER_ID — Salesforce record ID
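The derivation of AGENT_API_NAME from DeveloperName can be sketched as a small helper. This is a minimal sketch, assuming the version suffix is always `_v` followed by digits at the end of the name; the function name `agent_api_name` is hypothetical and not part of any Salesforce tooling.

```python
import re


def agent_api_name(developer_name: str) -> str:
    """Strip a trailing version suffix such as _v3 from a DeveloperName.

    Assumption: the suffix is exactly "_v" plus one or more digits at the
    very end of the string. Names without such a suffix are returned as-is.
    """
    return re.sub(r"_v\d+$", "", developer_name)


if __name__ == "__main__":
    # Example value only; real names come from the GenAiPlannerDefinition query.
    print(agent_api_name("OrderService_v3"))
```

Anchoring the pattern at the end of the string keeps names that merely contain `_v` in the middle (for example `My_view_Agent`) intact.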
Locate the .agent file. Search locally first:
find <project-root>/force-app/main/default/aiAuthoringBundles -name "*.agent" 2>/dev/null
If not found, retrieve from org:
sf project retrieve start --json --metadata "AiAuthoringBundle:<AGENT_API_NAME>" -o <org>
Known platform bug.
sf project retrieve start for AiAuthoringBundle creates a double-nested path: force-app/main/default/main/default/aiAuthoringBundles/.... Fix immediately:
if [ -d "force-app/main/default/main/default/aiAuthoringBundles" ]; then
mkdir -p force-app/main/default/aiAuthoringBundles
cp -r force-app/main/default/main/default/aiAuthoringBundles/* \
force-app/main/default/aiAuthoringBundles/
rm -rf force-app/main/default/main
fi
Before any STDM query, get the active Data Cloud Data Space:
sf api request rest "/services/data/v63.0/ssot/data-spaces" -o <org>
Note:
sf api request rest is beta. Do NOT pass --json; it is unsupported and errors out.
Decision logic:
If the call fails (404, permission error), fall back to default and surface the assumption to the user.
Filter to status: "Active".
One active space → use it, confirm to user: "Using Data Space: <name>".
Multiple → list label + name, ask which.
Store the chosen name as DATA_SPACE.
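The decision logic above can be sketched as a pure function. This is a minimal sketch under stated assumptions: the data spaces have already been extracted from the REST response into a list of dictionaries carrying the `name`, `label`, and `status` fields mentioned above (the exact response envelope is not shown here and is not assumed), and `None` stands for a failed call. The function name `choose_data_space` is hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def choose_data_space(
    spaces: Optional[List[Dict[str, str]]],
) -> Tuple[Optional[str], List[Dict[str, str]]]:
    """Return (chosen_name, active_candidates).

    - spaces is None  -> the call failed; fall back to "default" and let the
                         caller surface that assumption to the user.
    - one active      -> use it.
    - anything else   -> return None so the caller asks the user to choose
                         from the candidates (label + name).
    """
    if spaces is None:
        return "default", []
    active = [s for s in spaces if s.get("status") == "Active"]
    if len(active) == 1:
        return active[0]["name"], active
    return None, active
```

The case of zero active spaces is not covered by the steps above; the sketch returns `(None, [])` and leaves that decision to the caller rather than guessing.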
Deploy the helper class AgentforceOptimizeService once per org (see upstream references/stdm-queries.md for the class source). Then probe:
sf apex run -o <org> -f /dev/stdin << 'APEX'
ConnectApi.CdpQueryInput qi = new ConnectApi.CdpQueryInput();
qi.sql = 'SELECT ssot__Id__c FROM "ssot__AiAgentSession__dlm" LIMIT 1';
try {
ConnectApi.CdpQueryOutputV2 out = ConnectApi.CdpQuery.queryAnsiSqlV2(qi, '<DATA_SPACE>');
System.debug('STDM_CHECK:OK rows=' + (out.data != null ? out.data.size() : 0));
} catch (Exception e) {
System.debug('STDM_CHECK:FAIL ' + e.getMessage());
}
APEX
STDM_CHECK:OK → proceed to Phase 1.
STDM_CHECK:FAIL → STDM is not activated. Switch to Phase 1-ALT (fallback). Inform the user: "STDM Session Trace Data Model is not available in this org. Enable via Setup → Data Cloud → Data Streams (verify Agentforce Activity is active). Proceeding with fallback: test suites + local traces."
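Reading the probe result out of the Apex log can be sketched as follows. This is a minimal sketch, assuming the marker appears on a debug line in the form `DEBUG|STDM_CHECK:...` (mirroring the `DEBUG|STDM_RESULT:` convention used in Phase 1); the function name `stdm_status` is hypothetical.

```python
import re
from typing import Optional, Tuple


def stdm_status(apex_log: str) -> Tuple[bool, Optional[str]]:
    """Return (ok, detail) from an anonymous Apex debug log.

    Matching on "DEBUG|STDM_CHECK:" rather than the bare marker avoids a
    false positive if the log echoes the anonymous Apex source, which
    contains the literal 'STDM_CHECK:OK' inside a System.debug call.
    """
    match = re.search(r"DEBUG\|STDM_CHECK:(OK|FAIL)\s*(.*)", apex_log)
    if match is None:
        return False, "no STDM_CHECK marker found in log"
    ok = match.group(1) == "OK"
    detail = match.group(2).strip() or None
    return ok, detail
```

A missing marker is treated the same as a failure, so the caller falls through to Phase 1-ALT instead of querying an unverified Data Space.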
Use findSessions() (in the helper Apex class). Parse DEBUG|STDM_RESULT: from the Apex debug log. Returns session IDs and basic metadata.
If findSessions() returns empty, the agent has no production traffic in the window. Switch to Phase 1-ALT (the fallback is also useful when there is no live traffic to observe).
Use getMultipleConversationDetails() for up to 5 sessions, most recent first. Returns turn-by-turn data with messages, steps, topics, and action results.
Use getLlmStepDetails() to get the actual LLM prompt and response when grounding is low.
Use getAggregatedMetrics() for: session rates, top intents, quality distribution, RAG averages.
Use getMomentInsights() for: intent summaries, quality scores (1-5), retriever metrics.
Use runObservabilityQuery() for KnowledgeGap, Hallucination, RetrievalQuality, AnswerRelevancy, or Leaderboard.
Render a turn-by-turn timeline from ConversationData JSON. Then classify each session against the issue patterns:
Action errors
Subagent misroutes
Missing actions / wrong inputs
Variable capture failures
No transitions (dead-hub anti-pattern)
LOW adherence
Abandoned sessions
Publish drift (live behavior diverges from current .agent source)
Entry agent answering directly (SMALL_TALK pattern)
Safety regressions
Priority:
P1 — action errors, misroutes, LOW adherence
P2 — missing actions, variable bugs, knowledge gaps
P3 — performance, abandoned sessions
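The priority tiers above can be expressed as a lookup table so that classified issues sort consistently. This is a minimal sketch: the issue-type keys (`action_error`, `subagent_misroute`, and so on) are hypothetical labels chosen for illustration, and any pattern that the tiers above do not mention is marked `UNASSIGNED` rather than guessed.

```python
from typing import List, Tuple

# Hypothetical issue-type labels mapped to the tiers listed above.
PRIORITY_BY_ISSUE = {
    "action_error": "P1",
    "subagent_misroute": "P1",
    "low_adherence": "P1",
    "missing_action": "P2",
    "variable_capture_failure": "P2",
    "knowledge_gap": "P2",
    "performance": "P3",
    "abandoned_session": "P3",
}


def prioritize(issue_types: List[str]) -> List[Tuple[str, str]]:
    """Return (priority, issue_type) pairs, highest priority first.

    Plain tuple sorting works because "P1" < "P2" < "P3" < "UNASSIGNED"
    in lexicographic order.
    """
    ranked = [(PRIORITY_BY_ISSUE.get(t, "UNASSIGNED"), t) for t in issue_types]
    return sorted(ranked)
```

Keeping unlisted patterns visible as `UNASSIGNED` prompts a human decision instead of silently dropping them from the report.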
After classifying, retrieve the .agent file (Phase 0) and run automated checks:
Subagent count vs. action blocks count
Dead-hub detection (subagent defined, never reached)
Orphan actions (Level 1 listed, never invoked at Level 2)
Cross-subagent variable dependencies (writes vs reads)
Cross-reference STDM symptoms against .agent structure to identify root causes vs surface symptoms.
Show: sessions analyzed, issues grouped by root-cause category, and an estimated uplift if fixed. Then automatically proceed to Phase 2 unless the user wants to stop.
| Source | Pros | Cons |
|---|---|---|
| STDM (Phase 1) | Real production data, volume | Requires Data Cloud, ~15 min lag |
| Test suites + local traces (1-ALT) | Instant, full LLM prompt + variable state | No real users; preview-only |
sf agent test list --json -o <org>
sf agent test run --json --api-name <SuiteName> --wait 10 --result-format json -o <org> | tee /tmp/test_run.json
JOB_ID=$(python3 -c "import json; print(json.load(open('/tmp/test_run.json'))['result']['runId'])")
sf agent test results --json --job-id "$JOB_ID" --result-format json -o <org>
Derive test cases from the .agent file using the same derivation rules as /agentforce-test Step 0: subagent-based, action-based, guardrail, multi-turn, safety probes.
Run preview with --authoring-bundle (local traces):
sf agent preview start --json --authoring-bundle <BundleName> -o <org> | tee /tmp/preview_start.json
SESSION_ID=$(python3 -c "import json; print(json.load(open('/tmp/preview_start.json'))['result']['sessionId'])")
sf agent preview send --json --session-id "$SESSION_ID" --authoring-bundle <BundleName> --utterance "<UTT>" -o <org> | tee /tmp/preview_response.json
sf agent preview end --json --session-id "$SESSION_ID" --authoring-bundle <BundleName> -o <org>
Trace files: .sfdx/agents/<BundleName>/sessions/<sessionId>/traces/<planId>.json.
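Locating the trace files for a session can be sketched with `pathlib`. This is a minimal sketch that only encodes the path layout stated above; the helper names `trace_dir` and `list_traces` are hypothetical, and the project root is assumed to be the current directory unless passed in.

```python
from pathlib import Path
from typing import List


def trace_dir(bundle_name: str, session_id: str, project_root: str = ".") -> Path:
    """Build the traces directory for one preview session."""
    return (
        Path(project_root)
        / ".sfdx" / "agents" / bundle_name
        / "sessions" / session_id / "traces"
    )


def list_traces(bundle_name: str, session_id: str, project_root: str = ".") -> List[Path]:
    """Return all <planId>.json trace files, sorted by file name.

    Returns an empty list when the directory does not exist, for example
    when the session has not produced any traces yet.
    """
    directory = trace_dir(bundle_name, session_id, project_root)
    if not directory.is_dir():
        return []
    return sorted(directory.glob("*.json"))
```

Sorting by name gives a deterministic order; it does not imply chronological order of plans, which would need to be read from the trace contents.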
| Issue type | Trace command |
|---|---|
| Subagent misroute | `jq -r '.plan[] |
| Action not called | `jq -r '.plan[] |
| LOW adherence | `jq -r '.plan[] |
| Variable capture fail | `jq -r '.plan[] |
| Vague instructions | `jq -r '.plan[] |
DefaultTopic trace quirk. With
--authoring-bundle, the root .topic field often shows "DefaultTopic" even when routing works. Always use NodeEntryStateStep.data.agent_name for the real subagent chain.
Entry-answering-directly (SMALL_TALK pattern). If
start_agent trace shows SMALL_TALK grounding and transition tools are visible but none invoked, add "You are a router only. Do NOT answer questions directly." to start_agent: instructions:.
For every issue identified in Phase 1, build one preview scenario. Run each scenario 3 times and classify:
| Verdict | Criteria |
|---|---|
| [CONFIRMED] | Same failure in 3/3 runs |
| [INTERMITTENT] | Failure in 1-2/3 runs |
| [NOT REPRODUCED] | Passes 3/3 |
Only [CONFIRMED] and [INTERMITTENT] proceed to Phase 3. The 3-run discipline exists because LLM jitter (Principle 3) will lie to you if you only run once.
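The three-run verdict rule can be sketched as a small function. This is a minimal sketch, assuming each run has already been judged as a boolean ("did this run reproduce the Phase 1 failure?"); deciding whether a failure is the same failure happens before this step. The function names are hypothetical.

```python
from typing import List


def classify_runs(reproduced: List[bool]) -> str:
    """Map three preview runs to a verdict.

    3/3 failures   -> CONFIRMED
    1-2/3 failures -> INTERMITTENT
    0/3 failures   -> NOT REPRODUCED
    """
    if len(reproduced) != 3:
        raise ValueError("exactly 3 runs are required")
    failures = sum(1 for r in reproduced if r)
    if failures == 3:
        return "CONFIRMED"
    if failures == 0:
        return "NOT REPRODUCED"
    return "INTERMITTENT"


def proceeds_to_phase_3(verdict: str) -> bool:
    """Only confirmed and intermittent issues move on to the fix phase."""
    return verdict in ("CONFIRMED", "INTERMITTENT")
```

Rejecting any run count other than three keeps a single lucky or unlucky run from being reported as a verdict.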
sf agent preview start --json --authoring-bundle <Name> -o <org>
sf agent preview send --json --session-id "$SID" --utterance "<text>" --authoring-bundle <Name> -o <org>
sf agent preview end --json --session-id "$SID" --authoring-bundle <Name> -o <org>
Trace path: .sfdx/agents/<Name>/sessions/<sessionId>/traces/<planId>.json.
Fixes are made in the .agent file directly. Verify all action targets exist and are registered in the org before editing. If any are missing, present options to the user: deploy stubs, remove the actions, register via UI, or proceed with routing-only fixes.
| Confirmed issue | Fix location | Strategy |
|---|---|---|
| Subagent misroute | subagent: description: | Add keywords from production utterances |
| Wrong action | Action descriptions | Add exclusion language |
| LOW grounding | instructions: -> | Inject {!@variables.x} references |
| Persona leak | system: instructions: | Move persona out of subagents |
| Dead hub | Transitions in upstream subagent | Add transition action |
| Entry answering directly | start_agent: instructions: | Add router-only constraint |
| Safety regression | system: instructions: | Re-state safety guidelines, response constraints |
Instruction principles (Principle 5 — taste over typing):
Name actions explicitly. Don't rely on the LLM to infer.
State pre-conditions clearly. Use available when: guards.
Scope tightly. One subagent, one job.
Persona in system: only — never in subagents.
Establish a baseline before editing (the Phase 1 metrics).
Make minimal edits.
Test immediately after each edit.
One fix per publish cycle.
Check cross-subagent dependencies before touching shared variables.
Test adjacent subagents — fixing one routing description can break a sibling.
Read the .agent file with the Read tool. Edit with the Edit tool (use tabs for indentation if the file uses tabs). Show the diff to the user.
sf agent validate authoring-bundle --json --api-name <AGENT_API_NAME> -o <org>
sf agent publish authoring-bundle --json --api-name <AGENT_API_NAME> -o <org>
sf agent activate --json --api-name <AGENT_API_NAME> -o <org>
If publish fails, the deploy + activate fallback is incomplete — it does not propagate reasoning: actions: to live metadata. Fix the publish error rather than working around it.
Re-run Phase 2 scenarios post-fix. Check the trace for correct routing, grounding, tools, and variables. Then schedule a re-run of Phase 1 in 24–48 hours to compare against baseline. Production lag is real; preview-only verification is not enough (Principle 3).
Re-run safety probes against the modified .agent file. Revert any change that introduces a BLOCK finding. A regressed safety surface is not allowed to ship regardless of what other improvement it brings.
Convert each [CONFIRMED] issue into a Testing Center YAML test case. Deploy with sf agent test create and verify all previously-broken scenarios pass. The regression suite is institutional memory — Principle 7.
Run /sf-compound to write the diagnosis to docs/solutions/ under agent-issues (or the closest existing category). Production agent issues are exactly the kind of jagged-edge knowledge nothing else in the Salesforce ecosystem will retain for you.
This skill is adapted from forcedotcom/afv-library/skills/observing-agentforce (Apache-2.0). The upstream skill ships with five reference files (references/stdm-queries.md, references/issue-classification.md, references/reproduce-reference.md, references/improve-reference.md, references/stdm-schema.md) covering full STDM Apex source, DMO field schemas, complete issue-pattern tables, and detailed reproduction procedures. For the AgentforceOptimizeService Apex class source and full DMO schema, consult the upstream. This plugin's adaptation tightens the observe → reproduce → improve cycle around the principles framework and integrates with /sf-compound for institutional memory capture.