Skill

agentforce-observe

Analyze production Agentforce agent behavior using STDM session traces in Data Cloud, with a fallback path using sf agent test + sf agent preview --authoring-bundle when STDM is unavailable. Use when investigating production failures or regressions (behavioral or performance); querying ssot__AiAgentSession__dlm; reproducing reported issues in preview; or improving the .agent file based on production evidence. Trigger phrases: 'why is my agent failing in production', 'analyze production sessions', 'investigate this Agentforce regression', 'what happened in this session', 'reproduce this production issue', 'find sessions where the agent misrouted'. Do NOT trigger for development-time iteration; use /agentforce-develop or /agentforce-test instead.

Slash command

/sf-compound-engineering:agentforce-observe [org alias; optional --agent-file <path> --session-id <id> --days <n>]

Last commit: Apr 30, 2026

/agentforce-observe

Principles enforced: 7 (outsource thinking, not understanding), 3 (jagged intelligence in production), 5 (taste / drift detection), 1 (preserve the quality ceiling). See PRINCIPLES.md.

Copy-paste-to-agent

Improve a deployed Agentforce agent using session-trace evidence. Three phases: (1) Observe:
query STDM session traces from Data Cloud (or fall back to sf agent test + sf agent preview
--authoring-bundle when STDM is unavailable); (2) Reproduce: re-run problematic conversations
in sf agent preview and classify each as CONFIRMED / INTERMITTENT / NOT REPRODUCED across 3 runs;
(3) Improve: edit the .agent file with targeted fixes, validate, publish, activate, verify
in preview, then re-run Phase 1 after 24-48 hours and compare against the baseline. Always pass
--json on every sf CLI command that supports it (sf api request rest does not). Always re-run
safety probes after any fix.

When to use this skill

This skill is for production-grade observation — the agent has shipped, real users are hitting it, and you need evidence to drive improvement. The institutional-memory principle (7) lives here: STDM is the wiki for production agent behavior, and docs/solutions/ is the wiki for what we learned from analyzing it.

Sister skills:

/agentforce-develop — making code changes to the .agent file lives there
/agentforce-test — Mode A preview testing is the engine this skill uses for reproduction

Inputs to gather before starting

Ask the user (or auto-detect) the following before running any query:

Org alias — required.
Agent API name — required for preview and deploy. Ask if not provided.
Agent file path — optional; default search is force-app/main/default/aiAuthoringBundles/<AgentName>/<AgentName>.agent. If not local, retrieve from org.
Session IDs — optional; if absent, query the last 7 days.
Days to look back — optional; default 7.

Resolve the agent name (mandatory before STDM queries)

STDM uses MasterLabel for filtering; the CLI uses DeveloperName (without the _vN suffix). Get both:

sf data query --json \
  --query "SELECT Id, MasterLabel, DeveloperName FROM GenAiPlannerDefinition WHERE MasterLabel LIKE '%<user-provided-name>%' OR DeveloperName LIKE '%<user-provided-name>%'" \
  -o <org>

Store:

AGENT_MASTER_LABEL — for STDM findSessions() filter (e.g. "Order Service")
AGENT_API_NAME — DeveloperName minus the _vN suffix (e.g. OrderService)
PLANNER_ID — Salesforce record ID
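The suffix rule above can be sketched in a few lines of Python. This is only an illustration: it assumes the version suffix always takes the form `_v` followed by digits at the very end of DeveloperName, and the function name `strip_version_suffix` is hypothetical.

```python
import re


def strip_version_suffix(developer_name: str) -> str:
    """Drop a trailing _vN version suffix (assumed form: "_v" + digits)."""
    return re.sub(r"_v\d+$", "", developer_name)


if __name__ == "__main__":
    # A DeveloperName as returned by the query above becomes AGENT_API_NAME.
    print(strip_version_suffix("OrderService_v3"))
```

Anchoring the pattern at the end of the string keeps names that merely contain `_v2` somewhere in the middle untouched.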

Locate the `.agent` file

Search locally first:

find <project-root>/force-app/main/default/aiAuthoringBundles -name "*.agent" 2>/dev/null

If not found, retrieve from org:

sf project retrieve start --json --metadata "AiAuthoringBundle:<AGENT_API_NAME>" -o <org>

Known platform bug. sf project retrieve start for AiAuthoringBundle creates a double-nested path: force-app/main/default/main/default/aiAuthoringBundles/.... Fix immediately:

if [ -d "force-app/main/default/main/default/aiAuthoringBundles" ]; then
  mkdir -p force-app/main/default/aiAuthoringBundles
  cp -r force-app/main/default/main/default/aiAuthoringBundles/* \
        force-app/main/default/aiAuthoringBundles/
  rm -rf force-app/main/default/main
fi

Phase 0: Discover the Data Space

Before any STDM query, get the active Data Cloud Data Space:

sf api request rest "/services/data/v63.0/ssot/data-spaces" -o <org>

Note: sf api request rest is beta. Do NOT pass --json — it's unsupported and errors out.

Decision logic:

If the call fails (404, permission error), fall back to default and surface the assumption to the user.
Filter to status: "Active".
One active space → use it, confirm to user: "Using Data Space: <name>".
Multiple → list label + name, ask which.

Store the chosen name as DATA_SPACE.
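The decision logic above might look like the following Python sketch. The response shape is an assumption: it takes a list of dicts with `name`, `label`, and `status` keys (or `None` when the call failed), and the function name `choose_data_space` is hypothetical. The source does not say what to do when no space is active; this sketch falls back to `default` in that case, which is also an assumption.

```python
def choose_data_space(spaces):
    """Return (data_space_name, candidates).

    A name of None means several spaces are active and the user must
    pick one from candidates, given as (label, name) pairs.
    """
    if spaces is None:
        # Call failed (404, permission error): fall back to "default"
        # and surface that assumption to the user.
        return "default", []
    active = [s for s in spaces if s.get("status") == "Active"]
    if len(active) == 1:
        return active[0]["name"], []
    if len(active) > 1:
        return None, [(s.get("label"), s["name"]) for s in active]
    # No active space: not covered by the source; assumed fallback.
    return "default", []


if __name__ == "__main__":
    name, candidates = choose_data_space(
        [{"name": "default", "label": "Default", "status": "Active"}]
    )
    print(f"Using Data Space: {name}")
```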

Probe STDM availability

Deploy the helper class AgentforceOptimizeService once per org (see upstream references/stdm-queries.md for the class source). Then probe:

sf apex run -o <org> -f /dev/stdin << 'APEX'
ConnectApi.CdpQueryInput qi = new ConnectApi.CdpQueryInput();
qi.sql = 'SELECT ssot__Id__c FROM "ssot__AiAgentSession__dlm" LIMIT 1';
try {
    ConnectApi.CdpQueryOutputV2 out = ConnectApi.CdpQuery.queryAnsiSqlV2(qi, '<DATA_SPACE>');
    System.debug('STDM_CHECK:OK rows=' + (out.data != null ? out.data.size() : 0));
} catch (Exception e) {
    System.debug('STDM_CHECK:FAIL ' + e.getMessage());
}
APEX

STDM_CHECK:OK → proceed to Phase 1.
STDM_CHECK:FAIL → treat STDM as not activated and switch to Phase 1-ALT (fallback). Inform the user: "The Session Trace Data Model (STDM) is not available in this org. Enable via Setup → Data Cloud → Data Streams (verify Agentforce Activity is active). Proceeding with fallback: test suites + local traces."
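One way to turn the probe output into a branch decision is sketched below in Python. It assumes the debug line reaches you as plain text containing `DEBUG|STDM_CHECK:...`, mirroring the `DEBUG|STDM_RESULT:` convention used elsewhere in this skill; the function name `parse_stdm_check` is hypothetical.

```python
import re

# Anchor on "DEBUG|" so that source lines echoed into the log
# (e.g. System.debug('STDM_CHECK:FAIL ' + ...)) are not matched.
_PATTERN = re.compile(r"DEBUG\|STDM_CHECK:(OK|FAIL)\s*(.*)")


def parse_stdm_check(log_text: str):
    """Return ("OK", row_count), ("FAIL", message), or (None, None)."""
    match = _PATTERN.search(log_text)
    if match is None:
        return None, None
    status, rest = match.group(1), match.group(2).strip()
    if status == "OK":
        rows = re.search(r"rows=(\d+)", rest)
        return "OK", int(rows.group(1)) if rows else 0
    return "FAIL", rest


if __name__ == "__main__":
    sample = "12:00:00.0 (1)|USER_DEBUG|[5]|DEBUG|STDM_CHECK:OK rows=1"
    print(parse_stdm_check(sample))
```

A result of `(None, None)` means the marker never appeared, which is worth treating as a failed probe rather than a success.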

Phase 1: Observe — query STDM (preferred path)

1.1 Find sessions

Use findSessions() (in the helper Apex class). Parse DEBUG|STDM_RESULT: from the Apex debug log. Returns session IDs and basic metadata.

If findSessions() returns empty, the agent has no production traffic in the window. Switch to Phase 1-ALT (the fallback is also useful when there is no live traffic to observe).
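Pulling the `DEBUG|STDM_RESULT:` payloads out of a debug log could be sketched as follows. The assumption here is that each payload is a complete JSON document on a single log line; long payloads may be truncated by the log, so lines that do not parse are skipped rather than guessed at. The function name `extract_stdm_results` is hypothetical.

```python
import json

MARKER = "DEBUG|STDM_RESULT:"


def extract_stdm_results(log_text: str) -> list:
    """Collect every JSON payload that follows the STDM_RESULT marker."""
    results = []
    for line in log_text.splitlines():
        idx = line.find(MARKER)
        if idx == -1:
            continue
        payload = line[idx + len(MARKER):].strip()
        try:
            results.append(json.loads(payload))
        except json.JSONDecodeError:
            # Truncated or non-JSON payload: skip it.
            continue
    return results


if __name__ == "__main__":
    demo = '12:00:00.0 (1)|USER_DEBUG|[9]|DEBUG|STDM_RESULT:{"sessions": []}'
    parsed = extract_stdm_results(demo)
    if not parsed or not parsed[0].get("sessions"):
        print("No sessions in window; switch to Phase 1-ALT")
```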

1.2 Get conversation details

Use getMultipleConversationDetails() for up to 5 sessions, most recent first. Returns turn-by-turn data with messages, steps, topics, and action results.

1.2b LLM prompt + response (for LOW adherence cases)

Use getLlmStepDetails() to get the actual LLM prompt and response when grounding is low.

1.2c Aggregated metrics (start here for a health dashboard)

Use getAggregatedMetrics() for: session rates, top intents, quality distribution, RAG averages.

1.2d Moment insights (per-session)

Use getMomentInsights() for: intent summaries, quality scores (1-5), retriever metrics.

1.2e Targeted observability queries (RAG)

Use runObservabilityQuery() for KnowledgeGap, Hallucination, RetrievalQuality, AnswerRelevancy, or Leaderboard.

1.3 Reconstruct and classify

Render a turn-by-turn timeline from ConversationData JSON. Then classify each session against the issue patterns:

Action errors
Subagent misroutes
Missing actions / wrong inputs
Variable capture failures
No transitions (dead-hub anti-pattern)
LOW adherence
Abandoned sessions
Publish drift (live behavior diverges from current .agent source)
Entry agent answering directly (SMALL_TALK pattern)
Safety regressions

Priority:

P1 — action errors, misroutes, LOW adherence
P2 — missing actions, variable bugs, knowledge gaps
P3 — performance, abandoned sessions
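As a rough illustration, the priority tiers above can be encoded as a lookup so that classified issues sort consistently. The category spellings are illustrative, and categories the list does not rank (for example publish drift or safety regressions) are returned as `unranked` rather than assigned a tier the source never gives them.

```python
# Priority tiers as listed above; keys are illustrative spellings.
PRIORITY = {
    "action error": "P1",
    "subagent misroute": "P1",
    "low adherence": "P1",
    "missing action": "P2",
    "variable capture failure": "P2",
    "knowledge gap": "P2",
    "performance": "P3",
    "abandoned session": "P3",
}
_RANK = {"P1": 0, "P2": 1, "P3": 2}


def priority_of(category: str) -> str:
    """Return P1/P2/P3, or "unranked" when the list gives no tier."""
    return PRIORITY.get(category.strip().lower(), "unranked")


def sort_by_priority(categories: list) -> list:
    """Order categories P1 first; sorted() is stable, so ties keep input order."""
    return sorted(categories, key=lambda c: _RANK.get(priority_of(c), len(_RANK)))


if __name__ == "__main__":
    print(sort_by_priority(["abandoned session", "action error", "missing action"]))
```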

1.4 Cross-reference against the `.agent` file

After classifying, retrieve the .agent file (Phase 0) and run automated checks:

Subagent count vs. action blocks count
Dead-hub detection (subagent defined, never reached)
Orphan actions (Level 1 listed, never invoked at Level 2)
Cross-subagent variable dependencies (writes vs reads)

Cross-reference STDM symptoms against .agent structure to identify root causes vs surface symptoms.
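Two of the checks above reduce to set differences once the names have been collected. The Python sketch below assumes you have already extracted the relevant names from the .agent file and the traces (how to parse the .agent file is not covered here), and the function names are hypothetical.

```python
def dead_hubs(defined_subagents, reached_subagents) -> list:
    """Subagents that are defined but never reached."""
    return sorted(set(defined_subagents) - set(reached_subagents))


def orphan_actions(listed_actions, invoked_actions) -> list:
    """Actions listed at Level 1 but never invoked at Level 2."""
    return sorted(set(listed_actions) - set(invoked_actions))


if __name__ == "__main__":
    # Illustrative names only.
    print(dead_hubs(["orders", "returns", "billing"], ["orders", "billing"]))
    print(orphan_actions(["lookup_order", "issue_refund"], ["lookup_order"]))
```

An empty list from both functions does not prove the structure is sound; it only rules out these two patterns.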

1.5 Present findings

Show: sessions analyzed, issues grouped by root-cause category, and an estimated uplift if fixed. Then automatically proceed to Phase 2 unless the user wants to stop.

Phase 1-ALT: Fallback when STDM is unavailable

| Source | Pros | Cons |
|---|---|---|
| STDM (Phase 1) | Real production data, volume | Requires Data Cloud, ~15 min lag |
| Test suites + local traces (1-ALT) | Instant, full LLM prompt + variable state | No real users; preview-only |

1-ALT.1 Run an existing test suite, if any

sf agent test list --json -o <org>
sf agent test run --json --api-name <SuiteName> --wait 10 --result-format json -o <org> | tee /tmp/test_run.json
JOB_ID=$(python3 -c "import json; print(json.load(open('/tmp/test_run.json'))['result']['runId'])")
sf agent test results --json --job-id "$JOB_ID" --result-format json -o <org>

1-ALT.2 Derive utterances from the `.agent` file

Use the same derivation rules as /agentforce-test Step 0: subagent-based, action-based, guardrail, multi-turn, safety probes.

1-ALT.3 Preview with `--authoring-bundle` (local traces)

sf agent preview start --json --authoring-bundle <BundleName> -o <org> | tee /tmp/preview_start.json
SESSION_ID=$(python3 -c "import json; print(json.load(open('/tmp/preview_start.json'))['result']['sessionId'])")

sf agent preview send --json --session-id "$SESSION_ID" --authoring-bundle <BundleName> --utterance "<UTT>" -o <org> | tee /tmp/preview_response.json

sf agent preview end --json --session-id "$SESSION_ID" --authoring-bundle <BundleName> -o <org>

Trace files: .sfdx/agents/<BundleName>/sessions/<sessionId>/traces/<planId>.json.

1-ALT.4 Local trace diagnosis

| Issue type | Trace command |
|---|---|
| Subagent misroute | `jq -r '.plan[] …` |
| Action not called | `jq -r '.plan[] …` |
| LOW adherence | `jq -r '.plan[] …` |
| Variable capture fail | `jq -r '.plan[] …` |
| Vague instructions | `jq -r '.plan[] …` |

DefaultTopic trace quirk. With --authoring-bundle, the root .topic field often shows "DefaultTopic" even when routing works. Always use NodeEntryStateStep.data.agent_name for the real subagent chain.
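Reading the real subagent chain from a local trace could look like the Python sketch below. The trace layout is an assumption: it takes `plan` to be a list of step objects, each with a `type` field naming the step kind and a `data` object; only `NodeEntryStateStep` and `data.agent_name` come from the text above. The function name `subagent_chain` is hypothetical.

```python
import json


def subagent_chain(trace: dict) -> list:
    """Ordered subagent names from NodeEntryStateStep entries.

    Ignores the root "topic" field, which may read "DefaultTopic"
    even when routing works.
    """
    chain = []
    for step in trace.get("plan", []):
        if step.get("type") != "NodeEntryStateStep":
            continue
        name = (step.get("data") or {}).get("agent_name")
        # Collapse consecutive repeats so the chain shows transitions only.
        if name and (not chain or chain[-1] != name):
            chain.append(name)
    return chain


def load_chain(trace_path: str) -> list:
    """Load a trace file from disk and return its subagent chain."""
    with open(trace_path, encoding="utf-8") as handle:
        return subagent_chain(json.load(handle))
```

Comparing the returned chain with the expected route for an utterance is usually enough to tell a genuine misroute from the `DefaultTopic` display quirk.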

Entry-answering-directly (SMALL_TALK pattern). If start_agent trace shows SMALL_TALK grounding and transition tools are visible but none invoked, add "You are a router only. Do NOT answer questions directly." to start_agent: instructions:.

Phase 2: Reproduce — live preview, 3-run classification

For every issue classified in Phase 1, build one preview scenario. Run each scenario 3 times and classify the outcome:

| Verdict | Criteria |
|---|---|
| `[CONFIRMED]` | Same failure in 3/3 runs |
| `[INTERMITTENT]` | Failure in 1-2/3 runs |
| `[NOT REPRODUCED]` | Passes 3/3 |

Only [CONFIRMED] and [INTERMITTENT] proceed to Phase 3. The 3-run discipline exists because LLM jitter (Principle 3) will lie to you if you only run once.
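The verdict rules above are simple enough to encode directly. In this Python sketch each run is reduced to a boolean meaning "the same failure was observed"; deciding whether a run shows the same failure is left to the reviewer, and the function name `classify_runs` is hypothetical.

```python
def classify_runs(failed: list) -> str:
    """Map three run outcomes (True = same failure observed) to a verdict."""
    if len(failed) != 3:
        raise ValueError("expected exactly 3 runs")
    failures = sum(1 for outcome in failed if outcome)
    if failures == 3:
        return "[CONFIRMED]"
    if failures == 0:
        return "[NOT REPRODUCED]"
    return "[INTERMITTENT]"


def proceeds_to_phase_3(verdict: str) -> bool:
    """Only confirmed and intermittent issues move on to Phase 3."""
    return verdict in ("[CONFIRMED]", "[INTERMITTENT]")


if __name__ == "__main__":
    verdict = classify_runs([True, False, True])
    print(verdict, proceeds_to_phase_3(verdict))
```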

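sf agent preview start --json --authoring-bundle <Name> -o <org> | tee /tmp/preview_start.json
SID=$(python3 -c "import json; print(json.load(open('/tmp/preview_start.json'))['result']['sessionId'])")
sf agent preview send --json --session-id "$SID" --utterance "<text>" --authoring-bundle <Name> -o <org>
sf agent preview end --json --session-id "$SID" --authoring-bundle <Name> -o <org>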

Trace path: .sfdx/agents/<Name>/sessions/<sessionId>/traces/<planId>.json.

Phase 3: Improve — edit the `.agent` file directly

3.0 Pre-flight

Verify all action targets exist and are registered in the org before editing. If any are missing, present options to the user: deploy stubs, remove the actions, register via UI, or proceed with routing-only fixes.

3.1–3.3 Map each issue to a fix location

| Confirmed issue | Fix location | Strategy |
|---|---|---|
| Subagent misroute | `subagent: description:` | Add keywords from production utterances |
| Wrong action | Action descriptions | Add exclusion language |
| LOW grounding | `instructions: ->` | Inject `{!@variables.x}` references |
| Persona leak | `system: instructions:` | Move persona out of subagents |
| Dead hub | Transitions in upstream subagent | Add transition action |
| Entry answering directly | `start_agent: instructions:` | Add router-only constraint |
| Safety regression | `system: instructions:` | Re-state safety guidelines, response constraints |

Instruction principles (Principle 5 — taste over typing):

Name actions explicitly. Don't rely on the LLM to infer.
State pre-conditions clearly. Use available when: guards.
Scope tightly. One subagent, one job.
Persona in system: only — never in subagents.

3.4 Regression prevention

Establish a baseline before editing (the Phase 1 metrics).
Make minimal edits.
Test immediately after each edit.
One fix per publish cycle.
Check cross-subagent dependencies before touching shared variables.
Test adjacent subagents — fixing one routing description can break a sibling.

3.5 Apply

Read the .agent file with the Read tool. Edit with the Edit tool (use tabs for indentation if the file uses tabs). Show the diff to the user.

3.6 Validate, deploy, publish, activate

sf agent validate authoring-bundle --json --api-name <AGENT_API_NAME> -o <org>
sf agent publish authoring-bundle --json --api-name <AGENT_API_NAME> -o <org>
sf agent activate --json --api-name <AGENT_API_NAME> -o <org>

If publish fails, do not fall back to deploy + activate: that path is incomplete and does not propagate reasoning: actions: to live metadata. Fix the publish error rather than working around it.

3.7 Verify

Re-run Phase 2 scenarios post-fix. Check the trace for correct routing, grounding, tools, and variables. Then schedule a re-run of Phase 1 in 24–48 hours to compare against baseline. Production lag is real; preview-only verification is not enough (Principle 3).

3.7b Safety re-verification (mandatory, Principle 1)

Re-run safety probes against the modified .agent file. Revert any change that introduces a BLOCK finding. A regressed safety surface is not allowed to ship regardless of what other improvement it brings.

3.8 Capture regression tests

Convert each [CONFIRMED] issue into a Testing Center YAML test case. Deploy with sf agent test create and verify all previously-broken scenarios pass. The regression suite is institutional memory — Principle 7.

Capture learnings

Run /sf-compound to write the diagnosis to docs/solutions/ under agent-issues (or the closest existing category). Production agent issues are exactly the kind of jagged-edge knowledge nothing else in the Salesforce ecosystem will retain for you.

Inspiration

This skill is adapted from forcedotcom/afv-library/skills/observing-agentforce (Apache-2.0). The upstream skill ships with five reference files (references/stdm-queries.md, references/issue-classification.md, references/reproduce-reference.md, references/improve-reference.md, references/stdm-schema.md) covering full STDM Apex source, DMO field schemas, complete issue-pattern tables, and detailed reproduction procedures. For the AgentforceOptimizeService Apex class source and full DMO schema, consult the upstream. This plugin's adaptation tightens the observe → reproduce → improve cycle around the principles framework and integrates with /sf-compound for institutional memory capture.
