From Amplitude
Compares the behavior of two user groups using Amplitude session replays and metrics, producing a behavioral diff for cohort analysis such as converters vs. drop-offs or power users vs. casual users.
```
npx claudepluginhub amplitude/mcp-marketplace --plugin amplitude
```

This skill uses the workspace's default tool permissions.
Investigate what distinguishes two user groups by pulling session replays and metrics for each, then producing a behavioral diff. This skill answers "what do winners do differently?" with concrete evidence from real sessions.
Primary tools:
- Amplitude:get_session_replays — Find sessions for each user group using event count filters and user property filters. Run this twice — once per group.
- Amplitude:get_session_replay_events — Decode replays into interaction timelines. Run this for sessions from both groups.

Supporting tools:

- Amplitude:search — Search for existing charts, funnels, dashboards, events, and cohorts relevant to the comparison. Always check for existing analysis before building from scratch.
- Amplitude:get_cohorts — Discover existing cohorts that define the groups (e.g., "power users", "churned", "converted").
- Amplitude:get_events — Discover valid event names. Never guess event names.
- Amplitude:get_event_properties — Discover properties available for segmentation.
- Amplitude:query_chart / Amplitude:query_charts — Pull quantitative metrics segmented by the two groups for statistical grounding.
- Amplitude:get_feedback_insights / Amplitude:get_feedback_mentions — Check if feedback themes differ between groups.

Parse the user's request to identify Group A and Group B. Common comparison patterns:
| Comparison | Group A | Group B |
|---|---|---|
| Conversion | Users who completed the goal | Users who didn't |
| Engagement | Power users / high-frequency | Casual / low-frequency |
| Retention | Retained users | Churned users |
| Plan tier | Enterprise / paid | Free / trial |
| Outcome | Successful (e.g., activated) | Failed (e.g., dropped off) |
| A/B test | Variant A | Variant B |
For each group, determine how it is defined: by an existing cohort, an event outcome, or a user property.
If the user's request is ambiguous (e.g., "compare power users"), ask: "What defines a power user for your product — is there an existing cohort, or should I use a frequency/engagement threshold?"
- Call Amplitude:get_context. If multiple projects, ask which to compare within.
- Amplitude:get_cohorts to check if existing cohorts match the requested groups. Existing cohorts encode institutional knowledge — prefer them over ad-hoc definitions.
- Amplitude:get_events to discover events relevant to the comparison (the goal event for conversion comparisons, engagement events for activity comparisons, etc.).
- Amplitude:get_event_properties for the key events to discover available segmentation properties.

Before watching replays, establish the statistical picture. Budget: 3-4 chart queries.
First, search for existing analysis. Call Amplitude:search with keywords related to the comparison (e.g., "onboarding funnel", "agent creation", "checkout conversion"). Existing charts and funnels encode institutional knowledge and save query budget. If you find a relevant funnel or segmented chart, use Amplitude:query_chart on it instead of building from scratch.
Use Amplitude:query_chart or Amplitude:query_charts to compare the two groups on:
What to look for:
Record the top 3-5 quantitative differences — these become hypotheses to validate or enrich with replays.
Run Amplitude:get_session_replays twice — once for each group. Request limit: 8 for each.
Exclude internal users by default. Unless you're specifically studying internal behavior, add a user property filter to exclude your company's email domain (e.g., gp:email does not contain "amplitude.com"). This prevents internal test sessions from polluting the comparison.
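As a sketch, the internal-user exclusion could be expressed as a payload fragment like the one below. The parameter names (userPropertyFilters, property, operator) are assumptions for illustration, not the tool's documented schema; adapt them to the actual Amplitude:get_session_replays interface.

```python
# Hypothetical filter fragment for Amplitude:get_session_replays.
# Key names are assumptions; only the gp:email property and the
# "does not contain" semantics come from the guidance above.
exclude_internal = {
    "userPropertyFilters": [
        {
            "property": "gp:email",
            "operator": "does not contain",
            "value": "amplitude.com",  # replace with your company's domain
        }
    ]
}
```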
Building group filters:
If groups are defined by a cohort, filter on the cohort's defining user property or event.
If groups are defined by an event outcome (e.g., completed checkout vs. didn't):

- For the group that completed the goal, filter on the goal event (eventCountFilters with operator: "greater or equal", count: "1").
- For the group that didn't: get_session_replays doesn't support "did not do" filters directly, so filter for the entry event only, then when watching replays in Step 5, check the interaction timeline for absence of the conversion event. Discard sessions where the user actually converted and replace them with additional sessions until you have 4-6 confirmed drop-offs.

If groups are defined by a user property (plan, segment), filter on that property with event_type: "_all".

For each group, call Amplitude:get_session_replay_events for 4-6 sessions. Use event_limit: 300.
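Putting the pieces together, the arguments for the completer group's replay pull might look like the following sketch. The event name "checkout_completed" is illustrative (discover real names with Amplitude:get_events), and the top-level key names are assumptions about the tool's schema.

```python
# Hypothetical Amplitude:get_session_replays arguments for the group
# that completed the goal. eventCountFilters operator/count values come
# from the guidance above; everything else is illustrative.
completer_replay_args = {
    "eventCountFilters": [
        {
            "event_type": "checkout_completed",  # hypothetical goal event
            "operator": "greater or equal",
            "count": "1",
        }
    ],
    "limit": 8,  # per-group request limit from the step above
}
```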
Budget: 8-12 total sessions (4-6 per group).
Rate limiting and large responses:
- Call get_session_replay_events in batches of 3-4 at a time, not all at once. The Session Replay API enforces concurrency limits and will return 429 errors if overwhelmed. If you hit a 429, wait briefly and retry.
- If a replay reports a large total_raw_events, consider using a lower event_limit (e.g., 150) to stay within response size limits.

While analyzing, track these behavioral dimensions for each session:
| Dimension | What to capture |
|---|---|
| Navigation path | Sequence of pages visited. Note the order and any pages unique to this session. |
| Feature engagement | Which features/areas the user interacted with. Note depth (quick glance vs. extended use). |
| Pace & hesitation | Fast and confident vs. slow with pauses. Note long gaps between actions. |
| Exploration vs. focus | Did the user go straight to their goal or browse around? |
| Friction encountered | Errors, rage clicks, back-navigation, abandoned inputs. |
| Session outcome | Did they accomplish something? What was the last action before leaving? |
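The batched fetching with brief 429 retries described in the rate-limiting note above can be sketched as follows. Here `fetch` stands in for a call to Amplitude:get_session_replay_events; this sketch assumes nothing about that tool's real interface beyond "one session in, one timeline out."

```python
import time

class RateLimitError(Exception):
    """Stands in for an HTTP 429 from the Session Replay API."""

def fetch_in_batches(session_ids, fetch, batch_size=4, max_retries=3, wait_s=1.0):
    """Fetch replay timelines in small batches, retrying briefly on 429s.

    `fetch` is a placeholder for the Amplitude:get_session_replay_events
    call; the retry/backoff policy here is an illustrative choice.
    """
    results = {}
    for start in range(0, len(session_ids), batch_size):
        batch = session_ids[start:start + batch_size]  # 3-4 at a time
        for sid in batch:
            for attempt in range(max_retries):
                try:
                    results[sid] = fetch(sid)
                    break
                except RateLimitError:
                    time.sleep(wait_s * (attempt + 1))  # wait briefly, retry
            else:
                results[sid] = None  # give up on this session after retries
    return results
```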
For each session, write a 2-3 sentence behavioral summary capturing the overall pattern.
This is the core analytical step. Compare the two groups across the dimensions above.
Tabulate patterns. For each behavioral dimension, summarize what Group A typically does vs. Group B.
Identify differentiating actions. Find behaviors that are:
Rank by explanatory power. Which differences most plausibly explain the outcome gap? Lead with the most compelling.
Generate hypotheses. For each major difference, propose a causal hypothesis:
Cross-reference with feedback (optional but valuable). Call Amplitude:get_feedback_insights with terms related to the differentiating actions. If Group B complains about something Group A doesn't, that strengthens the signal.
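The tabulate / identify / rank steps above can be sketched as a simple frequency diff over per-session behavior flags. The flag names below are illustrative labels an analyst might assign while watching replays, not Amplitude fields.

```python
def behavioral_diff(group_a, group_b):
    """Rank behaviors by the gap in how often each group exhibits them.

    group_a / group_b: lists of per-session behavior sets, e.g.
    [{"used_templates", "invited_team"}, ...]. Returns (behavior, delta)
    pairs sorted by absolute delta, i.e. explanatory power as a first cut.
    """
    def rate(sessions, behavior):
        return sum(behavior in s for s in sessions) / len(sessions)

    behaviors = set().union(*group_a, *group_b)
    diffs = [(b, rate(group_a, b) - rate(group_b, b)) for b in behaviors]
    return sorted(diffs, key=lambda d: abs(d[1]), reverse=True)
```

A large positive delta means Group A does the behavior far more often, which is exactly the kind of differentiating action worth leading the report with.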
Structure the output as a comparative report a PM or growth lead can act on.
Required sections:
Comparison Summary (3-4 sentences): Who was compared, the sample size, the single most striking behavioral difference, and the top hypothesis. Written as a narrative you could paste into a strategy doc.
Groups Defined:
Quantitative Context — Key metric differences between groups (from Step 3). Present as a concise table:
| Metric | [Group A Label] | [Group B Label] | Delta |
|--------|-----------------|-----------------|-------|
| [Metric 1] | [value] | [value] | [diff] |
| [Metric 2] | [value] | [value] | [diff] |
For each differentiating behavior:
### [Behavior Title — ≤10 words]
**Signal strength:** [Strong/Moderate/Emerging] | **Seen in:** X of Y [Group A] sessions vs. X of Y [Group B] sessions
**[Group A Label]:** What this group does — specific actions, pages, patterns observed.
**[Group B Label]:** What this group does instead — or doesn't do.
**Hypothesis:** Why this difference matters and what it suggests about the outcome gap.
**Evidence:** Replay links from both groups showing this contrast.
[Group A]: Landing → Templates → Create Project → Invite Team → Active Use
[Group B]: Landing → Create Project → Settings → ... → Churn
Actionable Hypotheses (3-5 items): Concrete experiments or changes suggested by the behavioral differences. Each should follow the format: "If we [change], then [Group B behavior] should shift toward [Group A behavior], because [evidence]."
Confidence & Caveats: Note sample size limitations, whether patterns were consistent across sessions, and what additional data would increase confidence.
Follow-on prompt: Suggest next steps — "Want me to audit the [specific flow] where the groups diverge most, build a chart tracking [differentiating action], or investigate a specific user from either group?"
If the relevant events are unclear, call get_events and propose 2-3 candidate events. Let the user confirm.

User says: "What do users who complete onboarding do differently from those who drop off?"
Actions:
User says: "How do our most active users use the product differently?"
Actions:
User says: "Why are some users churning after the first week?"
Actions: