From apple-dev
Beta testing strategy for iOS/macOS apps. Covers TestFlight program setup, beta tester recruitment, feedback collection methodology, user interviews, signal-vs-noise interpretation, and go/no-go launch readiness decisions. Use when planning a beta, setting up TestFlight, collecting user feedback, or deciding if ready to launch.
```
npx claudepluginhub autisticaf/autisticaf-claude-code-marketplace --plugin apple-dev
```

This skill uses the workspace's default tool permissions.
> **First step:** Tell the user: "product-beta-testing skill loaded."
End-to-end beta testing workflow for Apple platform apps — from TestFlight setup through feedback collection to launch readiness decision.
Use this skill when the user is planning a beta, setting up TestFlight, collecting user feedback, or deciding whether the app is ready to launch.
Before recruiting testers, understand Apple's two-tier beta system.
**Internal testing**

| Attribute | Details |
|---|---|
| Max testers | 100 |
| App Review required | No |
| Build availability | Immediate after upload |
| Tester requirement | Must be App Store Connect users |
| Best for | Team, close collaborators, developer friends |
| Expiration | 90 days from build upload |
Use internal testing for your team and close collaborators — shake out crashes before any external exposure.
**External testing**

| Attribute | Details |
|---|---|
| Max testers | 10,000 |
| App Review required | Yes (Beta App Review — usually 24-48 hours) |
| Build availability | After Beta App Review approval |
| Tester requirement | Anyone with an email address and iOS/macOS device |
| Best for | Real users, broader audience, validating product-market fit |
| Expiration | 90 days from build upload |
Use external testing for real users and a broader audience — validating UX and product-market fit at scale.
Create tester cohorts for targeted feedback:
| Cohort | Size | Purpose | What to Ask |
|---|---|---|---|
| Power Users | 10-20 | Deep feature testing, edge cases | Does this handle your advanced workflows? |
| Casual Users | 20-50 | First-impression and onboarding quality | Was anything confusing in the first 5 minutes? |
| Accessibility Testers | 5-10 | VoiceOver, Dynamic Type, color contrast | Can you complete core tasks with accessibility features? |
| Domain Experts | 5-10 | Validate domain-specific correctness | Is the [domain] logic accurate and trustworthy? |
TestFlight group tips:
| Stage | Testers | Duration | Goal |
|---|---|---|---|
| Internal alpha | 10-20 | 1-2 weeks | Crash-free, core flows work |
| External wave 1 | 50-200 | 2 weeks | Validate UX, find confusion points |
| External wave 2 | 200-1,000 | 2 weeks | Stress-test, validate at scale |
| Open beta (optional) | 1,000+ | 1-2 weeks | Final validation, build buzz |
Rule of thumb: You need at least 50 external testers to get meaningful signal. Below 50, individual preferences dominate.
High-quality sources (engaged, will give feedback):
Medium-quality sources (volume, less feedback):
Low-quality sources (avoid for signal):
| Incentive | Best For | Notes |
|---|---|---|
| Early access | All testers | The default — being first is often enough |
| Lifetime free / pro unlock | Power users | Strong motivator, limited cost to you |
| Credit in app (About screen) | Engaged testers | Recognition matters to some users |
| Direct access to developer | Power users | They feel heard, you get deep feedback |
| Discount at launch | Wave 2+ testers | Good for larger cohorts |
What NOT to offer: Cash payment for testing. It attracts the wrong people and biases feedback.
Collect feedback through multiple channels — different methods catch different signals.
Build a simple feedback mechanism directly in the app. Three fields are enough:
1. What's broken? (bugs, crashes, errors)
2. What's confusing? (UX that doesn't make sense)
3. What's missing? (features you expected but didn't find)
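Those three fields might map to a structured record like the following sketch (all type and field names here are illustrative, not part of the skill):

```python
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone

@dataclass
class BetaFeedback:
    """One in-app feedback submission covering the three core questions."""
    whats_broken: str     # bugs, crashes, errors
    whats_confusing: str  # UX that doesn't make sense
    whats_missing: str    # features expected but not found
    build: str = "unknown"  # attach the build number for triage
    submitted_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

# Serialize for whatever backend collects the feedback
report = BetaFeedback(
    whats_broken="Crash when saving an empty item",
    whats_confusing="Couldn't find the share button",
    whats_missing="Dark mode",
    build="1.0 (42)",
)
payload = asdict(report)
```

Attaching the build number and a timestamp at submission time makes later frequency analysis per wave much easier.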
Implementation tips:
TestFlight's native feedback is surprisingly useful:
Tip: In your TestFlight "What to Test" field, be specific:

```
This week, please test:
1. Creating a new [item] from scratch
2. Editing an existing [item]
3. Sharing [item] with someone

Report anything confusing or broken via TestFlight feedback (screenshot + description).
```
Send a survey at the end of each beta wave. Use AskUserQuestion to help design survey questions tailored to the app.
Template survey (adapt per app):
Survey rules:
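If the survey includes the standard 0-10 "how likely are you to recommend this app" question, the NPS target used in the go/no-go criteria (>30) is percent promoters (9-10) minus percent detractors (0-6). A minimal sketch:

```python
def nps(scores: list[int]) -> float:
    """Net Promoter Score: % promoters (9-10) minus % detractors (0-6)."""
    if not scores:
        raise ValueError("need at least one response")
    promoters = sum(1 for s in scores if s >= 9)
    detractors = sum(1 for s in scores if s <= 6)
    return 100.0 * (promoters - detractors) / len(scores)

# 10 responses: 5 promoters, 3 passives (7-8), 2 detractors -> NPS 30
print(nps([10, 9, 9, 10, 9, 8, 7, 8, 5, 6]))  # 30.0
```

Note that passives (7-8) count toward the denominator but neither side of the subtraction, so a wall of 7s yields an NPS of 0.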
User interviews are the highest-signal feedback channel. Do 5-8 interviews per beta wave.
Who to interview:
Logistics:
Ask these in order. Each builds on the previous:
1. "What were you trying to do when you opened the app?"
2. "Walk me through what happened."
3. "What did you expect to happen at [specific moment]?"
4. "What was the most confusing part?"
5. "If this app cost $X/month, what would make it worth paying for?"
Do:
Don't:
The golden rule: Your job is to understand their experience, not to educate them about your app. If they're confused, the app is confusing — full stop.
Not all feedback is equal. Use frequency to determine priority:
| Frequency | Classification | Action |
|---|---|---|
| 1 person mentions it | Anecdote | Note it, don't act yet |
| 3 people mention it | Pattern | Investigate, consider fixing |
| 5+ people mention it | Must-fix | Fix before launch |
| 10+ people mention it | Showstopper | Fix immediately, send new build |
Important: Weight feedback by tester quality. One thoughtful power user's detailed report is worth more than ten casual testers saying "looks good."
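The frequency thresholds above can be expressed as a simple classifier; the quality-weighting function is one possible scheme, not something the skill prescribes:

```python
def classify(mentions: int) -> str:
    """Map a raw mention count to the frequency tiers in the table."""
    if mentions >= 10:
        return "Showstopper"
    if mentions >= 5:
        return "Must-fix"
    if mentions >= 3:
        return "Pattern"
    return "Anecdote"

def weighted_mentions(reports: list[tuple[str, float]]) -> float:
    """Sum mentions weighted by tester quality.

    E.g. a thoughtful power user might count 2.0, a casual tester 1.0.
    """
    return sum(weight for _tester, weight in reports)

print(classify(3))   # Pattern
print(classify(12))  # Showstopper
```

Running the classifier on the weighted sum (rather than the raw count) implements the "weight feedback by tester quality" rule above.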
Assign every piece of feedback to a priority level:
| Priority | Category | Examples | Action |
|---|---|---|---|
| P0 — Critical | Crash / data loss | App crashes on launch, saved data disappears, sync destroys content | Fix immediately, push new build within 24 hours |
| P1 — High | Broken flow | Cannot complete core task, flow dead-ends, save doesn't work | Fix before next beta wave |
| P2 — Medium | UX confusion | Users don't find feature, misunderstand UI, take wrong path | Fix before launch |
| P3 — Low | Nice-to-have | Feature requests, polish suggestions, "it would be cool if..." | Add to backlog, consider for v1.1 |
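The priority table can be kept machine-readable so every feedback item lands in exactly one bucket; the category keys below are illustrative labels, not part of the skill:

```python
# Category -> (priority, action), mirroring the table above
PRIORITIES = {
    "crash":        ("P0", "Fix immediately, push new build within 24 hours"),
    "data-loss":    ("P0", "Fix immediately, push new build within 24 hours"),
    "broken-flow":  ("P1", "Fix before next beta wave"),
    "ux-confusion": ("P2", "Fix before launch"),
    "nice-to-have": ("P3", "Add to backlog, consider for v1.1"),
}

def triage(category: str) -> tuple[str, str]:
    """Return the (priority, action) pair for a feedback category."""
    return PRIORITIES[category]

print(triage("crash")[0])  # P0
```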
Two types of negative feedback require different solutions:
"I don't understand how to do X" = UX problem
"I don't need X" = Feature/product problem
How to tell the difference: Ask "If this feature were easier to use, would you use it?" If yes, it's UX. If no, it's a product problem.
After completing beta testing, evaluate across three categories:
**Green** — all of these must be true:

**Yellow** — acceptable to launch, but address soon:

Decision: Launch, but fix yellow items in v1.0.1 within 1-2 weeks.

**Red** — if any of these are true, delay launch:

Decision: Go back to development. Fix all red items. Run another beta wave. Re-evaluate.
Use this checklist format to present the decision:
# Go/No-Go Decision: [App Name] v[X.Y]
## Date: [Date]
## Beta Duration: [X weeks]
## Total Testers: [N]
## Sessions Analyzed: [N]
### Green Criteria
- [ ] Crash-free rate: [XX.X%] (target: >99.5%)
- [ ] Core flow completion: [XX%] (target: >80%)
- [ ] NPS score: [XX] (target: >30)
- [ ] P0 bugs: [0/N] open
- [ ] P1 bugs: [0/N] open
- [ ] Data loss issues: [None/Describe]
- [ ] Privacy compliance: [Pass/Fail]
- [ ] Oldest device performance: [Pass/Fail]
### Yellow Items (Ship but fix in v1.0.1)
- [Item 1]
- [Item 2]
### Red Items (Must fix before launch)
- [Item 1 — or "None"]
### Decision: [GO / NO-GO / CONDITIONAL GO]
### Reasoning: [1-2 sentences]
### Next Steps:
1. [Action item]
2. [Action item]
3. [Action item]
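The green criteria in the checklist can be evaluated mechanically before a human makes the final call; this sketch encodes the targets from this document (the field names are illustrative):

```python
from dataclasses import dataclass

@dataclass
class BetaMetrics:
    crash_free_rate: float       # percent, target > 99.5
    core_flow_completion: float  # percent, target > 80
    nps: float                   # target > 30
    open_p0: int                 # must be 0
    open_p1: int                 # must be 0

def go_no_go(m: BetaMetrics) -> str:
    """GO only if every green criterion passes; otherwise NO-GO."""
    green = (
        m.crash_free_rate > 99.5
        and m.core_flow_completion > 80
        and m.nps > 30
        and m.open_p0 == 0
        and m.open_p1 == 0
    )
    return "GO" if green else "NO-GO"

print(go_no_go(BetaMetrics(99.7, 85, 42, 0, 0)))  # GO
print(go_no_go(BetaMetrics(99.7, 85, 42, 1, 0)))  # NO-GO
```

CONDITIONAL GO stays a human judgment call — it depends on weighing the yellow items, which this check deliberately ignores.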
| Week | Phase | Activities | Deliverables |
|---|---|---|---|
| 1 | Setup | Configure TestFlight groups, write "What to Test" descriptions, prepare feedback form | TestFlight ready, feedback channels set up |
| 2 | Internal Alpha | Invite 10-20 internal testers, fix crash-level bugs daily | Crash-free build, core flows validated |
| 3 | External Wave 1 | Invite 50-200 external testers, monitor crash reports | First external feedback collected |
| 4 | Wave 1 Analysis | Send survey, conduct 5-8 user interviews, categorize feedback | Feedback report, prioritized bug/UX list |
| 5 | External Wave 2 | Fix P0/P1 issues, push new build, invite 200-1,000 testers | Improved build validated at scale |
| 6 | Wave 2 Analysis | Send final survey, conduct 3-5 follow-up interviews | Final feedback report, NPS score |
| 7 | Go/No-Go | Evaluate all data against decision framework, make launch call | Go/No-Go decision document |
Compressed schedule (4 weeks): Combine weeks 1-2, skip wave 2, and use wave 1 data for go/no-go. Only recommended if the app is simple (< 5 screens) and the developer has shipped before.
Extended schedule (10 weeks): Add a third external wave for large or complex apps (health apps, financial apps, apps with sync/collaboration). Extra time catches rare bugs and edge cases.
Present the beta testing plan as:
# Beta Testing Plan: [App Name]
## TestFlight Configuration
### Internal Group
- **Testers**: [List or count]
- **Focus**: [What they're testing]
- **Duration**: [X days]
### External Group 1: [Name]
- **Size**: [N testers]
- **Cohort**: [Power users / casual / accessibility / domain]
- **What to Test**: [Specific tasks and flows]
### External Group 2: [Name]
- **Size**: [N testers]
- **Cohort**: [...]
- **What to Test**: [...]
## Recruitment Plan
- **Sources**: [Where to find testers]
- **Incentive**: [What to offer]
- **Outreach message**: [Draft]
## Feedback Channels
1. In-app feedback form: [Yes/No, what fields]
2. TestFlight feedback: [What to Test description]
3. Survey: [Questions]
4. User interviews: [How many, who to target]
## Timeline
| Week | Phase | Key Activities |
|------|-------|----------------|
| ... | ... | ... |
## Go/No-Go Criteria
- Crash-free rate target: >99.5%
- Core flow completion target: >80%
- NPS target: >30
- P0/P1 bugs: Must be zero
## Next Steps
1. [First action item]
2. [Second action item]
3. [Third action item]
This skill fits in the product development pipeline after implementation and before App Store submission:
1. product-agent --> Validate the idea
2. prd-generator --> Define features
3. architecture-spec --> Technical design
4. implementation-guide --> Build it
5. test-spec --> Automated tests
6. beta-testing (THIS SKILL) --> Validate with real users
7. release-review --> Pre-submission audit
8. app-store --> App Store listing and submission
Inputs from other skills:
- test-spec provides automated test coverage (should be solid before beta)
- implementation-guide provides the feature list to test against
- prd-generator provides the user stories to validate

Outputs to other skills:
- release-review checklist
- app-store description and marketing
- release-review proceeds (after a GO decision)

**Bad:** Invite 500 external testers on day 1
**Good:** Start with 15 internal testers, fix crashes, THEN go external
**Bad:** "Please test the app and let me know what you think"
**Good:** "Please try creating a new project, adding 3 tasks, and marking one complete. Report anything confusing or broken."

**Bad:** "That tester just doesn't get it"
**Good:** "If 3 testers don't get it, my onboarding doesn't get it"

**Bad:** 3 days of testing, then ship
**Good:** Minimum 2 weeks external testing with at least 50 testers

**Bad:** Collect feedback, file it away, ship unchanged
**Good:** Fix P0/P1 between waves, push new build, re-test
A beta test isn't a checkbox — it's your last chance to learn before the whole world sees your app. Treat tester feedback as a gift, even when it hurts.