Discover valuable GitHub fork divergence beyond stars. TRIGGERS - fork analysis, fork intelligence, find forks, valuable forks, fork divergence, fork discovery, upstream forks.
From gh-tools
npx claudepluginhub terrylica/cc-skills --plugin gh-tools
Reference files: references/domain-patterns.md, references/empirical-data.md, references/evolution-log.md, references/signal-priority.md
Systematic methodology for discovering valuable work in GitHub fork ecosystems. Stars-only filtering misses 60-100% of substantive forks — this skill uses branch-level divergence analysis, upstream PR cross-referencing, and domain-specific heuristics to find what matters.
Validated empirically across 10 repositories spanning Python, Rust, TypeScript, C++/Python, and Node.js (tensortrade, backtesting.py, kokoro, pymoo, firecrawl, barter-rs, pueue, dukascopy-node, ArcticDB, flowsurface).
Self-Evolving Skill: This skill improves through use. If instructions are wrong, parameters drifted, or a workaround was needed — fix this file immediately, don't defer. Only update for real, reproducible issues.
MANDATORY: Select and load the appropriate template before any fork analysis.
1. Get upstream baseline (stars, forks, default branch, last push)
2. List all forks with pagination, note timestamp clusters
3. Filter to unique-timestamp forks (skip bulk mirrors)
4. Check default branch divergence (ahead_by/behind_by)
5. Check non-default branches for all forks with recent push or >1 branch
6. Evaluate commit content, author emails, tags/releases
7. Cross-reference upstream PR history from fork owners
8. Tier ranking and cross-fork convergence analysis
9. Produce report with actionable recommendations
1. Get upstream baseline
2. List forks, filter by timestamp clustering
3. Check default branch divergence only
4. Report forks with ahead_by > 0
1. Compare fork vs upstream on all branches
2. Examine commit messages and changed files
3. Check for tags/releases, open issues, PRs
4. Assess cherry-pick viability
Ranked by empirical reliability across 10 repositories. See signal-priority.md for details.
| Rank | Signal | Reliability | What It Catches |
|---|---|---|---|
| 1 | Branch-level divergence | Highest | Work on feature branches (50%+ of substantive forks) |
| 2 | Upstream PR cross-reference | High | Rebased/force-pushed work invisible to compare API |
| 3 | Tags/releases on fork | High | Independent maintenance intent |
| 4 | Commit email domains | High | Institutional contributors (@company.com) |
| 5 | Timestamp clustering | Medium | Eliminates 85%+ mirror noise |
| 6 | Cross-fork convergence | Medium | Reveals unmet upstream demand |
| 7 | Stars | Lowest | Often anti-correlated with actual value |
UPSTREAM="OWNER/REPO"
gh api "repos/$UPSTREAM" --jq '{forks_count, pushed_at, default_branch, stargazers_count}'
# List all forks with activity signals
gh api "repos/$UPSTREAM/forks" --paginate \
--jq '.[] | {full_name, pushed_at, stargazers_count, default_branch}'
Timestamp clustering: Forks sharing exact pushed_at with upstream are bulk mirrors created by GitHub's fork mechanism and never touched. Group by pushed_at — forks with unique timestamps warrant investigation. This alone eliminates 85%+ of noise.
# Filter to unique-timestamp forks (skip bulk mirrors)
gh api "repos/$UPSTREAM/forks" --paginate \
--jq '.[] | {full_name, pushed_at, stargazers_count}' | \
jq -s 'group_by(.pushed_at) | map(select(length == 1)) | flatten'
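To see what the grouping does, the same filter can be run on a small hand-made sample. This is a sketch only: the fork names and timestamps are hypothetical, the function names `unique_timestamp_forks` and `sample_forks` are introduced for illustration, and `jq` is assumed to be installed.

```bash
# Keep only forks whose pushed_at value is not shared with any other fork.
# Reads one JSON object per line on stdin (the shape emitted by the --jq filter above).
unique_timestamp_forks() {
  jq -s -c 'group_by(.pushed_at) | map(select(length == 1)) | flatten | .[]'
}

# Hypothetical sample: two forks share a timestamp (bulk mirrors), one does not.
sample_forks() {
  cat <<'EOF'
{"full_name":"alice/REPO","pushed_at":"2024-01-01T00:00:00Z","stargazers_count":0}
{"full_name":"bob/REPO","pushed_at":"2024-01-01T00:00:00Z","stargazers_count":3}
{"full_name":"carol/REPO","pushed_at":"2024-03-15T12:30:00Z","stargazers_count":0}
EOF
}

# Only carol/REPO survives the filter.
sample_forks | unique_timestamp_forks
```

Note that the filter groups forks against each other, so two active forks that happen to share a timestamp would also be dropped; bob/REPO above is discarded even though it has stars.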
BRANCH=$(gh api "repos/$UPSTREAM" --jq '.default_branch')
# For each candidate fork
gh api "repos/$UPSTREAM/compare/$BRANCH...FORK_OWNER:$BRANCH" \
--jq '{ahead_by, behind_by, status}'
The status field meanings:
- identical: pure mirror, skip
- behind: stale mirror, skip
- diverged: has original commits AND is behind (interesting)
- ahead: has original commits, up-to-date with upstream (rare, most valuable)

Important: Always compare from the upstream repo's perspective (repos/UPSTREAM/compare/...). The reverse direction (repos/FORK/compare/...) returns 404 for some repositories.
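The status values map directly onto triage actions. A minimal sketch of that mapping follows; the function name `triage_status` and the action labels are illustrative and not part of the GitHub API.

```bash
# Map the compare API "status" value to the triage action described above.
triage_status() {
  case "$1" in
    identical) echo "skip: pure mirror" ;;
    behind)    echo "skip: stale mirror" ;;
    diverged)  echo "investigate: original commits, behind upstream" ;;
    ahead)     echo "investigate first: original commits, up to date" ;;
    *)         echo "unknown status: $1" ;;
  esac
}

triage_status diverged
```

In a loop over candidate forks, the status string from the `--jq '{ahead_by, behind_by, status}'` output could be passed to this function to decide which forks to examine further.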
This is the single biggest methodology improvement. Across all 10 repos tested, 50%+ of the most valuable fork work lived exclusively on feature branches.
Examples:
- shader-heatmap
- conda_build, clang, apple_changes
- cesc/duration

# List branches on a fork
gh api "repos/FORK_OWNER/REPO/branches" --jq '.[].name' | head -20
# Check divergence on a specific branch
gh api "repos/$UPSTREAM/compare/$BRANCH...FORK_OWNER:FEATURE_BRANCH" \
--jq '{ahead_by, behind_by, status}'
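The two commands above can be combined into a loop over every branch of a fork. The following is a sketch under stated assumptions: the helper names `compare_endpoint` and `check_fork_branches` are hypothetical, `gh` must be authenticated, and the fork is assumed to belong to the same fork network as the upstream.

```bash
# Build the upstream-side compare endpoint for one fork branch.
# Arguments: upstream "OWNER/REPO", base branch, fork owner, fork branch.
compare_endpoint() {
  printf 'repos/%s/compare/%s...%s:%s\n' "$1" "$2" "$3" "$4"
}

# Print "branch<TAB>ahead_by<TAB>behind_by<TAB>status" for every branch of a fork.
# Arguments: upstream "OWNER/REPO", base branch, fork "FORK_OWNER/REPO".
check_fork_branches() {
  fork_owner=${3%%/*}
  gh api "repos/$3/branches" --paginate --jq '.[].name' |
    while read -r branch; do
      result=$(gh api "$(compare_endpoint "$1" "$2" "$fork_owner" "$branch")" \
        --jq '[.ahead_by, .behind_by, .status] | @tsv') || result="compare failed"
      printf '%s\t%s\n' "$branch" "$result"
    done
}

# Example (not run here): check_fork_branches "OWNER/REPO" main "FORK_OWNER/REPO"
compare_endpoint "OWNER/REPO" main FORK_OWNER FEATURE_BRANCH
```

Each branch costs one API call, so this loop is best reserved for forks that already passed the timestamp filter.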
Heuristics for which forks need branch checks:
- pushed_at more recent than upstream but ahead_by == 0 on default branch

# Evaluate commit content on a candidate branch
gh api "repos/$UPSTREAM/compare/$BRANCH...FORK_OWNER:BRANCH" \
--jq '.commits[] | {sha: .sha[:8], message: (.commit.message | split("\n")[0]), date: .commit.committer.date[:10], author: .commit.author.email}'
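Building on the commit listing above, author email domains can be tallied to surface institutional contributors. This is a rough sketch: the function name `email_domain_counts` is hypothetical, and the sample addresses are made up for illustration (only the @man.com domain appears in this document).

```bash
# Tally commit author email domains, most frequent first.
# Reads one author email per line on stdin; prints "domain<TAB>count".
email_domain_counts() {
  sed -n 's/.*@//p' | tr 'A-Z' 'a-z' | sort | uniq -c | sort -rn |
    awk '{print $2 "\t" $1}'
}

# Example with gh (not run here):
# gh api "repos/$UPSTREAM/compare/$BRANCH...FORK_OWNER:BRANCH" \
#   --jq '.commits[].commit.author.email' | email_domain_counts

# Hypothetical sample input:
printf '%s\n' dev1@man.com dev2@man.com someone@users.noreply.github.com |
  email_domain_counts
```

Domains such as users.noreply.github.com say nothing about affiliation, so only corporate or academic domains are worth following up.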
What to look for:
- Institutional commit email domains (e.g. @man.com, @quantstack.net)
- Build configuration changes (CMakeLists.txt, Cargo.toml, pyproject.toml) indicate platform enablement

# Tags/releases (strongest independent maintenance signal)
gh api "repos/FORK_OWNER/REPO/tags" --jq '.[].name' | head -10
gh api "repos/FORK_OWNER/REPO/releases" --jq '.[] | {tag_name, name, published_at}' | head -5
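# Open issues on the fork (signals independent project maintenance)
# The issues endpoint also returns pull requests and defaults to 30 items per page,
# so request 100 per page and exclude entries that carry a pull_request key
gh api "repos/FORK_OWNER/REPO/issues?state=open&per_page=100" --jq '[.[] | select(.pull_request | not)] | length'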
# Check if repo was renamed (strong divergence intent signal)
gh api "repos/FORK_OWNER/REPO" --jq '.name'
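Printing `.name` only helps once it is compared with the upstream repository name. A minimal sketch of that comparison follows; the function name `repo_renamed` is hypothetical, and `OWNER` is a placeholder for the upstream owner.

```bash
# Succeeds (exit 0) when the fork's current repo name differs from upstream's.
# Arguments: upstream "OWNER/REPO", fork repo name as returned by '.name'.
repo_renamed() {
  [ "${1##*/}" != "$2" ]
}

# Example with gh (not run here):
# name=$(gh api "repos/FORK_OWNER/REPO" --jq '.name')
# repo_renamed "$UPSTREAM" "$name" && echo "renamed to $name"

if repo_renamed "OWNER/flowsurface" "volume_flow"; then
  echo "renamed"
fi
```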
| Signal | Strength | Example |
|---|---|---|
| Tags/releases on fork | Highest | pueue/freesrz93 had 6 releases |
| Open PRs against upstream | High | Formal proposals with review context |
| Open issues on the fork | High | Independent project maintenance |
| Repo renamed | Medium | flowsurface/sinaha81 became volume_flow |
| Build config changes | High (compiled languages) | Cargo.toml, CMakeLists.txt diff |
| Description changed | Weak | Many vanity renames with no code |
# Check upstream PRs from fork owners
gh api "repos/$UPSTREAM/pulls?state=all" --paginate \
--jq '.[] | select(.head.repo.fork) | {number, title, state, user: .user.login}'
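Cross-fork convergence: When multiple forks independently solve the same problem, it signals unmet upstream demand. For example, barter-rs has 4 independent Bybit implementations.
Upstream PR cross-reference catches rebased or force-pushed work that the compare API reports as 0 ahead, and maintainer dev forks whose branch names map 1:1 to upstream PRs.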
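One rough way to spot convergence is to look for file paths that several forks changed independently. The sketch below assumes that shared paths are a usable proxy for "same problem", which this document does not state; the function name `convergent_paths` and the sample paths are hypothetical.

```bash
# List paths touched by at least MIN distinct forks (default 2), most shared first.
# Reads "fork<TAB>path" lines on stdin; prints "count<TAB>path".
convergent_paths() {
  min=${1:-2}
  # sort -u counts each fork once per path, even if it touched the path repeatedly.
  sort -u | cut -f2 | sort | uniq -c | sort -rn |
    awk -v min="$min" '{n = $1 + 0; sub(/^ *[0-9]+ /, "")} n >= min + 0 {print n "\t" $0}'
}

# Collecting input with gh (not run here), one call per fork branch:
# gh api "repos/$UPSTREAM/compare/$BRANCH...FORK_OWNER:BRANCH" \
#   --jq '.files[].filename' | awk -v f="FORK_OWNER/REPO" '{print f "\t" $0}'

# Hypothetical sample: two forks both touch the same connector file.
printf 'alice/REPO\tsrc/exchange/bybit.rs\nbob/REPO\tsrc/exchange/bybit.rs\nbob/REPO\tREADME.md\nalice/REPO\tsrc/exchange/bybit.rs\n' |
  convergent_paths
```

Paths shared by many forks still need a manual read of the diffs, since forks can touch the same file for unrelated reasons.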
After running the pipeline, classify forks into tiers:
| Tier | Criteria | Action |
|---|---|---|
| Tier 1: Major Extensions | New features, architectural changes, >10 original commits | Deep evaluation, cherry-pick candidates |
| Tier 2: Targeted Features | Focused additions, bug fixes, 2-10 commits | Cherry-pick individual commits |
| Tier 3: Infrastructure | CI/CD, packaging, deployment, docs | Evaluate if relevant to your setup |
| Tier 4: Historical | Merged upstream or stale but once significant | Note for context, no action needed |
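The commit-count thresholds in the table can serve as a first pass before reading any diffs. This is only a sketch: the function name `provisional_tier` is hypothetical, and because Tier 3 and Tier 4 depend on what the commits contain, the result is a starting point rather than a classification.

```bash
# Provisional tier from the number of original commits (ahead_by).
# More than 10 commits suggests Tier 1, 2 to 10 suggests Tier 2.
provisional_tier() {
  if [ "$1" -gt 10 ]; then
    echo "Tier 1 candidate"
  elif [ "$1" -ge 2 ]; then
    echo "Tier 2 candidate"
  else
    echo "manual review"
  fi
}

provisional_tier 14
```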
Different codebases exhibit different fork behaviors. See domain-patterns.md for full details.
| Domain | Key Pattern | Example |
|---|---|---|
| Scientific/ML | Researchers fork-implement-publish-vanish, zero social engagement | pymoo: 300-file fork with 0 stars |
| Trading/Finance | Exchange connectors dominate; best forks are private | barter-rs: 4 independent Bybit impls |
| Infrastructure/DevTools | Self-hosting/SaaS-removal is the dominant theme | firecrawl: devflowinc/firecrawl-simple (630 stars) |
| C++/Python Mixed | Feature work lives on branches; email domains reveal institutions | ArcticDB: @man.com, @quantstack.net |
| Node.js Libraries | Check npm publication as separate packages | dukascopy-node: kyo06 published dukascopy-node-plus |
| Rust CLI | Cargo.toml diff is reliable quick filter; "superset" forks add subcommands | pueue: freesrz93 added 7 subcommands |
For rapid triage of any new repo:
UPSTREAM="OWNER/REPO"
BRANCH=$(gh api "repos/$UPSTREAM" --jq '.default_branch')
# 1. Baseline
gh api "repos/$UPSTREAM" --jq '{forks_count, pushed_at, stargazers_count}'
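# 2. Forks with unique timestamps (skip mirrors)
CANDIDATES=$(gh api "repos/$UPSTREAM/forks" --paginate \
--jq '.[] | {full_name, pushed_at, stargazers_count}' | \
jq -s 'group_by(.pushed_at) | map(select(length == 1)) | flatten | sort_by(.pushed_at) | reverse')
printf '%s\n' "$CANDIDATES"
# 3. Check ahead_by for each candidate
# (assumes each fork kept the upstream default branch name; failed compares are skipped)
printf '%s\n' "$CANDIDATES" | jq -r '.[].full_name' | while read -r FORK; do
printf '%s\t' "$FORK"
gh api "repos/$UPSTREAM/compare/$BRANCH...${FORK%%/*}:$BRANCH" \
--jq '[.ahead_by, .behind_by, .status] | @tsv' 2>/dev/null || echo "compare failed"
done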
# 4. Check upstream PRs from fork authors
gh api "repos/$UPSTREAM/pulls?state=all" --paginate \
--jq '.[] | select(.head.repo.fork) | {number, title, state, user: .user.login}'
| Limitation | Impact | Workaround |
|---|---|---|
| GitHub compare API 250-commit limit | Highly divergent forks may truncate | Use gh api -i "repos/FORK/commits?per_page=1" and read the last page number from the Link header to get the total count |
| Private forks invisible | Trading firms keep best work private | Accepted limitation |
| Force-pushed branches break compare API | Shows 0 ahead despite significant work | Cross-reference upstream PR history |
| Renamed forks may break API calls | Old URLs may 404 | Use gh api repos/FORK_OWNER/REPO --jq '.name' to detect renames |
| Rate limiting on large fork ecosystems | >1000 forks = many API calls | Use timestamp clustering to reduce calls by 85%+ |
| Maintainer dev forks look like independent work | Branch names 1:1 with upstream PRs | Cross-reference branch names against upstream PR branch names |
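The commit-count workaround for the 250-commit limit relies on pagination: with per_page=1, the last page number in the Link response header equals the number of commits on the default branch. A minimal sketch of extracting that number follows; the function name `last_page_from_headers` and the sample header are hypothetical.

```bash
# Extract the last page number from an HTTP Link header read on stdin.
# Prints nothing when there is no Link header (for example, a single-commit repo).
last_page_from_headers() {
  grep -i '^link:' | sed -n 's/.*[?&]page=\([0-9][0-9]*\)>; rel="last".*/\1/p'
}

# Example with gh (not run here); -i includes response headers in the output:
# gh api -i "repos/FORK_OWNER/REPO/commits?per_page=1" | last_page_from_headers

# Hypothetical header for illustration:
printf '%s\r\n' 'Link: <https://api.github.com/repositories/1/commits?per_page=1&page=2>; rel="next", <https://api.github.com/repositories/1/commits?per_page=1&page=412>; rel="last"' |
  last_page_from_headers
```

Subtracting the upstream count obtained the same way gives only a rough divergence estimate, since it ignores commits the fork is missing.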
Use this structure for the final analysis report:
# Fork Analysis Report: OWNER/REPO
**Repository**: OWNER/REPO (N stars, M forks)
**Analysis date**: YYYY-MM-DD
## Fork Landscape Summary
| Metric | Value |
| ------------------------------------- | ------ |
| Total forks | N |
| Pure mirrors | N (X%) |
| Divergent forks (ahead on any branch) | N |
| Substantive forks (meaningful work) | N |
| Stars-only miss rate | X% |
## Tiered Ranking
### Tier 1: Major Extensions
(fork details with ahead_by, key features, files changed)
### Tier 2: Targeted Features
...
### Tier 3: Infrastructure/Packaging
...
## Cross-Fork Convergence Patterns
(themes that multiple forks independently implemented)
## Actionable Recommendations
- Cherry-pick candidates
- Feature inspiration
- Security fixes
After modifying THIS skill:
- ./references/ links resolve

After this skill completes, reflect before closing the task:
Do NOT defer. The next invocation inherits whatever you leave behind.