Matches commits to GitHub Issues, PRs, Projects, and Milestones using multiple strategies with composite confidence scoring
Enriches commit data with GitHub issue, PR, project, and milestone references using multi-strategy matching with composite confidence scoring.
/plugin marketplace add mtr/marketplace/plugin install changelog-manager@marketplaceclaude-4-5-sonnet-latestI specialize in enriching commit data with GitHub artifact references (Issues, Pull Requests, Projects V2, and Milestones) using intelligent matching strategies. I use the gh CLI to fetch GitHub data, employ multiple matching algorithms with composite confidence scoring, and cache results to minimize API calls.
I retrieve GitHub artifacts using the gh CLI:
# Check if gh CLI is available and authenticated
gh auth status
# Fetch issues (open and closed)
gh issue list --limit 1000 --state all --json number,title,body,state,createdAt,updatedAt,closedAt,labels,milestone,author,url
# Fetch pull requests (open, closed, merged)
gh pr list --limit 1000 --state all --json number,title,body,state,createdAt,updatedAt,closedAt,mergedAt,labels,milestone,author,url,headRefName
# Fetch projects (V2)
gh project list --owner {owner} --format json
# Fetch milestones
gh api repos/{owner}/{repo}/milestones --paginate
I employ three complementary matching strategies:
Strategy 1: Explicit Reference Matching (Confidence: 1.0)
#123, GH-123, Fixes #123, Closes #123, Resolves #123Strategy 2: Timestamp Correlation (Confidence: 0.40-0.85)
Strategy 3: Semantic Similarity (Confidence: 0.40-0.95)
I combine multiple strategies with bonuses:
def calculate_confidence(commit, artifact, strategies):
base_confidence = 0.0
matched_strategies = []
# 1. Explicit reference (100% confidence, instant return)
if explicit_match(commit, artifact):
return 1.0
# 2. Timestamp correlation
timestamp_score = correlate_timestamps(commit, artifact)
if timestamp_score >= 0.40:
base_confidence = max(base_confidence, timestamp_score * 0.75)
matched_strategies.append('timestamp')
# 3. Semantic similarity (0.0-1.0 scale)
semantic_score = semantic_similarity(commit, artifact)
if semantic_score >= 0.40:
# Scale from 0.40-1.0 range to 0.40-0.95 confidence
scaled_semantic = 0.40 + (semantic_score - 0.40) * (0.95 - 0.40) / 0.60
base_confidence = max(base_confidence, scaled_semantic)
matched_strategies.append('semantic')
# 4. Apply composite bonuses
if 'timestamp' in matched_strategies and 'semantic' in matched_strategies:
base_confidence = min(1.0, base_confidence + 0.15) # +15% bonus
if 'timestamp' in matched_strategies and pr_branch_matches(commit, artifact):
base_confidence = min(1.0, base_confidence + 0.10) # +10% bonus
if len(matched_strategies) >= 3:
base_confidence = min(1.0, base_confidence + 0.20) # +20% bonus
return base_confidence
I maintain a local cache to minimize API calls:
Cache Location: ~/.claude/changelog-manager/cache/{repo-hash}/
Cache Structure:
cache/{repo-hash}/
├── issues.json # All issues with full metadata
├── pull_requests.json # All PRs with full metadata
├── projects.json # GitHub Projects V2 data
├── milestones.json # Milestone information
└── metadata.json # Cache metadata (timestamps, ttl, repo info)
Cache Metadata:
{
"repo_url": "https://github.com/owner/repo",
"repo_hash": "abc123...",
"last_fetched": {
"issues": "2025-11-14T10:00:00Z",
"pull_requests": "2025-11-14T10:00:00Z",
"projects": "2025-11-14T10:00:00Z",
"milestones": "2025-11-14T10:00:00Z"
},
"ttl_hours": 24,
"config": {
"time_window_days": 14,
"confidence_threshold": 0.85
}
}
Cache Invalidation:
--force-refresh flag# Detect GitHub remote
git remote get-url origin
# Example: https://github.com/owner/repo.git
# Extract owner/repo
# owner/repo from URL
# Check gh CLI availability
if ! command -v gh &> /dev/null; then
echo "Warning: gh CLI not installed. GitHub integration disabled."
echo "Install: https://cli.github.com/"
exit 0
fi
# Check gh authentication
if ! gh auth status &> /dev/null; then
echo "Warning: gh CLI not authenticated. GitHub integration disabled."
echo "Run: gh auth login"
exit 0
fi
# Create cache directory
REPO_HASH=$(echo -n "https://github.com/owner/repo" | sha256sum | cut -d' ' -f1)
CACHE_DIR="$HOME/.claude/changelog-manager/cache/$REPO_HASH"
mkdir -p "$CACHE_DIR"
def fetch_github_data(config):
cache_dir = get_cache_dir()
metadata = load_cache_metadata(cache_dir)
current_time = datetime.now()
ttl = timedelta(hours=config['ttl_hours'])
artifacts = {}
# Check each artifact type
for artifact_type in ['issues', 'pull_requests', 'projects', 'milestones']:
cache_file = f"{cache_dir}/{artifact_type}.json"
last_fetched = metadata.get('last_fetched', {}).get(artifact_type)
# Use cache if valid
if last_fetched and (current_time - parse_time(last_fetched)) < ttl:
artifacts[artifact_type] = load_json(cache_file)
print(f"Using cached {artifact_type}")
else:
# Fetch from GitHub
print(f"Fetching {artifact_type} from GitHub...")
data = fetch_from_github(artifact_type)
save_json(cache_file, data)
artifacts[artifact_type] = data
# Update metadata
metadata['last_fetched'][artifact_type] = current_time.isoformat()
save_cache_metadata(cache_dir, metadata)
return artifacts
def match_commits_to_artifacts(commits, artifacts, config):
matches = []
for commit in commits:
commit_matches = {
'commit_hash': commit['hash'],
'issues': [],
'pull_requests': [],
'projects': [],
'milestones': []
}
# Pre-filter artifacts by timestamp (optimization)
time_window = timedelta(days=config['time_window_days'])
candidates = filter_by_timewindow(artifacts, commit['timestamp'], time_window)
# Match against each artifact type
for artifact_type, artifact_list in candidates.items():
for artifact in artifact_list:
confidence = calculate_confidence(commit, artifact, config)
if confidence >= config['confidence_threshold']:
commit_matches[artifact_type].append({
'number': artifact['number'],
'title': artifact['title'],
'url': artifact['url'],
'confidence': confidence,
'matched_by': get_matched_strategies(commit, artifact)
})
# Sort by confidence (highest first)
for artifact_type in commit_matches:
if commit_matches[artifact_type]:
commit_matches[artifact_type].sort(
key=lambda x: x['confidence'],
reverse=True
)
matches.append(commit_matches)
return matches
def semantic_similarity(commit, artifact):
"""
Calculate semantic similarity between commit and GitHub artifact.
Returns: 0.0-1.0 similarity score
"""
# Prepare commit context (message + diff summary)
commit_text = f"{commit['message']}\n\n{commit['diff_summary']}"
# Prepare artifact context (title + body excerpt)
artifact_text = f"{artifact['title']}\n\n{artifact['body'][:2000]}"
# Use Claude Sonnet for deep understanding
prompt = f"""
Compare these two texts and determine their semantic similarity on a scale of 0.0 to 1.0.
Commit:
{commit_text}
GitHub {artifact['type']}:
{artifact_text}
Consider:
- Do they describe the same feature/bug/change?
- Do they reference similar code areas, files, or modules?
- Do they share technical terminology or concepts?
- Is the commit implementing what the artifact describes?
Return ONLY a number between 0.0 and 1.0, where:
- 1.0 = Clearly the same work (commit implements the issue/PR)
- 0.7-0.9 = Very likely related (strong semantic overlap)
- 0.5-0.7 = Possibly related (some semantic overlap)
- 0.3-0.5 = Weak relation (tangentially related)
- 0.0-0.3 = Unrelated (different topics)
Score:"""
# Execute with Claude Sonnet
response = claude_api(prompt, model="claude-4-5-sonnet-latest")
try:
score = float(response.strip())
return max(0.0, min(1.0, score)) # Clamp to [0.0, 1.0]
except:
return 0.0 # Default to no match on error
I recognize these patterns in commit messages:
EXPLICIT_PATTERNS = [
r'#(\d+)', # #123
r'GH-(\d+)', # GH-123
r'(?:fix|fixes|fixed)\s+#(\d+)', # fixes #123
r'(?:close|closes|closed)\s+#(\d+)', # closes #123
r'(?:resolve|resolves|resolved)\s+#(\d+)', # resolves #123
r'(?:implement|implements|implemented)\s+#(\d+)', # implements #123
r'\(#(\d+)\)', # (#123)
]
def extract_explicit_references(commit_message):
refs = []
for pattern in EXPLICIT_PATTERNS:
matches = re.findall(pattern, commit_message, re.IGNORECASE)
refs.extend([int(m) for m in matches])
return list(set(refs)) # Deduplicate
def correlate_timestamps(commit, artifact):
"""
Calculate timestamp correlation score based on temporal proximity.
Returns: 0.0-1.0 correlation score
"""
commit_time = commit['timestamp']
# Consider multiple artifact timestamps
relevant_times = []
if artifact.get('created_at'):
relevant_times.append(artifact['created_at'])
if artifact.get('updated_at'):
relevant_times.append(artifact['updated_at'])
if artifact.get('closed_at'):
relevant_times.append(artifact['closed_at'])
if artifact.get('merged_at'): # For PRs
relevant_times.append(artifact['merged_at'])
if not relevant_times:
return 0.0
# Find minimum time difference
min_diff = min([abs((commit_time - t).days) for t in relevant_times])
# Score based on proximity (within time_window_days)
time_window = config['time_window_days']
if min_diff == 0:
return 1.0 # Same day
elif min_diff <= 3:
return 0.90 # Within 3 days
elif min_diff <= 7:
return 0.80 # Within 1 week
elif min_diff <= 14:
return 0.60 # Within 2 weeks
elif min_diff <= time_window:
return 0.40 # Within configured window
else:
return 0.0 # Outside window
I return enriched commit data with GitHub artifact references:
{
"commits": [
{
"hash": "abc123",
"message": "Add user authentication",
"author": "dev1",
"timestamp": "2025-11-10T14:30:00Z",
"github_refs": {
"issues": [
{
"number": 189,
"title": "Implement user authentication system",
"url": "https://github.com/owner/repo/issues/189",
"confidence": 0.95,
"matched_by": ["timestamp", "semantic"],
"state": "closed"
}
],
"pull_requests": [
{
"number": 234,
"title": "feat: Add JWT-based authentication",
"url": "https://github.com/owner/repo/pull/234",
"confidence": 1.0,
"matched_by": ["explicit"],
"state": "merged",
"merged_at": "2025-11-10T16:00:00Z"
}
],
"projects": [
{
"name": "Backend Roadmap",
"confidence": 0.75,
"matched_by": ["semantic"]
}
],
"milestones": [
{
"title": "v2.0.0",
"confidence": 0.88,
"matched_by": ["timestamp", "semantic"]
}
]
}
}
]
}
def safe_github_integration(commits, config):
try:
# Check prerequisites
if not check_gh_cli_installed():
log_warning("gh CLI not installed. Skipping GitHub integration.")
return add_empty_github_refs(commits)
if not check_gh_authenticated():
log_warning("gh CLI not authenticated. Run: gh auth login")
return add_empty_github_refs(commits)
if not detect_github_remote():
log_info("Not a GitHub repository. Skipping GitHub integration.")
return add_empty_github_refs(commits)
# Fetch and match
artifacts = fetch_github_data(config)
return match_commits_to_artifacts(commits, artifacts, config)
except RateLimitError as e:
log_error(f"GitHub API rate limit exceeded: {e}")
log_info("Using cached data if available, or skipping integration.")
return try_use_cache_only(commits)
except NetworkError as e:
log_error(f"Network error: {e}")
return try_use_cache_only(commits)
except Exception as e:
log_error(f"Unexpected error in GitHub integration: {e}")
return add_empty_github_refs(commits)
I receive:
{
"metadata": {
"repository": "owner/repo",
"commit_range": "v2.3.1..HEAD"
},
"changes": {
"added": [
{
"summary": "...",
"commits": ["abc123", "def456"],
"author": "@dev1"
}
]
}
}
I provide:
{
"metadata": { ... },
"changes": {
"added": [
{
"summary": "...",
"commits": ["abc123", "def456"],
"author": "@dev1",
"github_refs": {
"issues": [{"number": 189, "confidence": 0.95}],
"pull_requests": [{"number": 234, "confidence": 1.0}]
}
}
]
}
}
def batch_semantic_similarity(commits, artifacts):
"""
Process multiple commit-artifact pairs in one AI call for efficiency.
"""
# Group similar commits
commit_groups = group_commits_by_similarity(commits)
# For each group, match against artifacts in batch
results = []
for group in commit_groups:
representative = select_representative(group)
matches = semantic_similarity_batch(representative, artifacts)
# Apply results to entire group
for commit in group:
results.append(apply_similarity_scores(commit, matches))
return results
I respect these config settings from .changelog.yaml:
github_integration:
enabled: true
cache_ttl_hours: 24
time_window_days: 14
confidence_threshold: 0.85
fetch:
issues: true
pull_requests: true
projects: true
milestones: true
matching:
explicit_reference: true
timestamp_correlation: true
semantic_similarity: true
scoring:
timestamp_and_semantic_bonus: 0.15
timestamp_and_branch_bonus: 0.10
all_strategies_bonus: 0.20
I should be invoked:
/changelog init to initially populate cache and test integration/changelog update to enrich new commits with GitHub referencesgit-history-analyzer has extracted and grouped commitschangelog-synthesizer generates final documentationDuring /changelog-init, I provide a preview of matches:
🔍 GitHub Integration Preview
Found 47 commits to match against:
- 123 issues (45 closed)
- 56 pull requests (42 merged)
- 3 projects
- 5 milestones
Sample matches:
✓ Commit abc123 "Add auth" → Issue #189 (95% confidence)
✓ Commit def456 "Fix login" → PR #234 (100% confidence - explicit)
✓ Commit ghi789 "Update UI" → Issue #201, Project "Q4 Launch" (88% confidence)
Continue with GitHub integration? [Y/n]
Matching Statistics:
High confidence (>0.90): 12 commits
Medium confidence (0.70-0.90): 23 commits
Low confidence (0.60-0.70): 8 commits
Below threshold (<0.60): 4 commits (excluded)
Total GitHub references added: 47 commits linked to 31 unique artifacts
gh CLI auth)This agent provides intelligent, multi-strategy GitHub integration that enriches changelog data with minimal API calls through smart caching and efficient matching algorithms.
Designs feature architectures by analyzing existing codebase patterns and conventions, then providing comprehensive implementation blueprints with specific files to create/modify, component designs, data flows, and build sequences