rag_for_git

🇷🇺 Russian version: README.ru.md

An agent that automatically reviews pull/merge requests using RAG + a code graph + Claude Code.

What it is

Plain linters catch syntax and style but miss meaning and relationships: a broken function contract, the impact of a change on its callers, a removed guard, a contradiction with an existing test. This agent gives an LLM the same context a human reviewer has — semantic + lexical retrieval over the whole repository, structural code-graph expansion, and an agentic tool loop — then posts the result back to GitHub as inline comments on diff lines plus a summary.

A single PR review runs as three stages:

prepare_review (MCP) → analyze (Claude subagents) → publish_review (MCP)

prepare — GitHubProvider pulls the PR (base/head SHA) and changed files; changed .py files are chunked (tree-sitter) and embedded (Voyage) into an ephemeral overlay ref="pr:N"; policy and per-file review units are assembled.
analyze — the Claude Code skill fans out one subagent per file. Each reasons over the diff in a tool loop, pulling in whatever code it needs: search_code, get_related_symbols, read_file, get_definition, find_callers, get_changed_file_diff.
publish — a deterministic tail: policy gate (category/severity/confidence/paths) → line grounding by exact code quote (anti-hallucination) → dedup → assemble (inline vs summary, suggestion invariants, fingerprint idempotency, comment cap) → post to GitHub → history record → overlay/session cleanup.
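The line-grounding step in the publish stage can be illustrated with a minimal sketch. It assumes each finding carries a verbatim one-line code quote and that a finding is kept only when the quote matches exactly one line of the new file version. The function name `ground_line` and the whitespace-insensitive comparison are hypothetical; the actual matching rules in the project may differ.

```python
from typing import Optional


def ground_line(quote: str, file_lines: list[str]) -> Optional[int]:
    """Return the 1-based line number whose text equals the quote.

    Hypothetical helper: returns None when the quote is empty, absent,
    or matches more than one line, so an ungrounded finding can be
    dropped or moved to the summary instead of being posted inline.
    """
    target = quote.strip()
    if not target:
        return None
    # Compare ignoring leading/trailing whitespace (an assumption).
    matches = [
        number
        for number, line in enumerate(file_lines, start=1)
        if line.strip() == target
    ]
    return matches[0] if len(matches) == 1 else None
```

Rejecting ambiguous matches is a conservative choice: a comment pinned to the wrong line is arguably worse than one that appears only in the summary.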

Status: working v1. The target analysis language is Python; the VCS is GitHub (behind a VCSProvider interface). In live runs it has caught real bugs and surfaced the impact of changes on calling code and existing tests.

How it works / Architecture

The core is the reviewer/ library, assembled in reviewer/app.py::build_components(settings) from Settings (pydantic-settings, .env). Entry points are reviewer/entrypoints/cli.py (Click) and reviewer/entrypoints/mcp_server.py (FastMCP). Three pieces work together:

RAG (hybrid retrieval). Postgres/ParadeDB stores code chunks with pgvector (HNSW ANN) and pg_search (BM25). A query is embedded with Voyage and run through both ANN and BM25 search; the two result lists are merged with Reciprocal Rank Fusion (RRF), then reranked with Voyage rerank-2.5.
Code graph (SCIP or tree-sitter, Neo4j). Symbols and their relationships live in Neo4j. The graph orchestrator (graph/backend.py) picks a backend via GRAPH_BACKEND (auto|scip|treesitter): SCIP (@sourcegraph/scip-python) gives a precise, type-aware graph with CALLS + IMPLEMENTS edges; tree-sitter is a fast fallback with CALLS-by-name only. Retrieval expands the changed symbols 1–2 hops to surface callers/callees/implementations/tests.
Claude Code plugin via MCP. The reviewer-mcp server exposes prepare_review, publish_review, and the agent tools. The Claude Code plugin (plugin/) drives the review: it calls prepare_review, runs analysis subagents against those MCP tools, then calls publish_review.
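The Reciprocal Rank Fusion step in the RAG piece can be sketched as follows. Each result list contributes 1 / (k + rank) to a document's score, and documents are ordered by the summed score. The function name `rrf_merge`, the constant k = 60 (a commonly used default) and the alphabetical tie-break are assumptions for illustration, not values taken from the project.

```python
from collections import defaultdict


def rrf_merge(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked ID lists with Reciprocal Rank Fusion.

    rankings: one list of document IDs per retriever, best first.
    k: smoothing constant; 60 is a common default (assumed here).
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists from the ANN and BM25 searches.
ann_hits = ["a", "b", "c"]
bm25_hits = ["b", "d", "a"]
fused = rrf_merge([ann_hits, bm25_hits])
```

Because RRF uses only ranks, it needs no score normalization between cosine distances and BM25 scores, which makes it a natural fit for mixing the two retrievers before a reranker.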

The single key linking RAG and the graph is node_id = "path#fqn" (e.g. rag/embedder.py#VoyageEmbedder.embed_query). Both the chunk in Postgres and the node in Neo4j use it, so graph expansion and chunk retrieval are stitched together without any mapping table.
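A minimal sketch of how such a shared key could be built, parsed and used to join graph results with chunks. The helper names (`make_node_id`, `parse_node_id`, `chunks_for`) and the in-memory dictionary standing in for the Postgres chunk table are hypothetical; splitting on the first `#` assumes file paths never contain that character.

```python
from typing import NamedTuple


class NodeId(NamedTuple):
    path: str
    fqn: str


def make_node_id(path: str, fqn: str) -> str:
    """Build the shared key in the "path#fqn" form."""
    return f"{path}#{fqn}"


def parse_node_id(node_id: str) -> NodeId:
    """Split a key back into file path and fully qualified name."""
    path, sep, fqn = node_id.partition("#")
    if not sep or not path or not fqn:
        raise ValueError(f"not a path#fqn key: {node_id!r}")
    return NodeId(path, fqn)


def chunks_for(neighbor_ids: list[str], chunk_store: dict[str, str]) -> list[str]:
    """Look up chunk text for node IDs returned by a graph expansion.

    chunk_store is a stand-in for the chunk table, keyed by node_id.
    IDs without a stored chunk are skipped.
    """
    return [chunk_store[n] for n in neighbor_ids if n in chunk_store]
```

Because both stores use the same string, the output of a graph query can be passed directly as lookup keys for chunk retrieval.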

Index freshness: a stable base + a PR overlay. A full reindex of a large repo is expensive, so the index keeps a persistent base and layers PR changes on top:

ref="base:<branch>" — the persistent index of a tracked branch (e.g. "base:main", "base:master"). Each tracked branch in REVIEW_BRANCHES has its own isolated index. Updated incrementally by reviewer index --ref <branch> (only changed files are chunked; only chunks with a new content_hash are re-embedded — embeddings are reused across branches by hash, saving Voyage quota).
ref="pr:N" — an ephemeral overlay of just the PR's changed files at its HEAD.
On a query: retrieval = (base:<branch> where path ∉ changed) ∪ overlay. For changed files the agent sees the new version; for everything else, the stable base.
Multi-branch. A PR is reviewed against the index of its target branch (base_ref from the PR). A PR targeting an untracked branch is skipped (prepare_review returns {"status":"skipped",...}). The code graph (Neo4j :Symbol) is also branch-scoped via a branch property, with a unique constraint on (repo, branch, id).
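The query-time rule, retrieval = (base:<branch> where path ∉ changed) ∪ overlay, can be sketched over in-memory records. The `Chunk` dataclass and `visible_chunks` function are hypothetical stand-ins; in the project this filter presumably runs inside the database query rather than in Python.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    ref: str      # e.g. "base:main" or "pr:N"
    path: str     # repository-relative file path
    node_id: str  # "path#fqn" key


def visible_chunks(
    base: list[Chunk], overlay: list[Chunk], changed_paths: set[str]
) -> list[Chunk]:
    """Combine the stable base with the PR overlay.

    Base chunks from files the PR touched are hidden, so the agent
    sees only the new version of changed files. A file deleted by the
    PR is in changed_paths but has no overlay chunks, so it vanishes.
    """
    kept = [chunk for chunk in base if chunk.path not in changed_paths]
    return kept + list(overlay)
```

Filtering by path rather than by node_id also hides base chunks for symbols that the PR removed or renamed within a changed file.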
