Internal guidance for composing Gemini 2.5 Pro/Flash prompts for coding, review, diagnosis, and research tasks inside the Gemini Claude Code plugin
references/gemini-prompt-antipatterns.md
references/gemini-prompt-recipes.md
Reference document for writing effective prompts when delegating to Gemini 2.5 Pro or Flash. Covers model selection, thinking mode, context window use, structured output, tool use, and antipatterns.
Choose based on task complexity and cost tolerance:
| Model | Use when | Context | Thinking |
|---|---|---|---|
| gemini-2.5-pro | Complex architecture, cross-file refactors, SWE-bench-style tasks, novel algorithm design | 1M tokens | Always on (128–32,768 tokens, default dynamic) |
| gemini-2.5-flash | Production tasks with good cost/quality balance: reviews, summaries, data extraction, chat | 1M tokens | Dynamic by default (0–24,576 tokens, can disable) |
| gemini-2.5-flash-lite | High-volume, low-cost: classification, routing, simple translation, triage | 1M tokens | Off by default (512–24,576 tokens) |
Decision rule: Default to Flash for most coding assistance. Switch to Pro only when Flash produces shallow or incorrect reasoning on complex multi-step problems. Use Flash-Lite only when cost is the primary constraint and quality requirements are low.
SWE-bench data point: Gemini 2.5 Pro scores ~63.8% on SWE-bench Verified with a custom agent setup — comparable to frontier models for real-world GitHub issue resolution.
Gemini 2.5 models have an internal reasoning phase ("thinking") before responding. You control how many tokens it can spend reasoning.
```python
# Python SDK
from google import genai
from google.genai import types

client = genai.Client()
response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Refactor this authentication module...",
    config=types.GenerateContentConfig(
        thinking_config=types.ThinkingConfig(
            thinking_budget=8192  # or -1 for dynamic, or 0 to disable
        )
    ),
)
```
| Model | Min | Max | Default |
|---|---|---|---|
| gemini-2.5-pro | 128 | 32,768 | Dynamic (cannot disable) |
| gemini-2.5-flash | 0 | 24,576 | Dynamic (-1) |
| gemini-2.5-flash-lite | 512 | 24,576 | 0 (disabled) |
- Disable (budget=0) or minimal: simple, well-defined tasks such as classification, extraction, and formatting.
- Medium (512–4,096 tokens): moderate multi-step reasoning such as code review and constrained refactors.
- High (8,192–32,768 tokens) or dynamic (-1): complex architecture work, multi-file debugging, novel design.
Cost warning: High thinking budgets cost significantly more. One practitioner reported auto-mode costing ~37x more than flash-only mode for identical workloads ($6k vs $163/month at scale). Match thinking budget to actual task complexity.
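One way to keep budgets matched to complexity is to map task categories to budgets in code rather than deciding ad hoc per request. A minimal sketch (the category names are illustrative, not an official taxonomy):

```python
def thinking_budget(task: str) -> int:
    """Map a task category to a thinking budget (illustrative tiers)."""
    simple = {"classify", "extract", "format", "route"}
    complex_tasks = {"architecture", "multi_file_debug", "novel_design"}
    if task in simple:
        return 0      # disable thinking entirely
    if task in complex_tasks:
        return -1     # dynamic: let the model decide how much to think
    return 2048       # fixed medium budget for ordinary coding tasks
```

The returned value plugs directly into `ThinkingConfig(thinking_budget=...)`, so one table controls spend across the whole pipeline.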
[Role/persona — optional but effective]
[Context: what exists, what matters]
[Task: specific, scoped, explicit]
[Constraints: what NOT to do, style rules]
[Output format: how to structure the response]
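If you build prompts programmatically, the five-part skeleton maps naturally onto a small helper. A sketch (all names and strings here are illustrative):

```python
def build_prompt(role: str, context: str, task: str,
                 constraints: list[str], output_format: str) -> str:
    """Assemble the five-part prompt skeleton; any section can be empty in practice."""
    constraint_lines = "\n".join(f"- {c}" for c in constraints)
    return (
        f"{role}\n\n"
        f"Context:\n{context}\n\n"
        f"Task: {task}\n\n"
        f"Constraints:\n{constraint_lines}\n\n"
        f"Output format: {output_format}"
    )

prompt = build_prompt(
    role="You are a senior Go engineer.",
    context="Service X handles payment webhooks.",
    task="Refactor processPayment for lower complexity.",
    constraints=["Do not change the signature", "Keep error types"],
    output_format="Return only the changed function.",
)
```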
Gemini responds well to role framing. Set it in the system prompt or at the start:
You are a senior Go engineer specializing in performance-critical systems.
You follow idiomatic Go, prefer composition over inheritance, and always
handle errors explicitly rather than panicking.
Break large tasks into sequential prompts. Do not cram multi-phase work into one prompt.
Bad:
Refactor the authentication module, add rate limiting, write tests,
and update the API documentation.
Good (4 separate prompts):
1. Refactor the authentication module.
2. Add rate limiting to the refactored module.
3. Write tests covering the refactor and rate limiting.
4. Update the API documentation to match.
Gemini's most common failure mode: "I Know Better" syndrome — it fixes unrelated code you didn't ask it to touch. Always state what NOT to do:
Bad:
Refactor this function to be more efficient.
Good:
Refactor this function for efficiency.
DO NOT change the function signature.
DO NOT modify the input validation logic.
DO NOT add or remove comments.
DO NOT change error return types.
For complex features, establish the design before writing code:
Before writing any code, describe the architecture for adding webhook
support to this service. Cover: data model changes, handler design,
retry strategy, and failure modes. I'll confirm the approach before
you implement.
For anything generating hundreds of lines, do it in stages:
Generate the HTML structure for the dashboard component first.
Stop after the HTML. I'll review it before we proceed to CSS and JS.
Aggressively filter before passing a repo:
| Exclude | Reason |
|---|---|
| `*.csv`, `*.json` data files | No semantic value for code tasks |
| `*.svg`, `*.png`, static assets | Not code |
| `*_test.py`, `test_*.py` | Unless understanding tests is the goal |
| `*.lock`, `*.sum` files | Noise |
| `node_modules/`, `.venv/`, `dist/` | Never include |
| Comments + whitespace | Compress with tools like yek or repomix |
Filtering workflow for large repos:
repomix --output-show-line-numbers

Practical reality: 1M tokens = ~30,000 lines of code. Most real projects that need full context fit within this after filtering. For repos that genuinely exceed 1M tokens even after filtering, use RAG or chunk by subdirectory.
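The exclusion table above can also be applied mechanically before any packing tool runs. A minimal sketch of such a pre-filter (the patterns mirror the table; tune them for your repo):

```python
import fnmatch
import os

EXCLUDE_DIRS = {"node_modules", ".venv", "dist", ".git"}
EXCLUDE_GLOBS = ["*.csv", "*.svg", "*.png", "*.lock", "*.sum",
                 "*_test.py", "test_*.py"]

def relevant_files(root: str) -> list[str]:
    """Walk a repo and keep only files worth sending as context."""
    keep = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune excluded directories in place so os.walk never descends into them
        dirnames[:] = [d for d in dirnames if d not in EXCLUDE_DIRS]
        for name in filenames:
            if not any(fnmatch.fnmatch(name, g) for g in EXCLUDE_GLOBS):
                keep.append(os.path.join(dirpath, name))
    return keep
```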
Always put your question/task AFTER the context, not before.
[All code/documents here]
---
Given the above codebase, identify all places where database connections
are not properly closed on error paths.
Google's official guidance: "the model's performance will be better if you put your query at the end of the prompt."
Gemini handles single-query retrieval well (up to 99% accuracy) but performance degrades when searching for multiple independent facts simultaneously. For multi-part questions, ask them sequentially rather than in one prompt.
For repeated queries over the same large context (e.g., a full codebase you're asking multiple questions about), use context caching to avoid re-sending the same tokens:
```python
# Cache the codebase context once, then query it multiple times
cached_content = client.caches.create(
    model="gemini-2.5-pro",
    config=types.CreateCachedContentConfig(
        contents=[large_codebase_content],
        ttl="3600s",
    ),
)
# Subsequent queries reference the cache and cost much less
response = client.models.generate_content(
    model="gemini-2.5-pro",
    contents="Where are database connections leaked on error paths?",
    config=types.GenerateContentConfig(cached_content=cached_content.name),
)
```
Despite the 1M token window, reliability drops after ~200K tokens in practice. If you notice Gemini forgetting earlier instructions, reinserting bugs it already fixed, or contradicting itself: start a fresh session. Use "checkpoint messages" to hand off state between sessions.
Review this diff for: (1) correctness issues, (2) error handling gaps,
(3) performance concerns. Focus on the changed lines only.
Flag issues by severity: CRITICAL / WARNING / SUGGESTION.
Do not suggest style changes.
[diff here]
This test is failing with the following output. Identify the root cause.
Do not propose a fix yet — explain the cause first.
Test: [test name]
Error: [full stack trace]
Relevant code: [paste the relevant functions]
I'm sharing my project's codebase. Analyze its structure and identify:
1. Architectural issues (coupling, violation of single responsibility)
2. Missing error handling
3. Inconsistencies in naming or patterns
Organize findings by severity. Do not suggest new features.
[codebase here]
Refactor the `processPayment` function in payments/processor.go.
Goal: reduce cyclomatic complexity from ~18 to below 10.
Constraints:
- Do not change the function signature
- Do not change behavior — existing tests must still pass
- Do not modify other functions
- Preserve all existing error types
Show the refactored function only, not the entire file.
Migrate this Python 2 module to Python 3.10+.
Rules:
- Use f-strings, not .format()
- Replace print statements with logging module
- Replace unicode() with str()
- Do not change the public API
- Add type hints to all function signatures
[module code]
"Please respond in JSON format" is fragile. Use response_schema instead:
```python
import typing_extensions as typing

class CodeIssue(typing.TypedDict):
    file: str
    line: int
    severity: typing.Literal["critical", "warning", "suggestion"]
    description: str
    suggestion: str

response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents=review_prompt,
    config=types.GenerateContentConfig(
        response_mime_type="application/json",
        response_schema=list[CodeIssue],
    ),
)
```
Schema tips:
- Add `description` fields on every property — they directly improve extraction accuracy
- Mark fields as `required` when they must always be present
- Use `enum` for fixed value sets (severity levels, categories)

Known limitation: when there are tool calls in the message history, structured output fails for Gemini 2.5 models (it works in 2.0). If you need both tool use and structured output in the same session, collect tool results first, then make a final structured-output request with the results injected as context.
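The same tips apply if you pass a raw schema dict instead of a TypedDict. A sketch showing descriptions, `required`, and `enum` together (field names are illustrative, not a required shape):

```python
# Raw schema (OpenAPI subset) usable as response_schema
code_issue_schema = {
    "type": "object",
    "properties": {
        "file": {
            "type": "string",
            "description": "Path of the affected file, relative to the repo root",
        },
        "severity": {
            "type": "string",
            "enum": ["critical", "warning", "suggestion"],
            "description": "Impact level of the issue",
        },
        "description": {
            "type": "string",
            "description": "One-sentence explanation of the problem",
        },
    },
    "required": ["file", "severity", "description"],
}
```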
For single-label classification, use the `text/x.enum` MIME type to constrain output to one of a fixed set of values:

```python
response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents=f"Classify this bug: {bug_description}",
    config=types.GenerateContentConfig(
        response_mime_type="text/x.enum",
        response_schema={"enum": ["null_pointer", "race_condition", "off_by_one", "type_error", "other"]},
    ),
)
```
```python
tool_config = types.ToolConfig(
    function_calling_config=types.FunctionCallingConfig(
        mode="AUTO"    # Model decides — best default for coding agents
        # mode="ANY"   # Always calls a function — use for strict pipelines
        # mode="NONE"  # No function calls — use for pure text generation
    )
)
```
The quality of your function description directly determines call accuracy:
Bad:
```json
{"name": "read_file", "description": "Read a file"}
```
Good:
```json
{
  "name": "read_file",
  "description": "Read the full contents of a file at the given path. Use this when you need to examine existing code before making changes. Returns the file contents as a string.",
  "parameters": {
    "type": "object",
    "properties": {
      "path": {
        "type": "string",
        "description": "Absolute or relative path to the file, e.g. 'src/auth/handler.go'"
      }
    },
    "required": ["path"]
  }
}
```
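Since description quality drives call accuracy, it can be worth lint-checking declarations before shipping them. A heuristic sketch (the thresholds and checks are illustrative, not an official API):

```python
def lint_declaration(decl: dict) -> list[str]:
    """Return heuristic warnings for a weak function declaration."""
    problems = []
    if len(decl.get("description", "")) < 40:
        problems.append("description too short: say when to use the tool and what it returns")
    params = decl.get("parameters", {})
    for name, spec in params.get("properties", {}).items():
        if not spec.get("description"):
            problems.append(f"parameter '{name}' is missing a description")
    return problems

bad = {"name": "read_file", "description": "Read a file"}
good = {
    "name": "read_file",
    "description": "Read the full contents of a file at the given path. "
                   "Use this when you need to examine existing code before making changes.",
    "parameters": {
        "type": "object",
        "properties": {"path": {"type": "string", "description": "Path to the file"}},
        "required": ["path"],
    },
}
```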
- Check `finishReason` in the response to detect failed tool call attempts
- When sending conversation history back, keep `thought_signature` fields intact

Parallel function calling: Gemini can call multiple independent functions in a single turn. Design your tool set to enable this: independent operations (read file A, read file B) should be separate tools, not one combined tool.
| Task type | Temperature |
|---|---|
| Code generation / editing | 0.0–0.2 |
| Code review / analysis | 0.2–0.4 |
| Technical documentation | 0.4–0.7 |
| Default / general purpose | 1.0 |
| Creative / exploratory design | 1.5–2.0 |
Gemini 3 exception: Google strongly recommends keeping temperature at the default 1.0 for Gemini 3 models. Setting it below 1.0 may cause looping or degraded performance, especially with function calling.
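The table and the Gemini 3 exception can be encoded in one small helper so callers never pick temperatures ad hoc. A sketch (the task-type keys are illustrative):

```python
TEMPERATURE_BY_TASK = {
    "code_generation": 0.1,  # 0.0–0.2 band
    "code_review": 0.3,      # 0.2–0.4 band
    "documentation": 0.5,    # 0.4–0.7 band
    "general": 1.0,
    "creative": 1.7,         # 1.5–2.0 band
}

def pick_temperature(task_type: str, model: str = "gemini-2.5-flash") -> float:
    """Pick a temperature per the table; Gemini 3 models stay at the default 1.0."""
    if model.startswith("gemini-3"):
        return 1.0
    return TEMPERATURE_BY_TASK.get(task_type, 1.0)
```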
Symptom: Gemini rewrites things you didn't ask it to change, adds unsolicited comments, "improves" unrelated code.
Fix: Be hyper-explicit about scope. State what to do AND what not to do.
Symptom: One objective is addressed well; others are shallow or ignored.
Fix: One prompt, one task. Chain prompts sequentially.
Symptom: Asked for JSON, got prose. Asked for a list, got paragraphs.
Fix: Specify format explicitly in the prompt. For machine-readable output, use the API's response_schema.
Symptom: Slow, expensive responses; model focuses on irrelevant files; hits context degradation.
Fix: Filter to relevant files before passing context. Exclude data files, lock files, assets, test directories if not relevant.
Symptom: Instructions from early in a long session are forgotten; fixed bugs reappear.
Fix: Start fresh sessions. Use checkpoint messages. For long tasks, pass only what's needed per turn.
Symptom: "Respond in JSON" works until it doesn't — random markdown fences, extra text, malformed output.
Fix: Use response_mime_type="application/json" + response_schema via the API.
Symptom: API returns errors on complex schemas.
Cause: Very long property names, large enums, deeply nested objects, many optional fields.
Fix: Simplify schema. Split into multiple smaller schemas if needed.
Symptom: Latency and cost spike without meaningful quality improvement on simple tasks.
Fix: Use thinking_budget=0 for simple queries, -1 (dynamic) for general use, high values only for genuinely complex tasks.
Symptom: Unpredictable behavior, partial compliance.
Fix: Review the prompt for contradictions before sending. Place critical instructions at the beginning (system prompt) and reinforce at the end for long prompts.
Symptom: Gemini gives you wrong code and you don't know why.
Fix: Ask for reasoning before implementation: "Before writing code, explain your approach and what assumptions you're making." Review the thought process in AI Studio when unexpected results occur.
| Dimension | Gemini 2.5 | Claude |
|---|---|---|
| Context volume | Handles massive contexts (1M tokens); put query at the end | Also strong at long context; query placement less critical |
| Constraint following | Requires explicit "DO NOT" statements; tends to over-help | Follows complex constraint lists more reliably |
| System prompt depth | Benefits from role + scope + format specified upfront | Handles 2000-word system prompts with 15+ constraints reliably |
| JSON output | Native schema support preferred over prompt instructions | Both work; Claude more reliable from prompt alone |
| Temperature for function calling | 0.0–0.2 for 2.5; 1.0 for Gemini 3 | 0.0–0.3 generally safe |
| Thinking/reasoning | Explicit budget control (0 to 32K tokens) | Extended thinking with budget_tokens parameter |
| Codebase tasks | Excellent with full repo in context, but filter aggressively | Strong at cross-file reasoning; explicit about which files to read |
| Unsolicited changes | Common antipattern — needs explicit scope constraints | Less prone to modifying unrequested code |
Where Gemini excels: massive codebase analysis, multimodal (code + screenshots), research-style tasks requiring broad synthesis.
Where Claude excels: following long complex constraint lists, consistent behavior across long conversations, precise surgical edits.
Review this code for correctness, security, and performance issues.
Severity levels: CRITICAL | WARNING | SUGGESTION.
Do not comment on style or formatting.
[code]
Fix the bug described below.
DO NOT change anything outside the broken function.
DO NOT refactor or clean up unrelated code.
Bug: [description]
Error: [error message / stack trace]
[relevant code]
You are a [language] engineer following [project conventions].
Implement [feature] in [file/module].
Requirements:
- [requirement 1]
- [requirement 2]
Constraints:
- DO NOT modify the existing public API
- Follow the patterns used in [reference file]
- Add error handling consistent with the existing code
Return only the changed file(s), not a full explanation.
Analyze the architecture of this codebase. Identify:
1. Tight coupling between components
2. Missing abstractions
3. Violation of single responsibility
4. Any patterns that will make the system hard to test or extend
Do not suggest new features. Focus on structural issues only.
[codebase]