From deep-research
This skill should be used when evaluating source credibility, deciding which search results to trust, choosing between search providers, detecting SEO spam or content farms, selecting domain-specific sources (academic, medical, legal, technical), evaluating software packages or libraries, comparing tools or technologies, assessing GitHub repo health, checking adoption metrics, or when research quality depends on retrieval quality. Covers the source credibility taxonomy (T1-T6 tiers), CRAAP framework adaptation, multi-provider search strategy, artifact evaluation framework (health/adoption/authority signals for packages, repos, APIs, standards, technologies), and source quality anti-patterns.
npx claudepluginhub oborchers/fractional-cto --plugin deep-research

This skill uses the workspace's default tool permissions.
Source quality is the primary bottleneck in research agent pipelines. Research on deep research agent trajectories found that over 57% of source errors occur in early retrieval stages, where initial fabrication acts as the primary catalyst for cascading downstream errors (arXiv 2601.22984). A single bad source in the first retrieval round contaminates the entire research trajectory.
Every source encountered during research falls into one of six tiers. Always prefer higher-tier sources and cite the tier when reporting findings.
| Tier | Source Type | Examples | Trust Level |
|---|---|---|---|
| T1 — Primary | Peer-reviewed journals, official specs, primary datasets | Nature, Science, IEEE, IETF RFCs, W3C specs | Highest |
| T2 — Institutional | Government agencies, established research institutions | NIH, WHO, NIST, ACM Digital Library | High |
| T3 — Expert | Named expert blogs, conference proceedings, major tech engineering blogs | Anthropic blog, Google Research, NeurIPS/ICML papers | Moderate-High |
| T4 — Quality Editorial | Major publications with editorial review | MIT Technology Review, Ars Technica, The Verge | Moderate |
| T5 — Community | Well-moderated forums, high-reputation answers | Stack Overflow (high-score), GitHub discussions | Low-Moderate |
| T6 — Unverified | Content farms, SEO-optimized articles, anonymous posts, AI-generated content | Medium listicles, affiliate blogs, uncredited tutorials | Do not cite |
Rule: Never cite T6 sources. Prefer T1-T3 for factual claims. Use T4-T5 for context and community consensus only.
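The tier rule above can be sketched as a simple lookup that filters sources before citation. The tier labels follow the table; the function and dictionary names are illustrative, not part of any API:

```python
# Citation policy by credibility tier (T1-T6), per the tier table.
TIER_POLICY = {
    "T1": "cite",          # primary: journals, official specs
    "T2": "cite",          # institutional: NIH, NIST, ...
    "T3": "cite",          # named experts, conference papers
    "T4": "context-only",  # quality editorial
    "T5": "context-only",  # community consensus
    "T6": "discard",       # content farms, SEO spam: never cite
}

def citation_policy(tier: str) -> str:
    """Return how a source of the given tier may be used."""
    # Unknown or malformed tiers are treated like T6: do not cite.
    return TIER_POLICY.get(tier.upper(), "discard")
```

For example, `citation_policy("t5")` returns `"context-only"`, matching the rule that T4-T5 sources support context, not factual claims.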
Five dimensions for evaluating sources, adapted from the CRAAP framework (CSU Chico):
| Dimension | What to Check | Red Flags |
|---|---|---|
| Currency | Publication date, last-modified headers | No date visible, information predates major changes in the field |
| Relevance | Does it address the specific research question? | Tangential coverage, keyword-stuffed but shallow |
| Authority | Who published it? Credentials? | Anonymous author, no institutional affiliation, no citations |
| Accuracy | Are claims sourced? Can they be verified? | No inline citations, contradicts known facts, round numbers without source |
| Purpose | Is it informing, selling, or persuading? | High ad density, affiliate links, promotional language |
Note: CRAAP evaluates surface features. Use it as an initial filter, not the sole credibility signal (Stanford research found reliance on CRAAP alone makes researchers susceptible to misinformation).
Different search providers excel in different domains. Route queries to the appropriate provider:
| Provider | Best For | Limitations |
|---|---|---|
| WebSearch (general) | Broad topics, recent events, technical documentation | May surface SEO-optimized content |
| arXiv / Semantic Scholar | Academic ML/AI research, preprints | Not peer-reviewed, may be superseded |
| PubMed | Medical, biomedical, clinical research | Limited to biomedical domain |
| Official documentation | API specs, library usage, framework guides | May lag behind actual behavior |
| GitHub | Code examples, implementation patterns, issue discussions | Quality varies widely |
Strategy: Start with domain-appropriate providers. Use general web search to fill gaps. Cross-reference findings across multiple providers when possible.
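The routing strategy above can be sketched as a keyword dispatcher that tries domain-specific providers first and falls back to general web search. The keyword lists and provider names are illustrative; substring matching is a toy heuristic, not a production router:

```python
# Route a research query to a domain-appropriate provider first,
# falling back to general web search. Provider names follow the table.
ROUTES = [
    (("arxiv", "preprint", "neural", "transformer"), "arXiv / Semantic Scholar"),
    (("clinical", "medical", "biomedical", "drug"), "PubMed"),
    (("api", "library", "framework", "sdk"), "Official documentation"),
    (("implementation", "code example", "bug", "issue"), "GitHub"),
]

def pick_provider(query: str) -> str:
    """Return the first matching domain provider, else general search."""
    q = query.lower()
    for keywords, provider in ROUTES:
        if any(k in q for k in keywords):
            return provider
    return "WebSearch (general)"
```

Cross-referencing still applies: a real pipeline would query the routed provider first, then corroborate with a second provider before trusting the finding.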
Red flags that indicate low-quality, SEO-optimized content:
When a source triggers 2+ red flags, discard it and search for a higher-quality alternative.
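A minimal sketch of the 2+ red-flag discard rule, using red flags drawn from the CRAAP table above as an illustrative flag set. The source-dict fields and thresholds are assumptions; real checks would inspect fetched page content and headers:

```python
# Discard a source once it accumulates 2+ red flags.
# Flags here come from the CRAAP red-flags column; the dict keys
# (date, author, citations, ad_density) are hypothetical fields.
RED_FLAGS = [
    ("no visible publication date", lambda s: s.get("date") is None),
    ("anonymous author",            lambda s: not s.get("author")),
    ("no inline citations",         lambda s: s.get("citations", 0) == 0),
    ("high ad / affiliate density", lambda s: s.get("ad_density", 0.0) > 0.3),
]

def should_discard(source: dict) -> bool:
    """True when the source triggers two or more red flags."""
    hits = [name for name, check in RED_FLAGS if check(source)]
    return len(hits) >= 2
```

A source with no date and no named author trips two flags and is discarded; a dated, authored, cited page with low ad density passes.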
Research often involves evaluating non-content artifacts — packages, tools, technologies, standards, organizations. These require different signals than content sources. Every artifact has three signal dimensions:
| Dimension | What It Measures | Key Question |
|---|---|---|
| Health | Is it alive and maintained? | When was the last meaningful activity? |
| Adoption | Does anyone actually use it? | What are the real usage numbers? |
| Authority | Who's behind it and are they credible? | Is this backed by a credible entity? |
| Artifact Type | Health Signals | Adoption Signals | Authority Signals |
|---|---|---|---|
| Software packages | Last commit, release frequency, open issue response time | Downloads (npm weekly, PyPI monthly), dependents count | Maintainer reputation, organizational backing, license |
| GitHub repos | Commit frequency, PR merge time, stale issue ratio | Stars, forks, contributor count | Bus factor (>1 critical), corporate sponsor, notable users |
| APIs/Services | Uptime history, changelog frequency, deprecation notices | Customer logos, integration count, community size | Company funding, revenue stability, enterprise adoption |
| Standards/Specs | Last revision date, errata activity | Implementation count, conformance test suites | Standards body status (draft/proposed/standard), industry backing |
| Technologies | Release cadence, roadmap activity, CVE response time | Stack Overflow survey ranking, job postings, TIOBE/RedMonk index | Backing organization, governance model, ecosystem size |
| Architectural patterns | Recent case studies, active community discussion | Industry adoption breadth, conference talk frequency | Documented at-scale deployments, known failure case studies |
| People/Authors | Recent publication activity | Citation count, h-index, follower count | Institutional affiliation, industry role, peer recognition |
| Companies/Orgs | Recent funding, hiring activity, product releases | Revenue, customer count, market share | Investor quality, leadership track record, industry awards |
| Communities | Messages per week, new member rate | Member count, active member ratio | Moderation quality, notable members, signal-to-noise ratio |
| Datasets/Benchmarks | Last update, known issues addressed | Citation count, leaderboard participation | Creator credentials, methodology transparency, peer review |
| Claims/Statistics | Date of study, methodology recency | Citation count, replication status | Funding source, sample size, peer review, original source |
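For a software package, the three signal dimensions can be sketched as a small scorer. The thresholds (180 days, 1,000 weekly downloads) are illustrative assumptions, not canonical cutoffs:

```python
# Score a software package on health / adoption / authority.
# Thresholds are illustrative; tune them per ecosystem.
def evaluate_package(last_commit_days: int,
                     weekly_downloads: int,
                     has_org_backing: bool) -> dict:
    return {
        "health":    last_commit_days <= 180,    # active within ~6 months
        "adoption":  weekly_downloads >= 1_000,  # non-trivial real usage
        "authority": has_org_backing,            # org or known maintainer
    }

def red_flag_count(scores: dict) -> int:
    """Each failed dimension counts as one red flag."""
    return sum(1 for ok in scores.values() if not ok)
```

A package last touched 400 days ago, with 200 weekly downloads and no organizational backing, fails all three dimensions and crosses the 2+ red-flag threshold.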
When evaluating artifacts, use APIs for exact stats instead of search snippets:
| Ecosystem | API Command | Returns |
|---|---|---|
| GitHub | gh api repos/{owner}/{name} | Stars, forks, license, language, last update, open issues |
| GitHub releases | gh api repos/{owner}/{name}/releases/latest | Latest version tag, release date |
| npm | curl api.npmjs.org/downloads/point/last-week/{pkg} | Exact weekly downloads |
| PyPI | curl pypistats.org/api/packages/{pkg}/recent | Recent download counts |
| crates.io | curl crates.io/api/v1/crates/{crate} | Downloads, version, recent downloads |
| RubyGems | curl rubygems.org/api/v1/gems/{gem}.json | Downloads, latest version |
| Maven | Search site:mvnrepository.com {artifact} | Usage stats page |
These APIs return ground truth. Search snippets for these stats are unreliable.
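The endpoints in the table can be assembled programmatically before fetching. This sketch only builds the URLs (the shapes follow the table); the actual fetch, which needs network access, is left to curl or an HTTP client:

```python
# Build the registry API URL for a package's download stats.
# Endpoint shapes follow the ecosystem table above.
ENDPOINTS = {
    "npm":      "https://api.npmjs.org/downloads/point/last-week/{pkg}",
    "pypi":     "https://pypistats.org/api/packages/{pkg}/recent",
    "crates":   "https://crates.io/api/v1/crates/{pkg}",
    "rubygems": "https://rubygems.org/api/v1/gems/{pkg}.json",
}

def stats_url(ecosystem: str, pkg: str) -> str:
    """Return the stats endpoint for a package in the given ecosystem."""
    return ENDPOINTS[ecosystem].format(pkg=pkg)
```

All four endpoints return JSON, so `curl "$(stats_url ...)"` piped through a JSON parser yields the exact counts rather than a stale search-snippet approximation.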
Software packages and repos:
Technologies and standards:
Claims and statistics:
General rule: When an artifact triggers 2+ red flags, flag it explicitly in the research output. Do not recommend it without noting the risks.
For detailed per-artifact-type evaluation guides and how to check each signal programmatically, consult references/artifact-signals.md.
| Anti-Pattern | Problem | Fix |
|---|---|---|
| Single-provider dependency | All searches go through one provider | Route by domain; use multiple providers |
| First-result trust | Accepting the top search result without evaluation | Evaluate credibility tier before incorporating |
| Equal credibility | Treating a blog post the same as a journal paper | Apply tier system; weight higher-tier sources |
| Ignoring retrieval failures | Silent fallback when search returns nothing useful | Log the gap; try alternative queries or providers |
| Breadth without depth | Fetching 20 URLs but reading none carefully | Fetch fewer sources; read each thoroughly |
For detailed provider comparison, domain-specific source guides, and artifact evaluation:
- references/provider-comparison.md — Detailed comparison of search providers with API specifics, rate limits, and optimal use cases
- references/artifact-signals.md — Per-artifact-type evaluation guides with health/adoption/authority thresholds, how to check each signal, and the quick evaluation checklist