From engineering
Audit third-party software (GitHub repos, tarballs, compiled binaries, npm packages, Claude Code plugins) for safety BEFORE install. Checks for telemetry, data exfiltration, prompt injection, supply-chain attacks, closed-source phone-home components, hardcoded credentials, user-hostile defaults, and unauthenticated local services. Produces a structured verdict (SAFE/CAUTION/UNSAFE) with file:line citations, a plain-English description of what the software actually does based on code rather than marketing, and a concrete install recommendation. Use this skill whenever the user wants to clone/install/try/run/use ANY third-party code, package, binary, or tool from the internet — even casually. Trigger on phrases like "is X safe to install", "clone and review", "audit this repo", "check this before I use it", "no telemetry check", "should I install X", "what does this do", "review this tool", or whenever the user shares a GitHub URL, npm package name, tarball, or plugin/skill reference with apparent install intent. Also trigger proactively whenever the user is about to run an install command (`pnpm add`, `npm install`, `brew install`, `curl | sh`, `git clone` followed by build/run) against an unfamiliar source. Skip only when the user explicitly says they've already audited it or it's first-party code they wrote.
How this skill is triggered — by the user, by Claude, or both
Slash command
/engineering:audit-third-party-softwareThe summary Claude sees in its skill listing — used to decide when to auto-load this skill
Before the user installs, clones-and-runs, or otherwise executes unfamiliar third-party code from the internet. The user's threshold for concern is almost always stricter than marketing copy suggests, so the job here is to verify claims against the actual code — not to provide reassurance.
Before the user installs, clones-and-runs, or otherwise executes unfamiliar third-party code from the internet. The user's threshold for concern is almost always stricter than marketing copy suggests, so the job here is to verify claims against the actual code — not to provide reassurance.
eval over network-fetched strings, dotfile reads, postinstall-time downloads, anti-sandbox checks. Block.Clone or extract to an isolated location (default: /tmp/audit-<name>/ or the user's working dir if they specified).
git clone <repo-url> /path/to/dest
cd /path/to/dest
git log --oneline | head -10 # activity signal
git remote -v # flag any credentials embedded in the URL
Get the shape before reading:
find . -type f -not -path './.git/*' | wc -lpackage.json, Cargo.toml, pyproject.toml, *.csproj, Dockerfile, binary artifacts.husky/, scripts/, .github/workflows/, bin/, setup.sh, pre-commit-config.yamlclaude-kanban for a worked example of a small template project.claude_agent_teams_ui and simonc602-agentic-os for worked examples.The reason for dispatching is context budget — a 500-file app will burn your main thread. The subagent produces a concise report you synthesize.
Regardless of direct-read or subagent, cover these domains:
Telemetry & data exfiltration:
grep -r -iE "posthog|mixpanel|amplitude|segment\.io|sentry\.init|datadog|rollbar|launchdarkly|statsig" — the active SDKs. Distinguish imported and initialized from documented as example.Prompt injection (for AI-adjacent tools):
<info_for_agent>, <system-override>, invisible unicode)..md files that steer behavior beyond the advertised function.Supply chain:
package.json/pnpm-lock.yaml/bun.lock/Cargo.toml — scan for typo-squats, recently-published packages, preinstall/postinstall scripts.onlyBuiltDependencies pinning (pnpm) is a positive signal..husky/, scripts/install.sh, CI workflows executing code.curl | sh in install paths (acceptable for trusted sources like astral.sh, bun.sh, rustup; flag unfamiliar ones).Closed-source phone-home components:
*.lock.json or manifest files pointing to a sourceRepository that returns 404 (private/deleted).references/binary-analysis.md for deep-inspection steps when a downloadable binary is part of the picture.Privilege, permissions, and unsafe local services:
spawn(..., shell: true) or exec(...) with user-derivable input.--dangerously-skip-permissions or equivalent YOLO flags as defaults.run-hidden-command.ps1 may just be a legitimate Windows UI helper).Credentials & secrets:
.git/config with embedded tokens (flag to user regardless — could be their own leaked credential, a distribution mechanism, or someone else's leaked token)..env.example with clearly-named endpoints vs mystery endpoints.README vs reality:
See references/binary-analysis.md for the full recipe. Abbreviated:
shasum -a 256 <binary> # pin it
file <binary> # type
codesign -dv <binary> 2>&1 # signing (macOS)
otool -L <binary> 2>&1 # linked libs (macOS)
strings -a <binary> > /tmp/strings.txt
python3 scripts/extract_strings_urls.py /tmp/strings.txt # URL + domain classification
Then context-grep suspicious domains to distinguish hardcoded endpoints from upstream-dep artifacts (e.g. plus-innovations.com is the systeminformation npm author — benign; api.voicetext.site as __Y="..." is a hardcoded backend — finding).
When you find unfamiliar hostnames, classify them before reporting. See references/domain-triage.md for a checklist of common false-alarm patterns and clear red flags.
Structure defined below. Keep it scannable — bullet density per finding should be high enough that a reader skimming for 30 seconds gets the verdict + top 3 concerns.
Also write the report to AUDIT_FINDINGS.md at the repo root (or the audit directory root) so the user has a persistent artifact.
Don't end on findings alone. Based on the verdict, offer specific follow-ups:
references/templates.mdMatch the user's stated threshold. If they said "no warnings", CAUTION means do not install; don't auto-recommend proceeding with mitigations unless they ask.
Use this exact shape:
## Verdict: SAFE | CAUTION | UNSAFE
One-sentence reason.
## Critical findings (blocking)
Only findings that affect the verdict. Each with file:line citation.
- **Finding name.** Evidence (`path/to/file.ts:123`). One-line impact.
## Notable but non-blocking
Things the user should know but that don't change the verdict.
## What it actually does
One paragraph, code-grounded. If this contradicts the README, note that explicitly.
## Install recommendation
Match the user's threshold. Concrete commands, patch locations, or "do not install".
When dispatching to a subagent for a large repo, use this scaffold (fill in specifics):
Conduct a security/safety audit of the repo at <ABSOLUTE_PATH>. The user wants to <INSTALL|USE|RUN> this on their <OS> and cares about:
1. Telemetry / data exfiltration — fetch/axios calls to unexpected domains, analytics SDKs actively initialized (not just referenced), env-var uploads.
2. Prompt injection — hidden instructions injected into model prompts, steering via skill/agent .md files, hardcoded system prompts that misrepresent the tool.
3. Supply chain — package.json / lockfile dependencies, typo-squats, postinstall scripts, GitHub Actions workflows, curl|sh install paths.
4. Closed-source phone-home — lock files referencing private source repos, binaries downloaded at launch, remote killswitches or capability servers, installation-unique tracking IDs.
5. Privilege & local services — child processes with shell=true or user input, unauthenticated local HTTP/IPC servers, dangerous flags as defaults.
6. Credential handling — .env examples, API key storage, hardcoded tokens.
Context: <relevant signals from your scoping step — lockfiles present, frameworks, suspicious file names, etc.>
Start with: README/CLAUDE.md/AGENTS.md, all package.json files, install/postinstall scripts, .github/workflows/, then framework-specific (API routes for Next.js, main-process files for Electron, etc.). Grep for the patterns above.
Specific files to scrutinize: <list any that stood out during scoping>
Report format (<=800 words):
- Verdict: SAFE / CAUTION / UNSAFE
- Critical findings with file:line
- Notable but non-blocking
- What the software actually does (1 paragraph, code-grounded)
- Install recommendation with exact commands
Cite files and line numbers. Don't trust marketing copy. Read the code.
Three audits from the same working session that calibrated this skill:
claude_agent_teams_ui → CAUTION. Static analysis of a Bun-compiled binary found hardcoded api.voicetext.site backend with installation-unique clientId, guest-session OAuth flow, and server-controlled killswitches. Marketing claimed "runs entirely locally" — false.claude-kanban → SAFE. 26-file template of Claude Code hooks and agent personas, zero network calls, readable in 15 minutes.simonc602-agentic-os → CAUTION. Next.js app with --dangerously-skip-permissions as default for every Claude spawn, plus unauthenticated local terminal endpoint exposing bash over HTTP. No phone-home. Distribution PAT embedded in remote URL (separate issue worth raising with user).Each had a different shape (binary, text-only template, app with API routes); the core workflow — scope, dispatch-or-direct-read, cross-check claims, classify domains, cite specifics, produce tiered report — worked for all three.
npx claudepluginhub kzarzycki/agent-skills --plugin engineeringProvides behavioral guidelines to reduce common LLM coding mistakes, focusing on simplicity, surgical changes, assumption surfacing, and verifiable success criteria.
Searches, retrieves, and installs Agent Skills from prompts.chat registry using MCP tools like search_skills and get_skill. Activates for finding skills, browsing catalogs, or extending Claude.