By anyshift-io
Investigate Kubernetes incidents and audit AWS security configurations including IAM privilege escalation paths, S3 bucket exposure, security group lateral movement, and SQS misconfigurations, enabling AI agents to perform vendor-neutral SRE incident response and reliability audits without live credentials.
Audit the union of every IAM policy attached to one principal for privilege-escalation paths that no single statement reveals, and for apparent escalations that are already neutralised. Resolves the effective permission set across all attached policies (Allow minus blanket Deny), then checks the cross-statement escalation combos (iam:PassRole + a compute-launch action, policy-rewrite-in-place, function-code hijack, self-attach admin, trust-policy rewrite + assume, credential minting for another identity), the wildcard grants (Action '*' on Resource '*', service-level wildcards, Allow+NotAction), and the trust-policy exposure. Its discipline is symmetric: it does NOT flag a PassRole combo killed by an explicit Deny, an Action '*' pinned to one bucket, an sts:AssumeRole whose target does not trust back, a mutation kit capped by a permissions-boundary Deny, or a cross-account assume sealed by an unsatisfiable Condition. Reports findings with severity and a fix, then names what a single principal's policies cannot answer (the privileges of a passed/assumed role, the permissions boundary, the org SCPs). Use when asked to audit an IAM policy, role, or user for escalation, over-broad grants, or "can this principal become admin." Vendor-neutral; runs offline against the policy JSON with no Anyshift account.
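The "Allow minus blanket Deny" resolution and the PassRole + compute-launch combo described above can be sketched roughly as follows. This is a minimal illustration, not the skill's actual implementation: the `COMPUTE_LAUNCH` catalogue and the function names are hypothetical, `NotAction` statements are skipped, and a Deny counts as "blanket" only when it targets Resource `*` with no Condition.

```python
from fnmatch import fnmatchcase

# Hypothetical, deliberately tiny catalogue of compute-launch actions.
COMPUTE_LAUNCH = {"ec2:RunInstances", "lambda:CreateFunction", "ecs:RunTask"}


def _as_list(value):
    """IAM JSON allows a single item or a list; normalise to a list."""
    return value if isinstance(value, list) else [value]


def _matches(patterns, action):
    # IAM action names are case-insensitive and '*' behaves like a glob.
    return any(fnmatchcase(action.lower(), p.lower()) for p in patterns)


def effective_allows(policies, action):
    """True if some Allow covers `action` and no blanket Deny removes it."""
    allowed = denied = False
    for policy in policies:
        for stmt in _as_list(policy["Statement"]):
            if "Action" not in stmt:  # NotAction is out of scope for this sketch
                continue
            if not _matches(_as_list(stmt["Action"]), action):
                continue
            if stmt["Effect"] == "Allow":
                allowed = True
            elif (
                stmt["Effect"] == "Deny"
                and "*" in _as_list(stmt.get("Resource", []))
                and "Condition" not in stmt
            ):
                denied = True
    return allowed and not denied


def passrole_escalation(policies):
    """Compute-launch actions that combine with an effective iam:PassRole."""
    if not effective_allows(policies, "iam:PassRole"):
        return []
    return sorted(a for a in COMPUTE_LAUNCH if effective_allows(policies, a))
```

The two halves of the combo can sit in different policies, which is why the check runs over the union instead of one statement at a time. Adding a policy with an unconditional `Deny` on `iam:*` makes `passrole_escalation` return an empty list, matching the symmetric "do not flag" rule.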
Investigate a live or recent incident in a Kubernetes cluster. Anchor the window, bisect the change surface (rollouts, ConfigMaps/Secrets, RBAC, HPA/cluster changes, CronJobs), classify against four reference failure paths (OOM, DNS, cascading-failure, deploy-correlator), confirm the hypothesis with three independent signals, quantify blast radius, and propose mitigation before root cause. Use whenever an agent is asked "what is breaking in the cluster right now", "why did this pod/Deployment just page", "did the rollout cause Z", or to triage an active Kubernetes incident. Vendor-neutral by default (works with kubectl, kube-state-metrics, and whatever telemetry you have); an opt-in Anyshift integration is documented separately.
Audit an estate of AWS S3 buckets for the one bucket that is genuinely publicly or cross-account exposed, without over-flagging the many buckets that READ as exposed but are neutralised. Resolves each bucket's EFFECTIVE verdict by composing four layers (Block Public Access x bucket policy x bucket ACL x access points), never one layer alone, then rolls the per-bucket verdicts up into an estate verdict. Its discipline is symmetric: BPA (RestrictPublicBuckets / BlockPublicPolicy) neutralises a Principal '*' policy but NOT a cross-account grant; IgnorePublicAcls kills a public-group ACL grant but NOT a cross-account canonical-user grant; a narrowing Condition (org id, ExternalId, SourceIp, access-point delegation) scopes a Principal '*' so it is not public. On a needle estate it names the ONE live bucket as the primary finding; on a clean estate it reports NO live exposure and does not manufacture findings. Then it states what the bucket configs alone cannot answer (per-object ACLs, CloudFront/CDN fronting, the trusted principals' identity policies, account-level BPA dependency, data sensitivity). Use when asked to review an S3 bucket fleet for public exposure, cross-account access, or whether the estate is clean. Vendor-neutral; runs offline against describe-bucket / get-bucket-policy / get-bucket-acl / list-access-points JSON with no Anyshift account.
Audit a fleet of AWS security groups for the multi-hop lateral-movement path that no single ingress rule reveals. Builds a directed reachability graph from the SG-to-SG references (an ingress rule on SG B naming SG A means a host in A can reach B), adds an internet edge for every 0.0.0.0/0 rule, then composes those edges into the transitive closure from a named entry point (the internet, or a compromised host). Reports the shortest reachable path to the crown-jewel tier, the blast radius, and any pivot/hub SG that bridges otherwise-isolated regions, each ranked by severity with a fix. Its discipline is symmetric: on a segmented or orphaned fleet where the chain does NOT reach the crown jewel, it reports clean and names the boundary instead of fabricating a path. Then it states what the SG graph alone cannot answer (live host membership, route tables, NACLs, app-layer auth). Use when asked to review a security-group fleet for lateral movement, blast radius, or whether the internet can reach a sensitive tier. Vendor-neutral; runs offline against describe-security-groups + describe-instances JSON with no Anyshift account.
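A rough sketch of the graph construction and shortest-path search described above, assuming input shaped like the `SecurityGroups` list from describe-security-groups. The `INTERNET` node label and both function names are hypothetical, and only IPv4 `0.0.0.0/0` rules are treated as internet edges here.

```python
from collections import deque

INTERNET = "internet"  # hypothetical label for the internet entry point


def build_graph(security_groups):
    """Edge A -> B when an ingress rule on B names A (a host in A can reach B)."""
    graph = {}
    for sg in security_groups:
        target = sg["GroupId"]
        graph.setdefault(target, set())
        for perm in sg.get("IpPermissions", []):
            for pair in perm.get("UserIdGroupPairs", []):
                graph.setdefault(pair["GroupId"], set()).add(target)
            for ip_range in perm.get("IpRanges", []):
                if ip_range.get("CidrIp") == "0.0.0.0/0":
                    graph.setdefault(INTERNET, set()).add(target)
    return graph


def shortest_path(graph, start, goal):
    """Breadth-first search; returns the hop list, or None if unreachable."""
    parents = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for nxt in sorted(graph.get(node, ())):
            if nxt not in parents:
                parents[nxt] = node
                queue.append(nxt)
    return None
```

Returning `None` for an unreachable goal is what lets a caller report a segmented fleet as clean instead of inventing a path. The set of keys in `parents` after a full traversal would give the blast radius from the entry point.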
Audit a single AWS SQS queue's configuration for the misconfigurations that silently drop or re-deliver messages while every attribute reads as fine. Parses the GetQueueAttributes output (and the referenced dead-letter queue), checks the redrive path (DLQ present, maxReceiveCount band, DLQ-vs-source retention ordering), the message lifecycle (poison messages aging out before they reach the DLQ, default visibility timeout, short retention), and exposure (open resource policy, encryption at rest, FIFO dedup contract). Reports findings with severity and a recommendation, then names the boundary: the questions a single queue's config cannot answer (consumer processing time, live behaviour, the IAM union, the producers and consumers on either side). Use when asked to review, harden, or sanity-check an SQS queue, or to explain why messages are going missing. Vendor-neutral; runs offline against the queue attributes with no Anyshift account.
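Three of the checks above (DLQ present, DLQ-vs-source retention ordering, default visibility timeout) might look like this over a GetQueueAttributes-style attribute map. This is a sketch under assumptions: the finding labels and the `audit_queue` name are hypothetical, and the maxReceiveCount band is omitted because the source does not state its thresholds.

```python
import json

DEFAULT_VISIBILITY_TIMEOUT = 30  # seconds; the SQS default


def audit_queue(attrs, dlq_attrs=None):
    """Return a list of finding labels for one queue's attribute map.

    `attrs` and `dlq_attrs` are the string-valued attribute dicts of the
    source queue and its dead-letter queue (if one was fetched).
    """
    findings = []

    # RedrivePolicy is itself a JSON-encoded string inside the attributes.
    redrive = json.loads(attrs.get("RedrivePolicy", "{}"))
    if "deadLetterTargetArn" not in redrive:
        findings.append("no-dlq")
    elif dlq_attrs is not None:
        source_retention = int(attrs["MessageRetentionPeriod"])
        dlq_retention = int(dlq_attrs["MessageRetentionPeriod"])
        # A DLQ that retains no longer than its source can expire
        # messages soon after they arrive.
        if dlq_retention <= source_retention:
            findings.append("dlq-retention-not-longer-than-source")

    timeout = int(attrs.get("VisibilityTimeout", DEFAULT_VISIBILITY_TIMEOUT))
    if timeout == DEFAULT_VISIBILITY_TIMEOUT:
        findings.append("visibility-timeout-at-default")

    return findings
```

Whether a 30-second visibility timeout is actually too short depends on consumer processing time, which is one of the questions the queue config alone cannot answer.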
SRE methodology skills for AI agents. Each skill packages one reliability workflow (investigating a live incident, handing over oncall, writing a postmortem) as a self-contained module your agent loads and runs.
Built and maintained by Anyshift.
Your agent already writes code and runs commands. It does not know how a seasoned SRE actually works an incident: which signals to correlate first, when a deploy is the prime suspect, when to stop digging and page a human. These skills encode that methodology so the agent follows a real playbook instead of improvising.
Every skill runs end-to-end with no Anyshift account and no external credentials. The methodology, the worked examples, and the replay tests all work offline against fixtures.
Each skill targets one real product and one job over it: audit an IAM policy, triage a Terraform plan, resolve an S3 bucket's effective access. It does that job end-to-end, offline, against fixtures. Not a wrapper that dumps the API response back: each one carries the judgment a senior engineer applies to that one source, the thresholds and known-bad combinations that separate signal from a clean-looking config.
Then it stops. A single source only knows itself. The moment a question needs a join (this role to everything it can actually reach, this queue to its producers and consumers, this plan to the running infrastructure it will move) the data runs out. Each skill names exactly where that happens and what's missing, so the boundary is explicit instead of a silent wrong answer. That boundary is the same one every time: the join across resources, across sources, or across time.
| Skill | Domain | What it does |
|---|---|---|
| sqs-queue-auditor | AWS | Audits redrive/DLQ wiring, maxReceiveCount, retention ordering against the DLQ, and a visibility timeout left at the risky default: the queue-side config that silently drops or re-delivers messages while every attribute reads as fine. |
| iam-deceptive-escalation-auditor | AWS | Resolves the effective permission set across every policy on a principal (Allow minus blanket Deny), flags the cross-statement escalation combos (PassRole+compute-launch, policy-rewrite-in-place, trust-policy rewrite) that no single statement reveals, and stays symmetric: it does not flag an escalation already killed by a Deny, a scoped wildcard, or a sealed Condition. |
| sg-deceptive-reachability-auditor | AWS | Builds a directed reachability graph from SG-to-SG references plus internet edges, composes the transitive closure from a named entry point, and reports the shortest path to the crown-jewel tier and the bridging hub SGs that a per-rule read misses, reporting clean when a segmented fleet has no reachable path. |
| s3-estate-calibration-auditor | AWS | Resolves each bucket's effective verdict by composing all four layers (Block Public Access, bucket policy, ACL, access points), then calibrates across an estate: it names the one bucket that is genuinely public or cross-account exposed without over-flagging the many siblings that read as exposed but are neutralised, and reports clean when nothing is live. |
| terraform-plan-risk-reporter | IaC | Ranks plan changes by blast risk, isolating destroys and force-replacements of stateful or irreplaceable resources from the harmless in-place updates they hide among. |
| github-actions-flake-reporter | CI/CD | Detects flaky jobs (pass-on-rerun on an unchanged SHA), clusters failures by cause, and flags duration regressions across run history, not just the last red run. |
sqs-queue-auditor, iam-deceptive-escalation-auditor, sg-deceptive-reachability-auditor, and s3-estate-calibration-auditor are built out, each with fixture-based replay tests and a committed control-vs-treatment lift eval; the rest are planned. kubectl-investigator stays as the methodology-shaped reference template: it shows the directory shape, the worked-example format, and the fixture-based replay tests every skill above follows.
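The "pass-on-rerun on an unchanged SHA" signal named in the table for the planned github-actions-flake-reporter can be illustrated with a short sketch. The record shape (`job`, `sha`, `attempt`, `conclusion`) and the `flaky_jobs` name are assumptions made for this example, not the skill's fixture format.

```python
from collections import defaultdict


def flaky_jobs(runs):
    """Names of jobs that failed and then passed on a later attempt of the same SHA."""
    by_key = defaultdict(list)
    for run in runs:
        by_key[(run["job"], run["sha"])].append(run)

    flaky = set()
    for (job, _sha), attempts in by_key.items():
        attempts.sort(key=lambda r: r["attempt"])
        failed_earlier = False
        for r in attempts:
            if r["conclusion"] == "failure":
                failed_earlier = True
            elif r["conclusion"] == "success" and failed_earlier:
                # Same code, different outcome: the job is flaky.
                flaky.add(job)
                break
    return sorted(flaky)
```

Grouping by both job and SHA is the point: a failure on one commit followed by a success on the next is an ordinary fix, not a flake.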
These skills ship as a plugin in Anyshift's Claude Code marketplace. In a Claude Code session:
/plugin marketplace add anyshift-io/claude-plugins
/plugin install sre-skills@anyshift
Anyshift-specific skills for AI agents: how to drive the Annie CLI and the Anyshift MCP server (read-only infra investigation: resource graph, recent changes, dependents, blast radius, temporal diffs).
npx claudepluginhub anyshift-io/claude-plugins --plugin sre-skills