From armory
Performs architecture reviews across 7 dimensions (structural, scalability, enterprise readiness, performance, security, ops, data) with scored reports and recommendations. For design critiques and system audits.
Install:

```
npx claudepluginhub mathews-tom/armory --plugin armory
```

This skill uses the workspace's default tool permissions.
Files:
- assets/report-template.md
- evals/cases.yaml
- references/codebase-signals.md
- references/data-architecture.md
- references/document-review-guide.md
- references/enterprise-readiness.md
- references/operational-excellence.md
- references/performance.md
- references/scalability.md
- references/scoring-rubric.md
- references/security.md
- references/structural-integrity.md
- scripts/scan_codebase.sh
Systematic, framework-driven architecture review skill. Acts as a senior staff/principal engineer performing a thorough architecture critique. Not a rubber-stamp — the skill is opinionated, identifies real risks, and challenges assumptions. Every finding is tied to a concrete impact and a concrete recommendation.
The review proceeds in 4 phases:
1. Intake: determine the review mode, gather inputs, and ask clarifying questions
2. Dimensional analysis: evaluate the 7 weighted dimensions
3. Synthesis: cross-cutting analysis of the combined findings
4. Report: generate the report and verify template compliance
These constraints are NON-NEGOTIABLE. Memorize before starting any review.
SCORE SCALE: 1-5 only (NOT 1-10, NOT percentages)
Half-scores (3.5) permitted with justification
SEVERITY LABELS: [S1] Critical — System will fail or is exploitable
[S2] High — Significant risk under realistic conditions
[S3] Medium — Design weakness limiting growth
[S4] Low — Suboptimal but manageable
[S5] Info — Best practice suggestion (also used for strengths)
DIMENSION WEIGHTS:
Structural Integrity: 20% | Scalability: 18% | Enterprise Readiness: 15%
Performance: 17% | Security: 18% | Operational Excellence: 7% | Data Architecture: 5%
GRADE BOUNDARIES:
A = 90-100% | B = 80-89% | C = 70-79% | D = 60-69% | F = <60%
FORMULA: Overall% = (Σ dimension_score × weight) / 5 × 100
Template compliance is mandatory. See Phase 4 checklist before finalizing any report.
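For concreteness, the constraints above can be sketched as a small script. The weights, formula, and grade boundaries follow the block above; the dimension scores are invented purely for illustration:

```python
# Weights from the DIMENSION WEIGHTS block (must sum to 100%).
WEIGHTS = {
    "Structural Integrity": 0.20,
    "Scalability": 0.18,
    "Enterprise Readiness": 0.15,
    "Performance": 0.17,
    "Security": 0.18,
    "Operational Excellence": 0.07,
    "Data Architecture": 0.05,
}

def overall_percent(scores: dict[str, float]) -> float:
    """Overall% = (sum of dimension_score x weight) / 5 x 100; scores are 1-5."""
    weighted = sum(scores[d] * w for d, w in WEIGHTS.items())
    return weighted / 5 * 100

def grade(percent: float) -> str:
    """Map a percentage to the A-F grade boundaries."""
    for letter, floor in (("A", 90), ("B", 80), ("C", 70), ("D", 60)):
        if percent >= floor:
            return letter
    return "F"

# Example scores (half-scores like 3.5 are permitted with justification).
scores = {
    "Structural Integrity": 4.0,
    "Scalability": 3.5,
    "Enterprise Readiness": 3.0,
    "Performance": 4.0,
    "Security": 2.5,
    "Operational Excellence": 3.0,
    "Data Architecture": 4.0,
}
pct = overall_percent(scores)
print(f"{pct:.1f}% -> grade {grade(pct)}")  # prints: 68.4% -> grade D
```

Note how a single weak dimension (Security at 2.5) drags an otherwise decent system into D territory; this is the intended effect of the weighting.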
Determine the review mode from what the user provides:
Mode A — Codebase Review: User provides a directory path, repository, or uploaded code files. Run scripts/scan_codebase.sh <path> for a structural overview.
Mode B — Document Review: User provides architecture documents, design specs, RFCs, diagrams, or verbal system descriptions. No codebase available.
Mode C — Hybrid: User provides both code and documents.
If Mode A or C (codebase available): Run the scan script to get a structural fingerprint:
```bash
bash scripts/scan_codebase.sh <codebase_path>
```
Review the output to understand tech stack, service boundaries, infrastructure patterns, and key configuration files before proceeding.
If Mode B or C (documents available): Read all provided documents and extract the system context: stated requirements, components, data flows, and key constraints.
Always ask clarifying questions before starting the analysis. Tailor questions based on what is already known from the input, but always cover these areas:
- System Context
- Scale & Performance Expectations
- Deployment & Operations
- Compliance & Security
- Scope & Focus
Adapt the questions — skip what's already answered by the input, and add domain-specific questions based on what you see. Keep questions focused and avoid overwhelming the user.
Wait for the user's responses before proceeding to Phase 2.
Evaluate the architecture across 7 weighted dimensions. For each dimension:
Consult references/scoring-rubric.md for the scoring criteria.

| # | Dimension | Weight | Reference File |
|---|---|---|---|
| 1 | Structural Integrity & Design Principles | 20% | references/structural-integrity.md |
| 2 | Scalability | 18% | references/scalability.md |
| 3 | Enterprise Readiness | 15% | references/enterprise-readiness.md |
| 4 | Performance | 17% | references/performance.md |
| 5 | Security | 18% | references/security.md |
| 6 | Operational Excellence | 7% | references/operational-excellence.md |
| 7 | Data Architecture | 5% | references/data-architecture.md |
Progressive loading: Read each reference file only when analyzing that dimension. Do not load all references at once.
Mode-specific guidance:
Mode A/C: see references/codebase-signals.md for what files and patterns to inspect per dimension.
Mode B/C: see references/document-review-guide.md for completeness checklists and common gaps.

Severity levels used for findings:

| Level | Label | Meaning |
|---|---|---|
| S1 | Critical | System will fail in production or has an active exploitable vulnerability |
| S2 | High | Significant risk that will cause problems under realistic conditions |
| S3 | Medium | Design weakness that limits growth or creates tech debt |
| S4 | Low | Suboptimal choice with manageable impact |
| S5 | Informational | Observation, best practice suggestion, or note for awareness |
The review is architecture-pattern-agnostic. Do not assume any pattern is inherently superior. Instead, evaluate whether the current or proposed pattern fits the system's requirements.
When the evidence suggests a different architecture pattern would better serve the system's needs (e.g., a distributed monolith that should be either a true monolith or properly decomposed microservices), include this as a finding with:
After completing all 7 dimensions, perform synthesis:
Multi-Dimension Findings — Identify issues that span dimensions (e.g., missing cache is both a performance AND scalability issue). Consolidate duplicates, note the cross-cutting nature.
Conflicting Decisions — Detect contradictions (e.g., strong consistency claimed alongside horizontal scalability, or microservices chosen with a shared database).
Architectural Coherence — Do the parts fit together into a unified whole? Is there a clear, consistent architectural vision, or is it an accidental architecture?
Requirements Alignment — Does this architecture actually solve the stated problem at the stated scale? Is it over-engineered or under-engineered for the requirements?
Architecture Pattern Fitness — Based on the full analysis, is the chosen (or emergent) architecture pattern the right one? If not, what would be better and why?
Severity Reconciliation — Review findings that appear in multiple dimensions or combine to create compound risks. When cross-cutting analysis reveals that multiple issues together are more severe than individually assessed:
Systemic Risk — Identify the single biggest risk. If one thing will sink this system, what is it? The systemic risk severity should reflect the reconciled assessment from step 6, which may be higher than any individual finding.
Compute the overall score using the formula from references/scoring-rubric.md: Overall = Σ(dimension_score × weight) / 5 × 100
Use assets/report-template.md as the skeleton. Fill in all sections:
Before finalizing the report, verify ALL of the following. Non-compliance invalidates the review.
Scoring Format Compliance:
Severity Label Compliance:
Arithmetic Verification (from v1.1):
Report Structure Compliance:
Content Quality Gates:
If any checkbox fails: Fix the issue before delivering the report. Do not proceed with a non-compliant report.
Output the completed report as a markdown file.
Apply these rules to ensure fair, useful reviews:
Stage-aware: A greenfield design should not be penalized for missing implementation details. Evaluate plans, not missing code. Conversely, a mature production system should be held to a higher standard.
Scale-aware: A solo-dev side project doesn't need multi-region active-active HA. Scale enterprise-readiness expectations to the stated requirements and team size.
"Not applicable" vs "Missing": If the system is a batch analytics pipeline, P99 latency targets are irrelevant — mark as N/A, don't score as zero. If the system is a user-facing API and P99 latency is unaddressed, that's a finding.
Acknowledge strengths: Highlight what's done well. Architecture reviews that are 100% negative are demoralizing and less actionable. Lead with genuine strengths.
Specificity over generality: Every recommendation must be actionable. "Add caching" is insufficient. Specify what to cache, with what strategy, what TTL, and why.
Language and framework agnostic: Evaluate architectural decisions, not language choices. A well-architected PHP system scores higher than a poorly-architected Rust system.
Honest about unknowns: If the input doesn't provide enough information to evaluate a sub-criterion, say so explicitly. Don't guess. Flag it as requiring more information.
| Rationalization | Reality |
|---|---|
| "It works in production already" | Working today doesn't mean it scales, maintains, or survives team turnover — architecture debt compounds silently |
| "We'll refactor when it becomes a problem" | By then the cost is 10x higher — refactoring under load with accumulated dependencies is surgical, not routine |
| "The framework handles that" | Frameworks provide defaults, not architecture — you're still responsible for boundaries, error propagation, and data flow |
| "It's an internal service, standards don't apply" | Internal services become external faster than you expect — technical debt migrates across boundaries |
| "Performance is fine for our current scale" | Architecture reviews evaluate the next 10x, not the current state — O(n^2) that is invisible at 1k rows becomes crippling at 100k rows |
| "We don't have time for a full review" | Partial reviews create false confidence — better to review fewer dimensions thoroughly than all dimensions superficially |
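The "next 10x" point in the table above can be made concrete: a quadratic routine's work grows 100x for every 10x growth in input. A minimal sketch, using a hypothetical naive pairwise-duplicate check as the O(n^2) example:

```python
def pairwise_comparisons(n: int) -> int:
    # A naive double loop compares every pair of rows once: n * (n - 1) / 2.
    return n * (n - 1) // 2

small = pairwise_comparisons(1_000)    # ~5e5 comparisons: invisible today
large = pairwise_comparisons(100_000)  # ~5e9 comparisons: a different regime
print(f"1k rows:   {small:,} comparisons")
print(f"100k rows: {large:,} comparisons ({large // small:,}x the work)")
```

A 100x increase in rows yields roughly a 10,000x increase in work, which is why "fine at our current scale" says nothing about the next order of magnitude.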