Internal Developer Platform

A guide to designing and building Internal Developer Platforms (IDPs) that improve developer productivity and experience.

When to Use This Skill

Designing an Internal Developer Platform
Building or restructuring platform teams
Improving developer experience (DevEx)
Evaluating platform technologies (Backstage, Port, etc.)
Creating self-service capabilities for developers
Measuring platform adoption and success

Platform Engineering Fundamentals

What is an Internal Developer Platform?

Internal Developer Platform (IDP):
A layer on top of infrastructure that provides self-service
capabilities to development teams while maintaining governance.

┌─────────────────────────────────────────────────────────────┐
│                    DEVELOPERS                                │
│  ┌─────────┐  ┌─────────┐  ┌─────────┐  ┌─────────┐       │
│  │ Team A  │  │ Team B  │  │ Team C  │  │ Team D  │       │
│  └────┬────┘  └────┬────┘  └────┬────┘  └────┬────┘       │
│       │            │            │            │              │
│       └────────────┴─────┬──────┴────────────┘              │
│                          │                                   │
│  ┌───────────────────────┴───────────────────────────────┐  │
│  │              INTERNAL DEVELOPER PLATFORM               │  │
│  │  ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │  │
│  │  │ Service  │ │ Template │ │ Self-    │ │ Docs &   │ │  │
│  │  │ Catalog  │ │ Library  │ │ Service  │ │ Discovery│ │  │
│  │  └──────────┘ └──────────┘ └──────────┘ └──────────┘ │  │
│  └───────────────────────────────────────────────────────┘  │
│                          │                                   │
│  ┌───────────────────────┴───────────────────────────────┐  │
│  │                  INFRASTRUCTURE                        │  │
│  │  Kubernetes │ Cloud │ CI/CD │ Observability │ Security │  │
│  └───────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────┘

Key Value Propositions:
├── Self-service: Developers can provision without tickets
├── Standardization: Consistent patterns across teams
├── Guardrails: Security and compliance built-in
├── Visibility: Centralized service catalog and docs
└── Efficiency: Reduce cognitive load on developers

Platform vs Infrastructure

Infrastructure Team (Traditional):
- Ticket-based requests
- Manual provisioning
- Bespoke solutions per team
- Ops handles deployments
- Documentation scattered

Platform Team (Modern):
- Self-service capabilities
- Automated provisioning
- Standardized templates
- Developers own deployments
- Centralized documentation

Key Shift:
"You Build It, You Run It" + "Platform Handles the How"

Platform Core Components

Service Catalog

Service Catalog:
Centralized registry of all services with ownership, docs, and metadata.

┌─────────────────────────────────────────────────────────────┐
│                    SERVICE CATALOG                           │
├─────────────────────────────────────────────────────────────┤
│                                                              │
│  ┌─────────────────────────────────────────────────────────┐ │
│  │ Payment Service                                [API]     │ │
│  │ Owner: Payments Team     │ Tier: Critical              │ │
│  │ Tech: Node.js, PostgreSQL │ Dependencies: 4            │ │
│  │ [Docs] [API Spec] [Runbook] [Alerts] [Deploy]          │ │
│  └─────────────────────────────────────────────────────────┘ │
│                                                              │
│  ┌─────────────────────────────────────────────────────────┐ │
│  │ User Service                                 [Backend]   │ │
│  │ Owner: Identity Team     │ Tier: High                  │ │
│  │ Tech: Go, MongoDB        │ Dependencies: 2             │ │
│  │ [Docs] [API Spec] [Runbook] [Alerts] [Deploy]          │ │
│  └─────────────────────────────────────────────────────────┘ │
│                                                              │
│  Service Metadata:                                           │
│  ├── Owner team and contacts                                │
│  ├── Technical stack                                        │
│  ├── Service tier/criticality                               │
│  ├── Dependencies (upstream/downstream)                     │
│  ├── API specifications                                     │
│  ├── Documentation links                                    │
│  ├── Deployment information                                 │
│  └── Observability dashboards                               │
└─────────────────────────────────────────────────────────────┘
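
The upstream/downstream metadata is what makes a catalog useful for impact analysis. The sketch below is one possible in-memory model, not a real catalog API: `CatalogEntry`, `downstream_of`, and `owners_to_notify` are hypothetical names, and the two entries mirror the example cards above.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """One catalog record; fields follow the metadata list above."""
    name: str
    owner: str
    tier: str  # e.g. "critical" or "high"
    tech: list[str]
    depends_on: list[str] = field(default_factory=list)  # upstream services


def downstream_of(catalog: dict[str, CatalogEntry], name: str) -> list[str]:
    """Names of services that declare `name` as an upstream dependency."""
    return sorted(e.name for e in catalog.values() if name in e.depends_on)


def owners_to_notify(catalog: dict[str, CatalogEntry], name: str) -> set[str]:
    """Owner teams of every direct downstream consumer of `name`."""
    return {catalog[d].owner for d in downstream_of(catalog, name)}


# Example data (assumed): payment-service depends on user-service.
catalog = {
    e.name: e
    for e in [
        CatalogEntry("user-service", "identity-team", "high", ["Go", "MongoDB"]),
        CatalogEntry(
            "payment-service",
            "payments-team",
            "critical",
            ["Node.js", "PostgreSQL"],
            depends_on=["user-service"],
        ),
    ]
}

print(downstream_of(catalog, "user-service"))
print(owners_to_notify(catalog, "user-service"))
```

A lookup like this answers "who do I tell before changing user-service?" directly from catalog data instead of tribal knowledge.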

Template Library

Template Library:
Pre-built templates for common patterns that encode best practices.

Template Categories:
├── Application Templates
│   ├── REST API (Go, Node.js, .NET, Python)
│   ├── GraphQL Service
│   ├── gRPC Service
│   ├── Event Consumer
│   ├── Scheduled Job
│   └── Frontend (React, Vue, Angular)
│
├── Infrastructure Templates
│   ├── Database (PostgreSQL, MySQL, MongoDB)
│   ├── Cache (Redis, Memcached)
│   ├── Message Queue (Kafka, RabbitMQ)
│   └── Storage (S3, GCS)
│
└── Integration Templates
    ├── Third-party API client
    ├── Authentication flow
    └── Webhook handler

Template Contents:
┌─────────────────────────────────────────────────────────────┐
│ Template: node-rest-api                                      │
├─────────────────────────────────────────────────────────────┤
│ ├── src/                    │ Application code              │
│ ├── tests/                  │ Test setup                    │
│ ├── Dockerfile              │ Container image               │
│ ├── helm/                   │ Kubernetes deployment         │
│ ├── .github/workflows/      │ CI/CD pipelines               │
│ ├── docs/                   │ Documentation templates       │
│ ├── catalog-info.yaml       │ Backstage registration        │
│ └── terraform/              │ Infrastructure as Code        │
│                                                              │
│ Built-in:                                                    │
│ ✓ Health checks             ✓ Structured logging            │
│ ✓ OpenTelemetry tracing     ✓ Prometheus metrics           │
│ ✓ Security headers          ✓ Input validation             │
│ ✓ Error handling            ✓ API documentation            │
└─────────────────────────────────────────────────────────────┘
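
Templates drift unless something checks them. As a minimal sketch, a platform team could lint each template repository against the expected layout; `REQUIRED_PATHS` copies the node-rest-api listing above, and `missing_template_paths` is a hypothetical helper, not part of any scaffolding tool.

```python
from __future__ import annotations

from pathlib import Path

# Paths taken from the node-rest-api layout above.
REQUIRED_PATHS = [
    "src",
    "tests",
    "Dockerfile",
    "helm",
    ".github/workflows",
    "docs",
    "catalog-info.yaml",
    "terraform",
]


def missing_template_paths(root: Path) -> list[str]:
    """Return the required paths that do not exist under `root`."""
    return [p for p in REQUIRED_PATHS if not (root / p).exists()]


if __name__ == "__main__":
    # Check the current directory; an empty list means the layout is complete.
    print(missing_template_paths(Path(".")))
```

Running a check like this in the template repository's own CI keeps the golden path from silently losing pieces such as the catalog registration file.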

Self-Service Portal

Self-Service Capabilities:
Actions developers can perform without filing tickets; approvals apply only where a guardrail requires them.

┌─────────────────────────────────────────────────────────────┐
│                  SELF-SERVICE PORTAL                         │
├─────────────────────────────────────────────────────────────┤
│                                                              │
│  Create New Service     [5 min setup, no tickets]           │
│  ├── Choose template                                        │
│  ├── Configure options                                      │
│  ├── Generate repository                                    │
│  ├── Create CI/CD pipeline                                  │
│  ├── Provision infrastructure                               │
│  └── Register in catalog                                    │
│                                                              │
│  Common Self-Service Actions:                               │
│  ┌────────────────┬────────────────┬────────────────┐      │
│  │ Environments   │ Databases      │ Secrets        │      │
│  │ ├── Create env │ ├── Provision  │ ├── Create     │      │
│  │ ├── Clone env  │ ├── Scale      │ ├── Rotate     │      │
│  │ └── Destroy    │ └── Backup     │ └── Access     │      │
│  └────────────────┴────────────────┴────────────────┘      │
│  ┌────────────────┬────────────────┬────────────────┐      │
│  │ Deployments    │ Domains        │ Access         │      │
│  │ ├── Deploy     │ ├── Request    │ ├── Request    │      │
│  │ ├── Rollback   │ ├── Configure  │ ├── Review     │      │
│  │ └── Promote    │ └── Cert       │ └── Audit      │      │
│  └────────────────┴────────────────┴────────────────┘      │
│                                                              │
│  Guardrails (automatic):                                    │
│  ✓ Security scanning        ✓ Compliance checks            │
│  ✓ Cost limits              ✓ Naming conventions           │
│  ✓ Resource quotas          ✓ Approval workflows           │
└─────────────────────────────────────────────────────────────┘
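
Automatic guardrails can be pictured as a policy check that runs before any provisioning happens. The sketch below assumes a hypothetical naming rule, quota, and cost limit; `EnvRequest`, `Limits`, and `evaluate` are illustrative names rather than any portal's API.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Assumed convention: lowercase, starts with a letter, 3-40 chars, hyphens allowed.
NAME_PATTERN = re.compile(r"[a-z][a-z0-9-]{2,39}")


@dataclass
class EnvRequest:
    name: str
    cpu_cores: int
    monthly_cost_usd: float


@dataclass
class Limits:
    max_cpu_cores: int
    max_monthly_cost_usd: float


def evaluate(req: EnvRequest, limits: Limits) -> list[str]:
    """Return the guardrails the request violates, in a fixed order."""
    violations = []
    if not NAME_PATTERN.fullmatch(req.name):
        violations.append("naming")
    if req.cpu_cores > limits.max_cpu_cores:
        violations.append("quota")
    if req.monthly_cost_usd > limits.max_monthly_cost_usd:
        violations.append("cost")
    return violations


limits = Limits(max_cpu_cores=8, max_monthly_cost_usd=500.0)
print(evaluate(EnvRequest("payments-dev", 4, 120.0), limits))
print(evaluate(EnvRequest("Payments_Dev", 16, 900.0), limits))
```

An empty result lets the request proceed immediately, while a non-empty one can be routed to an approval workflow, which keeps the common path ticket-free.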

Platform Team Structure

Team Topologies

Team Types for Platform Work (after Team Topologies):

1. Enabling Team (Recommended Start)
   Purpose: Help stream-aligned teams adopt platform
   Size: 3-5 people
   Activities:
   ├── Pair programming with product teams
   ├── Create documentation and guides
   ├── Gather feedback and requirements
   └── Provide training and support

2. Platform Team (Mature)
   Purpose: Build and maintain the platform
   Size: 5-15 people (scale with org)
   Activities:
   ├── Build self-service capabilities
   ├── Maintain templates and tooling
   ├── Define and enforce standards
   └── Operate platform infrastructure

3. Complicated Subsystem Team (Specialized)
   Purpose: Handle complex technical domains
   Size: 3-7 people per domain
   Examples:
   ├── Data platform team
   ├── ML platform team
   └── Security platform team

Team Interaction:
┌─────────────────────────────────────────────────────────────┐
│                                                              │
│  ┌──────────────┐         ┌──────────────┐                 │
│  │ Stream-      │◄───────►│ Platform     │                 │
│  │ Aligned Team │ X-as-a- │ Team         │                 │
│  └──────────────┘ Service └──────────────┘                 │
│         │                        │                          │
│         │ Collaboration          │ Facilitation            │
│         │                        │                          │
│         ▼                        ▼                          │
│  ┌──────────────┐         ┌──────────────┐                 │
│  │ Complicated  │◄───────►│ Enabling     │                 │
│  │ Subsystem    │ Service │ Team         │                 │
│  └──────────────┘         └──────────────┘                 │
│                                                              │
└─────────────────────────────────────────────────────────────┘

Platform Team Skills

Platform Team Competencies:

Technical:
├── Kubernetes and container orchestration
├── Infrastructure as Code (Terraform, Pulumi)
├── CI/CD pipeline design
├── API design and development
├── Observability tooling
├── Security engineering
└── Cloud platforms (AWS, GCP, Azure)

Product:
├── Developer experience research
├── User journey mapping
├── Metrics and analytics
├── Documentation writing
└── Training and enablement

Organizational:
├── Stakeholder management
├── Communication skills
├── Change management
└── Technical leadership

Platform Technology Choices

Backstage (Spotify)

Backstage:
Open-source framework for building developer portals, created by Spotify and now a CNCF project.

Core Features:
├── Service Catalog (software component registry)
├── Software Templates (scaffolding)
├── TechDocs (docs-as-code)
├── Search (unified search across everything)
└── Plugins (extensible ecosystem)

Architecture:
┌─────────────────────────────────────────────────────────────┐
│                     BACKSTAGE                                │
│  ┌──────────────────────────────────────────────────────┐   │
│  │                    Frontend (React)                    │   │
│  │  ├── Catalog UI    ├── Templates UI    ├── Plugins   │   │
│  └──────────────────────────────────────────────────────┘   │
│                           │                                  │
│  ┌──────────────────────────────────────────────────────┐   │
│  │                    Backend (Node.js)                   │   │
│  │  ├── Catalog API   ├── Auth          ├── Plugin APIs │   │
│  └──────────────────────────────────────────────────────┘   │
│                           │                                  │
│  ┌──────────────────────────────────────────────────────┐   │
│  │                    Integrations                        │   │
│  │  ├── GitHub     ├── Kubernetes    ├── CI/CD          │   │
│  │  ├── PagerDuty  ├── Prometheus    ├── Custom         │   │
│  └──────────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────┘

Catalog Entity:
# catalog-info.yaml
apiVersion: backstage.io/v1alpha1
kind: Component
metadata:
  name: payment-service
  description: Handles payment processing
  annotations:
    github.com/project-slug: org/payment-service
    backstage.io/techdocs-ref: dir:.
spec:
  type: service
  lifecycle: production
  owner: payments-team
  system: payments
  dependsOn:
    - component:user-service
  providesApis:
    - payment-api

Alternative Platforms

Platform Options Comparison:

| Platform | Type | Strengths | Considerations |
|----------|------|-----------|----------------|
| Backstage | OSS | Extensible, active community | Requires customization |
| Port | Commercial | Quick setup, polished UI | Vendor lock-in |
| Cortex | Commercial | SRE focused, scorecards | Enterprise pricing |
| OpsLevel | Commercial | Service maturity | Smaller ecosystem |
| Roadie | Managed | Hosted Backstage | Less control |

Decision Factors:
├── Build vs Buy tolerance
├── Customization requirements
├── Team capacity for maintenance
├── Integration needs
├── Budget constraints
└── Timeline expectations

Developer Experience Metrics

DORA Metrics

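DORA (DevOps Research and Assessment) Metrics:
Performance bands below are indicative; the annual State of DevOps reports revise the thresholds, so check the current report before benchmarking.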

1. Deployment Frequency
   How often you deploy to production
   ├── Elite: Multiple times per day
   ├── High: Weekly to monthly
   ├── Medium: Monthly to every 6 months
   └── Low: Every 6+ months

2. Lead Time for Changes
   Time from code commit to production
   ├── Elite: < 1 hour
   ├── High: 1 day to 1 week
   ├── Medium: 1 week to 1 month
   └── Low: 1 month to 6 months

3. Time to Restore Service (often tracked as MTTR)
   Time to restore service after a production failure
   ├── Elite: < 1 hour
   ├── High: < 1 day
   ├── Medium: < 1 week
   └── Low: 1 week to 1 month

4. Change Failure Rate
   Percentage of deployments causing failure
   ├── Elite: 0-15%
   ├── High: 16-30%
   ├── Medium: 31-45%
   └── Low: 46-60%
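
The four metrics above can be derived from deployment records that most CI/CD systems already hold. This is a minimal sketch under assumed inputs: the `Deployment` record and `dora_summary` function are hypothetical, lead time is reported as a median, and restore time is averaged over failed deployments that have a recorded restore timestamp.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Optional


@dataclass
class Deployment:
    committed_at: datetime
    deployed_at: datetime
    failed: bool = False
    restored_at: Optional[datetime] = None  # set once service is restored


def _hours(delta: timedelta) -> float:
    return delta.total_seconds() / 3600


def dora_summary(deploys: list[Deployment], window_days: int) -> dict[str, float]:
    """Compute the four metrics over a reporting window of `window_days`."""
    if not deploys or window_days <= 0:
        raise ValueError("need at least one deployment and a positive window")
    failures = [d for d in deploys if d.failed]
    restores = [_hours(d.restored_at - d.deployed_at) for d in failures if d.restored_at]
    return {
        "deploys_per_day": len(deploys) / window_days,
        "median_lead_time_hours": median(_hours(d.deployed_at - d.committed_at) for d in deploys),
        "change_failure_rate_pct": 100 * len(failures) / len(deploys),
        "mean_time_to_restore_hours": sum(restores) / len(restores) if restores else 0.0,
    }


t0 = datetime(2024, 1, 1, 9, 0)
deploys = [
    Deployment(t0, t0 + timedelta(hours=1)),
    Deployment(t0, t0 + timedelta(hours=2)),
    Deployment(t0, t0 + timedelta(hours=3)),
    Deployment(t0, t0 + timedelta(hours=4), failed=True,
               restored_at=t0 + timedelta(hours=4, minutes=30)),
]
print(dora_summary(deploys, window_days=2))
```

Computing the numbers from raw events rather than surveys makes it easier to compare teams before and after they adopt the platform.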

Platform-Specific Metrics

Platform Success Metrics:

Adoption:
├── % of services in catalog
├── % of teams using templates
├── Self-service usage rate
├── Portal active users
└── Template utilization

Efficiency:
├── Time to first deployment (new service)
├── Time to provision infrastructure
├── Ticket reduction rate
├── Toil automation percentage
└── Developer time saved

Satisfaction:
├── Developer NPS
├── Platform satisfaction surveys
├── Support ticket volume
├── Documentation usefulness
└── Onboarding feedback

Quality:
├── Template adoption vs custom builds
├── Security compliance rate
├── Standards adherence
└── Incident rate for platform-built services
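
Two of these metrics reduce to simple arithmetic. The sketch below uses the standard Net Promoter Score definition (share of 9-10 scores minus share of 0-6 scores on a 0-10 scale) and a plain percentage for catalog coverage; the function names and sample numbers are illustrative assumptions.

```python
from __future__ import annotations


def nps(scores: list[int]) -> float:
    """Net Promoter Score: % promoters (9-10) minus % detractors (0-6)."""
    if not scores:
        raise ValueError("no survey responses")
    promoters = sum(1 for s in scores if s >= 9)
    detractors = sum(1 for s in scores if s <= 6)
    return 100 * (promoters - detractors) / len(scores)


def coverage_pct(registered: int, total: int) -> float:
    """Share of known services that are registered in the catalog."""
    return 100 * registered / total if total else 0.0


# Hypothetical survey responses and service counts.
print(nps([10, 10, 9, 7, 5]))
print(coverage_pct(45, 60))
```

Tracking both over time matters more than any single reading, since a small survey sample can swing the score by many points.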

Implementation Roadmap

Phased Approach

Phase 1: Foundation (3-6 months)
├── Service catalog (inventory what exists)
├── Basic documentation site
├── Initial template (1-2 golden paths)
├── Platform team formation
└── Metrics baseline

Phase 2: Self-Service (6-12 months)
├── Template library expansion
├── Self-service provisioning
├── CI/CD standardization
├── Developer portal launch
└── Adoption campaigns

Phase 3: Optimization (12-18 months)
├── Advanced templates
├── Platform APIs
├── Automation expansion
├── Cost optimization
└── Advanced analytics

Phase 4: Ecosystem (18+ months)
├── Plugin ecosystem
├── ML/data platform integration
├── Cross-team collaboration features
├── External developer experience
└── Continuous evolution

Success Criteria Per Phase:
Phase 1: 50% of existing services inventoried in the catalog
Phase 2: 70% of new services use templates
Phase 3: 80% of common requests handled via self-service
Phase 4: Platform is indispensable

Common Anti-Patterns

Platform Anti-Patterns:

1. "Build It and They Will Come"
   ❌ Building features without user research
   ✓ Start with developer interviews and pain points

2. "One Size Fits All"
   ❌ Forcing every team into same workflow
   ✓ Provide flexibility with sensible defaults

3. "Platform as Gatekeeper"
   ❌ Adding friction and approval gates
   ✓ Enable self-service with guardrails

4. "Technical Purity"
   ❌ Choosing tech for platform team excitement
   ✓ Choose what solves developer problems

5. "Big Bang Launch"
   ❌ Building for 2 years before releasing
   ✓ Iterate quickly with early adopters

6. "Mandates Without Value"
   ❌ Forcing adoption via policy
   ✓ Make platform so good teams want to use it

7. "Documentation Afterthought"
   ❌ Minimal or outdated docs
   ✓ Treat docs as product feature

8. "Ivory Tower Platform"
   ❌ Platform team isolated from users
   ✓ Embed with product teams regularly

Best Practices

Platform Engineering Best Practices:

1. Treat Platform as Product
   ├── Have product owner/manager
   ├── Conduct user research
   ├── Prioritize based on impact
   └── Measure outcomes, not outputs

2. Start with Golden Paths
   ├── Identify most common use cases
   ├── Create templates for those first
   ├── Make golden path easiest choice
   └── Don't block non-golden paths

3. Optimize for Self-Service
   ├── Target <5 minutes for common tasks
   ├── Eliminate manual approvals where safe
   ├── Provide escape hatches when needed
   └── Clear error messages and guidance

4. Build Community
   ├── Developer advocates/champions
   ├── Office hours and support channels
   ├── Contribution guidelines
   └── Celebrate platform wins

5. Measure Everything
   ├── Adoption metrics
   ├── Developer satisfaction
   ├── Time savings
   └── Platform reliability

6. Iterate Rapidly
   ├── Ship early, improve often
   ├── Gather feedback continuously
   ├── Deprecate gracefully
   └── Communicate changes clearly

Related Skills

golden-paths - Designing standardized development workflows
self-service-infrastructure - Infrastructure self-service patterns
slo-sli-error-budget - Platform reliability targets
observability-patterns - Platform observability
