Service mesh architecture, traffic management, and advanced networking specialist. Expert in Istio, Linkerd, mTLS, and production-grade microservices communication patterns at enterprise scale.
Implements production-grade service mesh architectures for secure microservices communication.
/plugin marketplace add pluginagentmarketplace/custom-plugin-kubernetes
/plugin install kubernetes-assistant@pluginagentmarketplace-kubernetes
Model: sonnet
Enterprise-grade service mesh implementation covering advanced traffic management, security, and observability for microservices architectures. This agent covers zero-trust networking, progressive delivery, and production-hardened service-to-service communication at scale.
Service Mesh Components
┌─────────────────────────────────────────────────────────────┐
│                        Control Plane                        │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐     │
│  │  Pilot   │  │ Citadel  │  │  Galley  │  │ Telemetry│     │
│  │(Traffic) │  │(Security)│  │ (Config) │  │(Metrics) │     │
│  └──────────┘  └──────────┘  └──────────┘  └──────────┘     │
└─────────────────────────────────────────────────────────────┘
                               │
                               ▼ (xDS API)
┌─────────────────────────────────────────────────────────────┐
│                         Data Plane                          │
│  ┌─────────────────┐        ┌─────────────────┐             │
│  │    Service A    │        │    Service B    │             │
│  │  ┌───────────┐  │        │  ┌───────────┐  │             │
│  │  │    App    │  │ ──────▶│  │    App    │  │             │
│  │  └───────────┘  │        │  └───────────┘  │             │
│  │  ┌───────────┐  │        │  ┌───────────┐  │             │
│  │  │   Envoy   │  │ ◀─────▶│  │   Envoy   │  │             │
│  │  │  Sidecar  │  │ (mTLS) │  │  Sidecar  │  │             │
│  │  └───────────┘  │        │  └───────────┘  │             │
│  └─────────────────┘        └─────────────────┘             │
└─────────────────────────────────────────────────────────────┘
Service Mesh Comparison
| Feature | Istio | Linkerd | Cilium | Consul |
|---|---|---|---|---|
| Complexity | High | Low | Medium | Medium |
| Performance | Good | Excellent | Excellent | Good |
| mTLS | Full | Full | Full | Full |
| Traffic Mgmt | Advanced | Basic | Basic | Good |
| Multi-cluster | Native | Native | Native | Native |
| Resource Usage | Higher | Lower | Lower | Medium |
| Best For | Feature-rich | Simplicity | eBPF perf | Multi-platform |
Istio Installation (Production)
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
metadata:
  name: production-istio
spec:
  profile: default
  meshConfig:
    accessLogFile: /dev/stdout
    accessLogFormat: |
      {"time":"%START_TIME%","method":"%REQ(:METHOD)%","path":"%REQ(X-ENVOY-ORIGINAL-PATH?:PATH)%","code":"%RESPONSE_CODE%","duration":"%DURATION%","upstream":"%UPSTREAM_CLUSTER%"}
    enableTracing: true
    defaultConfig:
      tracing:
        sampling: 1.0 # 1% in production
      holdApplicationUntilProxyStarts: true
    outboundTrafficPolicy:
      mode: REGISTRY_ONLY # Strict mode
  components:
    pilot:
      k8s:
        resources:
          requests:
            cpu: 500m
            memory: 2Gi
          limits:
            cpu: 1000m
            memory: 4Gi
        hpaSpec:
          minReplicas: 2
          maxReplicas: 5
    ingressGateways:
    - name: istio-ingressgateway
      enabled: true
      k8s:
        resources:
          requests:
            cpu: 100m
            memory: 128Mi
          limits:
            cpu: 2000m
            memory: 1Gi
        hpaSpec:
          minReplicas: 2
          maxReplicas: 10
        service:
          type: LoadBalancer
          ports:
          - port: 80
            targetPort: 8080
            name: http2
          - port: 443
            targetPort: 8443
            name: https
  values:
    global:
      proxy:
        resources:
          requests:
            cpu: 50m
            memory: 64Mi
          limits:
            cpu: 500m
            memory: 256Mi
Namespace Injection
apiVersion: v1
kind: Namespace
metadata:
  name: production
  labels:
    istio-injection: enabled
    # For revision-based injection, use this label instead of istio-injection
    # istio.io/rev: stable
VirtualService for Routing
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: api-server
  namespace: production
spec:
  hosts:
  - api-server
  - api.example.com
  gateways:
  - mesh # Internal mesh traffic
  - api-gateway # External traffic
  http:
  # Header-based routing
  - match:
    - headers:
        x-api-version:
          exact: "v2"
    route:
    - destination:
        host: api-server
        subset: v2
  # Canary routing (weight-based)
  - match:
    - uri:
        prefix: /api
    route:
    - destination:
        host: api-server
        subset: stable
      weight: 90
    - destination:
        host: api-server
        subset: canary
      weight: 10
    # Retry configuration
    retries:
      attempts: 3
      perTryTimeout: 2s
      retryOn: gateway-error,connect-failure,retriable-4xx
    # Timeout
    timeout: 30s
    # Fault injection (testing)
    # fault:
    #   delay:
    #     percentage:
    #       value: 10
    #     fixedDelay: 5s
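The 90/10 split and the retry settings above can be reasoned about with a little arithmetic. The following is a minimal Python sketch, not Envoy's actual implementation: `pick_subset` and `worst_case_retry_seconds` are hypothetical helper names, the selection point stands in for the proxy's random draw, and retry backoff is ignored. It assumes `attempts` counts retries in addition to the original request.

```python
from bisect import bisect_right
from itertools import accumulate

# Weights copied from the canary route above.
ROUTES = [("stable", 90), ("canary", 10)]


def pick_subset(routes, point):
    """Map a point in [0, total_weight) to a subset by cumulative weight."""
    bounds = list(accumulate(weight for _, weight in routes))
    if not 0 <= point < bounds[-1]:
        raise ValueError("point must fall inside the total weight")
    return routes[bisect_right(bounds, point)][0]


def worst_case_retry_seconds(attempts, per_try_timeout, overall_timeout):
    """Upper bound on time spent, assuming every try hits its per-try timeout.

    Simplification: backoff between retries is not modelled.
    """
    tries = 1 + attempts  # original request plus retries
    return min(tries * per_try_timeout, overall_timeout)
```

With the values above, `worst_case_retry_seconds(3, 2, 30)` stays well under the 30s route timeout, so the overall timeout only matters if `perTryTimeout` or `attempts` is raised.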
DestinationRule for Traffic Policy
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: api-server
  namespace: production
spec:
  host: api-server
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 1000
        connectTimeout: 10s
      http:
        h2UpgradePolicy: UPGRADE
        http1MaxPendingRequests: 1000
        http2MaxRequests: 1000
        maxRequestsPerConnection: 100
        maxRetries: 3
    loadBalancer:
      simple: LEAST_REQUEST
      localityLbSetting:
        enabled: true
        failover:
        - from: us-east-1
          to: us-west-2
    outlierDetection:
      consecutive5xxErrors: 5
      interval: 30s
      baseEjectionTime: 30s
      maxEjectionPercent: 50
      minHealthPercent: 30
  subsets:
  - name: stable
    labels:
      version: stable
  - name: canary
    labels:
      version: canary
  - name: v2
    labels:
      version: v2
    trafficPolicy:
      connectionPool:
        http:
          http2MaxRequests: 500 # Subset-specific limits
Circuit Breaker Pattern
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: payment-service
spec:
  host: payment-service
  trafficPolicy:
    outlierDetection:
      # Consecutive errors before ejection
      consecutive5xxErrors: 3
      consecutiveGatewayErrors: 3
      # Check interval
      interval: 10s
      # Ejection duration
      baseEjectionTime: 30s
      # Max ejected hosts
      maxEjectionPercent: 100
      # Minimum healthy hosts
      minHealthPercent: 0
    connectionPool:
      tcp:
        maxConnections: 100
      http:
        http1MaxPendingRequests: 100
        http2MaxRequests: 100
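To make the outlier detection settings above concrete, here is a simplified Python model of consecutive-5xx ejection. It is a sketch under stated assumptions, not Envoy's implementation: `OutlierDetector` and `HostState` are hypothetical names, ejection happens immediately rather than on the `interval` sweep, `maxEjectionPercent` is not modelled, and the ejection time is assumed to grow as `baseEjectionTime` times the number of ejections, capped at a maximum.

```python
from dataclasses import dataclass


@dataclass
class HostState:
    consecutive_5xx: int = 0
    times_ejected: int = 0
    ejected_until: float = 0.0


class OutlierDetector:
    """Toy model of consecutive5xxErrors / baseEjectionTime behaviour."""

    def __init__(self, consecutive_5xx_errors=3, base_ejection_time=30.0,
                 max_ejection_time=300.0):
        self.threshold = consecutive_5xx_errors
        self.base = base_ejection_time
        self.max = max_ejection_time
        self.hosts = {}

    def is_ejected(self, host, now):
        state = self.hosts.get(host)
        return state is not None and now < state.ejected_until

    def record(self, host, status, now):
        """Record one response; eject the host once the threshold is hit."""
        state = self.hosts.setdefault(host, HostState())
        # Any non-5xx response resets the streak.
        state.consecutive_5xx = state.consecutive_5xx + 1 if status >= 500 else 0
        if state.consecutive_5xx >= self.threshold and now >= state.ejected_until:
            state.times_ejected += 1
            # Repeat offenders stay out longer, up to the cap.
            duration = min(self.base * state.times_ejected, self.max)
            state.ejected_until = now + duration
            state.consecutive_5xx = 0
```

In this model a host that fails three times in a row is removed from load balancing for 30 seconds the first time and 60 seconds the second time, which is why a low `consecutive5xxErrors` combined with `maxEjectionPercent: 100` can empty the pool during a broad outage.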
PeerAuthentication (mTLS)
# Cluster-wide strict mTLS
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: istio-system
spec:
  mtls:
    mode: STRICT
---
# Workload-level override (portLevelMtls only applies when a selector is set)
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: production-mtls
  namespace: production
spec:
  selector:
    matchLabels:
      app: api-server # Assumed target workload; adjust to the service that legacy clients call
  mtls:
    mode: STRICT
  # Port-level exception
  portLevelMtls:
    8080:
      mode: PERMISSIVE # For legacy clients
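The policies above layer on each other, and the narrowest setting wins. The sketch below models that precedence in Python, assuming the order port-level, then workload, then namespace, then mesh-wide, with `PERMISSIVE` as the fallback when nothing is set. `effective_mtls_mode` is a hypothetical helper for illustration, not an Istio API.

```python
def effective_mtls_mode(port, mesh=None, namespace=None, workload=None,
                        port_level=None):
    """Return the mTLS mode that applies to one port of one workload.

    Each argument is the mode from the policy at that scope, or None when
    no policy exists there. "UNSET" means "inherit from the parent scope".
    """
    port_level = port_level or {}
    for mode in (port_level.get(port), workload, namespace, mesh):
        if mode not in (None, "UNSET"):
            return mode
    return "PERMISSIVE"  # Assumed default when no policy sets a mode
```

For the configuration above, port 8080 resolves to `PERMISSIVE` while every other port on the workload stays `STRICT`.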
AuthorizationPolicy
# Default deny for namespace
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: deny-all
  namespace: production
spec:
  {} # Empty spec = deny all
---
# Allow specific traffic
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: api-server-policy
  namespace: production
spec:
  selector:
    matchLabels:
      app: api-server
  action: ALLOW
  rules:
  # Allow from frontend
  - from:
    - source:
        principals: ["cluster.local/ns/production/sa/frontend"]
    to:
    - operation:
        methods: ["GET", "POST"]
        paths: ["/api/*"]
  # Allow from ingress gateway
  - from:
    - source:
        namespaces: ["istio-system"]
    to:
    - operation:
        methods: ["GET", "POST", "PUT", "DELETE"]
        paths: ["/api/*"]
---
# JWT validation
apiVersion: security.istio.io/v1beta1
kind: RequestAuthentication
metadata:
  name: jwt-auth
  namespace: production
spec:
  selector:
    matchLabels:
      app: api-server
  jwtRules:
  - issuer: "https://auth.example.com"
    jwksUri: "https://auth.example.com/.well-known/jwks.json"
    audiences:
    - "api.example.com"
    forwardOriginalToken: true
---
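# Require JWT
# ALLOW policies are additive, so an ALLOW rule here would not block requests
# already permitted by api-server-policy. A DENY rule rejects any request
# that lacks a valid token from the expected issuer.
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: require-jwt
  namespace: production
spec:
  selector:
    matchLabels:
      app: api-server
  action: DENY
  rules:
  - from:
    - source:
        notRequestPrincipals: ["https://auth.example.com/*"] # Request principal is <iss>/<sub>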
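How several AuthorizationPolicy objects combine on one workload is easy to get wrong, so here is a minimal Python sketch of the evaluation order as documented by Istio: DENY policies are checked first, a workload with no ALLOW policies admits everything else, and otherwise at least one ALLOW policy must match. CUSTOM actions are left out, and `authorize` plus the predicate lambdas are hypothetical stand-ins for real rule matching.

```python
def authorize(policies, request):
    """policies: list of (action, predicate) pairs applied to one workload."""
    deny = [match for action, match in policies if action == "DENY"]
    allow = [match for action, match in policies if action == "ALLOW"]
    if any(match(request) for match in deny):
        return False          # Any matching DENY wins
    if not allow:
        return True           # No ALLOW policies: nothing restricts the request
    return any(match(request) for match in allow)  # ALLOW policies are OR-ed


# An empty-spec policy is an ALLOW policy with no rules, so it never matches.
deny_all = ("ALLOW", lambda req: False)

# Rough equivalent of the frontend rule above (prefix match on /api/).
frontend_rule = ("ALLOW", lambda req: (
    req["principal"] == "cluster.local/ns/production/sa/frontend"
    and req["method"] in {"GET", "POST"}
    and req["path"].startswith("/api/")
))
```

Because ALLOW policies are OR-ed, adding a second ALLOW policy only widens access; narrowing access for a workload that already has an ALLOW policy takes a DENY policy.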
Istio Gateway
apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  name: api-gateway
  namespace: production
spec:
  selector:
    istio: ingressgateway
  servers:
  - port:
      number: 443
      name: https
      protocol: HTTPS
    tls:
      mode: SIMPLE
      credentialName: api-tls-cert
    hosts:
    - "api.example.com"
    - "*.api.example.com"
  - port:
      number: 80
      name: http
      protocol: HTTP
    hosts:
    - "api.example.com"
    tls:
      httpsRedirect: true
Gateway API (Kubernetes native)
# Gateway Class
apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass
metadata:
  name: istio
spec:
  controllerName: istio.io/gateway-controller
---
# Gateway
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: api-gateway
  namespace: production
spec:
  gatewayClassName: istio
  listeners:
  - name: https
    port: 443
    protocol: HTTPS
    hostname: "*.example.com"
    tls:
      mode: Terminate
      certificateRefs:
      - name: wildcard-cert
    allowedRoutes:
      namespaces:
        from: Selector
        selector:
          matchLabels:
            mesh-enabled: "true"
---
# HTTPRoute
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: api-route
  namespace: production
spec:
  parentRefs:
  - name: api-gateway
  hostnames:
  - "api.example.com"
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /v1
    backendRefs:
    - name: api-v1
      port: 80
      weight: 100
  - matches:
    - path:
        type: PathPrefix
        value: /v2
    backendRefs:
    - name: api-v2
      port: 80
      weight: 90
    - name: api-v2-canary
      port: 80
      weight: 10
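The HTTPRoute above only attaches if its hostname falls under the listener's `*.example.com` wildcard. A small Python sketch of that check follows, assuming the Gateway API rule that a wildcard listener matches any hostname with at least one extra label in front of the suffix, but not the bare domain. `hostname_matches` is a hypothetical helper, and wildcard hostnames on the route side are not handled.

```python
def hostname_matches(listener_host, route_host):
    """Return True when a route hostname is covered by a listener hostname."""
    if listener_host.startswith("*."):
        suffix = listener_host[1:]  # e.g. ".example.com"
        # Require something in front of the suffix, so "example.com"
        # and ".example.com" are both rejected.
        return route_host.endswith(suffix) and len(route_host) > len(suffix)
    return listener_host == route_host
```

Here `hostname_matches("*.example.com", "api.example.com")` holds, so the route binds to the `https` listener, while a route for the apex `example.com` would need its own listener.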
Telemetry Configuration
apiVersion: telemetry.istio.io/v1alpha1
kind: Telemetry
metadata:
  name: mesh-telemetry
  namespace: istio-system
spec:
  tracing:
  - providers:
    - name: otel-collector
    randomSamplingPercentage: 1.0
  accessLogging:
  - providers:
    - name: otel-collector
  metrics:
  - providers:
    - name: prometheus
    overrides:
    - match:
        metric: ALL_METRICS
        mode: CLIENT_AND_SERVER
      tagOverrides:
        request_protocol:
          operation: UPSERT
          value: request.protocol
Service Mesh Metrics
# Request rate by service
sum(rate(istio_requests_total{reporter="destination"}[5m])) by (destination_service)

# Error rate (percentage of 5xx responses)
sum(rate(istio_requests_total{reporter="destination",response_code=~"5.."}[5m]))
  / sum(rate(istio_requests_total{reporter="destination"}[5m])) * 100

# P99 latency
histogram_quantile(0.99,
  sum(rate(istio_request_duration_milliseconds_bucket{reporter="destination"}[5m]))
  by (destination_service, le)
)

# Circuit breaker ejections
sum(rate(envoy_cluster_outlier_detection_ejections_total[5m])) by (cluster_name)
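The P99 query above estimates a quantile from cumulative histogram buckets, which is why its precision depends on bucket boundaries. This Python sketch mirrors the linear interpolation that Prometheus documents for `histogram_quantile`; the bucket values in the usage note are made-up illustration data, and edge cases such as negative bounds are not handled.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate the q-quantile from cumulative buckets.

    buckets: list of (upper_bound, cumulative_count), sorted by bound,
    with a final (math.inf, total) entry like Prometheus' le="+Inf".
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # Quantile lies past the last finite bucket: report its bound.
                return buckets[-2][0]
            # Assume observations are spread evenly inside the bucket.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return math.nan


def error_rate_percent(errors_per_second, requests_per_second):
    """Same arithmetic as the error-rate query: 5xx rate over total rate."""
    return errors_per_second / requests_per_second * 100
```

For hypothetical buckets `[(10, 50), (25, 90), (50, 99), (100, 100), (math.inf, 100)]`, the 99th percentile lands at the upper edge of the 50 ms bucket, so the reported P99 can only be as precise as the bucket layout allows.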
Linkerd Installation
# Install CLI
curl --proto '=https' --tlsv1.2 -sSfL https://run.linkerd.io/install | sh
# Install to cluster
linkerd install --crds | kubectl apply -f -
linkerd install | kubectl apply -f -
# Verify
linkerd check
Linkerd Service Profile
apiVersion: linkerd.io/v1alpha2
kind: ServiceProfile
metadata:
  name: api-server.production.svc.cluster.local
  namespace: production
spec:
  routes:
  - name: GET /api/users/{id} # Route name matches the pathRegex below
    condition:
      method: GET
      pathRegex: /api/users/[^/]+
    responseClasses:
    - condition:
        status:
          min: 500
          max: 599
      isFailure: true
    timeout: 5s
    isRetryable: true
  retryBudget:
    retryRatio: 0.2
    minRetriesPerSecond: 10
    ttl: 10s
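The `retryBudget` values above bound how many retries the proxy may add on top of normal traffic. As a rough Python sketch, assuming the budget over one `ttl` window is the ratio-based allowance plus the per-second floor, the arithmetic looks like this. `retries_allowed` is a hypothetical helper and a simplified model, not Linkerd's internal accounting.

```python
import math


def retries_allowed(requests_in_window, retry_ratio=0.2,
                    min_retries_per_second=10, ttl_seconds=10):
    """Approximate retry allowance for one ttl window.

    requests_in_window: original (non-retry) requests seen during the window.
    """
    ratio_part = math.floor(requests_in_window * retry_ratio)
    floor_part = min_retries_per_second * ttl_seconds
    return ratio_part + floor_part
```

Under this model an idle service can still retry up to 100 times per 10 second window, and a service handling 500 requests in that window gets roughly 200, which keeps retries from multiplying load during an outage.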
Istio Multi-Cluster (Primary-Remote)
# On primary cluster
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
metadata:
  name: primary
spec:
  values:
    global:
      meshID: mesh1
      multiCluster:
        clusterName: cluster1
      network: network1

# Create remote secret (shell command, run separately from the manifest above)
istioctl create-remote-secret \
  --context=cluster2 \
  --name=cluster2 | kubectl apply -f - --context=cluster1
Cross-Cluster Service Discovery
# ServiceEntry for remote service
apiVersion: networking.istio.io/v1beta1
kind: ServiceEntry
metadata:
  name: remote-api
  namespace: production
spec:
  hosts:
  - api.cluster2.global
  location: MESH_INTERNAL
  ports:
  - number: 80
    name: http
    protocol: HTTP
  resolution: DNS
  endpoints:
  - address: api.production.svc.cluster2.global
    ports:
      http: 80
Setup Phase
Traffic Management
Security
Operations
| Metric | Target |
|---|---|
| mTLS coverage | 100% production |
| Request success rate | >99.9% |
| P99 latency overhead | <5ms |
| Control plane uptime | 99.99% |
| Policy sync time | <10 seconds |
| Circuit breaker effectiveness | <1% cascade failures |
| Multi-cluster failover | <30 seconds |
| Sidecar memory | <100Mi per pod |
Service unreachable via mesh?
|
+-- Check: istioctl proxy-status
    |
    +-- Proxy not synced
    |   |
    |   +-- Check istiod logs
    |   +-- Check proxy connectivity
    |   +-- Verify xDS stream
    |
    +-- Proxy synced but failing
        |
        +-- Check: istioctl proxy-config cluster <pod>
        |   |
        |   +-- Endpoint missing? Check Service
        |   +-- Endpoint unhealthy? Check Pod
        |
        +-- Check: istioctl analyze
            |
            +-- Configuration errors?
            +-- Policy conflicts?
            +-- mTLS mismatch?
| Issue | Root Cause | Resolution |
|---|---|---|
| 503 errors | No healthy upstream | Check endpoints, outlier detection |
| mTLS handshake failed | Cert mismatch | Check PeerAuthentication mode |
| Request timeout | Circuit open | Check outlier detection settings |
| 403 Forbidden | AuthorizationPolicy | Add allow rule |
| Sidecar not injected | Label missing | Add istio-injection label |
| High latency | Proxy resource limits | Increase sidecar resources |
| Config not applied | Pilot sync delay | Check istiod health |
# Proxy status
istioctl proxy-status
istioctl proxy-config cluster <pod> -n <ns>
istioctl proxy-config route <pod> -n <ns>
istioctl proxy-config endpoint <pod> -n <ns>
# Analyze configuration
istioctl analyze -n production
istioctl analyze --all-namespaces
# Debug proxy
istioctl dashboard envoy <pod> -n <ns>
kubectl logs <pod> -c istio-proxy -n <ns>
# mTLS status (istioctl authn tls-check is not available in recent Istio releases)
istioctl x describe pod <pod> -n <ns>
# Mesh status
istioctl version
istioctl verify-install
# Linkerd debugging
linkerd check
linkerd viz stat deployment -n production
linkerd viz tap deployment/api-server -n production
| Challenge | Solution |
|---|---|
| Sidecar overhead | Resource tuning, eBPF (Cilium) |
| mTLS migration | Gradual rollout, PERMISSIVE mode |
| Complexity | Start simple, Linkerd for basic needs |
| Debugging | Enable access logs, tracing |
| Multi-cluster | Shared root CA, proper network |
| Version upgrades | Canary control plane, revision tags |
| Legacy integration | PERMISSIVE mode, ServiceEntry |
| Cost | Right-size sidecars, sampling |