# ADR-009: Istio Service Mesh for Kubernetes

**Status**: Accepted | Implemented
**Date**: 2024-11-01
**Deciders**: Kubernetes Architecture Team
**Technical Story**: Adding zero-trust security and traffic management for microservices in K8s

---

## Decision

Use **Istio** as the service mesh for mTLS, traffic management, rate limiting, and observability in Kubernetes.

---

## Rationale

1. **mTLS Out-of-Box**: Automatic TLS between services with no code changes
2. **Zero-Trust**: Mutual TLS enforced by default
3. **Traffic Management**: Circuit breakers, retries, and timeouts without application logic
4. **Observability**: Automatic tracing and metrics collection
5. **VAPORA Multiservice**: 4 deployments (backend, agents, LLM router, frontend) need inter-service security

---

## Alternatives Considered

### ❌ Plain Kubernetes Networking

- **Pros**: Simpler setup, fewer components
- **Cons**: No mTLS, no traffic policies, manual observability

### ❌ Linkerd (Minimal Service Mesh)

- **Pros**: Lighter weight than Istio
- **Cons**: Less feature-rich, smaller ecosystem

### ✅ Istio (CHOSEN)

- Industry standard, feature-rich, compatible with the VAPORA deployment

---

## Trade-offs

**Pros**:

- ✅ Automatic mTLS between services
- ✅ Declarative traffic policies (no code changes)
- ✅ Circuit breakers and retries built-in
- ✅ Integrated observability (tracing, metrics)
- ✅ Gradual rollout support (canary deployments)
- ✅ Rate limiting and authentication policies

**Cons**:

- ⚠️ Operational complexity (data plane + control plane)
- ⚠️ Memory overhead per pod (sidecar proxy)
- ⚠️ Debugging complexity (multiple proxy layers)
- ⚠️ Certificate management and rotation

---

## Implementation

**Installation**:

```bash
# Install Istio with the default profile (recommended for production deployments)
istioctl install --set profile=default -y

# Enable sidecar injection for namespace
kubectl label namespace vapora istio-injection=enabled

# Verify installation
kubectl get pods -n istio-system
```

**Service Mesh Configuration**:

```yaml
# kubernetes/platform/istio-config.yaml

# Virtual Service for traffic policies
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: vapora-backend
  namespace: vapora
spec:
  hosts:
    - vapora-backend
  http:
    - match:
        - uri:
            prefix: /api/health
      route:
        - destination:
            host: vapora-backend
            port:
              number: 8001
      timeout: 5s            # Overall deadline, including retries
      retries:
        attempts: 3
        perTryTimeout: 2s
---
# Destination Rule for circuit breaker
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: vapora-backend
  namespace: vapora
spec:
  host: vapora-backend
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 100
      http:
        http1MaxPendingRequests: 100
        http2MaxRequests: 1000
    outlierDetection:
      consecutive5xxErrors: 5
      interval: 30s
      baseEjectionTime: 30s
---
# Authorization Policy (deny all by default)
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: vapora-default-deny
  namespace: vapora
spec: {}  # Default deny-all
---
# Allow backend to agents
# No selector is set, so this rule applies to every workload in the namespace
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: allow-backend-to-agents
  namespace: vapora
spec:
  rules:
    - from:
        - source:
            principals: ["cluster.local/ns/vapora/sa/vapora-backend"]
      to:
        - operation:
            ports: ["8002"]
```

**Key Files**:

- `/kubernetes/platform/istio-config.yaml` (Istio configuration)
- `/kubernetes/base/` (Deployment manifests with sidecar injection)
- `istioctl` commands for traffic management

---

## Verification

```bash
# Check sidecar injection
kubectl get pods -n vapora -o jsonpath='{.items[*].spec.containers[*].name}' | grep istio-proxy

# List virtual services
kubectl get virtualservices -n vapora

# Analyze mesh configuration for issues
istioctl analyze -n vapora

# Monitor traffic between services
kubectl logs -n vapora deployment/vapora-backend -c istio-proxy --tail 20

# Test circuit breaker (should retry and fail gracefully)
kubectl exec -it deployment/vapora-backend -n vapora -- \
  curl -v http://vapora-agents:8002/health \
  --max-time 10

# Verify authorization policies
kubectl get authorizationpolicies -n vapora

# Check metrics collection
kubectl port-forward -n istio-system svc/prometheus 9090:9090
# Open http://localhost:9090 and query: rate(istio_requests_total[1m])
```

**Expected Output**:

- All pods have an `istio-proxy` sidecar
- VirtualServices and DestinationRules configured
- mTLS enabled between services
- Circuit breaker protects against cascading failures
- Authorization policies enforce least-privilege access
- Metrics collected for all inter-service traffic

---

## Consequences

### Operational

- Certificate rotation is automatic (Istio CA)
- Service-to-service debugging requires understanding the proxy layers
- Traffic policies are applied without redeploying code

### Performance

- Sidecar proxy adds ~5-10 ms latency per call
- Memory per pod: +50 MB for the proxy container
- The overhead is judged acceptable for the security and observability gains

### Debugging

- Use `istioctl analyze` to diagnose issues
- Envoy proxy logs are available in the sidecar containers
- Distributed tracing via Jaeger/Zipkin integration

### Scaling

- Automatic load balancing via DestinationRule
- Circuit breaker prevents thundering herd
- Support for canary rollouts via traffic splitting

---

## References

- [Istio Documentation](https://istio.io/latest/docs/)
- [Istio Security](https://istio.io/latest/docs/concepts/security/)
- `/kubernetes/platform/istio-config.yaml` (configuration)
- [Prometheus Integration](https://istio.io/latest/docs/ops/integrations/prometheus/)

---

**Related ADRs**: ADR-001 (Workspace), ADR-010 (Cedar Authorization)
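
**Note on mTLS enforcement**: The rationale relies on mutual TLS being enforced between services. With Istio's automatic mTLS, sidecar-injected workloads accept both plaintext and mTLS traffic (permissive mode) unless a policy says otherwise, so strict enforcement is typically declared with a `PeerAuthentication` resource. The manifest below is a minimal sketch, assuming the `vapora` namespace used in this ADR. The resource name `vapora-strict-mtls` is hypothetical, and this ADR does not confirm that such a manifest exists in `istio-config.yaml`.

```yaml
# Sketch only: require mTLS for every workload in the namespace.
# The name "vapora-strict-mtls" is illustrative, not taken from the repository.
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: vapora-strict-mtls
  namespace: vapora
spec:
  mtls:
    mode: STRICT  # Reject plaintext traffic to sidecar-injected workloads
```

Under a namespace-wide policy like this, requests from clients without a sidecar are rejected, so it is worth checking for callers outside the mesh before applying it.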