Vapora/docs/adrs/0009-istio-service-mesh.md
Jesús Pérez 7110ffeea2
chore: extend doc: adr, tutorials, operations, etc
2026-01-12 03:32:47 +00:00

ADR-009: Istio Service Mesh for Kubernetes

Status: Accepted | Implemented
Date: 2024-11-01
Deciders: Kubernetes Architecture Team
Technical Story: Adding zero-trust security and traffic management for microservices in K8s


Decision

Use Istio as the service mesh for mTLS, traffic management, rate limiting, and observability in Kubernetes.


Rationale

  1. mTLS Out of the Box: Automatic TLS between services without code changes
  2. Zero-Trust: Mutual TLS enforced by default
  3. Traffic Management: Circuit breakers, retries, and timeouts without application logic
  4. Observability: Automatic tracing and metrics collection
  5. VAPORA Multiservice: Four deployments (backend, agents, LLM router, frontend) need inter-service security

Alternatives Considered

Plain Kubernetes Networking

  • Pros: Simpler setup, fewer components
  • Cons: No mTLS, no traffic policies, manual observability

Linkerd (Minimal Service Mesh)

  • Pros: Lighter weight than Istio
  • Cons: Less feature-rich, smaller ecosystem

Istio (CHOSEN)

  • Industry standard, feature-rich, and compatible with the VAPORA deployment

Trade-offs

Pros:

  • Automatic mTLS between services
  • Declarative traffic policies (no code changes)
  • Circuit breakers and retries built-in
  • Integrated observability (tracing, metrics)
  • Gradual rollout support (canary deployments)
  • Rate limiting and authentication policies

Cons:

  • ⚠️ Operational complexity (data plane + control plane)
  • ⚠️ Memory overhead per pod (sidecar proxy)
  • ⚠️ Debugging complexity (multiple proxy layers)
  • ⚠️ Certificate management and rotation

Implementation

Installation:

# Install Istio using the built-in default profile (the one Istio recommends for production deployments)
istioctl install --set profile=default -y

# Enable sidecar injection for namespace
kubectl label namespace vapora istio-injection=enabled

# Verify installation
kubectl get pods -n istio-system

Service Mesh Configuration:

# kubernetes/platform/istio-config.yaml

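# Virtual Service for traffic policies on the health endpoint
# (only /api/health is matched; other paths need their own or a catch-all route)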
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: vapora-backend
  namespace: vapora
spec:
  hosts:
  - vapora-backend
  http:
  - match:
    - uri:
        prefix: /api/health
    route:
    - destination:
        host: vapora-backend
        port:
          number: 8001
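    # Overall deadline for the request, retries included
    timeout: 5s
    retries:
      attempts: 3
      # The initial try plus retries at 2s each can exceed the 5s deadline,
      # in which case the overall timeout ends the request first
      perTryTimeout: 2s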

---
# Destination Rule for circuit breaker
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: vapora-backend
  namespace: vapora
spec:
  host: vapora-backend
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 100
      http:
        http1MaxPendingRequests: 100
        http2MaxRequests: 1000
    outlierDetection:
      consecutive5xxErrors: 5
      interval: 30s
      baseEjectionTime: 30s

---
# Authorization Policy (deny all by default)
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: vapora-default-deny
  namespace: vapora
spec: {} # Empty spec: no rules match, so all requests to workloads in the namespace are denied

---
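# Allow the backend service account to call port 8002
# (no selector is set, so the rule applies to every workload in the namespace, not only agents)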
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: allow-backend-to-agents
  namespace: vapora
spec:
  rules:
  - from:
    - source:
        principals: ["cluster.local/ns/vapora/sa/vapora-backend"]
    to:
    - operation:
        ports: ["8002"]
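
The policies above control which identities may call which ports, but none of them sets the mTLS mode. Istio sidecars accept both plaintext and mutual TLS by default (PERMISSIVE mode), so enforcing mTLS for the vapora namespace would typically need a PeerAuthentication resource. A minimal sketch, assuming the resource name `vapora-strict-mtls` (hypothetical, not part of the original configuration):

```yaml
---
# Hypothetical addition: require mTLS for every workload in the vapora namespace
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: vapora-strict-mtls   # assumed name
  namespace: vapora
spec:
  mtls:
    mode: STRICT             # reject plaintext traffic to sidecar-injected workloads
```

With STRICT mode, clients without a sidecar can no longer reach these workloads, so it is usually applied only after every pod in the namespace has been injected.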

Key Files:

  • /kubernetes/platform/istio-config.yaml (Istio configuration)
  • /kubernetes/base/ (Deployment manifests with sidecar injection)
  • istioctl commands for traffic management

Verification

# Check sidecar injection
kubectl get pods -n vapora -o jsonpath='{.items[*].spec.containers[*].name}' | grep istio-proxy

# List virtual services
kubectl get virtualservices -n vapora

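# Validate the mesh configuration (reports misconfigured policies and routing)
istioctl analyze -n vapora

# Check which mTLS mode is configured
kubectl get peerauthentication --all-namespaces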

# Monitor traffic between services
kubectl logs -n vapora deployment/vapora-backend -c istio-proxy --tail 20

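# Call the agents service from the backend pod; the request passes through both sidecars,
# so it exercises the authorization policy and traffic settings end to end
kubectl exec -it deployment/vapora-backend -n vapora -- \
  curl -v http://vapora-agents:8002/health \
  --max-time 10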

# Verify authorization policies
kubectl get authorizationpolicies -n vapora

# Check metrics collection
kubectl port-forward -n istio-system svc/prometheus 9090:9090
# Open http://localhost:9090 and query: rate(istio_requests_total[1m])

Expected Output:

  • All pods have istio-proxy sidecar
  • VirtualServices and DestinationRules configured
  • mTLS enabled between services
  • Circuit breaker protects against cascading failures
  • Authorization policies enforce least-privilege access
  • Metrics collected for all inter-service traffic

Consequences

Operational

  • Certificate rotation is automatic (Istio CA)
  • Service-to-service debugging requires understanding proxy layers
  • Traffic policies applied without code redeployment

Performance

  • Sidecar proxy adds ~5-10ms latency per call
  • Memory per pod: +50MB for proxy container
  • The overhead is considered worth the security and observability gains
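
The proxy's footprint can be bounded per workload through Istio's sidecar resource annotations on the pod template. A minimal sketch as a patch file, assuming the file name `proxy-resources.yaml` and the request and limit values shown (both illustrative, not taken from the VAPORA manifests):

```yaml
# proxy-resources.yaml (hypothetical file): strategic-merge patch for a Deployment
spec:
  template:
    metadata:
      annotations:
        sidecar.istio.io/proxyCPU: "50m"            # CPU request for istio-proxy
        sidecar.istio.io/proxyMemory: "64Mi"        # memory request for istio-proxy
        sidecar.istio.io/proxyMemoryLimit: "128Mi"  # memory limit for istio-proxy
```

It could be applied with `kubectl patch deployment vapora-backend -n vapora --patch-file proxy-resources.yaml`. Changing the pod template triggers a rollout, and the new pods are injected with the adjusted proxy resources.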

Debugging

  • Use istioctl analyze to diagnose issues
  • Envoy proxy logs in sidecar containers
  • Distributed tracing via Jaeger/Zipkin integration
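
Whether the sidecar logs contain per-request entries depends on the installation: Envoy access logging is not enabled in every profile. One way to turn it on for the namespace is Istio's Telemetry API. A sketch, assuming the resource name `vapora-access-logs` (hypothetical) and Istio's built-in `envoy` log provider:

```yaml
# Hypothetical addition: enable Envoy access logs for workloads in the vapora namespace
apiVersion: telemetry.istio.io/v1alpha1
kind: Telemetry
metadata:
  name: vapora-access-logs   # assumed name
  namespace: vapora
spec:
  accessLogging:
  - providers:
    - name: envoy            # built-in provider that writes to the sidecar's stdout
```

Once applied, the `istio-proxy` container logs should show one line per request, which makes the `kubectl logs ... -c istio-proxy` check meaningful.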

Scaling

  • Automatic load balancing via DestinationRule
  • Circuit breaker prevents thundering herd
  • Support for canary rollouts via traffic splitting
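
Traffic splitting for a canary rollout can be sketched as follows. The subset names, the `version` pod labels, and the 90/10 weights are assumptions for illustration. Only the fields relevant to the split are shown, so in practice they would be merged into the existing vapora-backend DestinationRule and VirtualService rather than applied as-is:

```yaml
# Hypothetical canary: 90% of traffic to subset v1, 10% to subset v2
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: vapora-backend
  namespace: vapora
spec:
  host: vapora-backend
  subsets:
  - name: v1
    labels:
      version: v1   # assumed pod label on the stable Deployment
  - name: v2
    labels:
      version: v2   # assumed pod label on the canary Deployment
---
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: vapora-backend
  namespace: vapora
spec:
  hosts:
  - vapora-backend
  http:
  - route:
    - destination:
        host: vapora-backend
        subset: v1
      weight: 90
    - destination:
        host: vapora-backend
        subset: v2
      weight: 10
```

Shifting the rollout forward is then a matter of editing the two weights, which must sum to 100, with no change to the application Deployments.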

References


Related ADRs: ADR-001 (Workspace), ADR-010 (Cedar Authorization)