Vapora/docs/architecture/adr/0002-kubernetes-deployment-strategy.md

# ADR 0002: Kubernetes Deployment Strategy for kagent Integration

**Status:** Accepted

**Date:** 2026-02-07

**Authors:** VAPORA Team

## Context

kagent integration required a Kubernetes-native deployment strategy that:

- Supports development and production environments
- Maintains A2A protocol connectivity with VAPORA
- Enables horizontal scaling
- Ensures high availability in production
- Minimizes operational complexity
- Facilitates updates and configuration changes

## Decision

We adopted a **Kustomize-based deployment strategy** with environment-specific overlays:

```
kubernetes/kagent/
├── base/              # Environment-agnostic base
│   ├── namespace.yaml
│   ├── rbac.yaml
│   ├── configmap.yaml
│   ├── statefulset.yaml
│   └── service.yaml
└── overlays/
    ├── dev/           # Development: 1 replica, debug logging
    └── prod/          # Production: 5 replicas, HA
```

### Key Design Decisions

1. **StatefulSet over Deployment**
   - Provides stable pod identities
   - Supports ordered startup/shutdown
   - Compatible with persistent volumes

2. **Kustomize over Helm**
   - Built into kubectl (`kubectl apply -k`), so no separate tool to install
   - YAML-based, no templating language
   - Easier code review of actual manifests
   - Lower complexity for our use case

3. **Separate dev/prod Overlays**
   - Code reuse via base inheritance
   - Clear environment differentiation
   - Easy to add staging, testing, etc.
   - Single source of truth for base configuration

4. **ConfigMap-based A2A Integration**
   - Runtime configuration without rebuilding images
   - Environment-specific values (discovery interval, etc.)
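   - Rollback by reverting the change in Git and re-applying (`kubectl rollout undo` restores the StatefulSet pod template, not ConfigMap contents)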

5. **Pod Anti-Affinity**
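   - Development: preferred (best-effort spreading; only relevant if the dev overlay is scaled beyond 1 replica)
   - Production: required (at most one pod per node, so 5 replicas need at least 5 schedulable nodes)
   - Keeps a single node failure from taking out more than one replica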
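
The production anti-affinity rule could look roughly like the excerpt below. This is a sketch, not the actual patch: the `app: kagent` label is an assumption, and the StatefulSet name `kagent` is inferred from the pod names used elsewhere in this ADR.

```yaml
# overlays/prod/statefulset-patch.yaml (illustrative excerpt)
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: kagent                 # inferred from pod names kagent-0, kagent-1
spec:
  replicas: 5
  template:
    spec:
      affinity:
        podAntiAffinity:
          # "Required": the scheduler will not place two matching pods
          # on the same node; a pod with no eligible node stays Pending.
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                  app: kagent  # assumed pod label
              topologyKey: kubernetes.io/hostname
```

A development overlay would use `preferredDuringSchedulingIgnoredDuringExecution` instead, which takes a `weight` and a `podAffinityTerm` and lets the scheduler co-locate pods when no other node is available.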

## Rationale

**Why Kustomize?**
- No external dependencies or DSLs to learn
- kubectl integration (no new tools for operators)
- Transparent YAML (easier auditing)
- Suitable for our scale (not complex microservices)
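
For illustration, the base could be tied together by a file like the one below. The directory tree in this ADR does not list a `kustomization.yaml` under `base/`, so its presence and contents are assumptions; Kustomize does need one for the overlays to reference the directory.

```yaml
# base/kustomization.yaml (assumed; not shown in the directory tree)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - namespace.yaml
  - rbac.yaml
  - configmap.yaml
  - statefulset.yaml
  - service.yaml

# Typical operator workflow, using only kubectl:
#   kubectl kustomize kubernetes/kagent/overlays/dev   # render, do not apply
#   kubectl apply -k kubernetes/kagent/overlays/dev    # render and apply
```

Rendering with `kubectl kustomize` before applying gives reviewers the final manifests to inspect, which is the auditing benefit described above.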

**Why StatefulSet?**
- Pod names are predictable (kagent-0, kagent-1, etc.)
- Simplifies debugging and troubleshooting
- Compatible with persistent volumes for future phase
- A2A clients can reference stable endpoints
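
Stable per-pod endpoints depend on the StatefulSet being paired with a headless Service. The sketch below shows that pairing under stated assumptions: the Service name, the `app: kagent` label, the port, and the image are all hypothetical, and the real `service.yaml` may differ.

```yaml
# Headless Service: clusterIP None gives every pod its own DNS record.
apiVersion: v1
kind: Service
metadata:
  name: kagent                  # assumed Service name
  namespace: kagent
spec:
  clusterIP: None
  selector:
    app: kagent                 # assumed pod label
  ports:
    - name: a2a                 # hypothetical port name
      port: 8080                # hypothetical port number
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: kagent
  namespace: kagent
spec:
  serviceName: kagent           # must match the headless Service above
  replicas: 1
  selector:
    matchLabels:
      app: kagent
  template:
    metadata:
      labels:
        app: kagent
    spec:
      containers:
        - name: kagent
          image: example.invalid/kagent:latest   # placeholder image
          ports:
            - name: a2a
              containerPort: 8080
```

With these names and the default cluster domain, the first pod would be reachable at `kagent-0.kagent.kagent.svc.cluster.local`, and that name survives pod restarts.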

**Why ConfigMap for A2A settings?**
- No image rebuild required for config changes
- Easy to adjust discovery intervals per environment
- Transparent configuration in Git
- Can be patched/updated at runtime (pods that read the values as environment variables need a restart to pick them up)
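
As a rough illustration, the A2A settings might be expressed as follows. The ConfigMap name, the keys, and the values are hypothetical; the actual `configmap.yaml` is not reproduced in this ADR.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: kagent-a2a-config             # hypothetical name
  namespace: kagent
data:
  # Hypothetical keys; an overlay would patch these per environment.
  A2A_DISCOVERY_INTERVAL: "30s"
  VAPORA_A2A_ENDPOINT: "http://vapora.example.invalid:8080"   # placeholder URL

# If the pods consume these values as environment variables, a change
# takes effect only after a restart, for example:
#   kubectl rollout restart statefulset/kagent -n kagent
```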

**Why separate dev/prod?**
- Resource requirements differ dramatically
- Logging levels should differ
- Scaling policies differ
- Both overlays go through the same code review

## Consequences

**Positive:**
- Same base manifests in dev and prod (only replicas, resources, logging, and anti-affinity differ)
- Easy to add more environments (staging, testing, etc.)
- Standard kubectl workflows
- Clear separation of concerns
- Configuration in version control
- No external tools beyond kubectl

**Negative:**
- Manual scaling (no HorizontalPodAutoscaler initially)
- Kustomize has limitations for complex overlays
- No templating language flexibility
- Requires understanding of Kubernetes primitives

## Alternatives Considered

1. **Helm Charts**
   - Rejected: Go templates more complex than needed
   - Revisit if complexity demands it

2. **Deployment + Horizontal Pod Autoscaler**
   - Rejected: StatefulSet provides stability needed for debugging
   - Can layer HPA over StatefulSet if needed

3. **All-in-one manifest**
   - Rejected: Code duplication between dev/prod
   - No clear environment separation

## Migration Path

1. **Current:** Kustomize with manual scaling
2. **Phase 2:** Add HorizontalPodAutoscaler overlay
3. **Phase 3:** Add Prometheus/Grafana monitoring
4. **Phase 4:** Integrate with Istio service mesh
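
A Phase 2 overlay could add an autoscaler along these lines. This is a sketch under assumptions: the replica bounds and the CPU target are illustrative values, not decisions recorded in this ADR.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: kagent
  namespace: kagent
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: StatefulSet            # HPA can target a StatefulSet directly
    name: kagent
  minReplicas: 5                 # matches the current production replica count
  maxReplicas: 10                # illustrative upper bound
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70 # illustrative threshold
```

Two interactions are worth checking before adopting this: with required anti-affinity, the number of schedulable nodes caps the effective replica count, and a fixed `replicas` value in the StatefulSet patch would reset the autoscaler's choice on every apply.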

## File Structure Rationale

```
base/                          # Applied to all environments
├── namespace.yaml             # Single kagent namespace
├── rbac.yaml                  # Shared RBAC policies
├── configmap.yaml             # Base A2A configuration
├── statefulset.yaml           # Base StatefulSet template
└── service.yaml               # Shared services

overlays/dev/                  # Development-specific
├── kustomization.yaml         # References base, lists patches
└── statefulset-patch.yaml     # 1 replica, lower resources

overlays/prod/                 # Production-specific
├── kustomization.yaml         # References base, lists patches
└── statefulset-patch.yaml     # 5 replicas, higher resources
```
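
A minimal overlay `kustomization.yaml` matching this layout might look like the following. It is a sketch: it assumes `base/` contains its own `kustomization.yaml` and that a reasonably recent Kustomize is in use (older versions use `patchesStrategicMerge` in place of `patches`).

```yaml
# overlays/prod/kustomization.yaml (illustrative)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base                   # inherit every base manifest
patches:
  - path: statefulset-patch.yaml # 5 replicas, higher resources
```

The dev overlay would be identical apart from the contents of its own `statefulset-patch.yaml`, which keeps the environment differences confined to one file per overlay.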

## Related Decisions

- ADR-0001: A2A Protocol Implementation
- ADR-0003: Error Handling and Protocol Compliance

## References

- Kustomize Documentation: https://kustomize.io/
- Kubernetes StatefulSets: https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/
- kubectl: https://kubernetes.io/docs/reference/kubectl/