# Cilium Task Service
## Overview
The Cilium task service installs and configures [Cilium](https://cilium.io/), a cloud-native networking, observability, and security solution built on eBPF. Cilium offers advanced networking features for Kubernetes environments, including load balancing, network policies, and service mesh capabilities.
## Features
### Core Networking
- **eBPF-based Networking** - High-performance, programmable networking using eBPF
- **Container Network Interface (CNI)** - Full CNI plugin for Kubernetes
- **Load Balancing** - Layer 3/4 and Layer 7 load balancing
- **Network Address Translation (NAT)** - Advanced NAT capabilities
- **IP Address Management (IPAM)** - Flexible IP address allocation
### Security & Policy
- **Network Policies** - Kubernetes NetworkPolicy and CiliumNetworkPolicy
- **Identity-based Security** - Application-aware security policies
- **Encryption** - Transparent encryption with IPsec and WireGuard
- **Runtime Security** - Real-time threat detection and prevention
- **Service Mesh Security** - mTLS and authentication for service mesh
### Observability
- **Hubble** - Built-in network observability platform
- **Flow Monitoring** - Real-time network flow visibility
- **Service Map** - Visual service dependency mapping
- **Metrics & Monitoring** - Prometheus metrics integration
- **Distributed Tracing** - Jaeger integration for request tracing
### Advanced Features
- **Service Mesh** - Layer 7 proxy and service mesh capabilities
- **Multi-Cluster** - Cross-cluster connectivity and policy
- **Gateway API** - Support for Kubernetes Gateway API
- **BGP Support** - Border Gateway Protocol for advanced routing
- **Bandwidth Management** - Traffic shaping and QoS
## Configuration
### Basic Configuration
```kcl
cilium: Cilium = {
    name: "cilium"
    version: "1.15.0"
    cluster_name: "kubernetes"
    mode: "standard"
}
```
### Production Configuration
```kcl
cilium: Cilium = {
    name: "cilium"
    version: "1.15.0"
    cluster_name: "production-cluster"
    mode: "production"
    networking: {
        ipam: {
            mode: "kubernetes"
            cluster_pool_ipv4_cidr: "10.0.0.0/8"
            cluster_pool_ipv4_mask_size: 24
        }
        tunnel: "vxlan"
        native_routing_cidr: "10.0.0.0/8"
    }
    security: {
        network_policy: true
        host_firewall: true
        encryption: {
            enabled: true
            type: "ipsec"
        }
    }
    hubble: {
        enabled: true
        relay: {
            enabled: true
            replicas: 2
        }
        ui: {
            enabled: true
            ingress: {
                enabled: true
                hosts: ["hubble.company.com"]
            }
        }
    }
    operator: {
        replicas: 2
        resources: {
            limits: {
                cpu: "1000m"
                memory: "1Gi"
            }
            requests: {
                cpu: "100m"
                memory: "128Mi"
            }
        }
    }
    agent: {
        resources: {
            limits: {
                cpu: "4000m"
                memory: "4Gi"
            }
            requests: {
                cpu: "100m"
                memory: "512Mi"
            }
        }
    }
}
```
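The `cluster_pool_ipv4_cidr` and `cluster_pool_ipv4_mask_size` values above determine capacity when Cilium allocates per-node pod CIDRs from a cluster pool: the pool prefix and the per-node mask size fix how many node blocks fit in the pool and how many addresses each block holds. The following is a minimal sketch of that arithmetic. The function names `pool_node_cidrs` and `node_pod_addresses` are hypothetical helpers, not part of Cilium or the provisioning CLI.

```bash
# Hypothetical helpers (not part of Cilium or the provisioning CLI).

# Number of per-node pod CIDRs that fit in the cluster pool.
# $1 = pool prefix length (8 for 10.0.0.0/8), $2 = per-node mask size.
pool_node_cidrs() {
    echo $(( 1 << ($2 - $1) ))
}

# Number of IPv4 addresses in each per-node pod CIDR.
# $1 = per-node mask size (24 in the example above).
node_pod_addresses() {
    echo $(( 1 << (32 - $1) ))
}

pool_node_cidrs 8 24      # prints 65536
node_pod_addresses 24     # prints 256
```

With a `/8` pool and a mask size of 24, the pool holds 65536 node blocks of 256 addresses each. The number of usable pod IPs per node is slightly lower, since some addresses in each block are reserved.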
### Service Mesh Configuration
```kcl
cilium: Cilium = {
    name: "cilium"
    version: "1.15.0"
    # ... base configuration
    service_mesh: {
        enabled: true
        envoy: {
            enabled: true
            log_level: "info"
        }
        ingress: {
            enabled: true
            load_balancer_class: "cilium"
        }
        gateway_api: {
            enabled: true
            secret_namespace: "cilium-secrets"
        }
    }
    l7_proxy: true
    enable_l7_proxy_stats: true
    proxy_prometheus_port: 9964
}
```
### Multi-Cluster Configuration
```kcl
cilium: Cilium = {
    name: "cilium"
    version: "1.15.0"
    # ... base configuration
    cluster: {
        name: "cluster-1"
        id: 1
    }
    clustermesh: {
        enabled: true
        use_apiserver: true
        apiserver: {
            replicas: 3
            tls: {
                auto: {
                    enabled: true
                }
            }
        }
        config: {
            enabled: true
        }
    }
    external_workloads: {
        enabled: true
    }
}
```
### Advanced Security Configuration
```kcl
cilium: Cilium = {
    name: "cilium"
    version: "1.15.0"
    # ... base configuration
    security: {
        network_policy: true
        host_firewall: true
        encryption: {
            enabled: true
            type: "wireguard"
        }
        policy_enforcement: "default"
        host_protection: {
            enabled: true
            enforce: true
        }
        auth: {
            mutual: {
                spire: {
                    enabled: true
                    install: true
                }
            }
        }
    }
    bpf: {
        masquerade: true
        host_routing: true
        tproxy: true
    }
    enable_runtime_device_id: true
    enable_bandwidth_manager: true
}
```
## Usage
### Deploy Cilium
```bash
./core/nulib/provisioning taskserv create cilium --infra <infrastructure-name>
```
### List Available Task Services
```bash
./core/nulib/provisioning taskserv list
```
### SSH to Cilium Server
```bash
./core/nulib/provisioning server ssh <cilium-server>
```
### Service Management
```bash
# Check Cilium status
cilium status
# Check connectivity
cilium connectivity test
# Check cluster mesh status
cilium clustermesh status
# View Cilium configuration
cilium config view
```
### Network Policy Management
```bash
# List network policies
kubectl get networkpolicies --all-namespaces
kubectl get ciliumnetworkpolicies --all-namespaces
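# Check policy enforcement per endpoint (cilium-dbg is the agent CLI; run it inside a Cilium agent pod)
cilium-dbg endpoint list
# View dropped flows, including policy denials
hubble observe --verdict DROPPED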
```
### Hubble Observability
```bash
# Enable Hubble
cilium hubble enable
# Port forward to Hubble UI
cilium hubble ui
# Observe network flows
hubble observe
# List flows with filters
hubble observe --from-pod default/frontend --to-service default/backend
# List the nodes known to Hubble
hubble list nodes
```
### Troubleshooting Commands
```bash
# Check agent status
cilium status --verbose
# Validate installation
cilium connectivity test
# Check eBPF maps (agent CLI; run inside a Cilium agent pod)
cilium-dbg map list
# View agent logs
kubectl logs -n kube-system -l k8s-app=cilium
# Debug connectivity
cilium-dbg endpoint list
cilium-dbg policy trace
```
## Architecture
### System Architecture
```
┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│ Applications    │────│ Cilium Agent     │────│ eBPF Kernel     │
│                 │    │                  │    │                 │
│ • Pods          │    │ • CNI Plugin     │    │ • Network       │
│ • Services      │────│ • Policy Engine  │────│ • Security      │
│ • Ingress       │    │ • Load Balancer  │    │ • Observability │
│ • Gateway API   │    │ • Service Mesh   │    │ • Performance   │
└─────────────────┘    └──────────────────┘    └─────────────────┘
```
### Component Architecture
```
┌─────────────────────────────────────────────────────────────┐
│                       Cilium Platform                       │
├─────────────────────────────────────────────────────────────┤
│ Hubble (Observability)  │ Service Mesh       │ Security     │
│                         │                    │              │
│ • Flow Monitoring       │ • L7 Proxy         │ • Network    │
│ • Service Map           │ • Ingress          │   Policies   │
│ • Metrics               │ • Gateway API      │ • Encryption │
│ • Distributed Tracing   │ • Load Balancing   │ • Identity   │
├─────────────────────────────────────────────────────────────┤
│                  Cilium Agent (DaemonSet)                   │
├─────────────────────────────────────────────────────────────┤
│                eBPF Programs (Kernel Space)                 │
└─────────────────────────────────────────────────────────────┘
```
### Network Topology
- **Pod-to-Pod Communication** - Direct eBPF-based forwarding
- **Service Load Balancing** - In-kernel load balancing without kube-proxy
- **Ingress/Egress** - Gateway API and Ingress controller
- **Network Policies** - Identity-based security enforcement
- **Cross-Cluster** - Cluster mesh for multi-cluster networking
## Supported Operating Systems
- Ubuntu 20.04+ / Debian 11+
- CentOS 8+ / RHEL 8+ / Fedora 35+
- Amazon Linux 2+
## System Requirements
### Minimum Requirements
- **Kernel Version**: Linux 4.19.57+ (5.4+ recommended)
- **CPU**: 2 cores (4 cores recommended)
- **RAM**: 2GB (4GB+ recommended)
- **Architecture**: x86_64, arm64
### Production Requirements
- **Kernel Version**: Linux 5.4+
- **CPU**: 4+ cores
- **RAM**: 8GB+ (depends on cluster size)
- **Network**: 10Gbps+ for high-throughput workloads
### Kernel Features
- **eBPF JIT compiler** - Required for optimal performance
- **CONFIG_BPF=y** - eBPF support
- **CONFIG_BPF_SYSCALL=y** - eBPF syscall support
- **CONFIG_NET_CLS_BPF=m** - BPF classifier
- **CONFIG_BPF_JIT=y** - eBPF JIT compiler
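
The kernel version and options listed above can be checked on a node before deployment. The sketch below assumes a shell with `sort -V` available (GNU coreutils or BusyBox) and a readable kernel config file. All function names are hypothetical helpers and not part of Cilium or the provisioning CLI. The check treats `=y` and `=m` as equally acceptable, which is an assumption.

```bash
# Hypothetical helper functions; names are illustrative only.

# Strip the distribution suffix from a kernel release string,
# e.g. "5.15.0-91-generic" -> "5.15.0".
kernel_base_version() {
    printf '%s\n' "${1%%[!0-9.]*}"
}

# Succeed when version $1 is greater than or equal to version $2.
# Relies on `sort -V` for version-aware ordering.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Succeed when option $1 is built in (=y) or a module (=m) in config file $2.
kernel_config_has() {
    grep -Eq "^$1=(y|m)$" "$2"
}

# Report on the kernel version and the options listed in this section.
# $1 = kernel release string, $2 = path to the kernel config file.
check_cilium_kernel() {
    base=$(kernel_base_version "$1")
    if version_ge "$base" "4.19.57"; then
        echo "kernel $base: ok"
    else
        echo "kernel $base: older than 4.19.57"
    fi
    for opt in CONFIG_BPF CONFIG_BPF_SYSCALL CONFIG_NET_CLS_BPF CONFIG_BPF_JIT; do
        if kernel_config_has "$opt" "$2"; then
            echo "$opt: ok"
        else
            echo "$opt: missing"
        fi
    done
}
```

A typical invocation would be `check_cilium_kernel "$(uname -r)" "/boot/config-$(uname -r)"`. The config file location varies by distribution; some kernels expose it as `/proc/config.gz` instead, which would need to be decompressed first.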
## Troubleshooting
### Installation Issues
```bash
# Check the kernel and agent versions
uname -r
cilium-dbg version
# Verify eBPF support and datapath details
cilium-dbg status --verbose
# Show health status for all components, not only failing ones
cilium-dbg status --all-health
# Review the active configuration
cilium config view
```
### Networking Issues
```bash
# Test connectivity between pods
cilium connectivity test
# Check endpoint status (agent CLI; run inside a Cilium agent pod)
cilium-dbg endpoint list
# Debug policy enforcement
cilium-dbg policy trace --src-identity <id> --dst-identity <id>
# Check service load balancing
cilium-dbg service list
```
### Performance Issues
```bash
# Check eBPF program statistics
cilium-dbg bpf stats
# Monitor CPU and memory usage
kubectl top pods -n kube-system -l k8s-app=cilium
# Check for packet drops
cilium-dbg metrics list | grep drop
# Stream live flows to spot drops and slow responses
hubble observe --follow
```
### Policy Issues
```bash
# Check policy enforcement status per endpoint
kubectl get ciliumendpoints --all-namespaces
# Debug policy enforcement (agent CLI; run inside a Cilium agent pod)
cilium-dbg policy trace --src-identity <id> --dst-identity <id>
# View applied policies
kubectl get ciliumnetworkpolicies -A
```
### Hubble Issues
```bash
# Check Hubble status (needs access to Hubble Relay, e.g. via `cilium hubble port-forward`)
hubble status
# Restart Hubble relay
kubectl rollout restart deployment/hubble-relay -n kube-system
# Check Hubble UI
kubectl port-forward -n kube-system svc/hubble-ui 12000:80
```
## Security Considerations
### Network Security
- **Zero Trust Networking** - Default deny with explicit allow policies
- **Identity-based Security** - Cryptographic identity for all workloads
- **Encryption** - Transparent encryption with IPsec or WireGuard
- **Runtime Protection** - Real-time threat detection and response
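
A default-deny baseline, as described in the first bullet above, can be expressed with a standard Kubernetes `NetworkPolicy` that selects every pod in a namespace and allows nothing. The sketch below prints such a manifest for a given namespace. The function name `default_deny_manifest` and the policy name `default-deny-all` are illustrative choices, not something the task service creates.

```bash
# Hypothetical helper: print a default-deny NetworkPolicy for namespace $1.
# An empty podSelector matches every pod in the namespace; listing both
# policy types with no rules blocks all ingress and egress traffic.
default_deny_manifest() {
    cat <<EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: $1
spec:
  podSelector: {}
  policyTypes:
    - Ingress
    - Egress
EOF
}
```

The manifest could be applied with `default_deny_manifest default | kubectl apply -f -`. Denying all egress also blocks DNS lookups, so an explicit allow rule for the cluster DNS service is usually needed alongside application-specific allow policies.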
### Policy Management
- **Least Privilege** - Implement minimal required network access
- **Segmentation** - Use network policies for micro-segmentation
- **Compliance** - Built-in compliance reporting and auditing
- **Threat Detection** - Continuous monitoring for suspicious activity
### Operational Security
- **RBAC Integration** - Kubernetes RBAC for policy management
- **Audit Logging** - Comprehensive audit trail for all network events
- **Certificate Management** - Automatic certificate rotation
- **Secure Defaults** - Security-first default configuration
## Performance Optimization
### eBPF Optimization
- **JIT Compilation** - Enable eBPF JIT for optimal performance
- **CPU Affinity** - Pin Cilium agents to specific CPU cores
- **Kernel Bypass** - Use XDP for ultra-low latency applications
- **Memory Management** - Tune eBPF map sizes for workload
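
Whether the eBPF JIT is active at runtime can be read from the `net.core.bpf_jit_enable` sysctl. The sketch below maps its value to a description and prints the live setting when the file is readable. The function name `describe_bpf_jit` is a hypothetical helper.

```bash
# Hypothetical helper: describe a net.core.bpf_jit_enable value ($1).
describe_bpf_jit() {
    case "$1" in
        0) echo "disabled (programs run in the interpreter)" ;;
        1) echo "enabled" ;;
        2) echo "enabled with debug output" ;;
        *) echo "unknown value: $1" ;;
    esac
}

# Print the live setting when the sysctl file is present and readable.
jit_file=/proc/sys/net/core/bpf_jit_enable
if [ -r "$jit_file" ]; then
    describe_bpf_jit "$(cat "$jit_file")"
fi
```

If the value is `0`, the JIT can be turned on with `sysctl -w net.core.bpf_jit_enable=1`, provided the kernel was built with `CONFIG_BPF_JIT=y`.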
### Network Performance
- **Native Routing** - Use native routing when possible
- **Hardware Offload** - Leverage NIC hardware acceleration
- **Bandwidth Management** - Configure traffic shaping and QoS
- **Connection Pooling** - Optimize connection reuse
### Monitoring Optimization
- **Selective Monitoring** - Monitor only critical flows
- **Metric Filtering** - Reduce metric cardinality
- **Sampling** - Use flow sampling for high-traffic environments
- **Resource Limits** - Set appropriate resource limits
## Integration Examples
### Prometheus Monitoring
```yaml
# ServiceMonitor for Cilium metrics
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: cilium-agent
  namespace: kube-system
spec:
  selector:
    matchLabels:
      k8s-app: cilium
  endpoints:
    - port: prometheus
      interval: 30s
      path: /metrics
```
### Grafana Dashboard
```yaml
# Grafana dashboard configuration
apiVersion: v1
kind: ConfigMap
metadata:
  name: cilium-dashboard
data:
  cilium-overview.json: |
    {
      "dashboard": {
        "title": "Cilium Overview",
        "panels": [
          {
            "title": "Network Policy Drops",
            "type": "graph",
            "targets": [
              {
                "expr": "rate(cilium_drop_count_total[5m])"
              }
            ]
          }
        ]
      }
    }
```
### Network Policy Examples
```yaml
# Example CiliumNetworkPolicy
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: frontend-to-backend
  namespace: default
spec:
  endpointSelector:
    matchLabels:
      app: frontend
  egress:
    - toEndpoints:
        - matchLabels:
            app: backend
      toPorts:
        - ports:
            - port: "8080"
              protocol: TCP
```
## Resources
- **Official Documentation**: [docs.cilium.io](https://docs.cilium.io)
- **GitHub Repository**: [cilium/cilium](https://github.com/cilium/cilium)
- **Hubble Documentation**: [docs.cilium.io/en/stable/observability/hubble](https://docs.cilium.io/en/stable/observability/hubble)
- **eBPF Documentation**: [ebpf.org](https://ebpf.org)
- **Community**: [cilium.slack.com](https://cilium.slack.com)