Kubernetes Orchestration & Auto-Scaling
Production-grade Kubernetes deployment with horizontal, vertical, and event-driven auto-scaling for both the control plane and agent runtime.
Cluster Architecture
Lobstack runs on a dedicated Kubernetes cluster with strict namespace isolation. The control plane (the Lobstack API server) and the agent runtime (individual AI agent pods) run in separate namespaces, each with its own scaling policies, resource quotas, and security contexts.
Pod Security Standards
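The namespace split described above, with Pod Security Standards enforced per namespace, could be expressed as ordinary manifests. The sketch below builds them as Python dictionaries. It makes three assumptions, none of which the document states: hypothetical namespace names (lobstack-system and lobstack-agents), the restricted Pod Security Standards level, and placeholder quota values.

```python
import json

# Hypothetical namespace names; the real names are not given in this document.
CONTROL_PLANE_NS = "lobstack-system"
AGENT_NS = "lobstack-agents"


def namespace(name: str, pss_level: str = "restricted") -> dict:
    """Namespace manifest that enforces a Pod Security Standards level.

    The pod-security.kubernetes.io/* labels are read by the built-in
    Pod Security Admission controller; "restricted" is an assumed level.
    """
    return {
        "apiVersion": "v1",
        "kind": "Namespace",
        "metadata": {
            "name": name,
            "labels": {
                "pod-security.kubernetes.io/enforce": pss_level,
                "pod-security.kubernetes.io/warn": pss_level,
            },
        },
    }


def agent_quota(cpu: str, memory: str, pods: int) -> dict:
    """ResourceQuota capping total CPU/memory requests and pod count in the agent namespace."""
    return {
        "apiVersion": "v1",
        "kind": "ResourceQuota",
        "metadata": {"name": "agent-quota", "namespace": AGENT_NS},
        "spec": {
            "hard": {
                "requests.cpu": cpu,
                "requests.memory": memory,
                "pods": str(pods),
            }
        },
    }


# Quota figures are placeholders, not Lobstack's real limits.
manifests = [
    namespace(CONTROL_PLANE_NS),
    namespace(AGENT_NS),
    agent_quota(cpu="40", memory="80Gi", pods=100),
]
print(json.dumps(manifests, indent=2))
```

Each dictionary corresponds one-to-one to a YAML manifest with the same fields.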
Control Plane Deployment
The Lobstack API runs as a multi-replica Deployment with anti-affinity rules to spread pods across availability zones. This ensures no single zone failure can take down the platform.
| Property | Value | Purpose |
|---|---|---|
| Min Replicas | 3 | Always-on availability across zones |
| Max Replicas | 20 (50 in production) | Handle traffic spikes |
| Strategy | RollingUpdate (maxSurge: 1, maxUnavailable: 0) | Zero-downtime deploys |
| PDB | minAvailable: 2 | Survive node drains and upgrades |
| Topology Spread | maxSkew: 1 per zone | Even distribution across AZs |
| Startup Probe | 5s interval, 12 failures | 60s grace period for cold starts |
| Security Context | runAsNonRoot, readOnlyRootFilesystem, drop ALL | Minimal attack surface |
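The settings in this table correspond to standard Deployment and PodDisruptionBudget fields. The sketch below shows one way to express them. Only the numeric values come from the table. The image, container port 8080, the /healthz probe path, the app label, and the DoNotSchedule spread policy are hypothetical.

```python
import json

APP_LABELS = {"app": "lobstack-api"}  # hypothetical label


def api_deployment(replicas: int = 3) -> dict:
    container = {
        "name": "api",
        "image": "registry.example.com/lobstack-api:latest",  # hypothetical image
        "ports": [{"containerPort": 8080}],  # hypothetical port
        # 5s period x 12 failures: up to 60s to start before the kubelet restarts it.
        "startupProbe": {
            "httpGet": {"path": "/healthz", "port": 8080},  # hypothetical endpoint
            "periodSeconds": 5,
            "failureThreshold": 12,
        },
        "securityContext": {
            "runAsNonRoot": True,
            "readOnlyRootFilesystem": True,
            "capabilities": {"drop": ["ALL"]},
        },
    }
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": "lobstack-api", "labels": APP_LABELS},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": APP_LABELS},
            # Start one new pod before stopping an old one; never drop below desired count.
            "strategy": {
                "type": "RollingUpdate",
                "rollingUpdate": {"maxSurge": 1, "maxUnavailable": 0},
            },
            "template": {
                "metadata": {"labels": APP_LABELS},
                "spec": {
                    # Keep per-zone pod counts within 1 of each other.
                    "topologySpreadConstraints": [{
                        "maxSkew": 1,
                        "topologyKey": "topology.kubernetes.io/zone",
                        "whenUnsatisfiable": "DoNotSchedule",  # assumed policy
                        "labelSelector": {"matchLabels": APP_LABELS},
                    }],
                    "containers": [container],
                },
            },
        },
    }


def api_pdb() -> dict:
    """Voluntary disruptions (drains, upgrades) must leave at least 2 pods running."""
    return {
        "apiVersion": "policy/v1",
        "kind": "PodDisruptionBudget",
        "metadata": {"name": "lobstack-api"},
        "spec": {"minAvailable": 2, "selector": {"matchLabels": APP_LABELS}},
    }


def startup_budget_seconds(probe: dict) -> int:
    """Time a container gets to pass its startup probe."""
    return probe["periodSeconds"] * probe["failureThreshold"]


deploy = api_deployment()
print(json.dumps([deploy, api_pdb()], indent=2))
```

The startup_budget_seconds helper spells out the table's arithmetic: a 5 s interval times 12 failures gives the 60 s grace period.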
Agent Runtime
Each AI agent runs in its own isolated Kubernetes pod with a gVisor sandbox runtime. Agents are created dynamically when a user provisions a new agent and destroyed on teardown. The orchestrator manages the full lifecycle:
Dynamic Pod Creation
The K8s orchestrator creates a dedicated pod and ClusterIP Service for each agent, with secrets injected from Vault.
gVisor Sandbox
Every agent pod runs with RuntimeClass gvisor (handler runsc). gVisor is an application kernel that intercepts the pod's system calls and handles them in user space, limiting direct exposure of the host kernel.
Resource Isolation
CPU and memory limits are enforced per tier, from 1 vCPU / 2Gi (Starter) up to 8 vCPU / 16Gi (Enterprise). A ResourceQuota caps total consumption across the agent namespace.
Network Isolation
NetworkPolicies prevent inter-agent communication. Each pod can only reach the Lobstack API and external AI APIs.
Ephemeral Workspace
Each agent's workspace is an emptyDir volume with a per-tier size limit, from 5Gi (Starter) up to 50Gi (Enterprise). Its contents last only as long as the pod and are deleted when the pod is removed.
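The network isolation rule could be implemented as a single NetworkPolicy on the agent namespace. This sketch rests on three assumptions. The namespace names are hypothetical. External AI APIs are taken to be reached over HTTPS on port 443. The pod and service CIDRs are assumed to fall inside the private ranges excluded below. Whether ipBlock rules apply to in-cluster traffic depends on the network plugin. The ingress rule, which admits traffic only from the API namespace, is the part that blocks agent-to-agent connections.

```python
import json

# Hypothetical namespace names; only the policy's intent comes from the text.
AGENT_NS = "lobstack-agents"
API_NS = "lobstack-system"


def ns_selector(name: str) -> dict:
    # kubernetes.io/metadata.name is set automatically on every namespace.
    return {"namespaceSelector": {"matchLabels": {"kubernetes.io/metadata.name": name}}}


def agent_network_policy() -> dict:
    """Isolate every pod in the agent namespace from other agents."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "agent-isolation", "namespace": AGENT_NS},
        "spec": {
            "podSelector": {},  # empty selector: applies to all pods in AGENT_NS
            "policyTypes": ["Ingress", "Egress"],
            # Ingress only from the API namespace, so agents cannot reach each other.
            "ingress": [{"from": [ns_selector(API_NS)]}],
            "egress": [
                # Lobstack API.
                {"to": [ns_selector(API_NS)]},
                # Cluster DNS, needed to resolve external API hostnames.
                {
                    "to": [ns_selector("kube-system")],
                    "ports": [{"protocol": "UDP", "port": 53}, {"protocol": "TCP", "port": 53}],
                },
                # External AI APIs over HTTPS, excluding private (assumed in-cluster) ranges.
                {
                    "to": [{"ipBlock": {
                        "cidr": "0.0.0.0/0",
                        "except": ["10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16"],
                    }}],
                    "ports": [{"protocol": "TCP", "port": 443}],
                },
            ],
        },
    }


print(json.dumps(agent_network_policy(), indent=2))
```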
| Tier | CPU Request | CPU Limit | Memory (Request → Limit) | Workspace (emptyDir limit) |
|---|---|---|---|---|
| Starter | 250m | 1 vCPU | 512Mi → 2Gi | 5Gi |
| Standard | 500m | 2 vCPU | 1Gi → 4Gi | 10Gi |
| Performance | 1 vCPU | 4 vCPU | 2Gi → 8Gi | 20Gi |
| Enterprise | 2 vCPU | 8 vCPU | 4Gi → 16Gi | 50Gi |
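Putting the per-agent pieces together, the orchestrator's output for one agent might look roughly like the sketch below. The tier values are copied from the table. The namespace, labels, image, port, mount path, and Vault role are hypothetical. The two vault.hashicorp.com annotations assume secrets arrive through HashiCorp's Vault Agent Injector rather than some other mechanism.

```python
import json

# Copied from the tier table: (request, limit) pairs and workspace size.
TIERS = {
    "starter":     {"cpu": ("250m", "1"), "memory": ("512Mi", "2Gi"), "workspace": "5Gi"},
    "standard":    {"cpu": ("500m", "2"), "memory": ("1Gi", "4Gi"),   "workspace": "10Gi"},
    "performance": {"cpu": ("1", "4"),    "memory": ("2Gi", "8Gi"),   "workspace": "20Gi"},
    "enterprise":  {"cpu": ("2", "8"),    "memory": ("4Gi", "16Gi"),  "workspace": "50Gi"},
}

AGENT_NS = "lobstack-agents"  # hypothetical namespace name


def agent_manifests(agent_id: str, tier: str) -> list:
    """Pod plus ClusterIP Service for one agent (names and ports are hypothetical)."""
    t = TIERS[tier]
    labels = {"app": "lobstack-agent", "agent-id": agent_id}
    pod = {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {
            "name": f"agent-{agent_id}",
            "namespace": AGENT_NS,
            "labels": labels,
            "annotations": {
                # Vault Agent Injector: adds a Vault Agent container that renders secrets.
                "vault.hashicorp.com/agent-inject": "true",
                "vault.hashicorp.com/role": "lobstack-agent",  # hypothetical Vault role
            },
        },
        "spec": {
            "runtimeClassName": "gvisor",  # RuntimeClass backed by the runsc handler
            "containers": [{
                "name": "agent",
                "image": "registry.example.com/lobstack-agent:latest",  # hypothetical
                "resources": {
                    "requests": {"cpu": t["cpu"][0], "memory": t["memory"][0]},
                    "limits": {"cpu": t["cpu"][1], "memory": t["memory"][1]},
                },
                "securityContext": {
                    "runAsNonRoot": True,
                    "allowPrivilegeEscalation": False,
                    "capabilities": {"drop": ["ALL"]},
                    "seccompProfile": {"type": "RuntimeDefault"},
                },
                "volumeMounts": [{"name": "workspace", "mountPath": "/workspace"}],
            }],
            # Ephemeral workspace: removed with the pod; the kubelet evicts the pod
            # if usage exceeds sizeLimit.
            "volumes": [{"name": "workspace", "emptyDir": {"sizeLimit": t["workspace"]}}],
        },
    }
    service = {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": f"agent-{agent_id}", "namespace": AGENT_NS},
        "spec": {
            "type": "ClusterIP",
            "selector": labels,  # routes to exactly this agent's pod
            "ports": [{"port": 8080, "targetPort": 8080}],  # hypothetical port
        },
    }
    return [pod, service]


print(json.dumps(agent_manifests("a1b2c3", "starter"), indent=2))
```

For an Enterprise agent, this yields a pod with an 8 CPU / 16Gi limit and a 50Gi workspace, plus a Service whose selector matches only that pod's labels.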
Auto-Scaling
Lobstack uses three layers of pod auto-scaling (horizontal, event-driven, and vertical), backed by the cluster autoscaler at the node level, to handle everything from steady-state traffic to sudden spikes in agent provisioning.
Horizontal Pod Autoscaler (HPA)
The Lobstack API scales horizontally based on CPU utilization, memory utilization, and HTTP request rate.
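The HPA computes a replica count for each metric and acts on the largest. The sketch below reproduces the core formula from the Kubernetes HPA documentation, desired = ceil(current × currentMetric / targetMetric). It uses the default 10% tolerance and the 3 to 20 replica bounds from the control-plane table. Stabilization windows, pod readiness handling, and missing-metric logic are left out, and the metric targets in the example are hypothetical. A request-rate metric would normally reach the HPA through the custom metrics API via an adapter; the document does not say which one Lobstack uses.

```python
import math

TOLERANCE = 0.1  # Kubernetes default for --horizontal-pod-autoscaler-tolerance


def proposal(current_replicas: int, current_value: float, target_value: float) -> int:
    """Replica count suggested by a single metric."""
    ratio = current_value / target_value
    if abs(ratio - 1.0) <= TOLERANCE:
        return current_replicas  # close enough to target: no change
    return math.ceil(current_replicas * ratio)


def desired_replicas(current_replicas: int, metrics: list,
                     min_replicas: int = 3, max_replicas: int = 20) -> int:
    """Largest per-metric proposal, clamped to the replica bounds.

    metrics is a list of (current_value, target_value) pairs.
    """
    best = max(proposal(current_replicas, cur, tgt) for cur, tgt in metrics)
    return max(min_replicas, min(max_replicas, best))


# Hypothetical readings as (current, target): CPU %, memory %, requests/s per pod.
readings = [(90, 60), (50, 70), (120, 100)]
print(desired_replicas(3, readings))
```

With 3 replicas at 90% CPU against a 60% target, the CPU proposal is ceil(3 × 1.5) = 5. That outweighs the memory proposal (3) and the request-rate proposal (4), so the HPA scales to 5.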
KEDA Event-Driven Scaling
Agent pods are scaled by KEDA based on the number of pending provisioning requests in the database. When users request new agents, KEDA detects the queue depth and spins up pods proactively.
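KEDA passes the queue depth to an HPA as an average-value target, so the replica count works out to roughly the pending count divided by the per-replica target. A hypothetical sketch of that arithmetic follows. It assumes a target of 5 pending requests per replica, a maximum of 100 replicas, and scale-to-zero when the queue is empty; the document states none of these values.

```python
import math


def keda_desired_replicas(
    pending: int,
    target_per_replica: int = 5,
    min_replicas: int = 0,
    max_replicas: int = 100,
    activation_threshold: int = 0,
) -> int:
    """Replica count for a queue-depth trigger (all parameters hypothetical)."""
    # At or below the activation threshold the scaler is inactive and KEDA
    # scales the workload to its minimum, which may be zero.
    if pending <= activation_threshold:
        return min_replicas
    # Active: the HPA targets an average of target_per_replica items per replica.
    desired = math.ceil(pending / target_per_replica)
    return max(min_replicas, min(max_replicas, desired))


for pending in (0, 1, 12, 1000):
    print(pending, keda_desired_replicas(pending))
```

KEDA handles activation (0 to 1 replicas and back), and the HPA handles scaling from 1 to N. Separating the two is what allows an idle queue to cost nothing.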
Vertical Pod Autoscaler (VPA)
VPA right-sizes resource requests based on observed usage. It tracks CPU and memory consumption over time and adjusts requests to reduce over-provisioning while leaving enough headroom to avoid OOM kills.
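The following is a simplified illustration of how a VPA-style recommendation can be derived: take a high percentile of observed usage and add a safety margin. The real VPA recommender works on decaying histograms and also produces lower and upper bounds. The 90th percentile and 15% margin here are illustrative parameters, and the usage samples are made up.

```python
import math


def percentile(samples: list, q: float) -> float:
    """Nearest-rank percentile for 0 < q <= 1."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(q * len(ordered)))
    return ordered[rank - 1]


def recommend_request(samples: list, q: float = 0.9, margin: float = 0.15) -> float:
    """High-percentile usage plus a safety margin (simplified, VPA-style)."""
    return percentile(samples, q) * (1 + margin)


# Made-up memory usage samples for one container, in MiB.
memory_mi = [400, 410, 420, 430, 440, 450, 460, 480, 500, 900]
print(round(recommend_request(memory_mi)))
```

A percentile ignores rare spikes such as the single 900 MiB sample, which is why a margin is added on top rather than sizing to the observed average.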
Cluster Autoscaler
When pods cannot be scheduled because no node has enough free capacity, the cluster autoscaler provisions new nodes from the cloud provider. It chooses among node groups with the least-waste expander and removes underutilized nodes once they have been unneeded for 5 minutes.
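The least-waste expander prefers the node group that would leave the smallest fraction of new CPU and memory unrequested after the pending pods are placed. The sketch below is a simplified version of that scoring. It assumes the number of nodes each option needs has already been computed, and it uses hypothetical pool sizes.

```python
from dataclasses import dataclass


@dataclass
class NodeGroupOption:
    name: str
    node_cpu_m: int   # allocatable CPU per node, millicores
    node_mem_mi: int  # allocatable memory per node, MiB
    node_count: int   # nodes this option must add to fit the pending pods


def waste_score(opt: NodeGroupOption, pending_cpu_m: int, pending_mem_mi: int) -> float:
    """Fraction of new CPU plus fraction of new memory left unrequested."""
    total_cpu = opt.node_cpu_m * opt.node_count
    total_mem = opt.node_mem_mi * opt.node_count
    wasted_cpu = (total_cpu - pending_cpu_m) / total_cpu
    wasted_mem = (total_mem - pending_mem_mi) / total_mem
    return wasted_cpu + wasted_mem


def least_waste(options: list, pending_cpu_m: int, pending_mem_mi: int) -> NodeGroupOption:
    """Pick the option with the lowest combined waste score."""
    return min(options, key=lambda o: waste_score(o, pending_cpu_m, pending_mem_mi))


# Hypothetical node groups; sizes ignore system reservations.
options = [
    NodeGroupOption("pool-large", node_cpu_m=8000, node_mem_mi=16384, node_count=1),
    NodeGroupOption("pool-small", node_cpu_m=4000, node_mem_mi=8192, node_count=1),
]
# Three Standard-tier agents: 3 x 500m CPU and 3 x 1Gi memory requested.
choice = least_waste(options, pending_cpu_m=1500, pending_mem_mi=3072)
print(choice.name)
```

Both pools can hold the three pods. The smaller one leaves 62.5% of each resource idle versus 81.25% on the larger one, so it wins.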
| Parameter | Value | Description |
|---|---|---|
| scale-down-unneeded-time | 5 minutes | How long a node must stay unneeded before it is removed |
| scale-down-utilization-threshold | 0.5 | Nodes whose requested CPU and memory are both below 50% of allocatable are candidates for removal |
| max-node-provision-time | 10 minutes | Maximum time a new node has to become ready before provisioning is treated as failed |
| balance-similar-node-groups | true | Keeps similar node groups (same instance type and labels) at balanced sizes |
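The scale-down parameters above combine into a simple eligibility test. The sketch follows the cluster autoscaler's definition of utilization: the sum of pod requests divided by allocatable capacity, taking the higher of CPU and memory. It assumes the caller records when each node first dropped below the threshold. It omits the checks the real autoscaler performs before draining a node, such as whether its pods can be rescheduled elsewhere and whether PodDisruptionBudgets allow it.

```python
from dataclasses import dataclass
from typing import Optional

# Values from the cluster autoscaler table above.
UTILIZATION_THRESHOLD = 0.5
UNNEEDED_SECONDS = 5 * 60


@dataclass
class NodeState:
    name: str
    alloc_cpu_m: int       # allocatable CPU, millicores
    alloc_mem_mi: int      # allocatable memory, MiB
    requested_cpu_m: int   # sum of CPU requests of pods on the node
    requested_mem_mi: int  # sum of memory requests of pods on the node
    below_since_s: Optional[float] = None  # when utilization first dropped below threshold


def utilization(n: NodeState) -> float:
    """Requests over allocatable; the higher of CPU and memory counts."""
    return max(n.requested_cpu_m / n.alloc_cpu_m, n.requested_mem_mi / n.alloc_mem_mi)


def removable(n: NodeState, now_s: float) -> bool:
    """True once the node has stayed under the threshold for the unneeded time."""
    if utilization(n) >= UTILIZATION_THRESHOLD or n.below_since_s is None:
        return False
    return now_s - n.below_since_s >= UNNEEDED_SECONDS


# Hypothetical nodes sized like an agent worker.
idle = NodeState("agent-1", 8000, 16384, requested_cpu_m=2000, requested_mem_mi=4096, below_since_s=0.0)
busy = NodeState("agent-2", 8000, 16384, requested_cpu_m=1000, requested_mem_mi=9000, below_since_s=0.0)
print(removable(idle, now_s=300), removable(busy, now_s=300))
```

The second node has low CPU requests but is still kept, because its memory requests (about 55% of allocatable) are above the threshold.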
Node Pool Architecture
The cluster uses dedicated node pools for different workload types, so agent pods do not compete for resources with the Lobstack API or the Kubernetes control plane.
| Node Pool | Server Type | Count | Purpose |
|---|---|---|---|
| K8s Control Plane | cpx31 (4 vCPU, 8 GB) | 3 | K8s API server, etcd, scheduler |
| API Workers | cpx21 (3 vCPU, 4 GB) | 3+ | Lobstack API, dashboard serving |
| Agent Workers | cpx41 (8 vCPU, 16 GB) | 5+ | gVisor-enabled agent pods |
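The scheduler places pods by their requests, so the tier table and node sizes together determine how many agents fit on one agent worker. The sketch below works through that arithmetic. It treats 16 GB as 16384 MiB. It ignores kubelet/system reservations and gVisor overhead unless they are passed in through the hypothetical reserved parameter.

```python
# (CPU request in millicores, memory request in MiB) per tier, from the tier table.
TIER_REQUESTS = {
    "starter": (250, 512),
    "standard": (500, 1024),
    "performance": (1000, 2048),
    "enterprise": (2000, 4096),
}

# Agent worker size from the node pool table; 16 GB is treated as 16384 MiB.
CPX41 = (8000, 16384)


def agents_per_node(tier: str, node: tuple = CPX41, reserved: tuple = (0, 0)) -> int:
    """Upper bound on agents per node, from requests only (limits are ignored).

    reserved is a hypothetical allowance for kubelet/system reservations and
    gVisor overhead, in the same (millicores, MiB) units.
    """
    cpu = node[0] - reserved[0]
    mem = node[1] - reserved[1]
    req_cpu, req_mem = TIER_REQUESTS[tier]
    return min(cpu // req_cpu, mem // req_mem)


for tier in TIER_REQUESTS:
    print(tier, agents_per_node(tier), agents_per_node(tier, reserved=(500, 1024)))
```

On a bare cpx41, this gives 32 Starter, 16 Standard, 8 Performance, or 4 Enterprise agents. Limits do not enter the calculation, so 32 Starter agents could together try to burst to 32 vCPU on an 8 vCPU node. When that happens, these Burstable pods share the node's CPU rather than each getting its full limit.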
Terraform managed
The cluster is defined in infra/terraform/modules/k8s-cluster/. Changes to cluster size are made through terraform plan and terraform apply.