
The Invisible Load Spike: Why Your Autoscaler Fails When It Matters Most (And How to Fix It)

14 min read
CNCF Ambassador, Author & Kubernetes Engineer. Check out my book AcingTheCKA.com!

Your autoscaler is a liar. It promises to scale your workloads when demand spikes, but in production, when revenue is on the line, it chokes. The HorizontalPodAutoscaler (HPA) controller loop runs every 15 seconds, metrics-server takes another 30 seconds to stabilize, and by the time new pods reach Ready your users have already refreshed their browsers and blamed your "slow API" on Twitter.

In dev environments, you can't wait for production traffic to validate autoscaling behavior. You need synthetic, deterministic load that exercises the entire control loop: HPA decision-making, scheduler placement, kubelet startup latency, and metrics collection. The polinux/stress image provides precisely this: a maintained, security-compliant CPU and memory workload generator that exposes every weakness in your autoscaling configuration before customers do.

Why polinux/stress Is the Right Tool

The polinux/stress image is an actively maintained fork of the Linux stress utility with modern base images that pass CVE scanning. It accepts --cpu N arguments to spawn N CPU-intensive workers and --vm with --vm-bytes for memory pressure. Unlike HTTP load generators (wrk, ab, locust), it isolates resource consumption from network I/O, making it ideal for validating:

  • HPA controller calculations: The controller queries metrics-server for pod-level CPU utilization, computes the ratio currentMetricValue / desiredMetricValue, and scales when that ratio falls outside the default 10% tolerance band

  • Metrics-server scraping latency: The default scrape interval and the CPU initialization period (5 minutes) can delay scaling decisions

  • CFS throttling behavior: Containers trying to use more CPU than their limit are throttled via cpu.cfs_quota_us, visible in the container_cpu_cfs_throttled_periods_total metric

  • Scheduler and kubelet latency: Measures time from HPA scale-up decision to pod Ready status, including image pulls and CNI setup

Compared to the unmaintained vish/stress, polinux/stress provides the same functionality while meeting modern security compliance requirements.
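The controller calculation above can be sketched in a few lines of Python — a simplification of the real algorithm (which also handles not-ready pods and missing metrics), assuming the default 10% tolerance:

```python
import math

def desired_replicas(current_replicas: int,
                     current_utilization: float,
                     target_utilization: float,
                     tolerance: float = 0.1) -> int:
    """Simplified HPA replica calculation (sketch, not the actual controller code)."""
    ratio = current_utilization / target_utilization
    # Within the tolerance band, the HPA makes no change
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)

# 1 replica at 200% utilization against a 50% target scales to 4
print(desired_replicas(1, 200, 50))  # 4
```

This is the same ceil(currentReplicas × currentMetricValue / desiredMetricValue) arithmetic you will see the HPA apply in the demo below.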

Hands-On Demo: HPA Scale-Up and Scale-Down

Use this Killercoda scenario to follow along!

Setup: Deploy the Workload with Zero-Load Baseline

Create a namespace and deployment starting in an idle state. This establishes a performance baseline before triggering load:

# stress-deployment.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: autoscale-demo
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cpu-stress
  namespace: autoscale-demo
spec:
  replicas: 1
  selector:
    matchLabels:
      app: cpu-stress
  template:
    metadata:
      labels:
        app: cpu-stress
    spec:
      containers:
      - name: stress-ctr
        image: polinux/stress
        resources:
          limits:
            cpu: "1"
            memory: "512Mi"
          requests:
            cpu: "500m"
            memory: "256Mi"
        command: ["/bin/sh", "-c"]
        args: ["sleep infinity"]

Deploy it:

kubectl apply -f stress-deployment.yaml

# Wait for pod to be ready
kubectl wait --for=condition=ready pod -l app=cpu-stress -n autoscale-demo --timeout=60s

# Verify zero-load state
kubectl top pod -n autoscale-demo
# Expected: cpu-stress-xxxxx   ~1m   ~10Mi (near-zero CPU usage)

Configure HPA with CPU Target

Create an HPA targeting 50% CPU utilization:​

# hpa.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: cpu-stress-hpa
  namespace: autoscale-demo
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: cpu-stress
  minReplicas: 1
  maxReplicas: 6
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 60   # Faster scale-down for demo
    scaleUp:
      stabilizationWindowSeconds: 0    # Immediate scale-up

Apply and verify:

kubectl apply -f hpa.yaml

# Wait for metrics to populate (15-30 seconds)
sleep 30

# Check HPA baseline
kubectl get hpa -n autoscale-demo -w

Establish Baseline Metrics

Before triggering load, validate that metrics-server is functioning and HPA recognizes the idle state:

# Describe HPA to see events and current status
kubectl describe hpa cpu-stress-hpa -n autoscale-demo

# Expected output includes:
# Metrics:
#   Resource cpu on pods (as a percentage of request):  0% (0m) / 50%
# Current number of replicas: 1
# ScalingActive condition: True

This confirms the HPA control loop is active and baseline CPU is near zero.​

Trigger Scale-Up with High CPU Load

Patch the deployment to generate CPU load:​

kubectl patch deployment cpu-stress -n autoscale-demo --type='json' \
  -p='[
    {"op": "replace", "path": "/spec/template/spec/containers/0/resources/requests/cpu", "value": "100m"},
    {"op": "replace", "path": "/spec/template/spec/containers/0/resources/requests/memory", "value": "128Mi"},
    {"op": "replace", "path": "/spec/template/spec/containers/0/resources/limits/cpu", "value": "500m"},
    {"op": "replace", "path": "/spec/template/spec/containers/0/resources/limits/memory", "value": "256Mi"},
    {"op": "replace", "path": "/spec/template/spec/containers/0/command", "value": ["stress"]},
    {"op": "replace", "path": "/spec/template/spec/containers/0/args", "value": ["--cpu", "2", "--verbose"]}
  ]'


# Wait for rollout
kubectl rollout status deployment/cpu-stress -n autoscale-demo

This combined patch accomplishes multiple objectives:

Resource Optimization for Single-Node:

  • CPU request: 500m → 100m (allows up to 6 pods on a node with ~1 CPU available)

  • CPU limit: 1000m → 500m (prevents individual pods from consuming entire node CPU)

  • Memory request: 256Mi → 128Mi (reduces memory pressure)

  • Memory limit: 512Mi → 256Mi (prevents OOM issues on small nodes)

Load Generation:

  • Command change: ["/bin/sh", "-c"] → ["stress"]

  • 2 CPU workers: Spawns 2 stress processes attempting to consume CPU continuously

  • No timeout: Runs indefinitely until patched back to sleep

With the new configuration, each pod creates:​

  • Actual usage: ~200m per pod (2 workers generating load)

  • Utilization: 200m / 100m request = 200%

  • HPA action: Current utilization (200%) exceeds target (50%), triggers scale-up

Monitor scale-up with timestamps:

# Watch HPA in real-time (updates every 15s)
kubectl get hpa cpu-stress-hpa -n autoscale-demo -w

# In another terminal, watch pods scaling
kubectl get pods -n autoscale-demo -w

Expected timeline:​

  • T+15-30s: Metrics-server scrapes kubelet, updates cache

  • T+30-45s: HPA polls metrics-server, sees 200% utilization

  • T+45s: HPA updates deployment replicas (calculated as ceil(1 * (200/50)) = 4)

  • T+60-90s: New pods reach Running and Ready state

  • T+90-120s: Metrics stabilize across all pods

Expected HPA output after scale-up:

NAME              REFERENCE               TARGETS    MINPODS   MAXPODS   REPLICAS
cpu-stress-hpa    Deployment/cpu-stress   200%/50%   1         6         4
# After pods stabilize and share 1 CPU on single node:
cpu-stress-hpa    Deployment/cpu-stress   166%/50%   1         6         6

The HPA will scale to maximum replicas (6) because each pod with 2 CPU workers generates approximately 200m CPU load, creating 200% utilization (200m actual / 100m request). After scaling to 6 pods on a single-node cluster with 1 total CPU, each pod receives ~166m average, maintaining 166% utilization and keeping HPA at maxReplicas.

Verify Scaling Behavior and Resource Distribution

# Check final pod count
kubectl get pods -n autoscale-demo -l app=cpu-stress

# Check per-pod CPU usage (wait 60s for metrics stability)
sleep 60
kubectl top pod -n autoscale-demo
# Expected: Each pod consuming ~166m (1000m node capacity / 6 pods)

# View HPA events showing scale decisions
kubectl describe hpa cpu-stress-hpa -n autoscale-demo | grep -A 10 "Events:"
# Expected events:
# - ScalingReplicaSet: New size: 4; reason: cpu resource utilization above target
# - ScalingReplicaSet: New size: 6; reason: cpu resource utilization above target

# Verify updated resource configuration
kubectl get deployment cpu-stress -n autoscale-demo -o jsonpath='{.spec.template.spec.containers[0].resources}' | jq
# Expected:
# {
#   "limits": {"cpu": "500m", "memory": "256Mi"},
#   "requests": {"cpu": "100m", "memory": "128Mi"}
# }

# Check node allocation
kubectl describe node <worker-node> | grep -A 5 "Allocated resources:"
# Expected: cpu ~825m (225m system + 600m from 6 pods @ 100m each)

If you have Prometheus installed, validate CPU throttling:​

# Throttling periods per pod (indicates hitting CPU limits)
rate(container_cpu_cfs_throttled_periods_total{namespace="autoscale-demo"}[5m])

Test Scale-Down with Stabilization Window

Return to zero-load baseline and observe scale-down behavior:​

kubectl patch deployment cpu-stress -n autoscale-demo --type='json' \
  -p='[
    {"op": "replace", "path": "/spec/template/spec/containers/0/command", "value": ["/bin/sh", "-c"]},
    {"op": "replace", "path": "/spec/template/spec/containers/0/args", "value": ["sleep infinity"]}
  ]'

# Wait for rollout
kubectl rollout status deployment/cpu-stress -n autoscale-demo

We configured the HPA with a 60-second scale-down stabilization window to speed up the demo. Monitor the behavior:

# Watch HPA - note TARGETS drop but REPLICAS remain at 6
kubectl get hpa cpu-stress-hpa -n autoscale-demo -w

# Expected:
# T+0s:       cpu-stress-hpa   Deployment/cpu-stress   0%/50%   1   6   6
# T+90-120s:  cpu-stress-hpa   Deployment/cpu-stress   0%/50%   1   6   1

Once the 60-second stabilization window elapses (plus metric propagation time), replicas scale down to 1. This delay prevents rapid scaling oscillations during temporary load drops.
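Under the hood, the stabilization window takes the highest replica recommendation seen during the window, so a momentary dip cannot trigger scale-down. A minimal sketch of that logic (simplified from the actual controller):

```python
from collections import deque

class ScaleDownStabilizer:
    """Rolling window of desired-replica recommendations; returns the max,
    mimicking scaleDown.stabilizationWindowSeconds (sketch, not real controller code)."""
    def __init__(self, window_seconds: int):
        self.window = window_seconds
        self.history = deque()  # (timestamp, recommendation) pairs

    def recommend(self, now: float, desired: int) -> int:
        self.history.append((now, desired))
        # Drop recommendations older than the window
        while self.history and now - self.history[0][0] > self.window:
            self.history.popleft()
        return max(rec for _, rec in self.history)

s = ScaleDownStabilizer(window_seconds=60)
print(s.recommend(0, 6))   # 6 -- load still high
print(s.recommend(30, 1))  # 6 -- dip inside the window is ignored
print(s.recommend(90, 1))  # 1 -- window expired, scale-down proceeds
```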

Advanced: Multi-Metric Autoscaling with Memory

HPA supports scaling on multiple metrics simultaneously, selecting the maximum scale recommendation:​

# hpa-multi-metric.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: stress-multi-hpa
  namespace: autoscale-demo
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: cpu-stress
  minReplicas: 2
  maxReplicas: 15
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 60
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 70
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 120        # Faster scale-down for testing
      policies:
      - type: Percent
        value: 50                             # Remove max 50% of pods
        periodSeconds: 60
      - type: Pods
        value: 2                              # Or max 2 pods
        periodSeconds: 60
      selectPolicy: Min                       # Use most conservative policy
    scaleUp:
      stabilizationWindowSeconds: 0           # Immediate scale-up
      policies:
      - type: Percent
        value: 100                            # Double pods
        periodSeconds: 15
      - type: Pods
        value: 4                              # Or add 4 pods
        periodSeconds: 15
      selectPolicy: Max                       # Use most aggressive policy

Update the deployment to stress both CPU and memory:​

kubectl patch deployment cpu-stress -n autoscale-demo --type='json' \
  -p='[
    {"op": "replace", "path": "/spec/template/spec/containers/0/command", "value": ["stress"]},
    {"op": "replace", "path": "/spec/template/spec/containers/0/args", "value": ["--cpu", "2", "--vm", "1", "--vm-bytes", "100M", "--verbose"]}
  ]'

# Apply multi-metric HPA
kubectl apply -f hpa-multi-metric.yaml

# Wait for metrics to stabilize (60 seconds)
sleep 60

# Check multi-metric status
kubectl get hpa stress-multi-hpa -n autoscale-demo

Expected behavior (utilization is computed against pod requests, so exact numbers depend on your node):

  • CPU: ~200% average utilization → ceil(2 × 200/60) ≈ 7 replicas

  • Memory: ~78% average utilization → ceil(2 × 78/70) ≈ 3 replicas

  • HPA decision: Selects the maximum (7 replicas)
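The per-metric proposals and max selection can be sketched as (illustrative numbers, not measured values):

```python
import math

def multi_metric_replicas(current: int, metrics: dict) -> int:
    """Each metric yields its own desired replica count; the HPA acts on the largest.
    metrics maps name -> (current_utilization_pct, target_utilization_pct)."""
    proposals = {name: math.ceil(current * cur / tgt)
                 for name, (cur, tgt) in metrics.items()}
    return max(proposals.values())

# Illustrative: CPU at 200% vs a 60% target, memory at 78% vs a 70% target, 2 replicas
print(multi_metric_replicas(2, {"cpu": (200, 60), "memory": (78, 70)}))  # 7
```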

Validating Scale-Up Latency and Metrics Pipeline

Measure end-to-end latency from load spike to pod readiness:​

# Reset to baseline
kubectl patch deployment cpu-stress -n autoscale-demo --type='json' \
  -p='[
    {"op": "replace", "path": "/spec/template/spec/containers/0/command", "value": ["/bin/sh", "-c"]},
    {"op": "replace", "path": "/spec/template/spec/containers/0/args", "value": ["sleep infinity"]}
  ]'

# Wait for scale-down to minReplicas
sleep 180

# Trigger load spike at T=0 and record timestamp
echo "Load spike triggered at: $(date +%H:%M:%S)"
kubectl patch deployment cpu-stress -n autoscale-demo --type='json' \
  -p='[
    {"op": "replace", "path": "/spec/template/spec/containers/0/command", "value": ["stress"]},
    {"op": "replace", "path": "/spec/template/spec/containers/0/args", "value": ["--cpu", "30", "--verbose"]}
  ]'

# Monitor with timestamps every 10 seconds
for i in {1..12}; do
  echo "--- T+$((i*10))s at $(date +%H:%M:%S) ---"
  kubectl get hpa stress-multi-hpa -n autoscale-demo --no-headers
  kubectl get pods -n autoscale-demo -l app=cpu-stress --no-headers | wc -l
  sleep 10
done

Typical latency breakdown:​

  • T+0-30s: HPA readiness delay before fresh CPU samples are trusted (kube-controller-manager flag --horizontal-pod-autoscaler-initial-readiness-delay, default 30s)

  • T+30-45s: HPA controller detects high utilization

  • T+45-60s: Scheduler places new pods, image pull (if not cached)

  • T+60-90s: Pods reach Ready, CNI initialization complete

  • T+90-120s: Metrics include new pods, utilization stabilizes

This exposes real control-plane latencies that mirror production behavior.​

PromQL Queries for Observability

Monitor HPA and kubelet behavior with Prometheus:​

# HPA decision latency (controller reconciliation time)
rate(hpa_controller_reconciliation_duration_seconds_sum[5m]) 
  / rate(hpa_controller_reconciliation_duration_seconds_count[5m])

# CPU throttling per pod (indicates hitting limits)
rate(container_cpu_cfs_throttled_periods_total{namespace="autoscale-demo"}[5m])

# Metrics-server scrape latency
histogram_quantile(0.99, 
  rate(metrics_server_kubelet_request_duration_seconds_bucket[5m]))

# Pod startup time (requires kube-state-metrics)
kube_pod_start_time - kube_pod_created

Production-Ready Configuration Patterns

Approach A: Conservative (Avoid Flapping)

  • scaleDown.stabilizationWindowSeconds: 300 (5 min)

  • scaleUp.stabilizationWindowSeconds: 60

  • Target utilization: 70-80%

Pros: Minimal pod churn, predictable costs, cache warmth preserved
Cons: Slower response to traffic drops, potential over-provisioning
When: Steady workloads with gradual changes, stateful apps with warm caches​

Approach B: Aggressive (Low Latency)

  • scaleDown.stabilizationWindowSeconds: 60

  • scaleUp.stabilizationWindowSeconds: 0

  • Target utilization: 40-50%

Pros: Fast scale-up, headroom for bursts, responsive to spikes
Cons: Higher pod churn, wasted capacity, cold cache restarts
When: Latency-sensitive APIs, unpredictable spikes, stateless workloads​

Approach C: Custom Metrics (Advanced)
Use custom metrics (requests/sec, queue depth) instead of CPU for application-aware scaling. Requires Prometheus Adapter or KEDA.​

Pros: Application semantics, not resource proxies; accounts for P99 latency
Cons: Complexity, external dependencies, custom instrumentation
When: RPS-based workloads, async job queues, SLA-driven scaling​

The polinux/stress image removes the guesswork from autoscaling validation while meeting modern security requirements. By establishing a zero-load baseline and then generating deterministic CPU and memory load, you validate HPA calculations, expose metrics-server latency, measure scheduler performance, and identify throttling issues—all before production traffic exercises these code paths. Starting small and scaling gradually reveals configuration problems early, ensuring your autoscaler actually works when it matters most.

Next Steps

Expand Autoscaling Scope

Vertical Pod Autoscaler (VPA)
Move beyond horizontal scaling to automatically adjust CPU and memory requests/limits based on actual usage patterns. VPA analyzes historical consumption, peak usage, and OOM events to calculate optimal resource recommendations.​

# Install VPA (separate project, not part of core Kubernetes)
git clone https://github.com/kubernetes/autoscaler.git
cd autoscaler/vertical-pod-autoscaler
./hack/vpa-up.sh

# Create VPA for your stress workload
cat <<EOF | kubectl apply -f -
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: stress-vpa
  namespace: autoscale-demo
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: cpu-stress
  updatePolicy:
    updateMode: "Auto"      # Auto applies recommendations
  resourcePolicy:
    containerPolicies:
    - containerName: stress-ctr
      minAllowed:
        cpu: "100m"
        memory: "128Mi"
      maxAllowed:
        cpu: "2"
        memory: "1Gi"
EOF

VPA recommends target, lower bound, and upper bound resource values based on actual consumption trends.​
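Conceptually, the three values come from usage percentiles. The sketch below is a rough illustration with made-up percentile choices — the real recommender uses decaying histograms and safety margins:

```python
def vpa_style_recommendation(cpu_samples_m: list) -> dict:
    """Rough percentile-based sketch of VPA bounds, not the actual VPA algorithm."""
    ordered = sorted(cpu_samples_m)
    def pct(p: float) -> float:
        return ordered[min(len(ordered) - 1, int(p * len(ordered)))]
    return {
        "lowerBound": pct(0.50),  # floor for safe scale-down
        "target": pct(0.90),      # value applied as the new request
        "upperBound": pct(0.95),  # ceiling before eviction is considered
    }

# Hypothetical millicore samples: mostly ~120m with occasional bursts
samples = [120.0] * 80 + [250.0] * 15 + [600.0] * 5
print(vpa_style_recommendation(samples))
```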

Cluster Autoscaler / Karpenter
Scale the node layer to ensure sufficient capacity for your autoscaling pods. Cluster Autoscaler adds/removes nodes based on pending pods and node utilization.​

KEDA (Event-Driven Autoscaling)
Move beyond resource-based metrics to scale on application events like queue depth, Kafka lag, HTTP requests/sec. KEDA is a CNCF-graduated project supporting 60+ scalers.​

# Install KEDA
kubectl apply -f https://github.com/kedacore/keda/releases/download/v2.13.0/keda-2.13.0.yaml

# Example: Scale based on Prometheus metrics
cat <<EOF | kubectl apply -f -
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: prometheus-scaler
  namespace: autoscale-demo
spec:
  scaleTargetRef:
    name: cpu-stress
  triggers:
  - type: prometheus
    metadata:
      serverAddress: http://prometheus:9090
      metricName: http_requests_total
      threshold: '100'
      query: sum(rate(http_requests_total[2m]))
EOF

This scales based on actual application semantics rather than CPU/memory proxies.​

Advanced HPA Configuration

Custom Metrics with Prometheus Adapter
Integrate Prometheus metrics into HPA for application-aware autoscaling:​

# Install Prometheus Adapter
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm install prometheus-adapter prometheus-community/prometheus-adapter

# Configure HPA with custom metric
cat <<EOF | kubectl apply -f -
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: custom-metrics-hpa
  namespace: autoscale-demo
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: cpu-stress
  metrics:
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second
      target:
        type: AverageValue
        averageValue: "1000"
EOF

Scheduled Autoscaling
Pre-scale workloads for predictable traffic patterns using CronJobs that patch HPA min/max replicas:​

apiVersion: batch/v1
kind: CronJob
metadata:
  name: scale-up-business-hours
spec:
  schedule: "0 8 * * 1-5"  # 8 AM weekdays
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: kubectl
            image: bitnami/kubectl
            command:
            - /bin/sh
            - -c
            - kubectl patch hpa cpu-stress-hpa -n autoscale-demo --patch '{"spec":{"minReplicas":5}}'
          restartPolicy: OnFailure

This proactively scales before demand surges, eliminating cold-start latency.​

Production Observability

Integrate Grafana Dashboards
Visualize HPA metrics, pod lifecycle events, and resource utilization:​

  • HPA metrics: kube_horizontalpodautoscaler_status_current_replicas, kube_horizontalpodautoscaler_status_desired_replicas

  • Scaling events: Query Kubernetes events for ScalingReplicaSet reasons

  • Pod startup latency: kube_pod_start_time - kube_pod_created

Alert on Autoscaling Issues
Set up Prometheus alerts for scaling failures:​

- alert: HPAMaxedOut
  expr: kube_horizontalpodautoscaler_status_current_replicas == kube_horizontalpodautoscaler_spec_max_replicas
  for: 15m
  annotations:
    summary: "HPA {{ $labels.horizontalpodautoscaler }} has been at max replicas for 15+ minutes"

- alert: HPAScalingDisabled
  expr: kube_horizontalpodautoscaler_status_condition{condition="ScalingActive", status="false"} == 1
  annotations:
    summary: "HPA {{ $labels.horizontalpodautoscaler }} scaling is disabled"

Cost and Performance Optimization

Manual Resource Request Tuning
After observing HPA behavior, adjust resource requests to improve efficiency:

  • Over-provisioning: If HPA rarely scales and CPU stays at 20-30%, reduce requests​

  • Under-provisioning: If HPA constantly maxes out and pods throttle, increase requests​

  • Right-sizing: Match requests to P95 usage, not peak​
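The P95 rule can be sketched as follows, using hypothetical usage samples — the point is that a brief spike barely moves the suggested request:

```python
def p95_request(samples_millicores: list, headroom: float = 1.1) -> int:
    """Suggest a CPU request: P95 of observed usage plus ~10% headroom.
    A heuristic sketch, not an official sizing formula."""
    ordered = sorted(samples_millicores)
    idx = min(len(ordered) - 1, int(0.95 * len(ordered)))
    return round(ordered[idx] * headroom)

# 100 samples: steady ~100m with a brief spike up to 900m
samples = [100.0] * 95 + [300.0, 500.0, 700.0, 800.0, 900.0]
print(p95_request(samples))  # 330 -- far below the 900m peak
```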

Load Testing in CI/CD
Integrate autoscaling validation into your pipeline using k6 or Locust:​

# .gitlab-ci.yml example
autoscaling-test:
  stage: test
  script:
    - kubectl apply -f hpa.yaml
    - k6 run --vus 100 --duration 5m load-test.js
    - kubectl get hpa cpu-stress-hpa -o jsonpath='{.status.currentReplicas}' | grep -v "^1$"  # Verify scale-up

This catches autoscaling regressions before production.​

StormForge Resources for ML-Powered Optimization

StormForge provides machine learning-driven Kubernetes optimization that goes beyond manual tuning. Their platform automates resource rightsizing, HPA configuration, and cost optimization using observability data and experimentation.​​

StormForge Optimize Live

Automated Resource Recommendations
StormForge Optimize Live continuously analyzes production workloads and generates ML-based recommendations for CPU/memory requests, limits, and HPA target utilization.​

Key features:​

  • Continuous optimization: Monitors resource usage and application behavior to identify cost-saving configurations automatically​

  • HPA target tuning: Recommends optimal HPA target utilization percentages (not just static 50%)​

  • OOM protection: Configures temporary memory bump-ups in response to OOM events​

  • One-click apply: Directly applies recommendations to your cluster via the StormForge controller​

Getting Started with StormForge
StormForge provides a comprehensive tutorial for integrating their platform:​

  1. Install the StormForge agent:
# Add Helm repo
helm repo add stormforge https://registry.stormforge.io/chartrepo/library
helm repo update

# Install agent with your API token
helm install stormforge-agent stormforge/stormforge-agent \
  --namespace stormforge-system \
  --create-namespace \
  --set authorization.token=YOUR_API_TOKEN
  2. Deploy an optimization experiment:
# Create optimization resource
stormforge optimize create nginx-optimization \
  --application=nginx \
  --namespace=autoscale-demo
  3. Review recommendations:
stormforge optimize recommendations nginx-optimization
  4. Apply optimizations:
    Visit the StormForge web UI at https://app.stormforge.io and click "Apply Recommendations" to deploy optimized resource configurations directly to your cluster.

StormForge and Karpenter Integration

For AWS EKS clusters, StormForge integrates with Karpenter for full-stack optimization:​

  • StormForge optimizes workload requests: ML analyzes usage patterns and adjusts CPU/memory requests​

  • Karpenter provisions nodes: Observes incoming pod requests and provisions right-sized nodes​

  • Continuous feedback loop: As StormForge reduces requests, Karpenter consolidates nodes, reducing costs​

AWS provides a detailed walkthrough showing a 30-40% cost reduction using this combination.​

Key Advantage Over Manual Tuning
StormForge uses ML to analyze thousands of containers simultaneously, identifying optimization opportunities invisible to manual analysis. It automates the tedious process of profiling workloads, calculating percentile-based resource needs, and updating configurations—achieving enterprise-scale optimization with minimal manual intervention.​

Continuous Optimization in CI/CD

StormForge supports automated optimization schedules via CI/CD pipelines:​

# Example GitLab CI integration (run via a scheduled pipeline)
stormforge-optimize:
  stage: optimize
  rules:
    - if: $CI_PIPELINE_SOURCE == "schedule"  # define the weekly cron in GitLab's pipeline-schedule UI
  script:
    - stormforge optimize run nginx-optimization
    - stormforge optimize recommendations nginx-optimization > recommendations.yaml
    - kubectl apply -f recommendations.yaml

This ensures resource configurations evolve with changing workload patterns.​

The progression from basic HPA demos with polinux/stress to production-grade autoscaling with VPA, KEDA, custom metrics, and ML-powered optimization platforms like StormForge provides a complete path from learning fundamentals to achieving enterprise-scale efficiency.