EdgeServers

Six Kubernetes cost leaks we find on almost every cluster

May 8, 2026 · 1 min read · by Sudhanshu K.

Almost every cluster we take over has the same six cost leaks. The dollar amounts vary; the leaks do not.

1. Overprovisioned requests

Developers set a memory request of 1Gi because "we might need it" and never come back to lower it. The scheduler then reserves that memory on the node for the pod even when it is using 80 MB, and Cluster Autoscaler keeps adding nodes to satisfy reservations nothing uses.

Tooling that helps: Vertical Pod Autoscaler (VPA) in recommendation-only mode (updateMode: "Off"), KRR (Krr.io), Goldilocks. Pick one, run it for a week, then set each request to the p99 of observed usage plus 20% headroom.
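The "p99 plus 20%" rule can be sketched in a few lines. This is a minimal illustration, not the output format of any of the tools above: the function name, the nearest-rank percentile method, and the sample data are all assumptions.

```python
import math

def recommend_request_mib(usage_samples_mib, headroom_pct=20):
    """Return the p99 of observed usage plus headroom, rounded up to whole MiB."""
    if not usage_samples_mib:
        raise ValueError("need at least one usage sample")
    ordered = sorted(usage_samples_mib)
    n = len(ordered)
    # Nearest-rank p99: smallest sample with at least 99% of samples at or below it.
    rank = (99 * n + 99) // 100  # integer ceil(0.99 * n)
    p99 = ordered[rank - 1]
    return math.ceil(p99 * (100 + headroom_pct) / 100)

# Hypothetical week of memory samples (MiB): mostly ~80 with a few spikes.
samples = [80] * 95 + [120] * 4 + [300]
print(recommend_request_mib(samples))
```

Using p99 rather than the maximum keeps a single outlier from dictating the request; whether that trade-off is acceptable depends on how the workload behaves when it briefly exceeds its request.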

2. Idle namespaces nobody owns

The dev-tomas-spike-1, the staging-old, the argocd-test. They cost money 24/7. We sweep monthly with a simple report (namespaces with zero pod activity in the last 14 days) and either reclaim the resources or charge them back to the owning team.
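The core of such a report is a cutoff comparison. The sketch below assumes you have already collected, per namespace, the timestamp of the most recent pod activity (for example the latest pod start time); the `last_activity` mapping and the function name are hypothetical, and how "activity" is measured is left open.

```python
from datetime import datetime, timedelta, timezone

def idle_namespaces(last_activity, now, idle_days=14):
    """Return namespaces with no recorded pod activity inside the window.

    last_activity maps namespace name -> datetime of last pod activity,
    or None if no activity was ever recorded (hypothetical input shape).
    """
    cutoff = now - timedelta(days=idle_days)
    return sorted(
        ns for ns, seen in last_activity.items()
        if seen is None or seen < cutoff
    )

now = datetime.now(timezone.utc)
report = idle_namespaces(
    {
        "dev-tomas-spike-1": now - timedelta(days=30),
        "staging-old": None,
        "payments": now - timedelta(hours=2),
    },
    now,
)
print(report)
```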

3. EBS snapshot sprawl

Every cluster we audit has hundreds of orphaned EBS snapshots: old PVC snapshots, etcd backups from a cluster that no longer exists, debugging dumps. aws ec2 describe-snapshots --owner-ids self (without the owner filter the call also returns public snapshots) plus an age filter usually surfaces $200-2000/month of pure waste.
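An age filter over that output might look like the following. It is a sketch that assumes the JSON from `aws ec2 describe-snapshots --owner-ids self --output json` has been loaded into a dict; the 90-day threshold and the function name are illustrative choices, not values from the audit. Note that `VolumeSize` is the size of the source volume, so the total is only an upper bound on billed storage.

```python
from datetime import datetime, timedelta, timezone

def stale_snapshots(describe_output, now, max_age_days=90):
    """Return (snapshot_id, age_days, volume_size_gib) for snapshots older than the cutoff."""
    cutoff = now - timedelta(days=max_age_days)
    stale = []
    for snap in describe_output.get("Snapshots", []):
        # StartTime is an ISO 8601 string; normalise a trailing "Z" for older Pythons.
        started = datetime.fromisoformat(snap["StartTime"].replace("Z", "+00:00"))
        if started < cutoff:
            stale.append((snap["SnapshotId"], (now - started).days, snap["VolumeSize"]))
    # Oldest first, so the most likely orphans are at the top of the report.
    return sorted(stale, key=lambda row: row[1], reverse=True)

example = {
    "Snapshots": [
        {"SnapshotId": "snap-old", "StartTime": "2025-01-01T00:00:00.000Z", "VolumeSize": 100},
        {"SnapshotId": "snap-new", "StartTime": "2026-05-01T00:00:00+00:00", "VolumeSize": 50},
    ]
}
rows = stale_snapshots(example, datetime(2026, 5, 8, tzinfo=timezone.utc))
print(rows, "total GiB (upper bound):", sum(size for _, _, size in rows))
```

Age alone does not prove a snapshot is orphaned; cross-checking against existing volumes and AMIs before deleting is a sensible extra step.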

4. NAT gateway egress

Pods talking to S3 over its public endpoint route through the NAT gateway and pay the gateway's per-GB data processing charge. The fix is gateway VPC endpoints for S3 and DynamoDB, which carry no additional charge. Five-minute change, often $500-3000/month saved.
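To see how the savings range above can arise, here is a back-of-the-envelope calculation. The per-GB rate of $0.045 is an assumption (NAT gateway processing prices vary by region and change over time), and the traffic volume is made up for illustration.

```python
def nat_processing_cost(gb_per_month, price_per_gb=0.045):
    """Estimate the monthly NAT gateway data processing charge in dollars.

    price_per_gb is an assumed rate; check current pricing for your region.
    """
    return round(gb_per_month * price_per_gb, 2)

# Hypothetical: 20 TB/month of S3 traffic currently flowing through the NAT gateway.
print(nat_processing_cost(20_000))
```

Traffic that moves to a gateway endpoint stops incurring this charge, so the estimate doubles as the expected saving for that share of traffic.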

5. Wrong node-pool mix

Running everything on m5.large when half the workloads would fit on t3.medium Spot. Karpenter with multiple pools (on-demand for stateful, Spot for stateless) often cuts compute 30-40%.

6. Default LoadBalancer per Service

Every Service of type: LoadBalancer provisions a real cloud load balancer at $20+/month. Convert to a shared ingress controller (NGINX Ingress, Traefik, or the AWS Load Balancer Controller, formerly the AWS ALB Ingress Controller). A cluster with 30 services can often drop 25 of its LBs.
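Finding the candidates is a matter of listing Services by type. The sketch below assumes the output of `kubectl get services --all-namespaces -o json` has been parsed into a dict; the function names are hypothetical, and the $20 figure is the floor quoted above rather than a measured price.

```python
def load_balancer_services(service_list):
    """Return "namespace/name" for every Service of type LoadBalancer."""
    return sorted(
        f'{item["metadata"]["namespace"]}/{item["metadata"]["name"]}'
        for item in service_list.get("items", [])
        if item.get("spec", {}).get("type") == "LoadBalancer"
    )

def monthly_lb_cost(count, cost_per_lb=20):
    """Rough monthly cost in dollars at an assumed flat price per load balancer."""
    return count * cost_per_lb

example = {
    "items": [
        {"metadata": {"namespace": "shop", "name": "web"}, "spec": {"type": "LoadBalancer"}},
        {"metadata": {"namespace": "shop", "name": "db"}, "spec": {"type": "ClusterIP"}},
    ]
}
found = load_balancer_services(example)
print(found, monthly_lb_cost(len(found)))
```

At the quoted floor, dropping 25 load balancers works out to at least $500/month before any per-GB or per-connection charges.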

The Medium post has the audit script we use, what we report to the customer each month, and which fixes we leave to them versus which we make in-flight.
