Kubernetes Pod Stuck in Pending State: Fix It Fast

How to Spot What Is Actually Blocking the Pod

Kubernetes pending pods attract a lot of conflicting advice. As someone who has spent more nights than I care to count staring at deployments that just… sit there, I know this particular flavor of pain well. In this guide, I'll share what has actually worked for me.

The first move is always the same: kubectl describe pod <pod-name>. Every single time. Don’t skip it.

Look at the bottom of that output. The Events section is where the actual signal lives — not in the pod status itself. Here’s what that looks like:

Events:
  Type     Reason            Age    Message
  ----     ------            ---    -------
  Warning  FailedScheduling  5m23s  0/3 nodes are available: 3 Insufficient cpu

That one line tells you everything. Pending doesn’t mean your pod crashed. It doesn’t mean the container image failed to pull. It means the Kubernetes scheduler looked at every node in your cluster and said: “Nope. Can’t fit this anywhere.” That’s what makes the Events section so valuable to cluster operators — it skips the mystery and tells you exactly what broke.

The three most common messages you’ll see:

  • Insufficient cpu or Insufficient memory — resource pressure
  • node(s) didn't match Pod's node selector — scheduling constraints
  • waiting for PersistentVolumeClaim — storage not bound

That Events section is your map. Use it. Now let’s work through the three fixes.

Fix 1 — Not Enough CPU or Memory on Any Node

This is the most common culprit by a wide margin — and the fix usually takes under five minutes.

But what is resource pressure, exactly? At its simplest, it’s your nodes being fully committed, with nothing left to offer a new pod. There’s more to it than raw totals, though: the scheduler accounts for system daemon overhead, node-level reserved capacity, and the declared requests of every existing workload, all at once.

Run kubectl get nodes first:

NAME            STATUS   ROLES    AGE   VERSION
worker-1        Ready    <none>   45d   v1.28.0
worker-2        Ready    <none>   45d   v1.28.0
worker-3        Ready    <none>   45d   v1.28.0

Now dig into one node with kubectl describe node worker-1. You’re looking for Allocatable versus what’s actually Requested:

Allocatable:
  cpu:                4
  ephemeral-storage:  50Gi
  memory:             8Gi
Allocated resources:
  cpu:                3800m
  memory:             7500Mi

That node has 4 CPUs allocatable: 4000m. 3800m is already requested, which leaves 200m. Your pod asks for 500m, so it cannot fit. And keep in mind that Allocatable is already less than the node’s raw capacity, because Kubernetes reserves room for system daemons before the scheduler ever gets involved.
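The scheduler’s verdict is plain request accounting, and it’s worth checking by hand. A minimal sketch using the numbers from the node above:

```shell
# Scheduling is arithmetic: allocatable minus what's already requested.
allocatable=4000   # 4 CPUs, expressed in millicores
requested=3800     # sum of CPU requests from pods already on the node
headroom=$((allocatable - requested))
echo "headroom: ${headroom}m"

# A pod requesting 500m needs at least 500m of headroom to land here.
if [ "$headroom" -lt 500 ]; then
  echo "pod does not fit on this node"
fi
```

The scheduler runs this check against every node; the pod only sits in Pending when every single node fails it.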

You have three paths forward.

Option 1: Reduce the pod’s request. Check your deployment spec. The resources block probably looks something like this:

resources:
  requests:
    cpu: 2
    memory: 4Gi
  limits:
    cpu: 2
    memory: 4Gi

Cut that in half. Run kubectl set resources deployment my-app --requests=cpu=1,memory=2Gi. Done in 30 seconds. I over-provision by default, and this one-liner has saved me more than once; hand-editing YAML never goes cleanly for me on the first try. Don’t make my mistake.
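If you prefer making the same change in the manifest, the halved block would look like this (limits left as they were, since only requests drive scheduling):

```yaml
resources:
  requests:
    cpu: 1        # was 2; requests are what the scheduler counts
    memory: 2Gi   # was 4Gi
  limits:
    cpu: 2
    memory: 4Gi
```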

Option 2: Add a node. Easiest option if you control the infrastructure. Spin up a new worker, join it to the cluster, and the scheduler places the pod immediately — no restarts required.

Option 3: Enable cluster autoscaler. If you’re on AWS, GKE, or Azure, your cloud provider’s autoscaler handles new nodes automatically when pods sit pending. Set it up once and it handles itself. I won’t walk through the full setup here — that deserves its own guide — but it’s the reason I stopped getting paged at 2 AM when someone deployed a resource-hungry workload without warning.

Fix 2 — Node Selector or Affinity Rules Blocking Placement

Ruled out resource pressure? Good. Now check whether the pod’s own scheduling constraints are the problem. Your deployment might specify a nodeSelector or affinity rule that no actual node satisfies — and the scheduler will never tell you it gave up trying.

List your nodes with labels: kubectl get nodes --show-labels.

NAME      STATUS  ROLES   AGE  VERSION  LABELS
worker-1  Ready   <none>  45d  v1.28.0  beta.kubernetes.io/arch=amd64,disktype=ssd,gpu=false
worker-2  Ready   <none>  45d  v1.28.0  beta.kubernetes.io/arch=amd64,disktype=hdd,gpu=false
worker-3  Ready   <none>  45d  v1.28.0  beta.kubernetes.io/arch=amd64,disktype=hdd,gpu=false

Now check your pod spec. Run kubectl get pod <name> -o yaml | grep -A 10 nodeSelector:

nodeSelector:
  disktype: nvme
  gpu: "true"

There’s your problem. No node has disktype: nvme. Not one. The scheduler walks away every single time — silently — until you dig into that Events output.

The fix depends on what you actually need. Typed the wrong label and meant ssd? Update the deployment:

kubectl patch deployment my-app -p '{"spec":{"template":{"spec":{"nodeSelector":{"disktype":"ssd"}}}}}'

Actually need NVMe? Label a node first:

kubectl label nodes worker-1 disktype=nvme

Then patch the deployment to match. That was the whole fix.

Taints can block placement the same way. Frustrated by a pod that never lands despite matching labels? Check kubectl describe node worker-1 for the Taints section. If a node carries something like gpu=true:NoSchedule, your pod needs a matching toleration or it’s dead on arrival. Most of the time, a taint got added and nobody updated the pod specs. Or vice versa.
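For that gpu=true:NoSchedule example, the matching toleration in the pod spec would look like this:

```yaml
# Lets the pod land on nodes tainted gpu=true:NoSchedule.
# A toleration only permits placement; it does not force the pod
# onto those nodes (pair it with a nodeSelector for that).
tolerations:
- key: "gpu"
  operator: "Equal"
  value: "true"
  effect: "NoSchedule"
```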

The habit to build: compare the pod spec and the node labels side by side. If they don’t match exactly, the pod goes nowhere.

Fix 3 — PersistentVolumeClaim Is Not Bound

This one is sneaky. Your pod mounts a PVC — and that PVC is itself stuck in Pending. The pod cannot start until the storage exists. So it hangs. Indefinitely. No error that jumps out, no crash, just silence.

Check your PVCs: kubectl get pvc.

NAME             STATUS    VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS
data-claim       Pending   <none>   10Gi       RWO            fast-ssd

Pending PVC means one of two things: no PersistentVolume exists to bind to it, or the StorageClass is misconfigured.

If you manually created a PV, verify it exists and carries matching access modes and capacity:

kubectl get pv

Nothing shows up? Create a PV with the right specs. Something shows up but the PVC still won’t bind? Check the StorageClass name — it must match exactly between PVC and PV. One typo and it sits there forever.
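As a sketch, a manually created PV that would satisfy the data-claim PVC above might look like the following (the PV name and the hostPath backend are illustrative stand-ins; real clusters use a proper volume plugin):

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: data-pv                # illustrative name
spec:
  capacity:
    storage: 10Gi              # must cover the PVC's requested capacity
  accessModes:
    - ReadWriteOnce            # must include the PVC's RWO mode
  storageClassName: fast-ssd   # must match the PVC's StorageClass exactly
  hostPath:                    # stand-in backend, for illustration only
    path: /mnt/data
```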

Using dynamic provisioning instead? Verify the StorageClass actually exists:

kubectl get storageclass

Missing entirely? Your cloud provider’s storage provisioner might not be installed. On AWS that’s the EBS CSI driver; as of EKS 1.23, when the in-tree provisioner was migrated out, you have to install the driver yourself. On GKE it’s built-in. Install what your platform needs.

Once the PVC flips to Bound, the pod starts immediately. No restart needed.

Quick-Reference Diagnosis Checklist

Every time you hit a pending pod, run through these three commands in order. You need nothing beyond basic kubectl access, a few minutes, and a careful read of each output.

  1. kubectl describe pod <name> — read the Events section. Insufficient resources? Scheduling constraint? PVC issue? It’s in there.
  2. kubectl describe node <any-node> — check Allocatable versus Allocated. Do you have headroom? Check labels and taints while you’re there.
  3. kubectl get pvc — any claims stuck in Pending? If yes, that’s your bottleneck and Fix 3 is your next stop.

That mental model solves nine out of ten stuck pods in under two minutes. The tenth is usually a pod that started but failed immediately — if that’s where you’ve landed, the CrashLoopBackOff guide covers the next layer of diagnosis from there.

Jason Michael
