Kubernetes troubleshooting guide

Last updated 3 years ago

kubectl

The kubectl command isn't found

Make sure you are on the Deployment Unit (DU) or the master node of the cluster. The kubectl utility isn't available on worker nodes.

If you are on the DU and kubectl is still not available, make sure the directory containing kubectl is in $PATH.
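A minimal sketch of the $PATH check described above. The helper names and the KUBECTL_DIR location are assumptions for illustration; adjust the directory to wherever kubectl is actually installed on your DU or master node.

```shell
#!/bin/sh
# Sketch only: find_tool, add_to_path and KUBECTL_DIR are hypothetical names,
# not part of kubectl or of this guide.

# Prints the resolved path of a command; returns non-zero if it is missing.
find_tool() {
    command -v "$1"
}

# Appends a directory to PATH only if it is not already listed.
add_to_path() {
    case ":$PATH:" in
        *":$1:"*) ;;                # already present, nothing to do
        *) PATH="$PATH:$1" ;;
    esac
}

KUBECTL_DIR="/usr/local/bin"        # assumption: where kubectl is installed
if ! find_tool kubectl >/dev/null 2>&1; then
    add_to_path "$KUBECTL_DIR"
    export PATH
fi
```

To make the change permanent, the same PATH line can go into ~/.bashrc or ~/.bash_profile.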

WARNING: Kubernetes configuration file is group/world-readable

To remove the group-readable permission:

chmod g-r ~/.kube/config

To remove the world-readable permission:

chmod o-r ~/.kube/config

refer: https://github.com/helm/helm/issues/9115

error: You must be logged in to the server

[root@master0 ~]# kubectl get nodes
error: You must be logged in to the server (the server has asked for the client to provide credentials)

This happens even though the configuration is available.

It means kubectl is unable to access the credentials. A workaround is to pass the token explicitly:

$ kubectl --token=<token> get nodes

Pod stuck in pending state

Deployed Workloads

CrashLoopBackOff

A container keeps crashing after each restart. This error has many possible causes, so start with the pod logs for clues.

kubectl logs <podname>

<podname> is the problematic pod. If a previous instance of the container exists, pass the -p (--previous) flag to get its logs too.

Crashed containers are restarted with an exponential back-off delay (10s, 20s, 40s, ...) that is capped at 5 minutes.
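To get a feel for the restart timing, here is a small sketch that prints the delay before each restart. It assumes the kubelet defaults described in the Kubernetes pod lifecycle docs (10 seconds to start, doubling each time, capped at 300 seconds); backoff_delay is a made-up helper name, not a kubectl feature.

```shell
#!/bin/sh
# Prints the back-off delay in seconds before restart number N (1-based).
# Assumption: 10s base delay, doubled per restart, capped at 300s (5 minutes).
backoff_delay() {
    n="$1"
    delay=10
    i=1
    while [ "$i" -lt "$n" ] && [ "$delay" -lt 300 ]; do
        delay=$((delay * 2))
        i=$((i + 1))
    done
    if [ "$delay" -gt 300 ]; then
        delay=300
    fi
    echo "$delay"
}

# Delays before restarts 1 to 7: 10 20 40 80 160 300 300
for n in 1 2 3 4 5 6 7; do
    backoff_delay "$n"
done
```

So a pod that looks "stuck" between restarts may simply be waiting out the capped 5 minute delay.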

Check "Exit Code" of the crashed container

  1. Describe the problematic pod as

    kubectl describe pod <podname>

    replace <podname> with pod name.

  2. Check the Containers: <CONTAINER_NAME>: Last State: Exit Code field

    1. If the exit code is 1, the container crashed because the application crashed

    2. If the exit code is 0, verify how long the app ran

Containers exit when the application's main process exits. If the app finishes quickly, the container exits and, with the default restartPolicy of Always, is restarted again, which shows up as a crash loop.
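The exit code check above can be sketched as a small helper. The function names are hypothetical; the only kubectl call used is the standard -o jsonpath output option, and the "128 + signal number" rule is the usual shell convention for processes killed by a signal.

```shell
#!/bin/sh
# Sketch only: helper names are made up for illustration.

# Prints the exit code(s) of the last terminated container(s) of a pod.
# Needs a reachable cluster, so it is defined here but not called.
last_exit_codes() {
    kubectl get pod "$1" \
        -o jsonpath='{.status.containerStatuses[*].lastState.terminated.exitCode}'
}

# Turns an exit code into a first hint about what to look at.
explain_exit_code() {
    code="$1"
    if [ "$code" -eq 0 ]; then
        echo "main process finished normally; check how long the app ran"
    elif [ "$code" -eq 1 ]; then
        echo "application error; check the pod logs"
    elif [ "$code" -gt 128 ]; then
        # By convention, 128 + N means the process was killed by signal N.
        echo "terminated by signal $((code - 128))"
    else
        echo "application-specific exit code $code"
    fi
}

explain_exit_code 137   # signal 9 (SIGKILL), often an out-of-memory kill
```

Usage on a live cluster would look like: explain_exit_code "$(last_exit_codes <podname>)" for a single-container pod.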

Example if the exit code is 0

STEP 1: Identify the problem

A pod in CrashLoopBackOff status

[root@control-plane ~]# kubectl get pods -A | grep app
app-ns app-pods-dxaifa                                      3/3     Running            0          9h

STEP 2: Gather information

Gather information such as IP addresses, missing files, error messages, exit codes, and reasons.

Containers:
  upf:
    Container ID:  docker://89faf;dakj;safjoiwqreqwrwqaaaafafrb7c826bee916de4d5eb
    Image:         docker..-distroless
    Image ID:      docker-pullable://example.com/abc/afdsafafsafs@sha512:afsadfdwwqrewqrqfasdsgasgasdfghjkl;wertyuio
    Port:          9013/TCP
    Host Port:     0/TCP
    Command:
      /bin/bash
    Args:
      -c
      ip6tables -F;;/bin/bash
  
    Restart Count:  137

Example if the exit code is other than 0 or 1

STEP 1: Identify the problem

Exit Code: 137 means the container was killed with SIGKILL (128 + 9), most commonly because it ran out of memory (OOMKilled).

A pod in CrashLoopBackOff status

[root@control-plane ~]# kubectl get pods -A | grep app
NAME                       READY   STATUS             RESTARTS   AGE   IP              NODE          NOMINATED NODE   READINESS GATES
app-pods-afdaba   2/3     CrashLoopBackOff   27         97m   10.20.30.40   10.39.12.56   <none>           <none>

The pod then comes back to the Running state

[root@leoc2-mwp-21-3-1-master0 ~]# upf1
NAME                       READY   STATUS      RESTARTS   AGE    IP              NODE          NOMINATED NODE   READINESS GATES
app-pods-afdaba   2/3     Running   27         97m   10.20.30.40   10.39.12.56   <none>           <none>

STEP 2: Gather information

Gather information such as IP addresses, missing files, error messages, exit codes, and reasons.


STEP 3: Problem analysis

Here the error in the container is about JSON file validation: some entries in the JSON file have no values, so the validation failed.


STEP 4: Resolution and Root cause

Here we will agree upon the resolution and possible root cause.

Connect to a running container

Shell into the pod:

kubectl exec -it <podname> -- /bin/bash

If there are multiple containers, use -c <container_name> for a specific container.

Now you can access the bash terminal of the pod's container, where you can check networking, file access, databases, and so on.

ImagePullBackOff and ErrImagePull

The container image cannot be pulled from the image registry.

If the image is not found

Make sure to check the following

      1. Verify the image name

      2. Verify the image tag (:latest or no tag pulls the image tagged latest, and old tags may no longer be available)

      3. The image should have its full path. Also check the inherited registry (Docker Hub, Artifactory, or Harbor) links.

      4. Try to pull the docker image via terminal:

        1. SSH into the node (master or worker). Generally, ssh root@10.69.a.b works in both PowerShell and bash

        2. Run docker pull <image name>. For example, docker pull docker.io/nfvpe/sriov-device-plugin:latest.
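For the tag check in step 2, here is a rough sketch of how the tag is read from an image reference. image_tag is a hypothetical helper, and the parsing is simplified: it only looks at the last path component, so a registry port such as registry:5000/app is not mistaken for a tag.

```shell
#!/bin/sh
# Sketch only: image_tag is a made-up helper with simplified parsing.
# Prints the tag of an image reference, or "latest" when no tag is given.
image_tag() {
    ref="${1%%@*}"        # drop any @digest suffix
    last="${ref##*/}"     # final path component, e.g. name:tag
    case "$last" in
        *:*) echo "${last##*:}" ;;
        *)   echo "latest" ;;
    esac
}

image_tag docker.io/nfvpe/sriov-device-plugin:latest   # prints latest
image_tag docker.io/nfvpe/sriov-device-plugin          # prints latest (default)
```

Both references above resolve to the same tag, which is why a missing tag and :latest behave the same when pulling.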

Permission denied error

If the error is "permission denied" or "no pull access", verify that you have access to the image.

In either case, check whether you can download the image directly, for example with wget (or a similar method).

You need to make sure that the group (organizational team) you are in has access to the registry.

Pod unschedulable

The Pod cannot be scheduled due to insufficient resources or configuration errors.

Insufficient resources

Error messages can look like this:

  • No nodes are available that match all of the predicates: Insufficient cpu (2), which means that two nodes don't have enough CPU available to fulfill the pod's requests

The default CPU request is 100m, or 10% of one CPU core (this default comes from the cluster's configuration, such as a LimitRange, and may differ on your cluster). The requests under spec.containers[].resources.requests can be updated as per the requirement.

Note: The system containers in the kube-system namespace also use the cluster resources.
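Since CPU quantities are written either in millicores (100m) or in cores (0.1, 2), a quick conversion helps when comparing a pod's request with what a node has left. A minimal sketch, where cpu_to_millicores is a hypothetical helper name:

```shell
#!/bin/sh
# Sketch only: cpu_to_millicores is a made-up helper.
# Converts a Kubernetes CPU quantity ("100m", "0.5", "2") to millicores.
cpu_to_millicores() {
    case "$1" in
        *m) echo "${1%m}" ;;                                   # already millicores
        *)  awk -v c="$1" 'BEGIN { printf "%d\n", c * 1000 + 0.5 }' ;;
    esac
}

cpu_to_millicores 100m   # prints 100
cpu_to_millicores 0.1    # prints 100, the same amount as 100m
cpu_to_millicores 2      # prints 2000
```

1000 millicores equal one CPU core, so 100m is 10% of a core.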

MatchNodeSelector

MatchNodeSelector means that there are no nodes whose labels match the Pod's node selector.

Check the labels in the Pod specification's nodeSelector field

spec:
  nodeSelector:
    <label_name>: <label_value>

See the node labels

kubectl get nodes --show-labels

Add a label to the node as

kubectl label nodes <node_name> <label_name>=<label_value>
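To compare a nodeSelector entry against the LABELS column printed by kubectl get nodes --show-labels, a small sketch like this can help. has_label is a hypothetical helper, and the label values below are example data, not taken from a real cluster.

```shell
#!/bin/sh
# Sketch only: has_label is a made-up helper.
# Returns 0 if the comma-separated label list ($1) contains key=value ($2).
has_label() {
    case ",$1," in
        *",$2,"*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Example label list in the format of the LABELS column (illustrative values).
labels="kubernetes.io/os=linux,disktype=ssd"

if has_label "$labels" "disktype=ssd"; then
    echo "node matches the selector"
else
    echo "node does not match; label it or change the nodeSelector"
fi
```

On a live cluster, kubectl get nodes -l <label_name>=<label_value> lists only the nodes that carry the label, which gives the same answer directly.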

PodToleratesNodeTaints

PodToleratesNodeTaints says that the Pod can't be scheduled to any node because the Pod doesn't tolerate the taints on the nodes.

1. You can patch the node to mark it schedulable like this

kubectl patch node 10.ab.cd.efg -p '{"spec":{"unschedulable":false}}'

For example, kubectl patch node 10.20.30.123 -p '{"spec":{"unschedulable":false}}'

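2. Or you can remove the taint like this (replace key with the taint's key)

kubectl taint nodes <node_name> key:NoSchedule-

ref: https://kubernetes.io/docs/concepts/configuration/taint-and-toleration/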

PodFitsHostPorts

The requested host port is already in use on the node. Change the hostPort value in the Pod specification.

spec:
  containers:
    - ports:
        - hostPort:

Does not have minimum availability

If there is no availability even though a node has sufficient resources, the nodes might be in SchedulingDisabled (cordoned) status.

Get the nodes to see the status

kubectl get nodes

If scheduling is disabled, try uncordoning the node

kubectl uncordon <node_name>

Also check that the PVCs are in the Bound state.
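On a larger cluster it can be easier to filter the kubectl get nodes output for cordoned nodes than to read it by eye. A minimal sketch, assuming the default column layout (NAME first, STATUS second); cordoned_nodes is a hypothetical helper and the sample rows are made-up data.

```shell
#!/bin/sh
# Sketch only: cordoned_nodes is a made-up helper.
# Reads `kubectl get nodes` output on stdin and prints the names of nodes
# whose STATUS column contains SchedulingDisabled. Skips the header row.
cordoned_nodes() {
    awk 'NR > 1 && $2 ~ /SchedulingDisabled/ { print $1 }'
}

# Illustrative input; on a cluster use: kubectl get nodes | cordoned_nodes
printf '%s\n' \
    'NAME     STATUS                     ROLES    AGE   VERSION' \
    'node-a   Ready                      worker   1d    v1' \
    'node-b   Ready,SchedulingDisabled   worker   1d    v1' \
    | cordoned_nodes
```

Each name printed can then be passed to kubectl uncordon <node_name>.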

Pods stuck in Terminating state

kubelet Unable to attach or mount volumes

kubelet Unable to attach or mount volumes: unmounted volumes=[config-volume], unattached volumes=[sriov-device-plugin-token-ts8p5 devicesock log config-volume device-info]: timed out waiting for the condition

References

Few troubleshooting guides

  • https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/troubleshooting-kubeadm
  • https://cloud.google.com/kubernetes-engine/docs/troubleshooting

kubectl

  • Group/world-readable configuration warning: https://github.com/helm/helm/issues/9115
  • Passing a token to kubectl: https://kubernetes.io/docs/reference/access-authn-authz/authentication/#option-2-use-the-token-option

Pod stuck in pending state

  • ip-reconciler race condition with Whereabouts, leads to IP cleanup and duplicate IPs · Issue #162 · k8snetworkplumbingwg/whereabouts (github.com)
  • Steps to troubleshoot pending state: https://containersolutions.github.io/runbooks/posts/kubernetes/pod-stuck-in-pending-status/

CrashLoopBackOff

  • Restart policy and back-off: https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#restart-policy

ImagePullBackOff and ErrImagePull

  • To check your access, try downloading the image artifact directly:

    wget --user=<username> --password=<password> <artifact-url>

    If this method works, you can add imagePullSecrets to the Pod: https://kubernetes.io/docs/concepts/containers/images/#specifying-imagepullsecrets-on-a-pod. Also, image pull secrets are valid for a single namespace only.
  • Using a private registry: https://kubernetes.io/docs/concepts/containers/images/#using-a-private-registry

Pod unschedulable

  • Meaning of CPU: https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/#meaning-of-cpu
  • Assigning pods to nodes: https://kubernetes.io/docs/concepts/configuration/assign-pod-node/
  • Worker nodes have status of Ready,SchedulingDisabled · Issue #3713 · kubernetes/autoscaler
  • https://cloud.google.com/kubernetes-engine/docs/how-to/node-taints
  • https://kubernetes.io/docs/concepts/configuration/taint-and-toleration/
  • pod has unbound immediate persistentvolumeclaims · Issue #237 · hashicorp/consul-helm (github.com)

kubelet Unable to attach or mount volumes

  • https://stackoverflow.com/q/69544012
  • sol: You should not mount a single PVC on the same pod twice.