Install the Nvidia GPU Operator on an RKE2 Kubernetes Cluster

4 years ago thenewstack.io

Summary: This is a summary of an article originally published by The New Stack.

In a typical GPU-based Kubernetes installation, such as one for machine learning (https://thenewstack.io/category/machine-learning/), each node needs to be configured with the correct versions of the Nvidia graphics driver, CUDA runtime, and cuDNN libraries, followed by a container runtime such as Docker Engine, containerd, Podman, or CRI-O. Finally, Kubernetes (https://thenewstack.io/category/kubernetes/) is installed, which interacts with the chosen container runtime to manage the lifecycle of workloads. The Nvidia GPU Operator (https://github.com/NVIDIA/gpu-operator) dramatically simplifies this process by removing the need to manually install the drivers, CUDA runtime, cuDNN libraries, or the Nvidia Container Toolkit.

Let’s add the Nvidia Helm chart repository, refresh the local Helm repository index, and install the GPU Operator.
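The three steps named above might look roughly like the following sketch. The repository alias `nvidia`, the release name `gpu-operator`, the namespace `gpu-operator`, and the wrapper function `install_gpu_operator` are assumptions chosen for illustration; the summary does not state which names or chart values the original article used.

```shell
# Sketch only: wraps the three Helm steps in a function so nothing runs
# until it is called. Names below are illustrative assumptions, not taken
# from the original article.

install_gpu_operator() {
    # Step 1: register Nvidia's public Helm chart repository under the alias "nvidia".
    helm repo add nvidia https://helm.ngc.nvidia.com/nvidia || return 1

    # Step 2: refresh the local chart index so the newest chart version is visible.
    helm repo update || return 1

    # Step 3: install the chart into its own namespace and wait for it to become ready.
    helm install gpu-operator nvidia/gpu-operator \
        --namespace gpu-operator \
        --create-namespace \
        --wait
}
```

To use it, point `KUBECONFIG` at the cluster (RKE2 writes its kubeconfig to /etc/rancher/rke2/rke2.yaml on server nodes) and call `install_gpu_operator`. Because RKE2 ships its own containerd, the chart's toolkit settings may need to point at RKE2's containerd configuration and socket; check Nvidia's GPU Operator documentation for the values that match your versions.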

As we can see from the output, the GPU operator has successfully installed and configured the Nvidia driver, CUDA runtime, and the Container Toolkit without any manual intervention.
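The output referred to here is not reproduced in this summary. As a rough sketch, a check along the following lines could be used to inspect what the operator deployed. The namespace `gpu-operator` and the helper name `check_gpu_operator` are assumptions for illustration; `nvidia.com/gpu` is the resource name under which nodes advertise their GPUs.

```shell
# Sketch only: two read-only checks, wrapped in a function so nothing runs
# until it is called. The namespace is an assumption, not from the article.

check_gpu_operator() {
    # List the operator's pods (driver, container toolkit, device plugin, and so on).
    kubectl get pods --namespace gpu-operator || return 1

    # Show how many GPUs each node advertises through the nvidia.com/gpu resource.
    kubectl get nodes \
        -o custom-columns='NODE:.metadata.name,GPUS:.status.allocatable.nvidia\.com/gpu'
}
```

Calling `check_gpu_operator` after the install finishes should list the operator's pods and a per-node GPU count. A node that shows `<none>` in the GPUS column is not advertising any GPUs to the scheduler.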
