This guide describes how to install Charmed Kubeflow (CKF) on NVIDIA DGX hardware. DGX systems are purpose-built hardware for enterprise AI use cases, featuring NVIDIA Tensor Core GPUs.
Requirements
- NVIDIA DGX-enabled hardware setup, including BIOS settings and bootloader, with no NVIDIA drivers preinstalled.
- kubectl.
Install MicroK8s
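Install MicroK8s and enable the required add-ons as follows. The DNS server address and the MetalLB IP ranges are examples; replace them with values for your network, and replace ubuntu with your user name if it differs: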
sudo snap install microk8s --classic --channel 1.22
sudo microk8s enable dns:10.229.32.21 storage ingress registry rbac helm3 metallb:10.64.140.43-10.64.140.49,192.168.0.105-192.168.0.111
sudo usermod -a -G microk8s ubuntu
sudo chown -f -R ubuntu ~/.kube
newgrp microk8s
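Before moving on, it can help to confirm that MicroK8s is up and that the add-ons enabled above are running. The following is a minimal sketch: wait_for_microk8s is a hypothetical helper name, and it relies only on microk8s status --wait-ready and the bundled microk8s kubectl.

```shell
#!/usr/bin/env bash
# Hypothetical helper: wait for MicroK8s, then list the add-on pods.
wait_for_microk8s() {
  # Blocks until the MicroK8s services report ready; fails if they do not.
  microk8s status --wait-ready || return 1
  # Add-ons such as dns, ingress, registry and metallb run as pods in
  # several namespaces; all of them should eventually be Running.
  microk8s kubectl get pods --all-namespaces
}
```

Call wait_for_microk8s after newgrp microk8s; every pod listed should eventually reach the Running state.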
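Edit /var/snap/microk8s/current/args/containerd-template.toml and add your Docker Hub credentials, so that containerd authenticates image pulls from Docker Hub (registry-1.docker.io):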
[plugins."io.containerd.grpc.v1.cri".registry.configs]
[plugins."io.containerd.grpc.v1.cri".registry.configs."registry-1.docker.io".auth]
username = "<your-docker-hub-username>"
password = "<your-docker-hub-password>"
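As an alternative to editing the file by hand, a small helper can append the same fragment. This is a sketch under assumptions: append_dockerhub_auth and the DOCKERHUB_USERNAME and DOCKERHUB_PASSWORD environment variables are hypothetical names, and the helper writes through sudo tee because the template file is owned by root.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append Docker Hub credentials to the MicroK8s
# containerd template. Credentials are read from the assumed environment
# variables DOCKERHUB_USERNAME and DOCKERHUB_PASSWORD.
append_dockerhub_auth() {
  local template="${1:-/var/snap/microk8s/current/args/containerd-template.toml}"
  if [ -z "${DOCKERHUB_USERNAME:-}" ] || [ -z "${DOCKERHUB_PASSWORD:-}" ]; then
    echo "Set DOCKERHUB_USERNAME and DOCKERHUB_PASSWORD first" >&2
    return 1
  fi
  # Do nothing if a Docker Hub auth table is already present, so that
  # rerunning the helper does not create duplicate TOML tables.
  if sudo grep -qF 'registry.configs."registry-1.docker.io".auth' "$template"; then
    echo "Docker Hub auth already configured in $template" >&2
    return 0
  fi
  # Values are inserted verbatim: a password containing '"' or '\' would
  # need TOML escaping first.
  sudo tee -a "$template" > /dev/null <<EOF || return 1

[plugins."io.containerd.grpc.v1.cri".registry.configs]
[plugins."io.containerd.grpc.v1.cri".registry.configs."registry-1.docker.io".auth]
  username = "${DOCKERHUB_USERNAME}"
  password = "${DOCKERHUB_PASSWORD}"
EOF
}
```

For example, run DOCKERHUB_USERNAME=<user> DOCKERHUB_PASSWORD=<password> append_dockerhub_auth, then restart MicroK8s so that containerd picks up the change.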
Finally, restart MicroK8s:
microk8s.stop
microk8s.start
Enable GPU add-on
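Enable the GPU add-on, which installs the required NVIDIA GPU operator, then export the MicroK8s configuration so that kubectl can reach the cluster: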
sudo microk8s.enable gpu
mkdir -p ~/.kube
microk8s config > ~/.kube/config
Check the GPU count reported for the MicroK8s node:
kubectl get nodes --show-labels | grep gpu.count
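The gpu.count label is set by the GPU operator (through its GPU Feature Discovery component); it can also be useful to see how many GPUs Kubernetes can actually schedule. A minimal sketch, using a hypothetical helper named gpu_summary, prints the node name, the nvidia.com/gpu.count label and the allocatable nvidia.com/gpu resource for each node:

```shell
#!/usr/bin/env bash
# Hypothetical helper: one tab-separated line per node with its name,
# the nvidia.com/gpu.count label and the allocatable nvidia.com/gpu count.
gpu_summary() {
  # Dots inside label and resource names are escaped with a backslash so
  # that JSONPath does not treat them as path separators.
  kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.metadata.labels.nvidia\.com/gpu\.count}{"\t"}{.status.allocatable.nvidia\.com/gpu}{"\n"}{end}'
}
```

Empty columns mean the label or resource is not present yet. If the operator exposes MIG devices with the mixed strategy, MIG slices appear as resources such as nvidia.com/mig-1g.5gb rather than nvidia.com/gpu.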
Configure MIG
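Configure Multi-Instance GPU (MIG) devices by running the following command, replacing blanka with the name of your node. The all-1g.5gb profile partitions every GPU on the node into 1g.5gb MIG instances: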
kubectl label nodes blanka nvidia.com/mig.config=all-1g.5gb --overwrite
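Applying a MIG profile is asynchronous: the GPU operator's MIG manager reconfigures the GPUs and reports progress in the nvidia.com/mig.config.state node label, which is expected to end in success or failed (label name and values as described in NVIDIA's GPU Operator MIG documentation; verify them against your operator version). A minimal polling sketch, with wait_for_mig_config as a hypothetical helper name and an arbitrarily chosen default timeout:

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll the MIG manager's state label on one node.
# Usage: wait_for_mig_config NODE [TIMEOUT_SECONDS] [INTERVAL_SECONDS]
wait_for_mig_config() {
  local node="$1" timeout="${2:-900}" interval="${3:-10}"
  local waited=0 state=""
  while [ "$waited" -le "$timeout" ]; do
    state=$(kubectl get node "$node" \
      -o jsonpath='{.metadata.labels.nvidia\.com/mig\.config\.state}')
    case "$state" in
      success) echo "MIG config applied on $node"; return 0 ;;
      failed)  echo "MIG config failed on $node" >&2; return 1 ;;
    esac
    # Label missing or still in progress: wait, then poll again.
    sleep "$interval"
    waited=$((waited + interval))
  done
  echo "Timed out waiting for MIG config on $node (last state: ${state:-none})" >&2
  return 1
}
```

For example, wait_for_mig_config blanka 900 10 polls every 10 seconds for up to 15 minutes.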
Check the GPU count again to confirm that it has increased:
kubectl get nodes --show-labels | grep gpu.count
If no nodes appear in the output of the command above, uninstall all GPU drivers from the Kubernetes nodes and reinstall MicroK8s.
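To confirm end to end that a pod can be scheduled onto a GPU (or a MIG slice) and run CUDA code, a throwaway pod can request one nvidia.com/gpu. The sketch below assumes NVIDIA's CUDA vectorAdd sample image from NGC; run_gpu_smoke_test and the default pod name gpu-smoke-test are hypothetical. With a mixed MIG strategy, the resource would instead be named per profile, for example nvidia.com/mig-1g.5gb.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a one-shot CUDA pod and print its logs.
# Returns 0 only if the pod finishes in the Succeeded phase.
run_gpu_smoke_test() {
  local name="${1:-gpu-smoke-test}" phase="" attempt=0
  # Assumed image: NVIDIA's CUDA vectorAdd sample published on nvcr.io.
  kubectl apply -f - <<EOF || return 1
apiVersion: v1
kind: Pod
metadata:
  name: ${name}
spec:
  restartPolicy: Never
  containers:
  - name: cuda-vectoradd
    image: nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda11.7.1-ubuntu20.04
    resources:
      limits:
        nvidia.com/gpu: 1
EOF
  # Poll the pod phase for up to about 5 minutes (60 x 5 s).
  while [ "$attempt" -lt 60 ]; do
    phase=$(kubectl get pod "$name" -o jsonpath='{.status.phase}')
    case "$phase" in Succeeded|Failed) break ;; esac
    sleep 5
    attempt=$((attempt + 1))
  done
  kubectl logs "$name"
  [ "$phase" = "Succeeded" ]
}
```

Delete the pod afterwards with kubectl delete pod gpu-smoke-test (or the name you passed).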
Deploy CKF
Follow the instructions in General installation to deploy CKF.
Explore some examples
CKF can be run on different types of DGX hardware:
- See kubeflow-single-node-dgx for single-node examples.
- See kubeflow-multi-node-dgx for multi-node examples.