This guide describes how to leverage NVIDIA GPU resources in your Charmed Kubeflow (CKF) deployment.
Requirements
- A CKF deployment and access to the Kubeflow dashboard. See Get started for more details.
- An NVIDIA GPU accessible from the Kubernetes cluster that CKF is deployed on. Depending on your deployment, refer to the relevant guide for more details.
Spin up a Notebook on a GPU
Kubeflow Notebooks can use any GPU resource available in the Kubernetes cluster. This is configurable during the Notebook’s creation.
When creating a Notebook, under GPUs, select the number of GPUs and NVIDIA as the GPU vendor. The number of GPUs to request depends both on the cluster setup and on your code's demands.
If your Notebook uses a TensorFlow-based image with CUDA, run the following code to confirm the Notebook has access to a GPU:
import tensorflow as tf
gpus = tf.config.list_physical_devices("GPU")
print(f"Congratz! The following GPUs are available to the notebook: {gpus}" if gpus else "There's no GPU available to the notebook")
If your cluster setup uses taints, see Leverage PodDefaults for more details. A sketch of such a PodDefault is shown below.
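As a minimal sketch, a PodDefault like the following could add the toleration required to schedule Notebook Pods on tainted GPU nodes. It assumes your Kubeflow version's PodDefault supports the tolerations field; the taint key and selector label are placeholders to adjust to your cluster:

apiVersion: kubeflow.org/v1alpha1
kind: PodDefault
metadata:
  name: gpu-toleration
spec:
  desc: "Tolerate the GPU node taint"
  selector:
    matchLabels:
      add-gpu-toleration: "true"   # placeholder label selected in the Notebook creation form
  tolerations:
    - key: "nvidia.com/gpu"        # assumed taint key; match your node taint
      operator: "Exists"
      effect: "NoSchedule"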
Run Pipeline steps on a GPU
Kubeflow Pipelines can run steps on GPU resources available in your Kubernetes cluster. You can enable this by adding the nvidia.com/gpu: 1 limit to a step in the Pipeline's definition. See the detailed steps below.
A GPU can be used by one Pod at a time. Thus, a Pipeline can schedule Pods on a GPU only when one is available. For advanced GPU sharing practices on Kubernetes, see NVIDIA Multi-Instance GPU.
- Open a notebook with your Pipeline. If you don’t have one, use the following code as an example. It creates a Pipeline with a single component that checks GPU access:
# Import required objects
from kfp import dsl


@dsl.component(base_image="kubeflownotebookswg/jupyter-tensorflow-cuda:v1.9.0")
def gpu_check() -> str:
    """Get the list of GPUs and print it. If empty, raise a RuntimeError."""
    import tensorflow as tf

    gpus = tf.config.list_physical_devices("GPU")
    print("GPU list:", gpus)
    if not gpus:
        raise RuntimeError("No GPU has been detected.")
    return str(len(gpus) > 0)


@dsl.pipeline
def gpu_check_pipeline() -> str:
    """Create a pipeline that runs code to check access to a GPU."""
    gpu_check_object = gpu_check()
    return gpu_check_object.output
Make sure the KFP SDK is installed in the Notebook’s environment:
!pip install "kfp>=2.4,<3.0"
- Ensure the Pipeline's gpu_check component step runs on a GPU by creating a function add_gpu_request(task) that uses the SDK's add_node_selector_constraint() and set_accelerator_limit(). This sets the required limit for the step's Pod:
def add_gpu_request(task: dsl.PipelineTask) -> dsl.PipelineTask:
    """Add a request field for a GPU to the container created by the PipelineTask object."""
    return task.add_node_selector_constraint(accelerator="nvidia.com/gpu").set_accelerator_limit(
        limit=1
    )
- Modify the Pipeline definition by applying add_gpu_request() to the component:
@dsl.pipeline
def gpu_check_pipeline() -> str:
    """Create a pipeline that runs code to check access to a GPU."""
    gpu_check_object = add_gpu_request(gpu_check())
    return gpu_check_object.output
- Submit and run the Pipeline:
# Submit the pipeline for execution
from kfp.client import Client

client = Client()
run = client.create_run_from_pipeline_func(
    gpu_check_pipeline,
    experiment_name="Check access to GPU",
    enable_caching=False,
)
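Optionally, you can block until the run finishes from the same notebook. This is a minimal sketch using the KFP client's wait_for_run_completion(); the 600-second timeout is an arbitrary choice:

# Wait for the run to finish, then print its final state
finished_run = client.wait_for_run_completion(run.run_id, timeout=600)
print(finished_run.state)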
- Navigate to the output Run details. In its logs, you can see the GPU devices available to the step.
Inference with a KServe ISVC on a GPU
KServe inference services (ISVCs) can schedule their Pods on a GPU. To ensure the ISVC Pod uses a GPU, add the nvidia.com/gpu: 1 limit to the ISVC's definition.
You can do so by using the kubectl Command Line Interface (CLI) or within a notebook.
Using kubectl CLI
Using the kubectl CLI, you can enable GPU usage in your InferenceService Pod by directly modifying its configuration YAML file. For example, the inference service YAML file from this example would be modified to:
apiVersion: "serving.kserve.io/v1beta1"
kind: "InferenceService"
metadata:
  name: "sklearn-iris"
spec:
  predictor:
    model:
      modelFormat:
        name: sklearn
      storageUri: "gs://kfserving-examples/models/sklearn/1.0/model"
      resources:
        limits:
          nvidia.com/gpu: 1
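Save the modified definition and apply it with kubectl; the file name and namespace below are placeholders:

# Apply the InferenceService and check that it becomes ready
kubectl apply -f sklearn-iris.yaml -n <namespace>
kubectl get inferenceservice sklearn-iris -n <namespace>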
Within a notebook
A GPU can be used by one Pod at a time. Thus, an ISVC Pod can be scheduled on a GPU only when one is available. For advanced GPU sharing practices on Kubernetes, see NVIDIA Multi-Instance GPU.
- Open a notebook with your InferenceService. If you don't have one, use this one as an example.
Make sure the KServe SDK is installed in the Notebook's environment:
!pip install kserve
- Import V1ResourceRequirements from the kubernetes.client package and add a resources field to the workload you want to run on a GPU. See the example for reference:
# Import the KServe SDK objects and the Kubernetes resource-requirements class
from kubernetes.client import V1ObjectMeta, V1ResourceRequirements
from kserve import (
    constants,
    V1beta1InferenceService,
    V1beta1InferenceServiceSpec,
    V1beta1PredictorSpec,
    V1beta1SKLearnSpec,
)

ISVC_NAME = "sklearn-iris"
isvc = V1beta1InferenceService(
    api_version=constants.KSERVE_V1BETA1,
    kind=constants.KSERVE_KIND,
    metadata=V1ObjectMeta(
        name=ISVC_NAME,
        annotations={"sidecar.istio.io/inject": "false"},
    ),
    spec=V1beta1InferenceServiceSpec(
        predictor=V1beta1PredictorSpec(
            sklearn=V1beta1SKLearnSpec(
                resources=V1ResourceRequirements(
                    limits={"nvidia.com/gpu": "1"}
                ),
                storage_uri="gs://kfserving-examples/models/sklearn/1.0/model",
            )
        )
    ),
)
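Finally, create the ISVC with the KServe client and wait for it to become ready. This is a minimal sketch, assuming the notebook runs inside the target namespace:

from kserve import KServeClient

client = KServeClient()
client.create(isvc)                # submit the InferenceService to the cluster
client.wait_isvc_ready(ISVC_NAME)  # block until the ISVC reports Ready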