Deploy Autoscaling model serving

The Autoscaling model serving solution deploys the KServe, Knative, and Istio charms as a standalone stack to serve Machine Learning (ML) models that can be accessed through ingress.

Requirements

  • Juju 2.9.49 or above.
  • A Kubernetes cluster with a configured LoadBalancer, DNS, and a storage class solution.

Deploy the solution

You can deploy the solution in the following ways:

  1. Deploy with Terraform.
  2. Deploy with charm bundle.

Regardless of the chosen deployment method, the following charm configuration is required:

juju config knative-serving istio.gateway.namespace="<Istio ingress gateway namespace>"

where the Istio ingress gateway namespace corresponds to the model name where the autoscaling-model-serving bundle is deployed.
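For example, assuming the bundle was deployed into a Juju model named serving (a hypothetical name used only for illustration), the configuration would be:

```shell
# Hypothetical example: the bundle was deployed into a Juju model named
# "serving", so the Istio ingress gateway namespace is also "serving".
juju config knative-serving istio.gateway.namespace="serving"

# Inspect the current value to confirm it was set.
juju config knative-serving istio.gateway.namespace
```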

Deploy with Terraform

The Autoscaling model serving solution is defined in a Terraform module that facilitates its deployment using the Terraform Juju provider. See the Terraform Juju provider documentation for more details.

In its most basic form, the solution can be deployed as follows:

terraform init
terraform apply

Refer to the module's documentation for more information about its inputs and outputs.
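Module inputs can be overridden at apply time with `-var` flags. The variable names below are illustrative only, not the module's actual inputs; check the module's variables.tf for the real names:

```shell
# Hypothetical example: override module inputs at apply time.
# "model_name" and "channel" are placeholder variable names;
# consult the module's variables.tf for the actual inputs.
terraform apply -var model_name=serving -var channel=latest/stable
```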

Deploy with charm bundle

Charm bundles are now deprecated, but as part of v0.1, the bundle.yaml file is still available. To deploy:

  1. Clone the autoscaling-model-serving repository.
  2. Deploy using the bundle.yaml file:
juju deploy ./bundle/bundle.yaml --trust
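The two steps above can be sketched as a single shell session. The repository URL below assumes the upstream GitHub organization; verify it before use:

```shell
# Clone the autoscaling-model-serving repository (assumed URL).
git clone https://github.com/canonical/autoscaling-model-serving.git
cd autoscaling-model-serving

# Deploy the bundle into the current Juju model.
juju deploy ./bundle/bundle.yaml --trust
```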

Perform inference

  1. Apply an InferenceService.

KServe offers a simple example in steps 1, 2, and 3.

  2. Perform inference by making a request using the URL from the recently created InferenceService.
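As a sketch, the first InferenceService from the KServe documentation can be applied as follows. The sklearn-iris model name and storage URI come from that example; replace the namespace with your own:

```shell
# Apply KServe's example sklearn-iris InferenceService.
kubectl apply -n <namespace> -f - <<EOF
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: sklearn-iris
spec:
  predictor:
    model:
      modelFormat:
        name: sklearn
      storageUri: "gs://kfserving-examples/models/sklearn/1.0/model"
EOF
```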

For example, by running:

kubectl get inferenceservices <name of the inferenceservice> -n <namespace where it is deployed>

You get the following output:

NAME     URL                                               READY   PREV   LATEST   PREVROLLEDOUTREVISION   LATESTREADYREVISION   AGE
<name>   http://<name>.<namespace>.<LoadBalancer IP.DNS>   True           100

The URL http://<name>.<namespace>.<LoadBalancer IP.DNS> can then be used to send inference requests, for example:

$ curl -v -H "Content-Type: application/json" http://<name>.<namespace>.<LoadBalancer IP.DNS>/v1/models/<name>:predict -d @./some-input.json
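For instance, with the sklearn-iris example from the KServe documentation, the input file could contain iris feature rows. The file name and its contents below are illustrative, taken from that example:

```shell
# Hypothetical input for the KServe sklearn-iris example; the feature
# rows come from KServe's first-InferenceService tutorial.
cat > ./some-input.json <<EOF
{
  "instances": [
    [6.8, 2.8, 4.8, 1.4],
    [6.0, 3.4, 4.5, 1.6]
  ]
}
EOF
```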

Integrate with COS

You can integrate the solution with Canonical Observability Stack (COS) while deploying with the Terraform module by running:

terraform apply -var cos_configuration=true

If the solution was deployed using the charm bundle, or using the Terraform module without the COS option enabled, see Integrate with COS for more details.
