This guide presents an overview of the Charmed Kubeflow (CKF) charms that provide Prometheus monitoring metrics.
You can access all metrics through the Prometheus or Grafana user interface (UI). See Integrate with COS for more information.
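Besides the UI, Prometheus serves the same data through its HTTP API, so the label selectors listed for each charm below can also be run from a script. The following Python sketch assumes Prometheus is reachable at a hypothetical address, `http://prometheus.example:9090`; replace it with the address of the Prometheus unit in your COS deployment. It builds a selector such as `{juju_charm="argo-controller"}` and sends it as an instant query to the standard `/api/v1/query` endpoint. The names `PROMETHEUS_URL`, `build_selector`, `query_url`, and `instant_query` are illustrative, not part of CKF.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical address; replace it with the Prometheus URL of your COS deployment.
PROMETHEUS_URL = "http://prometheus.example:9090"


def build_selector(labels: dict[str, str]) -> str:
    """Return a PromQL label selector, e.g. {juju_charm="argo-controller"}."""
    matchers = []
    for name, value in labels.items():
        # Escape backslashes and double quotes inside the label value.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        matchers.append(f'{name}="{escaped}"')
    return "{" + ", ".join(matchers) + "}"


def query_url(base_url: str, promql: str) -> str:
    """Return the URL of an instant query against the Prometheus HTTP API."""
    params = urllib.parse.urlencode({"query": promql})
    return base_url.rstrip("/") + "/api/v1/query?" + params


def instant_query(base_url: str, promql: str, timeout: float = 10.0) -> list[dict]:
    """Run an instant query and return the list of result series."""
    with urllib.request.urlopen(query_url(base_url, promql), timeout=timeout) as resp:
        body = json.load(resp)
    if body.get("status") != "success":
        raise RuntimeError(f"query failed: {body.get('error')}")
    # Each series is a dict with a "metric" label map and a [timestamp, value] pair.
    return body["data"]["result"]
```

For example, `instant_query(PROMETHEUS_URL, build_selector({"juju_charm": "minio"}))` returns one entry per series that carries the minio charm label. If Prometheus rejects a query, `urlopen` raises `urllib.error.HTTPError` before the status check runs.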
Argo controller
See the argo-controller upstream documentation for more information on the provided metrics.
You can check its metrics through the Prometheus or Grafana UI using the following query:
{juju_charm="argo-controller"}
Dex auth
The dex-auth charm provides the following metrics:
- A custom metric counting HTTP requests. See its source code for more details.
- Go runtime and process metrics for monitoring the Dex server.
- gRPC server metrics.
You can check its metrics through the Prometheus or Grafana UI using the following query:
{juju_charm="dex-auth"}
Envoy
The envoy charm provides the following metrics:
You can check its metrics through the Prometheus or Grafana UI using the following query:
{juju_charm="envoy"}
Istio pilot
See the istio-pilot upstream documentation for more information on the provided metrics.
You can check its metrics through the Prometheus or Grafana UI using the following query:
{juju_charm="istio-pilot"}
Istio gateway
See the istio-gateway upstream documentation for more information on the provided metrics.
You can check its metrics through the Prometheus or Grafana UI using the following query:
{juju_charm="istio-gateway"}
Jupyter controller
The jupyter-controller charm provides the following metrics:
- Custom notebook-related metrics. See its source code for more details.
- Go runtime metrics for monitoring the controller.
- Controller runtime metrics.
You can check its metrics through the Prometheus or Grafana UI using the following query:
{juju_charm="jupyter-controller"}
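Several controller charms in this guide expose the same three families of metrics: custom metrics, Go runtime and process metrics, and controller runtime metrics. One way to tell them apart in a query result is by metric-name prefix. The Python sketch below assumes the naming used by the Prometheus Go client (`go_`, `process_`) and by the controller-runtime library (`controller_runtime_`, `workqueue_`, `rest_client_`); treat this prefix list as an assumption and check it against the metric names your deployment actually returns. The names `FAMILY_PREFIXES`, `classify_metric`, and `group_by_family` are hypothetical.

```python
# Prefix-to-family mapping. This is an assumption based on common Go client and
# controller-runtime metric names, not an exhaustive or authoritative list.
FAMILY_PREFIXES = {
    "go runtime": ("go_",),
    "process": ("process_",),
    "controller runtime": ("controller_runtime_", "workqueue_", "rest_client_"),
}


def classify_metric(name: str) -> str:
    """Return the family a metric name belongs to, or "custom" if no prefix matches."""
    for family, prefixes in FAMILY_PREFIXES.items():
        # str.startswith accepts a tuple and matches any of its prefixes.
        if name.startswith(prefixes):
            return family
    return "custom"


def group_by_family(names: list[str]) -> dict[str, list[str]]:
    """Group metric names by family, with each group sorted alphabetically."""
    groups: dict[str, list[str]] = {}
    for name in names:
        groups.setdefault(classify_metric(name), []).append(name)
    return {family: sorted(members) for family, members in groups.items()}
```

Passing the `__name__` values returned for `{juju_charm="jupyter-controller"}` to `group_by_family` gives a rough split into the families listed above; names that match none of the prefixes fall under "custom".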
Katib controller
The katib-controller charm provides the following metrics:
- Custom experiment-related metrics. See its source code for more details.
- Go runtime metrics for monitoring the controller.
- Controller runtime metrics.
You can check its metrics through the Prometheus or Grafana UI using the following query:
{juju_charm="katib-controller"}
KFP API
The kfp-api charm provides the following metrics:
- Custom metrics for the following components (see its source code for more details):
  - Resource manager.
  - Experiment server.
  - Job server.
  - Pipeline server.
  - Pipeline upload.
  - Run server.
- Go runtime and process metrics for monitoring the API server.
You can check its metrics through the Prometheus or Grafana UI using the following query:
{juju_charm="kfp-api"}
Knative eventing
The knative-eventing metrics come from the knative-operator charm, which deploys otel-collector. See its upstream documentation for more details.
You can check its metrics through the Prometheus or Grafana UI using the following query:
{juju_charm="knative-operator", namespace_name="knative-eventing"}
Knative serving
The knative-serving metrics come from the knative-operator charm, which deploys otel-collector. See its upstream documentation for more details.
You can check its metrics through the Prometheus or Grafana UI using the following query:
{juju_charm="knative-operator", namespace_name="knative-serving"}
Knative operator
See the knative-operator upstream documentation for more information on the provided metrics.
You can check its metrics through the Prometheus or Grafana UI using the following query:
{juju_charm="knative-operator"}
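Because the knative-eventing and knative-serving series also carry the `juju_charm="knative-operator"` label, the selector above returns them together with any other series that have that label. The Python sketch below counts the returned series per `namespace_name` value. It assumes a decoded JSON response from the Prometheus HTTP API `/api/v1/query` endpoint; the function name `count_by_label` and the `MISSING` placeholder for series without the label are hypothetical.

```python
from collections import Counter

# Hypothetical placeholder key for series that have no value for the label.
MISSING = "<no namespace_name>"


def count_by_label(response: dict, label: str = "namespace_name") -> Counter:
    """Count series in a decoded /api/v1/query response by the value of one label."""
    series = response["data"]["result"]
    return Counter(s["metric"].get(label, MISSING) for s in series)
```

The same grouping can be done server-side in the Prometheus or Grafana UI with the PromQL expression `count by (namespace_name) ({juju_charm="knative-operator"})`.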
Metacontroller operator
The metacontroller-operator charm provides the following metrics:
- Custom metrics. See its source code for more details.
- Go runtime and process metrics for monitoring the controller.
- Controller runtime metrics.
You can check its metrics through the Prometheus or Grafana UI using the following query:
{juju_charm="metacontroller-operator"}
MinIO
See the minio upstream documentation for more information on the provided metrics.
You can check its metrics through the Prometheus or Grafana UI using the following query:
{juju_charm="minio"}
Seldon controller manager
See the seldon-controller-manager upstream documentation for more information on the provided metrics.
You can check its metrics through the Prometheus or Grafana UI using the following query:
{juju_charm="seldon-controller-manager"}
Training operator
The training-operator charm provides the following metrics:
- Custom job-related metrics. See its source code for more details.
- Go runtime and process metrics for monitoring the controller.
- Controller runtime metrics.
You can check its metrics through the Prometheus or Grafana UI using the following query:
{juju_charm="training-operator"}
PVC viewer operator
The pvcviewer-operator charm provides the following metrics:
- Go runtime and process metrics for monitoring the controller.
- Controller runtime metrics.
You can check its metrics through the Prometheus or Grafana UI using the following query:
{juju_charm="pvcviewer-operator"}
KServe controller
The kserve-controller charm provides the following metrics:
- Go runtime and process metrics for monitoring the controller.
- Controller runtime metrics.
You can check its metrics through the Prometheus or Grafana UI using the following query:
{juju_charm="kserve-controller"}
Kubeflow profiles
The kubeflow-profiles charm manages two Pebble services: profile-controller and kfam.
Profile controller
The profile-controller service provides the following metrics:
- Custom metrics. See its source code for more details.
- Go runtime and process metrics for monitoring the controller.
You can check its metrics through the Prometheus or Grafana UI using the following query:
{juju_charm="kubeflow-profiles"}
Kfam
The kfam service provides the following metrics:
- Custom metrics. See its source code for more details.
- Go runtime and process metrics for monitoring the kfam service.
You can check its metrics through the Prometheus or Grafana UI using the following query:
{juju_charm="kubeflow-profiles"}
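Both Pebble services share the `{juju_charm="kubeflow-profiles"}` selector, so a single query returns the metrics of profile-controller and kfam together. As a starting point for telling them apart, the Python sketch below lists the distinct metric names (the `__name__` label) in a decoded `/api/v1/query` response and reports which other labels take different values across the returned series. It does not assume any particular label distinguishes the two services; it only surfaces candidates to inspect. The function names `metric_names` and `varying_labels` are hypothetical.

```python
def metric_names(response: dict) -> list[str]:
    """Return the sorted, distinct metric names in a decoded /api/v1/query response."""
    series = response["data"]["result"]
    return sorted({s["metric"]["__name__"] for s in series if "__name__" in s["metric"]})


def varying_labels(response: dict) -> list[str]:
    """Return label names (other than __name__) whose values differ across series."""
    series = response["data"]["result"]
    values: dict[str, set[str]] = {}
    for s in series:
        for label, value in s["metric"].items():
            if label != "__name__":
                values.setdefault(label, set()).add(value)
    # A label also counts as varying if some series lack it entirely.
    return sorted(
        label
        for label, seen in values.items()
        if len(seen) > 1 or any(label not in s["metric"] for s in series)
    )
```

Comparing the output of `metric_names` with the custom metrics defined in each service's source code is one way to attribute the returned series to profile-controller or kfam.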
Tensorboard controller
The tensorboard-controller charm provides the following metrics:
- Go runtime and process metrics for monitoring the controller.
- Controller runtime metrics.
You can check its metrics through the Prometheus or Grafana UI using the following query:
{juju_charm="tensorboard-controller"}