Run:ai Collector

Note: This integration is not compatible with on-prem deployments.

The Metering & Billing Collector integrates with NVIDIA Run:ai to collect the allocated and used resources of your AI/ML workloads, including GPUs, CPUs, and memory. This is useful for companies that run GPU workloads on Run:ai and want to bill and invoice their customers based on the resources those workloads allocate and consume.

How it works

You can install the Collector as a Kubernetes pod in your Run:ai cluster to collect metrics automatically. It periodically scrapes metrics from the Run:ai platform and emits them as CloudEvents to Metering & Billing, which lets you track usage and monetize your Run:ai workloads.

Once you have the usage data ingested into Metering & Billing, you can use it to set up prices and billing for your customers based on their usage.

Example

Let’s say you want to charge your customers $0.20 per GPU minute and $0.05 per CPU minute. The Collector emits an event like the following every 30 seconds for each of your Run:ai workloads:

{
  "id": "123e4567-e89b-12d3-a456-426614174000",
  "specversion": "1.0",
  "type": "workload",
  "source": "run_ai",
  "time": "2025-01-01T00:00:00Z",
  "subject": "my-customer-id",
  "data": {
    "name": "my-runai-workload",
    "namespace": "my-runai-benchmark-test",
    "phase": "Running",
    "project": "my-project-id",
    "department": "my-department-id",
    "workload_minutes": 1.0,
    "cpu_limit_core_minutes": 96,
    "cpu_request_core_minutes": 96,
    "cpu_usage_core_minutes": 80,
    "cpu_memory_limit_gigabyte_minutes": 384,
    "cpu_memory_request_gigabyte_minutes": 384,
    "cpu_memory_usage_gigabyte_minutes": 178,
    "gpu_allocation_minutes": 1,
    "gpu_usage_minutes": 1,
    "gpu_memory_request_gigabyte_minutes": 40,
    "gpu_memory_usage_gigabyte_minutes": 27
  }
}

Note: The collector normalizes collected metrics to minutes by default. The unit is configurable, so you can set per-second, per-minute, or per-hour pricing, similar to AWS EC2 pricing.
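With these prices, the example event above would be rated as follows, assuming you bill GPU time on gpu_allocation_minutes and CPU time on cpu_usage_core_minutes (whether you charge for allocated or used resources is a pricing choice):

1 GPU minute × $0.20 + 80 CPU core minutes × $0.05 = $0.20 + $4.00 = $4.20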

Run:ai metrics

The Collector supports the following Run:ai metrics:

Pod metrics

See the following table for supported pod metrics:

| Metric name | Description |
|---|---|
| GPU_UTILIZATION_PER_GPU | GPU utilization percentage per individual GPU |
| GPU_UTILIZATION | Overall GPU utilization percentage for the pod |
| GPU_MEMORY_USAGE_BYTES_PER_GPU | GPU memory usage in bytes per individual GPU |
| GPU_MEMORY_USAGE_BYTES | Total GPU memory usage in bytes for the pod |
| CPU_USAGE_CORES | Number of CPU cores currently being used |
| CPU_MEMORY_USAGE_BYTES | Amount of CPU memory currently being used, in bytes |
| GPU_GRAPHICS_ENGINE_ACTIVITY_PER_GPU | Graphics engine utilization percentage per GPU |
| GPU_SM_ACTIVITY_PER_GPU | Streaming Multiprocessor (SM) activity percentage per GPU |
| GPU_SM_OCCUPANCY_PER_GPU | SM occupancy percentage per GPU |
| GPU_TENSOR_ACTIVITY_PER_GPU | Tensor core usage percentage per GPU |
| GPU_FP64_ENGINE_ACTIVITY_PER_GPU | FP64 (double precision) engine activity percentage per GPU |
| GPU_FP32_ENGINE_ACTIVITY_PER_GPU | FP32 (single precision) engine activity percentage per GPU |
| GPU_FP16_ENGINE_ACTIVITY_PER_GPU | FP16 (half precision) engine activity percentage per GPU |
| GPU_MEMORY_BANDWIDTH_UTILIZATION_PER_GPU | Memory bandwidth usage percentage per GPU |
| GPU_NVLINK_TRANSMITTED_BANDWIDTH_PER_GPU | NVLink transmitted bandwidth per GPU |
| GPU_NVLINK_RECEIVED_BANDWIDTH_PER_GPU | NVLink received bandwidth per GPU |
| GPU_PCIE_TRANSMITTED_BANDWIDTH_PER_GPU | PCIe transmitted bandwidth per GPU |
| GPU_PCIE_RECEIVED_BANDWIDTH_PER_GPU | PCIe received bandwidth per GPU |
| GPU_SWAP_MEMORY_BYTES_PER_GPU | Amount of GPU memory swapped to system memory per GPU |

Workload metrics

See the following table for supported workload metrics:

| Metric name | Description |
|---|---|
| GPU_UTILIZATION | Overall GPU usage percentage across all GPUs in the workload |
| GPU_MEMORY_USAGE_BYTES | Total GPU memory usage in bytes across all GPUs |
| GPU_MEMORY_REQUEST_BYTES | Requested GPU memory in bytes for the workload |
| CPU_USAGE_CORES | Number of CPU cores currently being used |
| CPU_REQUEST_CORES | Number of CPU cores requested for the workload |
| CPU_LIMIT_CORES | Maximum number of CPU cores allowed for the workload |
| CPU_MEMORY_USAGE_BYTES | Amount of CPU memory currently being used, in bytes |
| CPU_MEMORY_REQUEST_BYTES | Requested CPU memory in bytes for the workload |
| CPU_MEMORY_LIMIT_BYTES | Maximum CPU memory allowed in bytes for the workload |
| POD_COUNT | Total number of pods in the workload |
| RUNNING_POD_COUNT | Number of currently running pods in the workload |
| GPU_ALLOCATION | Number of GPUs allocated to the workload |

Get started

First, create a new YAML file for the collector configuration. Use the run_ai Redpanda Connect input:

input:
  run_ai:
    url: '${RUNAI_URL:}'
    app_id: '${RUNAI_APP_ID:}'
    app_secret: '${RUNAI_APP_SECRET:}'
    # Scrape every 30 seconds (six-field cron expression with a seconds field)
    schedule: '*/30 * * * * *'
    # Query 30 seconds behind real time to account for delayed metrics
    metrics_offset: '30s'
    # Collect workload-level metrics; set to 'pod' for pod-level metrics
    resource_type: 'workload'
    metrics:
      - CPU_LIMIT_CORES
      - CPU_MEMORY_LIMIT_BYTES
      - CPU_MEMORY_REQUEST_BYTES
      - CPU_MEMORY_USAGE_BYTES
      - CPU_REQUEST_CORES
      - CPU_USAGE_CORES
      - GPU_ALLOCATION
      - GPU_MEMORY_REQUEST_BYTES
      - GPU_MEMORY_USAGE_BYTES
      - GPU_UTILIZATION
      - POD_COUNT
      - RUNNING_POD_COUNT
    http:
      timeout: 30s
      retry_count: 1
      retry_wait_time: 100ms
      retry_max_wait_time: 1s

Configuration options

See the following table for supported configuration options:

| Option | Description | Default | Required |
|---|---|---|---|
| url | Run:ai base URL | - | Yes |
| app_id | Run:ai app ID | - | Yes |
| app_secret | Run:ai app secret | - | Yes |
| resource_type | Run:ai resource to collect metrics from (workload or pod) | workload | No |
| metrics | List of Run:ai metrics to collect | All available | No |
| schedule | Cron expression for the scrape interval | */30 * * * * * | No |
| metrics_offset | Time offset for queries to account for delays in metric availability | 0s | No |
| http | HTTP client configuration | - | No |
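For example, if metrics in your Run:ai platform become available with a delay, you can use metrics_offset to shift the query window back. A minimal sketch, assuming metrics in your environment lag by about a minute:

input:
  run_ai:
    url: '${RUNAI_URL:}'
    app_id: '${RUNAI_APP_ID:}'
    app_secret: '${RUNAI_APP_SECRET:}'
    schedule: '@every 1m'
    # Query one minute behind real time so delayed metrics are included
    metrics_offset: '1m'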

Next, configure the mapping from the Run:ai metrics to CloudEvents using Bloblang:

pipeline:
  processors:
    - mapping: |
        # parse_duration returns nanoseconds; convert the scrape interval to seconds
        let duration_seconds = (meta("scrape_interval").parse_duration() / 1000 / 1000 / 1000).round().int64()
        # Normalize each metric to per-minute amounts: value * interval seconds / 60
        let gpu_allocation_minutes = this.allocatedResources.gpu.number(0) * $duration_seconds / 60
        let cpu_limit_core_minutes = this.metrics.values.CPU_LIMIT_CORES.number(0) * $duration_seconds / 60
        # Add metrics as needed...

        root = {
          "id": uuid_v4(),
          "specversion": "1.0",
          "type": meta("resource_type"),
          "source": "run_ai",
          "time": now(),
          "subject": this.name,
          "data": {
            "tenant": this.tenantId,
            "project": this.projectId,
            "department": this.departmentId,
            "cluster": this.clusterId,
            "type": this.type,
            "gpuAllocationMinutes": $gpu_allocation_minutes,
            "cpuLimitCoreMinutes": $cpu_limit_core_minutes
          }
        }

Finally, configure the output:

output:
  label: 'openmeter'
  # Drop events rejected by the API instead of retrying them forever
  drop_on:
    error: false
    error_patterns:
      - Bad Request
  output:
    http_client:
      url: '${OPENMETER_URL:https://us.api.konghq.com}/v3/openmeter/events'
      verb: POST
      headers:
        Authorization: 'Bearer $KONNECT_SYSTEM_ACCESS_TOKEN'
        Content-Type: 'application/json'
      timeout: 30s
      retry_period: 15s
      retries: 3
      max_retry_backoff: 1m
      max_in_flight: 64
      batch_as_multipart: false
      # Don't retry requests rejected with HTTP 400
      drop_on:
        - 400
      # Send events in batches of up to 100, flushed at least once per second
      batching:
        count: 100
        period: 1s
        processors:
          - metric:
              type: counter
              name: openmeter_events_sent
              value: 1
          - archive:
              # Wrap each batch in a single JSON array
              format: json_array
      dump_request_log_level: DEBUG

Replace $KONNECT_SYSTEM_ACCESS_TOKEN with your own system access token.

Scheduling

The collector runs on a schedule defined by the schedule parameter using cron syntax. It supports:

  • Cron expressions, including six-field expressions with a seconds field (for example, */30 * * * * * for every 30 seconds)
  • Duration syntax with the @every prefix (for example, @every 30s)
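
For example, the following two settings scrape at the same 30-second interval:

schedule: '*/30 * * * * *'   # six-field cron expression
# or, equivalently:
schedule: '@every 30s'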

Resource types

The collector can collect metrics from two different resource types:

  • workload: Collects metrics at the workload level, which represents a group of pods
  • pod: Collects metrics at the individual pod level
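
For example, to meter at pod granularity, set resource_type to pod and pick metrics from the pod metrics table above. A minimal sketch:

input:
  run_ai:
    url: '${RUNAI_URL:}'
    app_id: '${RUNAI_APP_ID:}'
    app_secret: '${RUNAI_APP_SECRET:}'
    resource_type: 'pod'
    metrics:
      - GPU_UTILIZATION_PER_GPU
      - GPU_MEMORY_USAGE_BYTES_PER_GPU
      - CPU_USAGE_CORES
      - CPU_MEMORY_USAGE_BYTES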

Installation

The Metering & Billing Collector (a custom Redpanda Connect distribution) is available via the following distribution strategies:
