Run:ai Collector

The Metering & Billing Collector can integrate with Nvidia’s Run:ai to collect allocated and used resources for your AI/ML workloads, including GPUs, CPUs, and memory. This is useful for companies that use Run:ai to run GPU workloads and want to bill and invoice their customers based on the allocated and used resources they consume.
How it works
You can install the Collector as a Kubernetes pod in your Run:ai cluster to collect metrics automatically. The Collector periodically scrapes metrics from the Run:ai platform and emits them as CloudEvents to Metering & Billing, so you can track usage and monetize Run:ai workloads.
Once you have the usage data ingested into Metering & Billing, you can use it to set up prices and billing for your customers based on their usage.
Example
Let’s say you want to charge your customers $0.20 per GPU minute and $0.05 per CPU minute. Every 30 seconds, the Collector emits events like the following from your Run:ai workloads:
```json
{
  "id": "123e4567-e89b-12d3-a456-426614174000",
  "specversion": "1.0",
  "type": "workload",
  "source": "run_ai",
  "time": "2025-01-01T00:00:00Z",
  "subject": "my-customer-id",
  "data": {
    "name": "my-runai-workload",
    "namespace": "my-runai-benchmark-test",
    "phase": "Running",
    "project": "my-project-id",
    "department": "my-department-id",
    "workload_minutes": 1.0,
    "cpu_limit_core_minutes": 96,
    "cpu_request_core_minutes": 96,
    "cpu_usage_core_minutes": 80,
    "cpu_memory_limit_gigabyte_minutes": 384,
    "cpu_memory_request_gigabyte_minutes": 384,
    "cpu_memory_usage_gigabyte_minutes": 178,
    "gpu_allocation_minutes": 1,
    "gpu_usage_minutes": 1,
    "gpu_memory_request_gigabyte_minutes": 40,
    "gpu_memory_usage_gigabyte_minutes": 27
  }
}
```
Note: The Collector normalizes collected metrics to per-minute values. The normalization unit is configurable, so you can set per-second, per-minute, or per-hour pricing, similar to AWS EC2 pricing.
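For example, if the $0.20 GPU price is applied to `gpu_allocation_minutes` and the $0.05 CPU price to `cpu_usage_core_minutes` (one possible mapping of prices to fields; you can meter any of the emitted fields), the sample event above would be billed at 1 × $0.20 + 80 × $0.05 = $4.20.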
Run:ai metrics
The Collector supports the following Run:ai metrics:
Pod metrics
See the following table for supported pod metrics:
| Metric Name | Description |
|---|---|
| `GPU_UTILIZATION_PER_GPU` | GPU utilization percentage per individual GPU |
| `GPU_UTILIZATION` | Overall GPU utilization percentage for the pod |
| `GPU_MEMORY_USAGE_BYTES_PER_GPU` | GPU memory usage in bytes per individual GPU |
| `GPU_MEMORY_USAGE_BYTES` | Total GPU memory usage in bytes for the pod |
| `CPU_USAGE_CORES` | Number of CPU cores currently being used |
| `CPU_MEMORY_USAGE_BYTES` | Amount of CPU memory currently being used in bytes |
| `GPU_GRAPHICS_ENGINE_ACTIVITY_PER_GPU` | Graphics engine utilization percentage per GPU |
| `GPU_SM_ACTIVITY_PER_GPU` | Streaming Multiprocessor (SM) activity percentage per GPU |
| `GPU_SM_OCCUPANCY_PER_GPU` | SM occupancy percentage per GPU |
| `GPU_TENSOR_ACTIVITY_PER_GPU` | Tensor core usage percentage per GPU |
| `GPU_FP64_ENGINE_ACTIVITY_PER_GPU` | FP64 (double precision) engine activity percentage per GPU |
| `GPU_FP32_ENGINE_ACTIVITY_PER_GPU` | FP32 (single precision) engine activity percentage per GPU |
| `GPU_FP16_ENGINE_ACTIVITY_PER_GPU` | FP16 (half precision) engine activity percentage per GPU |
| `GPU_MEMORY_BANDWIDTH_UTILIZATION_PER_GPU` | Memory bandwidth usage percentage per GPU |
| `GPU_NVLINK_TRANSMITTED_BANDWIDTH_PER_GPU` | NVLink transmitted bandwidth per GPU |
| `GPU_NVLINK_RECEIVED_BANDWIDTH_PER_GPU` | NVLink received bandwidth per GPU |
| `GPU_PCIE_TRANSMITTED_BANDWIDTH_PER_GPU` | PCIe transmitted bandwidth per GPU |
| `GPU_PCIE_RECEIVED_BANDWIDTH_PER_GPU` | PCIe received bandwidth per GPU |
| `GPU_SWAP_MEMORY_BYTES_PER_GPU` | Amount of GPU memory swapped to system memory per GPU |
Workload metrics
See the following table for supported workload metrics:
| Metric Name | Description |
|---|---|
| `GPU_UTILIZATION` | Overall GPU usage percentage across all GPUs in the workload |
| `GPU_MEMORY_USAGE_BYTES` | Total GPU memory usage in bytes across all GPUs |
| `GPU_MEMORY_REQUEST_BYTES` | Requested GPU memory in bytes for the workload |
| `CPU_USAGE_CORES` | Number of CPU cores currently being used |
| `CPU_REQUEST_CORES` | Number of CPU cores requested for the workload |
| `CPU_LIMIT_CORES` | Maximum number of CPU cores allowed for the workload |
| `CPU_MEMORY_USAGE_BYTES` | Amount of CPU memory currently being used in bytes |
| `CPU_MEMORY_REQUEST_BYTES` | Requested CPU memory in bytes for the workload |
| `CPU_MEMORY_LIMIT_BYTES` | Maximum CPU memory allowed in bytes for the workload |
| `POD_COUNT` | Total number of pods in the workload |
| `RUNNING_POD_COUNT` | Number of currently running pods in the workload |
| `GPU_ALLOCATION` | Number of GPUs allocated to the workload |
Get started
First, create a new YAML file for the Collector configuration. Use the `run_ai` Redpanda Connect input:

```yaml
input:
  run_ai:
    url: '${RUNAI_URL:}'
    app_id: '${RUNAI_APP_ID:}'
    app_secret: '${RUNAI_APP_SECRET:}'
    schedule: '*/30 * * * * *'
    metrics_offset: '30s'
    resource_type: 'workload'
    metrics:
      - CPU_LIMIT_CORES
      - CPU_MEMORY_LIMIT_BYTES
      - CPU_MEMORY_REQUEST_BYTES
      - CPU_MEMORY_USAGE_BYTES
      - CPU_REQUEST_CORES
      - CPU_USAGE_CORES
      - GPU_ALLOCATION
      - GPU_MEMORY_REQUEST_BYTES
      - GPU_MEMORY_USAGE_BYTES
      - GPU_UTILIZATION
      - POD_COUNT
      - RUNNING_POD_COUNT
    http:
      timeout: 30s
      retry_count: 1
      retry_wait_time: 100ms
      retry_max_wait_time: 1s
```
Configuration options
See the following table for supported configuration options:
| Option | Description | Default | Required |
|---|---|---|---|
| `url` | Run:ai base URL | - | Yes |
| `app_id` | Run:ai app ID | - | Yes |
| `app_secret` | Run:ai app secret | - | Yes |
| `resource_type` | Run:ai resource to collect metrics from (`workload` or `pod`) | `workload` | No |
| `metrics` | List of Run:ai metrics to collect | All available | No |
| `schedule` | Cron expression for the scrape interval | `*/30 * * * * *` | No |
| `metrics_offset` | Time offset for queries to account for delays in metric availability | `0s` | No |
| `http` | HTTP client configuration | - | No |
Next, configure the mapping from the Run:ai metrics to CloudEvents using Bloblang:

```yaml
pipeline:
  processors:
    - mapping: |
        let duration_seconds = (meta("scrape_interval").parse_duration() / 1000 / 1000 / 1000).round().int64()
        let gpu_allocation_minutes = this.allocatedResources.gpu.number(0) * $duration_seconds / 60
        let cpu_limit_core_minutes = this.metrics.values.CPU_LIMIT_CORES.number(0) * $duration_seconds / 60
        # Add metrics as needed...
        root = {
          "id": uuid_v4(),
          "specversion": "1.0",
          "type": meta("resource_type"),
          "source": "run_ai",
          "time": now(),
          "subject": this.name,
          "data": {
            "tenant": this.tenantId,
            "project": this.projectId,
            "department": this.departmentId,
            "cluster": this.clusterId,
            "type": this.type,
            "gpuAllocationMinutes": $gpu_allocation_minutes,
            "cpuLimitCoreMinutes": $cpu_limit_core_minutes
          }
        }
```
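For example, to also meter CPU usage minutes you could mirror the `CPU_LIMIT_CORES` line above (a sketch, not part of the default mapping; it assumes `CPU_USAGE_CORES` is included in the input's `metrics` list):

```
# Hypothetical extra metric: CPU core usage normalized to minutes
let cpu_usage_core_minutes = this.metrics.values.CPU_USAGE_CORES.number(0) * $duration_seconds / 60
```

Then add `"cpuUsageCoreMinutes": $cpu_usage_core_minutes` to the `data` object in `root`.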
Finally, configure the output:
```yaml
output:
  label: 'openmeter'
  drop_on:
    error: false
    error_patterns:
      - Bad Request
    output:
      http_client:
        url: '${OPENMETER_URL:https://us.api.konghq.com}/v3/openmeter/events'
        verb: POST
        headers:
          Authorization: 'Bearer $KONNECT_SYSTEM_ACCESS_TOKEN'
          Content-Type: 'application/json'
        timeout: 30s
        retry_period: 15s
        retries: 3
        max_retry_backoff: 1m
        max_in_flight: 64
        batch_as_multipart: false
        drop_on:
          - 400
        batching:
          count: 100
          period: 1s
          processors:
            - metric:
                type: counter
                name: openmeter_events_sent
                value: 1
            - archive:
                format: json_array
        dump_request_log_level: DEBUG
```
Replace $KONNECT_SYSTEM_ACCESS_TOKEN with your own system access token.
Scheduling
The Collector runs on a schedule defined by the `schedule` parameter using cron syntax. It supports:
- Standard cron expressions (for example, `*/30 * * * * *` for every 30 seconds)
- Duration syntax with the `@every` prefix (for example, `@every 30s`)
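For example, either of the following `schedule` values scrapes every 30 seconds (a sketch showing only the relevant field of the `run_ai` input):

```yaml
input:
  run_ai:
    # Six-field cron syntax: run at seconds 0 and 30 of every minute
    schedule: '*/30 * * * * *'
    # Alternatively, duration syntax:
    # schedule: '@every 30s'
```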
Resource types
The Collector can collect metrics from two different resource types:
- `workload`: Collects metrics at the workload level, which represents a group of pods
- `pod`: Collects metrics at the individual pod level (see the example after this list)
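For example, a minimal `run_ai` input for pod-level collection might look like the following sketch (credentials and URL as in the Get started example; the metric names come from the pod metrics table above):

```yaml
input:
  run_ai:
    url: '${RUNAI_URL:}'
    app_id: '${RUNAI_APP_ID:}'
    app_secret: '${RUNAI_APP_SECRET:}'
    schedule: '*/30 * * * * *'
    # Collect metrics per pod instead of per workload
    resource_type: 'pod'
    metrics:
      - GPU_UTILIZATION
      - GPU_MEMORY_USAGE_BYTES
      - CPU_USAGE_CORES
      - CPU_MEMORY_USAGE_BYTES
```

With `resource_type: 'pod'`, the Bloblang mapping above emits CloudEvents whose `type` is `pod`, since it sets `type` from `meta("resource_type")`.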
Installation
The Metering & Billing Collector (a custom Redpanda Connect distribution) is available through the following channels:
- Binaries can be downloaded from the GitHub Releases page.
- Container images are available on ghcr.io.
- A Helm chart is also available on GitHub Packages.