Gen AI OpenTelemetry spans attributes reference

v3.13+ AI Gateway supports OpenTelemetry instrumentation for generative AI traffic. When the OpenTelemetry (OTEL) plugin is enabled in AI Gateway, a set of Gen AI-specific attributes is emitted on tracing spans. These attributes complement the core tracing instrumentation described in the Kong Gateway tracing guide, giving insight into the Gen AI request lifecycle (inputs, model, and outputs), token usage, and tool/agent interactions.

v3.14+ A2A agent traffic is also instrumented via the AI A2A Proxy plugin.

You can export these attributes through Kong's OpenTelemetry plugin or the Zipkin plugin to a supported backend such as Jaeger, where you can:

  • Inspect which model or provider handled a request
  • Track conversation/session identifiers across requests
  • Analyze prompt structure (system vs. user vs. tool messages)
  • Evaluate model parameters (such as temperature, top-k)
  • Measure tool-call behavior (which tools were invoked, and their metadata)
  • Monitor token usage (input vs. output) for cost or performance analysis

The span data is sent to the configured OTEL endpoint through the existing tracing plugins (OpenTelemetry or Zipkin).
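
For example, a minimal setup that enables the OpenTelemetry plugin and points it at a local Jaeger instance could look like the following sketch. The hostnames and the Admin API address are placeholders, and the traces_endpoint field name should be verified against your Kong Gateway version:

```shell
# Enable the OpenTelemetry plugin globally through the Kong Admin API.
# Jaeger v1.35.0+ accepts OTLP over HTTP (Protobuf) on port 4318.
curl -X POST http://localhost:8001/plugins \
  --data "name=opentelemetry" \
  --data "config.traces_endpoint=http://jaeger:4318/v1/traces"
```

Note that tracing must also be enabled on the data plane (for example, tracing_instrumentations = all in kong.conf) for any spans to be generated.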

This page covers span attributes (per-request tracing data). AI Gateway also supports OTLP metrics (aggregated counters and histograms for latency, token usage, cost, and error rates). See the Gen AI OpenTelemetry metrics reference for details.

Collecting telemetry data

To collect this telemetry, you need a backend that supports OTLP over HTTP with Protobuf encoding. You can:

  • Send data directly to an OpenTelemetry-compatible backend that natively supports OTLP over HTTP with Protobuf encoding, like Jaeger (v1.35.0+).

    This is the simplest setup, since it doesn’t require any additional components between the data plane and the backend.

  • Use the OpenTelemetry Collector, which acts as an intermediary between the data plane and one or more backends.

    The OTEL Collector can receive all OpenTelemetry signals supported by the OpenTelemetry plugin (traces, metrics, and logs), and can process, transform, or route that data before exporting it to a compatible backend.

    This option is useful when you need capabilities such as signal fan-out, filtering, enrichment, batching, or exporting to multiple backends. The OpenTelemetry Collector supports a wide range of exporters, available at open-telemetry/opentelemetry-collector-contrib.
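
As a sketch, a minimal Collector configuration that receives OTLP over HTTP from the data plane and forwards traces to Jaeger's OTLP endpoint might look like this (endpoints and exporter names are placeholders; check the Collector documentation for your version):

```yaml
receivers:
  otlp:
    protocols:
      http: {}             # Kong exports OTLP over HTTP with Protobuf encoding

processors:
  batch: {}                # batch spans before export

exporters:
  otlp/jaeger:
    endpoint: jaeger:4317  # Jaeger's OTLP/gRPC receiver
    tls:
      insecure: true       # example only; use TLS in production

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp/jaeger]
```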

See the OpenTelemetry and Kong Gateway tracing documentation for more details about OpenTelemetry and tracing in Kong Gateway.

Some Gen AI span attributes can include sensitive request or response payload data. In particular, gen_ai.input.messages and gen_ai.output.messages may contain prompts, model outputs, PII, secrets, or credentials. Review your tracing, retention, access-control, and redaction requirements before enabling or exporting payload-related tracing data.

Span attribute reference

Gen AI span attributes v3.13+

These attributes appear on the Gen AI tracing span emitted for AI Proxy and AI Proxy Advanced requests.

The following span attributes use the kong.gen_ai prefix:

| Attribute | Description |
| --- | --- |
| gen_ai.operation.name | Operation requested, such as chat or embeddings. |
| gen_ai.provider.name | Name of the Gen AI provider. |
| gen_ai.request.model | Model name targeted by the request. |
| gen_ai.request.max_tokens | Maximum token limit configured for the request. |
| gen_ai.request.temperature | Sampling temperature configured for the request. |
| gen_ai.input.messages | Array of input messages sent to the model. |
| gen_ai.output.type | Output payload type, such as json. |
| gen_ai.output.messages | Array containing the full model response payload. |
| gen_ai.response.id | Unique identifier returned by the provider for the response. |
| gen_ai.response.model | Model name reported by the provider in the response. |
| gen_ai.response.finish_reasons | Array of finish reasons returned by the provider. |
| gen_ai.usage.input_tokens | Number of input tokens consumed by the request. |
| gen_ai.usage.output_tokens | Number of output tokens generated in the response. |
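
To make the usage attributes concrete, the following sketch shows how a trace consumer could aggregate token usage from exported span attributes. The spans list uses invented example values, not real AI Gateway output:

```python
# Hypothetical span attributes as they might appear on exported
# Gen AI spans (all values are illustrative).
spans = [
    {
        "gen_ai.operation.name": "chat",
        "gen_ai.provider.name": "openai",
        "gen_ai.request.model": "gpt-4o",
        "gen_ai.usage.input_tokens": 412,
        "gen_ai.usage.output_tokens": 128,
    },
    {
        "gen_ai.operation.name": "chat",
        "gen_ai.provider.name": "openai",
        "gen_ai.request.model": "gpt-4o",
        "gen_ai.usage.input_tokens": 96,
        "gen_ai.usage.output_tokens": 40,
    },
]

# Total token usage per model, e.g. for cost analysis.
usage = {}
for span in spans:
    model = span["gen_ai.request.model"]
    totals = usage.setdefault(model, {"input": 0, "output": 0})
    totals["input"] += span["gen_ai.usage.input_tokens"]
    totals["output"] += span["gen_ai.usage.output_tokens"]

print(usage)  # {'gpt-4o': {'input': 508, 'output': 168}}
```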

Gen AI tool call span attributes v3.13+

These attributes appear on the Gen AI tracing span emitted when the provider response includes a tool call.

The following span attributes use the kong.gen_ai prefix:

| Attribute | Description |
| --- | --- |
| gen_ai.operation.name | Operation requested, such as chat or embeddings. |
| gen_ai.provider.name | Name of the Gen AI provider. |
| gen_ai.request.model | Model name targeted by the request. |
| gen_ai.request.max_tokens | Maximum token limit configured for the request. |
| gen_ai.request.temperature | Sampling temperature configured for the request. |
| gen_ai.response.finish_reasons | Array of finish reasons returned by the provider. |
| gen_ai.response.id | Unique identifier returned by the provider for the response. |
| gen_ai.response.model | Model name reported by the provider in the response. |
| gen_ai.tool.call.id | Unique identifier for the specific tool call. |
| gen_ai.tool.name | Name of the tool or function requested by the model. |
| gen_ai.tool.type | Tool type, such as function. |
| gen_ai.usage.input_tokens | Number of input tokens consumed by the request. |
| gen_ai.usage.output_tokens | Number of output tokens generated in the response. |
| gen_ai.output.type | Output payload type, such as json. |

A2A span attributes v3.14+

These attributes appear on the A2A tracing span emitted for AI A2A Proxy requests.

The following span attributes use the kong.a2a prefix or the rpc prefix:

| Attribute | Description |
| --- | --- |
| kong.a2a.protocol.version | A2A protocol version used for the request. |
| rpc.system | RPC protocol used by the request, such as jsonrpc. |
| rpc.method | RPC method invoked by the client. |
| kong.a2a.task.id | Identifier of the A2A task. |
| kong.a2a.task.state | Current state of the A2A task. |
| kong.a2a.context.id | Identifier of the A2A conversation context. |
| kong.a2a.operation | A2A operation name, such as message/send. |
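
As an illustration, an exported A2A span might carry attributes like the following. All values are invented for the example; the protocol version, states, and identifiers depend on your deployment:

```json
{
  "kong.a2a.protocol.version": "0.2.5",
  "rpc.system": "jsonrpc",
  "rpc.method": "message/send",
  "kong.a2a.operation": "message/send",
  "kong.a2a.task.id": "task-2f7c",
  "kong.a2a.task.state": "completed",
  "kong.a2a.context.id": "ctx-8b41"
}
```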
