Gen AI OpenTelemetry metrics reference

Starting in version 3.14, AI Gateway can export OpenTelemetry (OTLP) metrics for generative AI, MCP, and A2A traffic through the OpenTelemetry plugin. These metrics are aggregated time-series data points (counters and histograms) pushed to a configured OTLP metrics endpoint at a regular interval. They are separate from the per-request Gen AI span attributes emitted on traces.

For a step-by-step setup using an OpenTelemetry Collector, see Collect metrics, logs, and traces with the OpenTelemetry plugin. To visualize Gen AI traces in Jaeger, see Set up Jaeger with Gen AI OpenTelemetry.

Use these metrics to:

  • Track LLM request latency and upstream provider processing time
  • Monitor token consumption across providers, models, and consumers
  • Measure time to first token (TTFT) and time per output token (TPOT, inter-token latency) for streaming responses
  • Calculate AI request costs
  • Observe MCP tool-call latency, error rates, and ACL decisions
  • Monitor A2A agent request volume, duration, and task state transitions

Prerequisites

To collect AI OTel metrics, enable the following settings:

Setting                                  Plugin                         Required for
config.metrics.enable_ai_metrics: true   OpenTelemetry                  All AI metrics
config.metrics.endpoint                  OpenTelemetry                  All AI metrics (set to a valid OTLP-compatible metrics endpoint)
config.logging.log_statistics: true      AI Proxy or AI Proxy Advanced  Gen AI metrics
config.logging.log_statistics: true      AI MCP Proxy                   MCP metrics
config.logging.log_statistics: true      AI A2A Proxy                   A2A metrics

Some metrics have additional requirements:

  • gen_ai.server.request.duration and mcp.client.operation.duration require config.metrics.enable_latency_metrics set to true in the OpenTelemetry plugin.
  • The error.type attribute on duration metrics requires config.metrics.enable_request_metrics set to true in the OpenTelemetry plugin.
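
Taken together, these settings correspond to plugin configuration along the lines of the following declarative sketch. The Collector endpoint URL is an assumption; substitute your own OTLP-compatible metrics endpoint:

```yaml
# Sketch of the prerequisite settings listed above; endpoint is an assumption.
plugins:
  - name: opentelemetry
    config:
      metrics:
        endpoint: http://otel-collector:4318/v1/metrics  # assumed Collector address
        enable_ai_metrics: true        # all AI metrics
        enable_latency_metrics: true   # gen_ai.server.request.duration, mcp.client.operation.duration
        enable_request_metrics: true   # error.type attribute on duration metrics
  - name: ai-proxy                     # or ai-proxy-advanced
    config:
      logging:
        log_statistics: true           # Gen AI metrics; AI MCP Proxy and AI A2A Proxy
                                       # use the same flag for MCP and A2A metrics
```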

Gen AI metrics (OTel semantic conventions)

These metrics follow the OpenTelemetry Gen AI semantic conventions. They capture request duration, upstream latency, token usage, and streaming performance.

gen_ai.client.operation.duration

Total time Kong Gateway spends processing a Gen AI operation, such as an LLM request. Kong Gateway acts as the client calling the Gen AI provider.

  • Type: Histogram
  • Unit: s (seconds)

Attributes:

Attribute                 Description
gen_ai.provider.name      Name of the Gen AI provider.
gen_ai.request.model      Model name targeted by the request.
gen_ai.response.model     Model name reported by the provider in the response.
gen_ai.operation.name     Operation requested, such as chat or embeddings.
kong.workspace.name       Name of the Workspace.
kong.auth.consumer.name   Name of the authenticated Consumer.
kong.gen_ai.request.mode  Request mode: oneshot, stream, or realtime.
error.type                Error type, if the request failed. Requires enable_request_metrics.

gen_ai.server.request.duration

Time the LLM provider spends processing the request (upstream latency). Requires enable_latency_metrics set to true.

  • Type: Histogram
  • Unit: s (seconds)

Attributes: Same as gen_ai.client.operation.duration.

gen_ai.client.token.usage

Number of tokens consumed by the Gen AI operation. Each data point is labeled with a gen_ai.token.type attribute that identifies the token category.

  • Type: Counter
  • Unit: {token}

Attribute                 Description
gen_ai.provider.name      Name of the Gen AI provider.
gen_ai.request.model      Model name targeted by the request.
gen_ai.response.model     Model name reported by the provider in the response.
gen_ai.token.type         Token category: input, output, or total.
gen_ai.operation.name     Operation requested, such as chat or embeddings.
kong.workspace.name       Name of the Workspace.
kong.auth.consumer.name   Name of the authenticated Consumer.
kong.gen_ai.request.mode  Request mode: oneshot, stream, or realtime.

gen_ai.server.time_to_first_token

Time from when the model server receives the request until the first output token is generated. Relevant for streaming responses.

  • Type: Histogram
  • Unit: s (seconds)

Attribute                 Description
gen_ai.provider.name      Name of the Gen AI provider.
gen_ai.request.model      Model name targeted by the request.
gen_ai.response.model     Model name reported by the provider in the response.
gen_ai.operation.name     Operation requested, such as chat or embeddings.
kong.workspace.name       Name of the Workspace.
kong.auth.consumer.name   Name of the authenticated Consumer.
kong.gen_ai.request.mode  Request mode: oneshot, stream, or realtime.

gen_ai.server.time_per_output_token

Time between successive output tokens generated by the model server after the first token. Measures inter-token latency for streaming responses.

  • Type: Histogram
  • Unit: s (seconds)

Attributes: Same as gen_ai.server.time_to_first_token.

Kong Gen AI metrics

These metrics use the kong.gen_ai.* namespace and capture Kong-specific AI observability data, including cost tracking, cache and RAG latency, and AWS Guardrails processing time.

kong.gen_ai.llm.cost

Cost of AI requests. To populate this metric, define model.options.input_cost and model.options.output_cost in the AI Proxy or AI Proxy Advanced plugin configuration.

  • Type: Counter
  • Unit: {cost}

Attribute                        Description
gen_ai.provider.name             Name of the Gen AI provider.
gen_ai.request.model             Model name targeted by the request.
gen_ai.response.model            Model name reported by the provider in the response.
gen_ai.operation.name            Operation requested, such as chat or embeddings.
kong.gen_ai.cache.status         Cache status: hit, or empty if not cached.
kong.gen_ai.vector_db            Vector database used for caching, such as redis.
kong.gen_ai.embeddings.provider  Embeddings provider used for caching.
kong.gen_ai.embeddings.model     Embeddings model used for caching.
kong.workspace.name              Name of the Workspace.
kong.auth.consumer.name          Name of the authenticated Consumer.
kong.gen_ai.request.mode         Request mode: oneshot, stream, or realtime.
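
For this metric to emit data, the per-token prices must be configured on the model. A minimal sketch of the relevant AI Proxy fragment follows; the price values and their units are illustrative assumptions, so check your provider's current pricing:

```yaml
# Illustrative values only; verify units and pricing for your provider.
plugins:
  - name: ai-proxy
    config:
      model:
        options:
          input_cost: 2.50    # assumed: cost per 1M input tokens
          output_cost: 10.00  # assumed: cost per 1M output tokens
```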

kong.gen_ai.cache.fetch.latency

Time to fetch a response from the semantic cache.

  • Type: Histogram
  • Unit: s (seconds)

Attributes: Same as kong.gen_ai.llm.cost.

kong.gen_ai.cache.embeddings.latency

Time to generate embeddings during cache operations.

  • Type: Histogram
  • Unit: s (seconds)

Attributes: Same as kong.gen_ai.llm.cost.

kong.gen_ai.rag.fetch.latency

Time to fetch data from a RAG (Retrieval-Augmented Generation) source.

  • Type: Histogram
  • Unit: s (seconds)

Attributes: Same as kong.gen_ai.llm.cost.

kong.gen_ai.rag.embeddings.latency

Time to generate embeddings for RAG operations.

  • Type: Histogram
  • Unit: s (seconds)

Attributes: Same as kong.gen_ai.llm.cost.

kong.gen_ai.aws.guardrails.latency

Time for AWS Guardrails to process a request.

  • Type: Histogram
  • Unit: s (seconds)

Attribute                           Description
kong.gen_ai.aws.guardrails.id       ID of the AWS Guardrails configuration.
kong.gen_ai.aws.guardrails.version  Version of the AWS Guardrails configuration.
kong.gen_ai.aws.guardrails.mode     Mode of the AWS Guardrails evaluation.
kong.gen_ai.aws.guardrails.region   AWS region of the Guardrails service.
kong.workspace.name                 Name of the Workspace.
kong.auth.consumer.name             Name of the authenticated Consumer.

MCP metrics

These metrics provide observability into MCP (Model Context Protocol) server interactions, including latency, response sizes, errors, and ACL decisions.

mcp.client.operation.duration

Duration of the MCP request as observed by the sender. Only available when the AI MCP Proxy plugin is in passthrough-listener mode (the upstream is an MCP server). Requires enable_latency_metrics set to true.

  • Type: Histogram
  • Unit: s (seconds)

Attribute              Description
kong.service.name      Name of the Gateway Service.
kong.route.name        Name of the Route.
kong.workspace.name    Name of the Workspace.
mcp.method.name        MCP method name, such as tools/call.
gen_ai.tool.name       Name of the tool invoked.
error.type             JSON-RPC error code, if the request failed.
gen_ai.operation.name  Operation name, such as execute_tool for tools/call.

mcp.server.operation.duration

Duration of the MCP request as observed by the receiver.

  • Type: Histogram
  • Unit: s (seconds)

Attributes: Same as mcp.client.operation.duration.

kong.gen_ai.mcp.response.size

Size of the MCP response body.

  • Type: Histogram
  • Unit: By (bytes)

Attribute            Description
kong.service.name    Name of the Gateway Service.
kong.route.name      Name of the Route.
kong.workspace.name  Name of the Workspace.
mcp.method.name      MCP method name, such as tools/call.
gen_ai.tool.name     Name of the tool invoked.

kong.gen_ai.mcp.request.error.count

Number of MCP request errors.

  • Type: Counter
  • Unit: {error}

Attribute            Description
kong.service.name    Name of the Gateway Service.
kong.route.name      Name of the Route.
kong.workspace.name  Name of the Workspace.
mcp.method.name      MCP method name, such as tools/call.
gen_ai.tool.name     Name of the tool invoked.
error.type           JSON-RPC error code.

kong.gen_ai.mcp.acl.allowed

Number of MCP requests allowed by ACL rules.

  • Type: Counter
  • Unit: {request}

Attribute                       Description
kong.service.name               Name of the Gateway Service.
kong.route.name                 Name of the Route.
kong.workspace.name             Name of the Workspace.
kong.gen_ai.mcp.primitive       MCP primitive type, such as tool.
kong.gen_ai.mcp.primitive_name  Name of the MCP primitive.

kong.gen_ai.mcp.acl.denied

Number of MCP requests denied by ACL rules.

  • Type: Counter
  • Unit: {request}

Attributes: Same as kong.gen_ai.mcp.acl.allowed.

A2A metrics

These metrics provide observability into A2A (Agent-to-Agent) traffic, including request volume, latency, response sizes, and task state transitions.

kong.gen_ai.a2a.request.count

Total number of A2A requests.

  • Type: Counter
  • Unit: {request}

Attribute                Description
kong.service.name        Name of the Gateway Service.
kong.route.name          Name of the Route.
kong.workspace.name      Name of the Workspace.
kong.gen_ai.a2a.method   A2A method name.
kong.gen_ai.a2a.binding  A2A binding type.

kong.gen_ai.a2a.request.duration

Duration of an A2A request.

  • Type: Histogram
  • Unit: s (seconds)

Attributes: Same as kong.gen_ai.a2a.request.count.

kong.gen_ai.a2a.response.size

Size of the A2A response body.

  • Type: Histogram
  • Unit: By (bytes)

Attributes: Same as kong.gen_ai.a2a.request.count.

kong.gen_ai.a2a.ttfb

Time to first byte for A2A streaming responses.

  • Type: Histogram
  • Unit: s (seconds)

Attributes: Same as kong.gen_ai.a2a.request.count.

kong.gen_ai.a2a.request.error.count

Number of A2A request errors.

  • Type: Counter
  • Unit: {error}

Attribute                   Description
kong.service.name           Name of the Gateway Service.
kong.route.name             Name of the Route.
kong.workspace.name         Name of the Workspace.
kong.gen_ai.a2a.method      A2A method name.
kong.gen_ai.a2a.binding     A2A binding type.
kong.gen_ai.a2a.error.type  Type of the A2A error.

kong.gen_ai.a2a.task.state.count

Number of A2A task state transitions.

  • Type: Counter
  • Unit: {state}

Attribute                   Description
kong.service.name           Name of the Gateway Service.
kong.route.name             Name of the Route.
kong.workspace.name         Name of the Workspace.
kong.gen_ai.a2a.task.state  Task state, such as completed, failed, or in_progress.
