Release date 2025/10/01
Feature
-
Added latency and cost observability to realtime API.
-
Added support for Gemini Rerank in native format.
-
Exposed
input_tokens_details.cached_tokens_details.image_tokens_count
in observability metrics.
Bugfix
-
Fixed an issue where the AI Proxy Advanced plugin’s lowest latency load balancer could fail to select a target when only one target was available, leading to 500 errors.
-
Fixed an issue where the ai-retry phase was not correctly setting the namespace in the
kong.plugin.ctx
, causing the ai-proxy-advanced balancer to retry the first target more than once. -
Fixed an issue where the Gemini provider didn’t support Model Garden.
-
Fixed an issue where managed identity could be cached incorrectly.
-
Fixed an issue where Mistral models would return
Unsupported field: seed
when using some inference libraries. -
Fixed an issue where array input wasn’t being properly validated for certain providers. Note that Bedrock, Gemini Public, and Mistral providers don’t support array input, as before.
-
Fixed an issue where some Titan embeddings models reported malformed requests.