Release date 2025/04/15
Bugfix
- Fixed an issue where AI Proxy and AI Proxy Advanced would use corrupted plugin config.
Release date 2025/03/27
Changed the serialized log key of AI metrics from ai.ai-proxy to ai.proxy to avoid conflicts with metrics generated from plugins other than AI Proxy and AI Proxy Advanced. If you are using logging plugins (for example, File Log or HTTP Log), you will have to update your metrics pipeline configurations to reflect this change.
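As an illustrative sketch only (the fields under usage and the token counts are placeholders, not taken from these release notes), a serialized log entry now nests AI metrics like this:

```yaml
# Hypothetical serialized log entry after this change:
ai:
  proxy:            # previously keyed as "ai-proxy"; update pipeline queries accordingly
    usage:          # assumed field names, shown only to illustrate the key move
      prompt_tokens: 28
      completion_tokens: 14
      total_tokens: 42
```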
Deprecated preserve mode in config.route_type. Use config.llm_format instead. The preserve mode setting will be removed in a future release.
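A minimal migration sketch, assuming a declarative config layout and that openai is an accepted config.llm_format value (check the plugin reference for your version before relying on these exact values):

```yaml
plugins:
- name: ai-proxy
  config:
    # Before (deprecated): route_type: preserve
    route_type: llm/v1/chat   # assumed replacement route_type for this example
    llm_format: openai        # assumed value; replaces the preserve behavior
```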
Added support for the boto3 SDK for the Bedrock provider, and for the Google GenAI SDK for the Gemini provider.
Added a new priority balancer algorithm, which allows setting a priority group for each upstream model.
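A hedged sketch of what a priority-balanced config could look like; the exact name and placement of the priority-group field are assumptions based on this entry, not confirmed schema:

```yaml
plugins:
- name: ai-proxy-advanced
  config:
    balancer:
      algorithm: priority
    targets:
    - model:
        provider: openai
        name: gpt-4o
      priority: 1     # assumed field name for the target's priority group
    - model:
        provider: openai
        name: gpt-4o-mini
      priority: 2     # assumed field name for the target's priority group
```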
Added the failover_criteria configuration option, which allows retrying requests to the next upstream server in case of failure.
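A sketch of failover_criteria in use; its placement under balancer and the accepted values (error, timeout) are assumptions to be checked against the plugin schema:

```yaml
plugins:
- name: ai-proxy-advanced
  config:
    balancer:
      failover_criteria:   # assumed placement; retry the next target on these conditions
      - error
      - timeout
```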
Added cost to tokens_count_strategy when using the lowest-usage load balancing strategy.
Added the huggingface, azure, vertex, and bedrock providers to embeddings. They can be used by the ai-proxy-advanced, ai-semantic-cache, ai-semantic-prompt-guard, and ai-rag-injector plugins.
Added support for authenticating to Bedrock services with AWS assume roles.
Added the ability to set a catch-all target in semantic routing.
Fixed an issue where the trailing path of the AI upstream URL could be empty.
Fixed an issue where the ai-proxy-advanced plugin failed to failover between providers of different formats.
Fixed an issue where identity authentication in the ai-proxy-advanced plugin failed in retry scenarios.
Release date 2024/12/12
Added support for streaming responses to the AI Proxy Advanced plugin.
Made the embeddings.model.name config field a free text entry, enabling use of a self-hosted (or otherwise compatible) model.
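Since embeddings.model.name is now free text, a self-hosted model id can be supplied directly. A sketch, assuming an OpenAI-compatible endpoint; both names below are placeholders:

```yaml
plugins:
- name: ai-proxy-advanced
  config:
    embeddings:
      model:
        provider: openai                   # assumed OpenAI-compatible provider
        name: my-custom-embedding-model    # free text: any self-hosted model id
```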
Fixed an issue where stale plugin config was not updated in DB-less and hybrid modes.
Fixed an issue where the lowest-usage and lowest-latency strategies did not update data points correctly.