You can proxy requests to Gemini Vertex AI models through AI Gateway using the AI Proxy and AI Proxy Advanced plugins. This reference documents all supported AI capabilities, configuration requirements, and provider-specific details needed for proper integration.
Vertex AI provider
Upstream paths
AI Gateway automatically routes requests to the appropriate Gemini Vertex API endpoints. The following table shows the upstream paths used for each capability.
| Capability | Upstream path or API |
|---|---|
| Chat completions | Uses generateContent API |
| Completions | Uses generateContent API |
| Embeddings | Uses generateContent API |
| Function calling | Uses generateContent API with function declarations |
| Files | /openai/files |
| Batches | Uses batchPredictionJobs API |
| Image generations | Uses generateContent API |
| Image edits | Uses generateContent API |
| Video generations | Uses predictLongRunning API |
Supported capabilities
The following tables show the AI capabilities supported by the Gemini Vertex provider when used with the AI Proxy or AI Proxy Advanced plugin.
Set the plugin’s route_type based on the capability you want to use. See the tables below for supported route types.
Text generation
Support for Gemini Vertex basic text generation capabilities including chat, completions, and embeddings:
| Capability | Route type | Streaming | Model example | Min version |
|---|---|---|---|---|
| Chat completions | llm/v1/chat | | gemini-2.5-flash | 3.8 |
| Completions | llm/v1/completions | | gemini-2.5-flash | 3.8 |
| Embeddings | llm/v1/embeddings | | text-embedding-004 | 3.11 |
Advanced text generation
Support for Gemini Vertex function calling to allow Gemini Vertex models to use external tools and APIs:
| Capability | Route type | Model example | Min version |
|---|---|---|---|
| Function calling | llm/v1/chat | gemini-2.5-flash | 3.8 |
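On the llm/v1/chat route, function calling uses the OpenAI-compatible tools format, which AI Gateway translates into a Gemini generateContent call with function declarations. A sketch of such a request body — the tool name and schema here are hypothetical, not part of any real API:

```python
import json

# OpenAI-compatible chat request with one tool declaration.
# "get_weather" and its parameters are illustrative placeholders.
request_body = {
    "model": "gemini-2.5-flash",
    "messages": [
        {"role": "user", "content": "What's the weather in Berlin?"}
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Look up current weather for a city",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ],
}

# Serialize for sending to the AI Gateway route
payload = json.dumps(request_body)
```

The gateway handles the translation to Gemini's function-declaration format, so clients keep using the OpenAI request shape.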
Processing
Support for Gemini Vertex file operations, batch operations, assistants, and response handling:
| Capability | Route type | Model example | Min version |
|---|---|---|---|
| Files¹ | llm/v1/files | n/a | 3.11 |
| Batches | llm/v1/batches | n/a | 3.13 |
¹ Gemini Vertex does not have a dedicated Files API. File storage uses Google Cloud Storage, similar to AWS S3.
Image
Support for Gemini Vertex image generation and editing capabilities:
| Capability | Route type | Model example | Min version |
|---|---|---|---|
| Generations | image/v1/images/generations | gemini-2.5-flash-preview-image-generation | 3.11 |
| Edits | image/v1/images/edits | gemini-2.5-flash-preview-image-generation | 3.11 |
For requests with large payloads, consider increasing config.max_request_body_size to three times the raw binary size. Supported image sizes and formats vary by model. Refer to your provider’s documentation for allowed dimensions and requirements.
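One plausible reading of the 3× guidance: binaries embedded in JSON requests are base64-encoded, which inflates them by roughly a third, and tripling the raw size leaves headroom for that plus JSON overhead. A quick check of the inflation factor:

```python
import base64

raw = bytes(3_000_000)           # a ~3 MB binary payload
encoded = base64.b64encode(raw)  # base64 representation, as embedded in JSON
ratio = len(encoded) / len(raw)
print(round(ratio, 2))           # 1.33 -- so 3x the raw size is a safe ceiling
```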
Video
Support for Gemini Vertex video generation capabilities:
| Capability | Route type | Model example | Min version |
|---|---|---|---|
| Generations | video/v1/videos/generations | veo-3.1-generate-001 | 3.13 |
For requests with large payloads (video generation), consider increasing config.max_request_body_size to three times the raw binary size.
Gemini Vertex base URL
The base URL is https://aiplatform.googleapis.com/; the request path ({route_type_path}) is determined by the capability.
AI Gateway uses this URL automatically. You only need to configure a URL if you’re using a self-hosted or Gemini Vertex-compatible endpoint, in which case set the upstream_url plugin option.
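As a sketch, an override for a self-hosted endpoint could look like the following; the URL is a hypothetical placeholder and the exact field layout may vary by Gateway version, so verify against your plugin schema:

```yaml
plugins:
  - name: ai-proxy
    config:
      route_type: llm/v1/chat
      model:
        provider: gemini
        options:
          upstream_url: https://vertex-proxy.internal.example  # hypothetical self-hosted endpoint
```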
Supported native LLM formats for Gemini Vertex
By default, the AI Proxy plugin uses OpenAI-compatible request formats. Set config.llm_format to a native format to use Gemini Vertex-specific APIs and features.
The following native Gemini Vertex APIs are supported:
| LLM format | Supported APIs |
|---|---|
| gemini | |
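Opting into the native format is a single option inside the plugin config; a minimal sketch (verify the option name against your Gateway version's plugin schema):

```yaml
config:
  llm_format: gemini   # switch from the default OpenAI-compatible format
```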
Configure Gemini Vertex with AI Proxy
To use Gemini Vertex with AI Gateway, configure the AI Proxy or AI Proxy Advanced plugin.
Here’s a minimal configuration for chat completions:
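A sketch of such a configuration, assuming service-account authentication and illustrative project/region values; field names can differ across Gateway versions, so verify against the plugin schema:

```yaml
plugins:
  - name: ai-proxy
    config:
      route_type: llm/v1/chat
      auth:
        gcp_use_service_account: true        # authenticate with a GCP service account
      model:
        provider: gemini
        name: gemini-2.5-flash
        options:
          gemini:
            api_endpoint: aiplatform.googleapis.com  # Vertex endpoint
            project_id: my-gcp-project               # illustrative project ID
            location_id: us-central1                 # illustrative region
```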
For more configuration options and examples, see: