AI GCP Model Armor

AI License Required
Related Documentation
Made by
Kong Inc.
Supported Gateway Topologies
hybrid db-less traditional
Supported Konnect Deployments
hybrid cloud-gateways serverless
Compatible Protocols
grpc grpcs http https
Minimum Version
Kong Gateway - 3.12
Tags
#ai
AI Gateway Enterprise: This plugin is only available as part of our AI Gateway Enterprise offering.

The GCP Model Armor plugin integrates Kong AI Gateway with Google Cloud’s Model Armor service to enforce content safety guardrails on AI requests and responses. It leverages GCP SaaS APIs to inspect prompts and model outputs, preventing unsafe content from being processed or returned to users.

Features

The plugin provides the following content safety capabilities:

Feature

Description

Request and response guardrails Checks chat requests and chat responses to prevent unsafe content. Controlled by guarding_mode (INPUT, OUTPUT, or BOTH).
Single template enforcement Applies one GCP Model Armor template for all inspections, ensuring consistent filtering. Set with template_id.
Reveal blocked categories Optionally show the categories that triggered blocking (for example, "hate speech"). Controlled by reveal_failure_categories.
Streaming response inspection Buffers streaming responses and terminates if unsafe content is detected. Configurable via response_buffer_size.
Custom failure messages Configure user-facing messages with request_failure_message and response_failure_message when content is blocked.

How it works

The plugin inspects requests and responses using GCP Model Armor:

  • Request inspection: Chat prompts are intercepted, and the relevant content (by default, the last chat message) is sent to the sanitizeUserPrompt API.
  • Response inspection: Chat responses are buffered (supporting gzip and streaming) and sent to the sanitizeModelResponse API. SSE streaming is supported with chunk buffering.

Request guarding flow

  1. An incoming request to an LLM (for example, a chat completion) is intercepted by the plugin.
  2. The plugin extracts the relevant content, usually the last user message in the conversation.
  3. The content is submitted to GCP Model Armor’s sanitizeUserPrompt endpoint for analysis.

Response guarding flow

  1. The plugin buffers the upstream response body (including gzipped responses).
  2. It extracts the model’s response content.
  3. The content is sent to GCP Model Armor’s sanitizeModelResponse endpoint for validation.

Sanitization and action

  1. GCP Model Armor evaluates the provided content against the configured template_id.
  2. The plugin interprets the sanitizationResult from GCP.
  3. If a violation is detected (for example, hatred, sexually explicit content, harassment, or jailbreak attempts), the request or response is blocked.
  4. Blocked traffic results in a 400 Bad Request response with the configured request_failure_message or response_failure_message.
  5. If reveal_failure_categories is enabled, the response also lists the categories that triggered blocking.

When configuring template_id in the AI GCP Model Armor plugin, ensure that it aligns with the content safety policies and categories defined in your GCP Model Armor service.

Review whether your organization requires custom categories or additional policy definitions, and integrate them into the selected template to match compliance and safety requirements.

Best practices

The following configuration guidance helps ensure effective content safety enforcement:

Setting

Description

guarding_mode Set to INPUT for request-only inspection, OUTPUT for response-only, or BOTH to guard both directions.
request_failure_message / response_failure_message Provide user-friendly error messages when prompts or responses are blocked.
reveal_failure_categories Enable to return details on why content was blocked.
response_buffer_size Tune how much of the upstream response is buffered before inspection; smaller values reduce latency.
Default last message inspection with text_source Keep the default behavior of checking only the last user prompt message for highest accuracy.

Caution: Do not set the Model Armor Floor Setting directly in GCP, as it will cause conflicts with this plugin. See the FAQ entry for this error for more information.

Limitations

  • Only chat prompts and chat responses are inspected; embeddings and other modalities are not checked.
  • Inspects one chat message or one response body at a time. Combining multiple messages reduces accuracy.
  • For SSE streaming, unsafe content may appear briefly before termination with "stop_reason: blocked by content safety".
  • Only one template_id can be configured per plugin instance.

FAQs

If you see the following error:

{
  "reason": "MODEL_ARMOR",
  "message": "Blocked by Model Armor Floor Setting: The prompt violated X, Y, and Z filters.",
  "error": true
}

This means the plugin is conflicting with settings configured in GCP Vertex. We recommend disabling the GCP Model Armor Floor in GCP, as this setting fails in some modes (for example, streaming response mode), and blocks all analytics.

Something wrong?

Help us make these docs great!

Kong Developer docs are open source. If you find these useful and want to make them better, contribute today!