The GCP Model Armor plugin integrates Kong AI Gateway with Google Cloud’s Model Armor service to enforce content safety guardrails on AI requests and responses. It leverages GCP SaaS APIs to inspect prompts and model outputs, preventing unsafe content from being processed or returned to users.

AI Gateway Enterprise: This plugin is only available as part of our AI Gateway Enterprise offering.
Features
The plugin provides the following content safety capabilities:
| Feature | Description |
|---|---|
| Request and response guardrails | Checks chat requests and chat responses to prevent unsafe content. Controlled by `guarding_mode` (`INPUT`, `OUTPUT`, or `BOTH`). |
| Single template enforcement | Applies one GCP Model Armor template for all inspections, ensuring consistent filtering. Set with `template_id`. |
| Reveal blocked categories | Optionally show the categories that triggered blocking (for example, "hate speech"). Controlled by `reveal_failure_categories`. |
| Streaming response inspection | Buffers streaming responses and terminates them if unsafe content is detected. Configurable via `response_buffer_size`. |
| Custom failure messages | Configure user-facing messages with `request_failure_message` and `response_failure_message` when content is blocked. |
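For orientation, enabling the plugin with these settings might look like the following Admin API call. This is a minimal sketch, not the authoritative schema: the plugin name and field nesting are assumptions, and the GCP credential, project, and location fields are omitted, so check the plugin's configuration reference for the exact shape.

```sh
# Hedged sketch: enable the plugin on a route via the Kong Admin API.
# The plugin name and config nesting are assumptions; the config keys
# themselves come from the feature table above. GCP auth, project, and
# location fields are omitted here.
curl -X POST http://localhost:8001/routes/my-llm-route/plugins \
  --header "Content-Type: application/json" \
  --data '{
    "name": "ai-gcp-model-armor",
    "config": {
      "guarding_mode": "BOTH",
      "template_id": "my-model-armor-template",
      "reveal_failure_categories": true,
      "response_buffer_size": 65536,
      "request_failure_message": "Your request was blocked by content safety policies.",
      "response_failure_message": "The model response was blocked by content safety policies."
    }
  }'
```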
How it works
The plugin inspects requests and responses using GCP Model Armor:
- Request inspection: Chat prompts are intercepted, and the relevant content (by default, the last chat message) is sent to the `sanitizeUserPrompt` API.
- Response inspection: Chat responses are buffered (supporting gzip and streaming) and sent to the `sanitizeModelResponse` API. SSE streaming is supported with chunk buffering.
Request guarding flow
- An incoming request to an LLM (for example, a chat completion) is intercepted by the plugin.
- The plugin extracts the relevant content, usually the last user message in the conversation.
- The content is submitted to GCP Model Armor’s `sanitizeUserPrompt` endpoint for analysis.
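Outside of Kong, you can exercise the same endpoint directly to see what the plugin submits. In this sketch the host format and request body field names are assumptions based on GCP's REST conventions; only the `sanitizeUserPrompt` operation name comes from the plugin's behavior described above, so verify the details against the GCP Model Armor reference.

```sh
# Hedged sketch: send a prompt to a Model Armor template for analysis.
# Host format and body field names are assumptions; verify against GCP docs.
curl -X POST \
  "https://modelarmor.us-central1.rep.googleapis.com/v1/projects/my-project/locations/us-central1/templates/my-template:sanitizeUserPrompt" \
  --header "Authorization: Bearer $(gcloud auth print-access-token)" \
  --header "Content-Type: application/json" \
  --data '{"userPromptData": {"text": "the last user message, as extracted by the plugin"}}'
```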
Response guarding flow
- The plugin buffers the upstream response body (including gzipped responses).
- It extracts the model’s response content.
- The content is sent to GCP Model Armor’s `sanitizeModelResponse` endpoint for validation.
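The response path mirrors the request path, so a sketch of the sibling call looks almost identical, under the same assumptions about host and body shape:

```sh
# Hedged sketch: validate buffered model output against the same template.
curl -X POST \
  "https://modelarmor.us-central1.rep.googleapis.com/v1/projects/my-project/locations/us-central1/templates/my-template:sanitizeModelResponse" \
  --header "Authorization: Bearer $(gcloud auth print-access-token)" \
  --header "Content-Type: application/json" \
  --data '{"modelResponseData": {"text": "model output extracted from the buffered response body"}}'
```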
Sanitization and action
- GCP Model Armor evaluates the provided content against the configured `template_id`.
- The plugin interprets the `sanitizationResult` returned by GCP.
- If a violation is detected (for example, hatred, sexually explicit content, harassment, or jailbreak attempts), the request or response is blocked.
- Blocked traffic results in a `400 Bad Request` response with the configured `request_failure_message` or `response_failure_message`.
- If `reveal_failure_categories` is enabled, the response also lists the categories that triggered blocking.
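To illustrate, a blocked request might surface to the client roughly as follows, with HTTP status 400. The exact body shape and category names are assumptions: the message comes from `request_failure_message`, and the category list appears only when `reveal_failure_categories` is enabled.

```json
{
  "error": true,
  "message": "Your request was blocked by content safety policies.",
  "failure_categories": ["hate_speech", "harassment"]
}
```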
When configuring `template_id` in the AI GCP Model Armor plugin, ensure that it aligns with the content safety policies and categories defined in your GCP Model Armor service. Review whether your organization requires custom categories or additional policy definitions, and integrate them into the selected template to meet your compliance and safety requirements.
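If your gcloud release ships a Model Armor command group, you can review a template's filters before pointing the plugin at it. The command group and flags below are assumptions about the gcloud CLI surface; fall back to the Cloud Console or the REST API if they are unavailable in your version.

```sh
# Hedged sketch, assuming gcloud exposes a model-armor command group:
# inspect the filters configured on the template the plugin will enforce.
gcloud model-armor templates describe my-model-armor-template \
  --location=us-central1 \
  --project=my-project
```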
Best practices
The following configuration guidance helps ensure effective content safety enforcement:
| Setting | Description |
|---|---|
| `guarding_mode` | Set to `INPUT` for request-only inspection, `OUTPUT` for response-only, or `BOTH` to guard both directions. |
| `request_failure_message` / `response_failure_message` | Provide user-friendly error messages when prompts or responses are blocked. |
| `reveal_failure_categories` | Enable to return details on why content was blocked. |
| `response_buffer_size` | Tune how much of the upstream response is buffered before inspection; smaller values reduce latency. |
| Default last message inspection with `text_source` | Keep the default behavior of checking only the last user prompt message for highest accuracy. |
Caution: Do not set the Model Armor Floor Setting directly in GCP, as it will conflict with this plugin. See the FAQ entry below for more information.
Limitations
- Only chat prompts and chat responses are inspected; embeddings and other modalities are not checked.
- Inspects one chat message or one response body at a time. Combining multiple messages reduces accuracy.
- For SSE streaming, unsafe content may appear briefly before the stream is terminated with `"stop_reason: blocked by content safety"`.
- Only one `template_id` can be configured per plugin instance.
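To make the SSE behavior concrete, a terminated stream might end roughly like the sketch below. The chunk layout is an assumption for illustration; only the stop reason string is quoted from the limitation above.

```
data: {"choices":[{"delta":{"content":"partial output, possibly unsafe"}}]}

data: {"choices":[{"delta":{}}],"stop_reason":"blocked by content safety"}

data: [DONE]
```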
FAQs
What do I do if I see the error `Blocked by Model Armor Floor Setting`?
If you see the following error:
```json
{
  "reason": "MODEL_ARMOR",
  "message": "Blocked by Model Armor Floor Setting: The prompt violated X, Y, and Z filters.",
  "error": true
}
```
This error means the plugin is conflicting with settings configured in GCP Vertex. We recommend disabling the GCP Model Armor Floor Setting in GCP, as it fails in some modes (for example, streaming response mode) and blocks all analytics.