AI GCP Model Armor

AI License Required

Overview Examples Configuration reference Changelog

Features

The plugin provides the following content safety capabilities:

Feature	Description
Request and response guardrails	Checks chat requests and chat responses to prevent unsafe content. Controlled by `guarding_mode` (`INPUT`, `OUTPUT`, or `BOTH`).
Single template enforcement	Applies one GCP Model Armor template for all inspections, ensuring consistent filtering. Set with `template_id`.
Reveal blocked categories	Optionally show the categories that triggered blocking (for example, `"hate speech"`). Controlled by `reveal_failure_categories`.
Streaming response inspection	Buffers streaming responses and terminates if unsafe content is detected. Configurable via `response_buffer_size`.
Custom failure messages	Configure user-facing messages with `request_failure_message` and `response_failure_message` when content is blocked.

How it works

The plugin inspects requests and responses using GCP Model Armor:

Request inspection: Chat prompts are intercepted, and the relevant content (by default, the last chat message) is sent to the sanitizeUserPrompt API.
Response inspection: Chat responses are buffered (supporting gzip and streaming) and sent to the sanitizeModelResponse API. SSE streaming is supported with chunk buffering.

Request guarding flow

An incoming request to an LLM (for example, a chat completion) is intercepted by the plugin.
The plugin extracts the relevant content, usually the last user message in the conversation.
The content is submitted to GCP Model Armor’s sanitizeUserPrompt endpoint for analysis.

Response guarding flow

The plugin buffers the upstream response body (including gzipped responses).
It extracts the model’s response content.
The content is sent to GCP Model Armor’s sanitizeModelResponse endpoint for validation.

Sanitization and action

GCP Model Armor evaluates the provided content against the configured template_id.
The plugin interprets the sanitizationResult from GCP.
If a violation is detected (for example, hatred, sexually explicit content, harassment, or jailbreak attempts), the request or response is blocked.
Blocked traffic results in a 400 Bad Request response with the configured request_failure_message or response_failure_message.
If reveal_failure_categories is enabled, the response also lists the categories that triggered blocking.

When configuring template_id in the AI GCP Model Armor plugin, ensure that it aligns with the content safety policies and categories defined in your GCP Model Armor service.

Review whether your organization requires custom categories or additional policy definitions, and integrate them into the selected template to match compliance and safety requirements.

Best practices

The following configuration guidance helps ensure effective content safety enforcement:

Setting	Description
`guarding_mode`	Set to `INPUT` for request-only inspection, `OUTPUT` for response-only, or `BOTH` to guard both directions.
`request_failure_message` / `response_failure_message`	Provide user-friendly error messages when prompts or responses are blocked.
`reveal_failure_categories`	Enable to return details on why content was blocked.
`response_buffer_size`	Tune how much of the upstream response is buffered before inspection; smaller values reduce latency.
Default last message inspection with `text_source`	Keep the default behavior of checking only the last user prompt message for highest accuracy.

Caution: Do not set the Model Armor Floor Setting directly in GCP, as it will cause conflicts with this plugin. See the FAQ entry for this error for more information.

Limitations

Only chat prompts and chat responses are inspected; embeddings and other modalities are not checked.
Inspects one chat message or one response body at a time. Combining multiple messages reduces accuracy.
For SSE streaming, unsafe content may appear briefly before termination with "stop_reason: blocked by content safety".
Only one template_id can be configured per plugin instance.

This means the plugin is conflicting with settings configured in GCP Vertex. We recommend disabling the GCP Model Armor Floor in GCP, as this setting fails in some modes (for example, streaming response mode), and blocks all analytics.

Next Steps

Use the AI GCP Model Armor plugin