Output Policy
The Output Policy acts as a final validation layer, scanning the LLM's generated response before it reaches the end user. This ensures that the agent's output is safe, accurate, and professional, preventing hallucinations or data leaks that might have bypassed the initial prompt checks.
Prerequisite: As with Prompt Policies, ensure your selected LLM (e.g., gpt-4o) is configured in the Models section. Output Policies rely on these models to perform semantic analysis on the generated text.
1. PII & Data Privacy
Prevents the LLM from inadvertently leaking text that matches sensitive data patterns in its response.
| Feature | Logic | Description | Action |
|---|---|---|---|
| Regex Scanner | Non-LLM | Uses pattern matching to find specific strings | Blocks response & shows Fallback Message |
Example:
If the LLM generates an internal database ID or a restricted serial-number format (e.g., DB-XXXX), the Regex scanner will intercept it.
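To make the pattern-matching behavior concrete, here is a minimal sketch of a regex-based output scan. The pattern list, the `scan_output` helper, and the fallback wording are illustrative assumptions, not the platform's actual configuration (which is managed in the policy settings).

```python
import re

# Hypothetical patterns; the real rule set is configured per policy.
RESTRICTED_PATTERNS = [
    re.compile(r"\bDB-\d{4}\b"),           # internal database IDs (DB-XXXX, assuming digits)
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),  # SSN-style numbers
]

FALLBACK_MESSAGE = "Sorry, I can't share that information."

def scan_output(response: str) -> str:
    """Return the fallback message if any restricted pattern appears in the response."""
    for pattern in RESTRICTED_PATTERNS:
        if pattern.search(response):
            return FALLBACK_MESSAGE
    return response

print(scan_output("Your record is stored under DB-4821."))  # -> fallback message
```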
2. Security & Prompt Integrity
Ensures the output does not contain harmful links or unexpected code structures.
| Feature | Logic | Description | Action |
|---|---|---|---|
| Malicious URLs | LLM | Analyzes URLs to detect phishing or malware intent. | Mask or Block |
| Code Detection | LLM | Detects code in responses where only natural language is expected. | Block |
| URL Reachability | Non-LLM | Validates that links in the output are live (HTTP 200). | Append status tags |
URL Reachability is especially useful for customer support agents, ensuring they don't send users to "404 Not Found" pages. It appends a status such as [Link Active] or [Link Broken] next to each URL.
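A rough sketch of how such a reachability check could work, assuming one HTTP request per link and treating a 200 response as "active". The `tag_url_reachability` helper and the tag wording are illustrative; the platform's internal implementation may differ.

```python
import re
import requests

URL_PATTERN = re.compile(r"https?://\S+")

def tag_url_reachability(response: str, timeout: float = 5.0) -> str:
    """Append [Link Active] or [Link Broken] after each URL found in the response."""
    def check(match: re.Match) -> str:
        url = match.group(0)
        try:
            status = requests.head(url, timeout=timeout, allow_redirects=True).status_code
            tag = "[Link Active]" if status == 200 else "[Link Broken]"
        except requests.RequestException:
            tag = "[Link Broken]"
        return f"{url} {tag}"
    return URL_PATTERN.sub(check, response)
```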
3. Moderation & Content Safety
Maintains the brand voice and prevents the generation of restricted content.
| Scanner | Logic | Goal |
|---|---|---|
| Toxicity Detection | LLM | Blocks harmful, biased, or offensive AI responses. |
| Ban Topics | LLM | Ensures the AI doesn't discuss restricted subjects (e.g., politics). |
| Ban Competitors | LLM | Masks competitor names (e.g., replacing "BrandX" with "****"). |
| Refusal Detection | LLM | Detects if the AI is refusing a valid request and allows a fallback. |
| Ban Substrings | Non-LLM | Blocks responses containing specific banned words/phrases. |
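As a concrete illustration of the Ban Substrings and Ban Competitors rows above, the sketch below combines a non-LLM substring block with a simple masking pass. The banned phrases, competitor list, and `moderate` helper are hypothetical, and the plain string replacement only approximates the LLM-driven competitor masking described in the table.

```python
BANNED_SUBSTRINGS = ["internal use only", "do not distribute"]  # hypothetical list
COMPETITOR_NAMES = ["BrandX", "BrandY"]                         # hypothetical list
FALLBACK_MESSAGE = "Sorry, I can't help with that."

def moderate(response: str) -> str:
    """Block on banned substrings; otherwise mask competitor names."""
    lowered = response.lower()
    # Ban Substrings: block the whole response if any banned phrase appears.
    if any(phrase in lowered for phrase in BANNED_SUBSTRINGS):
        return FALLBACK_MESSAGE
    # Ban Competitors: mask the names instead of blocking the response.
    for name in COMPETITOR_NAMES:
        response = response.replace(name, "****")
    return response
```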
4. Factuality & Relevance
This section is critical for reducing "hallucinations": instances where the AI provides confident but false information.
| Feature | Logic | Behavior |
|---|---|---|
| Factual Consistency | LLM | Compares the output against the source context. Appends a disclaimer if accuracy is questionable. |
| Hallucination Detection | LLM | Mild Case: Appends a warning disclaimer. Severe Case: Blocks the entire response. |
| Gibberish Detection | LLM | Blocks nonsensical or broken text strings generated by the model. |
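To illustrate the mild/severe branching for Hallucination Detection, here is a sketch that assumes the LLM-based detector returns a severity score between 0 and 1. The thresholds, disclaimer text, and `apply_hallucination_policy` helper are assumptions for illustration only, not platform defaults.

```python
DISCLAIMER = "\n\n> Note: parts of this answer could not be verified against the source context."
FALLBACK_MESSAGE = "I couldn't produce a reliable answer to that question."

def apply_hallucination_policy(response: str, severity: float) -> str:
    """Append a disclaimer for mild cases; block the response entirely for severe ones.

    `severity` is assumed to be a 0-1 score from the LLM-based detector;
    the 0.3 / 0.7 thresholds are illustrative.
    """
    if severity >= 0.7:   # severe case: block
        return FALLBACK_MESSAGE
    if severity >= 0.3:   # mild case: append a warning disclaimer
        return response + DISCLAIMER
    return response
```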
5. Performance & Utility
Provides metadata alongside the response to improve the reading experience.
| Property | Example Result | Calculation |
|---|---|---|
| Reading Time | "Estimated reading time: 2 mins" | Word Count / 200 WPM |
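The Reading Time estimate can be reproduced directly from the formula in the table. Only the 200 WPM pace comes from the table; the rounding behavior and the `append_reading_time` helper below are assumptions for illustration.

```python
import math

WORDS_PER_MINUTE = 200

def append_reading_time(response: str) -> str:
    """Append an estimated reading time based on a 200 words-per-minute pace."""
    minutes = max(1, math.ceil(len(response.split()) / WORDS_PER_MINUTE))
    return f"{response}\n\nEstimated reading time: {minutes} min{'s' if minutes > 1 else ''}"
```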
Summary of Actions
When a guardrail is triggered in the Output Policy, the platform can take three primary actions (sketched in code after this list):
- Block: The user never sees the AI response; instead, they see the Fallback Message.
- Mask/Redact: Sensitive parts (PII, Competitors) are replaced with symbols or placeholders.
- Append: The response is delivered, but with a system-generated note (e.g., Reading Time or Factual Disclaimer).
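Conceptually, these three actions amount to a small dispatch step applied to each triggered guardrail. The enum, the `apply_action` helper, and the payload convention below are purely illustrative and not the platform's API.

```python
from enum import Enum

class Action(Enum):
    BLOCK = "block"
    MASK = "mask"
    APPEND = "append"

FALLBACK_MESSAGE = "Sorry, I can't share that response."

def apply_action(response: str, action: Action, payload: str = "") -> str:
    """Apply one of the three output-policy actions to a generated response."""
    if action is Action.BLOCK:
        return FALLBACK_MESSAGE                    # user only ever sees the fallback message
    if action is Action.MASK:
        return response.replace(payload, "****")   # payload = the sensitive span to redact
    return response + "\n\n" + payload             # APPEND: payload = disclaimer or metadata note
```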