Output Policy
The Output Policy acts as a final validation layer, scanning the LLM's generated response before it reaches the end user. This ensures that the agent's output is safe, accurate, and professional, preventing hallucinations or data leaks that might have bypassed the initial prompt checks.
Prerequisite: As with Prompt Policies, ensure your selected LLM (e.g., gpt-4o) is configured in the Models section. Output Policies rely on these models to perform semantic analysis on the generated text.
1. PII & Data Privacy
Prevents the LLM from inadvertently leaking text that matches sensitive data patterns in its response.
| Feature | Logic | Description | Action |
|---|---|---|---|
| Regex Scanner | Non-LLM | Uses pattern matching to find specific strings | Blocks response & shows Fallback Message |
Example:
If the LLM generates an internal database ID or a restricted serial-number format (e.g., DB-XXXX), the Regex scanner will intercept it.
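To make the pattern-matching behavior concrete, here is a minimal sketch of a regex-based output scan. The pattern list, the `scan_output` helper, and the fallback wording are illustrative assumptions, not the platform's actual configuration (which is managed in the policy settings).

```python
import re

# Hypothetical patterns; the real rule set is configured per policy.
RESTRICTED_PATTERNS = [
    re.compile(r"\bDB-\d{4}\b"),           # internal database IDs (DB-XXXX, assuming digits)
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),  # SSN-style numbers
]

FALLBACK_MESSAGE = "Sorry, I can't share that information."

def scan_output(response: str) -> str:
    """Return the fallback message if any restricted pattern appears in the response."""
    for pattern in RESTRICTED_PATTERNS:
        if pattern.search(response):
            return FALLBACK_MESSAGE
    return response

print(scan_output("Your record is stored under DB-4821."))  # -> fallback message
```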
2. Security & Prompt Integrity
Ensures the output does not contain harmful links or unexpected code structures.
| Feature | Logic | Description | Action |
|---|---|---|---|
| Malicious URLs | LLM | Analyzes URLs to detect phishing or malware intent. | Mask or Block |
| Code Detection | LLM | Detects code in responses where only natural language is expected. | Block |
| URL Reachability | Non-LLM | Validates that links in the output are live (HTTP 200). | Append status tags |
URL Reachability is especially useful for customer support agents, ensuring they don't send users to "404 Not Found" pages. It appends a status such as [Link Active] or [Link Broken] next to each URL.
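A rough sketch of how such a reachability check could work, assuming one HTTP request per link and treating a 200 response as "active". The `tag_url_reachability` helper and the tag wording are illustrative; the platform's internal implementation may differ.

```python
import re
import requests

URL_PATTERN = re.compile(r"https?://\S+")

def tag_url_reachability(response: str, timeout: float = 5.0) -> str:
    """Append [Link Active] or [Link Broken] after each URL found in the response."""
    def check(match: re.Match) -> str:
        url = match.group(0)
        try:
            status = requests.head(url, timeout=timeout, allow_redirects=True).status_code
            tag = "[Link Active]" if status == 200 else "[Link Broken]"
        except requests.RequestException:
            tag = "[Link Broken]"
        return f"{url} {tag}"
    return URL_PATTERN.sub(check, response)
```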
3. Moderation & Content Safety
Maintains the brand voice and prevents the generation of restricted content.
| Scanner | Logic | Goal |
|---|---|---|
| Toxicity Detection | LLM | Blocks harmful, biased, or offensive AI responses. |
| Ban Topics | LLM | Ensures the AI doesn't discuss restricted subjects (e.g., politics). |
| Ban Competitors | LLM | Masks competitor names (e.g., replacing "BrandX" with "****"). |
| Refusal Detection | LLM | Detects if the AI is refusing a valid request and allows a fallback. |
| Ban Substrings | Non-LLM | Blocks responses containing specific banned words/phrases. |
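As a concrete illustration of the Ban Substrings and Ban Competitors rows above, the sketch below combines a non-LLM substring block with a simple masking pass. The banned phrases, competitor list, and `moderate` helper are hypothetical, and the plain string replacement only approximates the LLM-driven competitor masking described in the table.

```python
BANNED_SUBSTRINGS = ["internal use only", "do not distribute"]  # hypothetical list
COMPETITOR_NAMES = ["BrandX", "BrandY"]                         # hypothetical list
FALLBACK_MESSAGE = "Sorry, I can't help with that."

def moderate(response: str) -> str:
    """Block on banned substrings; otherwise mask competitor names."""
    lowered = response.lower()
    # Ban Substrings: block the whole response if any banned phrase appears.
    if any(phrase in lowered for phrase in BANNED_SUBSTRINGS):
        return FALLBACK_MESSAGE
    # Ban Competitors: mask the names instead of blocking the response.
    for name in COMPETITOR_NAMES:
        response = response.replace(name, "****")
    return response
```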
4. Factuality & Relevance
This section is critical for reducing "hallucinations": instances where the AI provides confident but false information.
| Feature | Logic | Behavior |
|---|---|---|
| Factual Consistency | LLM | Compares the output against the source context. Appends a disclaimer if accuracy is questionable. |
| Hallucination Detection | LLM | Mild Case: Appends a warning disclaimer. Severe Case: Blocks the entire response. |
| Gibberish Detection | LLM | Blocks nonsensical or broken text strings generated by the model. |
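To illustrate the mild/severe branching for Hallucination Detection, here is a sketch that assumes the LLM-based detector returns a severity score between 0 and 1. The thresholds, disclaimer text, and `apply_hallucination_policy` helper are assumptions for illustration only, not platform defaults.

```python
DISCLAIMER = "\n\n> Note: parts of this answer could not be verified against the source context."
FALLBACK_MESSAGE = "I couldn't produce a reliable answer to that question."

def apply_hallucination_policy(response: str, severity: float) -> str:
    """Append a disclaimer for mild cases; block the response entirely for severe ones.

    `severity` is assumed to be a 0-1 score from the LLM-based detector;
    the 0.3 / 0.7 thresholds are illustrative.
    """
    if severity >= 0.7:   # severe case: block
        return FALLBACK_MESSAGE
    if severity >= 0.3:   # mild case: append a warning disclaimer
        return response + DISCLAIMER
    return response
```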
5. Performance & Utility
Provides metadata alongside the response to improve the reading experience.
| Property | Example Result | Calculation |
|---|---|---|
| Reading Time | "Estimated reading time: 2 mins" | Word Count / 200 WPM |
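The Reading Time estimate can be reproduced directly from the formula in the table. Only the 200 WPM pace comes from the table; the rounding behavior and the `append_reading_time` helper below are assumptions for illustration.

```python
import math

WORDS_PER_MINUTE = 200

def append_reading_time(response: str) -> str:
    """Append an estimated reading time based on a 200 words-per-minute pace."""
    minutes = max(1, math.ceil(len(response.split()) / WORDS_PER_MINUTE))
    return f"{response}\n\nEstimated reading time: {minutes} min{'s' if minutes > 1 else ''}"
```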
Summary of Actions
When a guardrail is triggered in the Output Policy, the platform can take three primary actions (sketched in code after this list):
- Block: The user never sees the AI response; instead, they see the Fallback Message.
- Mask/Redact: Sensitive parts (PII, Competitors) are replaced with symbols or placeholders.
- Append: The response is delivered, but with a system-generated note (e.g., Reading Time or Factual Disclaimer).
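Conceptually, these three actions amount to a small dispatch step applied to each triggered guardrail. The enum, the `apply_action` helper, and the payload convention below are purely illustrative and not the platform's API.

```python
from enum import Enum

class Action(Enum):
    BLOCK = "block"
    MASK = "mask"
    APPEND = "append"

FALLBACK_MESSAGE = "Sorry, I can't share that response."

def apply_action(response: str, action: Action, payload: str = "") -> str:
    """Apply one of the three output-policy actions to a generated response."""
    if action is Action.BLOCK:
        return FALLBACK_MESSAGE                    # user only ever sees the fallback message
    if action is Action.MASK:
        return response.replace(payload, "****")   # payload = the sensitive span to redact
    return response + "\n\n" + payload             # APPEND: payload = disclaimer or metadata note
```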