
Output Policy

The Output Policy acts as a final validation layer, scanning the LLM's generated response before it reaches the end user. This ensures that the agent's output is safe, accurate, and professional, preventing hallucinations or data leaks that might have bypassed the initial prompt checks.

info

Prerequisite: Just like Prompt Policies, ensure your selected LLM (e.g., gpt-4o) is configured in the Models section. Output Policies rely on these models to perform semantic analysis on generated text.

1. PII & Data Privacy

Prevents the LLM from inadvertently leaking sensitive data patterns in its response.

| Feature | Logic | Description | Action |
| --- | --- | --- | --- |
| Regex Scanner | Non-LLM | Uses pattern matching to find specific strings | Blocks response & shows Fallback Message |

Example:
If an LLM generates an internal database ID or a restricted serial number format (e.g., DB-XXXX), the Regex scanner will intercept it.
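
To make this concrete, here is a minimal sketch of how a regex-based output scanner behaves. The patterns, the `FALLBACK_MESSAGE` text, and the `regex_scan` function are illustrative placeholders, not the platform's actual implementation:

```python
import re

# Hypothetical patterns; a real deployment would configure these in the policy UI.
BLOCKED_PATTERNS = [
    re.compile(r"\bDB-\d{4}\b"),        # internal database IDs, e.g. DB-4821
    re.compile(r"\bSN-[A-Z0-9]{8}\b"),  # restricted serial number format
]
FALLBACK_MESSAGE = "I'm sorry, I can't share that information."

def regex_scan(response: str) -> str:
    """Return the response unchanged, or the fallback message if a pattern matches."""
    for pattern in BLOCKED_PATTERNS:
        if pattern.search(response):
            return FALLBACK_MESSAGE  # Block action: the user never sees the raw output
    return response

print(regex_scan("Your record is stored under DB-4821."))
# -> I'm sorry, I can't share that information.
```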

2. Security & Prompt Integrity

Ensures the output does not contain harmful links or unexpected code structures.

| Feature | Logic | Description |
| --- | --- | --- |
| Malicious URLs | LLM | Analyzes URLs to detect phishing or malware intent. Can Mask or Block. |
| Code Detection | LLM | Blocks responses containing code if only natural language is expected. |
| URL Reachability | Non-LLM | Validates that links in the output are live (HTTP 200). Appends status tags. |
tip

URL Reachability is excellent for customer support agents to ensure they aren't sending users to "404 Not Found" pages. It appends a status like [Link Active] or [Link Broken] next to the URL.
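
A rough sketch of what a non-LLM reachability check could look like, assuming the `requests` library is available; the `[Link Active]` / `[Link Broken]` tags mirror the status tags described above, but the exact tag format and function name are assumptions:

```python
import re
import requests

# Simple URL matcher for illustration; it may capture trailing punctuation.
URL_PATTERN = re.compile(r"https?://\S+")

def tag_url_reachability(response: str, timeout: float = 3.0) -> str:
    """Append a status tag after each URL based on a live HTTP check."""
    def check(match: re.Match) -> str:
        url = match.group(0)
        try:
            # HEAD keeps the check cheap; some servers may require a GET instead.
            status = requests.head(url, timeout=timeout, allow_redirects=True).status_code
            tag = "[Link Active]" if status == 200 else "[Link Broken]"
        except requests.RequestException:
            tag = "[Link Broken]"
        return f"{url} {tag}"
    return URL_PATTERN.sub(check, response)
```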

3. Moderation & Content Safety

Maintains the brand voice and prevents the generation of restricted content.


| Scanner | Logic | Goal |
| --- | --- | --- |
| Toxicity Detection | LLM | Blocks harmful, biased, or offensive AI responses. |
| Ban Topics | LLM | Ensures the AI doesn't discuss restricted subjects (e.g., politics). |
| Ban Competitors | LLM | Masks competitor names (e.g., replacing "BrandX" with "****"). |
| Refusal Detection | LLM | Detects if the AI is refusing a valid request and allows a fallback. |
| Ban Substrings | Non-LLM | Blocks responses containing specific banned words/phrases. |
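
To make the Block vs. Mask distinction concrete, here is an illustrative sketch of the two behaviors: Ban Substrings blocking a response outright, and Ban Competitors masking only the matched name. The word lists and fallback text are placeholders, and plain string matching stands in for the LLM-based competitor detection the platform actually uses:

```python
BANNED_SUBSTRINGS = ["internal use only", "do not distribute"]  # placeholder list
COMPETITORS = ["BrandX", "BrandY"]                              # placeholder list
FALLBACK_MESSAGE = "I'm sorry, I can't help with that."

def moderate(response: str) -> str:
    # Ban Substrings (Block action): any hit replaces the whole response.
    lowered = response.lower()
    if any(s in lowered for s in BANNED_SUBSTRINGS):
        return FALLBACK_MESSAGE
    # Ban Competitors (Mask action): only the matched name is redacted.
    for name in COMPETITORS:
        response = response.replace(name, "****")
    return response

print(moderate("BrandX offers a similar feature."))
# -> **** offers a similar feature.
```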

4. Factuality & Relevance

This section is critical for reducing "hallucinations": instances where the AI provides confident but false information.

| Feature | Logic | Behavior |
| --- | --- | --- |
| Factual Consistency | LLM | Compares the output against the source context. Appends a disclaimer if accuracy is questionable. |
| Hallucination Detection | LLM | Mild Case: Appends a warning disclaimer. Severe Case: Blocks the entire response. |
| Gibberish Detection | LLM | Blocks nonsensical or broken text strings generated by the model. |
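
The mild/severe split for Hallucination Detection can be thought of as a threshold on an LLM judge score. The sketch below assumes a hypothetical 0-1 score (0 = grounded, 1 = fabricated) and arbitrary cutoffs; the platform's actual scoring and thresholds are not documented here:

```python
DISCLAIMER = "\n\n[Note: This response may contain unverified information.]"
FALLBACK_MESSAGE = "I'm sorry, I couldn't produce a reliable answer."

def apply_hallucination_policy(response: str, score: float) -> str:
    """Dispatch on a hallucination score from an LLM judge.

    The 0.4 / 0.8 thresholds are illustrative assumptions only.
    """
    if score >= 0.8:   # severe case: Block the entire response
        return FALLBACK_MESSAGE
    if score >= 0.4:   # mild case: Append a warning disclaimer
        return response + DISCLAIMER
    return response
```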

5. Performance & Utility

Provides metadata to the user to improve the consumption experience.

| Property | Example Result | Logic |
| --- | --- | --- |
| Reading Time | "Estimated reading time: 2 mins" | Word Count / 200 WPM |

Summary of Actions

When a guardrail is triggered in the Output Policy, the platform can take three primary actions:

  • Block: The user never sees the AI response; instead, they see the Fallback Message.
  • Mask/Redact: Sensitive parts (PII, Competitors) are replaced with symbols or placeholders.
  • Append: The response is delivered, but with a system-generated note (e.g., Reading Time or Factual Disclaimer).
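
Putting the three actions together, a triggered guardrail can be modeled as an action plus a payload that resolves to the final text the user sees. This is a conceptual sketch, not the platform's API:

```python
from enum import Enum

class Action(Enum):
    BLOCK = "block"
    MASK = "mask"
    APPEND = "append"

def apply_action(response: str, action: Action, payload: str) -> str:
    """Resolve a triggered guardrail into the text the end user actually sees."""
    if action is Action.BLOCK:
        return payload                    # payload = the Fallback Message
    if action is Action.MASK:
        return payload                    # payload = response with redactions applied
    return response + "\n\n" + payload    # payload = system-generated note
```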