Security Mechanisms

Defense API provides multiple layers of security to protect your LLM applications from attacks and privacy breaches. Each mechanism can be independently configured to match your security requirements.


Overview

The gateway implements four primary security mechanisms:

  1. Spotlighting - Defend against prompt injection attacks
  2. LLM-as-a-Judge - Content safety evaluation and filtering
  3. PII Anonymization - Privacy protection through automated data masking
  4. System Prompt Checking - Validation of system prompts

All mechanisms are applied transparently - your application code remains unchanged while requests are secured in transit.


1. Spotlighting (Prompt Injection Defense)

Purpose: Protect against indirect prompt injection attacks by clearly marking user-generated content.

How it works: Spotlighting transforms user inputs before sending them to the LLM, making it clear to the model which parts are instructions vs. user data. This prevents malicious content (e.g., from compromised websites or documents) from being interpreted as system instructions.

Methods

Delimit Method

Wraps user content with delimiter characters to create clear boundaries.

Configuration:

  • delimiter: Custom delimiter string (default: $$$)

Example:

Original: "Tell me about security"
Transformed: "$$$Tell me about security$$$"
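
The delimit transformation can be sketched in a few lines of Python. The function name and defaults here are illustrative, not the gateway's actual implementation:

```python
def delimit(user_content: str, delimiter: str = "$$$") -> str:
    """Wrap user content in delimiters so the model can tell data from instructions."""
    return f"{delimiter}{user_content}{delimiter}"

print(delimit("Tell me about security"))  # $$$Tell me about security$$$
```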

Encode Method

Encodes user content using standard encoding schemes, instructing the LLM to decode before processing.

Configuration:

  • encoding: Encoding scheme - base64 (default), rot13 (ROT13 cipher), or binary

Example (base64):

Original: "Tell me about security"
Transformed: "VGVsbCBtZSBhYm91dCBzZWN1cml0eQ=="
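
The base64 variant is just standard Base64 over the user text; a minimal sketch (the function name is illustrative):

```python
import base64

def encode_spotlight(user_content: str) -> str:
    """Base64-encode user content; the system prompt instructs the model to decode it."""
    return base64.b64encode(user_content.encode("utf-8")).decode("ascii")

print(encode_spotlight("Tell me about security"))
# VGVsbCBtZSBhYm91dCBzZWN1cml0eQ==
```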

Mark Method

Replaces spaces with special Unicode markers to make tampering detectable.

Configuration:

  • marking: Custom Unicode marker character (default: \uE000)

Example:

Original: "Tell me about security"
Transformed: "Tell\uE000me\uE000about\uE000security"
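
The mark transformation is a simple substitution; a sketch, with the default private-use marker:

```python
def mark(user_content: str, marking: str = "\uE000") -> str:
    """Replace spaces with a private-use Unicode marker; absent markers reveal tampering."""
    return user_content.replace(" ", marking)

assert mark("Tell me about security") == "Tell\uE000me\uE000about\uE000security"
```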

Use Cases

  • Protecting AI agents that process web content
  • Securing document analysis workflows
  • Defending against indirect prompt injection in RAG systems

Performance Overhead

< 10ms - Spotlighting adds minimal latency to requests. The transformation is lightweight and performed locally in the gateway.


2. LLM-as-a-Judge (Content Safety Guard)

Purpose: Evaluate the safety of user prompts and LLM responses using an LLM-based classifier.

How it works: Before processing requests (and optionally after receiving responses), the gateway sends the content to a specialized guard model that evaluates safety across multiple risk categories. Unsafe content is blocked based on your configured policies.

Safety Levels

Medium Level

Blocks content rated as UNSAFE only. Allows controversial but not explicitly harmful content.

Use case: General-purpose applications with moderate content restrictions.

Strict Level

Blocks content rated as UNSAFE or CONTROVERSIAL. Provides maximum protection with tighter filtering.

Use case: Enterprise applications, educational platforms, customer-facing chatbots.

Blocked Categories

Configure which content categories to filter. Available categories include:

  • Violent Content - Violence, weapons, gore
  • Non-violent Illegal Acts - Fraud, theft, illegal activities
  • Sexual Content - Explicit sexual material
  • PII (Personally Identifiable Information) - Attempts to extract private data
  • Suicide & Self-Harm - Self-harm instructions or encouragement
  • Unethical Acts - Deception, manipulation, unethical behavior
  • Politically Sensitive Topics - Controversial political content
  • Copyright Violation - Requests for copyrighted material
  • Jailbreak - Attempts to bypass safety guardrails

Configuration Options

  • active: Enable/disable the guard
  • judge_level: medium or strict
  • blocked_categories: List of categories to block
  • judge_output: Whether to also evaluate LLM responses (not just inputs)

Example Flow

  1. User sends: "How do I hack into a system?"
  2. Guard evaluates → UNSAFE, category: Jailbreak
  3. Request blocked with message: "I'm sorry, but I can't help with that. Your query is deemed unsafe in the category: Jailbreak"
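
The blocking decision from the guard's verdict can be sketched as follows. This is a hypothetical illustration of the policy logic only; the actual guard model call and verdict format are internal to the gateway:

```python
def should_block(verdict: str, category: str,
                 judge_level: str, blocked_categories: list[str]) -> bool:
    """Block when the verdict meets the configured level and the category is listed."""
    # medium blocks UNSAFE only; strict also blocks CONTROVERSIAL
    unsafe = {"UNSAFE"} if judge_level == "medium" else {"UNSAFE", "CONTROVERSIAL"}
    return verdict in unsafe and category in blocked_categories

if should_block("UNSAFE", "Jailbreak", "strict", ["Jailbreak", "PII"]):
    print("I'm sorry, but I can't help with that. "
          "Your query is deemed unsafe in the category: Jailbreak")
```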

Use Cases

  • Enterprise content filtering
  • Compliance with content policies
  • Protection against jailbreak attempts
  • Output validation for customer-facing applications

Performance Overhead

~170ms - The guard mechanism requires an additional LLM inference call to evaluate content safety. Overhead depends on the guard model's response time and content length.


3. PII Anonymization (Privacy Protection)

Purpose: Automatically detect and anonymize personally identifiable information in user inputs, then restore it in LLM responses.

How it works: The gateway analyzes incoming messages for PII using the Presidio framework, replaces detected entities with anonymized tokens, sends the anonymized content to the LLM, and then deanonymizes the response before returning it to the user. This ensures PII never reaches the LLM provider.

Anonymization Operators

Redact (default)

Removes PII entirely, replacing it with entity type labels.

Example:

Original: "My email is john@example.com"
Anonymized: "My email is [EMAIL_ADDRESS]"

Mask

Partially masks PII while preserving format.

Configuration:

  • chars_to_mask: Number of characters to mask
  • from_end: Mask from the end (default: false, masks from the start)
  • masking_char: Character to use for masking (default: *)

Example:

Original: "My SSN is 123-45-6789"
Anonymized: "My SSN is ***-**-6789"
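
A simplified sketch of format-preserving masking like the SSN example above. Presidio's real mask operator is driven by chars_to_mask, from_end, and masking_char; this illustrative helper just masks all digits except the last few while keeping separators:

```python
def mask_preserving_format(value: str, keep_last: int = 4, masking_char: str = "*") -> str:
    """Mask every digit except the last `keep_last`, leaving separators intact."""
    total_digits = sum(c.isdigit() for c in value)
    out, seen = [], 0
    for c in value:
        if c.isdigit():
            seen += 1
            # keep only the trailing `keep_last` digits visible
            out.append(c if seen > total_digits - keep_last else masking_char)
        else:
            out.append(c)
    return "".join(out)

print(mask_preserving_format("123-45-6789"))  # ***-**-6789
```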

Replace

Replaces PII with a synthetic value.

Configuration:

  • new_value: Replacement string

Example:

Original: "Call me at 555-1234"
Anonymized: "Call me at [PHONE]"

Encrypt

Encrypts PII with AES encryption, enabling exact restoration.

Requirements:

  • Must set the PII_ENCRYPTION_KEY environment variable in dapi-core

Example:

Original: "My name is Alice"
Anonymized: "My name is [[RPL#v1#aGF3dGhvcm5l...]]"
LLM sees: "My name is [[RPL#v1#aGF3dGhvcm5l...]]"
Response: "Hello [[RPL#v1#aGF3dGhvcm5l...]], nice to meet you!"
Deanonymized: "Hello Alice, nice to meet you!"
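
The round trip can be illustrated with a toy token vault. The real gateway uses AES encryption inside the [[RPL#v1#...]] token; this sketch substitutes a plain lookup table purely to show the anonymize/deanonymize flow:

```python
import itertools

_counter = itertools.count()
_vault: dict[str, str] = {}  # token -> original value (AES ciphertext in the real system)

def anonymize(text: str, entity: str) -> str:
    """Replace a detected entity with a reversible token."""
    token = f"[[RPL#v1#tok{next(_counter)}]]"
    _vault[token] = entity
    return text.replace(entity, token)

def deanonymize(text: str) -> str:
    """Restore original entities in the LLM response."""
    for token, original in _vault.items():
        text = text.replace(token, original)
    return text

prompt = anonymize("My name is Alice", "Alice")   # LLM only ever sees the token
response = f"Hello {prompt.split('is ')[1]}, nice to meet you!"
print(deanonymize(response))  # Hello Alice, nice to meet you!
```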

Detected PII Entities

The system can detect and anonymize:

  • Person - Names of individuals
  • Email Address - Email addresses
  • Phone Number - Phone numbers in various formats
  • Credit Card - Credit card numbers
  • IP Address - IPv4 and IPv6 addresses
  • IBAN Code - Bank account numbers
  • Location - Physical addresses, cities, countries
  • DateTime - Dates and times that could identify individuals
  • US SSN - US Social Security Numbers
  • URL - URLs that may contain PII

Configuration Options

  • active: Enable/disable PII protection
  • score_threshold: Confidence threshold for detection (0.0-1.0)
  • entities: List of entity types to detect (if not specified, detects all)
  • operator: Anonymization method (redact, mask, replace, encrypt)
  • operator_config: Method-specific configuration (for mask/replace operators)

Privacy Flow Example

Input (from user):

"My email is alice@company.com and my phone is +1-555-0100. What should I do?"

Anonymized (sent to LLM):

"My email is [[RPL#v1#Xk9s...]] and my phone is [[RPL#v1#mQ31...]]. What should I do?"

LLM Response:

"I'll send the information to [[RPL#v1#Xk9s...]]. You can also be reached at [[RPL#v1#mQ31...]]."

Deanonymized (returned to user):

"I'll send the information to alice@company.com. You can also be reached at +1-555-0100."

Use Cases

  • GDPR/CCPA compliance
  • Protecting customer data from third-party LLM providers
  • Healthcare applications (HIPAA compliance)
  • Financial services
  • HR and recruitment systems

Performance Overhead

~800ms - PII detection and anonymization requires two API calls to the privacy service (analyze + anonymize). The overhead scales with text length and the number of detected entities. Deanonymization of responses adds negligible additional latency.


4. System Prompt Checking

Purpose: Validate and sanitize system prompts before they're sent to LLMs.

How it works: Checks system prompts for potential security issues or policy violations. The system analyzes the prompt content to identify problematic patterns, malicious instructions, or content that violates safety policies.

Use Cases

  • Application-defined system prompts - Validating instructions defined by your application
  • User-customizable prompts - Checking prompts that users can modify or configure
  • MCP server tool descriptions - Protecting against malicious tool descriptions from Model Context Protocol (MCP) servers, as these descriptions are injected into the system prompt
  • Third-party integrations - Validating prompts from external services or plugins
  • Multi-tenant environments - Ensuring tenant-specific system prompts comply with policies

Performance Overhead

~1 second (first request only) - System prompt checking adds approximately 1 second of overhead, but this occurs only once per unique system prompt. The validation result is cached, so subsequent requests using the same system prompt experience almost no execution overhead during the conversation. This makes the mechanism highly efficient for typical usage patterns where system prompts remain constant across multiple interactions.


Combining Security Mechanisms

All security mechanisms can be used together for defense-in-depth:

Example Configuration:

firewall_config = {
    "spotlighting": {
        "active": True,
        "method": "delimit",
        "delimiter": "$$$"
    },
    "llm_judge": {
        "active": True,
        "judge_level": "strict",
        "judge_output": True,
        "blocked_categories": ["Jailbreak", "PII", "Violent Content"]
    },
    "privacy": {
        "active": True,
        "operator": "encrypt",
        "score_threshold": 0.5
    }
}

Request Flow:

  1. PII Anonymization - User PII is encrypted
  2. Content Safety - Guard evaluates anonymized content
  3. Spotlighting - Content is delimited for injection defense
  4. LLM Processing - Request sent to provider
  5. Output Safety - Guard evaluates LLM response (if enabled)
  6. Deanonymization - PII is restored in response
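
The pre-LLM part of this flow is a sequential pipeline, which can be sketched like so. All function bodies are placeholders standing in for the gateway's internals:

```python
def anonymize_pii(text: str) -> str:
    return text  # placeholder: encrypt detected PII entities

def judge_input(text: str) -> str:
    return text  # placeholder: raise/block if the guard rates it unsafe

def spotlight(text: str) -> str:
    return f"$$${text}$$$"  # delimit method with the default delimiter

def secure_request(user_text: str) -> str:
    """Apply the mechanisms in the documented order before the LLM call."""
    for step in (anonymize_pii, judge_input, spotlight):
        user_text = step(user_text)
    return user_text

print(secure_request("Tell me about security"))  # $$$Tell me about security$$$
```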


Performance & Overhead

Each security mechanism adds processing time to requests. Understanding these overheads helps you balance security requirements with latency constraints.

Latency by Mechanism

Mechanism                Approximate Overhead       Notes
Spotlighting             < 10ms                     Lightweight local transformation with minimal impact
PII Anonymization        ~800ms                     Two API calls (analyze + anonymize); scales with text length; largest per-request contributor
LLM-as-a-Judge           ~170ms                     Requires an additional LLM inference call per evaluation
System Prompt Checking   ~1s (first request only)   Result is cached; negligible overhead for subsequent requests

Factors Affecting Performance

Actual overhead varies based on:

  • Content length - Longer texts require more processing time (especially for PII detection)
  • Number of active mechanisms - Mechanisms run sequentially, so overhead is cumulative
  • Configuration complexity - More PII entity types or stricter guard policies increase processing time
  • Network latency - Calls to dapi-core service are subject to network conditions
  • Guard model selection - Different LLM guard models have varying inference speeds

Example Combined Overhead

Configuration: All mechanisms enabled (PII + Guard + Spotlighting)

Expected latency:

  • PII Anonymization: ~100-800ms (depending on the number of PII entities)
  • LLM-as-a-Judge (input): ~170ms
  • Spotlighting: < 10ms
  • Total overhead: ~1s (before the actual LLM call)

If judge_output is enabled, add another ~170ms to evaluate the LLM response.
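
The combined figure can be tallied with a small helper. The numbers are the documented approximations from this page, not measurements:

```python
# Documented worst-case approximations, in milliseconds.
OVERHEAD_MS = {"pii": 800, "judge_input": 170, "spotlighting": 10}

def total_overhead(judge_output: bool = False) -> int:
    """Sum the per-mechanism overheads for an all-mechanisms configuration."""
    total = sum(OVERHEAD_MS.values())
    if judge_output:
        total += 170  # second guard call on the LLM response
    return total

print(total_overhead())      # 980
print(total_overhead(True))  # 1150
```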

Optimization Tips

  • Disable unused mechanisms - Only enable what you need
  • Skip output judging - Set judge_output: false if input filtering is sufficient
  • Adjust PII threshold - Higher score_threshold reduces false positives and processing time
  • Selective categories - Block only high-risk guard categories instead of all
  • Use delimit method - Fastest spotlighting option for most use cases

Configuration via Dashboard

All security mechanisms can be configured through the Defense API Dashboard:

  1. Navigate to Firewalls section
  2. Select or create a firewall
  3. Enable desired security mechanisms under Protection settings
  4. Configure mechanism-specific options
  5. Save and test with generated cURL commands

For detailed configuration instructions, see the Firewall Configuration Guide.


Best Practices

Spotlighting

  • Use delimit for general-purpose protection with minimal overhead
  • Use encode when processing content from untrusted sources
  • Use mark for maximum detectability in specialized scenarios

LLM-as-a-Judge

  • Start with medium level and specific categories for most applications
  • Use strict level for high-risk or customer-facing deployments
  • Enable judge_output to validate LLM responses in addition to inputs
  • Regularly review blocked requests to tune category selections

PII Anonymization

  • Use encrypt operator when exact restoration is needed
  • Use redact for maximum privacy (one-way anonymization)
  • Set appropriate score_threshold (0.5-0.7 recommended) to balance precision/recall
  • Test with sample data to verify entity detection accuracy

General

  • Enable only the mechanisms you need to minimize latency
  • Monitor analytics to understand performance impact
  • Test configurations in development before production deployment
  • Combine mechanisms thoughtfully based on your threat model

Need Help?