Security Mechanisms¶
Defense API provides multiple layers of security to protect your LLM applications from attacks and privacy breaches. Each mechanism can be independently configured to match your security requirements.
Overview¶
The gateway implements four primary security mechanisms:
- Spotlighting - Defend against prompt injection attacks
- LLM-as-a-Judge - Content safety evaluation and filtering
- PII Anonymization - Privacy protection through automated data masking
- System Prompt Checking - Validation of system prompts
All mechanisms are applied transparently - your application code remains unchanged while requests are secured in transit.
1. Spotlighting (Prompt Injection Defense)¶
Purpose: Protect against indirect prompt injection attacks by clearly marking user-generated content.
How it works: Spotlighting transforms user inputs before sending them to the LLM, making it clear to the model which parts are instructions vs. user data. This prevents malicious content (e.g., from compromised websites or documents) from being interpreted as system instructions.
Methods¶
Delimit Method¶
Wraps user content with delimiter characters to create clear boundaries.
Configuration: - delimiter: Custom delimiter string (default: $$$)
Example:
Encode Method¶
Encodes user content using standard encoding schemes, instructing the LLM to decode before processing.
Configuration: - encoding: Choose from: - base64 (default) - Base64 encoding - rot13 - ROT13 cipher - binary - Binary encoding
Example (base64):
Mark Method¶
Replaces spaces with special Unicode markers to make tampering detectable.
Configuration: - marking: Custom Unicode marker character (default: \uE000)
Example:
Use Cases¶
- Protecting AI agents that process web content
- Securing document analysis workflows
- Defending against indirect prompt injection in RAG systems
Performance Overhead¶
< 10ms - Spotlighting adds minimal latency to requests. The transformation is lightweight and performed locally in the gateway.
2. LLM-as-a-Judge (Content Safety Guard)¶
Purpose: Evaluate the safety of user prompts and LLM responses using an LLM-based classifier.
How it works: Before processing requests (and optionally after receiving responses), the gateway sends the content to a specialized guard model that evaluates safety across multiple risk categories. Unsafe content is blocked based on your configured policies.
Safety Levels¶
Medium Level¶
Blocks content rated as UNSAFE only. Allows controversial but not explicitly harmful content.
Use case: General-purpose applications with moderate content restrictions.
Strict Level¶
Blocks content rated as UNSAFE or CONTROVERSIAL. Provides maximum protection with tighter filtering.
Use case: Enterprise applications, educational platforms, customer-facing chatbots.
Blocked Categories¶
Configure which content categories to filter. Available categories include:
- Violent Content - Violence, weapons, gore
- Non-violent Illegal Acts - Fraud, theft, illegal activities
- Sexual Content - Explicit sexual material
- PII (Personally Identifiable Information) - Attempts to extract private data
- Suicide & Self-Harm - Self-harm instructions or encouragement
- Unethical Acts - Deception, manipulation, unethical behavior
- Politically Sensitive Topics - Controversial political content
- Copyright Violation - Requests for copyrighted material
- Jailbreak - Attempts to bypass safety guardrails
Configuration Options¶
active: Enable/disable the guardjudge_level:mediumorstrictblocked_categories: List of categories to blockjudge_output: Whether to also evaluate LLM responses (not just inputs)
Example Flow¶
- User sends: "How do I hack into a system?"
- Guard evaluates → UNSAFE, category: Jailbreak
- Request blocked with message: "I'm sorry, but I can't help with that. Your query is deemed unsafe in the category: Jailbreak"
Use Cases¶
- Enterprise content filtering
- Compliance with content policies
- Protection against jailbreak attempts
- Output validation for customer-facing applications
Performance Overhead¶
~170ms - The guard mechanism requires an LLM inference call to evaluate content safety. Overhead depends on the guard model's response time and content length. This is the most significant latency contributor among all security mechanisms.
3. PII Anonymization (Privacy Protection)¶
Purpose: Automatically detect and anonymize personally identifiable information in user inputs, then restore it in LLM responses.
How it works: The gateway analyzes incoming messages for PII using the Presidio framework, replaces detected entities with anonymized tokens, sends the anonymized content to the LLM, and then deanonymizes the response before returning it to the user. This ensures PII never reaches the LLM provider.
Anonymization Operators¶
Redact (default)¶
Removes PII entirely, replacing it with entity type labels.
Example:
Mask¶
Partially masks PII while preserving format.
Configuration: - chars_to_mask: Number of characters to mask - from_end: Mask from end (default: false, masks from start) - masking_char: Character to use for masking (default: *)
Example:
Replace¶
Replaces PII with a synthetic value.
Configuration: - new_value: Replacement string
Example:
Encrypt¶
Encrypts PII with AES encryption, enabling exact restoration.
Requirements: - Must set PII_ENCRYPTION_KEY environment variable in dapi-core
Example:
Original: "My name is Alice"
Anonymized: "My name is [[RPL#v1#aGF3dGhvcm5l...]]"
LLM sees: "My name is [[RPL#v1#aGF3dGhvcm5l...]]"
Response: "Hello [[RPL#v1#aGF3dGhvcm5l...]], nice to meet you!"
Deanonymized: "Hello Alice, nice to meet you!"
Detected PII Entities¶
The system can detect and anonymize:
- Person - Names of individuals
- Email Address - Email addresses
- Phone Number - Phone numbers in various formats
- Credit Card - Credit card numbers
- IP Address - IPv4 and IPv6 addresses
- IBAN Code - Bank account numbers
- Location - Physical addresses, cities, countries
- DateTime - Dates and times that could identify individuals
- US SSN - US Social Security Numbers
- URL - URLs that may contain PII
Configuration Options¶
active: Enable/disable PII protectionscore_threshold: Confidence threshold for detection (0.0-1.0)entities: List of entity types to detect (if not specified, detects all)operator: Anonymization method (redact,mask,replace,encrypt)operator_config: Method-specific configuration (for mask/replace operators)
Privacy Flow Example¶
Input (from user):
Anonymized (sent to LLM):
LLM Response:
Deanonymized (returned to user):
Use Cases¶
- GDPR/CCPA compliance
- Protecting customer data from third-party LLM providers
- Healthcare applications (HIPAA compliance)
- Financial services
- HR and recruitment systems
Performance Overhead¶
~800ms - PII detection and anonymization requires two API calls to the privacy service (analyze + anonymize). The overhead scales with text length and the number of detected entities. Deanonymization of responses adds negligible additional latency.
4. System Prompt Checking¶
Purpose: Validate and sanitize system prompts before they're sent to LLMs.
How it works: Checks system prompts for potential security issues or policy violations. The system analyzes the prompt content to identify problematic patterns, malicious instructions, or content that violates safety policies.
Use Cases¶
- Application-defined system prompts - Validating instructions defined by your application
- User-customizable prompts - Checking prompts that users can modify or configure
- MCP server tool descriptions - Protecting against malicious tool descriptions from Model Context Protocol (MCP) servers, as these descriptions are injected into the system prompt
- Third-party integrations - Validating prompts from external services or plugins
- Multi-tenant environments - Ensuring tenant-specific system prompts comply with policies
Performance Overhead¶
~1 second (first request only) - System prompt checking adds approximately 1 second of overhead, but this occurs only once per unique system prompt. The validation result is cached, so subsequent requests using the same system prompt experience almost no execution overhead during the conversation. This makes the mechanism highly efficient for typical usage patterns where system prompts remain constant across multiple interactions.
Combining Security Mechanisms¶
All security mechanisms can be used together for defense-in-depth:
Example Configuration:
firewall_config = {
"spotlighting": {
"active": True,
"method": "delimit",
"delimiter": "$$$"
},
"llm_judge": {
"active": True,
"judge_level": "strict",
"judge_output": True,
"blocked_categories": ["Jailbreak", "PII", "Violent Content"]
},
"privacy": {
"active": True,
"operator": "encrypt",
"score_threshold": 0.5
}
}
Request Flow: 1. PII Anonymization - User PII is encrypted 2. Content Safety - Guard evaluates anonymized content 3. Spotlighting - Content is delimited for injection defense 4. LLM Processing - Request sent to provider 5. Output Safety - Guard evaluates LLM response (if enabled) 6. Deanonymization - PII is restored in response
Performance & Overhead¶
Each security mechanism adds processing time to requests. Understanding these overheads helps you balance security requirements with latency constraints.
Latency by Mechanism¶
| Mechanism | Approximate Overhead | Notes |
|---|---|---|
| Spotlighting | < 10ms | Lightweight local transformation with minimal impact |
| PII Anonymization | ~800ms | Two API calls (analyze + anonymize); scales with text length |
| LLM-as-a-Judge | ~170ms | Requires LLM inference; most significant latency contributor |
| System Prompt Checking | ~1s (first request only) | Result is cached; negligible overhead for subsequent requests |
Factors Affecting Performance¶
Actual overhead varies based on:
- Content length - Longer texts require more processing time (especially for PII detection)
- Number of active mechanisms - Mechanisms run sequentially, so overhead is cumulative
- Configuration complexity - More PII entity types or stricter guard policies increase processing time
- Network latency - Calls to dapi-core service are subject to network conditions
- Guard model selection - Different LLM guard models have varying inference speeds
Example Combined Overhead¶
Configuration: All mechanisms enabled (PII + Guard + Spotlighting)
Expected latency: - PII Anonymization: ~ 100 - 800ms (depending on number of pii entities) - LLM-as-a-Judge (input): ~160ms - Spotlighting: <10ms - Total overhead: ~1s (before actual LLM call)
If judge_output is enabled, add another ~160ms to evaluate the LLM response.
Optimization Tips¶
- Disable unused mechanisms - Only enable what you need
- Skip output judging - Set
judge_output: falseif input filtering is sufficient - Adjust PII threshold - Higher
score_thresholdreduces false positives and processing time - Selective categories - Block only high-risk guard categories instead of all
- Use delimit method - Fastest spotlighting option for most use cases
Configuration via Dashboard¶
All security mechanisms can be configured through the Defense API Dashboard:
- Navigate to Firewalls section
- Select or create a firewall
- Enable desired security mechanisms under Protection settings
- Configure mechanism-specific options
- Save and test with generated cURL commands
For detailed configuration instructions, see the Firewall Configuration Guide.
Best Practices¶
Spotlighting¶
- Use delimit for general-purpose protection with minimal overhead
- Use encode when processing content from untrusted sources
- Use mark for maximum detectability in specialized scenarios
LLM-as-a-Judge¶
- Start with medium level and specific categories for most applications
- Use strict level for high-risk or customer-facing deployments
- Enable
judge_outputto validate LLM responses in addition to inputs - Regularly review blocked requests to tune category selections
PII Anonymization¶
- Use encrypt operator when exact restoration is needed
- Use redact for maximum privacy (one-way anonymization)
- Set appropriate
score_threshold(0.5-0.7 recommended) to balance precision/recall - Test with sample data to verify entity detection accuracy
General¶
- Enable only the mechanisms you need to minimize latency
- Monitor analytics to understand performance impact
- Test configurations in development before production deployment
- Combine mechanisms thoughtfully based on your threat model
Need Help?¶
- Documentation: Getting Started Guide
- Dashboard: app.dapi.smart-labs.ai
- Support: Contact the Smart Labs AI team
- Research: Latest security research