Where it fits
- Screening prompt traffic before it reaches a high-privilege model or agent tool.
- Adding security telemetry to moderation, support, coding, or research assistants.
- Testing whether a known jailbreak family still works after prompt or model changes.
Operational steps
- Send the candidate prompt, recent turns, policy name, and app surface to the detection endpoint.
- Use the response severity to block, review, log, or downscope the request.
- Replay known jailbreak families in staging before release.
- Feed confirmed misses back into a custom test pack for future regression scans.
Common risks
- A single-turn classifier misses gradual roleplay or authority-shifting attacks.
- The app detects toxic language but not attempts to override system instructions.
- Detection logs contain sensitive user text without minimization controls.
How PromptGuard Scan fits the workflow
PromptGuard Scan combines a maintained jailbreak library with scan reports and API responses that fit product telemetry, CI checks, and security review workflows.