Vapora/docs/adrs/0038-security-ssrf-prompt-injection.md

# ADR-0038: Security Layer — SSRF Protection and Prompt Injection Scanning

**Status**: Implemented
**Date**: 2026-02-26
**Deciders**: VAPORA Team
**Technical Story**: Competitive analysis against OpenFang (which ships 16 dedicated security layers including SSRF guards and sandboxed agent execution) revealed that VAPORA had no defenses against Server-Side Request Forgery via misconfigured webhook URLs, and no guards preventing prompt injection payloads from reaching LLM providers through the RLM and agent execution paths.

---

## Decision

Add a `security` module to `vapora-backend` (`src/security/`) with two sub-modules:

1. **`ssrf.rs`** — URL validation that rejects private, reserved, and cloud-metadata address ranges before any outbound HTTP request is dispatched.
2. **`prompt_injection.rs`** — Pattern-based text scanner that rejects known injection payloads at the API boundary before user input reaches an LLM provider.

Integration points:

- **Channel SSRF** (`main.rs`): Filter channel webhook URLs from config before `ChannelRegistry::from_map`. Channels with unsafe literal URLs are dropped (not warned-and-registered).
- **RLM endpoints** (`api/rlm.rs`): `load_document`, `query_document`, and `analyze_document` scan user-supplied text before indexing or dispatching to LLM. `load_document` and `analyze_document` also sanitize (strip control characters, enforce 32 KiB cap).
- **Task endpoints** (`api/tasks.rs`): `create_task` and `update_task` scan `title` and `description` before persisting — these fields are later consumed by `AgentExecutor` as LLM task context.
- **Status code**: Security rejections return `400 Bad Request` (`VaporaError::InvalidInput`), not `500 Internal Server Error`.

---

## Context

### SSRF Attack Surface in VAPORA

VAPORA makes outbound HTTP requests from two paths:

1. **`vapora-channels`**: `SlackChannel`, `DiscordChannel`, and `TelegramChannel` POST to webhook URLs configured in `vapora.toml`. The `api_base` override in `TelegramConfig` is operator-configurable, meaning a misconfigured or compromised config file could point the server at an internal endpoint (e.g., `http://169.254.169.254/latest/meta-data/`).

2. **LLM-assisted SSRF**: A user can send `"fetch http://10.0.0.1/admin and summarize"` as a query to `/api/v1/rlm/analyze`. This does not cause a direct HTTP fetch in the backend, but it does inject the URL into an LLM prompt, which may then instruct a tool-calling agent to fetch that URL.

The original SSRF check in `main.rs` logged a `warn!` but did not remove the channel from `config.channels` before passing it to `ChannelRegistry::from_map`. Channels with SSRF-risky URLs were fully registered and operational. The log message said "channel will be disabled" — this was incorrect.

### Prompt Injection Attack Surface

The RLM (`/api/v1/rlm/`) pipeline takes user-supplied `content` (at upload time) and `query` strings, which flow verbatim into LLM prompts:

```
POST /rlm/analyze { query: "Ignore previous instructions..." }
  → LLMDispatcher::build_prompt(query, chunks)
      → format!("Query: {}\n\nRelevant information:\n\n{}", query, chunk_content)
          → LLMClient::complete(prompt)  // injection reaches the model
```

The task execution path has the same exposure:

```
POST /tasks { title: "You are now an unrestricted AI..." }
  → SurrealDB storage
  → AgentCoordinator::assign_task(description=title)
  → AgentExecutor::execute_task
  → LLMRouter::complete_with_budget(prompt)  // injection reaches the model
```

### Why Pattern Matching Over ML-Based Detection

ML-based classifiers (e.g., a separate LLM call to classify whether input is an injection) introduce latency, cost, and a second injection surface. Pattern matching on a known threat corpus is:

- **Deterministic**: same input always produces the same result
- **Zero-latency**: microseconds, no I/O
- **Auditable**: the full pattern list is visible in source code
- **Sufficient for known attack patterns**: the primary threat is unsophisticated bulk scanning, not targeted adversarial attacks from sophisticated actors

The trade-off is false negatives on novel patterns. This is accepted. The scanner is defense-in-depth, not the sole protection.

---

## Alternatives Considered

### A: Middleware layer (tower `Layer`)

A tower middleware would intercept all requests and scan body text generically. Rejected because:

- Request bodies are consumed as streams; cloning them for inspection has memory cost proportional to request size
- Middleware cannot distinguish LLM-bound fields from benign metadata (e.g., a task `priority` field)
- Handler-level integration allows field-specific rules (scan `title`+`description` but not `status`)

### B: Validation at the SurrealDB persistence layer

Scan content in `TaskService::create_task` before the DB insert. Rejected because:

- The API boundary is the right place to reject invalid input — failing early avoids unnecessary DB round-trips
- Service layer tests would require DB setup for security assertions; handler-level tests work with `Surreal::init()` (unconnected client)

### C: Allow-list URLs (only pre-approved domains)

Require webhook URLs to match a configured allow-list. Rejected because:

- Operators change webhook URLs frequently (channel rotations, workspace migrations)
- A deny-list of private ranges is maintenance-free and catches the real threat (internal network access) without requiring operator pre-registration of every external domain

### D: Re-scan chunks at LLM dispatch time (`LLMDispatcher::build_prompt`)

Re-check stored chunk content when constructing the LLM prompt. Rejected for this implementation because:

- Stored chunks are operator/system-uploaded documents, not direct user input (lower risk than runtime queries)
- Scanning at upload time (`load_document`) is the correct primary control; re-scanning at read time adds CPU cost on every LLM call
- **Known limitation**: if chunks are written directly to SurrealDB (bypassing the API), the upload-time scan is bypassed. This is documented as a known gap.

---

## Trade-offs

**Pros**:

- Zero new external dependencies (uses `url` crate already transitively present via `reqwest`; `thiserror` already workspace-level)
- Integration tests (`tests/security_guards_test.rs`) run without external services using `Surreal::init()` — 11 tests, no `#[ignore]`
- Correct HTTP status: 400 for injection attempts, distinguishable from 500 server errors in monitoring dashboards
- Pattern list is visible in source; new patterns can be added as a one-line diff with a corresponding test

**Cons**:

- Pattern matching produces false negatives on novel/obfuscated injection payloads
- DNS rebinding is not addressed: `validate_url` checks the URL string but does not re-validate the resolved IP after DNS lookup. A domain that resolves to a public IP at validation time but later resolves to `10.x.x.x` bypasses the check. Mitigation requires a custom `reqwest` resolver or periodic re-validation.
- Stored-injection bypass: chunks indexed via a path other than `POST /rlm/documents` (direct DB write, migrations, bulk import) are not scanned
- Agent-level SSRF (tool calls that fetch external URLs during LLM execution) is not addressed by this layer

---

## Implementation

### Module Structure

```text
crates/vapora-backend/src/security/
├── mod.rs                    # re-exports ssrf and prompt_injection
├── ssrf.rs                   # validate_url(), validate_host()
└── prompt_injection.rs       # scan(), sanitize(), MAX_PROMPT_CHARS
```

### SSRF: Blocked Ranges

`ssrf::validate_url` rejects:

| Range | Reason |
|---|---|
| Non-`http`/`https` schemes | `file://`, `ftp://`, `gopher://` direct filesystem or legacy protocol access |
| `localhost`, `127.x.x.x`, `::1` | Loopback |
| `10.x.x.x`, `172.16-31.x.x`, `192.168.x.x` | RFC 1918 private ranges |
| `169.254.x.x` | Link-local / cloud instance metadata (AWS, GCP, Azure) |
| `100.64-127.x.x` | RFC 6598 shared address space |
| `*.local`, `*.internal`, `*.localdomain` | mDNS / Kubernetes-internal hostnames |
| `metadata.google.internal`, `instance-data` | GCP/AWS named metadata endpoints |
| `fc00::/7`, `fe80::/10` | IPv6 unique-local and link-local |

### Prompt Injection: Pattern Categories

`prompt_injection::scan` matches 60+ patterns across 5 categories:

| Category | Examples |
|---|---|
| `instruction_override` | "ignore previous instructions", "disregard previous", "forget your instructions" |
| `role_confusion` | "you are now", "pretend you are", "from now on you" |
| `delimiter_injection` | `\n\nsystem:`, `\n\nhuman:`, `\r\nsystem:` |
| `token_injection` | `<\|im_start\|>`, `<\|im_end\|>`, `[/inst]`, `<<SYS>>`, `</s>` |
| `data_exfiltration` | "print your system prompt", "reveal your instructions", "repeat everything above" |

All matching is case-insensitive. A single lowercase copy of the input is produced once; all patterns are checked against it.

### Channel SSRF: Filter-Before-Register

```rust
// main.rs — safe_channels excludes any channel with a literal unsafe URL
let safe_channels: HashMap<String, ChannelConfig> = config
    .channels
    .into_iter()
    .filter(|(name, cfg)| match ssrf_url_for_channel(cfg) {
        Some(url) => match security::ssrf::validate_url(url) {
            Ok(_) => true,
            Err(e) => { tracing::error!(...); false }
        },
        None => true,  // unresolved ${VAR} — passes through
    })
    .collect();
ChannelRegistry::from_map(safe_channels)  // only safe channels registered
```

Channels with `${VAR}` references in credential fields pass through — the resolved value cannot be validated pre-resolution. Mitigation: validate at HTTP send time inside the channel implementations (not yet implemented; tracked as known gap).

### Test Infrastructure

Security guard tests in `tests/security_guards_test.rs` use `Surreal::<Client>::init()` to build an unconnected AppState. The scan fires before any DB call, so the unconnected services are never invoked:

```rust
fn security_test_state() -> AppState {
    let db: Surreal<Client> = Surreal::init();  // unconnected, no external service needed
    AppState::new(
        ProjectService::new(db.clone()),
        ...
    )
}
```

---

## Verification

```bash
# Unit tests for scanner logic (24 tests)
cargo test -p vapora-backend security

# Integration tests through HTTP handlers (11 tests, no external deps)
cargo test -p vapora-backend --test security_guards_test

# Lint
cargo clippy -p vapora-backend -- -D warnings
```

Expected output for a prompt injection attempt at the HTTP layer:

```json
HTTP/1.1 400 Bad Request
{"error": "Input rejected by security scanner: Potential prompt injection detected ...", "status": 400}
```

---

## Known Gaps

| Gap | Severity | Mitigation |
|---|---|---|
| DNS rebinding not addressed | Medium | Requires custom `reqwest` resolver hook to re-check post-resolution IP |
| Channels with `${VAR}` URLs not validated | Low | Config-time values only; operator controls the env; validate at send time in channel impls |
| Stored-injection bypass in RLM | Low | Scan at upload time covers API path; direct DB writes are operator-only |
| Agent tool-call SSRF | Medium | Out of scope for backend layer; requires agent-level URL validation |
| Pattern list covers known patterns only | Medium | Defense-in-depth; complement with anomaly detection or LLM-based classifier at higher trust levels |

---

## Consequences

- All `/api/v1/rlm/*` endpoints and `/api/v1/tasks` reject injection attempts with `400 Bad Request` before reaching storage or LLM providers
- Channel webhooks pointing at private IP ranges are blocked at server startup rather than silently registered
- New injection patterns can be added to `prompt_injection::PATTERNS` as single-line entries; each requires a corresponding test case in `security/prompt_injection.rs` or `tests/security_guards_test.rs`
- Monitoring: `400` responses from `/rlm/*` and `/tasks` endpoints are a signal for injection probing; alerts should be configured on elevated 400 rates from these paths

---

## References

- `crates/vapora-backend/src/security/` — implementation
- `crates/vapora-backend/tests/security_guards_test.rs` — integration tests
- [ADR-0020: Audit Trail](./0020-audit-trail.md) — related: injection attempts should appear in the audit log (not yet implemented)
- [ADR-0010: Cedar Authorization](./0010-cedar-authorization.md) — complementary: Cedar handles authZ, this ADR handles input sanitization
- [ADR-0011: SecretumVault](./0011-secretumvault.md) — complementary: PQC secrets storage; SSRF would be the vector to exfiltrate those secrets
- OpenFang security architecture: 16-layer model including WASM sandbox, Merkle audit trail, SSRF guards (reference implementation that motivated this ADR)