Add vapora-channels crate with trait-based Slack/Discord/Telegram webhook
delivery. ${VAR}/${VAR:-default} interpolation is mandatory inside
ChannelRegistry::from_config — callers cannot bypass secret resolution.
Fire-and-forget dispatch via tokio::spawn in both vapora-workflow-engine
(four lifecycle events) and vapora-backend (task Done, proposal approve/reject).
New REST endpoints: GET /channels, POST /channels/:name/test.
dispatch_notifications extracted as pub(crate) fn for inline testability;
5 handler tests + 6 workflow engine tests + 7 secret resolution unit tests.
Closes: vapora-channels bootstrap, notification gap in workflow/backend layer
ADR: docs/adrs/0035-notification-channels.md
# ADR-0035: Webhook-Based Notification Channels — `vapora-channels` Crate

**Status**: Implemented

**Date**: 2026-02-26

**Deciders**: VAPORA Team

**Technical Story**: Workflow events (task completion, proposal approve/reject, schedule fires) had no outbound delivery path; operators had to poll the API to learn about state changes.

---

## Decision

Introduce a dedicated `vapora-channels` crate implementing a **trait-based webhook delivery layer** with:

1. `NotificationChannel` trait: a single async `send(&Message) -> Result<()>` method. The built-in Slack, Discord, and Telegram channels are plain HTTP webhook clients with no vendor SDK dependencies.
2. `ChannelRegistry`: a name-keyed routing hub; `from_config(HashMap<String, ChannelConfig>)` builds the registry from TOML config and resolves secrets at construction time.
3. `${VAR}` / `${VAR:-default}` interpolation **inside the library**: secret resolution is mandatory and cannot be bypassed by callers.
4. Fire-and-forget delivery at both layers: `AppState::notify` (backend) and `WorkflowOrchestrator::notify_*` (workflow engine) spawn background tasks; delivery failures are `warn!`-logged and never surface to API callers.
5. Per-event routing config (`NotificationConfig`) maps event names to lists of channel names instead of hardcoding channel identifiers.

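A minimal sketch of the trait and registry shapes, assuming std only. Because the registry stores `Arc<dyn NotificationChannel>`, the trait must be dyn-compatible, and a native `async fn` in a trait is not. This sketch therefore returns a boxed future by hand; the crate may instead use the `async_trait` macro, which generates the same boxed-future shape. The `BoxFuture` alias, the `register`/`get`/`names` helpers, `NoopChannel`, and the error payloads are hypothetical, not the crate's confirmed API.

```rust
use std::collections::HashMap;
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;

/// Hypothetical alias: a boxed, `Send` future that may borrow for `'a`.
pub type BoxFuture<'a, T> = Pin<Box<dyn Future<Output = T> + Send + 'a>>;

#[derive(Debug, Clone)]
pub struct Message {
    pub title: String,
    pub body: String,
}

#[derive(Debug, PartialEq)]
pub enum ChannelError {
    NotFound(String),
    ApiError(String),
}

/// One method; returning a boxed future keeps the trait usable as `dyn NotificationChannel`.
pub trait NotificationChannel: Send + Sync {
    fn send<'a>(&'a self, msg: &'a Message) -> BoxFuture<'a, Result<(), ChannelError>>;
}

/// Name-keyed routing hub: channel name -> shared trait object.
#[derive(Default)]
pub struct ChannelRegistry {
    channels: HashMap<String, Arc<dyn NotificationChannel>>,
}

impl ChannelRegistry {
    pub fn register(&mut self, name: &str, channel: Arc<dyn NotificationChannel>) {
        self.channels.insert(name.to_owned(), channel);
    }

    pub fn get(&self, name: &str) -> Result<Arc<dyn NotificationChannel>, ChannelError> {
        self.channels
            .get(name)
            .cloned()
            .ok_or_else(|| ChannelError::NotFound(name.to_owned()))
    }

    /// Sorted channel names, e.g. for a listing endpoint.
    pub fn names(&self) -> Vec<String> {
        let mut names: Vec<String> = self.channels.keys().cloned().collect();
        names.sort();
        names
    }
}

/// Hypothetical channel that always succeeds; a real one would POST JSON to a webhook URL.
pub struct NoopChannel;

impl NotificationChannel for NoopChannel {
    fn send<'a>(&'a self, _msg: &'a Message) -> BoxFuture<'a, Result<(), ChannelError>> {
        Box::pin(async { Ok::<(), ChannelError>(()) })
    }
}
```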
---

## Context

### Gaps Addressed

| Gap | Consequence |
|-----|-------------|
| No outbound event delivery | Operators must poll 40+ API endpoints to detect state changes |
| Secrets in TOML as plain strings | If resolution is left to callers, a `${SLACK_WEBHOOK_URL}` placeholder reaches the HTTP layer verbatim when the caller forgets to interpolate |
| Tight vendor coupling | Using `slack-rs` / `serenity` locks the feature to specific Slack/Discord API versions and transitive dependency trees |

### Why `NotificationChannel` Trait Over Vendor SDKs

Slack, Discord, and Telegram all accept a simple `POST` with a JSON body to a webhook URL: no OAuth, no persistent connection, no stateful session. A trait with one async method covers all three in fewer than 50 lines per implementation. Vendor SDKs add 200–500 kB of transitive dependencies and can introduce breaking changes whenever a provider updates its API.

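To make the "simple POST with a JSON body" concrete, here is a dependency-free sketch of the three request bodies. The field names follow the providers' public APIs: Slack Incoming Webhooks take `text`, Discord webhooks accept `embeds` with `title`/`description`, and Telegram's `sendMessage` takes `chat_id` and `text`. The function names, the exact fields `vapora-channels` sends, and the hand-rolled escaping are assumptions; a real implementation would serialize with `serde_json`.

```rust
/// Minimal JSON string escaping (quotes, backslashes, control characters).
fn json_str(s: &str) -> String {
    let mut out = String::with_capacity(s.len() + 2);
    out.push('"');
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

/// Slack Incoming Webhook: POST to the webhook URL with a `text` field (`*bold*` is Slack mrkdwn).
fn slack_body(title: &str, body: &str) -> String {
    format!("{{\"text\":{}}}", json_str(&format!("*{title}*\n{body}")))
}

/// Discord webhook: POST to the webhook URL with a single embed.
fn discord_body(title: &str, body: &str) -> String {
    format!(
        "{{\"embeds\":[{{\"title\":{},\"description\":{}}}]}}",
        json_str(title),
        json_str(body)
    )
}

/// Telegram Bot API: POST to https://api.telegram.org/bot<token>/sendMessage.
fn telegram_body(chat_id: &str, title: &str, body: &str) -> String {
    format!(
        "{{\"chat_id\":{},\"text\":{}}}",
        json_str(chat_id),
        json_str(&format!("{title}\n{body}"))
    )
}
```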
### Why Secret Resolution in the Library

Placing the responsibility on the caller creates a **TOFU gap**: the first time any caller forgets to call `resolve_secrets()` before constructing `ChannelRegistry`, the raw `${SLACK_WEBHOOK_URL}` string is used as the webhook URL. Delivery then fails quietly (the URL does not parse, or, if the placeholder is embedded in a longer URL, Slack returns 404 or 400), and the placeholder leaks into the logs. Nothing flags the mistake at compile time or at startup; it is discovered only when someone inspects the logs.

Moving interpolation into `ChannelRegistry::from_config` makes it **structurally impossible to construct a registry with unresolved secrets**: `ChannelError::SecretNotFound(var_name)` is returned immediately if an env var is absent and no default is provided. There is no non-error path that bypasses resolution.

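The "structurally impossible" guarantee can be expressed with ordinary Rust privacy. If the registry's fields are private and `from_config` is its only public constructor, every value of the type has passed through resolution. This sketch makes two simplifications that are not the crate's signatures. The config is reduced to channel name → raw webhook URL, and the resolver handles only a value that is entirely `${VAR}` or `${VAR:-default}`. The env lookup is injected as a closure so the example is deterministic.

```rust
mod registry {
    use std::collections::HashMap;

    #[derive(Debug, PartialEq)]
    pub enum ChannelError {
        SecretNotFound(String),
    }

    /// No public fields and no `Default`/`new`: `from_config` is the only way to build one.
    #[derive(Debug)]
    pub struct ChannelRegistry {
        urls: HashMap<String, String>,
    }

    impl ChannelRegistry {
        pub fn from_config(
            raw: &HashMap<String, String>,
            lookup: impl Fn(&str) -> Option<String>,
        ) -> Result<Self, ChannelError> {
            let mut urls = HashMap::new();
            for (name, value) in raw {
                // Resolution is unconditional: there is no branch that skips it.
                urls.insert(name.clone(), resolve(value, &lookup)?);
            }
            Ok(ChannelRegistry { urls })
        }

        pub fn url(&self, name: &str) -> Option<&str> {
            self.urls.get(name).map(String::as_str)
        }
    }

    /// Simplified resolver: only a whole-string `${VAR}` or `${VAR:-default}` is expanded.
    fn resolve(
        value: &str,
        lookup: &impl Fn(&str) -> Option<String>,
    ) -> Result<String, ChannelError> {
        let Some(inner) = value.strip_prefix("${").and_then(|v| v.strip_suffix('}')) else {
            return Ok(value.to_owned()); // plain literal, nothing to resolve
        };
        let (var, default) = match inner.split_once(":-") {
            Some((var, default)) => (var, Some(default)),
            None => (inner, None),
        };
        match (lookup(var), default) {
            (Some(v), _) => Ok(v),
            (None, Some(d)) => Ok(d.to_owned()),
            (None, None) => Err(ChannelError::SecretNotFound(var.to_owned())),
        }
    }
}

use registry::{ChannelError, ChannelRegistry};
```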
### Why Fire-and-Forget With `tokio::spawn`

Notification delivery is a best-effort side-effect, not part of the request/response contract. A Slack outage should not cause a `POST /api/v1/proposals/:id/approve` to return 500. Spawning an independent task decouples delivery latency from API latency; `warn!` logging provides observability without blocking the caller.

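The following is a minimal, dependency-free sketch of the fire-and-forget shape. Several stand-ins are used so it runs without an async runtime:

- `std::thread::spawn` replaces `tokio::spawn`.
- `eprintln!` replaces `tracing::warn!`.
- `deliver` and the `approve_proposal` function are hypothetical.

It shows only that the handler returns before delivery finishes, and that a delivery error is logged rather than propagated.

```rust
use std::sync::mpsc::{self, Sender};
use std::thread;

/// Hypothetical delivery call: fails for the "down" channel, otherwise reports success on `done`.
fn deliver(channel: &str, text: &str, done: &Sender<String>) -> Result<(), String> {
    if channel == "down" {
        return Err(format!("{channel}: 503 Service Unavailable"));
    }
    done.send(format!("{channel}: {text}")).map_err(|e| e.to_string())
}

/// Stand-in for an API handler: the notification task is spawned and detached,
/// so the handler's result never depends on delivery.
fn approve_proposal(id: u32, channels: Vec<String>, done: Sender<String>) -> Result<u16, String> {
    // ... persist the approval here ...
    let text = format!("proposal {id} approved");
    thread::spawn(move || {
        for ch in &channels {
            if let Err(e) = deliver(ch, &text, &done) {
                // Best-effort: log and continue; the caller already has its response.
                eprintln!("WARN notification delivery failed: {e}");
            }
        }
    });
    Ok(200)
}
```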
---

## Implementation

### Crate Structure (`vapora-channels`)

```
vapora-channels/
├── src/
│   ├── lib.rs          — pub re-exports (ChannelRegistry, Message, NotificationChannel)
│   ├── channel.rs      — NotificationChannel trait
│   ├── config.rs       — ChannelsConfig, ChannelConfig, SlackConfig/DiscordConfig/TelegramConfig
│   │                     resolve_secrets() chain + interpolate() with OnceLock<Regex>
│   ├── error.rs        — ChannelError: NotFound, ApiError, SecretNotFound, SerializationError
│   ├── message.rs      — Message { title, body, level: Info|Success|Warning|Error }
│   ├── registry.rs     — ChannelRegistry: name → Arc<dyn NotificationChannel>
│   └── webhooks/
│       ├── slack.rs    — SlackChannel: POST IncomingWebhook JSON
│       ├── discord.rs  — DiscordChannel: POST Webhook embed JSON
│       └── telegram.rs — TelegramChannel: POST bot sendMessage JSON
```

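The data types named in `message.rs` and `error.rs` can be sketched as plain enums and a struct. The variant names come from the crate tree; the variant payloads, the `Display` wording, and the `Message::success`/`Message::warning` constructor signatures (both used by the backend hooks) are assumptions for illustration.

```rust
use std::fmt;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Level {
    Info,
    Success,
    Warning,
    Error,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Message {
    pub title: String,
    pub body: String,
    pub level: Level,
}

impl Message {
    fn with_level(level: Level, title: impl Into<String>, body: impl Into<String>) -> Self {
        Message { title: title.into(), body: body.into(), level }
    }
    pub fn info(title: impl Into<String>, body: impl Into<String>) -> Self {
        Self::with_level(Level::Info, title, body)
    }
    pub fn success(title: impl Into<String>, body: impl Into<String>) -> Self {
        Self::with_level(Level::Success, title, body)
    }
    pub fn warning(title: impl Into<String>, body: impl Into<String>) -> Self {
        Self::with_level(Level::Warning, title, body)
    }
    pub fn error(title: impl Into<String>, body: impl Into<String>) -> Self {
        Self::with_level(Level::Error, title, body)
    }
}

#[derive(Debug)]
pub enum ChannelError {
    /// No channel registered under this name.
    NotFound(String),
    /// The provider answered with a non-success status.
    ApiError { status: u16, body: String },
    /// `${VAR}` had no value and no `:-default`.
    SecretNotFound(String),
    /// The payload could not be serialized.
    SerializationError(String),
}

impl fmt::Display for ChannelError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ChannelError::NotFound(name) => write!(f, "channel not found: {name}"),
            ChannelError::ApiError { status, body } => write!(f, "channel API error {status}: {body}"),
            ChannelError::SecretNotFound(var) => write!(f, "secret not found: {var}"),
            ChannelError::SerializationError(e) => write!(f, "serialization error: {e}"),
        }
    }
}

impl std::error::Error for ChannelError {}
```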
### Secret Resolution

```
interpolate(s: &str) -> Result<String>:
    regex: \$\{([^}:]+)(?::-(.*?))?\}   (compiled once via OnceLock)
    fast-path: if !s.contains("${") { return Ok(s.to_string()) }
    for each capture:
        var_name = capture[1]
        default  = capture[2]   (optional)
        match env::var(var_name):
            Ok(v)  → replace placeholder with v
            Err(_) → if default.is_some(): replace with default
                     else: return Err(SecretNotFound(var_name))
```

`resolve_secrets()` is called unconditionally in `ChannelRegistry::from_config`, the single mandatory call site, so consumers have no way to bypass it.

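Here is a dependency-free sketch of the same substitution loop. It swaps the regex for a hand-written scan: find `${`, then the next `}`, then split on the first `:-`. It also takes the variable lookup as a closure, so it can be tested without touching the process environment. `interpolate_env` shows the `std::env::var` wiring. The function names and the single-variant `ChannelError` are illustrative assumptions. Some edge cases differ from the regex: a `:` without `-`, or an empty name such as `${}`, is treated as a missing variable here rather than left verbatim.

```rust
#[derive(Debug, PartialEq)]
pub enum ChannelError {
    SecretNotFound(String),
}

/// Replace every `${VAR}` / `${VAR:-default}` in `s`, resolving names with `lookup`.
pub fn interpolate(s: &str, lookup: impl Fn(&str) -> Option<String>) -> Result<String, ChannelError> {
    // Fast path: no placeholder at all.
    if !s.contains("${") {
        return Ok(s.to_owned());
    }
    let mut out = String::with_capacity(s.len());
    let mut rest = s;
    while let Some(start) = rest.find("${") {
        out.push_str(&rest[..start]);
        let after = &rest[start + 2..];
        // An unterminated `${` is kept verbatim, like a regex non-match.
        let Some(end) = after.find('}') else {
            out.push_str(&rest[start..]);
            return Ok(out);
        };
        let inner = &after[..end];
        let (var, default) = match inner.split_once(":-") {
            Some((var, default)) => (var, Some(default)),
            None => (inner, None),
        };
        match (lookup(var), default) {
            (Some(value), _) => out.push_str(&value),
            (None, Some(default)) => out.push_str(default),
            (None, None) => return Err(ChannelError::SecretNotFound(var.to_owned())),
        }
        rest = &after[end + 1..];
    }
    out.push_str(rest);
    Ok(out)
}

/// Wiring to the real process environment.
pub fn interpolate_env(s: &str) -> Result<String, ChannelError> {
    interpolate(s, |name| std::env::var(name).ok())
}
```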
### Integration Points

#### `vapora-workflow-engine`

`WorkflowConfig.notifications: WorkflowNotifications` maps four events to channel-name lists:

```toml
[workflows.myflow.notifications]
on_stage_complete = ["team-slack"]
on_stage_failed = ["team-slack", "ops-discord"]
on_completed = ["team-slack"]
on_cancelled = ["ops-discord"]
```

`WorkflowOrchestrator` holds `Option<Arc<ChannelRegistry>>` and calls `notify_stage_complete`, `notify_stage_failed`, `notify_completed`, and `notify_cancelled`, each of which spawns a `dispatch_notifications` task.

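The four-event mapping can be sketched as a struct with one list per event, plus a lookup that each `notify_*` method could call. Field names mirror the TOML keys; the `WorkflowEvent` enum and `channels_for` helper are hypothetical. In this sketch, an omitted key defaults to an empty list, so an event with no configured channels dispatches nothing.

```rust
/// Mirrors `[workflows.<name>.notifications]`; an omitted key means "no channels".
#[derive(Debug, Default, Clone)]
pub struct WorkflowNotifications {
    pub on_stage_complete: Vec<String>,
    pub on_stage_failed: Vec<String>,
    pub on_completed: Vec<String>,
    pub on_cancelled: Vec<String>,
}

/// Hypothetical enum naming the four lifecycle events.
#[derive(Debug, Clone, Copy)]
pub enum WorkflowEvent {
    StageComplete,
    StageFailed,
    Completed,
    Cancelled,
}

impl WorkflowNotifications {
    /// Channel names to notify for `event`; an empty slice means nothing is dispatched.
    pub fn channels_for(&self, event: WorkflowEvent) -> &[String] {
        match event {
            WorkflowEvent::StageComplete => self.on_stage_complete.as_slice(),
            WorkflowEvent::StageFailed => self.on_stage_failed.as_slice(),
            WorkflowEvent::Completed => self.on_completed.as_slice(),
            WorkflowEvent::Cancelled => self.on_cancelled.as_slice(),
        }
    }
}
```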
#### `vapora-backend`

The backend reads `Config.channels: HashMap<String, ChannelConfig>` and `Config.notifications: NotificationConfig`:

```toml
[channels.team-slack]
type = "slack"
webhook_url = "${SLACK_WEBHOOK_URL}"

[notifications]
on_task_done = ["team-slack"]
on_proposal_approved = ["team-slack", "ops-discord"]
on_proposal_rejected = ["ops-discord"]
```

`AppState` gains `channel_registry: Option<Arc<ChannelRegistry>>` and `notification_config: Arc<NotificationConfig>`. Notification hooks are added to three existing handlers:

- `update_task_status`: fires `Message::success` when the status becomes `TaskStatus::Done`
- `approve_proposal`: fires `Message::success`
- `reject_proposal`: fires `Message::warning`

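The three hooks reduce to a small mapping from a backend event to a message level and a channel list. This sketch makes several assumptions:

- `NotificationConfig` has fields that mirror the `[notifications]` keys.
- `BackendEvent` and `route` are hypothetical names.
- The `TaskStatus` variants other than `Done` are placeholders.

The status check encodes the rule that only a transition to `Done` notifies.

```rust
#[derive(Debug, Default)]
pub struct NotificationConfig {
    pub on_task_done: Vec<String>,
    pub on_proposal_approved: Vec<String>,
    pub on_proposal_rejected: Vec<String>,
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum TaskStatus {
    Todo,
    InProgress,
    Done,
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Level {
    Success,
    Warning,
}

/// Hypothetical: the events reported by the three hooked handlers.
pub enum BackendEvent {
    TaskStatusChanged(TaskStatus),
    ProposalApproved,
    ProposalRejected,
}

/// Returns the message level and target channels, or `None` when nothing should fire.
pub fn route<'a>(cfg: &'a NotificationConfig, event: &BackendEvent) -> Option<(Level, &'a [String])> {
    let (level, channels) = match event {
        BackendEvent::TaskStatusChanged(TaskStatus::Done) => (Level::Success, &cfg.on_task_done),
        BackendEvent::TaskStatusChanged(_) => return None, // only Done notifies
        BackendEvent::ProposalApproved => (Level::Success, &cfg.on_proposal_approved),
        BackendEvent::ProposalRejected => (Level::Warning, &cfg.on_proposal_rejected),
    };
    // An empty list means the event is not routed anywhere.
    (!channels.is_empty()).then_some((level, channels.as_slice()))
}
```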
#### New REST Endpoints

| Method | Path | Description |
|--------|------|-------------|
| `GET` | `/api/v1/channels` | List registered channel names |
| `POST` | `/api/v1/channels/:name/test` | Send a connectivity test message; returns 200 OK on success, 404 for an unknown channel name, 502 if delivery to the provider fails |

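The status codes of the test endpoint follow from the error type. An unknown name is the client's mistake (404), while a failure talking to the provider is an upstream problem (502 Bad Gateway). The sketch below assumes a `ChannelError` shaped like the variants in `error.rs` and returns a bare `u16` rather than using any web framework. `send_test` and its channel names are hypothetical stand-ins, and grouping `SecretNotFound` and `SerializationError` under 502 is an assumption.

```rust
#[derive(Debug)]
pub enum ChannelError {
    NotFound(String),
    ApiError(String),
    SecretNotFound(String),
    SerializationError(String),
}

/// Hypothetical registry lookup + send for `POST /api/v1/channels/:name/test`.
fn send_test(name: &str) -> Result<(), ChannelError> {
    match name {
        "team-slack" => Ok(()),
        "ops-discord" => Err(ChannelError::ApiError("503 Service Unavailable".into())),
        other => Err(ChannelError::NotFound(other.to_owned())),
    }
}

/// Maps the delivery outcome to the endpoint's status codes.
pub fn test_channel_status(name: &str) -> u16 {
    match send_test(name) {
        Ok(()) => 200,
        Err(ChannelError::NotFound(_)) => 404,
        // Anything else failed between the backend and the provider: bad gateway.
        Err(ChannelError::ApiError(_))
        | Err(ChannelError::SecretNotFound(_))
        | Err(ChannelError::SerializationError(_)) => 502,
    }
}
```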
### Testability

`dispatch_notifications` is extracted as a `pub(crate) async fn` that takes `Option<Arc<ChannelRegistry>>` directly, so it can be tested without a database or a fully constructed `AppState`. Five inline tests in `state.rs` use `RecordingChannel` (captures messages) and `FailingChannel` (always returns a 503 error) to verify:

1. No-op when the registry is `None`
2. Single-channel delivery
3. Multi-channel broadcast
4. Resilience: delivery continues after one channel fails
5. An unknown channel name is `warn!`-logged while the other channels still receive the message

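The scenarios above can be reproduced in a self-contained form. This sketch makes several assumptions:

- The trait is dyn-compatible and returns a boxed future.
- `Message` is reduced to a string.
- The registry is a plain `HashMap`.
- `eprintln!` stands in for `warn!`.
- A tiny thread-parking `block_on` replaces `#[tokio::test]`.

The real `dispatch_notifications` signature and the internals of `RecordingChannel`/`FailingChannel` may differ.

```rust
use std::collections::HashMap;
use std::future::Future;
use std::pin::Pin;
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

type BoxFuture<'a, T> = Pin<Box<dyn Future<Output = T> + Send + 'a>>;

#[derive(Debug, Clone, PartialEq)]
pub struct Message(pub String);

pub trait NotificationChannel: Send + Sync {
    fn send<'a>(&'a self, msg: &'a Message) -> BoxFuture<'a, Result<(), String>>;
}

pub type ChannelRegistry = HashMap<String, Arc<dyn NotificationChannel>>;

/// Captures every message it receives.
#[derive(Default)]
pub struct RecordingChannel(Mutex<Vec<Message>>);

impl NotificationChannel for RecordingChannel {
    fn send<'a>(&'a self, msg: &'a Message) -> BoxFuture<'a, Result<(), String>> {
        Box::pin(async move {
            self.0.lock().unwrap().push(msg.clone());
            Ok::<(), String>(())
        })
    }
}

/// Always fails, like a provider returning 503.
pub struct FailingChannel;

impl NotificationChannel for FailingChannel {
    fn send<'a>(&'a self, _msg: &'a Message) -> BoxFuture<'a, Result<(), String>> {
        Box::pin(async { Err::<(), String>("503 Service Unavailable".to_string()) })
    }
}

/// Sends `msg` to each named channel; failures and unknown names are logged, never returned.
pub async fn dispatch_notifications(registry: Option<Arc<ChannelRegistry>>, names: &[String], msg: Message) {
    let Some(registry) = registry else { return }; // no registry configured: no-op
    for name in names {
        match registry.get(name) {
            Some(channel) => {
                if let Err(e) = channel.send(&msg).await {
                    eprintln!("WARN channel {name} failed: {e}");
                }
            }
            None => eprintln!("WARN unknown channel {name}"),
        }
    }
}

/// Minimal executor for the example: parks the thread until the future is woken.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

pub fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = Box::pin(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(out) => return out,
            Poll::Pending => thread::park(),
        }
    }
}
```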
---

## Consequences

### Positive

- Operators get real-time Slack/Discord/Telegram alerts on task completion, proposal decisions, and workflow lifecycle events.
- Adding a new channel type requires implementing one trait method and adding one TOML variant; no changes to routing or dispatch code.
- Secret resolution failures surface immediately at startup (provided `ChannelRegistry::from_config` is called at boot), not silently at first delivery.
- Zero additional infrastructure: webhooks are outbound-only HTTP POSTs.

### Negative / Trade-offs

- Delivery is best-effort (fire-and-forget). A channel that is consistently down produces `warn!` logs but no alert escalation; consumers needing guaranteed delivery must implement their own retry loop or use a message queue.
- The `${VAR}` interpolation tests set and unset environment variables with `std::env::set_var`/`remove_var`, which must be wrapped in `unsafe` blocks under the Rust 2024 edition. Because the environment is process-global, tests running in parallel threads can produce flaky results unless they are isolated, e.g. with `#[serial_test::serial]`.
- No per-channel rate limiting: a workflow that fires 1,000 stage-complete events will produce 1,000 Slack messages. Operators must configure `notifications` lists deliberately.

### Supersedes / Specializes

- Builds on the `SecretumVault` pattern's (ADR-0011) philosophy of never storing secrets as plain strings, and specializes it to webhook tokens in config files.
- Parallels the `vapora-a2a-client` retry pattern (ADR-0030): both handle external HTTP delivery, but channels are fire-and-forget while A2A requires a confirmed response.