Compare commits
2 Commits: b9e2cee9f7...027b8f2836

| Author | SHA1 | Date |
|---|---|---|
| | 027b8f2836 | |
| | bb55c80d2b | |
.gitignore (vendored), 1 change
@@ -67,3 +67,4 @@ vendordiff.patch
 # Generated SBOM files
 SBOM.*.json
 *.sbom.json
+.claude/settings.local.json
CHANGELOG.md, 77 changes
@@ -7,6 +7,83 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ## [Unreleased]
 
+### Added - Webhook Notification Channels (`vapora-channels`)
+
+#### `vapora-channels` — new crate
+
+- `NotificationChannel` trait: single `async fn send(&Message) -> Result<()>` — no vendor SDK dependency
+- Three webhook implementations: `SlackChannel` (Incoming Webhook), `DiscordChannel` (Webhook embed), `TelegramChannel` (Bot API `sendMessage`)
+- `ChannelRegistry`: name-keyed routing hub; `from_config(HashMap<String, ChannelConfig>)` resolves secrets at construction time
+- `Message { title, body, level }` — four constructors: `info`, `success`, `warning`, `error`
+- **Secret resolution built-in**: `${VAR}` / `${VAR:-default}` interpolation via `OnceLock<Regex>` in `config.rs`; `ChannelError::SecretNotFound` if the env var is absent and no default is given — callers cannot bypass resolution
+- `ChannelError`: `NotFound`, `ApiError { channel, status, body }`, `SecretNotFound`, `SerializationError`
+- 7 unit tests for `interpolate()`: plain string (no-op fast path), single var, default fallback, missing-var error, nested vars, whitespace, multiple vars in one string
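The built-in secret resolution can be illustrated with a minimal, std-only sketch. The real crate does this in `config.rs` with a cached `OnceLock<Regex>`; the hand-rolled scanner and plain `String` error below are simplified stand-ins for that regex and for `ChannelError::SecretNotFound`.

```rust
use std::env;

/// Replace every `${VAR}` or `${VAR:-default}` occurrence with the env var's
/// value, the default, or an error when neither exists (simplified sketch).
fn interpolate(input: &str) -> Result<String, String> {
    // Fast path: nothing to resolve (mirrors the crate's no-op case).
    if !input.contains("${") {
        return Ok(input.to_string());
    }
    let mut out = String::new();
    let mut rest = input;
    while let Some(start) = rest.find("${") {
        out.push_str(&rest[..start]);
        let after = &rest[start + 2..];
        let Some(end) = after.find('}') else {
            // No closing brace: keep the remainder literally.
            out.push_str(&rest[start..]);
            return Ok(out);
        };
        // Split `VAR:-default` into a name and an optional default.
        let (name, default) = match after[..end].split_once(":-") {
            Some((n, d)) => (n.trim(), Some(d)),
            None => (after[..end].trim(), None),
        };
        match env::var(name) {
            Ok(v) => out.push_str(&v),
            Err(_) => match default {
                Some(d) => out.push_str(d),
                // Stand-in for `ChannelError::SecretNotFound`.
                None => return Err(format!("secret not found: {name}")),
            },
        }
        rest = &after[end + 1..];
    }
    out.push_str(rest);
    Ok(out)
}

fn main() {
    assert_eq!(interpolate("plain webhook url").unwrap(), "plain webhook url");
    assert_eq!(interpolate("${VAPORA_DOC_UNSET:-fallback}").unwrap(), "fallback");
    assert_eq!(
        interpolate("a=${VAPORA_DOC_A:-1} b=${VAPORA_DOC_B:-2}").unwrap(),
        "a=1 b=2"
    );
    assert!(interpolate("${VAPORA_DOC_UNSET}").is_err());
    println!("ok");
}
```

Because resolution happens inside registry construction, an unresolved `${VAR}` fails the whole registry rather than leaking a literal placeholder to the webhook URL.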
+
+#### `vapora-workflow-engine` — notification hooks
+
+- `WorkflowNotifications` struct in `config.rs`: `on_stage_complete`, `on_stage_failed`, `on_completed`, `on_cancelled` — each a `Vec<String>` of channel names
+- `WorkflowConfig.notifications: WorkflowNotifications` (default: empty, no regression)
+- `WorkflowOrchestrator` gains `Option<Arc<ChannelRegistry>>`; four `notify_*` methods spawn `dispatch_notifications`
+- 6 new tests in `tests/notification_config.rs`: config parsing, all four event hooks, empty-targets no-op
+
+#### `vapora-backend` — event hooks and REST endpoints
+
+- `Config.channels: HashMap<String, ChannelConfig>` and `Config.notifications: NotificationConfig` (TOML config)
+- `NotificationConfig { on_task_done, on_proposal_approved, on_proposal_rejected }` — per-event channel-name lists
+- `AppState` gains `channel_registry: Option<Arc<ChannelRegistry>>` and `notification_config: Arc<NotificationConfig>`
+- `AppState::notify(&[String], Message)` — fire-and-forget; `tokio::spawn(dispatch_notifications(...))`
+- `pub(crate) async fn dispatch_notifications(Option<Arc<ChannelRegistry>>, Vec<String>, Message)` — extracted for testability without a DB
+- Notification hooks added to three existing handlers:
+  - `update_task_status` — `Message::success` when `TaskStatus::Done`
+  - `approve_proposal` — `Message::success`
+  - `reject_proposal` — `Message::warning`
+- New endpoints: `GET /api/v1/channels` (list names), `POST /api/v1/channels/:name/test` (connectivity check)
+- 5 unit tests in `state.rs`: `RecordingChannel` + `FailingChannel` test doubles; dispatch no-op, single delivery, multi-channel, resilience after failure, warn on unknown channel
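Taken together, the config surface above might be wired like this (a hypothetical sketch: the keys inside `[channels.*]` such as `type`, `webhook_url`, and `bot_token` are assumptions, while `channels`, `notifications`, and the three `on_*` lists come from the entries above):

```toml
# Hypothetical backend TOML sketch — channel names and secret names are examples.
[channels.team-slack]
type = "slack"
webhook_url = "${SLACK_WEBHOOK_URL}"          # resolved at registry construction

[channels.ops-telegram]
type = "telegram"
bot_token = "${TELEGRAM_BOT_TOKEN}"
chat_id = "${TELEGRAM_CHAT_ID:-0}"

[notifications]
on_task_done = ["team-slack"]
on_proposal_approved = ["team-slack", "ops-telegram"]
on_proposal_rejected = ["ops-telegram"]
```

Each `on_*` list routes that event to the named channels; an empty list makes the corresponding hook a no-op.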
+
+#### Documentation
+
+- **ADR-0035**: design rationale for trait-based channels, built-in secret resolution, and fire-and-forget delivery
+
+---
+
+### Added - Autonomous Scheduling: Timezone Support and Distributed Fire-Lock
+
+#### `vapora-workflow-engine` — scheduling hardening
+
+- **Timezone-aware cron evaluation** (`chrono-tz = "0.10"`):
+  - `ScheduledWorkflow.timezone: Option<String>` — IANA identifier stored per schedule
+  - `compute_next_fire_at_tz(expr, tz)` / `compute_next_fire_after_tz(expr, after, tz)` — generic over `chrono_tz::Tz`; UTC fallback when `tz = None`
+  - `validate_timezone(tz)` — compile-time exhaustive IANA enum, rejects unknown identifiers
+  - `compute_fire_times_tz` in `scheduler.rs` — catch-up and normal firing are both timezone-aware
+  - Config-load validation: `[workflows.schedule] timezone = "..."` validated at startup (fail fast)
+- **Distributed fire-lock** (SurrealDB document-level atomic CAS):
+  - `scheduled_workflows` gains `locked_by: option<string>` and `locked_at: option<datetime>` (migration 011)
+  - `ScheduleStore::try_acquire_fire_lock(id, instance_id, now)` — conditional `UPDATE ... WHERE locked_by IS NONE OR locked_at < $expiry`; returns `true` only if the update succeeded (non-empty result = lock acquired)
+  - `ScheduleStore::release_fire_lock(id, instance_id)` — `WHERE locked_by = $instance_id` guard prevents a stale release after TTL expiry
+  - `WorkflowScheduler.instance_id: String` — UUID generated at startup, identifies the lock owner
+  - 120-second TTL: a crashed instance's lock auto-expires within two scheduler ticks
+  - Lock acquired before `fire_with_lock`, released in a `finally`-style block afterwards (warn on release failure, TTL fallback)
+- New tests: `test_validate_timezone_valid`, `test_validate_timezone_invalid`, `test_compute_next_fire_at_tz_utc`, `test_compute_next_fire_at_tz_named`, `test_compute_next_fire_at_tz_invalid_tz_fallback`, `test_compute_fires_with_catchup_named_tz`, `test_instance_id_is_unique`
+- Test count: 48 (was 41)
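The fire-lock's acquire/release semantics can be modeled in memory with a std-only sketch. In the real implementation each method is a single atomic SurrealDB `UPDATE`; the `LockTable` below is an illustrative stand-in, not the actual store API.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// In-memory model of the `locked_by` / `locked_at` columns on
/// `scheduled_workflows` (illustrative only).
struct LockTable {
    ttl: Duration,
    locks: HashMap<String, (String, Instant)>, // schedule id -> (locked_by, locked_at)
}

impl LockTable {
    fn new(ttl: Duration) -> Self {
        Self { ttl, locks: HashMap::new() }
    }

    /// Mirrors `UPDATE ... WHERE locked_by IS NONE OR locked_at < $expiry`:
    /// succeeds when the lock is free or the holder's lease has expired.
    fn try_acquire(&mut self, schedule: &str, instance: &str, now: Instant) -> bool {
        let held_fresh = self
            .locks
            .get(schedule)
            .is_some_and(|(_, at)| now.duration_since(*at) < self.ttl);
        if held_fresh {
            return false; // another instance holds an unexpired lease
        }
        self.locks
            .insert(schedule.to_string(), (instance.to_string(), now));
        true
    }

    /// Mirrors the `WHERE locked_by = $instance_id` guard: a stale holder
    /// cannot release a lock that another instance has since re-acquired.
    fn release(&mut self, schedule: &str, instance: &str) -> bool {
        let held_by_caller = self
            .locks
            .get(schedule)
            .is_some_and(|(by, _)| by.as_str() == instance);
        if held_by_caller {
            self.locks.remove(schedule);
        }
        held_by_caller
    }
}

fn main() {
    let mut t = LockTable::new(Duration::from_secs(120));
    let t0 = Instant::now();
    assert!(t.try_acquire("sched-1", "inst-a", t0));  // first instance wins
    assert!(!t.try_acquire("sched-1", "inst-b", t0)); // double-fire prevented
    // After the TTL, a crashed holder's lock auto-expires:
    assert!(t.try_acquire("sched-1", "inst-b", t0 + Duration::from_secs(121)));
    assert!(!t.release("sched-1", "inst-a")); // stale release rejected
    assert!(t.release("sched-1", "inst-b"));
    println!("ok");
}
```

The same two guards, expiry on acquire and ownership on release, are what make the SurrealDB version safe without an external lock service.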
+
+#### `vapora-backend` — schedule REST API surface
+
+- `ScheduleResponse`, `PutScheduleRequest`, `PatchScheduleRequest` gain `timezone: Option<String>`
+- `validate_tz()` helper validates at the API boundary → `400 InvalidInput` on an unknown identifier
+- `put_schedule` and `patch_schedule` use `compute_next_fire_at_tz` / `compute_next_fire_after_tz`
+- `fire_schedule` uses `compute_next_fire_after_tz` with the schedule's stored timezone
+
+#### Migrations
+
+- **`migrations/011_schedule_tz_lock.surql`**: `DEFINE FIELD timezone`, `locked_by`, `locked_at` on `scheduled_workflows`
+
+#### Documentation
+
+- **ADR-0034**: design rationale for the `chrono-tz` selection and the SurrealDB conditional UPDATE lock
+- **`docs/features/workflow-orchestrator.md`**: Autonomous Scheduling section with TOML config, REST API table, timezone/distributed-lock explanations, Prometheus metrics
+
+---
+
 ### Added - Workflow Engine Hardening (Persistence · Saga · Cedar)
 
 #### `vapora-workflow-engine` — three new hardening layers
Cargo.lock (generated), 60 changes
@@ -1679,6 +1679,16 @@ dependencies = [
 "phf 0.11.3",
 ]
 
+[[package]]
+name = "chrono-tz"
+version = "0.10.4"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "a6139a8597ed92cf816dfb33f5dd6cf0bb93a6adc938f11039f371bc5bcd26c3"
+dependencies = [
+ "chrono",
+ "phf 0.12.1",
+]
+
 [[package]]
 name = "chrono-tz-build"
 version = "0.3.0"
@@ -2236,6 +2246,17 @@ version = "1.2.0"
 source = "registry+https://github.com/rust-lang/crates.io-index"
 checksum = "790eea4361631c5e7d22598ecd5723ff611904e3344ce8720784c93e3d83d40b"
+
+[[package]]
+name = "cron"
+version = "0.12.1"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "6f8c3e73077b4b4a6ab1ea5047c37c57aee77657bc8ecd6f29b0af082d0b0c07"
+dependencies = [
+ "chrono",
+ "nom 7.1.3",
+ "once_cell",
+]
 
 [[package]]
 name = "crossbeam"
 version = "0.8.4"
@@ -7051,6 +7072,15 @@ dependencies = [
 "phf_shared 0.11.3",
 ]
+
+[[package]]
+name = "phf"
+version = "0.12.1"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "913273894cec178f401a31ec4b656318d95473527be05c0752cc41cdc32be8b7"
+dependencies = [
+ "phf_shared 0.12.1",
+]
 
 [[package]]
 name = "phf"
 version = "0.13.1"
@@ -7130,6 +7160,15 @@ dependencies = [
 "unicase",
 ]
+
+[[package]]
+name = "phf_shared"
+version = "0.12.1"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "06005508882fb681fd97892ecff4b7fd0fee13ef1aa569f8695dae7ab9099981"
+dependencies = [
+ "siphasher",
+]
 
 [[package]]
 name = "phf_shared"
 version = "0.13.1"
@@ -10937,7 +10976,7 @@ source = "registry+https://github.com/rust-lang/crates.io-index"
 checksum = "e8004bca281f2d32df3bacd59bc67b312cb4c70cea46cbd79dbe8ac5ed206722"
 dependencies = [
 "chrono",
-"chrono-tz",
+"chrono-tz 0.9.0",
 "globwalk",
 "humansize",
 "lazy_static",
@@ -12380,6 +12419,7 @@ dependencies = [
 "tracing-subscriber",
 "uuid",
 "vapora-agents",
+"vapora-channels",
 "vapora-knowledge-graph",
 "vapora-llm-router",
 "vapora-rlm",
@@ -12390,6 +12430,21 @@ dependencies = [
 "wiremock",
 ]
+
+[[package]]
+name = "vapora-channels"
+version = "1.2.0"
+dependencies = [
+ "async-trait",
+ "regex",
+ "reqwest 0.13.1",
+ "serde",
+ "serde_json",
+ "thiserror 2.0.18",
+ "tokio",
+ "tracing",
+ "wiremock",
+]
 
 [[package]]
 name = "vapora-cli"
 version = "1.2.0"
@@ -12643,6 +12698,8 @@ dependencies = [
 "async-trait",
 "cedar-policy 4.9.0",
 "chrono",
+"chrono-tz 0.10.4",
+"cron",
 "dashmap 6.1.0",
 "futures",
 "mockall",
@@ -12657,6 +12714,7 @@ dependencies = [
 "tracing",
 "uuid",
 "vapora-agents",
+"vapora-channels",
 "vapora-knowledge-graph",
 "vapora-shared",
 "vapora-swarm",
@@ -3,6 +3,7 @@
 resolver = "2"
 
 members = [
+    "crates/vapora-channels",
     "crates/vapora-backend",
     "crates/vapora-frontend",
     "crates/vapora-leptos-ui",
@@ -36,6 +37,7 @@ categories = ["development-tools", "web-programming"]
 
 [workspace.dependencies]
 # Vapora internal crates
+vapora-channels = { path = "crates/vapora-channels" }
 vapora-shared = { path = "crates/vapora-shared" }
 vapora-leptos-ui = { path = "crates/vapora-leptos-ui" }
 vapora-agents = { path = "crates/vapora-agents" }
File diff suppressed because one or more lines are too long
@@ -516,8 +516,8 @@
 
 <div class="container">
 <header>
-<span class="status-badge" data-en="✅ v1.2.0 | 354 Tests | 100% Pass Rate" data-es="✅ v1.2.0 | 354 Tests | 100% Éxito"
+<span class="status-badge" data-en="✅ v1.2.0 | 372 Tests | 100% Pass Rate" data-es="✅ v1.2.0 | 372 Tests | 100% Éxito"
->✅ v1.2.0 | 354 Tests | 100% Pass Rate</span
+>✅ v1.2.0 | 372 Tests | 100% Pass Rate</span
 >
 <div class="logo-container">
 <img id="logo-dark" src="/vapora.svg" alt="Vapora - Development Orchestration" style="display: block;" />
@@ -785,6 +785,42 @@
 161 backend tests + K8s manifests with Kustomize overlays. Health checks, Prometheus metrics (/metrics endpoint), StatefulSets with anti-affinity. Local Docker Compose for development. Zero vendor lock-in.
 </p>
 </div>
+<div class="feature-box" style="border-left-color: #f97316">
+<div class="feature-icon">⏰</div>
+<h3
+class="feature-title"
+style="color: #f97316"
+data-en="Autonomous Scheduling"
+data-es="Scheduling Autónomo"
+>
+Autonomous Scheduling
+</h3>
+<p
+class="feature-text"
+data-en="Cron-triggered workflow execution with IANA timezone support via chrono-tz. Distributed fire-lock using SurrealDB conditional UPDATE prevents double-fires across multi-instance deployments — no external lock service required. 48 tests."
+data-es="Ejecución de workflows disparada por cron con soporte de timezone IANA via chrono-tz. Fire-lock distribuido usando UPDATE condicional de SurrealDB previene doble disparo en despliegues multi-instancia — sin servicio de lock externo. 48 tests."
+>
+Cron-triggered workflow execution with IANA timezone support via chrono-tz. Distributed fire-lock using SurrealDB conditional UPDATE prevents double-fires across multi-instance deployments — no external lock service required. 48 tests.
+</p>
+</div>
+<div class="feature-box" style="border-left-color: #d946ef">
+<div class="feature-icon">🔔</div>
+<h3
+class="feature-title"
+style="color: #d946ef"
+data-en="Webhook Notifications"
+data-es="Notificaciones Webhook"
+>
+Webhook Notifications
+</h3>
+<p
+class="feature-text"
+data-en="Real-time alerts to Slack, Discord, and Telegram — no vendor SDKs. ${VAR} secret resolution is built into ChannelRegistry construction; tokens never reach the HTTP layer unresolved. Fire-and-forget hooks on task completion, proposal approval/rejection, and workflow lifecycle events."
+data-es="Alertas en tiempo real a Slack, Discord y Telegram — sin SDKs de vendor. Resolución de secretos ${VAR} integrada en la construcción de ChannelRegistry; los tokens nunca llegan sin resolver a la capa HTTP. Hooks fire-and-forget en completado de tareas, aprobación/rechazo de propuestas y eventos del ciclo de vida de workflows."
+>
+Real-time alerts to Slack, Discord, and Telegram — no vendor SDKs. ${VAR} secret resolution is built into ChannelRegistry construction; tokens never reach the HTTP layer unresolved. Fire-and-forget hooks on task completion, proposal approval/rejection, and workflow lifecycle events.
+</p>
+</div>
 </div>
 </section>
 
@@ -795,7 +831,7 @@
 >
 </h2>
 <div class="tech-stack">
-<span class="tech-badge">Rust (17 crates)</span>
+<span class="tech-badge">Rust (18 crates)</span>
 <span class="tech-badge">Axum REST API</span>
 <span class="tech-badge">SurrealDB</span>
 <span class="tech-badge">NATS JetStream</span>
@@ -806,6 +842,8 @@
 <span class="tech-badge">RLM (Hybrid Search)</span>
 <span class="tech-badge">A2A Protocol</span>
 <span class="tech-badge">MCP Server</span>
+<span class="tech-badge">chrono-tz (Cron)</span>
+<span class="tech-badge">Webhook Channels</span>
 </div>
 </section>
 
@@ -106,7 +106,7 @@
 <!-- TITLE -->
 <!-- ═══════════════════════════════════════ -->
 <text x="700" y="42" font-family="'JetBrains Mono', monospace" font-size="22" font-weight="800" fill="url(#grad-main)" letter-spacing="6" text-anchor="middle" filter="url(#glow-sm)">VAPORA ARCHITECTURE</text>
-<text x="700" y="62" font-family="'Inter', sans-serif" font-size="11" fill="#a855f7" opacity="0.6" letter-spacing="3" text-anchor="middle">18 CRATES · 354 TESTS · 100% RUST</text>
+<text x="700" y="62" font-family="'Inter', sans-serif" font-size="11" fill="#a855f7" opacity="0.6" letter-spacing="3" text-anchor="middle">18 CRATES · 372 TESTS · 100% RUST</text>
 
 <!-- Layer labels (left side) -->
 <text x="30" y="115" font-family="'JetBrains Mono', monospace" font-size="10" fill="#22d3ee" opacity="0.7" letter-spacing="2" transform="rotate(-90 30 115)">PRESENTATION</text>

Before: 42 KiB | After: 42 KiB
@@ -25,6 +25,7 @@ vapora-swarm = { workspace = true }
 vapora-tracking = { path = "../vapora-tracking" }
 vapora-knowledge-graph = { path = "../vapora-knowledge-graph" }
 vapora-workflow-engine = { workspace = true }
+vapora-channels = { workspace = true }
 vapora-rlm = { path = "../vapora-rlm" }
 
 # Secrets management
crates/vapora-backend/src/api/channels.rs (new file, 62 lines)
@@ -0,0 +1,62 @@
use axum::{
    extract::{Path, State},
    http::StatusCode,
    response::IntoResponse,
    Json,
};
use serde::Serialize;
use vapora_channels::{ChannelError, Message};
use vapora_shared::VaporaError;

use crate::api::state::AppState;
use crate::api::ApiResult;

#[derive(Serialize)]
struct ChannelListResponse {
    channels: Vec<String>,
}

/// List all registered notification channels.
///
/// GET /api/v1/channels
pub async fn list_channels(State(state): State<AppState>) -> impl IntoResponse {
    let names = match &state.channel_registry {
        Some(r) => {
            let mut names: Vec<String> = r.channel_names().into_iter().map(str::to_owned).collect();
            names.sort_unstable();
            names
        }
        None => vec![],
    };
    Json(ChannelListResponse { channels: names })
}

/// Send a test message to a specific notification channel.
///
/// POST /api/v1/channels/:name/test
///
/// Returns 200 on successful delivery, 404 if the channel is unknown or not
/// configured, 502 if delivery fails at the remote platform.
pub async fn test_channel(
    State(state): State<AppState>,
    Path(name): Path<String>,
) -> ApiResult<impl IntoResponse> {
    let registry = state.channel_registry.as_ref().ok_or_else(|| {
        VaporaError::NotFound(format!(
            "Channel '{}' not found — no channels configured",
            name
        ))
    })?;

    let msg = Message::info(
        "Test notification",
        format!("Connectivity test from VAPORA backend for channel '{name}'"),
    );

    registry.send(&name, msg).await.map_err(|e| match e {
        ChannelError::NotFound(_) => VaporaError::NotFound(e.to_string()),
        other => VaporaError::InternalError(other.to_string()),
    })?;

    Ok(StatusCode::OK)
}
@@ -3,6 +3,7 @@
 pub mod agents;
 pub mod analytics;
 pub mod analytics_metrics;
+pub mod channels;
 pub mod error;
 pub mod health;
 pub mod metrics;
@@ -12,6 +13,7 @@ pub mod proposals;
 pub mod provider_analytics;
 pub mod provider_metrics;
 pub mod rlm;
+pub mod schedules;
 pub mod state;
 pub mod swarm;
 pub mod tasks;
@@ -7,6 +7,7 @@ use axum::{
     Json,
 };
 use serde::Deserialize;
+use vapora_channels::Message;
 use vapora_shared::models::{Proposal, ProposalReview, ProposalStatus, RiskLevel};
 
 use crate::api::state::AppState;
@@ -186,6 +187,12 @@ pub async fn approve_proposal(
         .approve_proposal(&id, tenant_id)
         .await?;
 
+    let msg = Message::success(
+        "Proposal approved",
+        format!("'{}' has been approved", proposal.title),
+    );
+    state.notify(&state.notification_config.clone().on_proposal_approved, msg);
+
     Ok(Json(proposal))
 }
 
@@ -203,6 +210,12 @@ pub async fn reject_proposal(
         .reject_proposal(&id, tenant_id)
         .await?;
 
+    let msg = Message::warning(
+        "Proposal rejected",
+        format!("'{}' has been rejected", proposal.title),
+    );
+    state.notify(&state.notification_config.clone().on_proposal_rejected, msg);
+
     Ok(Json(proposal))
 }
 
crates/vapora-backend/src/api/schedules.rs (new file, 451 lines)
@@ -0,0 +1,451 @@
use std::sync::Arc;

use axum::{
    extract::{Path, State},
    http::StatusCode,
    Json,
};
use chrono::Utc;
use serde::{Deserialize, Serialize};
use tracing::{error, info, warn};
use vapora_shared::VaporaError;
use vapora_workflow_engine::{
    compute_next_fire_after_tz, compute_next_fire_at_tz, validate_cron_expression,
    validate_timezone, RunStatus, ScheduleRun, ScheduleStore, ScheduledWorkflow,
};

use crate::api::error::ApiError;
use crate::api::state::AppState;

// ─── Response types ──────────────────────────────────────────────────────────

#[derive(Debug, Serialize)]
pub struct ScheduleResponse {
    pub id: String,
    pub template_name: String,
    pub cron_expression: String,
    pub initial_context: serde_json::Value,
    pub enabled: bool,
    pub allow_concurrent: bool,
    pub catch_up: bool,
    pub timezone: Option<String>,
    pub last_fired_at: Option<String>,
    pub next_fire_at: Option<String>,
    pub runs_count: u64,
    pub created_at: String,
    pub updated_at: String,
}

impl From<ScheduledWorkflow> for ScheduleResponse {
    fn from(s: ScheduledWorkflow) -> Self {
        Self {
            id: s.id,
            template_name: s.template_name,
            cron_expression: s.cron_expression,
            initial_context: s.initial_context,
            enabled: s.enabled,
            allow_concurrent: s.allow_concurrent,
            catch_up: s.catch_up,
            timezone: s.timezone,
            last_fired_at: s.last_fired_at.map(|t| t.to_rfc3339()),
            next_fire_at: s.next_fire_at.map(|t| t.to_rfc3339()),
            runs_count: s.runs_count,
            created_at: s.created_at.to_rfc3339(),
            updated_at: s.updated_at.to_rfc3339(),
        }
    }
}

#[derive(Debug, Serialize)]
pub struct ScheduleRunResponse {
    pub id: String,
    pub schedule_id: String,
    pub workflow_instance_id: Option<String>,
    pub fired_at: String,
    pub status: String,
    pub notes: Option<String>,
}

impl From<ScheduleRun> for ScheduleRunResponse {
    fn from(r: ScheduleRun) -> Self {
        Self {
            id: r.id,
            schedule_id: r.schedule_id,
            workflow_instance_id: r.workflow_instance_id,
            fired_at: r.fired_at.to_rfc3339(),
            status: match r.status {
                RunStatus::Fired => "fired".to_string(),
                RunStatus::Skipped => "skipped".to_string(),
                RunStatus::Failed => "failed".to_string(),
            },
            notes: r.notes,
        }
    }
}

#[derive(Debug, Serialize)]
pub struct ScheduleListResponse {
    pub schedules: Vec<ScheduleResponse>,
    pub total: usize,
}

#[derive(Debug, Serialize)]
pub struct RunListResponse {
    pub runs: Vec<ScheduleRunResponse>,
    pub total: usize,
}

#[derive(Debug, Serialize)]
pub struct MessageResponse {
    pub success: bool,
    pub message: String,
}

// ─── Request types ───────────────────────────────────────────────────────────

/// Body for `PUT /api/v1/schedules/:id` — full replacement.
#[derive(Debug, Deserialize)]
pub struct PutScheduleRequest {
    pub template_name: String,
    pub cron_expression: String,
    #[serde(default)]
    pub initial_context: Option<serde_json::Value>,
    #[serde(default)]
    pub enabled: Option<bool>,
    #[serde(default)]
    pub allow_concurrent: bool,
    #[serde(default)]
    pub catch_up: bool,
    /// IANA timezone for cron evaluation (e.g. `"America/New_York"`). UTC when
    /// absent.
    #[serde(default)]
    pub timezone: Option<String>,
}

/// Body for `PATCH /api/v1/schedules/:id` — partial update.
#[derive(Debug, Deserialize)]
pub struct PatchScheduleRequest {
    pub enabled: Option<bool>,
    pub cron_expression: Option<String>,
    pub allow_concurrent: Option<bool>,
    pub catch_up: Option<bool>,
    pub initial_context: Option<serde_json::Value>,
    /// Update the timezone. Pass `null` explicitly to clear it (revert to UTC).
    #[serde(default)]
    pub timezone: Option<String>,
}

// ─── Helper ──────────────────────────────────────────────────────────────────

fn require_store(state: &AppState) -> Result<Arc<ScheduleStore>, ApiError> {
    state.schedule_store.clone().ok_or_else(|| {
        ApiError(VaporaError::InternalError(
            "Schedule store not available".to_string(),
        ))
    })
}

fn validate_cron(expr: &str) -> Result<(), ApiError> {
    validate_cron_expression(expr).map_err(|e| {
        ApiError(VaporaError::InvalidInput(format!(
            "Invalid cron expression '{}': {}",
            expr, e
        )))
    })
}

fn validate_tz(tz: &str) -> Result<(), ApiError> {
    validate_timezone(tz).map_err(|e| ApiError(VaporaError::InvalidInput(e)))
}

// ─── Handlers ────────────────────────────────────────────────────────────────

/// `GET /api/v1/schedules` — list all schedules.
pub async fn list_schedules(
    State(state): State<AppState>,
) -> Result<Json<ScheduleListResponse>, ApiError> {
    let store = require_store(&state)?;
    let schedules = store.load_all().await.map_err(|e| {
        error!("list_schedules: {e}");
        ApiError(VaporaError::InternalError(e.to_string()))
    })?;
    let total = schedules.len();
    Ok(Json(ScheduleListResponse {
        total,
        schedules: schedules.into_iter().map(ScheduleResponse::from).collect(),
    }))
}

/// `GET /api/v1/schedules/:id` — get a single schedule.
pub async fn get_schedule(
    State(state): State<AppState>,
    Path(id): Path<String>,
) -> Result<Json<ScheduleResponse>, ApiError> {
    let store = require_store(&state)?;
    let schedule = store
        .load_one(&id)
        .await
        .map_err(|e| {
            error!(schedule_id = %id, "get_schedule: {e}");
            ApiError(VaporaError::InternalError(e.to_string()))
        })?
        .ok_or_else(|| ApiError(VaporaError::NotFound(format!("Schedule {} not found", id))))?;

    Ok(Json(ScheduleResponse::from(schedule)))
}

/// `PUT /api/v1/schedules/:id` — create or fully replace a schedule.
///
/// Preserves `last_fired_at` and `runs_count` from the existing record.
pub async fn put_schedule(
    State(state): State<AppState>,
    Path(id): Path<String>,
    Json(req): Json<PutScheduleRequest>,
) -> Result<(StatusCode, Json<ScheduleResponse>), ApiError> {
    let store = require_store(&state)?;

    validate_cron(&req.cron_expression)?;
    if let Some(ref tz) = req.timezone {
        validate_tz(tz)?;
    }
    let tz = req.timezone.as_deref();
    let next_fire_at = compute_next_fire_at_tz(&req.cron_expression, tz);

    let now = Utc::now();
    let s = ScheduledWorkflow {
        id: id.clone(),
        template_name: req.template_name.clone(),
        cron_expression: req.cron_expression.clone(),
        initial_context: req
            .initial_context
            .unwrap_or(serde_json::Value::Object(Default::default())),
        enabled: req.enabled.unwrap_or(true),
        allow_concurrent: req.allow_concurrent,
        catch_up: req.catch_up,
        timezone: req.timezone.clone(),
        last_fired_at: None,
        next_fire_at,
        runs_count: 0,
        created_at: now,
        updated_at: now,
    };

    store.full_upsert(&s).await.map_err(|e| {
        error!(schedule_id = %id, "put_schedule: {e}");
        ApiError(VaporaError::InternalError(e.to_string()))
    })?;

    let updated = store
|
||||||
|
.load_one(&id)
|
||||||
|
.await
|
||||||
|
.map_err(|e| ApiError(VaporaError::InternalError(e.to_string())))?
|
||||||
|
.ok_or_else(|| {
|
||||||
|
ApiError(VaporaError::InternalError(
|
||||||
|
"Schedule vanished after upsert".into(),
|
||||||
|
))
|
||||||
|
})?;
|
||||||
|
|
||||||
|
info!(schedule_id = %id, template = %req.template_name, "Schedule PUT");
|
||||||
|
Ok((StatusCode::OK, Json(ScheduleResponse::from(updated))))
|
||||||
|
}
|
||||||
|
|
||||||
|
/// `PATCH /api/v1/schedules/:id` — partial update.
|
||||||
|
///
|
||||||
|
/// Only the provided fields are changed. If `cron_expression` is updated,
|
||||||
|
/// `next_fire_at` is recomputed automatically.
|
||||||
|
pub async fn patch_schedule(
|
||||||
|
State(state): State<AppState>,
|
||||||
|
Path(id): Path<String>,
|
||||||
|
Json(req): Json<PatchScheduleRequest>,
|
||||||
|
) -> Result<Json<ScheduleResponse>, ApiError> {
|
||||||
|
let store = require_store(&state)?;
|
||||||
|
|
||||||
|
// Ensure the schedule exists first.
|
||||||
|
store
|
||||||
|
.load_one(&id)
|
||||||
|
.await
|
||||||
|
.map_err(|e| ApiError(VaporaError::InternalError(e.to_string())))?
|
||||||
|
.ok_or_else(|| ApiError(VaporaError::NotFound(format!("Schedule {} not found", id))))?;
|
||||||
|
|
||||||
|
let mut patch = serde_json::Map::new();
|
||||||
|
|
||||||
|
if let Some(enabled) = req.enabled {
|
||||||
|
patch.insert("enabled".into(), serde_json::json!(enabled));
|
||||||
|
}
|
||||||
|
if let Some(allow_concurrent) = req.allow_concurrent {
|
||||||
|
patch.insert(
|
||||||
|
"allow_concurrent".into(),
|
||||||
|
serde_json::json!(allow_concurrent),
|
||||||
|
);
|
||||||
|
}
|
||||||
|
if let Some(catch_up) = req.catch_up {
|
||||||
|
patch.insert("catch_up".into(), serde_json::json!(catch_up));
|
||||||
|
}
|
||||||
|
if let Some(ctx) = req.initial_context {
|
||||||
|
patch.insert("initial_context".into(), ctx);
|
||||||
|
}
|
||||||
|
if let Some(ref tz) = req.timezone {
|
||||||
|
validate_tz(tz)?;
|
||||||
|
patch.insert("timezone".into(), serde_json::json!(tz));
|
||||||
|
}
|
||||||
|
if let Some(ref cron) = req.cron_expression {
|
||||||
|
validate_cron(cron)?;
|
||||||
|
// Recompute next_fire_at using the new cron and whatever timezone is
|
||||||
|
// already in the patch (or falls back to None → UTC).
|
||||||
|
let tz = req.timezone.as_deref();
|
||||||
|
let next_fire_at = compute_next_fire_at_tz(cron, tz);
|
||||||
|
patch.insert("cron_expression".into(), serde_json::json!(cron));
|
||||||
|
patch.insert("next_fire_at".into(), serde_json::json!(next_fire_at));
|
||||||
|
}
|
||||||
|
patch.insert("updated_at".into(), serde_json::json!(Utc::now()));
|
||||||
|
|
||||||
|
let updated = store
|
||||||
|
.patch(&id, serde_json::Value::Object(patch))
|
||||||
|
.await
|
||||||
|
.map_err(|e| {
|
||||||
|
error!(schedule_id = %id, "patch_schedule: {e}");
|
||||||
|
ApiError(VaporaError::InternalError(e.to_string()))
|
||||||
|
})?
|
||||||
|
.ok_or_else(|| {
|
||||||
|
ApiError(VaporaError::NotFound(format!(
|
||||||
|
"Schedule {} not found after patch",
|
||||||
|
id
|
||||||
|
)))
|
||||||
|
})?;
|
||||||
|
|
||||||
|
info!(schedule_id = %id, "Schedule PATCHed");
|
||||||
|
Ok(Json(ScheduleResponse::from(updated)))
|
||||||
|
}
|
||||||
|
|
||||||
|
/// `DELETE /api/v1/schedules/:id` — permanently remove a schedule.
|
||||||
|
pub async fn delete_schedule(
|
||||||
|
State(state): State<AppState>,
|
||||||
|
Path(id): Path<String>,
|
||||||
|
) -> Result<(StatusCode, Json<MessageResponse>), ApiError> {
|
||||||
|
let store = require_store(&state)?;
|
||||||
|
|
||||||
|
// Verify existence before delete.
|
||||||
|
store
|
||||||
|
.load_one(&id)
|
||||||
|
.await
|
||||||
|
.map_err(|e| ApiError(VaporaError::InternalError(e.to_string())))?
|
||||||
|
.ok_or_else(|| ApiError(VaporaError::NotFound(format!("Schedule {} not found", id))))?;
|
||||||
|
|
||||||
|
store.delete(&id).await.map_err(|e| {
|
||||||
|
error!(schedule_id = %id, "delete_schedule: {e}");
|
||||||
|
ApiError(VaporaError::InternalError(e.to_string()))
|
||||||
|
})?;
|
||||||
|
|
||||||
|
info!(schedule_id = %id, "Schedule deleted");
|
||||||
|
Ok((
|
||||||
|
StatusCode::OK,
|
||||||
|
Json(MessageResponse {
|
||||||
|
success: true,
|
||||||
|
message: format!("Schedule {} deleted", id),
|
||||||
|
}),
|
||||||
|
))
|
||||||
|
}
|
||||||
|
|
||||||
|
/// `GET /api/v1/schedules/:id/runs` — execution history (last 100, desc).
|
||||||
|
pub async fn list_schedule_runs(
|
||||||
|
State(state): State<AppState>,
|
||||||
|
Path(id): Path<String>,
|
||||||
|
) -> Result<Json<RunListResponse>, ApiError> {
|
||||||
|
let store = require_store(&state)?;
|
||||||
|
|
||||||
|
// Ensure schedule exists.
|
||||||
|
store
|
||||||
|
.load_one(&id)
|
||||||
|
.await
|
||||||
|
.map_err(|e| ApiError(VaporaError::InternalError(e.to_string())))?
|
||||||
|
.ok_or_else(|| ApiError(VaporaError::NotFound(format!("Schedule {} not found", id))))?;
|
||||||
|
|
||||||
|
let runs = store.load_runs(&id).await.map_err(|e| {
|
||||||
|
error!(schedule_id = %id, "list_schedule_runs: {e}");
|
||||||
|
ApiError(VaporaError::InternalError(e.to_string()))
|
||||||
|
})?;
|
||||||
|
let total = runs.len();
|
||||||
|
Ok(Json(RunListResponse {
|
||||||
|
total,
|
||||||
|
runs: runs.into_iter().map(ScheduleRunResponse::from).collect(),
|
||||||
|
}))
|
||||||
|
}
|
||||||
|
|
||||||
|
/// `POST /api/v1/schedules/:id/fire` — immediately trigger a scheduled
|
||||||
|
/// workflow bypassing the cron timer.
|
||||||
|
///
|
||||||
|
/// Records an auditable `ScheduleRun` with `status = Fired` and advances
|
||||||
|
/// `last_fired_at` / `next_fire_at` exactly like the background scheduler.
|
||||||
|
pub async fn fire_schedule(
|
||||||
|
State(state): State<AppState>,
|
||||||
|
Path(id): Path<String>,
|
||||||
|
) -> Result<(StatusCode, Json<ScheduleRunResponse>), ApiError> {
|
||||||
|
let store = require_store(&state)?;
|
||||||
|
let orchestrator = state.workflow_orchestrator.as_ref().ok_or_else(|| {
|
||||||
|
ApiError(VaporaError::InternalError(
|
||||||
|
"Workflow orchestrator not available".to_string(),
|
||||||
|
))
|
||||||
|
})?;
|
||||||
|
|
||||||
|
let schedule = store
|
||||||
|
.load_one(&id)
|
||||||
|
.await
|
||||||
|
.map_err(|e| ApiError(VaporaError::InternalError(e.to_string())))?
|
||||||
|
.ok_or_else(|| ApiError(VaporaError::NotFound(format!("Schedule {} not found", id))))?;
|
||||||
|
|
||||||
|
if !schedule.enabled {
|
||||||
|
return Err(ApiError(VaporaError::InvalidInput(format!(
|
||||||
|
"Schedule {} is disabled",
|
||||||
|
id
|
||||||
|
))));
|
||||||
|
}
|
||||||
|
|
||||||
|
let now = Utc::now();
|
||||||
|
|
||||||
|
let workflow_id = orchestrator
|
||||||
|
.start_workflow(&schedule.template_name, schedule.initial_context.clone())
|
||||||
|
.await
|
||||||
|
.map_err(|e| {
|
||||||
|
error!(schedule_id = %id, "fire_schedule start_workflow: {e}");
|
||||||
|
ApiError(VaporaError::WorkflowError(e.to_string()))
|
||||||
|
})?;
|
||||||
|
|
||||||
|
let run = ScheduleRun {
|
||||||
|
id: uuid::Uuid::new_v4().to_string(),
|
||||||
|
schedule_id: id.clone(),
|
||||||
|
workflow_instance_id: Some(workflow_id.clone()),
|
||||||
|
fired_at: now,
|
||||||
|
status: RunStatus::Fired,
|
||||||
|
notes: Some("Manual fire via API".to_string()),
|
||||||
|
};
|
||||||
|
store.record_run(&run).await.map_err(|e| {
|
||||||
|
warn!(schedule_id = %id, "fire_schedule record_run: {e}");
|
||||||
|
ApiError(VaporaError::InternalError(e.to_string()))
|
||||||
|
})?;
|
||||||
|
|
||||||
|
let next_fire_at = compute_next_fire_after_tz(
|
||||||
|
&schedule.cron_expression,
|
||||||
|
&now,
|
||||||
|
schedule.timezone.as_deref(),
|
||||||
|
);
|
||||||
|
|
||||||
|
store
|
||||||
|
.update_after_fire(&id, now, next_fire_at)
|
||||||
|
.await
|
||||||
|
.map_err(|e| {
|
||||||
|
warn!(schedule_id = %id, "fire_schedule update_after_fire: {e}");
|
||||||
|
ApiError(VaporaError::InternalError(e.to_string()))
|
||||||
|
})?;
|
||||||
|
|
||||||
|
info!(
|
||||||
|
schedule_id = %id,
|
||||||
|
template = %schedule.template_name,
|
||||||
|
workflow_id = %workflow_id,
|
||||||
|
"Schedule manually fired via API"
|
||||||
|
);
|
||||||
|
|
||||||
|
Ok((StatusCode::CREATED, Json(ScheduleRunResponse::from(run))))
|
||||||
|
}
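
For reference, a hypothetical request body for `PUT /api/v1/schedules/:id`, using only the fields the handler above reads. Judging from the handler, `allow_concurrent` and `catch_up` are required booleans, while `enabled`, `timezone`, and `initial_context` default when omitted (`true`, UTC, and `{}` respectively). The accepted cron syntax depends on `validate_cron_expression`, which is not shown here, so the expression below is purely illustrative:

```json
{
  "template_name": "nightly-report",
  "cron_expression": "0 0 2 * * *",
  "timezone": "Europe/Madrid",
  "enabled": true,
  "allow_concurrent": false,
  "catch_up": false,
  "initial_context": { "env": "staging" }
}
```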
@@ -2,10 +2,12 @@
 use std::sync::Arc;

+use vapora_channels::ChannelRegistry;
 use vapora_rlm::storage::SurrealDBStorage;
 use vapora_rlm::RLMEngine;
-use vapora_workflow_engine::WorkflowOrchestrator;
+use vapora_workflow_engine::{ScheduleStore, WorkflowOrchestrator};

+use crate::config::NotificationConfig;
 use crate::services::{
     AgentService, ProjectService, ProposalService, ProviderAnalyticsService, TaskService,
 };
@@ -20,6 +22,12 @@ pub struct AppState {
     pub provider_analytics_service: Arc<ProviderAnalyticsService>,
     pub workflow_orchestrator: Option<Arc<WorkflowOrchestrator>>,
     pub rlm_engine: Option<Arc<RLMEngine<SurrealDBStorage>>>,
+    pub schedule_store: Option<Arc<ScheduleStore>>,
+    /// Outbound notification channels; `None` when `[channels]` is absent from
+    /// config.
+    pub channel_registry: Option<Arc<ChannelRegistry>>,
+    /// Backend-level event → channel-name mappings.
+    pub notification_config: Arc<NotificationConfig>,
 }

 impl AppState {
@@ -39,6 +47,9 @@ impl AppState {
             provider_analytics_service: Arc::new(provider_analytics_service),
             workflow_orchestrator: None,
             rlm_engine: None,
+            schedule_store: None,
+            channel_registry: None,
+            notification_config: Arc::new(NotificationConfig::default()),
         }
     }
@@ -54,4 +65,198 @@ impl AppState {
         self.rlm_engine = Some(rlm_engine);
         self
     }
+
+    /// Add schedule store to state.
+    pub fn with_schedule_store(mut self, store: Arc<ScheduleStore>) -> Self {
+        self.schedule_store = Some(store);
+        self
+    }
+
+    /// Attach the notification channel registry built from `[channels]` config.
+    pub fn with_channel_registry(mut self, registry: Arc<ChannelRegistry>) -> Self {
+        self.channel_registry = Some(registry);
+        self
+    }
+
+    /// Attach the per-event notification targets.
+    pub fn with_notification_config(mut self, cfg: NotificationConfig) -> Self {
+        self.notification_config = Arc::new(cfg);
+        self
+    }
+
+    /// Fire-and-forget: send `msg` to each channel in `targets`.
+    ///
+    /// Spawns a background task; delivery failures are logged as `warn!` and
+    /// never surface to the caller.
+    pub fn notify(&self, targets: &[String], msg: vapora_channels::Message) {
+        if targets.is_empty() {
+            return;
+        }
+        let registry = self.channel_registry.clone();
+        let targets = targets.to_vec();
+        tokio::spawn(dispatch_notifications(registry, targets, msg));
+    }
+}
+
+/// Deliver `msg` to every channel name in `targets` using `registry`.
+///
+/// A `None` registry or an unknown channel name is silent (warn-logged for
+/// unknown names). Failures in one channel do not abort delivery to others.
+///
+/// Extracted from [`AppState::notify`] to be directly callable in tests
+/// without needing a fully-constructed [`AppState`].
+pub(crate) async fn dispatch_notifications(
+    registry: Option<Arc<ChannelRegistry>>,
+    targets: Vec<String>,
+    msg: vapora_channels::Message,
+) {
+    let Some(registry) = registry else {
+        return;
+    };
+    for name in &targets {
+        if let Err(e) = registry.send(name, msg.clone()).await {
+            tracing::warn!(channel = %name, error = %e, "Notification delivery failed");
+        }
+    }
+}
+
+#[cfg(test)]
+mod tests {
+    use std::sync::{Arc, Mutex};
+
+    use async_trait::async_trait;
+    use vapora_channels::Result as ChannelResult;
+    use vapora_channels::{ChannelError, ChannelRegistry, Message, NotificationChannel};
+
+    use super::dispatch_notifications;
+
+    struct RecordingChannel {
+        name: String,
+        captured: Arc<Mutex<Vec<Message>>>,
+    }
+
+    impl RecordingChannel {
+        fn new(name: &str) -> (Self, Arc<Mutex<Vec<Message>>>) {
+            let captured = Arc::new(Mutex::new(vec![]));
+            (
+                Self {
+                    name: name.to_string(),
+                    captured: Arc::clone(&captured),
+                },
+                captured,
+            )
+        }
+    }
+
+    #[async_trait]
+    impl NotificationChannel for RecordingChannel {
+        fn name(&self) -> &str {
+            &self.name
+        }
+        async fn send(&self, msg: &Message) -> ChannelResult<()> {
+            self.captured.lock().unwrap().push(msg.clone());
+            Ok(())
+        }
+    }
+
+    struct FailingChannel {
+        name: String,
+    }
+
+    #[async_trait]
+    impl NotificationChannel for FailingChannel {
+        fn name(&self) -> &str {
+            &self.name
+        }
+        async fn send(&self, _msg: &Message) -> ChannelResult<()> {
+            Err(ChannelError::ApiError {
+                channel: self.name.clone(),
+                status: 503,
+                body: "unavailable".to_string(),
+            })
+        }
+    }
+
+    #[tokio::test]
+    async fn dispatch_is_noop_when_registry_is_none() {
+        // Must not panic; targets are non-empty so the only short-circuit is None
+        // registry.
+        dispatch_notifications(None, vec!["ch".to_string()], Message::info("Test", "body")).await;
+    }
+
+    #[tokio::test]
+    async fn dispatch_delivers_to_named_channel() {
+        let (recording, captured) = RecordingChannel::new("team-slack");
+        let mut registry = ChannelRegistry::new();
+        registry.register(Arc::new(recording));
+
+        dispatch_notifications(
+            Some(Arc::new(registry)),
+            vec!["team-slack".to_string()],
+            Message::success("Deploy done", "v1.0 → prod"),
+        )
+        .await;
+
+        let msgs = captured.lock().unwrap();
+        assert_eq!(msgs.len(), 1);
+        assert_eq!(msgs[0].title, "Deploy done");
+    }
+
+    #[tokio::test]
+    async fn dispatch_delivers_to_multiple_targets() {
+        let (ch_a, cap_a) = RecordingChannel::new("ch-a");
+        let (ch_b, cap_b) = RecordingChannel::new("ch-b");
+        let mut registry = ChannelRegistry::new();
+        registry.register(Arc::new(ch_a));
+        registry.register(Arc::new(ch_b));
+
+        dispatch_notifications(
+            Some(Arc::new(registry)),
+            vec!["ch-a".to_string(), "ch-b".to_string()],
+            Message::info("Test", "broadcast"),
+        )
+        .await;
+
+        assert_eq!(cap_a.lock().unwrap().len(), 1);
+        assert_eq!(cap_b.lock().unwrap().len(), 1);
+    }
+
+    #[tokio::test]
+    async fn dispatch_continues_after_channel_failure() {
+        let bad = FailingChannel {
+            name: "bad".to_string(),
+        };
+        let (good, cap_good) = RecordingChannel::new("good");
+        let mut registry = ChannelRegistry::new();
+        registry.register(Arc::new(bad));
+        registry.register(Arc::new(good));
+
+        // Must not panic; "good" receives despite "bad" returning an error.
+        dispatch_notifications(
+            Some(Arc::new(registry)),
+            vec!["bad".to_string(), "good".to_string()],
+            Message::error("Alert", "system down"),
+        )
+        .await;
+
+        assert_eq!(cap_good.lock().unwrap().len(), 1);
+    }
+
+    #[tokio::test]
+    async fn dispatch_logs_warn_on_unknown_channel_but_continues() {
+        let (present, cap) = RecordingChannel::new("present");
+        let mut registry = ChannelRegistry::new();
+        registry.register(Arc::new(present));
+
+        // "missing" → ChannelError::NotFound logged as warn; "present" still
+        // receives its message.
+        dispatch_notifications(
+            Some(Arc::new(registry)),
+            vec!["missing".to_string(), "present".to_string()],
+            Message::info("Test", "body"),
+        )
+        .await;
+
+        assert_eq!(cap.lock().unwrap().len(), 1);
+    }
 }
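
The fire-and-forget semantics of `dispatch_notifications` (an absent registry is a no-op, unknown names and per-channel failures are logged, never propagated, and never block the remaining targets) can be sketched without the vapora crates. The registry shape and channel names below are hypothetical stand-ins for `ChannelRegistry`, kept synchronous for brevity:

```rust
use std::collections::HashMap;

// Hypothetical stand-in for a channel: a closure that may fail to deliver.
type Channel = Box<dyn Fn(&str) -> Result<(), String>>;

/// Dependency-free analog of `dispatch_notifications`: one failing or
/// unknown target must not stop delivery to the others.
fn dispatch(registry: &HashMap<String, Channel>, targets: &[&str], msg: &str) -> Vec<String> {
    let mut delivered = Vec::new();
    for name in targets {
        match registry.get(*name) {
            // Unknown channel: warn and keep going, as the real code does.
            None => eprintln!("warn: unknown channel {name}"),
            Some(send) => match send(msg) {
                Ok(()) => delivered.push(name.to_string()),
                // Delivery failure: warn and keep going.
                Err(e) => eprintln!("warn: delivery to {name} failed: {e}"),
            },
        }
    }
    delivered
}

fn main() {
    let mut registry: HashMap<String, Channel> = HashMap::new();
    registry.insert("good".into(), Box::new(|_m| Ok(())));
    registry.insert("bad".into(), Box::new(|_m| Err("503".into())));

    // "bad" fails, "missing" is unknown, yet "good" still receives the message.
    let delivered = dispatch(&registry, &["bad", "missing", "good"], "alert");
    assert_eq!(delivered, vec!["good".to_string()]);
    println!("delivered: {delivered:?}");
}
```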
@@ -7,6 +7,7 @@ use axum::{
     Json,
 };
 use serde::Deserialize;
+use vapora_channels::Message;
 use vapora_shared::models::{Task, TaskPriority, TaskStatus};

 use crate::api::state::AppState;
@@ -160,8 +161,17 @@ pub async fn update_task_status(

     let updated = state
         .task_service
-        .update_task_status(&id, tenant_id, status)
+        .update_task_status(&id, tenant_id, status.clone())
         .await?;

+    if status == TaskStatus::Done {
+        let msg = Message::success(
+            "Task completed",
+            format!("'{}' moved to Done", updated.title),
+        );
+        state.notify(&state.notification_config.on_task_done, msg);
+    }
+
     Ok(Json(updated))
 }
@@ -290,6 +290,8 @@ mod tests {
             approval_required: false,
             compensation_agents: None,
         }],
+        schedule: None,
+        notifications: Default::default(),
     };

     let instance = WorkflowInstance::new(&config, serde_json::json!({}));
@@ -1,10 +1,12 @@
 // Configuration module for VAPORA Backend
 // Loads config from vapora.toml with environment variable interpolation

+use std::collections::HashMap;
 use std::fs;
 use std::path::Path;

 use serde::{Deserialize, Serialize};
+use vapora_channels::config::ChannelConfig;
 use vapora_shared::{Result, VaporaError};

 /// Main configuration structure
@@ -16,6 +18,32 @@ pub struct Config {
     pub auth: AuthConfig,
     pub logging: LoggingConfig,
     pub metrics: MetricsConfig,
+    /// Named outbound notification channels (`[channels.name]` blocks in TOML).
+    /// Credential fields support `${VAR}` / `${VAR:-default}` interpolation —
+    /// resolution happens automatically in [`ChannelRegistry::from_map`].
+    #[serde(default)]
+    pub channels: HashMap<String, ChannelConfig>,
+    /// Backend-level event → channel-name mappings.
+    #[serde(default)]
+    pub notifications: NotificationConfig,
+}
+
+/// Per-event lists of channel names to notify.
+///
+/// ```toml
+/// [notifications]
+/// on_task_done = ["team-slack"]
+/// on_proposal_approved = ["team-slack"]
+/// on_proposal_rejected = ["team-slack", "ops-telegram"]
+/// ```
+#[derive(Debug, Clone, Default, Serialize, Deserialize)]
+pub struct NotificationConfig {
+    #[serde(default)]
+    pub on_task_done: Vec<String>,
+    #[serde(default)]
+    pub on_proposal_approved: Vec<String>,
+    #[serde(default)]
+    pub on_proposal_rejected: Vec<String>,
 }

 /// Server configuration
@@ -199,7 +227,8 @@ mod tests {

     #[test]
     fn test_env_var_interpolation() {
-        std::env::set_var("TEST_VAR", "test_value");
+        // SAFETY: single-threaded test, no concurrent env access.
+        unsafe { std::env::set_var("TEST_VAR", "test_value") };

         let input = "host = \"${TEST_VAR}\"";
         let result = Config::interpolate_env_vars(input).unwrap();
@@ -245,6 +274,8 @@ mod tests {
             enabled: true,
             port: 9090,
         },
+        channels: HashMap::new(),
+        notifications: NotificationConfig::default(),
     };

     assert!(config.validate().is_err());

@@ -18,7 +18,9 @@ use axum::{
 use clap::Parser;
 use tower_http::cors::{Any, CorsLayer};
 use tracing::{info, Level};
+use vapora_channels::ChannelRegistry;
 use vapora_swarm::{SwarmCoordinator, SwarmMetrics};
+use vapora_workflow_engine::ScheduleStore;

 use crate::api::AppState;
 use crate::config::Config;
@@ -104,15 +106,44 @@ async fn main() -> Result<()> {
     )?);
     info!("RLM engine initialized for Phase 8");

+    // Initialize schedule store (backed by the same SurrealDB connection)
+    let schedule_store = Arc::new(ScheduleStore::new(Arc::new(db.clone())));
+    info!("ScheduleStore initialized for autonomous scheduling");
+
+    // Build notification channel registry from [channels] config block.
+    // Absent block → no notifications sent; a build error is non-fatal (warns).
+    let channel_registry = if config.channels.is_empty() {
+        None
+    } else {
+        match ChannelRegistry::from_map(config.channels.clone()) {
+            Ok(r) => {
+                info!(
+                    "Channel registry built ({} channels)",
+                    r.channel_names().len()
+                );
+                Some(std::sync::Arc::new(r))
+            }
+            Err(e) => {
+                tracing::warn!("Failed to build channel registry: {e}; notifications disabled");
+                None
+            }
+        }
+    };
+
     // Create application state
-    let app_state = AppState::new(
+    let mut app_state = AppState::new(
         project_service,
         task_service,
         agent_service,
         proposal_service,
         provider_analytics_service,
     )
-    .with_rlm_engine(rlm_engine);
+    .with_rlm_engine(rlm_engine)
+    .with_schedule_store(schedule_store)
+    .with_notification_config(config.notifications.clone());
+    if let Some(registry) = channel_registry {
+        app_state = app_state.with_channel_registry(registry);
+    }

     // Create SwarmMetrics for Prometheus monitoring
     let metrics = match SwarmMetrics::new() {
@@ -327,10 +358,33 @@ async fn main() -> Result<()> {
         "/api/v1/analytics/providers/:provider/tasks/:task_type",
         get(api::provider_analytics::get_provider_task_type_metrics),
     )
+    // Channel endpoints
+    .route("/api/v1/channels", get(api::channels::list_channels))
+    .route(
+        "/api/v1/channels/:name/test",
+        post(api::channels::test_channel),
+    )
     // RLM endpoints (Phase 8)
     .route("/api/v1/rlm/documents", post(api::rlm::load_document))
     .route("/api/v1/rlm/query", post(api::rlm::query_document))
     .route("/api/v1/rlm/analyze", post(api::rlm::analyze_document))
+    // Schedule endpoints
+    .route("/api/v1/schedules", get(api::schedules::list_schedules))
+    .route(
+        "/api/v1/schedules/:id",
+        get(api::schedules::get_schedule)
+            .put(api::schedules::put_schedule)
+            .patch(api::schedules::patch_schedule)
+            .delete(api::schedules::delete_schedule),
+    )
+    .route(
+        "/api/v1/schedules/:id/runs",
+        get(api::schedules::list_schedule_runs),
+    )
+    .route(
+        "/api/v1/schedules/:id/fire",
+        post(api::schedules::fire_schedule),
+    )
     // Apply CORS, state, and extensions
     .layer(Extension(swarm_coordinator))
     .layer(cors)
crates/vapora-channels/Cargo.toml (new file, 23 lines)
@@ -0,0 +1,23 @@
[package]
name = "vapora-channels"
version.workspace = true
edition.workspace = true
authors.workspace = true
license.workspace = true
repository.workspace = true
homepage = "https://vapora.dev"
rust-version.workspace = true
description = "Outbound notification channels: Slack, Discord, Telegram"

[dependencies]
reqwest = { workspace = true }
async-trait = { workspace = true }
serde = { workspace = true }
serde_json = { workspace = true }
thiserror = { workspace = true }
tracing = { workspace = true }
regex = { workspace = true }

[dev-dependencies]
tokio = { workspace = true, features = ["full"] }
wiremock = { workspace = true }
crates/vapora-channels/src/channel.rs (new file, 9 lines)
@@ -0,0 +1,9 @@
use async_trait::async_trait;

use crate::{error::Result, message::Message};

#[async_trait]
pub trait NotificationChannel: Send + Sync {
    fn name(&self) -> &str;
    async fn send(&self, msg: &Message) -> Result<()>;
}
246
crates/vapora-channels/src/config.rs
Normal file
246
crates/vapora-channels/src/config.rs
Normal file
@ -0,0 +1,246 @@
use std::collections::HashMap;
use std::sync::OnceLock;

use regex::Regex;
use serde::{Deserialize, Serialize};

use crate::error::{ChannelError, Result};

/// Top-level config section; embed under `[channels]` in your TOML.
///
/// Credential fields (`webhook_url`, `bot_token`, etc.) support `${VAR}` and
/// `${VAR:-default}` interpolation. Resolution is performed automatically by
/// [`ChannelRegistry::from_config`] / [`ChannelRegistry::from_map`] via
/// [`ChannelConfig::resolve_secrets`]. Plain literals pass through unchanged.
///
/// ```toml
/// [channels.team-slack]
/// type = "slack"
/// webhook_url = "${SLACK_WEBHOOK_URL}"
///
/// [channels.ops-discord]
/// type = "discord"
/// webhook_url = "${DISCORD_WEBHOOK_URL}"
///
/// [channels.alerts-telegram]
/// type = "telegram"
/// bot_token = "${TELEGRAM_BOT_TOKEN}"
/// chat_id = "${TELEGRAM_CHAT_ID:-100999}"
/// ```
#[derive(Debug, Clone, Default, Serialize, Deserialize)]
pub struct ChannelsConfig {
    #[serde(default)]
    pub channels: HashMap<String, ChannelConfig>,
}

impl ChannelsConfig {
    /// Resolve all `${VAR}` references in every channel entry.
    ///
    /// Consumes `self` and returns a new `ChannelsConfig` with literals. Fails
    /// on the first channel whose secrets cannot be resolved.
    pub fn resolve_secrets(self) -> Result<Self> {
        let channels = self
            .channels
            .into_iter()
            .map(|(name, cfg)| cfg.resolve_secrets().map(|c| (name, c)))
            .collect::<Result<_>>()?;
        Ok(Self { channels })
    }
}

#[derive(Debug, Clone, Serialize, Deserialize)]
#[serde(tag = "type", rename_all = "lowercase")]
pub enum ChannelConfig {
    Slack(SlackConfig),
    Discord(DiscordConfig),
    Telegram(TelegramConfig),
}

impl ChannelConfig {
    /// Resolve `${VAR}` / `${VAR:-default}` references in all credential
    /// fields. Plain string literals are returned unchanged.
    pub fn resolve_secrets(self) -> Result<Self> {
        match self {
            Self::Slack(c) => Ok(Self::Slack(c.resolve_secrets()?)),
            Self::Discord(c) => Ok(Self::Discord(c.resolve_secrets()?)),
            Self::Telegram(c) => Ok(Self::Telegram(c.resolve_secrets()?)),
        }
    }
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct SlackConfig {
    pub webhook_url: String,
    /// Channel override (e.g. `#alerts`). The webhook already targets a
    /// channel; this overrides it for workspaces that allow it.
    pub channel: Option<String>,
    pub username: Option<String>,
}

impl SlackConfig {
    pub fn resolve_secrets(self) -> Result<Self> {
        Ok(Self {
            webhook_url: interpolate(&self.webhook_url)?,
            channel: self.channel.map(|s| interpolate(&s)).transpose()?,
            username: self.username.map(|s| interpolate(&s)).transpose()?,
        })
    }
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct DiscordConfig {
    pub webhook_url: String,
    pub username: Option<String>,
    pub avatar_url: Option<String>,
}

impl DiscordConfig {
    pub fn resolve_secrets(self) -> Result<Self> {
        Ok(Self {
            webhook_url: interpolate(&self.webhook_url)?,
            username: self.username.map(|s| interpolate(&s)).transpose()?,
            avatar_url: self.avatar_url.map(|s| interpolate(&s)).transpose()?,
        })
    }
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct TelegramConfig {
    pub bot_token: String,
    /// Numeric chat ID (e.g. `-1001234567890` for a supergroup).
    pub chat_id: String,
    /// Override the Bot API base URL. Leave `None` for production.
    /// Useful for pointing at a local mock server during tests.
    pub api_base: Option<String>,
}

impl TelegramConfig {
    pub fn api_url(&self) -> String {
        let base = self
            .api_base
            .as_deref()
            .unwrap_or("https://api.telegram.org");
        format!("{}/bot{}/sendMessage", base, self.bot_token)
    }

    pub fn resolve_secrets(self) -> Result<Self> {
        Ok(Self {
            bot_token: interpolate(&self.bot_token)?,
            chat_id: interpolate(&self.chat_id)?,
            api_base: self.api_base.map(|s| interpolate(&s)).transpose()?,
        })
    }
}

/// Expand every `${VAR}` / `${VAR:-default}` reference found anywhere in `s`.
///
/// - `${FOO}` → value of `FOO`, error if unset
/// - `${FOO:-bar}` → value of `FOO` if set, `"bar"` otherwise
/// - Anything else → returned unchanged
fn interpolate(s: &str) -> Result<String> {
    // Fast path: no placeholder in the string.
    if !s.contains("${") {
        return Ok(s.to_string());
    }

    static RE: OnceLock<Regex> = OnceLock::new();
    let re = RE.get_or_init(|| {
        // Matches ${VAR} and ${VAR:-default} anywhere in the string.
        Regex::new(r"\$\{([^}:]+)(?::-(.*?))?\}").expect("static regex is valid")
    });

    let mut result = s.to_string();
    for cap in re.captures_iter(s) {
        let full = cap.get(0).unwrap().as_str();
        let var_name = cap.get(1).unwrap().as_str();
        let default = cap.get(2).map(|m| m.as_str());

        let value = match std::env::var(var_name) {
            Ok(v) => v,
            Err(_) => match default {
                Some(d) => d.to_string(),
                None => return Err(ChannelError::SecretNotFound(var_name.to_string())),
            },
        };
        result = result.replace(full, &value);
    }
    Ok(result)
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn literal_passthrough() {
        let v = interpolate("https://hooks.slack.com/services/T/B/token").unwrap();
        assert_eq!(v, "https://hooks.slack.com/services/T/B/token");
    }

    #[test]
    fn env_var_resolved() {
        // SAFETY: single-threaded test process, no concurrent env access.
        unsafe { std::env::set_var("TEST_CHANNELS_WEBHOOK", "https://resolved.example.com") };
        let v = interpolate("${TEST_CHANNELS_WEBHOOK}").unwrap();
        assert_eq!(v, "https://resolved.example.com");
    }

    #[test]
    fn env_var_with_default_used_when_unset() {
        // SAFETY: single-threaded test process, no concurrent env access.
        unsafe { std::env::remove_var("TEST_CHANNELS_MISSING") };
        let v = interpolate("${TEST_CHANNELS_MISSING:-fallback-token}").unwrap();
        assert_eq!(v, "fallback-token");
    }

    #[test]
    fn env_var_missing_no_default_errors() {
        // SAFETY: single-threaded test process, no concurrent env access.
        unsafe { std::env::remove_var("TEST_CHANNELS_REQUIRED") };
        let err = interpolate("${TEST_CHANNELS_REQUIRED}").unwrap_err();
        assert!(
            matches!(err, ChannelError::SecretNotFound(ref v) if v == "TEST_CHANNELS_REQUIRED")
        );
    }

    #[test]
    fn partial_interpolation_in_url() {
        // SAFETY: single-threaded test process, no concurrent env access.
        unsafe { std::env::set_var("TEST_CHANNELS_PARTIAL_TOKEN", "abc123") };
        let v =
            interpolate("https://hooks.example.com/services/${TEST_CHANNELS_PARTIAL_TOKEN}/end")
                .unwrap();
        assert_eq!(v, "https://hooks.example.com/services/abc123/end");
    }

    #[test]
    fn slack_config_resolves_secrets() {
        // SAFETY: single-threaded test process, no concurrent env access.
        unsafe { std::env::set_var("TEST_SLACK_WEBHOOK", "https://hooks.slack.com/s/t/b/x") };
        let cfg = SlackConfig {
            webhook_url: "${TEST_SLACK_WEBHOOK}".to_string(),
            channel: Some("#alerts".to_string()),
            username: None,
        };
        let resolved = cfg.resolve_secrets().unwrap();
        assert_eq!(resolved.webhook_url, "https://hooks.slack.com/s/t/b/x");
        assert_eq!(resolved.channel.as_deref(), Some("#alerts"));
    }

    #[test]
    fn telegram_config_resolves_secrets() {
        // SAFETY: single-threaded test process, no concurrent env access.
        unsafe {
            std::env::set_var("TEST_TG_TOKEN", "999:TOKEN");
            std::env::set_var("TEST_TG_CHAT", "-100999");
        }
        let cfg = TelegramConfig {
            bot_token: "${TEST_TG_TOKEN}".to_string(),
            chat_id: "${TEST_TG_CHAT}".to_string(),
            api_base: None,
        };
        let resolved = cfg.resolve_secrets().unwrap();
        assert_eq!(resolved.bot_token, "999:TOKEN");
        assert_eq!(resolved.chat_id, "-100999");
    }
}
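The resolution rules that `interpolate()` implements can be illustrated without the `regex` dependency. Below is a minimal sketch using a plain string scan against a caller-supplied lookup map; `expand` and its `env` argument are hypothetical names for illustration, not part of the crate:

```rust
use std::collections::HashMap;

/// Expand `${VAR}` / `${VAR:-default}` references against a lookup map.
/// Returns Err with the variable name when a reference has neither a
/// value nor a `:-default`.
fn expand(s: &str, env: &HashMap<&str, &str>) -> Result<String, String> {
    let mut out = String::with_capacity(s.len());
    let mut rest = s;
    while let Some(start) = rest.find("${") {
        out.push_str(&rest[..start]);
        let after = &rest[start + 2..];
        let end = after.find('}').ok_or_else(|| "unterminated reference".to_string())?;
        let inner = &after[..end];
        // Split "VAR:-default" into a name and an optional default.
        let (name, default) = match inner.find(":-") {
            Some(i) => (&inner[..i], Some(&inner[i + 2..])),
            None => (inner, None),
        };
        match env.get(name).copied().or(default) {
            Some(v) => out.push_str(v),
            None => return Err(name.to_string()),
        }
        rest = &after[end + 1..];
    }
    out.push_str(rest);
    Ok(out)
}

fn main() {
    let env = HashMap::from([("TOKEN", "abc123")]);
    assert_eq!(expand("x/${TOKEN}/y", &env).unwrap(), "x/abc123/y");
    assert_eq!(expand("${MISSING:-fallback}", &env).unwrap(), "fallback");
    assert!(expand("${MISSING}", &env).is_err());
}
```

Against a real process environment, the map lookup would be replaced by `std::env::var`, which is what the crate's `interpolate()` does.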
154  crates/vapora-channels/src/discord.rs  Normal file
@ -0,0 +1,154 @@
use async_trait::async_trait;
use reqwest::Client;
use serde_json::{json, Value};
use tracing::instrument;

use crate::{
    channel::NotificationChannel,
    config::DiscordConfig,
    error::{ChannelError, Result},
    message::Message,
};

pub struct DiscordChannel {
    name: String,
    config: DiscordConfig,
    client: Client,
}

impl DiscordChannel {
    pub fn new(name: impl Into<String>, config: DiscordConfig, client: Client) -> Self {
        Self {
            name: name.into(),
            config,
            client,
        }
    }
}

/// Builds the Discord webhook JSON payload from a message.
pub(crate) fn build_payload(
    msg: &Message,
    username_override: Option<&str>,
    avatar_url: Option<&str>,
) -> Value {
    let fields: Vec<Value> = msg
        .metadata
        .iter()
        .map(|(k, v)| {
            json!({
                "name": k,
                "value": v,
                "inline": true
            })
        })
        .collect();

    let mut payload = json!({
        "embeds": [{
            "title": msg.title,
            "description": msg.body,
            "color": msg.level.discord_color(),
            "fields": fields,
            "footer": { "text": "vapora" }
        }]
    });

    if let Some(u) = username_override {
        payload["username"] = json!(u);
    }
    if let Some(av) = avatar_url {
        payload["avatar_url"] = json!(av);
    }

    payload
}

#[async_trait]
impl NotificationChannel for DiscordChannel {
    fn name(&self) -> &str {
        &self.name
    }

    #[instrument(skip(self, msg), fields(channel = %self.name))]
    async fn send(&self, msg: &Message) -> Result<()> {
        let payload = build_payload(
            msg,
            self.config.username.as_deref(),
            self.config.avatar_url.as_deref(),
        );

        let resp = self
            .client
            .post(&self.config.webhook_url)
            .json(&payload)
            .send()
            .await
            .map_err(|e| ChannelError::HttpError {
                channel: self.name.clone(),
                source: e,
            })?;

        // Discord returns 204 No Content on success.
        if !resp.status().is_success() {
            let status = resp.status().as_u16();
            let body = resp.text().await.unwrap_or_default();
            return Err(ChannelError::ApiError {
                channel: self.name.clone(),
                status,
                body,
            });
        }

        Ok(())
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn error_message_uses_red_discord_color() {
        let msg = Message::error("Service down", "Health check failed");
        let payload = build_payload(&msg, None, None);
        assert_eq!(
            payload["embeds"][0]["color"].as_u64().unwrap(),
            0xcc0000_u64
        );
    }

    #[test]
    fn success_message_uses_green_discord_color() {
        let msg = Message::success("Deploy complete", "v1.2.0");
        let payload = build_payload(&msg, None, None);
        assert_eq!(
            payload["embeds"][0]["color"].as_u64().unwrap(),
            0x36a64f_u64
        );
    }

    #[test]
    fn metadata_maps_to_inline_fields() {
        let msg = Message::info("Test", "Body").with_metadata("region", "eu-west-1");
        let payload = build_payload(&msg, None, None);
        let fields = payload["embeds"][0]["fields"].as_array().unwrap();
        assert_eq!(fields.len(), 1);
        assert_eq!(fields[0]["inline"], json!(true));
    }

    #[test]
    fn username_and_avatar_appear_at_top_level() {
        let msg = Message::info("Test", "Body");
        let payload = build_payload(
            &msg,
            Some("vapora-bot"),
            Some("https://example.com/avatar.png"),
        );
        assert_eq!(payload["username"], json!("vapora-bot"));
        assert_eq!(
            payload["avatar_url"],
            json!("https://example.com/avatar.png")
        );
    }
}
31  crates/vapora-channels/src/error.rs  Normal file
@ -0,0 +1,31 @@
use thiserror::Error;

#[derive(Error, Debug)]
pub enum ChannelError {
    #[error("HTTP request failed for channel '{channel}': {source}")]
    HttpError {
        channel: String,
        #[source]
        source: reqwest::Error,
    },

    #[error("Channel '{0}' not found in registry")]
    NotFound(String),

    #[error("Channel '{channel}' returned non-success status {status}: {body}")]
    ApiError {
        channel: String,
        status: u16,
        body: String,
    },

    #[error("Failed to build HTTP client: {0}")]
    HttpClientBuild(String),

    /// Raised when a `${VAR}` reference is present in config but the env var
    /// is not set and no `:-default` was provided.
    #[error("Secret reference '${{{0}}}' not resolved: env var not set and no default provided")]
    SecretNotFound(String),
}

pub type Result<T> = std::result::Result<T, ChannelError>;
54  crates/vapora-channels/src/lib.rs  Normal file
@ -0,0 +1,54 @@
//! Outbound notification channels for VAPORA.
//!
//! Delivers workflow events and agent completion signals to external team
//! communication platforms. All three providers use HTTP webhooks / Bot API —
//! no vendor SDKs are required.
//!
//! # Supported Channels
//!
//! - **Slack** — Incoming Webhooks (POST JSON with `attachments`)
//! - **Discord** — Incoming Webhooks (POST JSON with `embeds`, 204 response)
//! - **Telegram** — Bot API `sendMessage` with HTML `parse_mode`
//!
//! # Quick Start
//!
//! ```toml
//! # vapora.toml (under your [channels] section)
//! [channels.team-slack]
//! type = "slack"
//! webhook_url = "https://hooks.slack.com/services/…"
//!
//! [channels.ops-discord]
//! type = "discord"
//! webhook_url = "https://discord.com/api/webhooks/…"
//!
//! [channels.alerts]
//! type = "telegram"
//! bot_token = "123456:ABC-DEF…"
//! chat_id = "-1001234567890"
//! ```
//!
//! ```rust,ignore
//! let config: ChannelsConfig = toml::from_str(toml_str)?;
//! let registry = ChannelRegistry::from_config(config)?;
//!
//! registry.send("team-slack", Message::success(
//!     "Deploy complete",
//!     "v1.2.0 is live on production",
//! )).await?;
//! ```

pub mod channel;
pub mod config;
pub mod discord;
pub mod error;
pub mod message;
pub mod registry;
pub mod slack;
pub mod telegram;

pub use channel::NotificationChannel;
pub use config::{ChannelConfig, ChannelsConfig, DiscordConfig, SlackConfig, TelegramConfig};
pub use error::{ChannelError, Result};
pub use message::{Message, MessageLevel};
pub use registry::ChannelRegistry;
127  crates/vapora-channels/src/message.rs  Normal file
@ -0,0 +1,127 @@
use std::collections::HashMap;

use serde::{Deserialize, Serialize};

#[derive(Debug, Clone, Copy, PartialEq, Eq, Serialize, Deserialize)]
#[serde(rename_all = "lowercase")]
pub enum MessageLevel {
    Info,
    Success,
    Warning,
    Error,
}

impl MessageLevel {
    /// Slack attachment hex color string.
    pub fn slack_color(self) -> &'static str {
        match self {
            Self::Info => "#0099ff",
            Self::Success => "#36a64f",
            Self::Warning => "#ffcc00",
            Self::Error => "#cc0000",
        }
    }

    /// Discord embed color as 0xRRGGBB integer.
    pub fn discord_color(self) -> u32 {
        match self {
            Self::Info => 0x0099ff,
            Self::Success => 0x36a64f,
            Self::Warning => 0xffcc00,
            Self::Error => 0xcc0000,
        }
    }

    /// Unicode emoji prefix for plain-text formats.
    pub fn emoji(self) -> &'static str {
        match self {
            Self::Info => "ℹ️",
            Self::Success => "✅",
            Self::Warning => "⚠️",
            Self::Error => "🔴",
        }
    }
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct Message {
    pub title: String,
    pub body: String,
    pub level: MessageLevel,
    #[serde(default)]
    pub metadata: HashMap<String, String>,
}

impl Message {
    pub fn new(title: impl Into<String>, body: impl Into<String>, level: MessageLevel) -> Self {
        Self {
            title: title.into(),
            body: body.into(),
            level,
            metadata: HashMap::new(),
        }
    }

    pub fn info(title: impl Into<String>, body: impl Into<String>) -> Self {
        Self::new(title, body, MessageLevel::Info)
    }

    pub fn success(title: impl Into<String>, body: impl Into<String>) -> Self {
        Self::new(title, body, MessageLevel::Success)
    }

    pub fn warning(title: impl Into<String>, body: impl Into<String>) -> Self {
        Self::new(title, body, MessageLevel::Warning)
    }

    pub fn error(title: impl Into<String>, body: impl Into<String>) -> Self {
        Self::new(title, body, MessageLevel::Error)
    }

    pub fn with_metadata(mut self, key: impl Into<String>, value: impl Into<String>) -> Self {
        self.metadata.insert(key.into(), value.into());
        self
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn level_colors_are_distinct() {
        let levels = [
            MessageLevel::Info,
            MessageLevel::Success,
            MessageLevel::Warning,
            MessageLevel::Error,
        ];
        let mut slack_colors: Vec<_> = levels.iter().map(|l| l.slack_color()).collect();
        let mut discord_colors: Vec<_> = levels.iter().map(|l| l.discord_color()).collect();
        // All four Slack colors are unique. Sort before dedup: `dedup` only
        // removes consecutive duplicates.
        slack_colors.sort();
        slack_colors.dedup();
        assert_eq!(slack_colors.len(), 4);
        // All four Discord colors are unique.
        discord_colors.sort();
        discord_colors.dedup();
        assert_eq!(discord_colors.len(), 4);
    }

    #[test]
    fn constructors_set_correct_level() {
        assert_eq!(Message::info("t", "b").level, MessageLevel::Info);
        assert_eq!(Message::success("t", "b").level, MessageLevel::Success);
        assert_eq!(Message::warning("t", "b").level, MessageLevel::Warning);
        assert_eq!(Message::error("t", "b").level, MessageLevel::Error);
    }

    #[test]
    fn with_metadata_is_additive() {
        let msg = Message::info("t", "b")
            .with_metadata("env", "prod")
            .with_metadata("region", "eu-west-1");
        assert_eq!(msg.metadata.len(), 2);
        assert_eq!(msg.metadata["env"], "prod");
    }
}
122  crates/vapora-channels/src/registry.rs  Normal file
@ -0,0 +1,122 @@
use std::{collections::HashMap, sync::Arc, time::Duration};

use tracing::{error, instrument};

use crate::{
    channel::NotificationChannel,
    config::{ChannelConfig, ChannelsConfig},
    discord::DiscordChannel,
    error::{ChannelError, Result},
    message::Message,
    slack::SlackChannel,
    telegram::TelegramChannel,
};

/// Routes outbound notifications to named channels.
///
/// Each channel is addressed by the name given in the config (e.g.
/// `"team-slack"`, `"ops-discord"`). `send` delivers to one channel;
/// `broadcast` delivers to every registered channel in turn.
pub struct ChannelRegistry {
    channels: HashMap<String, Arc<dyn NotificationChannel>>,
}

impl ChannelRegistry {
    /// Creates an empty registry. Use `register` to add channels individually
    /// (e.g. in tests with pre-built clients).
    pub fn new() -> Self {
        Self {
            channels: HashMap::new(),
        }
    }

    /// Builds all channels from `config` with a shared `reqwest::Client`.
    ///
    /// The client is configured with a 10 s timeout and a `vapora-channels`
    /// User-Agent. Fails if the TLS backend cannot be initialised.
    pub fn from_config(config: ChannelsConfig) -> Result<Self> {
        let client = reqwest::Client::builder()
            .timeout(Duration::from_secs(10))
            .user_agent(concat!("vapora-channels/", env!("CARGO_PKG_VERSION")))
            .build()
            .map_err(|e| ChannelError::HttpClientBuild(e.to_string()))?;

        // Resolve ${VAR} references in every channel's credential fields
        // before constructing any channel; this is the single mandatory
        // call site.
        let config = config.resolve_secrets()?;

        let mut registry = Self::new();
        for (name, ch_config) in config.channels {
            let channel: Arc<dyn NotificationChannel> = match ch_config {
                ChannelConfig::Slack(cfg) => {
                    Arc::new(SlackChannel::new(name.clone(), cfg, client.clone()))
                }
                ChannelConfig::Discord(cfg) => {
                    Arc::new(DiscordChannel::new(name.clone(), cfg, client.clone()))
                }
                ChannelConfig::Telegram(cfg) => {
                    Arc::new(TelegramChannel::new(name.clone(), cfg, client.clone()))
                }
            };
            registry.channels.insert(name, channel);
        }
        Ok(registry)
    }

    /// Builds all channels from a flat map, creating a shared
    /// `reqwest::Client`.
    ///
    /// Equivalent to wrapping the map in `ChannelsConfig` and calling
    /// `from_config`. Use this when you hold the channel entries directly
    /// (e.g. from `WorkflowsConfig.channels`).
    pub fn from_map(channels: HashMap<String, ChannelConfig>) -> Result<Self> {
        Self::from_config(ChannelsConfig { channels })
    }

    /// Registers an already-constructed channel implementation.
    pub fn register(&mut self, channel: Arc<dyn NotificationChannel>) -> &mut Self {
        self.channels.insert(channel.name().to_string(), channel);
        self
    }

    /// Sends `msg` to a single channel identified by `name`.
    #[instrument(skip(self, msg), fields(channel = %name))]
    pub async fn send(&self, name: &str, msg: Message) -> Result<()> {
        let channel = self
            .channels
            .get(name)
            .ok_or_else(|| ChannelError::NotFound(name.to_string()))?;
        channel.send(&msg).await
    }

    /// Sends `msg` to every registered channel sequentially.
    ///
    /// Returns a `Vec` of `(channel_name, Result)`; failures do not abort
    /// delivery to remaining channels.
    pub async fn broadcast(&self, msg: Message) -> Vec<(String, Result<()>)> {
        let mut results = Vec::with_capacity(self.channels.len());
        for (name, channel) in &self.channels {
            let result = channel.send(&msg).await;
            if let Err(ref e) = result {
                error!(channel = %name, error = %e, "Broadcast delivery failed");
            }
            results.push((name.clone(), result));
        }
        results
    }

    /// Returns the names of all registered channels.
    pub fn channel_names(&self) -> Vec<&str> {
        self.channels.keys().map(String::as_str).collect()
    }

    pub fn is_empty(&self) -> bool {
        self.channels.is_empty()
    }
}

impl Default for ChannelRegistry {
    fn default() -> Self {
        Self::new()
    }
}
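Stripped of the async and HTTP machinery, the registry's name-keyed dispatch reduces to a `HashMap` of trait objects. A synchronous sketch of that pattern (the `Channel`, `Stdout`, and `Registry` names below are illustrative stand-ins, not the crate's types):

```rust
use std::collections::HashMap;

// A synchronous stand-in for the crate's async NotificationChannel trait.
trait Channel {
    fn name(&self) -> &str;
    fn send(&self, text: &str) -> Result<(), String>;
}

// A trivial implementation that writes to stdout.
struct Stdout(String);
impl Channel for Stdout {
    fn name(&self) -> &str {
        &self.0
    }
    fn send(&self, text: &str) -> Result<(), String> {
        println!("[{}] {}", self.0, text);
        Ok(())
    }
}

struct Registry {
    channels: HashMap<String, Box<dyn Channel>>,
}

impl Registry {
    // Key each channel by its own name, as ChannelRegistry::register does.
    fn register(&mut self, ch: Box<dyn Channel>) {
        self.channels.insert(ch.name().to_string(), ch);
    }
    // Look up by name; an unknown name is an error, not a panic.
    fn send(&self, name: &str, text: &str) -> Result<(), String> {
        self.channels
            .get(name)
            .ok_or_else(|| format!("channel '{name}' not found"))?
            .send(text)
    }
}

fn main() {
    let mut reg = Registry { channels: HashMap::new() };
    reg.register(Box::new(Stdout("team-slack".into())));
    assert!(reg.send("team-slack", "Deploy complete").is_ok());
    assert!(reg.send("missing", "x").is_err());
}
```

The trait-object map is what lets `broadcast` treat Slack, Discord, and Telegram uniformly; the crate's version only adds `Arc`, `async`, and structured errors on top.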
148  crates/vapora-channels/src/slack.rs  Normal file
@ -0,0 +1,148 @@
use async_trait::async_trait;
use reqwest::Client;
use serde_json::{json, Value};
use tracing::instrument;

use crate::{
    channel::NotificationChannel,
    config::SlackConfig,
    error::{ChannelError, Result},
    message::Message,
};

pub struct SlackChannel {
    name: String,
    config: SlackConfig,
    client: Client,
}

impl SlackChannel {
    pub fn new(name: impl Into<String>, config: SlackConfig, client: Client) -> Self {
        Self {
            name: name.into(),
            config,
            client,
        }
    }
}

/// Builds the Slack webhook JSON payload from a message.
///
/// Extracted as a free function so payload shape can be unit-tested without
/// mocking HTTP.
pub(crate) fn build_payload(
    msg: &Message,
    channel_override: Option<&str>,
    username_override: Option<&str>,
) -> Value {
    let fields: Vec<Value> = msg
        .metadata
        .iter()
        .map(|(k, v)| {
            json!({
                "title": k,
                "value": v,
                "short": true
            })
        })
        .collect();

    let mut payload = json!({
        "attachments": [{
            "color": msg.level.slack_color(),
            "title": msg.title,
            "text": msg.body,
            "footer": "vapora",
            "fields": fields
        }]
    });

    if let Some(ch) = channel_override {
        payload["channel"] = json!(ch);
    }
    if let Some(u) = username_override {
        payload["username"] = json!(u);
    }

    payload
}

#[async_trait]
impl NotificationChannel for SlackChannel {
    fn name(&self) -> &str {
        &self.name
    }

    #[instrument(skip(self, msg), fields(channel = %self.name))]
    async fn send(&self, msg: &Message) -> Result<()> {
        let payload = build_payload(
            msg,
            self.config.channel.as_deref(),
            self.config.username.as_deref(),
        );

        let resp = self
            .client
            .post(&self.config.webhook_url)
            .json(&payload)
            .send()
            .await
            .map_err(|e| ChannelError::HttpError {
                channel: self.name.clone(),
                source: e,
            })?;

        if !resp.status().is_success() {
            let status = resp.status().as_u16();
            let body = resp.text().await.unwrap_or_default();
            return Err(ChannelError::ApiError {
                channel: self.name.clone(),
                status,
                body,
            });
        }

        Ok(())
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn error_message_uses_red_color() {
        let msg = Message::error("Deploy failed", "Rollback triggered");
        let payload = build_payload(&msg, None, None);
        assert_eq!(
            payload["attachments"][0]["color"].as_str().unwrap(),
            "#cc0000"
        );
    }

    #[test]
    fn metadata_maps_to_fields_array() {
        let msg = Message::info("Test", "Body")
            .with_metadata("env", "production")
            .with_metadata("version", "1.2.0");
        let payload = build_payload(&msg, None, None);
        let fields = payload["attachments"][0]["fields"].as_array().unwrap();
        assert_eq!(fields.len(), 2);
    }

    #[test]
    fn channel_override_appears_at_top_level() {
        let msg = Message::info("Test", "Body");
        let payload = build_payload(&msg, Some("#alerts"), Some("vapora-bot"));
        assert_eq!(payload["channel"], json!("#alerts"));
        assert_eq!(payload["username"], json!("vapora-bot"));
    }

    #[test]
    fn no_overrides_leaves_keys_absent() {
        let msg = Message::info("Test", "Body");
        let payload = build_payload(&msg, None, None);
        assert!(payload.get("channel").is_none());
        assert!(payload.get("username").is_none());
    }
}
178  crates/vapora-channels/src/telegram.rs  Normal file
@ -0,0 +1,178 @@
use async_trait::async_trait;
use reqwest::Client;
use serde_json::{json, Value};
use tracing::instrument;

use crate::{
    channel::NotificationChannel,
    config::TelegramConfig,
    error::{ChannelError, Result},
    message::Message,
};

pub struct TelegramChannel {
    name: String,
    config: TelegramConfig,
    client: Client,
}

impl TelegramChannel {
    pub fn new(name: impl Into<String>, config: TelegramConfig, client: Client) -> Self {
        Self {
            name: name.into(),
            config,
            client,
        }
    }
}

/// Escapes the three characters that have meaning in Telegram HTML mode.
fn html_escape(s: &str) -> String {
    // Telegram HTML supports <b>, <i>, <code>, <pre>, <a>.
    // Only &, < and > need escaping.
    let mut out = String::with_capacity(s.len());
    for ch in s.chars() {
        match ch {
            '&' => out.push_str("&amp;"),
            '<' => out.push_str("&lt;"),
            '>' => out.push_str("&gt;"),
            other => out.push(other),
        }
    }
    out
}

/// Builds the Telegram sendMessage JSON body.
pub(crate) fn build_payload(msg: &Message, chat_id: &str) -> Value {
    let mut text = format!(
        "<b>{} {}</b>\n\n{}",
        msg.level.emoji(),
        html_escape(&msg.title),
        html_escape(&msg.body),
    );

    if !msg.metadata.is_empty() {
        text.push('\n');
        for (k, v) in &msg.metadata {
            text.push_str(&format!("\n<b>{}</b>: {}", html_escape(k), html_escape(v)));
        }
    }

    json!({
        "chat_id": chat_id,
        "text": text,
        "parse_mode": "HTML"
    })
}

#[async_trait]
impl NotificationChannel for TelegramChannel {
    fn name(&self) -> &str {
        &self.name
    }

    #[instrument(skip(self, msg), fields(channel = %self.name))]
    async fn send(&self, msg: &Message) -> Result<()> {
        let payload = build_payload(msg, &self.config.chat_id);

        let resp = self
            .client
            .post(self.config.api_url())
            .json(&payload)
            .send()
            .await
            .map_err(|e| ChannelError::HttpError {
                channel: self.name.clone(),
                source: e,
            })?;

        if !resp.status().is_success() {
            let status = resp.status().as_u16();
            let body = resp.text().await.unwrap_or_default();
            return Err(ChannelError::ApiError {
                channel: self.name.clone(),
                status,
                body,
            });
        }

        Ok(())
    }
}
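The char-by-char match in `html_escape` above side-steps a classic pitfall of chained `str::replace` calls: once `<` has been rewritten to `&lt;`, a later pass over `&` corrupts the entity that was just inserted. A standalone sketch of the failure mode (the `naive_escape` helper is illustrative, not part of the crate):

```rust
// Chained replaces in this order double-escape: the final '&' pass rewrites
// the '&' that the '<' and '>' passes just inserted, so "&lt;" becomes
// "&amp;lt;". A single pass over chars (as in html_escape) cannot do this.
fn naive_escape(s: &str) -> String {
    s.replace('<', "&lt;").replace('>', "&gt;").replace('&', "&amp;")
}

fn main() {
    // html_escape("<b>") returns "&lt;b&gt;"; the chained version corrupts it.
    assert_eq!(naive_escape("<b>"), "&amp;lt;b&amp;gt;");
    println!("{}", naive_escape("<b>"));
}
```

Replacing `&` first would also be correct; the single-pass match simply makes ordering a non-issue.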
#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn html_escape_handles_all_three_special_chars() {
        assert_eq!(html_escape("a < b & c > d"), "a &lt; b &amp; c &gt; d");
    }

    #[test]
    fn html_escape_is_noop_for_plain_text() {
        let plain = "Deploy complete: v1.2.0 to production";
        assert_eq!(html_escape(plain), plain);
    }

    #[test]
    fn payload_uses_html_parse_mode() {
        let msg = Message::info("Test", "Body");
        let payload = build_payload(&msg, "-100123456");
        assert_eq!(payload["parse_mode"], json!("HTML"));
        assert_eq!(payload["chat_id"], json!("-100123456"));
    }

    #[test]
    fn payload_contains_emoji_and_bold_title() {
        let msg = Message::error("Service down", "Health check failed");
        let payload = build_payload(&msg, "-100");
        let text = payload["text"].as_str().unwrap();
        assert!(text.contains("🔴"));
        assert!(text.contains("<b>"));
        assert!(text.contains("Service down"));
    }

    #[test]
    fn payload_escapes_html_in_title_and_body() {
        let msg = Message::warning("a < b", "x & y > z");
        let payload = build_payload(&msg, "-100");
        let text = payload["text"].as_str().unwrap();
        assert!(text.contains("a &lt; b"));
        assert!(text.contains("x &amp; y &gt; z"));
    }

    #[test]
    fn metadata_appended_as_bold_key_value_lines() {
        let msg = Message::info("Test", "Body").with_metadata("env", "prod");
        let payload = build_payload(&msg, "-100");
        let text = payload["text"].as_str().unwrap();
        assert!(text.contains("<b>env</b>: prod"));
    }

    #[test]
    fn api_url_default_base() {
        let cfg = TelegramConfig {
            bot_token: "123:ABC".to_string(),
            chat_id: "-100".to_string(),
            api_base: None,
        };
        assert_eq!(
            cfg.api_url(),
            "https://api.telegram.org/bot123:ABC/sendMessage"
        );
    }

    #[test]
    fn api_url_custom_base_for_testing() {
        let cfg = TelegramConfig {
            bot_token: "123:ABC".to_string(),
            chat_id: "-100".to_string(),
            api_base: Some("http://localhost:8080".to_string()),
        };
        assert_eq!(
            cfg.api_url(),
            "http://localhost:8080/bot123:ABC/sendMessage"
        );
    }
}
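The two `api_url` tests above pin down the URL shape: the optional `api_base` replaces only the host, and the `bot<token>/sendMessage` suffix is fixed. A self-contained sketch of that construction (this `TelegramConfig` is a reduced stand-in mirroring the crate's fields, and the default base is inferred from the test expectations):

```rust
// Stand-in for vapora_channels::config::TelegramConfig, reduced to the two
// fields api_url() needs; api_base lets tests point at a local mock server.
struct TelegramConfig {
    bot_token: String,
    api_base: Option<String>,
}

impl TelegramConfig {
    fn api_url(&self) -> String {
        let base = self.api_base.as_deref().unwrap_or("https://api.telegram.org");
        format!("{}/bot{}/sendMessage", base, self.bot_token)
    }
}

fn main() {
    let cfg = TelegramConfig { bot_token: "123:ABC".into(), api_base: None };
    assert_eq!(cfg.api_url(), "https://api.telegram.org/bot123:ABC/sendMessage");

    let local = TelegramConfig {
        bot_token: "123:ABC".into(),
        api_base: Some("http://localhost:8080".into()),
    };
    assert_eq!(local.api_url(), "http://localhost:8080/bot123:ABC/sendMessage");
}
```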
285  crates/vapora-channels/tests/integration.rs  Normal file
@@ -0,0 +1,285 @@
use std::sync::Arc;

use async_trait::async_trait;
use vapora_channels::{
    config::{DiscordConfig, SlackConfig, TelegramConfig},
    discord::DiscordChannel,
    error::ChannelError,
    message::Message,
    registry::ChannelRegistry,
    slack::SlackChannel,
    telegram::TelegramChannel,
    NotificationChannel, Result,
};
use wiremock::{
    matchers::{method, path},
    Mock, MockServer, ResponseTemplate,
};

// ── Slack ─────────────────────────────────────────────────────────────────────

#[tokio::test]
async fn slack_send_returns_ok_on_200() {
    let server = MockServer::start().await;

    Mock::given(method("POST"))
        .and(path("/hooks/slack"))
        .respond_with(ResponseTemplate::new(200).set_body_string("ok"))
        .mount(&server)
        .await;

    let cfg = SlackConfig {
        webhook_url: format!("{}/hooks/slack", server.uri()),
        channel: None,
        username: None,
    };
    let channel = SlackChannel::new("slack", cfg, reqwest::Client::new());
    let msg = Message::success("Deploy complete", "v1.2.0 → production");

    channel.send(&msg).await.expect("should succeed on 200");
}

#[tokio::test]
async fn slack_send_returns_api_error_on_500() {
    let server = MockServer::start().await;

    Mock::given(method("POST"))
        .and(path("/hooks/slack"))
        .respond_with(ResponseTemplate::new(500).set_body_string("internal_error"))
        .mount(&server)
        .await;

    let cfg = SlackConfig {
        webhook_url: format!("{}/hooks/slack", server.uri()),
        channel: None,
        username: None,
    };
    let channel = SlackChannel::new("slack", cfg, reqwest::Client::new());
    let err = channel
        .send(&Message::info("Test", "Body"))
        .await
        .unwrap_err();

    assert!(
        matches!(
            err,
            ChannelError::ApiError {
                status: 500,
                ref body,
                ..
            } if body == "internal_error"
        ),
        "unexpected error variant: {err}"
    );
}

// ── Discord ───────────────────────────────────────────────────────────────────

#[tokio::test]
async fn discord_send_returns_ok_on_204() {
    let server = MockServer::start().await;

    Mock::given(method("POST"))
        .and(path("/webhooks/discord"))
        .respond_with(ResponseTemplate::new(204))
        .mount(&server)
        .await;

    let cfg = DiscordConfig {
        webhook_url: format!("{}/webhooks/discord", server.uri()),
        username: None,
        avatar_url: None,
    };
    let channel = DiscordChannel::new("discord", cfg, reqwest::Client::new());

    channel
        .send(&Message::warning("High latency", "p99 > 500 ms"))
        .await
        .expect("should succeed on 204");
}

#[tokio::test]
async fn discord_send_returns_api_error_on_400() {
    let server = MockServer::start().await;

    Mock::given(method("POST"))
        .and(path("/webhooks/discord"))
        .respond_with(ResponseTemplate::new(400).set_body_string("{\"code\":50006}"))
        .mount(&server)
        .await;

    let cfg = DiscordConfig {
        webhook_url: format!("{}/webhooks/discord", server.uri()),
        username: None,
        avatar_url: None,
    };
    let channel = DiscordChannel::new("discord", cfg, reqwest::Client::new());
    let err = channel
        .send(&Message::info("Test", "Body"))
        .await
        .unwrap_err();

    assert!(
        matches!(err, ChannelError::ApiError { status: 400, .. }),
        "unexpected error variant: {err}"
    );
}

// ── Telegram ──────────────────────────────────────────────────────────────────

#[tokio::test]
async fn telegram_send_returns_ok_on_200() {
    let server = MockServer::start().await;

    // Telegram returns {"ok": true, "result": {...}} with HTTP 200.
    Mock::given(method("POST"))
        .and(path("/botTEST_TOKEN/sendMessage"))
        .respond_with(
            ResponseTemplate::new(200).set_body_string(r#"{"ok":true,"result":{"message_id":1}}"#),
        )
        .mount(&server)
        .await;

    let cfg = TelegramConfig {
        bot_token: "TEST_TOKEN".to_string(),
        chat_id: "-100999".to_string(),
        api_base: Some(server.uri()),
    };
    let channel = TelegramChannel::new("telegram", cfg, reqwest::Client::new());

    channel
        .send(&Message::error("Service down", "Critical alert"))
        .await
        .expect("should succeed on 200");
}

#[tokio::test]
async fn telegram_send_returns_api_error_on_400() {
    let server = MockServer::start().await;

    Mock::given(method("POST"))
        .and(path("/botBAD_TOKEN/sendMessage"))
        .respond_with(
            ResponseTemplate::new(400)
                .set_body_string(r#"{"ok":false,"description":"Unauthorized"}"#),
        )
        .mount(&server)
        .await;

    let cfg = TelegramConfig {
        bot_token: "BAD_TOKEN".to_string(),
        chat_id: "-100".to_string(),
        api_base: Some(server.uri()),
    };
    let channel = TelegramChannel::new("telegram", cfg, reqwest::Client::new());
    let err = channel
        .send(&Message::info("Test", "Body"))
        .await
        .unwrap_err();

    assert!(
        matches!(err, ChannelError::ApiError { status: 400, .. }),
        "unexpected error variant: {err}"
    );
}

// ── Registry ──────────────────────────────────────────────────────────────────

struct AlwaysOkChannel {
    name: String,
}

#[async_trait]
impl NotificationChannel for AlwaysOkChannel {
    fn name(&self) -> &str {
        &self.name
    }
    async fn send(&self, _msg: &Message) -> Result<()> {
        Ok(())
    }
}

struct AlwaysFailChannel {
    name: String,
}

#[async_trait]
impl NotificationChannel for AlwaysFailChannel {
    fn name(&self) -> &str {
        &self.name
    }
    async fn send(&self, _msg: &Message) -> Result<()> {
        Err(ChannelError::ApiError {
            channel: self.name.clone(),
            status: 503,
            body: "unavailable".to_string(),
        })
    }
}

#[tokio::test]
async fn registry_send_routes_to_named_channel() {
    let mut registry = ChannelRegistry::new();
    registry.register(Arc::new(AlwaysOkChannel {
        name: "ok-channel".to_string(),
    }));

    registry
        .send("ok-channel", Message::info("Test", "Body"))
        .await
        .expect("should route to ok-channel and succeed");
}

#[tokio::test]
async fn registry_send_returns_not_found_for_unknown_channel() {
    let registry = ChannelRegistry::new();
    let err = registry
        .send("does-not-exist", Message::info("Test", "Body"))
        .await
        .unwrap_err();

    assert!(
        matches!(err, ChannelError::NotFound(ref n) if n == "does-not-exist"),
        "unexpected error: {err}"
    );
}

#[tokio::test]
async fn registry_broadcast_delivers_to_all_channels() {
    let mut registry = ChannelRegistry::new();
    registry.register(Arc::new(AlwaysOkChannel {
        name: "ch-a".to_string(),
    }));
    registry.register(Arc::new(AlwaysOkChannel {
        name: "ch-b".to_string(),
    }));

    let results = registry
        .broadcast(Message::success("All systems green", ""))
        .await;

    assert_eq!(results.len(), 2);
    assert!(results.iter().all(|(_, r)| r.is_ok()));
}

#[tokio::test]
async fn registry_broadcast_continues_after_partial_failure() {
    let mut registry = ChannelRegistry::new();
    registry.register(Arc::new(AlwaysOkChannel {
        name: "good".to_string(),
    }));
    registry.register(Arc::new(AlwaysFailChannel {
        name: "bad".to_string(),
    }));

    let results = registry.broadcast(Message::info("Test", "Body")).await;

    assert_eq!(results.len(), 2);
    let ok_count = results.iter().filter(|(_, r)| r.is_ok()).count();
    let err_count = results.iter().filter(|(_, r)| r.is_err()).count();
    assert_eq!(ok_count, 1);
    assert_eq!(err_count, 1);
}
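The two broadcast tests encode the key design choice: `broadcast` collects a `(name, Result)` pair per channel instead of returning early, so one failing webhook cannot suppress delivery to the others. A minimal synchronous sketch of that collect-don't-short-circuit loop (types here are illustrative, not the crate's):

```rust
// Each "channel" is a plain function pointer standing in for a send() call;
// broadcast runs all of them and keeps every per-channel outcome rather
// than propagating the first error with `?`.
fn broadcast(
    channels: &[(&str, fn() -> Result<(), String>)],
) -> Vec<(String, Result<(), String>)> {
    channels
        .iter()
        .map(|(name, send)| (name.to_string(), send()))
        .collect()
}

fn ok() -> Result<(), String> { Ok(()) }
fn fail() -> Result<(), String> { Err("unavailable".to_string()) }

fn main() {
    let channels: Vec<(&str, fn() -> Result<(), String>)> =
        vec![("good", ok), ("bad", fail)];
    let results = broadcast(&channels);
    assert_eq!(results.len(), 2);
    assert_eq!(results.iter().filter(|(_, r)| r.is_ok()).count(), 1);
    assert_eq!(results.iter().filter(|(_, r)| r.is_err()).count(), 1);
}
```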
@@ -12,6 +12,7 @@ categories.workspace = true

[dependencies]
vapora-shared = { workspace = true }
+vapora-channels = { workspace = true }
vapora-swarm = { workspace = true }
vapora-agents = { workspace = true }
vapora-knowledge-graph = { workspace = true }
@@ -50,6 +51,10 @@ surrealdb = { workspace = true }
# Authorization
cedar-policy = "4.9"
+
+# Scheduling
+cron = "0.12"
+chrono-tz = "0.10"

[dev-dependencies]
mockall = { workspace = true }
wiremock = { workspace = true }
@@ -1,6 +1,8 @@
use std::path::Path;
+use std::str::FromStr;

use serde::{Deserialize, Serialize};
+use vapora_channels::config::ChannelConfig;

use crate::error::{ConfigError, Result};

@@ -8,6 +10,10 @@ use crate::error::{ConfigError, Result};
pub struct WorkflowsConfig {
    pub engine: EngineConfig,
    pub workflows: Vec<WorkflowConfig>,
+    /// Outbound notification channels keyed by name. Absent from TOML → no
+    /// notifications sent. Each entry becomes a channel in `ChannelRegistry`.
+    #[serde(default)]
+    pub channels: std::collections::HashMap<String, ChannelConfig>,
}

#[derive(Debug, Clone, Deserialize)]
@@ -19,11 +25,62 @@ pub struct EngineConfig {
    pub cedar_policy_dir: Option<String>,
}

+/// Per-workflow notification targets, keyed by event type.
+///
+/// Each field is a list of channel names registered in `[channels]`.
+///
+/// ```toml
+/// [[workflows]]
+/// name = "deploy-prod"
+/// trigger = "schedule"
+///
+/// [workflows.notifications]
+/// on_completed = ["team-slack"]
+/// on_failed = ["team-slack", "ops-telegram"]
+/// on_approval_required = ["team-slack"]
+/// ```
+#[derive(Debug, Clone, Default, Serialize, Deserialize)]
+pub struct WorkflowNotifications {
+    #[serde(default)]
+    pub on_completed: Vec<String>,
+    #[serde(default)]
+    pub on_failed: Vec<String>,
+    #[serde(default)]
+    pub on_approval_required: Vec<String>,
+}
+
#[derive(Debug, Clone, Deserialize)]
pub struct WorkflowConfig {
    pub name: String,
    pub trigger: String,
    pub stages: Vec<StageConfig>,
+    #[serde(default)]
+    pub schedule: Option<ScheduleConfig>,
+    #[serde(default)]
+    pub notifications: WorkflowNotifications,
+}
+
+/// Cron-based scheduling configuration for `trigger = "schedule"` workflows.
+#[derive(Debug, Clone, Deserialize)]
+pub struct ScheduleConfig {
+    /// 5-field standard cron (`min hour dom month dow`) or 7-field cron crate
+    /// format (`sec min hour dom month dow year`).
+    pub cron: String,
+    /// IANA timezone identifier for cron evaluation (e.g.
+    /// `"America/New_York"`). Defaults to UTC when absent.
+    #[serde(default)]
+    pub timezone: Option<String>,
+    /// Allow a new instance to start while a previous one is still running.
+    #[serde(default)]
+    pub allow_concurrent: bool,
+    /// Fire all missed slots on restart (capped at 10). When false, only the
+    /// next slot is fired and missed slots are discarded.
+    #[serde(default)]
+    pub catch_up: bool,
+    /// Optional JSON object merged into each triggered workflow's initial
+    /// context.
+    #[serde(default)]
+    pub initial_context: Option<serde_json::Value>,
}

#[derive(Debug, Clone, Serialize, Deserialize)]
@@ -52,7 +109,7 @@ impl WorkflowsConfig {
        Ok(config)
    }

-    fn validate(&self) -> Result<()> {
+    pub fn validate(&self) -> Result<()> {
        if self.workflows.is_empty() {
            return Err(ConfigError::Invalid("No workflows defined".to_string()).into());
        }
@@ -75,6 +132,13 @@ impl WorkflowsConfig {
                    .into());
                }
            }
+
+            // Trigger/schedule alignment: "schedule" requires a [schedule] block
+            // and a parseable cron expression (fail-fast rather than silently
+            // skipping at runtime).
+            if workflow.trigger == "schedule" {
+                validate_schedule_config(&workflow.name, &workflow.schedule)?;
+            }
        }

        Ok(())
@@ -85,6 +149,44 @@ impl WorkflowsConfig {
    }
}

+fn validate_schedule_config(
+    workflow_name: &str,
+    schedule: &Option<ScheduleConfig>,
+) -> std::result::Result<(), ConfigError> {
+    let sc = schedule.as_ref().ok_or_else(|| {
+        ConfigError::Invalid(format!(
+            "Workflow '{}' has trigger = \"schedule\" but no [schedule] block",
+            workflow_name
+        ))
+    })?;
+
+    // Validate cron expression (inline normalisation avoids circular dep with
+    // schedule.rs).
+    let normalized = match sc.cron.split_whitespace().count() {
+        5 => format!("0 {} *", sc.cron),
+        6 => format!("{} *", sc.cron),
+        _ => sc.cron.clone(),
+    };
+    cron::Schedule::from_str(&normalized).map_err(|e| {
+        ConfigError::Invalid(format!(
+            "Workflow '{}' has invalid cron '{}': {}",
+            workflow_name, sc.cron, e
+        ))
+    })?;
+
+    // Validate timezone when provided.
+    if let Some(tz) = &sc.timezone {
+        tz.parse::<chrono_tz::Tz>().map_err(|_| {
+            ConfigError::Invalid(format!(
+                "Workflow '{}' has invalid timezone '{}': not a valid IANA identifier",
+                workflow_name, tz
+            ))
+        })?;
+    }
+
+    Ok(())
+}
+
#[cfg(test)]
mod tests {
    use super::*;
@@ -136,6 +238,7 @@ approval_required = false
            cedar_policy_dir: None,
        },
        workflows: vec![],
+        channels: std::collections::HashMap::new(),
    };

    assert!(config.validate().is_err());
@@ -190,4 +293,89 @@ agents = ["agent2"]
        assert!(config.get_workflow("workflow_b").is_some());
        assert!(config.get_workflow("nonexistent").is_none());
    }
+
+    #[test]
+    fn test_schedule_trigger_valid() {
+        let toml_str = r#"
+[engine]
+max_parallel_tasks = 4
+workflow_timeout = 3600
+approval_gates_enabled = false
+
+[[workflows]]
+name = "nightly_analysis"
+trigger = "schedule"
+
+[workflows.schedule]
+cron = "0 2 * * *"
+allow_concurrent = false
+catch_up = false
+
+[[workflows.stages]]
+name = "analyze"
+agents = ["analyst"]
+"#;
+
+        let config: WorkflowsConfig = toml::from_str(toml_str).unwrap();
+        assert!(
+            config.validate().is_ok(),
+            "Valid schedule config should pass"
+        );
+        let wf = config.get_workflow("nightly_analysis").unwrap();
+        let sc = wf.schedule.as_ref().unwrap();
+        assert_eq!(sc.cron, "0 2 * * *");
+        assert!(!sc.allow_concurrent);
+        assert!(!sc.catch_up);
+    }
+
+    #[test]
+    fn test_schedule_trigger_missing_block() {
+        let toml_str = r#"
+[engine]
+max_parallel_tasks = 4
+workflow_timeout = 3600
+approval_gates_enabled = false
+
+[[workflows]]
+name = "no_schedule_block"
+trigger = "schedule"
+
+[[workflows.stages]]
+name = "work"
+agents = ["worker"]
+"#;
+
+        let config: WorkflowsConfig = toml::from_str(toml_str).unwrap();
+        assert!(
+            config.validate().is_err(),
+            "schedule trigger without [schedule] block must fail validation"
+        );
+    }
+
+    #[test]
+    fn test_schedule_trigger_invalid_cron() {
+        let toml_str = r#"
+[engine]
+max_parallel_tasks = 4
+workflow_timeout = 3600
+approval_gates_enabled = false
+
+[[workflows]]
+name = "bad_cron"
+trigger = "schedule"
+
+[workflows.schedule]
+cron = "not a valid cron expression"
+
+[[workflows.stages]]
+name = "work"
+agents = ["worker"]
+"#;
+
+        let config: WorkflowsConfig = toml::from_str(toml_str).unwrap();
+        assert!(
+            config.validate().is_err(),
+            "Invalid cron expression must fail validation"
+        );
+    }
}
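The inline normalisation in `validate_schedule_config` pads a 5- or 6-field expression to the 7-field form the `cron` crate parses: a leading seconds field and a trailing year field. As a standalone sketch of just that padding step (`normalize_cron` is an illustrative name; the real code inlines the match):

```rust
// Pads standard 5-field cron (min hour dom month dow) with seconds ("0")
// in front and year ("*") at the end; 6-field input only gains the year;
// anything else is passed through unchanged for the parser to accept or
// reject.
fn normalize_cron(expr: &str) -> String {
    match expr.split_whitespace().count() {
        5 => format!("0 {} *", expr),
        6 => format!("{} *", expr),
        _ => expr.to_string(),
    }
}

fn main() {
    assert_eq!(normalize_cron("0 2 * * *"), "0 0 2 * * * *");
    assert_eq!(normalize_cron("30 0 2 * * *"), "30 0 2 * * * *");
    // Already 7 fields: untouched.
    assert_eq!(normalize_cron("0 0 2 * * * *"), "0 0 2 * * * *");
}
```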
@@ -49,6 +49,18 @@ pub enum WorkflowError {

    #[error("Internal error: {0}")]
    Internal(String),
+
+    #[error("Schedule error: {0}")]
+    Schedule(#[from] ScheduleError),
+}
+
+#[derive(Error, Debug)]
+pub enum ScheduleError {
+    #[error("Invalid cron expression '{expr}': {reason}")]
+    InvalidCron { expr: String, reason: String },
+
+    #[error("Schedule not found: {0}")]
+    NotFound(String),
}

#[derive(Error, Debug)]
@@ -226,6 +226,8 @@ mod tests {
                compensation_agents: None,
            },
        ],
+        schedule: None,
+        notifications: Default::default(),
    }
}
@@ -31,15 +31,25 @@ pub mod metrics;
pub mod orchestrator;
pub mod persistence;
pub mod saga;
+pub mod schedule;
+pub mod schedule_store;
+pub mod scheduler;
pub mod stage;

pub use artifact::{Artifact, ArtifactType};
pub use auth::CedarAuthorizer;
-pub use config::{EngineConfig, StageConfig, WorkflowConfig, WorkflowsConfig};
+pub use config::{EngineConfig, ScheduleConfig, StageConfig, WorkflowConfig, WorkflowsConfig};
-pub use error::{ConfigError, Result, WorkflowError};
+pub use error::{ConfigError, Result, ScheduleError, WorkflowError};
pub use instance::{WorkflowInstance, WorkflowStatus};
pub use metrics::WorkflowMetrics;
pub use orchestrator::WorkflowOrchestrator;
pub use persistence::SurrealWorkflowStore;
pub use saga::SagaCompensator;
+pub use schedule::{
+    compute_next_fire_after, compute_next_fire_after_tz, compute_next_fire_at,
+    compute_next_fire_at_tz, validate_cron_expression, validate_timezone, RunStatus, ScheduleRun,
+    ScheduledWorkflow,
+};
+pub use schedule_store::ScheduleStore;
+pub use scheduler::WorkflowScheduler;
pub use stage::{StageState, StageStatus, TaskState, TaskStatus};
@@ -10,6 +10,11 @@ pub struct WorkflowMetrics {
    pub active_workflows: IntGauge,
    pub stage_duration_seconds: Histogram,
    pub workflow_duration_seconds: Histogram,
+    // Scheduling subsystem
+    pub schedules_fired: Counter,
+    pub schedules_skipped: Counter,
+    pub schedules_failed: Counter,
+    pub active_schedules: IntGauge,
}

impl WorkflowMetrics {
@@ -46,6 +51,22 @@ impl WorkflowMetrics {
                )
                .buckets(vec![60.0, 300.0, 600.0, 1800.0, 3600.0]),
            )?,
+            schedules_fired: register_counter!(
+                "vapora_schedules_fired_total",
+                "Total schedule fires that launched a workflow"
+            )?,
+            schedules_skipped: register_counter!(
+                "vapora_schedules_skipped_total",
+                "Total schedule fires skipped due to allow_concurrent=false"
+            )?,
+            schedules_failed: register_counter!(
+                "vapora_schedules_failed_total",
+                "Total schedule fires that failed to start a workflow"
+            )?,
+            active_schedules: register_int_gauge!(
+                "vapora_active_schedules",
+                "Number of enabled scheduled workflows"
+            )?,
        })
    }

@@ -57,6 +78,10 @@ impl WorkflowMetrics {
        registry.register(Box::new(self.active_workflows.clone()))?;
        registry.register(Box::new(self.stage_duration_seconds.clone()))?;
        registry.register(Box::new(self.workflow_duration_seconds.clone()))?;
+        registry.register(Box::new(self.schedules_fired.clone()))?;
+        registry.register(Box::new(self.schedules_skipped.clone()))?;
+        registry.register(Box::new(self.schedules_failed.clone()))?;
+        registry.register(Box::new(self.active_schedules.clone()))?;
        Ok(())
    }
}
@@ -6,8 +6,10 @@ use futures::StreamExt;
 use serde_json::Value;
 use surrealdb::engine::remote::ws::Client;
 use surrealdb::Surreal;
+use tokio::sync::watch;
 use tracing::{debug, error, info, warn};
 use vapora_agents::messages::{AgentMessage, TaskCompleted, TaskFailed};
+use vapora_channels::{ChannelRegistry, Message};
 use vapora_knowledge_graph::persistence::KGPersistence;
 use vapora_swarm::coordinator::SwarmCoordinator;

@@ -19,6 +21,9 @@ use crate::instance::{WorkflowInstance, WorkflowStatus};
 use crate::metrics::WorkflowMetrics;
 use crate::persistence::SurrealWorkflowStore;
 use crate::saga::SagaCompensator;
+use crate::schedule::ScheduledWorkflow;
+use crate::schedule_store::ScheduleStore;
+use crate::scheduler::WorkflowScheduler;
 use crate::stage::{StageState, StageStatus, TaskState};

 pub struct WorkflowOrchestrator {
@@ -32,6 +37,8 @@ pub struct WorkflowOrchestrator {
     store: Arc<SurrealWorkflowStore>,
     saga: SagaCompensator,
     cedar: Option<Arc<CedarAuthorizer>>,
+    /// Outbound notification registry. `None` when no channels are configured.
+    channels: Option<Arc<ChannelRegistry>>,
 }

 impl WorkflowOrchestrator {
@@ -64,6 +71,16 @@ impl WorkflowOrchestrator {
             info!("Cedar authorization enabled for workflow stages");
         }
+
+        let channels = if config.channels.is_empty() {
+            None
+        } else {
+            let count = config.channels.len();
+            let registry = ChannelRegistry::from_map(config.channels.clone())
+                .map_err(|e| WorkflowError::Internal(format!("Channel registry init: {e}")))?;
+            info!(count, "Notification channels registered");
+            Some(Arc::new(registry))
+        };

         // Crash recovery: restore active workflows from DB
         let active_workflows = DashMap::new();
         match store.load_active().await {
@@ -92,6 +109,7 @@ impl WorkflowOrchestrator {
             store,
             saga,
             cedar,
+            channels,
         })
     }

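The constructor above builds the `ChannelRegistry` lazily from `config.channels`, and the notification paths (further down) read per-workflow target lists. A TOML shape consistent with that code might look as follows — the section names (`channels`, `workflows.notifications`, `workflows.schedule`) follow identifiers visible in this diff, but the per-channel keys (`kind`, `webhook_url`) are illustrative assumptions:

```toml
# Hypothetical config sketch — per-channel keys are assumptions, not
# confirmed by this diff. `${VAR}` secrets are resolved by vapora-channels.
[channels.team-slack]
kind = "slack"
webhook_url = "${SLACK_WEBHOOK_URL}"

[[workflows]]
name = "nightly-deploy"
trigger = "schedule"

[workflows.schedule]
cron = "0 3 * * *"
timezone = "Europe/Madrid"
allow_concurrent = false
catch_up = false

[workflows.notifications]
on_completed = ["team-slack"]
on_failed = ["team-slack"]
on_approval_required = ["team-slack"]
```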
@@ -384,13 +402,16 @@ impl WorkflowOrchestrator {
         if should_continue {
             self.execute_current_stage(workflow_id).await?;
         } else {
-            let duration = {
+            let (duration, template_name) = {
                 let instance = self
                     .active_workflows
                     .get(workflow_id)
                     .ok_or_else(|| WorkflowError::WorkflowNotFound(workflow_id.to_string()))?;

-                (Utc::now() - instance.created_at).num_seconds() as f64
+                (
+                    (Utc::now() - instance.created_at).num_seconds() as f64,
+                    instance.template_name.clone(),
+                )
             };

             self.metrics.workflow_duration_seconds.observe(duration);
@@ -404,6 +425,8 @@ impl WorkflowOrchestrator {
             );

             self.publish_workflow_completed(workflow_id).await?;
+            self.notify_workflow_completed(workflow_id, &template_name, duration)
+                .await;

             // Remove from DB — terminal state is cleaned up
             if let Err(e) = self.store.delete(workflow_id).await {
@@ -546,7 +569,7 @@ impl WorkflowOrchestrator {
         // `mark_current_task_failed` encapsulates the mutable stage borrow so
         // the DashMap entry can be re-accessed without nesting or borrow
         // conflicts.
-        let compensation_data: Option<(Vec<StageState>, Value, String)> = {
+        let compensation_data: Option<(Vec<StageState>, Value, String, String)> = {
             let mut instance = self
                 .active_workflows
                 .get_mut(&workflow_id)
@@ -572,6 +595,7 @@ impl WorkflowOrchestrator {
                 let current_idx = instance.current_stage_idx;
                 let executed_stages = instance.stages[..current_idx].to_vec();
                 let context = instance.initial_context.clone();
+                let template_name = instance.template_name.clone();

                 instance.fail(format!("Stage {} failed: {}", stage_name, msg.error));

@@ -585,12 +609,12 @@ impl WorkflowOrchestrator {
                     "Workflow failed"
                 );

-                Some((executed_stages, context, stage_name))
+                Some((executed_stages, context, stage_name, template_name))
             }
         }
         }; // DashMap lock released here

-        if let Some((executed_stages, context, _stage_name)) = compensation_data {
+        if let Some((executed_stages, context, stage_name, template_name)) = compensation_data {
             // Saga compensation: dispatch rollback tasks in reverse order (best-effort)
             self.saga
                 .compensate(&workflow_id, &executed_stages, &context)
@@ -600,6 +624,9 @@ impl WorkflowOrchestrator {
             if let Some(instance) = self.active_workflows.get(&workflow_id) {
                 self.store.save(instance.value()).await?;
             }
+
+            self.notify_workflow_failed(&workflow_id, &template_name, &stage_name, &msg.error)
+                .await;
         }

         Ok(())
@@ -627,6 +654,15 @@ impl WorkflowOrchestrator {
             "Approval request published"
         );
+
+        let template_name = self
+            .active_workflows
+            .get(workflow_id)
+            .map(|e| e.template_name.clone())
+            .unwrap_or_default();
+
+        self.notify_approval_required(workflow_id, &template_name, stage_name)
+            .await;

         Ok(())
     }

@@ -653,6 +689,156 @@ impl WorkflowOrchestrator {
         Ok(())
     }
+
+    /// Returns true if any non-terminal workflow instance is running for the
+    /// given template name. Used by `WorkflowScheduler` to enforce
+    /// `allow_concurrent = false`.
+    pub fn has_active_workflow_for_template(&self, template_name: &str) -> bool {
+        self.active_workflows
+            .iter()
+            .any(|e| e.value().template_name == template_name)
+    }
+
+    /// Build a `WorkflowScheduler` seeded from the TOML-configured scheduled
+    /// workflows.
+    ///
+    /// TOML is the source of truth for static config; the DB owns runtime state
+    /// (`last_fired_at`, `runs_count`, `next_fire_at`). The returned
+    /// `watch::Sender<bool>` drives graceful shutdown: `sender.send(true)`
+    /// terminates the scheduler loop.
+    pub async fn build_scheduler(
+        self: Arc<Self>,
+        db: Arc<Surreal<Client>>,
+    ) -> Result<(WorkflowScheduler, watch::Sender<bool>)> {
+        let store = Arc::new(ScheduleStore::new(Arc::clone(&db)));
+
+        for wf in self
+            .config
+            .workflows
+            .iter()
+            .filter(|w| w.trigger == "schedule")
+        {
+            if let Some(sc) = &wf.schedule {
+                let entry = ScheduledWorkflow::from_config(&wf.name, sc);
+                store.upsert(&entry).await?;
+            }
+        }
+
+        let (shutdown_tx, shutdown_rx) = watch::channel(false);
+        let nats = Some(Arc::clone(&self.nats));
+        let metrics = Arc::clone(&self.metrics);
+        let orchestrator = Arc::clone(&self);
+        let scheduler = WorkflowScheduler::new(store, orchestrator, nats, metrics, shutdown_rx);
+
+        Ok((scheduler, shutdown_tx))
+    }
+
+    /// Sends a completion notification to every channel listed in
+    /// `workflow.notifications.on_completed`. Never propagates errors —
+    /// a channel timeout must not abort the workflow record.
+    async fn notify_workflow_completed(
+        &self,
+        workflow_id: &str,
+        template: &str,
+        duration_secs: f64,
+    ) {
+        let Some(registry) = &self.channels else {
+            return;
+        };
+
+        let targets = self
+            .config
+            .get_workflow(template)
+            .map(|w| w.notifications.on_completed.clone())
+            .unwrap_or_default();
+
+        if targets.is_empty() {
+            return;
+        }
+
+        let msg = Message::success(
+            format!("Workflow completed: {}", template),
+            format!("All stages finished in {:.0}s", duration_secs),
+        )
+        .with_metadata("workflow_id", workflow_id)
+        .with_metadata("template", template)
+        .with_metadata("duration", format!("{:.0}s", duration_secs));
+
+        for target in &targets {
+            if let Err(e) = registry.send(target, msg.clone()).await {
+                warn!(channel = %target, error = %e, "Completion notification failed");
+            }
+        }
+    }
+
+    async fn notify_workflow_failed(
+        &self,
+        workflow_id: &str,
+        template: &str,
+        stage: &str,
+        error: &str,
+    ) {
+        let Some(registry) = &self.channels else {
+            return;
+        };
+
+        let targets = self
+            .config
+            .get_workflow(template)
+            .map(|w| w.notifications.on_failed.clone())
+            .unwrap_or_default();
+
+        if targets.is_empty() {
+            return;
+        }
+
+        let msg = Message::error(
+            format!("Workflow failed: {}", template),
+            format!("Stage `{}` failed: {}", stage, error),
+        )
+        .with_metadata("workflow_id", workflow_id)
+        .with_metadata("template", template)
+        .with_metadata("failed_stage", stage);
+
+        for target in &targets {
+            if let Err(e) = registry.send(target, msg.clone()).await {
+                warn!(channel = %target, error = %e, "Failure notification failed");
+            }
+        }
+    }
+
+    async fn notify_approval_required(&self, workflow_id: &str, template: &str, stage_name: &str) {
+        let Some(registry) = &self.channels else {
+            return;
+        };
+
+        let targets = self
+            .config
+            .get_workflow(template)
+            .map(|w| w.notifications.on_approval_required.clone())
+            .unwrap_or_default();
+
+        if targets.is_empty() {
+            return;
+        }
+
+        let msg = Message::warning(
+            format!("Approval required: {}", stage_name),
+            format!(
+                "Workflow `{}` is waiting for human approval to proceed with stage `{}`.",
+                template, stage_name
+            ),
+        )
+        .with_metadata("workflow_id", workflow_id)
+        .with_metadata("template", template)
+        .with_metadata("stage", stage_name);
+
+        for target in &targets {
+            if let Err(e) = registry.send(target, msg.clone()).await {
+                warn!(channel = %target, error = %e, "Approval notification failed");
+            }
+        }
+    }
+
     pub fn list_templates(&self) -> Vec<String> {
         self.config
             .workflows
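The `notify_*` helpers in the orchestrator all share one best-effort fan-out shape: resolve the per-workflow target list, build one message, send it to every named channel, and log failures without ever propagating them. A minimal synchronous sketch of that pattern — the `Registry` and `Message` types here are simplified stand-ins for illustration, not the real async `vapora-channels` API:

```rust
use std::collections::HashMap;

// Simplified stand-ins for the vapora-channels types, made synchronous so
// the pattern is runnable on its own; the real API is async.
struct Message {
    title: String,
    body: String,
}

struct Registry {
    // channel name -> whether a send to it succeeds
    channels: HashMap<String, bool>,
}

impl Registry {
    fn send(&self, name: &str, _msg: &Message) -> Result<(), String> {
        match self.channels.get(name) {
            Some(true) => Ok(()),
            Some(false) => Err(format!("channel '{name}' returned an API error")),
            None => Err(format!("channel '{name}' not found")),
        }
    }
}

/// Best-effort fan-out: try every target, log failures, never propagate them.
fn notify_all(registry: &Registry, targets: &[&str], msg: &Message) -> usize {
    let mut failures = 0;
    for t in targets {
        if let Err(e) = registry.send(t, msg) {
            eprintln!("notification failed: {e}");
            failures += 1;
        }
    }
    failures
}

fn main() {
    let registry = Registry {
        channels: HashMap::from([("slack".to_string(), true), ("discord".to_string(), false)]),
    };
    let msg = Message {
        title: "Workflow completed: nightly-deploy".to_string(),
        body: "All stages finished in 42s".to_string(),
    };
    // "missing" is unregistered, "discord" errors; only "slack" succeeds,
    // yet the caller still completes normally.
    let failed = notify_all(&registry, &["slack", "discord", "missing"], &msg);
    assert_eq!(failed, 2);
    println!("{failed}"); // prints 2
}
```

The design choice worth noting: the failure count is observed (logged, counted) but never turned into a `?` — a dead webhook must not fail or retry the workflow itself.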
crates/vapora-workflow-engine/src/schedule.rs — new file, 248 lines
@@ -0,0 +1,248 @@

use std::str::FromStr;

use chrono::{DateTime, Utc};
use serde::{Deserialize, Serialize};

use crate::config::ScheduleConfig;

#[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize)]
pub enum RunStatus {
    Fired,
    Skipped,
    Failed,
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct ScheduleRun {
    pub id: String,
    pub schedule_id: String,
    pub workflow_instance_id: Option<String>,
    pub fired_at: DateTime<Utc>,
    pub status: RunStatus,
    pub notes: Option<String>,
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct ScheduledWorkflow {
    pub id: String,
    pub template_name: String,
    pub cron_expression: String,
    pub initial_context: serde_json::Value,
    pub enabled: bool,
    pub allow_concurrent: bool,
    pub catch_up: bool,
    /// IANA timezone identifier for cron evaluation (e.g. "America/New_York").
    /// When `None`, UTC is used.
    #[serde(default)]
    pub timezone: Option<String>,
    pub last_fired_at: Option<DateTime<Utc>>,
    pub next_fire_at: Option<DateTime<Utc>>,
    pub runs_count: u64,
    pub created_at: DateTime<Utc>,
    pub updated_at: DateTime<Utc>,
}

impl ScheduledWorkflow {
    /// Build from TOML config, deriving a stable ID from the template name
    /// so repeated calls to `store.upsert` are idempotent.
    pub fn from_config(template_name: &str, sc: &ScheduleConfig) -> Self {
        let next_fire_at = compute_next_fire_at_tz(&sc.cron, sc.timezone.as_deref());

        Self {
            id: template_name.to_string(),
            template_name: template_name.to_string(),
            cron_expression: sc.cron.clone(),
            initial_context: sc
                .initial_context
                .clone()
                .unwrap_or(serde_json::Value::Object(Default::default())),
            enabled: true,
            allow_concurrent: sc.allow_concurrent,
            catch_up: sc.catch_up,
            timezone: sc.timezone.clone(),
            last_fired_at: None,
            next_fire_at,
            runs_count: 0,
            created_at: Utc::now(),
            updated_at: Utc::now(),
        }
    }

    /// True when the scheduled fire time has arrived or passed.
    pub fn is_due(&self, now: DateTime<Utc>) -> bool {
        self.next_fire_at.is_some_and(|t| t <= now)
    }
}

/// Normalise a cron expression to the 7-field format required by the `cron`
/// crate (`sec min hour dom month dow year`).
///
/// - 5-field standard shell cron (`min hour dom month dow`) → prepend `0`
///   (seconds) and append `*` (any year).
/// - 6-field (`sec min hour dom month dow`) → append `*` (any year).
/// - 7-field → unchanged.
pub(crate) fn normalize_cron(expr: &str) -> String {
    match expr.split_whitespace().count() {
        5 => format!("0 {} *", expr),
        6 => format!("{} *", expr),
        _ => expr.to_string(),
    }
}

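Worked through the three arms above — this is a standalone copy of `normalize_cron` using only the standard library, so its behaviour can be checked without the `cron` crate:

```rust
// Standalone copy of normalize_cron for illustration; mirrors the function
// in the diff above. Note the `_` arm also passes malformed field counts
// through unchanged — validation happens later in Schedule::from_str.
fn normalize_cron(expr: &str) -> String {
    match expr.split_whitespace().count() {
        5 => format!("0 {} *", expr), // shell cron: add seconds + year
        6 => format!("{} *", expr),   // already has seconds: add year
        _ => expr.to_string(),        // 7-field (or malformed): unchanged
    }
}

fn main() {
    // 5-field "every day at 09:00" gains a seconds field and a year field.
    assert_eq!(normalize_cron("0 9 * * *"), "0 0 9 * * * *");
    // 6-field gains only the year.
    assert_eq!(normalize_cron("0 0 9 * * *"), "0 0 9 * * * *");
    // 7-field passes through untouched.
    assert_eq!(normalize_cron("0 0 9 * * * 2026"), "0 0 9 * * * 2026");
}
```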
/// Validate a cron expression (5-field, 6-field, or 7-field).
///
/// Returns `Ok(())` if parseable, `Err(description)` otherwise.
pub fn validate_cron_expression(expr: &str) -> Result<(), String> {
    let normalized = normalize_cron(expr);
    cron::Schedule::from_str(&normalized)
        .map(|_| ())
        .map_err(|e| e.to_string())
}

/// Validate an IANA timezone string (e.g. `"America/New_York"`).
///
/// Returns `Ok(())` when recognised, `Err(description)` otherwise.
pub fn validate_timezone(tz: &str) -> Result<(), String> {
    tz.parse::<chrono_tz::Tz>()
        .map(|_| ())
        .map_err(|_| format!("'{}' is not a valid IANA timezone identifier", tz))
}

/// Compute the next scheduled fire time after UTC now, evaluated in `tz`.
///
/// Returns `None` if the expression is invalid or has no future occurrences.
/// When `tz` is `None`, UTC is used.
pub fn compute_next_fire_at_tz(expr: &str, tz: Option<&str>) -> Option<DateTime<Utc>> {
    let normalized = normalize_cron(expr);
    let schedule = cron::Schedule::from_str(&normalized).ok()?;
    match tz.and_then(|s| s.parse::<chrono_tz::Tz>().ok()) {
        Some(tz) => schedule.upcoming(tz).next().map(|t| t.with_timezone(&Utc)),
        None => schedule.upcoming(Utc).next(),
    }
}

/// Compute the next scheduled fire time after `after`, evaluated in `tz`.
///
/// When `tz` is `None`, UTC is used.
pub fn compute_next_fire_after_tz(
    expr: &str,
    after: &DateTime<Utc>,
    tz: Option<&str>,
) -> Option<DateTime<Utc>> {
    let normalized = normalize_cron(expr);
    let schedule = cron::Schedule::from_str(&normalized).ok()?;
    match tz.and_then(|s| s.parse::<chrono_tz::Tz>().ok()) {
        Some(tz) => schedule
            .after(&after.with_timezone(&tz))
            .next()
            .map(|t| t.with_timezone(&Utc)),
        None => schedule.after(after).next(),
    }
}

/// UTC convenience wrapper — delegates to `compute_next_fire_at_tz` with no
/// timezone.
pub fn compute_next_fire_at(expr: &str) -> Option<DateTime<Utc>> {
    compute_next_fire_at_tz(expr, None)
}

/// UTC convenience wrapper — delegates to `compute_next_fire_after_tz` with no
/// timezone.
pub fn compute_next_fire_after(expr: &str, after: &DateTime<Utc>) -> Option<DateTime<Utc>> {
    compute_next_fire_after_tz(expr, after, None)
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_run_status_serde() {
        for status in [RunStatus::Fired, RunStatus::Skipped, RunStatus::Failed] {
            let json = serde_json::to_string(&status).unwrap();
            let round_tripped: RunStatus = serde_json::from_str(&json).unwrap();
            assert_eq!(status, round_tripped);
        }
    }

    #[test]
    fn test_schedule_is_due() {
        let now = Utc::now();
        let past = now - chrono::Duration::seconds(60);
        let future = now + chrono::Duration::seconds(60);

        let mut s = ScheduledWorkflow {
            id: "test".to_string(),
            template_name: "test".to_string(),
            cron_expression: "0 * * * *".to_string(),
            initial_context: serde_json::json!({}),
            enabled: true,
            allow_concurrent: false,
            catch_up: false,
            timezone: None,
            last_fired_at: None,
            next_fire_at: None,
            runs_count: 0,
            created_at: now,
            updated_at: now,
        };

        assert!(!s.is_due(now), "None next_fire_at should not be due");

        s.next_fire_at = Some(past);
        assert!(s.is_due(now), "Past next_fire_at should be due");

        s.next_fire_at = Some(now);
        assert!(s.is_due(now), "Exact now should be due");

        s.next_fire_at = Some(future);
        assert!(!s.is_due(now), "Future next_fire_at should not be due");
    }

    #[test]
    fn test_validate_timezone_valid() {
        assert!(validate_timezone("America/New_York").is_ok());
        assert!(validate_timezone("Europe/London").is_ok());
        assert!(validate_timezone("Asia/Tokyo").is_ok());
        assert!(validate_timezone("UTC").is_ok());
    }

    #[test]
    fn test_validate_timezone_invalid() {
        assert!(validate_timezone("Not/ATimezone").is_err());
        assert!(validate_timezone("New York").is_err());
        assert!(validate_timezone("").is_err());
    }

    #[test]
    fn test_compute_next_fire_at_tz_utc() {
        // UTC fallback: any future result is valid
        let result = compute_next_fire_at_tz("0 9 * * *", None);
        assert!(result.is_some(), "should produce a future time");
        assert!(result.unwrap() > Utc::now());
    }

    #[test]
    fn test_compute_next_fire_at_tz_named() {
        let result_tz = compute_next_fire_at_tz("0 9 * * *", Some("America/New_York"));
        let result_utc = compute_next_fire_at_tz("0 9 * * *", None);
        // Both must be Some and in the future.
        assert!(result_tz.is_some());
        assert!(result_utc.is_some());
        // They represent 09:00 in different timezones, so they must differ
        // (EST = UTC-5, EDT = UTC-4; New_York 09:00 != UTC 09:00).
        assert_ne!(
            result_tz.unwrap(),
            result_utc.unwrap(),
            "timezone-aware result must differ from UTC result"
        );
    }

    #[test]
    fn test_compute_next_fire_at_tz_invalid_tz_fallback() {
        // Unknown timezone is silently treated as UTC (parse fails, match hits None
        // arm).
        let result = compute_next_fire_at_tz("0 9 * * *", Some("Invalid/Zone"));
        assert!(result.is_some(), "falls back to UTC on unknown timezone");
    }
}
crates/vapora-workflow-engine/src/schedule_store.rs — new file, 384 lines
@@ -0,0 +1,384 @@

use std::sync::Arc;

use chrono::{DateTime, Utc};
use surrealdb::engine::remote::ws::Client;
use surrealdb::Surreal;
use tracing::debug;

use crate::error::{Result, WorkflowError};
use crate::schedule::{ScheduleRun, ScheduledWorkflow};

/// Persists `ScheduledWorkflow` and `ScheduleRun` records to SurrealDB.
///
/// Follows the same injection pattern as `SurrealWorkflowStore` — receives the
/// shared DB connection; does not create its own. Tables are defined by
/// migration 010.
pub struct ScheduleStore {
    db: Arc<Surreal<Client>>,
}

impl ScheduleStore {
    pub fn new(db: Arc<Surreal<Client>>) -> Self {
        Self { db }
    }

    /// Idempotent upsert for a TOML-seeded schedule.
    ///
    /// - First call: creates the record with full initial state (including
    ///   computed `next_fire_at`).
    /// - Subsequent calls: merges only static config fields (`cron_expression`,
    ///   `allow_concurrent`, `catch_up`, `initial_context`), leaving runtime
    ///   state (`last_fired_at`, `next_fire_at`, `runs_count`) intact.
    pub async fn upsert(&self, s: &ScheduledWorkflow) -> Result<()> {
        let existing: Option<serde_json::Value> = self
            .db
            .select(("scheduled_workflows", &*s.id))
            .await
            .map_err(|e| WorkflowError::DatabaseError(format!("check schedule {}: {e}", s.id)))?;

        if existing.is_none() {
            let json = serde_json::to_value(s).map_err(|e| {
                WorkflowError::DatabaseError(format!("serialize schedule {}: {e}", s.id))
            })?;
            let _: Option<serde_json::Value> = self
                .db
                .create(("scheduled_workflows", &*s.id))
                .content(json)
                .await
                .map_err(|e| {
                    WorkflowError::DatabaseError(format!("create schedule {}: {e}", s.id))
                })?;
            debug!(schedule_id = %s.id, template = %s.template_name, "Schedule created");
        } else {
            let _: Option<serde_json::Value> = self
                .db
                .update(("scheduled_workflows", &*s.id))
                .merge(serde_json::json!({
                    "cron_expression": s.cron_expression,
                    "allow_concurrent": s.allow_concurrent,
                    "catch_up": s.catch_up,
                    "initial_context": s.initial_context,
                    "timezone": s.timezone,
                    "updated_at": Utc::now(),
                }))
                .await
                .map_err(|e| {
                    WorkflowError::DatabaseError(format!("merge schedule {}: {e}", s.id))
                })?;
            debug!(schedule_id = %s.id, template = %s.template_name, "Schedule config merged");
        }

        Ok(())
    }

    /// Load all enabled schedules. Filters in Rust to avoid complex SurrealQL
    /// against serde-tagged enums (same rationale as `SurrealWorkflowStore`).
    pub async fn load_enabled(&self) -> Result<Vec<ScheduledWorkflow>> {
        let mut response = self
            .db
            .query("SELECT * FROM scheduled_workflows")
            .await
            .map_err(|e| WorkflowError::DatabaseError(format!("load_enabled query: {e}")))?;

        let raw: Vec<serde_json::Value> = response
            .take(0)
            .map_err(|e| WorkflowError::DatabaseError(format!("load_enabled take: {e}")))?;

        let schedules: Vec<ScheduledWorkflow> = raw
            .into_iter()
            .filter_map(|v| serde_json::from_value(v).ok())
            .collect();

        Ok(schedules.into_iter().filter(|s| s.enabled).collect())
    }

    /// Atomically advance runtime state after one or more fires.
    ///
    /// Uses `runs_count = runs_count + 1` in SurrealQL to avoid a read-modify-
    /// write race when multiple scheduler instances run (future distributed
    /// deployments).
    pub async fn update_after_fire(
        &self,
        id: &str,
        fired_at: DateTime<Utc>,
        next_fire_at: Option<DateTime<Utc>>,
    ) -> Result<()> {
        self.db
            .query(
                "UPDATE type::thing('scheduled_workflows', $id) SET last_fired_at = $fired_at, \
                 next_fire_at = $next_fire_at, runs_count = runs_count + 1, updated_at = \
                 time::now()",
            )
            .bind(("id", id.to_string()))
            .bind(("fired_at", fired_at))
            .bind(("next_fire_at", next_fire_at))
            .await
            .map_err(|e| WorkflowError::DatabaseError(format!("update_after_fire {id}: {e}")))?;

        debug!(schedule_id = %id, "Schedule runtime state updated");
        Ok(())
    }

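The doc comment on `update_after_fire` is the whole point of pushing the increment into SurrealQL: a read-modify-write performed from Rust can lose updates under concurrency, whereas a single in-place increment cannot. The same hazard in miniature, using only std atomics — an analogy for the counter semantics, not the database code:

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;

fn main() {
    // Eight "scheduler instances" each record 10_000 fires. Because the
    // increment is a single atomic read-modify-write (analogous to
    // SurrealQL's `runs_count = runs_count + 1`), no update is ever lost.
    // A separate `load` followed by `store` would race and undercount.
    let runs_count = Arc::new(AtomicU64::new(0));
    let handles: Vec<_> = (0..8)
        .map(|_| {
            let counter = Arc::clone(&runs_count);
            thread::spawn(move || {
                for _ in 0..10_000 {
                    counter.fetch_add(1, Ordering::Relaxed);
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    assert_eq!(runs_count.load(Ordering::Relaxed), 80_000);
    println!("{}", runs_count.load(Ordering::Relaxed)); // prints 80000
}
```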
    /// Load a single schedule by its ID.
    pub async fn load_one(&self, id: &str) -> Result<Option<ScheduledWorkflow>> {
        let raw: Option<serde_json::Value> = self
            .db
            .select(("scheduled_workflows", id))
            .await
            .map_err(|e| WorkflowError::DatabaseError(format!("load_one schedule {id}: {e}")))?;

        raw.map(|v| {
            serde_json::from_value(v).map_err(|e| {
                WorkflowError::DatabaseError(format!("deserialize schedule {id}: {e}"))
            })
        })
        .transpose()
    }

    /// Load all schedules (enabled and disabled).
    pub async fn load_all(&self) -> Result<Vec<ScheduledWorkflow>> {
        let mut response = self
            .db
            .query("SELECT * FROM scheduled_workflows")
            .await
            .map_err(|e| WorkflowError::DatabaseError(format!("load_all query: {e}")))?;

        let raw: Vec<serde_json::Value> = response
            .take(0)
            .map_err(|e| WorkflowError::DatabaseError(format!("load_all take: {e}")))?;

        Ok(raw
            .into_iter()
            .filter_map(|v| serde_json::from_value(v).ok())
            .collect())
    }

    /// Full-replace upsert for `PUT` semantics.
    ///
    /// Replaces all config fields and recomputes `next_fire_at`. Preserves
    /// `last_fired_at` and `runs_count` from the existing record so PUT
    /// doesn't erase operational history.
    pub async fn full_upsert(&self, s: &ScheduledWorkflow) -> Result<()> {
        let existing: Option<serde_json::Value> = self
            .db
            .select(("scheduled_workflows", &*s.id))
            .await
            .map_err(|e| WorkflowError::DatabaseError(format!("check schedule {}: {e}", s.id)))?;

        if existing.is_none() {
            let json = serde_json::to_value(s).map_err(|e| {
                WorkflowError::DatabaseError(format!("serialize schedule {}: {e}", s.id))
            })?;
            let _: Option<serde_json::Value> = self
                .db
                .create(("scheduled_workflows", &*s.id))
                .content(json)
                .await
                .map_err(|e| {
                    WorkflowError::DatabaseError(format!("create schedule {}: {e}", s.id))
                })?;
        } else {
            // Replace all config fields but preserve operational counters.
            let _: Option<serde_json::Value> = self
                .db
                .update(("scheduled_workflows", &*s.id))
                .merge(serde_json::json!({
                    "template_name": s.template_name,
                    "cron_expression": s.cron_expression,
                    "initial_context": s.initial_context,
                    "enabled": s.enabled,
                    "allow_concurrent": s.allow_concurrent,
                    "catch_up": s.catch_up,
                    "timezone": s.timezone,
                    "next_fire_at": s.next_fire_at,
                    "updated_at": Utc::now(),
                }))
                .await
                .map_err(|e| {
                    WorkflowError::DatabaseError(format!("full_upsert schedule {}: {e}", s.id))
                })?;
        }
        debug!(schedule_id = %s.id, "Schedule full-upserted");
        Ok(())
    }

    /// Partial update: only touches the fields provided (PATCH semantics).
    ///
    /// If `cron_expression` changes, the caller must compute and pass the new
    /// `next_fire_at`.
    pub async fn patch(
        &self,
        id: &str,
        patch: serde_json::Value,
    ) -> Result<Option<ScheduledWorkflow>> {
        let _: Option<serde_json::Value> = self
            .db
            .update(("scheduled_workflows", id))
            .merge(patch)
            .await
            .map_err(|e| WorkflowError::DatabaseError(format!("patch schedule {id}: {e}")))?;

        self.load_one(id).await
    }

    /// Delete a schedule definition permanently.
    pub async fn delete(&self, id: &str) -> Result<()> {
        let _: Option<serde_json::Value> = self
            .db
            .delete(("scheduled_workflows", id))
            .await
            .map_err(|e| WorkflowError::DatabaseError(format!("delete schedule {id}: {e}")))?;

        debug!(schedule_id = %id, "Schedule deleted");
        Ok(())
    }

    /// Load the run history for a specific schedule, ordered by fired_at desc.
    pub async fn load_runs(&self, schedule_id: &str) -> Result<Vec<ScheduleRun>> {
        let mut response = self
            .db
            .query(
                "SELECT * FROM schedule_runs WHERE schedule_id = $schedule_id ORDER BY fired_at \
                 DESC LIMIT 100",
            )
            .bind(("schedule_id", schedule_id.to_string()))
            .await
            .map_err(|e| WorkflowError::DatabaseError(format!("load_runs query: {e}")))?;

        let raw: Vec<serde_json::Value> = response
            .take(0)
            .map_err(|e| WorkflowError::DatabaseError(format!("load_runs take: {e}")))?;

        Ok(raw
            .into_iter()
            .filter_map(|v| serde_json::from_value(v).ok())
            .collect())
    }

    /// Attempt to acquire an exclusive fire-lock for distributed deployments.
    ///
    /// Uses a conditional SurrealDB UPDATE: succeeds only when the record has
    /// no current lock holder, or the existing lock has expired (> 120 s old).
    /// Returns `true` if the lock was acquired by this call, `false` if another
    /// instance already holds it.
    ///
    /// The 120-second TTL means a crashed instance releases its lock
    /// automatically on the next scheduler tick after that window.
    pub async fn try_acquire_fire_lock(
        &self,
        id: &str,
        instance_id: &str,
        now: &chrono::DateTime<Utc>,
    ) -> Result<bool> {
        let expiry = *now - chrono::Duration::seconds(120);
        let mut resp = self
|
.db
|
||||||
|
.query(
|
||||||
|
"UPDATE type::thing('scheduled_workflows', $id) SET locked_by = $instance_id, \
|
||||||
|
locked_at = $now WHERE locked_by IS NONE OR locked_at < $expiry",
|
||||||
|
)
|
||||||
|
.bind(("id", id.to_string()))
|
||||||
|
.bind(("instance_id", instance_id.to_string()))
|
||||||
|
.bind(("now", *now))
|
||||||
|
.bind(("expiry", expiry))
|
||||||
|
.await
|
||||||
|
.map_err(|e| {
|
||||||
|
WorkflowError::DatabaseError(format!("try_acquire_fire_lock {id}: {e}"))
|
||||||
|
})?;
|
||||||
|
|
||||||
|
let records: Vec<serde_json::Value> = resp.take(0).map_err(|e| {
|
||||||
|
WorkflowError::DatabaseError(format!("try_acquire_fire_lock take {id}: {e}"))
|
||||||
|
})?;
|
||||||
|
|
||||||
|
Ok(!records.is_empty())
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Release the fire-lock held by `instance_id`.
|
||||||
|
///
|
||||||
|
/// The WHERE guard ensures an instance can only release its own lock,
|
||||||
|
/// preventing a slow instance from accidentally clearing another's lock
|
||||||
|
/// after its TTL expired and a second instance re-acquired it.
|
||||||
|
pub async fn release_fire_lock(&self, id: &str, instance_id: &str) -> Result<()> {
|
||||||
|
self.db
|
||||||
|
.query(
|
||||||
|
"UPDATE type::thing('scheduled_workflows', $id) SET locked_by = NONE, locked_at = \
|
||||||
|
NONE WHERE locked_by = $instance_id",
|
||||||
|
)
|
||||||
|
.bind(("id", id.to_string()))
|
||||||
|
.bind(("instance_id", instance_id.to_string()))
|
||||||
|
.await
|
||||||
|
.map_err(|e| WorkflowError::DatabaseError(format!("release_fire_lock {id}: {e}")))?;
|
||||||
|
|
||||||
|
debug!(schedule_id = %id, instance_id = %instance_id, "Fire lock released");
|
||||||
|
Ok(())
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Append a run record to the immutable audit log.
|
||||||
|
pub async fn record_run(&self, run: &ScheduleRun) -> Result<()> {
|
||||||
|
let json = serde_json::to_value(run)
|
||||||
|
.map_err(|e| WorkflowError::DatabaseError(format!("serialize run {}: {e}", run.id)))?;
|
||||||
|
|
||||||
|
let _: Option<serde_json::Value> = self
|
||||||
|
.db
|
||||||
|
.create(("schedule_runs", &*run.id))
|
||||||
|
.content(json)
|
||||||
|
.await
|
||||||
|
.map_err(|e| WorkflowError::DatabaseError(format!("record_run {}: {e}", run.id)))?;
|
||||||
|
|
||||||
|
debug!(
|
||||||
|
run_id = %run.id,
|
||||||
|
schedule = %run.schedule_id,
|
||||||
|
status = ?run.status,
|
||||||
|
"Schedule run recorded"
|
||||||
|
);
|
||||||
|
Ok(())
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
#[cfg(test)]
|
||||||
|
mod tests {
|
||||||
|
use super::*;
|
||||||
|
use crate::schedule::RunStatus;
|
||||||
|
|
||||||
|
#[test]
|
||||||
|
fn test_schedule_serialization() {
|
||||||
|
let now = Utc::now();
|
||||||
|
let s = ScheduledWorkflow {
|
||||||
|
id: "nightly_analysis".to_string(),
|
||||||
|
template_name: "nightly_analysis".to_string(),
|
||||||
|
cron_expression: "0 2 * * *".to_string(),
|
||||||
|
initial_context: serde_json::json!({"env": "prod"}),
|
||||||
|
enabled: true,
|
||||||
|
allow_concurrent: false,
|
||||||
|
catch_up: false,
|
||||||
|
timezone: None,
|
||||||
|
last_fired_at: Some(now),
|
||||||
|
next_fire_at: Some(now + chrono::Duration::hours(24)),
|
||||||
|
runs_count: 42,
|
||||||
|
created_at: now,
|
||||||
|
updated_at: now,
|
||||||
|
};
|
||||||
|
|
||||||
|
let json = serde_json::to_string(&s).expect("serialize");
|
||||||
|
let decoded: ScheduledWorkflow = serde_json::from_str(&json).expect("deserialize");
|
||||||
|
|
||||||
|
assert_eq!(decoded.id, s.id);
|
||||||
|
assert_eq!(decoded.template_name, s.template_name);
|
||||||
|
assert_eq!(decoded.cron_expression, s.cron_expression);
|
||||||
|
assert_eq!(decoded.runs_count, 42);
|
||||||
|
assert_eq!(decoded.allow_concurrent, false);
|
||||||
|
assert_eq!(decoded.catch_up, false);
|
||||||
|
|
||||||
|
let run = ScheduleRun {
|
||||||
|
id: uuid::Uuid::new_v4().to_string(),
|
||||||
|
schedule_id: s.id.clone(),
|
||||||
|
workflow_instance_id: Some("wf-abc".to_string()),
|
||||||
|
fired_at: now,
|
||||||
|
status: RunStatus::Fired,
|
||||||
|
notes: None,
|
||||||
|
};
|
||||||
|
let run_json = serde_json::to_string(&run).expect("serialize run");
|
||||||
|
let decoded_run: ScheduleRun = serde_json::from_str(&run_json).expect("deserialize run");
|
||||||
|
assert_eq!(decoded_run.status, RunStatus::Fired);
|
||||||
|
}
|
||||||
|
}
|
||||||
crates/vapora-workflow-engine/src/scheduler.rs (Normal file, 388 lines)
@@ -0,0 +1,388 @@
use std::str::FromStr;
use std::sync::Arc;
use std::time::Duration;

use chrono::{DateTime, Utc};
use tokio::sync::watch;
use tokio::time::MissedTickBehavior;
use tracing::{error, info, warn};

use crate::error::{Result, ScheduleError, WorkflowError};
use crate::metrics::WorkflowMetrics;
use crate::orchestrator::WorkflowOrchestrator;
use crate::schedule::{
    compute_next_fire_after_tz, compute_next_fire_at_tz, normalize_cron, RunStatus, ScheduleRun,
    ScheduledWorkflow,
};
use crate::schedule_store::ScheduleStore;

pub struct WorkflowScheduler {
    store: Arc<ScheduleStore>,
    orchestrator: Arc<WorkflowOrchestrator>,
    nats: Option<Arc<async_nats::Client>>,
    metrics: Arc<WorkflowMetrics>,
    tick_interval: Duration,
    shutdown: watch::Receiver<bool>,
    /// UUID identifying this process instance. Used as the lock owner in
    /// distributed deployments to prevent double-fires across instances.
    instance_id: String,
}

impl WorkflowScheduler {
    pub fn new(
        store: Arc<ScheduleStore>,
        orchestrator: Arc<WorkflowOrchestrator>,
        nats: Option<Arc<async_nats::Client>>,
        metrics: Arc<WorkflowMetrics>,
        shutdown: watch::Receiver<bool>,
    ) -> Self {
        Self {
            store,
            orchestrator,
            nats,
            metrics,
            tick_interval: Duration::from_secs(30),
            shutdown,
            instance_id: uuid::Uuid::new_v4().to_string(),
        }
    }

    /// Override the default 30-second tick interval (useful in tests).
    pub fn with_tick_interval(mut self, interval: Duration) -> Self {
        self.tick_interval = interval;
        self
    }

    /// Drive the scheduling loop until the shutdown signal fires.
    ///
    /// Call from a `tokio::spawn` — the returned future completes when
    /// `watch::Sender::send(true)` is called on the paired sender.
    pub async fn run(self: Arc<Self>) {
        let mut shutdown = self.shutdown.clone();
        let mut interval = tokio::time::interval(self.tick_interval);
        interval.set_missed_tick_behavior(MissedTickBehavior::Skip);

        info!(
            instance_id = %self.instance_id,
            tick = ?self.tick_interval,
            "WorkflowScheduler started"
        );

        loop {
            tokio::select! {
                changed = shutdown.changed() => {
                    if changed.is_ok() && *shutdown.borrow() {
                        info!(instance_id = %self.instance_id, "WorkflowScheduler shutting down");
                        break;
                    }
                }
                _ = interval.tick() => {
                    let now = Utc::now();
                    match self.store.load_enabled().await {
                        Ok(schedules) => {
                            for s in schedules.into_iter().filter(|s| s.is_due(now)) {
                                let scheduler = Arc::clone(&self);
                                tokio::spawn(async move {
                                    if let Err(e) = scheduler.fire_schedule(&s, now).await {
                                        error!(
                                            schedule_id = %s.id,
                                            template = %s.template_name,
                                            error = %e,
                                            "Schedule fire error"
                                        );
                                    }
                                });
                            }
                        }
                        Err(e) => error!(error = %e, "Failed to load enabled schedules"),
                    }
                }
            }
        }
    }

    async fn fire_schedule(&self, s: &ScheduledWorkflow, now: DateTime<Utc>) -> Result<()> {
        // Acquire distributed lock — prevents double-fires in multi-instance
        // deployments. Lock has a 120-second TTL enforced by the store.
        let acquired = self
            .store
            .try_acquire_fire_lock(&s.id, &self.instance_id, &now)
            .await?;

        if !acquired {
            info!(
                schedule_id = %s.id,
                template = %s.template_name,
                instance_id = %self.instance_id,
                "Fire lock held by another instance — skipping this tick"
            );
            self.metrics.schedules_skipped.inc();
            return Ok(());
        }

        let result = self.fire_with_lock(s, now).await;

        // Release the lock unconditionally. A failure here is non-fatal since
        // the TTL will expire automatically after 120 s.
        if let Err(e) = self.store.release_fire_lock(&s.id, &self.instance_id).await {
            warn!(
                schedule_id = %s.id,
                instance_id = %self.instance_id,
                error = %e,
                "Failed to release schedule fire lock (TTL will expire automatically)"
            );
        }

        result
    }

    /// Execute a single schedule fire — called after the distributed lock is
    /// held.
    async fn fire_with_lock(&self, s: &ScheduledWorkflow, now: DateTime<Utc>) -> Result<()> {
        let tz = s.timezone.as_deref();

        let normalized = normalize_cron(&s.cron_expression);
        let cron_schedule = cron::Schedule::from_str(&normalized).map_err(|e| {
            WorkflowError::Schedule(ScheduleError::InvalidCron {
                expr: s.cron_expression.clone(),
                reason: e.to_string(),
            })
        })?;

        // Concurrency guard: skip if an active workflow for this template exists.
        // Now safe to check without races because we hold the distributed lock.
        if !s.allow_concurrent
            && self
                .orchestrator
                .has_active_workflow_for_template(&s.template_name)
        {
            let next_fire_at = compute_next_fire_at_tz(&s.cron_expression, tz);
            self.store
                .update_after_fire(&s.id, now, next_fire_at)
                .await?;
            self.store
                .record_run(&ScheduleRun {
                    id: uuid::Uuid::new_v4().to_string(),
                    schedule_id: s.id.clone(),
                    workflow_instance_id: None,
                    fired_at: now,
                    status: RunStatus::Skipped,
                    notes: Some("Active workflow for template already exists".to_string()),
                })
                .await?;
            self.metrics.schedules_skipped.inc();
            info!(
                schedule_id = %s.id,
                template = %s.template_name,
                "Schedule skipped: concurrent run in progress"
            );
            return Ok(());
        }

        // Determine which time slots to fire — timezone-aware.
        let last = s.last_fired_at.unwrap_or(s.created_at);
        let fire_times = compute_fire_times_tz(&cron_schedule, last, now, s.catch_up, tz);

        let mut last_fire_time = now;
        for &fire_time in &fire_times {
            last_fire_time = fire_time;
            match self
                .orchestrator
                .start_workflow(&s.template_name, s.initial_context.clone())
                .await
            {
                Ok(workflow_id) => {
                    self.store
                        .record_run(&ScheduleRun {
                            id: uuid::Uuid::new_v4().to_string(),
                            schedule_id: s.id.clone(),
                            workflow_instance_id: Some(workflow_id.clone()),
                            fired_at: fire_time,
                            status: RunStatus::Fired,
                            notes: None,
                        })
                        .await?;
                    self.metrics.schedules_fired.inc();
                    info!(
                        schedule_id = %s.id,
                        template = %s.template_name,
                        workflow_id = %workflow_id,
                        fired_at = %fire_time,
                        timezone = ?tz,
                        "Schedule fired"
                    );
                }
                Err(e) => {
                    error!(
                        schedule_id = %s.id,
                        template = %s.template_name,
                        error = %e,
                        "Workflow start failed for scheduled fire"
                    );
                    self.store
                        .record_run(&ScheduleRun {
                            id: uuid::Uuid::new_v4().to_string(),
                            schedule_id: s.id.clone(),
                            workflow_instance_id: None,
                            fired_at: fire_time,
                            status: RunStatus::Failed,
                            notes: Some(e.to_string()),
                        })
                        .await?;
                    self.metrics.schedules_failed.inc();
                }
            }
        }

        // Advance the pointer to the next scheduled time (timezone-aware).
        let next_fire_at = compute_next_fire_after_tz(&s.cron_expression, &last_fire_time, tz);
        self.store
            .update_after_fire(&s.id, last_fire_time, next_fire_at)
            .await?;

        // Publish NATS event — graceful skip when NATS is unavailable.
        if let Some(nats) = &self.nats {
            let event = serde_json::json!({
                "type": "schedule_fired",
                "schedule_id": s.id,
                "template": s.template_name,
                "timezone": s.timezone,
                "fires": fire_times.len(),
                "timestamp": Utc::now().to_rfc3339(),
            });
            if let Err(e) = nats
                .publish("vapora.schedule.fired", event.to_string().into())
                .await
            {
                warn!(error = %e, "Failed to publish vapora.schedule.fired");
            }
        }

        Ok(())
    }
}

/// Compute which UTC time slots to fire, optionally in a named timezone.
///
/// - `catch_up = true`: all missed slots since `last`, capped at 10, converted
///   to UTC.
/// - `catch_up = false`: always exactly one slot (`now`), timezone ignored.
///
/// When `tz` is `Some` the cron iterator runs in that timezone so that
/// e.g. "every day at 09:00 America/New_York" fires at 14:00 UTC (or 13:00
/// during EDT).
#[cfg_attr(not(test), allow(dead_code))]
fn compute_fire_times_tz(
    schedule: &cron::Schedule,
    last: DateTime<Utc>,
    now: DateTime<Utc>,
    catch_up: bool,
    tz: Option<&str>,
) -> Vec<DateTime<Utc>> {
    if !catch_up {
        return vec![now];
    }

    let collected: Vec<DateTime<Utc>> = match tz.and_then(|s| s.parse::<chrono_tz::Tz>().ok()) {
        Some(tz) => schedule
            .after(&last.with_timezone(&tz))
            .take_while(|t| t.with_timezone(&Utc) <= now)
            .take(10)
            .map(|t| t.with_timezone(&Utc))
            .collect(),
        None => schedule
            .after(&last)
            .take_while(|t| t <= &now)
            .take(10)
            .collect(),
    };

    if collected.is_empty() {
        vec![now]
    } else {
        collected
    }
}

#[cfg(test)]
mod tests {
    use super::*;
    use crate::schedule::normalize_cron;

    #[test]
    fn test_normalize_cron_5field() {
        let result = normalize_cron("0 9 * * 1-5");
        assert_eq!(result, "0 0 9 * * 1-5 *");
    }

    #[test]
    fn test_normalize_cron_7field() {
        let expr = "0 0 9 * * 1-5 *";
        let result = normalize_cron(expr);
        assert_eq!(result, expr);
    }

    #[test]
    fn test_compute_fires_no_catchup() {
        let normalized = normalize_cron("0 0 * * *"); // Every day at midnight
        let schedule = cron::Schedule::from_str(&normalized).expect("valid cron");

        let now = Utc::now();
        let last = now - chrono::Duration::days(3);

        let fires = compute_fire_times_tz(&schedule, last, now, false, None);
        // Without catch_up: always exactly one fire at `now`.
        assert_eq!(fires.len(), 1);
        assert_eq!(fires[0], now);
    }

    #[test]
    fn test_compute_fires_with_catchup() {
        let normalized = normalize_cron("0 0 * * *"); // Every day at midnight
        let schedule = cron::Schedule::from_str(&normalized).expect("valid cron");

        // last fired 5 days ago → should produce up to 5 missed midnight slots.
        let now = Utc::now();
        let last = now - chrono::Duration::days(5);

        let fires = compute_fire_times_tz(&schedule, last, now, true, None);
        assert!(fires.len() > 1, "catch_up should produce multiple fires");
        assert!(fires.len() <= 10, "catch_up is capped at 10");
        for &t in &fires {
            assert!(t <= now, "catch_up fire time must not be in the future");
        }
    }

    #[test]
    fn test_compute_fires_with_catchup_named_tz() {
        let normalized = normalize_cron("0 0 * * *"); // Every day at midnight (in tz)
        let schedule = cron::Schedule::from_str(&normalized).expect("valid cron");

        let now = Utc::now();
        let last = now - chrono::Duration::days(3);

        let fires_utc = compute_fire_times_tz(&schedule, last, now, true, None);
        let fires_tz = compute_fire_times_tz(&schedule, last, now, true, Some("America/New_York"));

        // Both produce multiple slots. The UTC times differ because midnight NY
        // is 05:00 UTC (or 04:00 EDT) — so the same count but different instants.
        assert!(!fires_utc.is_empty());
        assert!(!fires_tz.is_empty());
        if fires_utc.len() == fires_tz.len() {
            // Timestamps must differ (different timezone means different UTC offset).
            assert_ne!(
                fires_utc[0], fires_tz[0],
                "UTC and NY midnight must be different UTC instants"
            );
        }
    }

    #[test]
    fn test_instance_id_is_unique() {
        // Each WorkflowScheduler gets a distinct UUID to prevent lock collisions.
        let id1 = uuid::Uuid::new_v4().to_string();
        let id2 = uuid::Uuid::new_v4().to_string();
        assert_ne!(id1, id2);
    }
}
crates/vapora-workflow-engine/tests/notification_config.rs (Normal file, 106 lines)
@@ -0,0 +1,106 @@
use vapora_channels::ChannelRegistry;
use vapora_workflow_engine::config::WorkflowsConfig;

/// Full TOML round-trip: channels section + per-workflow notification targets.
const TOML_WITH_CHANNELS: &str = r#"
[engine]
max_parallel_tasks = 4
workflow_timeout = 3600
approval_gates_enabled = false

[channels.team-slack]
type = "slack"
webhook_url = "https://hooks.slack.com/services/TEST/TEST/TEST"

[channels.ops-telegram]
type = "telegram"
bot_token = "123:TEST"
chat_id = "-100999"

[[workflows]]
name = "deploy-prod"
trigger = "manual"

[workflows.notifications]
on_completed = ["team-slack"]
on_failed = ["team-slack", "ops-telegram"]
on_approval_required = ["team-slack"]

[[workflows.stages]]
name = "build"
agents = ["developer"]

[[workflows.stages]]
name = "deploy"
agents = ["deployer"]
approval_required = true
"#;

/// Workflow with no [channels] section — should parse without error and leave
/// the channel map empty (registry skipped by orchestrator).
const TOML_WITHOUT_CHANNELS: &str = r#"
[engine]
max_parallel_tasks = 4
workflow_timeout = 3600
approval_gates_enabled = false

[[workflows]]
name = "ci-pipeline"
trigger = "manual"

[[workflows.stages]]
name = "test"
agents = ["developer"]
"#;

#[test]
fn channels_section_parses_into_config() {
    let config: WorkflowsConfig = toml::from_str(TOML_WITH_CHANNELS).expect("must parse");

    assert_eq!(config.channels.len(), 2);
    assert!(config.channels.contains_key("team-slack"));
    assert!(config.channels.contains_key("ops-telegram"));
}

#[test]
fn notification_targets_parse_per_workflow() {
    let config: WorkflowsConfig = toml::from_str(TOML_WITH_CHANNELS).expect("must parse");

    let wf = config.get_workflow("deploy-prod").expect("workflow exists");
    assert_eq!(wf.notifications.on_completed, ["team-slack"]);
    assert_eq!(wf.notifications.on_failed, ["team-slack", "ops-telegram"]);
    assert_eq!(wf.notifications.on_approval_required, ["team-slack"]);
}

#[test]
fn missing_channels_section_defaults_to_empty() {
    let config: WorkflowsConfig = toml::from_str(TOML_WITHOUT_CHANNELS).expect("must parse");

    assert!(config.channels.is_empty());
}

#[test]
fn missing_notifications_block_defaults_to_empty_vecs() {
    let config: WorkflowsConfig = toml::from_str(TOML_WITHOUT_CHANNELS).expect("must parse");

    let wf = config.get_workflow("ci-pipeline").expect("workflow exists");
    assert!(wf.notifications.on_completed.is_empty());
    assert!(wf.notifications.on_failed.is_empty());
    assert!(wf.notifications.on_approval_required.is_empty());
}

#[test]
fn channel_registry_builds_from_config() {
    let config: WorkflowsConfig = toml::from_str(TOML_WITH_CHANNELS).expect("must parse");
    let registry = ChannelRegistry::from_map(config.channels).expect("registry must build");

    let mut names = registry.channel_names();
    names.sort_unstable();
    assert_eq!(names, ["ops-telegram", "team-slack"]);
}

#[test]
fn validation_passes_with_channels_and_notifications() {
    let config: WorkflowsConfig = toml::from_str(TOML_WITH_CHANNELS).expect("must parse");
    config.validate().expect("validation must pass");
}
docs/adrs/0034-autonomous-scheduling.md (Normal file, 101 lines)
@@ -0,0 +1,101 @@
# ADR-0034: Autonomous Cron Scheduling — Timezone Support and Distributed Fire-Lock

**Status**: Implemented
**Date**: 2026-02-26
**Deciders**: VAPORA Team
**Technical Story**: The `vapora-workflow-engine` scheduler fired cron jobs only in UTC and had no protection against double-fires in multi-instance deployments.

---

## Decision

Extend the autonomous scheduling subsystem with two independent hardening layers:

1. **Timezone-aware scheduling** (`chrono-tz`) — cron expressions are evaluated in any IANA timezone, stored per schedule, and validated at the API and config-load boundaries.
2. **Distributed fire-lock** — a SurrealDB conditional `UPDATE ... WHERE locked_by IS NONE OR locked_at < $expiry` provides atomic, TTL-backed mutual exclusion across instances without additional infrastructure.

---

## Context

### Gaps Addressed

| Gap | Consequence |
|-----|-------------|
| UTC-only cron evaluation | `"0 9 * * *"` fires at 09:00 UTC regardless of business timezone; scheduled reports or maintenance windows drift by the UTC offset |
| No distributed coordination | Two `vapora-workflow-engine` instances reading the same `scheduled_workflows` table both fire the same schedule at the same tick |

### Why These Approaches

**`chrono-tz`** over manual UTC-offset arithmetic:

- Compile-time exhaustive enum of all IANA timezone names — invalid names are rejected at parse time.
- The `cron` crate's `Schedule::upcoming(tz)` / `Schedule::after(&dt_in_tz)` are generic over any `TimeZone`, so timezone awareness requires no special-casing in iteration logic: pass `DateTime<chrono_tz::Tz>` instead of `DateTime<Utc>` and convert the output with `.with_timezone(&Utc)`.
- DST transitions are handled automatically by `chrono-tz` — no application code needed.

**SurrealDB conditional UPDATE** over an external distributed lock (Redis, etcd):

- No additional infrastructure dependency.
- SurrealDB applies document-level write locking; `UPDATE record WHERE condition` is atomic — two concurrent instances race on the same document and only one succeeds (a non-empty return array means the lock was acquired).
- The 120-second TTL is enforced in application code: `locked_at < $expiry` in the WHERE clause auto-expires a lock from a crashed instance within two scheduler ticks.

---

## Implementation

### New Fields

The `scheduled_workflows` table gains three columns (migration 011):

| Field | Type | Purpose |
|-------|------|---------|
| `timezone` | `option<string>` | IANA identifier (`"America/New_York"`) or `NONE` for UTC |
| `locked_by` | `option<string>` | UUID of the instance holding the current fire-lock |
| `locked_at` | `option<datetime>` | When the lock was acquired; used for TTL expiry |

### Lock Protocol

```
Tick N fires schedule S:
  try_acquire_fire_lock(id, instance_id, now)
    → UPDATE ... WHERE locked_by IS NONE OR locked_at < (now - 120s)
    → returns true (non-empty) or false (empty)
  if false: log + inc schedules_skipped, return
  fire_with_lock(S, now)               ← actual workflow start
  release_fire_lock(id, instance_id)
    → UPDATE ... WHERE locked_by = instance_id
    → own-instance guard prevents stale release
```

Lock release is always attempted, even on a `fire_with_lock` error; a `warn!` is emitted if the release fails (the TTL provides the fallback).

### Timezone-Aware Cron Evaluation

```
compute_fire_times_tz(schedule, last, now, catch_up, tz):
  match tz.parse::<chrono_tz::Tz>():
    Some(tz) → schedule.after(&last.with_timezone(&tz))
                 .take_while(|t| t.with_timezone(&Utc) <= now)
                 .map(|t| t.with_timezone(&Utc))
    None     → schedule.after(&last)          ← UTC
```

Parsing an unknown or invalid timezone string silently falls back to UTC — this avoids a hard runtime error if a previously valid TZ identifier is removed from the `chrono-tz` database in a future upgrade.

### API Surface Changes

`PUT /api/v1/schedules/:id` and `PATCH /api/v1/schedules/:id` accept and return `timezone: Option<String>`. The timezone is validated at the API boundary by `validate_timezone()`, which returns `400 InvalidInput` for unknown identifiers. Config-file `[schedule]` blocks also accept `timezone` and are validated at startup (fail-fast, same as `cron`).

---

## Consequences

### Positive

- Schedules are expressed in business-local time — no mental UTC arithmetic for operators.
- Multi-instance deployments are safe by default; no external lock service required.
- `ScheduledWorkflow.timezone` is nullable/optional — all existing schedules without the field default to UTC, with no migration required.

### Negative / Trade-offs

- `chrono-tz` adds ~2 MB of IANA timezone data to the binary (embedded at compile time).
- The 120-second lock TTL means a worst-case window of one double-fire per 120 s if the winning instance crashes between acquiring the lock and calling `update_after_fire`. This is acceptable because the `schedule_runs` audit log makes duplicates visible.
- Timezone cannot be cleared via PATCH: passing `timezone: null` in JSON is treated as absent (`#[serde(default)]`). Clearing the timezone (reverting to UTC) requires a full PUT.
docs/adrs/0035-notification-channels.md (Normal file, 159 lines)
@@ -0,0 +1,159 @@
# ADR-0035: Webhook-Based Notification Channels — `vapora-channels` Crate

**Status**: Implemented
**Date**: 2026-02-26
**Deciders**: VAPORA Team
**Technical Story**: Workflow events (task completion, proposal approve/reject, schedule fires) had no outbound delivery path; operators had to poll the API to learn about state changes.

---

## Decision

Introduce a dedicated `vapora-channels` crate implementing a **trait-based webhook delivery layer** with:

1. `NotificationChannel` trait — a single `send(&Message) -> Result<()>` method; consumers implement HTTP webhooks (Slack, Discord, Telegram) without vendor SDK dependencies.
2. `ChannelRegistry` — a name-keyed routing hub; `from_config(HashMap<String, ChannelConfig>)` builds the registry from TOML config, resolving secrets at construction time.
3. `${VAR}` / `${VAR:-default}` interpolation **inside the library** — secret resolution is mandatory and cannot be bypassed by callers.
4. Fire-and-forget delivery at both layers: `AppState::notify` (backend) and `WorkflowOrchestrator::notify_*` (workflow engine) spawn background tasks; delivery failures are `warn!`-logged and never surface to API callers.
5. Per-event routing config (`NotificationConfig`) that maps event names to channel-name lists, not hardcoded channel identifiers.
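The trait and registry shape can be sketched with the standard library alone. This is a simplified, synchronous stand-in: the real trait is async, and the names `Registry`, `RecordingChannel`, and their fields are illustrative rather than the crate's exact API.

```rust
// Illustrative sketch of the trait-based delivery layer. The real
// NotificationChannel::send is async; this sync version shows the shape only.
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

pub struct Message {
    pub title: String,
    pub body: String,
}

pub trait NotificationChannel {
    fn send(&self, msg: &Message) -> Result<(), String>;
}

/// Name-keyed routing hub, analogous to ChannelRegistry.
pub struct Registry {
    channels: HashMap<String, Arc<dyn NotificationChannel>>,
}

impl Registry {
    pub fn send_to(&self, name: &str, msg: &Message) -> Result<(), String> {
        self.channels
            .get(name)
            .ok_or_else(|| format!("channel '{name}' not found"))?
            .send(msg)
    }
}

/// Test double in the spirit of the crate's RecordingChannel.
pub struct RecordingChannel {
    pub sent: Mutex<Vec<String>>,
}

impl NotificationChannel for RecordingChannel {
    fn send(&self, msg: &Message) -> Result<(), String> {
        self.sent.lock().unwrap().push(msg.title.clone());
        Ok(())
    }
}
```

Because channels are trait objects behind `Arc`, adding a new provider means implementing one method; the registry and dispatch code are untouched.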

---

## Context

### Gaps Addressed

| Gap | Consequence |
|-----|-------------|
| No outbound event delivery | Operators must poll 40+ API endpoints to detect state changes |
| Secrets in TOML as plain strings | If resolution is left to callers, a `${SLACK_WEBHOOK_URL}` placeholder reaches the HTTP layer verbatim when the caller forgets to interpolate |
| Tight vendor coupling | Using `slack-rs` / `serenity` locks the feature to specific Slack/Discord API versions and transitive dependency trees |

### Why `NotificationChannel` Trait Over Vendor SDKs

Slack, Discord, and Telegram all accept a simple `POST` with a JSON body to a webhook URL — no OAuth, no persistent connection, no stateful session. A trait with one async method covers all three with less than 50 lines per implementation. Vendor SDKs add 200–500 kB of transitive dependencies and introduce breaking changes on provider API updates.

### Why Secret Resolution in the Library

Placing the responsibility on the caller creates a **TOFU gap**: the first time any caller forgets to call `resolve_secrets()` before constructing `ChannelRegistry`, a raw `${SLACK_WEBHOOK_URL}` string is sent to Slack's API as the URL. The request fails silently (Slack returns 404 or 400), the placeholder leaks into logs, and no compile-time or runtime warning is raised until a log is inspected.

Moving interpolation into `ChannelRegistry::from_config` makes it **structurally impossible to construct a registry with unresolved secrets**: `ChannelError::SecretNotFound(var_name)` is returned immediately if an env var is absent and no default is provided. There is no non-error path that bypasses resolution.

### Why Fire-and-Forget With `tokio::spawn`

Notification delivery is a best-effort side effect, not part of the request/response contract. A Slack outage should not cause `POST /api/v1/proposals/:id/approve` to return 500. Spawning an independent task decouples delivery latency from API latency; `warn!` logging provides observability without blocking the caller.
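The decoupling can be illustrated with a standard-library analogy. The real code uses `tokio::spawn` and `tracing::warn!`; here a plain thread and `eprintln!` stand in, and the `deliver` function is a hypothetical stub.

```rust
// Std-only analogy for fire-and-forget dispatch: a thread stands in for a
// tokio task, eprintln! for tracing::warn!. The caller returns immediately
// regardless of whether delivery succeeds.
use std::thread;

fn deliver(channel: &str) -> Result<(), String> {
    // Stub: pretend the webhook POST failed for one channel.
    if channel == "ops-discord" {
        Err("502 Bad Gateway".to_string())
    } else {
        Ok(())
    }
}

fn notify(channels: Vec<String>) -> thread::JoinHandle<usize> {
    thread::spawn(move || {
        let mut delivered = 0;
        for ch in channels {
            match deliver(&ch) {
                Ok(()) => delivered += 1,
                // Failures are logged, never propagated to the caller.
                Err(e) => eprintln!("warn: delivery to '{ch}' failed: {e}"),
            }
        }
        delivered
    })
}
```

The API handler would have returned long before the spawned work finishes; nothing in the error path can reach the HTTP response.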

---

## Implementation

### Crate Structure (`vapora-channels`)

```
vapora-channels/
├── src/
│   ├── lib.rs      — pub re-exports (ChannelRegistry, Message, NotificationChannel)
│   ├── channel.rs  — NotificationChannel trait
│   ├── config.rs   — ChannelsConfig, ChannelConfig, SlackConfig/DiscordConfig/TelegramConfig
│   │                 resolve_secrets() chain + interpolate() with OnceLock<Regex>
│   ├── error.rs    — ChannelError: NotFound, ApiError, SecretNotFound, SerializationError
│   ├── message.rs  — Message { title, body, level: Info|Success|Warning|Error }
│   ├── registry.rs — ChannelRegistry: name → Arc<dyn NotificationChannel>
│   └── webhooks/
│       ├── slack.rs    — SlackChannel: POST Incoming Webhook JSON
│       ├── discord.rs  — DiscordChannel: POST Webhook embed JSON
│       └── telegram.rs — TelegramChannel: POST bot sendMessage JSON
```

### Secret Resolution

```
interpolate(s: &str) -> Result<String>:
    regex: \$\{([^}:]+)(?::-(.*?))?\}   (compiled once via OnceLock)
    fast-path: if !s.contains("${") { return Ok(s) }
    for each capture:
        var_name = capture[1]
        default  = capture[2] (optional)
        match env::var(var_name):
            Ok(v)  → replace placeholder with v
            Err(_) → if default.is_some(): replace with default
                     else: return Err(SecretNotFound(var_name))
```

`resolve_secrets()` is called unconditionally in `ChannelRegistry::from_config` — a single mandatory call site, with no consumer bypass.
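The resolution rules above can be sketched as a hand-rolled scanner without the regex dependency. This is illustrative only: the crate compiles a regex once via `OnceLock`, and its seven unit tests also cover cases (such as nested vars) that this simplified scanner does not.

```rust
// Std-only sketch of ${VAR} / ${VAR:-default} interpolation following the
// pseudocode above: env var wins, then the default, else an error.
use std::env;

fn interpolate(s: &str) -> Result<String, String> {
    // Fast path: nothing to resolve.
    if !s.contains("${") {
        return Ok(s.to_string());
    }
    let mut out = String::new();
    let mut rest = s;
    while let Some(start) = rest.find("${") {
        out.push_str(&rest[..start]);
        let after = &rest[start + 2..];
        let end = after.find('}').ok_or("unterminated ${ placeholder")?;
        let inner = &after[..end];
        // Split "VAR:-default" into the name and an optional default.
        let (name, default) = match inner.split_once(":-") {
            Some((n, d)) => (n, Some(d)),
            None => (inner, None),
        };
        match env::var(name) {
            Ok(v) => out.push_str(&v),
            Err(_) => match default {
                Some(d) => out.push_str(d),
                None => return Err(format!("SecretNotFound({name})")),
            },
        }
        rest = &after[end + 1..];
    }
    out.push_str(rest);
    Ok(out)
}
```

Note the single error path: a missing variable with no default can only produce `SecretNotFound`, mirroring how the real registry cannot be built with an unresolved placeholder.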

### Integration Points

#### `vapora-workflow-engine`

`WorkflowConfig.notifications: WorkflowNotifications` maps four events to channel-name lists:

```toml
[workflows.myflow.notifications]
on_stage_complete = ["team-slack"]
on_stage_failed = ["team-slack", "ops-discord"]
on_completed = ["team-slack"]
on_cancelled = ["ops-discord"]
```

`WorkflowOrchestrator` holds `Option<Arc<ChannelRegistry>>` and calls `notify_stage_complete`, `notify_stage_failed`, `notify_completed`, and `notify_cancelled` — each spawns `dispatch_notifications`.

#### `vapora-backend`

`Config.channels: HashMap<String, ChannelConfig>` and `Config.notifications: NotificationConfig`:

```toml
[channels.team-slack]
type = "slack"
webhook_url = "${SLACK_WEBHOOK_URL}"

[notifications]
on_task_done = ["team-slack"]
on_proposal_approved = ["team-slack", "ops-discord"]
on_proposal_rejected = ["ops-discord"]
```

`AppState` gains `channel_registry: Option<Arc<ChannelRegistry>>` and `notification_config: Arc<NotificationConfig>`. Hooks in three existing handlers:

- `update_task_status` — fires `Message::success` on `TaskStatus::Done`
- `approve_proposal` — fires `Message::success`
- `reject_proposal` — fires `Message::warning`

#### New REST Endpoints

| Method | Path | Description |
|--------|------|-------------|
| `GET` | `/api/v1/channels` | List registered channel names |
| `POST` | `/api/v1/channels/:name/test` | Send connectivity test; 200 OK / 404 / 502 |

### Testability

`dispatch_notifications` is extracted as a `pub(crate) async fn` taking `Option<Arc<ChannelRegistry>>` directly, making it testable without a DB or a fully constructed `AppState`. Five inline tests in `state.rs` use `RecordingChannel` (captures messages) and `FailingChannel` (returns a 503 error) to verify:

1. No-op when the registry is `None`
2. Single-channel delivery
3. Multi-channel broadcast
4. Resilience: delivery continues after one channel fails
5. Warn-log on an unknown channel name; other channels still receive

---

## Consequences

### Positive

- Operators get real-time Slack/Discord/Telegram alerts on task completion, proposal decisions, and workflow lifecycle events.
- Adding a new channel type requires implementing one trait method and one TOML variant — no changes to routing or dispatch code.
- Secret resolution failures surface immediately at startup (when `ChannelRegistry::from_config` is called at boot), not silently at first delivery.
- Zero additional infrastructure: webhooks are outbound-only HTTP POSTs.

### Negative / Trade-offs

- Delivery is best-effort (fire-and-forget). A channel that is consistently down produces `warn!` logs but no alert escalation; consumers needing guaranteed delivery must implement their own retry loop or use a message queue.
- The `${VAR}` interpolation tests use `unsafe { std::env::set_var }` (environment mutation is an unsafe operation as of the Rust 2024 edition). Tests set and unset env vars, so multi-threaded test parallelism can cause flaky results unless isolated with `#[serial_test::serial]`.
- No per-channel rate limiting: a workflow that fires 1,000 stage-complete events produces 1,000 Slack messages. Operators must configure `notifications` lists deliberately.

### Supersedes / Specializes

- Builds on the `SecretumVault` (ADR-0011) philosophy of never storing secrets as plain strings; specializes it to config-file webhook tokens.
- Parallel to `vapora-a2a-client`'s retry pattern (ADR-0030): both handle external HTTP delivery, but channels are fire-and-forget while A2A requires a confirmed response.
@ -2,8 +2,8 @@

Documentation of the key architectural decisions of the VAPORA project.

**Status**: Complete (35 ADRs documented)
**Last Updated**: 2026-02-26
**Format**: Custom VAPORA (Decision, Rationale, Alternatives, Trade-offs, Implementation, Verification, Consequences)

---

@ -81,6 +81,8 @@ Unique decisions that differentiate VAPORA from other orchestration platforms

| [028](./0028-workflow-orchestrator.md) | Workflow Orchestrator for Multi-Agent Pipelines | Short-lived agent contexts + artifact passing to reduce cache tokens 95% | ✅ Accepted |
| [029](./0029-rlm-recursive-language-models.md) | Recursive Language Models (RLM) | Custom Rust engine: BM25 + semantic hybrid search + distributed LLM dispatch + WASM/Docker sandbox | ✅ Accepted |
| [033](./0033-stratum-orchestrator-workflow-hardening.md) | Workflow Engine Hardening — Persistence · Saga · Cedar | SurrealDB persistence + Saga best-effort rollback + Cedar per-stage auth; stratum patterns implemented natively (no path dep) | ✅ Implemented |
| [034](./0034-autonomous-scheduling.md) | Autonomous Scheduling — Timezone Support and Distributed Fire-Lock | `chrono-tz` IANA-aware cron evaluation + SurrealDB conditional UPDATE fire-lock; no external lock service required | ✅ Implemented |
| [035](./0035-notification-channels.md) | Webhook-Based Notification Channels — `vapora-channels` Crate | Trait-based webhook delivery (Slack/Discord/Telegram) + `${VAR}` secret resolution built into `ChannelRegistry::from_config`; fire-and-forget via `tokio::spawn` | ✅ Implemented |

---

@ -141,6 +143,8 @@ Development and architecture patterns used throughout the codebase.

- **Real-Time WebSocket Updates**: Broadcast channels for efficient multi-client workflow progress updates
- **Workflow Orchestrator**: Short-lived agent contexts + artifact passing reduce cache token costs ~95% vs monolithic sessions
- **Recursive Language Models (RLM)**: Hybrid BM25+semantic search + distributed LLM dispatch + WASM/Docker sandbox enables reasoning over 100k+ token documents
- **Autonomous Scheduling**: `chrono-tz` IANA-aware cron evaluation + SurrealDB CAS fire-lock eliminates double-fires in multi-instance deployments without external lock infrastructure
- **Notification Channels**: Trait-based webhook delivery with `${VAR}` secret resolution built into `ChannelRegistry` construction — operators get real-time Slack/Discord/Telegram alerts with zero new infrastructure

### 🔧 Development Patterns
@ -5,3 +5,5 @@ VAPORA capabilities and overview documentation.

## Contents

- **[Features Overview](overview.md)** — Complete feature list and descriptions including learning-based agent selection, cost optimization, and swarm coordination
- **[Workflow Orchestrator](workflow-orchestrator.md)** — Multi-stage pipelines, approval gates, artifacts, autonomous scheduling, and distributed fire-lock
- **[Notification Channels](notification-channels.md)** — Webhook delivery to Slack, Discord, and Telegram with built-in secret resolution
236
docs/features/notification-channels.md
Normal file
@ -0,0 +1,236 @@
# Notification Channels

Real-time outbound alerts to Slack, Discord, and Telegram via webhook delivery.

## Overview

`vapora-channels` provides a trait-based webhook notification layer. When VAPORA events occur (task completion, proposal decisions, workflow lifecycle), configured channels receive a message immediately — no polling required.

**Key properties**:

- No vendor SDKs — plain HTTP POST to webhook URLs
- Secret tokens resolved from environment variables at startup; a raw `${VAR}` placeholder never reaches the HTTP layer
- Fire-and-forget delivery: channel failures never surface as API errors

## Configuration

All channel configuration lives in `vapora.toml`.

### Declaring channels

```toml
[channels.team-slack]
type = "slack"
webhook_url = "${SLACK_WEBHOOK_URL}"

[channels.ops-discord]
type = "discord"
webhook_url = "${DISCORD_WEBHOOK_URL}"

[channels.alerts-telegram]
type = "telegram"
bot_token = "${TELEGRAM_BOT_TOKEN}"
chat_id = "${TELEGRAM_CHAT_ID}"
```

Channel names (`team-slack`, `ops-discord`, `alerts-telegram`) are arbitrary identifiers used in the event routing below.
### Routing events to channels

```toml
[notifications]
on_task_done = ["team-slack"]
on_proposal_approved = ["team-slack", "ops-discord"]
on_proposal_rejected = ["ops-discord"]
```

Each key is an event name; the value is a list of channel names declared in `[channels.*]`. An empty list or an absent key means no notification for that event.

### Workflow lifecycle notifications

Per-workflow notification targets are set in the workflow template:

```toml
[[workflows]]
name = "nightly_analysis"
trigger = "schedule"

[workflows.nightly_analysis.notifications]
on_stage_complete = ["team-slack"]
on_stage_failed = ["team-slack", "ops-discord"]
on_completed = ["team-slack"]
on_cancelled = ["ops-discord"]
```

## Secret Resolution

Token values in `[channels.*]` blocks are interpolated from the environment before any network call is made. Two syntaxes are supported:

| Syntax | Behaviour |
|--------|-----------|
| `"${VAR}"` | Replaced with `$VAR`; startup fails if the variable is unset |
| `"${VAR:-default}"` | Replaced with `$VAR` if set, otherwise `default` |

Resolution happens inside `ChannelRegistry::from_config` — the single mandatory call site. There is no way to construct a registry with an unresolved placeholder.

**Example**:

```bash
export SLACK_WEBHOOK_URL="https://hooks.slack.com/services/T.../..."
export DISCORD_WEBHOOK_URL="https://discord.com/api/webhooks/..."
export TELEGRAM_BOT_TOKEN="123456:ABC..."
export TELEGRAM_CHAT_ID="-1001234567890"
```

If a required variable is absent and no default is provided, VAPORA exits at startup with:

```text
Error: Secret reference '${SLACK_WEBHOOK_URL}' not resolved: env var not set and no default provided
```
## Supported Channel Types

### Slack

Uses the [Incoming Webhooks](https://api.slack.com/messaging/webhooks) API. The webhook URL is obtained from Slack's app configuration.

```toml
[channels.my-slack]
type = "slack"
webhook_url = "${SLACK_WEBHOOK_URL}"
```

Payload format: `{ "text": "**Title**\nBody" }`. No SDK dependency.

### Discord

Uses the [Discord Webhook](https://discord.com/developers/docs/resources/webhook) endpoint. The webhook URL includes the token — obtain it from the channel's Integrations settings.

```toml
[channels.my-discord]
type = "discord"
webhook_url = "${DISCORD_WEBHOOK_URL}"
```

Payload format: `{ "embeds": [{ "title": "...", "description": "...", "color": <level-color> }] }`.

### Telegram

Uses the [Bot API](https://core.telegram.org/bots/api#sendmessage) `sendMessage` endpoint. Requires a bot token from `@BotFather` and the numeric chat ID of the target group or channel.

```toml
[channels.my-telegram]
type = "telegram"
bot_token = "${TELEGRAM_BOT_TOKEN}"
chat_id = "${TELEGRAM_CHAT_ID}"
```

Payload format: `{ "chat_id": "...", "text": "**Title**\nBody", "parse_mode": "Markdown" }`.

## Message Levels

Every notification carries a level that controls colour and emoji in the rendered message:

| Level | Constructor | Use case |
|-------|-------------|----------|
| `Info` | `Message::info(title, body)` | General status updates |
| `Success` | `Message::success(title, body)` | Task done, workflow completed |
| `Warning` | `Message::warning(title, body)` | Proposal rejected, stage failed |
| `Error` | `Message::error(title, body)` | Unrecoverable failure |
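The message shape and its four constructors can be sketched as follows. This is an illustrative stand-in, not the crate's exact definition, and `render_text` is a hypothetical helper showing the documented `**Title**\nBody` rendering used by the Slack and Telegram payloads.

```rust
// Illustrative sketch of Message and its level constructors; names mirror
// the documented API but are not verbatim from the crate.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Level {
    Info,
    Success,
    Warning,
    Error,
}

#[derive(Debug, Clone)]
pub struct Message {
    pub title: String,
    pub body: String,
    pub level: Level,
}

impl Message {
    fn with_level(title: &str, body: &str, level: Level) -> Self {
        Message { title: title.to_string(), body: body.to_string(), level }
    }
    pub fn info(title: &str, body: &str) -> Self { Self::with_level(title, body, Level::Info) }
    pub fn success(title: &str, body: &str) -> Self { Self::with_level(title, body, Level::Success) }
    pub fn warning(title: &str, body: &str) -> Self { Self::with_level(title, body, Level::Warning) }
    pub fn error(title: &str, body: &str) -> Self { Self::with_level(title, body, Level::Error) }

    /// Hypothetical helper: the cross-platform text body shown above.
    pub fn render_text(&self) -> String {
        format!("**{}**\n{}", self.title, self.body)
    }
}
```

Each channel implementation maps the level onto its own presentation (embed colour for Discord, emoji for Slack and Telegram).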

## REST API

Two endpoints are available under `/api/v1/channels`:

### List channels

```http
GET /api/v1/channels
```

Returns the names of all registered channels, sorted alphabetically. Returns an empty list when no channels are configured.

**Response**:

```json
{
  "channels": ["ops-discord", "team-slack"]
}
```

### Test a channel

```http
POST /api/v1/channels/:name/test
```

Sends a connectivity test message to the named channel and returns synchronously.

| Status | Meaning |
|--------|---------|
| `200 OK` | Message delivered successfully |
| `404 Not Found` | Channel name unknown or no channels configured |
| `502 Bad Gateway` | Delivery attempt failed at the remote platform |

**Example**:

```bash
curl -X POST http://localhost:8001/api/v1/channels/team-slack/test
```

Expected Slack message: `Test notification — Connectivity test from VAPORA backend for channel 'team-slack'`
## Delivery Semantics

Delivery is **fire-and-forget**: `AppState::notify` spawns a background Tokio task and returns immediately. The API response does not wait for webhook delivery to complete.

Behaviour on failure:

- Unknown channel name: `warn!` log; delivery to other targets continues
- HTTP error from the remote platform: `warn!` log; delivery to other targets continues
- No channels configured (`channel_registry = None`): silent no-op

There is no built-in retry. A channel that is consistently unreachable produces `warn!` log lines but no escalation. Use the `/test` endpoint to confirm connectivity after configuration changes.

## Events Reference

| Event key | Trigger | Default level |
|-----------|---------|---------------|
| `on_task_done` | Task moved to `Done` status | `Success` |
| `on_proposal_approved` | Proposal approved via API | `Success` |
| `on_proposal_rejected` | Proposal rejected via API | `Warning` |
| `on_stage_complete` | Workflow stage finished | `Info` |
| `on_stage_failed` | Workflow stage failed | `Warning` |
| `on_completed` | Workflow reached terminal `Completed` state | `Success` |
| `on_cancelled` | Workflow cancelled | `Warning` |

## Troubleshooting

### Channel not receiving messages

1. Verify the channel name in `[notifications]` matches the name in `[channels.*]` exactly (case-sensitive).
2. Confirm the env variable is set: `echo $SLACK_WEBHOOK_URL`.
3. Send a test message: `POST /api/v1/channels/<name>/test`.
4. Check backend logs for `warn` entries with `channel = "<name>"`.

### Startup fails with `SecretNotFound`

The env variable referenced in `webhook_url` or `bot_token`/`chat_id` is not set. Either export the variable or add a default value:

```toml
webhook_url = "${SLACK_WEBHOOK_URL:-https://hooks.slack.com/...}"
```

### Discord returns 400

Use the webhook URL exactly as copied from Discord's channel Integrations settings, without modification. Append `/slack` only if you are deliberately using Discord's Slack-compatible mode.

### Telegram chat_id not found

The bot must be a member of the target group or channel. For groups, prefix the numeric ID with `-` (e.g. `-1001234567890`). Use `@userinfobot` in Telegram to retrieve your chat ID.

## Related Documentation

- [Workflow Orchestrator](./workflow-orchestrator.md) — workflow lifecycle events and notification config
- [ADR-0035: Notification Channels](../adrs/0035-notification-channels.md) — design rationale
- [ADR-0011: SecretumVault](../adrs/0011-secretumvault.md) — secret management philosophy
@ -528,11 +528,85 @@ docker logs vapora-backend
nats sub "vapora.tasks.>"
```

## Autonomous Scheduling

Workflows with `trigger = "schedule"` fire automatically on a cron expression without any REST trigger.

### TOML Configuration

```toml
[[workflows]]
name = "nightly_analysis"
trigger = "schedule"

[workflows.schedule]
cron = "0 2 * * *"            # 5-field: min hour dom month dow
timezone = "America/New_York" # IANA identifier; omit for UTC
allow_concurrent = false      # skip if previous run is still active
catch_up = false              # fire missed slots on restart (capped at 10)

[[workflows.stages]]
name = "analyze"
agents = ["analyst"]
```

Cron accepts 5-field (standard shell), 6-field (with seconds), or 7-field (with seconds + year) expressions. The expression is validated at config-load time — startup fails on an invalid cron or an unknown timezone.
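The fail-fast shape check can be sketched as follows. This is illustrative: only the field count is validated here, and the real loader presumably delegates per-field validation (ranges, names, steps) to a full cron parser.

```rust
// Minimal sketch of config-load cron validation: accept the documented
// 5/6/7-field shapes, reject anything else with a descriptive error.
fn validate_cron_shape(expr: &str) -> Result<usize, String> {
    let fields = expr.split_whitespace().count();
    match fields {
        // 5 = standard, 6 = +seconds, 7 = +seconds+year
        5 | 6 | 7 => Ok(fields),
        n => Err(format!("cron expression must have 5-7 fields, got {n}: '{expr}'")),
    }
}
```

Running this at startup, before any scheduler tick, gives the fail-fast behaviour described above: a malformed expression aborts boot instead of silently never firing.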

### Schedule REST API

| Method | Path | Description |
|--------|------|-------------|
| `GET` | `/api/v1/schedules` | List all schedules |
| `GET` | `/api/v1/schedules/:id` | Get one schedule |
| `PUT` | `/api/v1/schedules/:id` | Create or fully replace |
| `PATCH` | `/api/v1/schedules/:id` | Partial update |
| `DELETE` | `/api/v1/schedules/:id` | Remove |
| `GET` | `/api/v1/schedules/:id/runs` | Execution history (last 100) |
| `POST` | `/api/v1/schedules/:id/fire` | Manual trigger bypassing cron |

**PUT body** (all fields):

```json
{
  "template_name": "nightly_analysis",
  "cron_expression": "0 2 * * *",
  "timezone": "America/New_York",
  "enabled": true,
  "allow_concurrent": false,
  "catch_up": false,
  "initial_context": {}
}
```

**PATCH body** (only changed fields):

```json
{ "enabled": false }
```

### Timezone Support

`timezone` is an IANA timezone identifier (e.g. `"America/New_York"`, `"Europe/Berlin"`, `"Asia/Tokyo"`). When absent, UTC is used. DST transitions are handled automatically.

The REST API validates the timezone at the boundary — an unknown identifier returns `400 InvalidInput`.

### Distributed Fire-Lock

When multiple VAPORA backend instances run against the same SurrealDB, the scheduler uses a conditional `UPDATE ... WHERE locked_by IS NONE OR locked_at < (now - 120s)` to ensure only one instance fires each schedule per tick. The lock holder is identified by a per-process UUID stored in `locked_by`; it expires automatically after 120 seconds, handling crashed instances.
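The lock semantics can be simulated in-process. A `Mutex`-guarded struct stands in for the SurrealDB row, and the hypothetical `try_acquire` mirrors the conditional UPDATE under the stated 120 s TTL.

```rust
// Std-only simulation of the conditional-UPDATE fire-lock:
// `UPDATE ... WHERE locked_by IS NONE OR locked_at < (now - TTL)`.
use std::sync::Mutex;
use std::time::{Duration, Instant};

const LOCK_TTL: Duration = Duration::from_secs(120);

#[derive(Default)]
struct ScheduleRow {
    locked_by: Option<String>,
    locked_at: Option<Instant>,
}

fn try_acquire(row: &Mutex<ScheduleRow>, instance_id: &str, now: Instant) -> bool {
    let mut r = row.lock().unwrap();
    let free = match (&r.locked_by, r.locked_at) {
        (None, _) => true,
        // Expired lock: the previous holder crashed or stalled.
        (Some(_), Some(at)) => now.duration_since(at) >= LOCK_TTL,
        (Some(_), None) => false,
    };
    if free {
        r.locked_by = Some(instance_id.to_string());
        r.locked_at = Some(now);
    }
    free
}
```

The check and the write happen under one guard, which is what the single conditional UPDATE gives the real scheduler: exactly one instance per tick observes the row as free.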

### Schedule Metrics (Prometheus)

- `vapora_schedules_fired_total` — successful fires
- `vapora_schedules_skipped_total` — skipped (concurrent guard or distributed lock contention)
- `vapora_schedules_failed_total` — workflow start failures
- `vapora_active_schedules` — current count (gauge)

## Related Documentation

- [CLI Commands Guide](../setup/cli-commands.md) - Command-line usage
- [Multi-Agent Workflows](../architecture/multi-agent-workflows.md) - Architecture overview
- [Agent Registry & Coordination](../architecture/agent-registry-coordination.md) - Agent management
- [ADR-0028: Workflow Orchestrator](../adrs/0028-workflow-orchestrator.md) - Decision rationale
- [ADR-0034: Autonomous Scheduling](../adrs/0034-autonomous-scheduling.md) - Scheduling design decisions
- [ADR-0014: Learning-Based Agent Selection](../adrs/0014-learning-profiles.md) - Agent selection
- [ADR-0015: Budget Enforcement](../adrs/0015-budget-enforcement.md) - Cost control
1
justfiles/rust-axum
Symbolic link
@ -0,0 +1 @@
/Users/Akasha/Tools/dev-system/languages/rust/just-modules/axum

1
justfiles/rust-cargo
Symbolic link
@ -0,0 +1 @@
/Users/Akasha/Tools/dev-system/languages/rust/just-modules/cargo

1
justfiles/rust-leptos
Symbolic link
@ -0,0 +1 @@
/Users/Akasha/Tools/dev-system/languages/rust/just-modules/leptos

44
migrations/010_scheduled_workflows.surql
Normal file
@ -0,0 +1,44 @@
|
|||||||
|
-- Migration 010: Scheduled Workflow Definitions and Run History
|
||||||
|
-- Enables autonomous cron-based workflow firing without REST triggers.
|
||||||
|
-- Two tables: schedule definitions (managed by TOML + DB) and an append-only run log.
|
||||||
|
|
||||||
|
DEFINE TABLE scheduled_workflows SCHEMAFULL;
|
||||||
|
|
||||||
|
DEFINE FIELD id ON TABLE scheduled_workflows TYPE record<scheduled_workflows>;
|
||||||
|
DEFINE FIELD template_name ON TABLE scheduled_workflows TYPE string ASSERT $value != NONE;
|
||||||
|
DEFINE FIELD cron_expression ON TABLE scheduled_workflows TYPE string ASSERT $value != NONE;
|
||||||
|
DEFINE FIELD initial_context ON TABLE scheduled_workflows FLEXIBLE TYPE object DEFAULT {};
|
||||||
|
DEFINE FIELD enabled ON TABLE scheduled_workflows TYPE bool DEFAULT true;
|
||||||
|
DEFINE FIELD allow_concurrent ON TABLE scheduled_workflows TYPE bool DEFAULT false;
|
||||||
|
DEFINE FIELD catch_up ON TABLE scheduled_workflows TYPE bool DEFAULT false;
|
||||||
|
DEFINE FIELD last_fired_at ON TABLE scheduled_workflows TYPE option<datetime> DEFAULT NONE;
|
||||||
|
DEFINE FIELD next_fire_at ON TABLE scheduled_workflows TYPE option<datetime> DEFAULT NONE;
|
||||||
|
DEFINE FIELD runs_count ON TABLE scheduled_workflows TYPE int DEFAULT 0;
|
||||||
|
DEFINE FIELD created_at ON TABLE scheduled_workflows TYPE datetime DEFAULT time::now();
|
||||||
|
DEFINE FIELD updated_at ON TABLE scheduled_workflows TYPE datetime DEFAULT time::now() VALUE time::now();
|
||||||
|
|
||||||
|
DEFINE INDEX idx_scheduled_workflows_template
|
||||||
|
ON TABLE scheduled_workflows COLUMNS template_name;
|
||||||
|
|
||||||
|
DEFINE INDEX idx_scheduled_workflows_enabled
|
||||||
|
ON TABLE scheduled_workflows COLUMNS enabled;
|
||||||
|
|
||||||
|
-- Append-only execution history for audit and debugging.
|
||||||
|
DEFINE TABLE schedule_runs SCHEMAFULL;
|
||||||
|
|
||||||
|
DEFINE FIELD id ON TABLE schedule_runs TYPE record<schedule_runs>;
|
||||||
|
DEFINE FIELD schedule_id ON TABLE schedule_runs TYPE string ASSERT $value != NONE;
|
||||||
|
DEFINE FIELD workflow_instance_id ON TABLE schedule_runs TYPE option<string> DEFAULT NONE;
|
||||||
|
DEFINE FIELD fired_at ON TABLE schedule_runs TYPE datetime ASSERT $value != NONE;
|
||||||
|
DEFINE FIELD status ON TABLE schedule_runs TYPE string
|
||||||
|
ASSERT $value INSIDE ['Fired', 'Skipped', 'Failed'];
|
||||||
|
DEFINE FIELD notes ON TABLE schedule_runs TYPE option<string> DEFAULT NONE;
|
||||||
|
|
||||||
|
DEFINE INDEX idx_schedule_runs_schedule_id
|
||||||
|
ON TABLE schedule_runs COLUMNS schedule_id;
|
||||||
|
|
||||||
|
DEFINE INDEX idx_schedule_runs_fired_at
|
||||||
|
ON TABLE schedule_runs COLUMNS fired_at;
|
||||||
|
|
||||||
|
DEFINE INDEX idx_schedule_runs_schedule_fired
|
||||||
|
ON TABLE schedule_runs COLUMNS schedule_id, fired_at;
|
||||||
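The schema in this migration can be exercised with a registration row like the following. This is a hypothetical sketch, not part of the migration itself: the template name, cron string, and context payload are illustrative values.

```surql
-- Hypothetical schedule: fire the 'nightly-report' template at 02:00 daily.
-- Defaults from the migration fill in catch_up, runs_count, timestamps, etc.
CREATE scheduled_workflows CONTENT {
    template_name: 'nightly-report',
    cron_expression: '0 0 2 * * *',
    initial_context: { scope: 'full' },
    enabled: true,
    allow_concurrent: false
};
```

Each firing would then append a row to `schedule_runs` with `status` set to one of `'Fired'`, `'Skipped'`, or `'Failed'`, which the composite `idx_schedule_runs_schedule_fired` index keeps cheap to query per schedule in time order.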
9
migrations/011_schedule_tz_lock.surql
Normal file
@ -0,0 +1,9 @@
-- Migration 011: Timezone and Distributed Fire-Lock for Scheduled Workflows
-- Extends the scheduled_workflows table with:
--   timezone  — IANA timezone identifier for cron evaluation (e.g. "America/New_York")
--   locked_by — instance UUID holding the current fire lock (prevents double-fires)
--   locked_at — when the lock was acquired; TTL = 120 s is enforced in application code

DEFINE FIELD timezone ON TABLE scheduled_workflows TYPE option<string> DEFAULT NONE;
DEFINE FIELD locked_by ON TABLE scheduled_workflows TYPE option<string> DEFAULT NONE;
DEFINE FIELD locked_at ON TABLE scheduled_workflows TYPE option<datetime> DEFAULT NONE;
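Since the 120 s TTL lives in application code rather than the schema, lock acquisition presumably takes the form of a compare-and-set over `locked_by`/`locked_at`. The following is a hedged sketch only: `$instance_id` and `$schedule` are assumed bind parameters, and the actual query in the scheduler may differ.

```surql
-- Hypothetical fire-lock claim: take the lock only if it is free,
-- or if the current holder's lock is stale (older than the 120 s TTL).
UPDATE scheduled_workflows SET
    locked_by = $instance_id,
    locked_at = time::now()
WHERE id = $schedule
  AND (locked_by = NONE OR locked_at < time::now() - 120s);
```

If the `UPDATE` matches no record, another instance holds a fresh lock and this instance skips the fire, which is what keeps concurrent scheduler replicas from double-firing the same schedule.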