# ADR-0034: Autonomous Cron Scheduling - Timezone Support and Distributed Fire-Lock

**Status**: Implemented

**Date**: 2026-02-26

**Deciders**: VAPORA Team

**Technical Story**: The `vapora-workflow-engine` scheduler fired cron jobs only in UTC and had no protection against double-fires in multi-instance deployments.

---

## Decision

Extend the autonomous scheduling subsystem with two independent hardening layers:

1. **Timezone-aware scheduling** (`chrono-tz`): cron expressions are evaluated in any IANA timezone, stored per schedule, and validated at the API and config-load boundaries.
2. **Distributed fire-lock**: a SurrealDB conditional `UPDATE ... WHERE locked_by IS NONE OR locked_at < $expiry` provides atomic, TTL-backed mutual exclusion across instances without additional infrastructure.

---

## Context

### Gaps Addressed

| Gap | Consequence |
|-----|-------------|
| UTC-only cron evaluation | `"0 9 * * *"` fires at 09:00 UTC regardless of business timezone; scheduled reports or maintenance windows drift by the UTC offset |
| No distributed coordination | Two `vapora-workflow-engine` instances reading the same `scheduled_workflows` table both fire the same schedule at the same tick |

### Why These Approaches

**`chrono-tz`** over manual UTC-offset arithmetic:

- Compile-time exhaustive enum of all IANA timezone names; invalid names are rejected at parse time.
- The `cron` crate's `Schedule::upcoming(tz)` / `Schedule::after(&dt_in_tz)` are generic over any `TimeZone`, so timezone-awareness requires no special-casing in iteration logic: pass `DateTime<Tz>` instead of `DateTime<Utc>`, and convert output with `.with_timezone(&Utc)`.
- DST transitions are handled automatically by `chrono-tz`; no application code is needed.

**SurrealDB conditional UPDATE** over external distributed lock (Redis, etcd):

- No additional infrastructure dependency.
- SurrealDB applies document-level write locking, so `UPDATE record WHERE condition` is atomic: two concurrent instances race on the same document and only one succeeds (a non-empty return array means the lock was acquired).
- The 120-second TTL is enforced in application code: `locked_at < $expiry` in the WHERE clause expires a lock left by a crashed instance within two scheduler ticks.

---

## Implementation

### New Fields

The `scheduled_workflows` table gains three columns (migration 011):

| Field | Type | Purpose |
|-------|------|---------|
| `timezone` | `option<string>` | IANA identifier (`"America/New_York"`) or `NONE` for UTC |
| `locked_by` | `option<string>` | UUID of the instance holding the current fire-lock |
| `locked_at` | `option<datetime>` | When the lock was acquired; used for TTL expiry |

### Lock Protocol

```
Tick N fires schedule S:
  try_acquire_fire_lock(id, instance_id, now)
    → UPDATE ... WHERE locked_by IS NONE OR locked_at < (now - 120s)
    → returns true (non-empty) or false (empty)
  if false: log + inc schedules_skipped, return
  fire_with_lock(S, now)          ← actual workflow start
  release_fire_lock(id, instance_id)
    → UPDATE ... WHERE locked_by = instance_id
    → own-instance guard prevents stale release
```

Lock release is always attempted, even when `fire_with_lock` returns an error; a `warn!` is emitted if the release fails (the TTL provides the fallback).

### Timezone-Aware Cron Evaluation

```
compute_fire_times_tz(schedule, last, now, catch_up, tz):
  match tz.parse::<Tz>():
    Some(tz) → schedule.after(&last.with_timezone(&tz))
                 .take_while(|t| t.with_timezone(&Utc) <= now)
                 .map(|t| t.with_timezone(&Utc))
    None     → schedule.after(&last)   ← UTC
```

Parsing an unknown or invalid timezone string silently falls back to UTC. This avoids a hard runtime error if a previously valid TZ identifier is removed from the `chrono-tz` database in a future upgrade.

### API Surface Changes

`PUT /api/v1/schedules/:id` and `PATCH /api/v1/schedules/:id` accept and return `timezone: Option<String>`.
Timezone is validated at the API boundary using `validate_timezone()` (returns `400 InvalidInput` for unknown identifiers). Config-file `[schedule]` blocks also accept `timezone` and are validated at startup (fail-fast, same as `cron`).

---

## Consequences

### Positive

- Schedules are expressed in business-local time, so operators need no mental UTC arithmetic.
- Multi-instance deployments are safe by default; no external lock service is required.
- `ScheduledWorkflow.timezone` is nullable/optional: existing schedules without the field default to UTC, with no data migration required.

### Negative / Trade-offs

- `chrono-tz` adds ~2 MB of IANA timezone data to the binary (embedded at compile time).
- The distributed lock TTL of 120 s means a worst-case window of one double-fire per 120 s if the winning instance crashes between acquiring the lock and calling `update_after_fire`. This is acceptable because the `schedule_runs` audit log makes duplicates visible.
- PATCH cannot clear a timezone: passing `timezone: null` in JSON is treated as absent (`#[serde(default)]`). Clearing the timezone (reverting to UTC) requires a full PUT.
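
To make the lock protocol and its 120 s TTL trade-off concrete, the following is a minimal in-memory sketch of the acquire/release rules. It is not the engine's code: `LockRow`, `LOCK_TTL_SECS`, and the use of plain integer seconds for timestamps are assumptions for illustration, and the function signatures are simplified stand-ins for the SurrealDB queries of the same name.

```rust
// Illustrative constant matching the 120-second TTL described in this ADR.
const LOCK_TTL_SECS: u64 = 120;

/// Hypothetical in-memory stand-in for the lock columns of one
/// `scheduled_workflows` row. Timestamps are plain seconds.
#[derive(Debug, Default)]
struct LockRow {
    locked_by: Option<String>,
    locked_at: Option<u64>,
}

/// Models `UPDATE ... WHERE locked_by IS NONE OR locked_at < $expiry`.
/// Returns true when the caller now holds the lock.
fn try_acquire_fire_lock(row: &mut LockRow, instance_id: &str, now: u64) -> bool {
    let expiry = now.saturating_sub(LOCK_TTL_SECS);
    let free = row.locked_by.is_none();
    let expired = row.locked_at.map_or(false, |at| at < expiry);
    if free || expired {
        row.locked_by = Some(instance_id.to_string());
        row.locked_at = Some(now);
        true
    } else {
        false
    }
}

/// Models `UPDATE ... WHERE locked_by = instance_id`.
/// The own-instance guard makes a stale release a no-op.
fn release_fire_lock(row: &mut LockRow, instance_id: &str) -> bool {
    if row.locked_by.as_deref() == Some(instance_id) {
        row.locked_by = None;
        row.locked_at = None;
        true
    } else {
        false
    }
}

fn main() {
    let mut row = LockRow::default();
    // Instance A wins the tick at t=1000, then "crashes" without releasing.
    assert!(try_acquire_fire_lock(&mut row, "instance-a", 1000));
    // Instance B is refused while the lock is not older than the TTL.
    assert!(!try_acquire_fire_lock(&mut row, "instance-b", 1060));
    assert!(!try_acquire_fire_lock(&mut row, "instance-b", 1120));
    // Once the lock is older than 120 s, B can take it over.
    assert!(try_acquire_fire_lock(&mut row, "instance-b", 1121));
    // A late release from A does not clear B's lock.
    assert!(!release_fire_lock(&mut row, "instance-a"));
    assert!(release_fire_lock(&mut row, "instance-b"));
    println!("lock model behaves as described");
}
```

In this model the takeover after TTL expiry is exactly the double-fire window noted in the trade-offs: if instance A had already started the workflow before crashing, instance B's later acquisition starts it a second time.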
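
The PATCH limitation in the last trade-off can be illustrated with a small sketch of merge semantics. The types `ScheduleState` and `SchedulePatch` and the functions `apply_patch` and `apply_put` are hypothetical names, not the engine's API; the sketch only assumes, as stated above, that an absent key and an explicit `null` both arrive as `None`.

```rust
/// Hypothetical stored state of a schedule (only the field of interest).
#[derive(Debug)]
struct ScheduleState {
    timezone: Option<String>,
}

/// Hypothetical PATCH body after deserialization. With `Option<String>`,
/// a missing key and an explicit JSON `null` are both `None` here.
#[derive(Debug, Default)]
struct SchedulePatch {
    timezone: Option<String>,
}

/// PATCH semantics: `None` means "leave unchanged", so it cannot clear.
fn apply_patch(current: &mut ScheduleState, patch: SchedulePatch) {
    if let Some(tz) = patch.timezone {
        current.timezone = Some(tz);
    }
}

/// PUT semantics: the whole resource is replaced, so `None` reverts to UTC.
fn apply_put(current: &mut ScheduleState, body: ScheduleState) {
    *current = body;
}

fn main() {
    let mut s = ScheduleState {
        timezone: Some("America/New_York".to_string()),
    };

    // PATCH with `timezone: null` (or no key at all): value is kept.
    apply_patch(&mut s, SchedulePatch { timezone: None });
    assert_eq!(s.timezone.as_deref(), Some("America/New_York"));

    // PATCH with a value: value is updated.
    apply_patch(
        &mut s,
        SchedulePatch {
            timezone: Some("Europe/Madrid".to_string()),
        },
    );
    assert_eq!(s.timezone.as_deref(), Some("Europe/Madrid"));

    // Full PUT without a timezone: schedule reverts to UTC.
    apply_put(&mut s, ScheduleState { timezone: None });
    assert_eq!(s.timezone, None);
}
```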