Add cron-based autonomous workflow firing with two hardening layers:
- Timezone-aware scheduling via chrono-tz: ScheduledWorkflow.timezone
(IANA identifier), compute_next_fire_at/after_tz, validate_timezone;
DST-safe, UTC fallback when absent; validated at config load and REST API
- Distributed fire-lock via SurrealDB conditional UPDATE (locked_by/locked_at
fields, 120 s TTL); WorkflowScheduler gains instance_id (UUID) as lock owner;
prevents double-fires across multi-instance deployments without extra infra
- ScheduleStore: try_acquire_fire_lock, release_fire_lock (own-instance guard),
full CRUD (load_one/all, full_upsert, patch, delete, load_runs)
- REST: 7 endpoints (GET/PUT/PATCH/DELETE schedules, runs history, manual fire)
with timezone field in all request/response types
- Migrations 010 (schedule tables) + 011 (timezone + lock columns)
- Tests: 48 passing (was 26); ADR-0034; changelog; feature docs updated
ADR-0034: Autonomous Cron Scheduling — Timezone Support and Distributed Fire-Lock
Status: Implemented
Date: 2026-02-26
Deciders: VAPORA Team
Technical Story: vapora-workflow-engine scheduler fired cron jobs only in UTC and had no protection against double-fires in multi-instance deployments.
Decision
Extend the autonomous scheduling subsystem with two independent hardening layers:
- Timezone-aware scheduling (`chrono-tz`): cron expressions evaluated in any IANA timezone, stored per-schedule, validated at API and config-load boundaries.
- Distributed fire-lock: SurrealDB conditional `UPDATE ... WHERE locked_by IS NONE OR locked_at < $expiry` provides atomic, TTL-backed mutual exclusion across instances without additional infrastructure.
Context
Gaps Addressed
| Gap | Consequence |
|---|---|
| UTC-only cron evaluation | "0 9 * * *" fires at 09:00 UTC regardless of business timezone; scheduled reports or maintenance windows drift by the UTC offset |
| No distributed coordination | Two vapora-workflow-engine instances reading the same scheduled_workflows table both fire the same schedule at the same tick |
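To make the first gap concrete, here is a small worked example of the drift, using New York's two offsets (UTC-5 in winter, UTC-4 in summer). The helper names are hypothetical and only do whole-hour arithmetic; they are not part of the scheduler.

```rust
/// Local wall-clock hour seen in a zone when a job fires at `utc_hour`.
/// `utc_offset_hours` is the zone's offset from UTC (negative west of Greenwich).
fn local_hour_for_utc(utc_hour: i32, utc_offset_hours: i32) -> i32 {
    (utc_hour + utc_offset_hours).rem_euclid(24)
}

/// UTC hour at which a job must fire to hit `local_hour` in a zone.
fn utc_hour_for_local(local_hour: i32, utc_offset_hours: i32) -> i32 {
    (local_hour - utc_offset_hours).rem_euclid(24)
}
```

With UTC-only evaluation, `"0 9 * * *"` lands at 04:00 local time in winter and 05:00 in summer. Hitting 09:00 local requires firing at 14:00 UTC in winter and 13:00 UTC in summer, which is the shift a timezone-aware schedule absorbs.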
Why These Approaches
chrono-tz over manual UTC-offset arithmetic:
- Compile-time exhaustive enum of all IANA timezone names — invalid names are rejected at parse time.
- The `cron` crate's `Schedule::upcoming(tz)` / `Schedule::after(&dt_in_tz)` are generic over any `TimeZone`, so timezone-awareness requires no special-casing in iteration logic: pass `DateTime<chrono_tz::Tz>` instead of `DateTime<Utc>`, convert output with `.with_timezone(&Utc)`.
- DST transitions handled automatically by `chrono-tz`; no application code needed.
SurrealDB conditional UPDATE over external distributed lock (Redis, etcd):
- No additional infrastructure dependency.
- SurrealDB applies document-level write locking; `UPDATE record WHERE condition` is atomic: two concurrent instances race on the same document and only one succeeds (non-empty return array = lock acquired).
- 120-second TTL enforced in application code: `locked_at < $expiry` in the WHERE clause auto-expires a lock from a crashed instance within two scheduler ticks.
Implementation
New Fields
scheduled_workflows table gains three columns (migration 011):
| Field | Type | Purpose |
|---|---|---|
| `timezone` | `option<string>` | IANA identifier (`"America/New_York"`) or NONE for UTC |
| `locked_by` | `option<string>` | UUID of the instance holding the current fire-lock |
| `locked_at` | `option<datetime>` | When the lock was acquired; used for TTL expiry |
Lock Protocol
    Tick N fires schedule S:
      try_acquire_fire_lock(id, instance_id, now)
        → UPDATE ... WHERE locked_by IS NONE OR locked_at < (now - 120s)
        → returns true (non-empty) or false (empty)
      if false: log + inc schedules_skipped, return
      fire_with_lock(S, now)    ← actual workflow start
      release_fire_lock(id, instance_id)
        → UPDATE ... WHERE locked_by = instance_id
        → own-instance guard prevents stale release
Lock release is always attempted even on fire_with_lock error; a warn! is emitted if release fails (TTL provides fallback).
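The protocol above can be sketched with an in-memory stand-in for the table. This is an illustration under stated assumptions, not the actual `ScheduleStore`: `InMemoryScheduleStore`, `LockState`, `insert`, and `tick` are hypothetical names, and a `Mutex` plays the role of SurrealDB's per-document write lock. The method names and the 120 s TTL come from the ADR; the signatures are guesses.

```rust
use std::collections::HashMap;
use std::sync::Mutex;
use std::time::{Duration, SystemTime};

/// Lock TTL from the ADR (120 s).
const LOCK_TTL: Duration = Duration::from_secs(120);

/// The two lock columns added by migration 011 (hypothetical in-memory form).
#[derive(Default, Clone, Debug, PartialEq)]
struct LockState {
    locked_by: Option<String>,
    locked_at: Option<SystemTime>,
}

/// Hypothetical stand-in for the `scheduled_workflows` table.
#[derive(Default)]
struct InMemoryScheduleStore {
    rows: Mutex<HashMap<String, LockState>>,
}

impl InMemoryScheduleStore {
    /// Registers a schedule row with no lock held.
    fn insert(&self, id: &str) {
        self.rows.lock().unwrap().insert(id.to_string(), LockState::default());
    }

    /// Mirrors `UPDATE ... WHERE locked_by IS NONE OR locked_at < (now - 120s)`.
    /// Returns true when this instance now owns the lock.
    fn try_acquire_fire_lock(&self, id: &str, instance_id: &str, now: SystemTime) -> bool {
        let mut rows = self.rows.lock().unwrap();
        let row = match rows.get_mut(id) {
            Some(row) => row,
            None => return false, // an UPDATE on a missing record matches nothing
        };
        // Expired means the lock is strictly older than the TTL.
        let expired = match row.locked_at {
            Some(at) => now.duration_since(at).map_or(false, |age| age > LOCK_TTL),
            None => false,
        };
        if row.locked_by.is_none() || expired {
            row.locked_by = Some(instance_id.to_string());
            row.locked_at = Some(now);
            true
        } else {
            false
        }
    }

    /// Mirrors `UPDATE ... WHERE locked_by = instance_id` (own-instance guard).
    fn release_fire_lock(&self, id: &str, instance_id: &str) -> bool {
        let mut rows = self.rows.lock().unwrap();
        match rows.get_mut(id) {
            Some(row) if row.locked_by.as_deref() == Some(instance_id) => {
                *row = LockState::default();
                true
            }
            _ => false,
        }
    }
}

/// One scheduler tick: returns None when skipped, else the outcome of `fire`.
fn tick<F>(store: &InMemoryScheduleStore, id: &str, instance_id: &str,
           now: SystemTime, fire: F) -> Option<Result<(), String>>
where
    F: FnOnce() -> Result<(), String>,
{
    if !store.try_acquire_fire_lock(id, instance_id, now) {
        return None; // another instance holds a live lock
    }
    let outcome = fire();
    // Release is attempted even when `fire` failed; the TTL is the fallback.
    let _ = store.release_fire_lock(id, instance_id);
    Some(outcome)
}
```

In this sketch a second instance is refused while the lock is younger than 120 s, can take over once it is older, and cannot release a lock it does not own.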
Timezone-Aware Cron Evaluation
    compute_fire_times_tz(schedule, last, now, catch_up, tz):
      match tz.parse::<chrono_tz::Tz>():
        Ok(tz)  → schedule.after(&last.with_timezone(&tz))
                    .take_while(|t| t.with_timezone(&Utc) <= now)
                    .map(|t| t.with_timezone(&Utc))
        Err(_)  → schedule.after(&last)    ← UTC
Parsing an unknown/invalid timezone string silently falls back to UTC — avoids a hard error at runtime if a previously valid TZ identifier is removed from the chrono-tz database in a future upgrade.
API Surface Changes
PUT /api/v1/schedules/:id and PATCH /api/v1/schedules/:id accept and return timezone: Option<String>. Timezone is validated at the API boundary using validate_timezone() (returns 400 InvalidInput for unknown identifiers). Config-file [schedule] blocks also accept timezone and are validated at startup (fail-fast, same as cron).
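The two behaviours described here (strict rejection at the API and config boundaries, lenient UTC fallback during cron evaluation) can be contrasted in a small sketch. It assumes a tiny hard-coded table, `KNOWN_ZONES`, in place of the `chrono-tz` parser so that it compiles with the standard library alone. The signature given to `validate_timezone` is a guess, and `effective_timezone` is a hypothetical helper.

```rust
/// Hypothetical stand-in for the chrono-tz name table; the real parser
/// knows every IANA identifier.
const KNOWN_ZONES: &[&str] = &["UTC", "America/New_York", "Europe/Madrid"];

/// Strict path, as at the API and config-load boundaries:
/// unknown identifiers are an error (the REST layer would map this to 400).
fn validate_timezone(raw: &str) -> Result<&'static str, String> {
    KNOWN_ZONES
        .iter()
        .copied()
        .find(|zone| *zone == raw)
        .ok_or_else(|| format!("unknown timezone: {raw}"))
}

/// Lenient path, as in cron evaluation: an absent or unparseable
/// identifier falls back to UTC instead of failing the tick.
fn effective_timezone(stored: Option<&str>) -> &'static str {
    stored
        .and_then(|raw| validate_timezone(raw).ok())
        .unwrap_or("UTC")
}
```

Keeping the strict check at the edges and the fallback in the evaluation path means a stored identifier that later stops parsing degrades to UTC rather than halting the scheduler.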
Consequences
Positive
- Schedules expressed in business-local time — no mental UTC arithmetic for operators.
- Multi-instance deployments safe by default; no external lock service required.
- `ScheduledWorkflow.timezone` is nullable/optional; all existing schedules without the field default to UTC with no migration required.
Negative / Trade-offs
- `chrono-tz` adds ~2 MB of IANA timezone data to the binary (compile-time embedded).
- Distributed lock TTL of 120 s means a worst-case window of one double-fire per 120 s if the winning instance crashes between acquiring the lock and calling `update_after_fire`. Acceptable given the `schedule_runs` audit log makes duplicates visible.
- No multi-PATCH for timezone clearance: passing `timezone: null` in JSON is treated as absent (`#[serde(default)]`). Clearing timezone (revert to UTC) requires a full PUT.