SEV-2 · started 14:08 today · updated 14:24
Runway Gen-4 elevated error rate DEGRADED
We're seeing a spike in 429 (Too Many Requests) responses from Runway's API, affecting about 12% of requests. Affected renders are retried automatically, so they may take longer than usual. We've throttled our own dispatcher to back off and expect error rates to return to normal within the hour.
SEV-2 · started 13:59 today · updated 14:18
Hetzner fsn1-dc14-07 storage node offline PARTIAL OUTAGE
One of our 8 GPU/storage nodes is unreachable, and jobs are being routed to the remaining 7. No customer renders have failed as a result, but you may see slightly longer queue times for ComfyUI workflows.
MAY 18 · 14:08 → 16:42 · 2h 34m
Stripe webhook backlog cleared RESOLVED
A misconfigured retry policy caused some invoice.paid webhook events to be processed late. No charges were lost, but 14 customers saw their plan changes take effect later than expected. We've added a runbook and an alert on queue depth.
MAY 04 · 09:18 → 09:54 · 36m
Suno import returned 502s RESOLVED
Suno had an upstream outage. Imports were retried automatically, and all of them succeeded once Suno recovered.
APR 21 · 22:42 → 23:14 · 32m
Render queue paused due to GPU driver issue RESOLVED
An NVIDIA driver update on fsn1-dc14 nodes 1–4 broke ComfyUI workflows. We paused new renders for 32 minutes and rolled the driver back. Jobs queued during the pause were processed automatically once renders resumed.
APR 02 · 11:14 → 11:48 · 34m
Publishing to TikTok degraded RESOLVED
TikTok's Content Posting API returned 5xx errors. We retried with exponential backoff and all clips were eventually published.
MAR 18 · 03:18 → 04:08 · 50m
Authentication outage RESOLVED
Magic-link sign-in failed for 50 minutes because the SSL/TLS certificate on our mailer had expired. We replaced the certificate and added alerts that warn us 30 days before a certificate expires.