Architecture
System design for safe production failure replay
FluxRun splits the system across SDK capture, customer-controlled replay agents, low-latency ingest, durable payload storage, query APIs, and a dashboard built for incident triage.
End-to-end path
SDK to replay, with clear trust boundaries
The core system path is intentionally explicit: capture in the app, store only what the platform needs, and send sensitive replay actions back through the customer's agent.
01
SDK
Wraps routes and records supported execution boundaries.
02
Agent Route
Customer runtime endpoint for decrypt, replay, and protected payload unlock.
03
Ingest Worker
Validates project auth and receives execution batches.
04
Storage
Persists indexed summaries and encrypted execution payloads.
05
Query API
Serves run lists, execution detail, events, network, and issue views.
06
Dashboard
Svelte UI for triage, replay readiness, and regression workflow.
07
Replay
The agent re-runs the captured path against recorded IO, with no live side effects.
Platform map
What runs where
FluxRun uses Cloudflare for the internet-facing control plane and Golang services for the data-plane path where ingest durability, object storage, and query backends matter.
SDK execution batch
-> HTTPS POST /v1/executions
-> project token check
-> edge WAL append
-> data-node aggregation
-> object storage payload
-> query API visibility
Cloudflare Workers
Auth, management GraphQL, query, alert, and usage services sit close to the browser and SDK control paths.
Golang services
Edge and data-node services handle ingest durability, batching, object storage writes, and query backends.
Svelte dashboard
The app renders failures, execution detail, network evidence, replay readiness, and regression actions.
PostgreSQL
Control-plane state, workspace data, apps, tokens, billing state, and configuration live in relational storage.
OCI Object Storage
Durable payload storage keeps larger execution artifacts out of the relational hot path.
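The SDK execution batch path listed above puts the project token check before the edge WAL append. Below is a minimal sketch of that ordering. It assumes the SDK is acknowledged right after the WAL append, with aggregation and object storage writes happening later. `ExecutionBatch`, `InMemoryWal`, `ingestBatch`, and the 202/401/400 status codes are hypothetical. The page places the real edge service in Go; TypeScript is used here only for illustration.

```typescript
// Hypothetical sketch of the ingest acknowledgement order: token check, WAL append, then ack.
// Names and status codes are illustrative, not FluxRun's actual API.

type ExecutionBatch = { projectId: string; executions: unknown[] };

type IngestResult =
  | { status: 202; offset: number }
  | { status: 400 | 401; error: string };

interface WriteAheadLog {
  // Returns the offset of the appended record once it is durable.
  append(record: string): number;
}

// Stand-in WAL that only keeps records in memory; a real WAL would fsync before returning.
class InMemoryWal implements WriteAheadLog {
  readonly records: string[] = [];
  append(record: string): number {
    this.records.push(record);
    return this.records.length - 1;
  }
}

function ingestBatch(
  token: string | undefined,
  batch: ExecutionBatch,
  tokensToProject: Map<string, string>,
  wal: WriteAheadLog,
): IngestResult {
  // 1. Project token check: the token must exist and belong to the project named in the batch.
  const projectId = token === undefined ? undefined : tokensToProject.get(token);
  if (projectId === undefined || projectId !== batch.projectId) {
    return { status: 401, error: "invalid project token" };
  }
  if (batch.executions.length === 0) {
    return { status: 400, error: "empty execution batch" };
  }
  // 2. Edge WAL append: the batch is written before the SDK hears back.
  const offset = wal.append(JSON.stringify(batch));
  // 3. Acknowledge. Aggregation and object storage writes happen later, off the request path.
  return { status: 202, offset };
}
```

Rejecting a request before the append means an invalid or cross-project token never consumes WAL space. Acknowledging after the append means a crash after the response does not lose the batch.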
Execution payload storage
Summaries stay queryable while sensitive event bodies and protected payloads remain encrypted or masked.
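One way to keep summaries queryable while event bodies stay encrypted is to split each record before it is stored. Plaintext summary fields go to the index, and bodies are sealed under a key the customer holds. The sketch below makes several assumptions: the record shape, `sealExecution`, `openExecution`, and the use of AES-256-GCM with the execution id as associated data are illustrative and not FluxRun's documented scheme. It uses Node's `node:crypto` module.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

// Hypothetical record shape; field names are illustrative.
type CapturedExecution = {
  id: string;
  method: string;
  route: string;
  status: number;
  durationMs: number;
  requestBody: string;
  responseBody: string;
};

// Indexed, queryable part: stored in plaintext.
type ExecutionSummary = Omit<CapturedExecution, "requestBody" | "responseBody">;
type SealedPayload = { iv: string; tag: string; ciphertext: string };
type StoredExecution = { summary: ExecutionSummary; sealedPayload: SealedPayload };

// Splits an execution into a plaintext summary and an AES-256-GCM sealed payload.
// The 32-byte key is assumed to be customer-held, never stored by the platform.
function sealExecution(exec: CapturedExecution, key: Buffer): StoredExecution {
  const { requestBody, responseBody, ...summary } = exec;
  const iv = randomBytes(12); // fresh 96-bit nonce per payload
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  cipher.setAAD(Buffer.from(summary.id, "utf8")); // bind ciphertext to this execution id
  const plaintext = Buffer.from(JSON.stringify({ requestBody, responseBody }), "utf8");
  const ciphertext = Buffer.concat([cipher.update(plaintext), cipher.final()]);
  return {
    summary,
    sealedPayload: {
      iv: iv.toString("base64"),
      tag: cipher.getAuthTag().toString("base64"),
      ciphertext: ciphertext.toString("base64"),
    },
  };
}

// Reverses sealExecution; final() throws if the payload, tag, or execution id was altered.
function openExecution(stored: StoredExecution, key: Buffer): { requestBody: string; responseBody: string } {
  const { iv, tag, ciphertext } = stored.sealedPayload;
  const decipher = createDecipheriv("aes-256-gcm", key, Buffer.from(iv, "base64"));
  decipher.setAAD(Buffer.from(stored.summary.id, "utf8"));
  decipher.setAuthTag(Buffer.from(tag, "base64"));
  const plaintext = Buffer.concat([decipher.update(Buffer.from(ciphertext, "base64")), decipher.final()]);
  return JSON.parse(plaintext.toString("utf8"));
}
```

Binding the execution id as associated data means a sealed payload cannot be silently moved onto another execution's summary.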
Recorded IO model
Request state
Captured request method, route, headers, body, request id, and trace metadata.
Recorded IO
Fetches, host RPC calls, environment reads, clock reads, timers, randomness, logs, and the final result or error.
Agent unlock
The dashboard asks the customer agent to perform short-lived decrypt or replay actions.
Safe replay
Recorded responses are served back to the execution instead of touching live systems.
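The recorded IO list includes the sources that make a run nondeterministic. Below is a minimal sketch of recording clock reads, randomness, and env reads during capture, then serving them back in order during replay. It assumes a hypothetical `IoTape` class that the SDK would route these calls through. Fetches and host RPC calls would follow the same record-then-serve pattern.

```typescript
// Hypothetical tape for non-network recorded IO: clock reads, randomness, env reads.
type TapeKind = "time" | "random" | "env";
type TapeValue = number | string | undefined;
type TapeEntry = { kind: TapeKind; key?: string; value: TapeValue };

class IoTape {
  readonly entries: TapeEntry[];
  private readonly mode: "record" | "replay";
  private cursor = 0;

  constructor(mode: "record" | "replay", entries: TapeEntry[] = []) {
    this.mode = mode;
    this.entries = entries;
  }

  // Record mode: call the live source and append its value.
  // Replay mode: return the next recorded value, failing loudly if the execution diverges.
  private next<T extends TapeValue>(kind: TapeKind, key: string | undefined, live: () => T): T {
    if (this.mode === "record") {
      const value = live();
      this.entries.push({ kind, key, value });
      return value;
    }
    const step = this.cursor++;
    const entry = this.entries[step];
    if (entry === undefined || entry.kind !== kind || entry.key !== key) {
      throw new Error(`replay diverged at step ${step}: tape has ${entry?.kind ?? "no entry"}, execution asked for ${kind}`);
    }
    return entry.value as T;
  }

  now(): number {
    return this.next("time", undefined, () => Date.now());
  }

  random(): number {
    return this.next("random", undefined, () => Math.random());
  }

  // During replay the live source is never read; the recorded value is returned instead.
  env(name: string, source: Record<string, string | undefined>): string | undefined {
    return this.next("env", name, () => source[name]);
  }
}
```

Failing on divergence, rather than falling back to a live read, keeps a replay from quietly producing a different path than the one that failed in production.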
Protected payload unlock
The dashboard can inspect summaries, but sensitive payload access is routed through the customer agent. Private replay keys stay in the customer runtime with the traced route.
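Below is a sketch of one way the customer agent could keep unlock actions short-lived and scoped. It assumes the agent issues HMAC-signed grants with its own secret, and the dashboard carries a grant back when it requests a decrypt or replay. The token format, `issueUnlockToken`, `verifyUnlockToken`, and the scope fields are hypothetical. The point is that the signing secret, like the replay keys, stays in the customer runtime.

```typescript
import { createHmac, randomBytes, timingSafeEqual } from "node:crypto";

// Hypothetical grant shape: one execution, one action, hard expiry in epoch milliseconds.
type UnlockAction = "decrypt" | "replay";
type UnlockScope = { executionId: string; action: UnlockAction; expiresAt: number };

// HMAC-SHA256 over the encoded scope, keyed by a secret only the agent holds.
function macFor(secret: Buffer, body: string): string {
  return createHmac("sha256", secret).update(body).digest("hex");
}

// Agent side: issue a token the dashboard can present for exactly this scope.
function issueUnlockToken(secret: Buffer, scope: UnlockScope): string {
  const body = Buffer.from(JSON.stringify(scope), "utf8").toString("base64url");
  return `${body}.${macFor(secret, body)}`;
}

// Agent side: accept the token only if the MAC matches and it has not expired.
function verifyUnlockToken(secret: Buffer, token: string, nowMs: number): UnlockScope | null {
  const [body, mac] = token.split(".");
  if (!body || !mac) return null;
  const expected = Buffer.from(macFor(secret, body), "utf8");
  const given = Buffer.from(mac, "utf8");
  // timingSafeEqual requires equal lengths, so compare lengths first.
  if (expected.length !== given.length || !timingSafeEqual(expected, given)) return null;
  const scope = JSON.parse(Buffer.from(body, "base64url").toString("utf8")) as UnlockScope;
  return scope.expiresAt > nowMs ? scope : null;
}

// Example: the agent generates its secret once, inside the customer runtime.
const agentSecret = randomBytes(32);
const demoToken = issueUnlockToken(agentSecret, { executionId: "exec_1", action: "decrypt", expiresAt: Date.now() + 60_000 });
```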
Replay safety boundary
During replay, fetches and host calls are satisfied from captured responses. DB, payment, queue, email, and external API writes do not run again.
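During replay, a fetch replacement can serve recorded exchanges and refuse any call it has no recording for. An unrecorded payment or email call then fails instead of reaching a live system. Below is a minimal sketch. It assumes recordings are keyed by method and URL and served in capture order per key. `makeReplayFetch` and `RecordedExchange` are hypothetical names, and a fuller version would also match request bodies and headers.

```typescript
// Hypothetical shape of one captured outbound exchange.
type RecordedExchange = { method: string; url: string; status: number; body: string };

// Builds a fetch stand-in that only ever answers from the recording.
function makeReplayFetch(tape: RecordedExchange[]) {
  // Group exchanges per "METHOD url" so each key is served in the order it was captured.
  const queues = new Map<string, RecordedExchange[]>();
  for (const exchange of tape) {
    const key = `${exchange.method.toUpperCase()} ${exchange.url}`;
    const queue = queues.get(key) ?? [];
    queue.push(exchange);
    queues.set(key, queue);
  }

  return function replayFetch(method: string, url: string): { status: number; body: string } {
    const key = `${method.toUpperCase()} ${url}`;
    const next = queues.get(key)?.shift();
    if (next === undefined) {
      // No recording left for this call: block it rather than touching a live system.
      throw new Error(`replay miss: ${key} was not recorded; live call blocked`);
    }
    return { status: next.status, body: next.body };
  };
}
```

Keeping a separate queue per key, rather than one global sequence, tolerates concurrent fetches that complete in a different order on replay than they did in production.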
Interview talking points
Why this is platform work
The architecture is valuable because product experience, replay semantics, cryptographic ownership, and operational ingest all depend on one another.
Trust boundary
Keep sensitive replay actions customer-controlled.
Latency and durability
Acknowledge ingest only after the write path is durable enough for telemetry.
Product loop
Convert one production failure into a fix and regression guard.