Architecture

System design for safe production failure replay

FluxRun splits the system across SDK capture, customer-controlled replay agents, low-latency ingest, durable payload storage, query APIs, and a dashboard built for incident triage.

End-to-end path

SDK to replay, with clear trust boundaries

The core system path is intentionally explicit: capture in the app, store only what the platform needs, and send sensitive replay actions back through the customer's agent.

SDK

Wraps routes and records supported execution boundaries.

Agent Route

Customer runtime endpoint for decrypt, replay, and protected unlock.

Ingest Worker

Validates project auth and receives execution batches.

Storage

Persists indexed summaries and encrypted execution payloads.

Query API

Serves run lists, execution detail, events, network, and issue views.

Dashboard

Svelte UI for triage, replay readiness, and regression workflow.

Replay

The agent re-executes the captured path against recorded IO, with no live side effects.

Platform map

What runs where

FluxRun uses Cloudflare for the internet-facing control plane and Golang services for the data-plane path where ingest durability, object storage, and query backends matter.

SDK execution batch
  -> HTTPS POST /v1/executions
  -> project token check
  -> edge WAL append
  -> data-node aggregation
  -> object storage payload
  -> query API visibility
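The first three steps of this path can be sketched as follows. This is a minimal illustration, not FluxRun's implementation: the names `ExecutionBatch`, `MemoryWal`, `projectTokens`, and `handleIngest`, the in-memory log, and the 202/400/401 status codes are all assumptions made for the example. The one idea taken from the flow is ordering: the token check runs before anything is stored, and the acknowledgement is produced only after the WAL append returns.

```typescript
// Hypothetical shapes; none of these names come from FluxRun itself.
interface ExecutionBatch {
  projectId: string;
  executions: { id: string; route: string }[];
}

interface IngestResult {
  status: 202 | 400 | 401; // assumed status codes
  acked: number; // executions acknowledged back to the SDK
}

// In-memory stand-in for the edge write-ahead log. A real WAL would
// persist each entry to disk before append() returns.
class MemoryWal {
  readonly entries: string[] = [];

  append(batch: ExecutionBatch): void {
    this.entries.push(JSON.stringify(batch));
  }
}

// Assumed token table: project id -> project token.
const projectTokens = new Map<string, string>([["proj_demo", "tok_demo"]]);

function handleIngest(
  token: string,
  batch: ExecutionBatch,
  wal: MemoryWal,
): IngestResult {
  // 1. Project token check: reject before touching storage.
  if (projectTokens.get(batch.projectId) !== token) {
    return { status: 401, acked: 0 };
  }
  // 2. Nothing to store for an empty batch.
  if (batch.executions.length === 0) {
    return { status: 400, acked: 0 };
  }
  // 3. Append to the WAL first, acknowledge second.
  wal.append(batch);
  return { status: 202, acked: batch.executions.length };
}
```

Data-node aggregation, the object storage write, and query visibility would happen downstream of the log and are left out of the sketch.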

Cloudflare Workers

Auth, management GraphQL, query, alert, and usage services sit close to the browser and SDK control paths.

Golang services

Edge and data-node services handle ingest durability, batching, object storage writes, and query backends.

Svelte dashboard

The app renders failures, execution detail, network evidence, replay readiness, and regression actions.

PostgreSQL

Control-plane state, workspace data, apps, tokens, billing state, and configuration live in relational storage.

OCI Object Storage

Durable payload storage keeps larger execution artifacts out of the relational hot path.

Execution payload storage

Summaries stay queryable while sensitive event bodies and protected payloads remain encrypted or masked.
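One way to picture this split is a function that turns a captured execution into a queryable summary plus an opaque payload. The sketch below is an assumption-laden illustration: the field names, the `SENSITIVE_HEADERS` list, the `[masked]` placeholder, and the `splitForStorage` helper are hypothetical, and the cipher is passed in as a function because the document does not say which encryption scheme is used.

```typescript
// Hypothetical shapes; FluxRun's real schema is not shown in this document.
interface CapturedExecution {
  id: string;
  route: string;
  statusCode: number;
  durationMs: number;
  headers: Record<string, string>;
  body: string;
}

interface ExecutionSummary {
  id: string;
  route: string;
  statusCode: number;
  durationMs: number;
  headers: Record<string, string>; // sensitive values masked
}

interface StoredExecution {
  summary: ExecutionSummary; // indexed and queryable
  encryptedPayload: string; // opaque to the platform
}

// Assumed list of header names that must never appear in a summary.
const SENSITIVE_HEADERS = new Set(["authorization", "cookie", "x-api-key"]);

function maskHeaders(headers: Record<string, string>): Record<string, string> {
  const masked: Record<string, string> = {};
  for (const [headerName, value] of Object.entries(headers)) {
    masked[headerName] = SENSITIVE_HEADERS.has(headerName.toLowerCase())
      ? "[masked]"
      : value;
  }
  return masked;
}

function splitForStorage(
  exec: CapturedExecution,
  encrypt: (plaintext: string) => string,
): StoredExecution {
  return {
    summary: {
      id: exec.id,
      route: exec.route,
      statusCode: exec.statusCode,
      durationMs: exec.durationMs,
      headers: maskHeaders(exec.headers),
    },
    // Unmasked headers and the body go only into the encrypted payload.
    encryptedPayload: encrypt(
      JSON.stringify({ headers: exec.headers, body: exec.body }),
    ),
  };
}
```

Under these assumptions the summary could sit in an index while the encrypted payload goes to object storage, and nothing in the summary would need a key to read.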

Recorded IO model

Request state

Captured request method, route, headers, body, request id, and trace metadata.

Recorded IO

Fetches, host RPC calls, env reads, time, timers, randomness, logs, result, and error.

Agent unlock

Dashboard asks the customer agent for short-lived decrypt or replay actions.

Safe replay

Recorded responses are served back to the execution instead of touching live systems.
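The capture half of this model can be sketched as a route wrapper that stores the request state and logs each nondeterministic read the handler makes. Everything named here (`RequestState`, `IoEvent`, `TraceContext`, `traceRoute`) is hypothetical, the handler is synchronous for brevity, and only time, randomness, and logs are covered; fetches, host RPC calls, env reads, and timers would need the same treatment.

```typescript
// Hypothetical shapes for illustration only.
interface RequestState {
  method: string;
  route: string;
  headers: Record<string, string>;
  body: string;
  requestId: string;
}

type IoEvent =
  | { kind: "time"; value: number }
  | { kind: "random"; value: number }
  | { kind: "log"; message: string };

interface ExecutionRecord {
  request: RequestState;
  io: IoEvent[];
  result?: string;
  error?: string;
}

// The handler reads nondeterministic inputs through this context so
// every read can be recorded in order.
interface TraceContext {
  now(): number;
  random(): number;
  log(message: string): void;
}

type RouteHandler = (req: RequestState, ctx: TraceContext) => string;

function traceRoute(
  handler: RouteHandler,
  sink: ExecutionRecord[],
): (req: RequestState) => string {
  return (req) => {
    const record: ExecutionRecord = { request: req, io: [] };
    const ctx: TraceContext = {
      now: () => {
        const value = Date.now();
        record.io.push({ kind: "time", value });
        return value;
      },
      random: () => {
        const value = Math.random();
        record.io.push({ kind: "random", value });
        return value;
      },
      log: (message) => {
        record.io.push({ kind: "log", message });
      },
    };
    try {
      const result = handler(req, ctx);
      record.result = result;
      return result;
    } catch (err) {
      // Keep the failure with the record, then let it propagate unchanged.
      record.error = err instanceof Error ? err.message : String(err);
      throw err;
    } finally {
      sink.push(record);
    }
  };
}
```

Because the record holds the exact values the handler saw, a later replay can hand the same values back in the same order instead of reading the clock or the random source again.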

Protected payload unlock

The dashboard can inspect summaries, but access to sensitive payloads is routed through the customer agent. Private replay keys stay in the customer runtime, alongside the traced route.
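A rough sketch of that boundary, under stated assumptions: the dashboard forwards ciphertext it cannot read, and an agent running in the customer runtime decides whether to decrypt it. The `UnlockRequest` shape, the `expiresAtMs` field, and the `CustomerAgent` class are hypothetical. The decrypt function is injected and closes over the key, so the key never appears in a request or response. A real agent would also need to authenticate the request, which is omitted here.

```typescript
// Hypothetical request and response shapes for illustration.
interface UnlockRequest {
  executionId: string;
  ciphertext: string; // encrypted payload fetched from platform storage
  expiresAtMs: number; // the agent refuses the request from this instant on
}

type UnlockResponse =
  | { ok: true; executionId: string; plaintext: string }
  | { ok: false; reason: "expired" };

class CustomerAgent {
  // The decrypt function closes over the private key, which therefore
  // stays inside the customer runtime.
  private readonly decrypt: (ciphertext: string) => string;

  constructor(decrypt: (ciphertext: string) => string) {
    this.decrypt = decrypt;
  }

  unlock(request: UnlockRequest, nowMs: number): UnlockResponse {
    // Short-lived: a stale request is refused before any decryption.
    if (nowMs >= request.expiresAtMs) {
      return { ok: false, reason: "expired" };
    }
    return {
      ok: true,
      executionId: request.executionId,
      plaintext: this.decrypt(request.ciphertext),
    };
  }
}
```

Passing `nowMs` in explicitly keeps the expiry check deterministic and easy to test; in a deployed agent it would come from the runtime clock.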

Replay safety boundary

During replay, fetches and host calls are satisfied from captured responses, so database, payment, queue, email, and external API writes do not run again.
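The boundary can be illustrated with a replay session that answers each outbound call from the recording and fails loudly if the code asks for something that was never captured. This is a sketch, not FluxRun's replay engine: `RecordedCall`, `ReplaySession`, the string `key` format, and strict in-order matching are all assumptions chosen to keep the example small.

```typescript
// Hypothetical recording format for illustration.
interface RecordedCall {
  kind: "fetch" | "host";
  key: string; // e.g. "POST https://payments.example/charge"
  response: string; // captured response body
}

class ReplaySession {
  private cursor = 0;
  private readonly calls: RecordedCall[];

  constructor(calls: RecordedCall[]) {
    this.calls = calls.slice(); // defensive copy of the recording
  }

  // Serve the next recorded response. Nothing here reaches the network,
  // so a replayed write cannot touch a live system.
  call(kind: "fetch" | "host", key: string): string {
    const next = this.calls[this.cursor];
    if (next === undefined || next.kind !== kind || next.key !== key) {
      throw new Error(`replay diverged at call ${this.cursor}: ${kind} ${key}`);
    }
    this.cursor += 1;
    return next.response;
  }

  // True once every recorded call has been consumed.
  isFinished(): boolean {
    return this.cursor === this.calls.length;
  }
}
```

Throwing on a mismatch is a deliberate choice in this sketch: if the replayed code takes a different path than the original run, falling through to a live call would break the safety guarantee, so the session stops instead.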

Interview talking points

Why this is platform work

The architecture is valuable because product experience, replay semantics, cryptographic ownership, and operational ingest all depend on one another.

Trust boundary

Keep sensitive replay actions customer-controlled.

Latency and durability

Acknowledge an ingest batch only after the write path has made it durable enough for telemetry.

Product loop

Convert one production failure into a fix and regression guard.