A blocking payment system loses you money every time a provider slows down. Not because it crashes. Because threads freeze on I/O, the queue backs up, and transactions that could have been approved on a different provider never get the chance. The customer sees a timeout. You see a decline. The revenue is gone.
We built Exirom on a fully reactive, non-blocking stack because payment infrastructure is an I/O coordination problem. Every meaningful operation - PSP authorization, database write, cache lookup, message publish - is a network call. If your threads block on those calls, your throughput is capped by your slowest dependency. In payments, that dependency changes every hour.
What reactive means in practice
No thread ever waits for I/O. When a request goes to a PSP, the thread is released immediately and picks up the next transaction. When the response arrives, a different thread continues the flow. This applies to every outbound I/O boundary: PSP authorization requests, PostgreSQL via R2DBC, Redis, Kafka publishing, and outbound webhook delivery. Internal service-to-service calls use gRPC with Kotlin coroutine stubs - suspend functions, not blocking threads.
A traditional thread-per-request server with 200 threads handles 200 concurrent I/O calls. At 500ms average PSP latency, that caps at 400 transactions per second. A reactive server with the same hardware never blocks - the same box pushes 2,000-5,000 concurrent operations. Observed under our production load: 5-10x throughput on identical infrastructure.
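The ceiling above is just Little's Law: with N threads each parked for L seconds per blocking call, throughput tops out at N / L. A minimal sketch using the numbers from the text (an illustration of the arithmetic, not a benchmark):

```python
def blocking_throughput_cap(threads: int, avg_io_latency_s: float) -> float:
    """Little's Law: concurrency = throughput * latency. A pool of
    `threads` threads, each blocked `avg_io_latency_s` per call, caps
    throughput at threads / latency transactions per second."""
    return threads / avg_io_latency_s

# 200 threads, 500ms average PSP latency -> 400 TPS ceiling
print(blocking_throughput_cap(200, 0.5))  # 400.0
```

The reactive numbers follow from the same formula: once threads stop waiting, concurrency is no longer bounded by pool size, so the latency term drops out of the capacity equation.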
[Figure: Thread Model Comparison - how threads are utilized in blocking vs reactive systems. Blocking (Traditional): every I/O call blocks a thread; DB, cache, PSP, message bus all waiting. Reactive (Exirom): R2DBC, Redis, PSP calls, Kafka - all non-blocking; threads never idle.]
Cascading depends on speed you cannot fake
Smart routing with cascading is the core value of payment orchestration. Provider A declines, Provider B fires instantly. The customer never sees a failure.
But cascading only works if the retry is fast enough. The customer is watching a loading screen. If Provider A takes 800ms to decline and your system needs 100ms of thread scheduling before it can fire the retry, Provider B now has less than a second to authorize before the checkout times out.
In a reactive system, the decline callback triggers the routing engine immediately. No thread scheduling. No queue wait. We measured the gap in production: 12ms between first decline and second authorization request. Under the same load on a blocking architecture, internal benchmarks showed 80-150ms - enough that a meaningful percentage of retries would have timed out at the checkout.
During a production incident where a major provider soft-declined 40% of traffic, our cascading recovered 87% of those declines on secondary providers. The 12ms retry gap made that recovery rate possible. At 80-150ms, modeling shows recovery drops to 60-70%.
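In coroutine form, the cascade is just a loop over an ordered provider list: the retry fires in the same continuation that received the decline, with no thread handoff in between. A minimal asyncio sketch (the provider names and `authorize` stub are illustrative, not Exirom's API):

```python
import asyncio

async def authorize(provider: str, txn: dict) -> str:
    # Stand-in for a real non-blocking PSP call; "Alpha" soft-declines here.
    await asyncio.sleep(0)  # yield to the event loop, as a network await would
    return "declined" if provider == "Alpha" else "approved"

async def cascade(txn: dict, providers: list[str]) -> tuple[str, str]:
    """Try providers in routing order. On a soft decline, fire the next
    attempt immediately instead of surfacing the failure to the customer."""
    for provider in providers:
        result = await authorize(provider, txn)
        if result == "approved":
            return provider, result
    return providers[-1], "declined"

provider, result = asyncio.run(cascade({"id": "txn-1"}, ["Alpha", "Beta"]))
print(provider, result)  # Beta approved
```

The retry latency in this shape is the cost of resuming a coroutine, not of scheduling a fresh thread onto a contended pool, which is where the 12ms vs 80-150ms gap comes from.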
“The milliseconds between a decline and a retry are not a technical detail. They are the difference between recovered revenue and a lost customer.”
[Figure: Cascading Timeline - Provider A soft-declines; the retry fires to Provider B in 12ms.]
Predictable behavior under stress
Speed matters. But what senior engineers actually care about is predictability.
A blocking system performs fine at normal load. At 60% capacity it still looks healthy. At 85%, latency starts climbing. At 95%, it falls off a cliff: tail latency explodes, timeouts cascade, and the system goes from healthy to degraded to failing in minutes.
[Figure: Stress Behavior - how latency scales with load in blocking (thread-per-request) vs reactive (non-blocking) systems.]
A reactive system degrades linearly. Throughput scales with load until resource limits are hit, then it applies backpressure - slowing intake rather than collapsing. No cliff. No cascade. No global degradation from one slow dependency.
[Figure: Provider Isolation - what happens when Provider Alpha degrades to 1800ms latency while Beta, Gamma, and Delta hold a healthy 280ms baseline, in blocking (thread-per-request) vs reactive (non-blocking) systems.]
In payment terms: when Provider Alpha starts responding in 2 seconds instead of 300ms, a blocking system runs out of threads and everything fails - including transactions to Provider Beta, which is perfectly healthy. In our reactive system, Provider Alpha gets backpressure (reduced request rate), the routing engine shifts traffic to healthier providers, and Provider Beta continues at full speed. Stable tail latency across the board.
This is not a nice-to-have. Every PSP has bad days. Network issues, maintenance windows, capacity limits during peak. The question is whether one slow provider takes down your entire payment stack or just reduces its own throughput.
Event-driven, not request-response
Reactive I/O is half the architecture. The other half is how services communicate.
We use Kafka as the backbone. Every transaction state change is an event. The routing engine publishes "routed to Provider Alpha." The PSP adapter publishes "authorization declined." The callback handler publishes status updates hours later. Services consume events independently - no direct calls, no synchronous chains.
Why this matters operationally: when a PSP sends a callback 3 hours after the original transaction, the callback handler - a separate service receiving inbound requests - processes it and publishes the state change to Kafka. The routing engine does not need to be involved. When a settlement file arrives at 2am, the reconciler processes 50,000 line items against the event log. No batch jobs. No cron scripts. Every transaction already has a complete, ordered history.
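The shape of this is easy to see with an in-memory stand-in for the Kafka topic (the real services use Kafka consumers; this sketch only shows the decoupling, and the event fields are illustrative):

```python
from typing import Callable

# Tiny stand-in for a Kafka topic: an append-only log that any number
# of consumers react to independently.
log: list[dict] = []
subscribers: list[Callable[[dict], None]] = []

def publish(event: dict) -> None:
    log.append(event)            # ordered, replayable history
    for handle in subscribers:   # each consumer reacts on its own
        handle(event)

# One consumer: a projection of current transaction status.
statuses: dict[str, str] = {}
subscribers.append(lambda e: statuses.__setitem__(e["txn"], e["status"]))

# A late PSP callback publishes a state change; the routing engine that
# originally handled this transaction is not involved at all.
publish({"txn": "txn-42", "status": "routed", "provider": "Alpha"})
publish({"txn": "txn-42", "status": "declined", "provider": "Alpha"})
publish({"txn": "txn-42", "status": "approved", "provider": "Beta"})

print(statuses["txn-42"], len(log))  # approved 3
```

The reconciler in the text is just another consumer of the same log: it replays the ordered history instead of running a batch job, which is why every transaction already has a complete audit trail.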
Event-Driven Architecture
Every service publishes and consumes events through Kafka.
- Decoupled: services operate independently. One failure does not cascade.
- Replayable: every event stored. Full transaction history on demand.
- Real-time: monitoring, alerts, and rerouting consume the same stream.
Monitoring built into the architecture
Because every event flows through Kafka, we get real-time monitoring as a side effect. Not as a bolt-on polling system.
A stream processor computes rolling approval rates, latency percentiles, and error rates per provider, per BIN range, per geography - updated every second. When a provider degrades, the routing engine knows through the same stream. Traffic shifts before the next transaction arrives.
Detection-to-reroute in production: under 3 seconds. Polling-based monitoring with 30-60 second intervals means that, at 100 transactions per second, 3,000-6,000 transactions are routed to a failing provider before anyone notices.
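A per-provider rolling approval rate over the event stream needs nothing more than a bounded window per provider. A minimal sketch (the window size of 100 is arbitrary; production tracks latency percentiles and error rates the same way):

```python
from collections import defaultdict, deque

WINDOW = 100  # arbitrary window size for the sketch
outcomes: defaultdict = defaultdict(lambda: deque(maxlen=WINDOW))

def record(provider: str, approved: bool) -> None:
    """Every transaction doubles as a health signal for its provider."""
    outcomes[provider].append(approved)

def approval_rate(provider: str) -> float:
    window = outcomes[provider]
    return sum(window) / len(window) if window else 1.0

# 90 approvals, then 10 declines, streaming in transaction order.
for _ in range(90):
    record("Alpha", True)
for _ in range(10):
    record("Alpha", False)

print(approval_rate("Alpha"))  # 0.9
```

Because the metric updates on every event, a degrading provider shows up within a handful of transactions rather than at the next polling interval, which is what makes sub-3-second rerouting possible.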
“We do not poll for provider health. Every transaction is a health signal. The architecture that processes payments is the same architecture that monitors them.”
Idempotency by architecture
PSP callbacks are inbound requests - each provider hits your callback endpoints with status updates. This is a separate problem from outbound I/O. The challenge is not speed. It is correctness.
Every PSP behaves differently. One sends a single callback. Another sends three - pending, approved, settled. A third retries the same callback if it does not get a 200 within 5 seconds. A fourth changes status retroactively 6 hours later. Multiply by 20 providers and thousands of transactions per hour.
Every transaction in our system has a state machine - a directed graph of valid transitions, not a status string. "Settled" cannot move to "pending." Duplicate callbacks are acknowledged but processed once. Out-of-order events are detected by sequence numbers in the event log. The callback handler is a separate service that publishes state transitions to Kafka - decoupled from the authorization flow entirely.
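The transition graph itself can be a few lines: everything outside it is refused, and a replay of an already-applied transition is acknowledged without effect. A minimal sketch (the state names follow the text; the graph is illustrative, not Exirom's full model):

```python
# Valid transitions form a directed graph; anything else is rejected.
VALID = {
    "created":  {"pending", "declined"},
    "pending":  {"approved", "declined"},
    "approved": {"settled"},
    "declined": set(),  # terminal
    "settled":  set(),  # terminal: "settled" cannot move to "pending"
}

class Transaction:
    def __init__(self) -> None:
        self.state = "created"

    def apply(self, new_state: str) -> bool:
        """Apply a PSP callback. Duplicates are acknowledged but ignored;
        invalid or out-of-order transitions are refused."""
        if new_state == self.state:
            return False                  # duplicate callback
        if new_state not in VALID[self.state]:
            return False                  # invalid transition
        self.state = new_state
        return True

txn = Transaction()
# One provider's callback stream: a duplicate "approved" and a late,
# retroactive "pending" both arrive and are safely dropped.
for status in ["pending", "approved", "approved", "settled", "pending"]:
    txn.apply(status)
print(txn.state)  # settled
```

Because the check lives in the state machine rather than in each callback handler, every one of the 20 providers' quirks hits the same guard, which is what makes the zero-duplicate-charge result a structural property rather than a per-integration fix.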
Result: zero duplicate charges. Zero phantom credits. Clean reconciliation data every day. Your finance team works with facts, not artifacts.
Why most platforms are not built this way
Most payment platforms started as blocking monoliths. They work. They process transactions. They generate reports. But they are fundamentally limited by an architecture that was not designed for I/O-heavy, multi-provider, failure-prone workloads.
Rewriting a production payment system from blocking to reactive is a multi-year project that most companies cannot justify. So they optimize around the edges: more servers, bigger thread pools, tuned timeouts. These help - up to a point. But they increase cost without increasing resilience. Scaling a blocking system means paying more to handle the same failure modes.
The alternative - patching a blocking core with async wrappers - creates the worst of both worlds: reactive complexity with blocking failure modes. The system looks modern but collapses the same way under provider degradation.
We started from zero in 2024. No legacy. No migration path. We chose reactive because the problem demanded it. Payment orchestration is I/O coordination under adversarial conditions - slow providers, flaky callbacks, unpredictable latency, concurrent failures. The infrastructure is invisible to operators. But the architecture shows up in every metric that matters: approval rates, latency, uptime, recovery speed.
Architecture is the product.