Active Incidents — Tell Me Which Cluster Is Spiking Right Now

The Other Half of Cluster View

Last week we shipped failure clusters — distinct failure patterns aggregated by fingerprint. Useful for understanding what's happening, less useful for understanding what's urgent. A cluster with 5,000 deliveries from last week is bigger than one with 30 from the last 5 minutes, but the small fresh one is the active fire.

Today the cluster page learns to tell the difference.

Two Rate Windows

Each cluster now computes two rates inside the existing fn_delivery_clusters SQL function:

Recent rate — failed deliveries in the last 5 minutes ÷ 5
Baseline rate — failed deliveries in the 5–65 minute window ÷ 60
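The window arithmetic is easy to sanity-check outside the database. The real computation runs in SQL inside fn_delivery_clusters. What follows is only a minimal TypeScript sketch of the same math over an in-memory list of failure timestamps. The computeRates name, the epoch-millisecond inputs, and the exact window boundaries are assumptions.

```typescript
// Hypothetical sketch of the two rate windows; NOT the actual fn_delivery_clusters SQL.
const MINUTE_MS = 60_000;

interface ClusterRates {
  recentRatePerMin: number;   // failures in the last 5 minutes / 5
  baselineRatePerMin: number; // failures in the 5-65 minute window / 60
}

// failedAtMs: epoch-millisecond timestamps of failed deliveries for one cluster.
// Assumed half-open windows: recent = (now - 5m, now], baseline = (now - 65m, now - 5m].
function computeRates(failedAtMs: number[], nowMs: number): ClusterRates {
  let recent = 0;
  let baseline = 0;
  for (const t of failedAtMs) {
    const ageMin = (nowMs - t) / MINUTE_MS;
    if (ageMin < 0) continue;         // ignore timestamps in the future
    if (ageMin < 5) recent++;         // inside the last 5 minutes
    else if (ageMin < 65) baseline++; // inside the 5-65 minute window
    // anything older than 65 minutes contributes to neither rate
  }
  return { recentRatePerMin: recent / 5, baselineRatePerMin: baseline / 60 };
}
```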

A cluster is escalating when its recent rate is ≥ 0.5/min (at least three failures in the last 5 minutes, since two failures only reach 0.4/min) AND > 5× its baseline rate. The threshold is intentionally simple. A real rolling-baseline z-score is overkill for this signal, and we already have a separate anomaly_volume alert type that does the statistical heavy lifting for source-side traffic anomalies.
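Written out as code, the escalation rule is just two comparisons. This is another hypothetical TypeScript sketch. The constant and function names are invented for illustration, and the shipped check lives in the SQL function.

```typescript
// Hypothetical constants mirroring the thresholds described above.
const MIN_RECENT_RATE_PER_MIN = 0.5; // i.e. at least 3 failures in the last 5 minutes
const ESCALATION_MULTIPLIER = 5;

// Escalating = recent rate clears the absolute floor AND exceeds 5x the baseline rate.
function isEscalating(recentRatePerMin: number, baselineRatePerMin: number): boolean {
  return (
    recentRatePerMin >= MIN_RECENT_RATE_PER_MIN &&
    recentRatePerMin > ESCALATION_MULTIPLIER * baselineRatePerMin
  );
}

// Worked examples:
// 3 failures in 5 min (0.6/min) after 6 in the prior hour (0.1/min): 0.6 > 5 * 0.1, escalating.
// 2 failures in 5 min (0.4/min) after a quiet hour: below the 0.5/min floor, not escalating.
// 30 failures in 5 min (6/min) after 120 in the prior hour (2/min): 6 < 10, not escalating.
```

The third case is the point of combining an absolute floor with a relative multiplier: a busy but steady cluster can triple its volume without landing in Active incidents, while a quiet one that suddenly fails three times in five minutes surfaces immediately.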

The Active Incidents Section

The clusters page now splits results into two sections:

Active incidents (pulsing destructive-themed banner, pinned at top) shows clusters that are escalating, meaning their recent rate clears both the 0.5/min floor and the 5× baseline multiplier. Each card shows the recent rate, the baseline rate, and a prominent Edit & Replay all button, because when a pattern is escalating the answer is usually "fix it now."

Other clusters (below, normal styling) shows everything else — high-count historical patterns that aren't currently active, slow-burn issues, transient blips that already resolved.

When you arrive at the clusters page during an incident, the active-incidents section tells you exactly where to look — no scrolling, no scanning, no doing the math in your head.
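The split itself is a partition on the escalating flag. Here is a minimal sketch, assuming each cluster row arrives with that flag already computed server-side. The ClusterRow shape, the fingerprint and count field names, and the sort order inside each section are assumptions, not the actual page code.

```typescript
// Hypothetical shape of a cluster row as the page might receive it.
interface ClusterRow {
  fingerprint: string;
  count: number; // total failed deliveries in the cluster
  recentRatePerMin: number;
  baselineRatePerMin: number;
  escalating: boolean;
}

// Partition rows into the two page sections. filter() returns new arrays,
// so sorting them does not mutate the caller's input.
// Assumed ordering: active incidents by recent rate, other clusters by total count.
function splitSections(rows: ClusterRow[]): { active: ClusterRow[]; other: ClusterRow[] } {
  const active = rows
    .filter((r) => r.escalating)
    .sort((a, b) => b.recentRatePerMin - a.recentRatePerMin);
  const other = rows
    .filter((r) => !r.escalating)
    .sort((a, b) => b.count - a.count);
  return { active, other };
}
```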

On the MCP Side

The same signal flows out through MCP for AI-driven triage. hookbase_list_delivery_clusters now returns recentRatePerMin, baselineRatePerMin, and a boolean escalating flag per cluster. An agent investigating an incident can ask for clusters, filter to escalating === true, and prioritize those.
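On the agent side, "filter and prioritize" might look like the sketch below. Only recentRatePerMin, baselineRatePerMin, and escalating come from the tool's documented output. The clusterId field and the ranking by recent-to-baseline ratio are assumptions chosen for illustration.

```typescript
// Hypothetical subset of one cluster entry returned by hookbase_list_delivery_clusters.
// clusterId is an ASSUMED identifier field; the other three are the documented ones.
interface McpCluster {
  clusterId: string;
  recentRatePerMin: number;
  baselineRatePerMin: number;
  escalating: boolean;
}

// Keep only escalating clusters; put the steepest spike (recent vs. baseline) first.
// A zero baseline counts as an infinite ratio; ties fall back to the raw recent rate.
function triageOrder(clusters: McpCluster[]): McpCluster[] {
  const spikeRatio = (c: McpCluster): number =>
    c.baselineRatePerMin > 0 ? c.recentRatePerMin / c.baselineRatePerMin : Infinity;
  return clusters
    .filter((c) => c.escalating === true)
    // Infinity - Infinity is NaN (falsy), so the || tie-breaker also covers two zero baselines.
    .sort((a, b) => spikeRatio(b) - spikeRatio(a) || b.recentRatePerMin - a.recentRatePerMin);
}
```

Ranking by ratio favors patterns that appeared out of nowhere, which is usually what an investigating agent should look at first; sorting by recentRatePerMin alone is an equally reasonable choice.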

The Tradeoff

A 5-minute / 60-minute comparison is a deliberate choice. Shorter windows are noisier (any sub-5-minute blip looks like an active incident); longer windows lag (an incident has to last several minutes to show up). The 5/60 split with a 5× threshold catches any failure rate that jumps fivefold or more within a few minutes, while the 0.5/min floor keeps isolated retries from firing.

We'll tune these if customer signal points elsewhere. For now, the cluster page is one place that answers two related but different questions: what failure patterns exist? and which one is happening right now?
