Hookbase
Product Update

Active Incidents — Tell Me Which Cluster Is Spiking Right Now

Last week's failure clusters told you which failure patterns exist. They didn't tell you which one is on fire right now. Two new rate windows split clusters into "active incidents" (escalating) and everything else, so when you arrive during an incident, the page tells you where to look.

Hookbase Team
May 28, 2026
4 min read

The Other Half of Cluster View

Last week we shipped failure clusters — distinct failure patterns aggregated by fingerprint. Useful for understanding what's happening, less useful for understanding what's urgent. A cluster with 5,000 deliveries from last week is bigger than one with 30 from the last 5 minutes, but the small fresh one is the active fire.

Today the cluster page learns to tell the difference.

Two Rate Windows

Each cluster now computes two rates inside the existing fn_delivery_clusters SQL function:

  • Recent rate — failed deliveries in the last 5 minutes ÷ 5
  • Baseline rate — failed deliveries in the 5–65 minute window ÷ 60
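As a rough illustration of the two windows, here is a minimal TypeScript sketch that derives both rates from a list of failure timestamps. It is not the fn_delivery_clusters implementation, which runs in SQL. The function name `computeRates`, the millisecond-timestamp input, and the exact boundary handling (a failure exactly 5 minutes old counts toward the baseline; one exactly 65 minutes old counts toward neither) are assumptions made for the example.

```typescript
const MINUTE_MS = 60_000;

interface ClusterRates {
  recentRatePerMin: number;   // failures in the last 5 minutes / 5
  baselineRatePerMin: number; // failures in the 5-65 minute window / 60
}

// Hypothetical helper: buckets failure timestamps (epoch milliseconds)
// into the recent and baseline windows relative to `nowMs`.
function computeRates(failureTimesMs: number[], nowMs: number): ClusterRates {
  let recentCount = 0;
  let baselineCount = 0;

  for (const t of failureTimesMs) {
    const ageMin = (nowMs - t) / MINUTE_MS;
    if (ageMin < 0) continue;          // assumption: skip timestamps in the future
    if (ageMin < 5) recentCount++;     // last 5 minutes
    else if (ageMin < 65) baselineCount++; // 5-65 minutes ago
    // anything older falls outside both windows
  }

  return {
    recentRatePerMin: recentCount / 5,
    baselineRatePerMin: baselineCount / 60,
  };
}
```

Both results are expressed as failures per minute, so the short recent window and the longer baseline window can be compared directly.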

A cluster is escalating when its recent rate is ≥ 0.5/min (at least three failures in the last 5 minutes, since two would give only 0.4/min) AND > 5× its baseline rate. The threshold is intentionally simple: a real rolling-baseline z-score is overkill for this signal, and we already have a separate anomaly_volume alert type for source-side traffic anomalies that does the statistical heavy lifting.
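The rule can be written as a small predicate. This is a sketch of the condition as described, not the production SQL. The constant and function names are hypothetical; only the 0.5/min floor and the 5× multiplier come from the post.

```typescript
// Values taken from the rule described above; the names are illustrative.
const MIN_RECENT_RATE_PER_MIN = 0.5;
const ESCALATION_MULTIPLIER = 5;

// True when the recent rate clears the absolute floor AND is strictly
// more than 5x the baseline rate.
function isEscalating(
  recentRatePerMin: number,
  baselineRatePerMin: number,
): boolean {
  return (
    recentRatePerMin >= MIN_RECENT_RATE_PER_MIN &&
    recentRatePerMin > ESCALATION_MULTIPLIER * baselineRatePerMin
  );
}
```

Because rates come from whole failure counts, two failures in five minutes (0.4/min) fall below the floor while three (0.6/min) clear it. A cluster with a zero baseline only has to pass the floor, and a cluster sitting at exactly 5× its baseline does not qualify because the comparison is strict.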

The Active Incidents Section

The clusters page now splits results into two sections:

Active incidents (pulsing destructive-themed banner, pinned at top) shows escalating clusters, meaning those whose recent rate passes both checks above. Each card shows the recent rate, the baseline rate, and a prominent Edit & Replay all button, because if it's escalating, the answer is usually "fix it now."

Other clusters (below, normal styling) shows everything else — high-count historical patterns that aren't currently active, slow-burn issues, transient blips that already resolved.

When you arrive at the clusters page during an incident, the active-incidents section tells you exactly where to look — no scrolling, no scanning, no doing the math in your head.

On the MCP Side

The same signal flows out through MCP for AI-driven triage. hookbase_list_delivery_clusters now returns recentRatePerMin, baselineRatePerMin, and a boolean escalating flag per cluster. An agent investigating an incident can ask for clusters, filter to escalating === true, and prioritize those.
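On the agent side, the filtering step might look like the sketch below. The `DeliveryCluster` shape is an assumption: only recentRatePerMin, baselineRatePerMin, and escalating are named in this post, while the `id` field and the `prioritizeEscalating` helper are hypothetical. Sorting by recent rate is one possible way to prioritize, not something the tool prescribes.

```typescript
// Assumed shape of one cluster in the tool result.
interface DeliveryCluster {
  id: string;                 // hypothetical identifier field
  recentRatePerMin: number;   // named in the post
  baselineRatePerMin: number; // named in the post
  escalating: boolean;        // named in the post
}

// Keep only escalating clusters, hottest first.
// filter() returns a new array, so the input is not mutated by sort().
function prioritizeEscalating(clusters: DeliveryCluster[]): DeliveryCluster[] {
  return clusters
    .filter((c) => c.escalating === true)
    .sort((a, b) => b.recentRatePerMin - a.recentRatePerMin);
}
```

An agent could pass the parsed result of hookbase_list_delivery_clusters through a helper like this and start its investigation with the first element.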

The Tradeoff

A 5-minute / 60-minute comparison is a deliberate choice. Shorter windows are noisier (any sub-5min blip looks like an active incident); longer windows lag (an incident has to last several minutes to show up). 5/60 with a 5× threshold catches anything that climbs past 5× its baseline on the order of seconds to minutes, without firing on isolated retries.

We'll tune these if customer signal points elsewhere. For now, the cluster page is one place that answers two related but different questions: what failure patterns exist? and which one is happening right now?

Tags: product-update, observability, clusters, incidents, monitoring

