Webhook Retries: What Every Provider Does Differently
Stripe retries for 3 days. GitHub gives up after one failure. Shopify retries 19 times. Knowing the rules for each provider is the difference between losing events and not. A reference table plus what it means for your handler.
Why This Matters
When your webhook handler returns a non-2xx (or doesn't respond in time), the provider decides what happens next. Some providers retry aggressively for days. Others give up after one attempt. A few disable your endpoint entirely after enough failures.
If you don't know the rules, you can't reason about which events you've actually received. A flaky handler under Stripe's policy will still process every event eventually. The same handler under GitHub's policy will quietly lose events.
The Reference Table
| Provider | Initial Retry | Max Attempts | Total Window | Backoff | Disables Endpoint? |
|----------|---------------|--------------|--------------|---------|--------------------|
| Stripe | ~10 min | ~10 | 3 days | Exponential | After ~3 days of failure |
| GitHub | None by default | 1 | Single attempt | N/A | After persistent failures |
| Shopify | ~1 min | 19 | 48 hours | Exponential | Yes, after 48h failure |
| Slack | Immediate | 3 | Minutes | Linear | No, but throttles |
| Twilio | 5 min | 11 | ~24 hours | Exponential | No |
| Square | ~1 min | 19 | 72 hours | Exponential | After 72h |
| Plaid | ~1 min | Up to 24h worth | 24 hours | Exponential | No |
| Clerk | Immediate | 5 | ~24 hours | Exponential | No |
| PayPal | ~1 min | 25 | 3 days | Exponential | No |
| HubSpot | ~1 min | 10 | ~10 hours | Exponential | After persistent failure |
| Linear | ~1 min | ~5 | Hours | Exponential | No |
These policies change. Always check the provider's docs for the current behavior before relying on this.
What "Retry" Actually Means
A retry is not a guarantee. Three things can interrupt the retry chain:
- Your endpoint returns 2xx late. The provider considers the event delivered, even though your handler may have failed internally after responding.
- Your endpoint returns 4xx. Most providers treat 4xx as "this will never succeed" and stop retrying immediately. A bug that returns 400 is worse than one that returns 500.
- The provider disables your webhook. After enough consecutive failures, several providers will pause delivery entirely until you manually re-enable.
That last one is the silent killer. Your monitoring shows zero webhook errors — because no webhooks are being sent.
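The 4xx-vs-5xx distinction deserves code. Here's a minimal sketch of mapping handler failures to status codes that keep the retry chain alive — the exception classes are hypothetical, invented for illustration; the point is the mapping, not the taxonomy:

```python
# Illustrative error types -- not from any real SDK.
class SignatureError(Exception):
    """Request is malformed or unauthenticated; retrying can never succeed."""

class DownstreamUnavailable(Exception):
    """Our database or queue is down; a retry later may well succeed."""

def status_for(exc: Exception) -> int:
    """Map a handler failure to a response status.

    Most providers treat 4xx as permanent and stop retrying, while 5xx
    keeps the retry chain alive. Defaulting unknown errors to 500 errs
    on the side of redelivery; answering a transient failure with 400
    silently ends delivery for that event.
    """
    if isinstance(exc, SignatureError):
        return 400  # permanent: the same payload will fail every time
    return 500      # transient or unknown: ask the provider to try again
```

The asymmetry is the point: misclassifying a permanent error as 500 costs you some wasted retries; misclassifying a transient error as 400 costs you the event.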
Provider-Specific Notes
Stripe
The most forgiving. A 3-day window, exponential backoff, and events stay in the Stripe dashboard for replay even after retries are exhausted. If you're building a webhook integration for the first time, Stripe is the easiest provider to recover mistakes with.
After three days of failures, Stripe disables the endpoint and emails the account owner. The events aren't lost — they remain in the dashboard for manual replay or programmatic re-fetch via the Events API.
GitHub
The least forgiving. By default, a single failed delivery is the only delivery. GitHub does not retry on its own.
The "Recent Deliveries" tab in your repo settings is the only fallback — you can manually redeliver from the UI. There's no automatic retry. Code that handles GitHub webhooks needs to be especially careful about timeouts and 5xx responses, because there's no second chance.
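Manual redelivery can also be scripted. A hedged sketch against GitHub's REST API for repository webhooks, as I recall the endpoint paths — verify them against the current docs before relying on this; `token`, `owner`, `repo`, and the IDs are placeholders you supply:

```python
import urllib.request

API = "https://api.github.com"

def deliveries_url(owner: str, repo: str, hook_id: int) -> str:
    """Endpoint that lists recent deliveries for a repo webhook."""
    return f"{API}/repos/{owner}/{repo}/hooks/{hook_id}/deliveries"

def redeliver_url(owner: str, repo: str, hook_id: int, delivery_id: int) -> str:
    """Endpoint that requests a redelivery attempt for one delivery."""
    return f"{deliveries_url(owner, repo, hook_id)}/{delivery_id}/attempts"

def redeliver(token: str, owner: str, repo: str,
              hook_id: int, delivery_id: int) -> int:
    """POST a redelivery request; needs a token with webhook admin scope."""
    req = urllib.request.Request(
        redeliver_url(owner, repo, hook_id, delivery_id),
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
        },
    )
    with urllib.request.urlopen(req) as resp:  # real network call
        return resp.status
```

A cron job that lists deliveries, filters for failed ones, and redelivers them is the closest you can get to an automatic retry policy on GitHub — you have to build it yourself.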
Shopify
19 retries over 48 hours, exponential. After 48 hours of total failure, Shopify removes the webhook subscription entirely — not just disabled, removed. You'll need to re-create the subscription via the API. This is the most dangerous policy of any major provider for slow recovery scenarios.
Slack
Slack's policy depends on the kind of webhook. Events API deliveries are retried up to three times within minutes. Slash command responses have a strict 3-second timeout and aren't retried at all. Interactive components are retried, but the user may have already given up.
Twilio
Twilio retries on 5xx and connection errors, but not on 4xx. Importantly, Twilio considers a 200 response with no body as success — which differs from some providers that expect a specific body.
How to Make Retry Behavior Stop Mattering
The pattern that works regardless of provider:
- Acknowledge in milliseconds. Verify the signature, persist the raw payload, return 200. Don't do real work in the request handler.
- Process from your own queue. Once the event is in your queue, you control the retry policy. The provider's job is done.
- Monitor delivery rate against expected volume. If your provider sends ~1000 events/day and you suddenly receive 200, something is wrong — even if you have zero error logs.
- Test the disable threshold. Deliberately fail your endpoint in staging until the provider disables it. Know what that looks like in your dashboard so you recognize it in production.
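The fast-acknowledge pattern above can be sketched in a few lines. This is a framework-free illustration, not a production handler: the secret, the in-memory store, and the queue are stand-ins for your provider's signing secret and a durable database and job queue:

```python
import hashlib
import hmac
import json
import queue

# Placeholders: use your provider's real signing secret and durable storage.
SECRET = b"whsec_example"
event_store: dict[str, bytes] = {}      # raw payloads, keyed by event id
work_queue: queue.Queue = queue.Queue() # real work happens off this queue

def handle_webhook(raw_body: bytes, signature: str) -> int:
    """Fast-ack pattern: verify, persist, enqueue, return. No business logic."""
    expected = hmac.new(SECRET, raw_body, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, signature):
        return 401  # bad signature: reject, never retryable

    event_id = json.loads(raw_body)["id"]
    if event_id not in event_store:  # dedupe: providers may redeliver
        event_store[event_id] = raw_body
        work_queue.put(event_id)
    return 200
```

Note the dedupe check: once you acknowledge fast and let the provider (or Hookbase) retry freely, redeliveries are normal, so the handler must be idempotent.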
Where Hookbase Fits
Hookbase becomes the endpoint the provider sees. We acknowledge in milliseconds, store every event durably, and retry to your handler on a policy you control — not the provider's. If your handler is down for an hour, Hookbase keeps the events. If you fix the bug and want to replay last Tuesday's failed events, they're a click away.
The provider never disables your endpoint, because from their perspective, your endpoint never fails.