Your workflow ran fine for weeks. Now it fails — or worse, it succeeds but produces garbage. This is the diagnostic sequence specialists run to isolate the root cause in 15-30 minutes instead of an afternoon.
Who this is for
Anyone with an n8n workflow that started failing or producing wrong data. The diagnostic tree below covers ~90% of real-world failure modes. If three modes match, you probably have compounding issues; fix one at a time.
What you'll need
Step 1
Open Executions → click the failed execution → click the red node. The error panel shows the exact failure. 60% of debugging is reading it carefully.
Executions tab → click any red row → the editor opens with the failing node highlighted.
Click the red node. The output panel shows: error type, error message, stack trace (if applicable), and HTTP status (for API nodes).
Common error patterns:
- '401 Unauthorized' / 'Invalid credentials' → credential expired or rotated. Re-auth.
- '429 Too Many Requests' → rate limited. Add a Wait node or split into batches.
- 'Cannot read property X of undefined' (newer JavaScript runtimes word it as "Cannot read properties of undefined (reading 'X')") → expression references a field that does not exist on this record.
- 'No data returned' → trigger output empty; check the source app.
Resist the urge to immediately fix without reading. The error message tells you exactly what is wrong 80% of the time.
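The pattern list above can be written down as a small lookup, which helps if you triage many failed executions at once. This is a minimal sketch in plain JavaScript, not an n8n API: `ERROR_PATTERNS` and `classifyError` are hypothetical names, and the regular expressions only cover the four messages listed in this step.

```javascript
// Hypothetical triage table: each entry pairs a message pattern with the
// likely cause and first action from the list above.
const ERROR_PATTERNS = [
  {
    test: /\b401\b|unauthorized|invalid credentials/i,
    cause: "credential expired or rotated",
    action: "re-authorize the credential",
  },
  {
    test: /\b429\b|too many requests/i,
    cause: "rate limited",
    action: "add a Wait node or split into batches",
  },
  {
    // Matches both the older and the newer wording of this runtime error.
    test: /cannot read propert(?:y|ies) .*?of undefined/i,
    cause: "expression references a missing field",
    action: "check that the field exists on this record",
  },
  {
    test: /no data returned/i,
    cause: "trigger output empty",
    action: "check the source app",
  },
];

// Returns the first matching cause, or a fallback that sends you back
// to the error panel.
function classifyError(message) {
  const match = ERROR_PATTERNS.find((p) => p.test.test(message));
  return match
    ? { cause: match.cause, action: match.action }
    : { cause: "unknown", action: "read the full error panel and stack trace" };
}
```

Calling `classifyError("401 Unauthorized")` returns the credential entry; anything the table does not recognize falls through to "unknown", which is the cue to read the message in full rather than guess.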
Step 2
Executions list, filter by Error status, sort by date. One-off failure vs. systematic — the fix is different.
Executions tab → filter by Status = Error. Sort by date descending.
If you see 1 error in the last 100 runs: probably a transient issue (API blip, timed-out request). Add retry-on-fail to the node and move on.
If you see errors clustered in the last 24 hours: something changed recently. Check: did you edit the workflow? Did the source app push a schema change? Did a credential expire?
If errors are constant since day 1: the workflow was never working right. Walk through the build step by step.
If errors are intermittent across weeks: rate limits, conditional data shapes, or upstream flakiness. Add retries + better field-existence checks.
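The four cases above amount to a small decision rule over the execution history. The sketch below assumes you have exported that history as an array of objects with a `status` ("success" or "error") and an ISO `startedAt` timestamp; those field names, the function name, and the 24-hour window are assumptions for illustration, so adjust them to whatever your export contains.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Classifies an execution history into the buckets described above.
// `executions` is assumed to look like:
//   [{ status: "success", startedAt: "2024-06-01T00:00:00Z" }, ...]
function classifyFailurePattern(executions, now = Date.now()) {
  const errors = executions.filter((e) => e.status === "error");

  if (errors.length === 0) return "healthy";
  // Every run failed: the workflow never worked.
  if (errors.length === executions.length) return "never-worked";
  // A single failure among successes: likely transient.
  if (errors.length === 1) return "one-off";

  // Several failures, all inside the last 24 hours: something changed.
  const allRecent = errors.every(
    (e) => now - Date.parse(e.startedAt) <= DAY_MS
  );
  return allRecent ? "recent-change" : "intermittent";
}
```

The order of the checks matters: a single error is treated as transient even if it happened today, which matches the "1 error in the last 100 runs" rule.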
Step 3
Settings → Credentials → find the credential the failing node uses → check the "Used by" tab and the credential type for OAuth expiry hints.
In the failing execution, note which credentials the failing node uses (visible in the node config).
Go to Settings → Credentials. Click the relevant credential.
For OAuth credentials: look for an "Expired" or "Token needs refresh" indicator. Re-authorize from the credential settings.
For API key credentials: check the vendor portal — was the key rotated, deleted, or scoped down? Update the credential value if so.
Test the credential before re-running the workflow. Most credential types have a built-in "Test connection" feature; use it.
Step 4
The source app changed its API response shape. Your expressions reference fields that no longer exist. Re-pull sample data and re-map.
Open the workflow. Click the trigger node. Click "Execute Step" or "Listen for Test Event" to pull fresh sample data.
Compare the new sample to what your downstream nodes expect. Are field names different? Is a previously-required field now optional (or vice versa)?
Common cause: vendor API updates (Shopify, HubSpot, Stripe). New fields added, deprecated fields removed.
Fix: re-map field references in downstream nodes. Use `??` fallbacks for fields that may not exist: `{{ $json.customField ?? "default" }}`.
Set a calendar reminder to re-pull sample data quarterly for production workflows. Schema drift is the #2 cause of slow-burn breakage after credential expiry.
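Comparing a fresh sample against what downstream nodes expect is easy to get wrong by eye. A minimal sketch of that comparison follows, assuming you keep a list of the field names your expressions reference; `diffFields` is a hypothetical helper and it only inspects top-level keys.

```javascript
// Compares the fields your workflow expects with a fresh sample record.
// Only top-level keys are checked; nested objects are out of scope here.
function diffFields(expectedFields, sample) {
  const actual = new Set(Object.keys(sample));
  const expected = new Set(expectedFields);
  return {
    // Fields your expressions reference that the source no longer sends.
    missing: expectedFields.filter((f) => !actual.has(f)),
    // Fields the source now sends that your workflow does not know about.
    added: [...actual].filter((f) => !expected.has(f)),
  };
}
```

For example, `diffFields(["id", "email", "customField"], { id: 1, email: "a@example.com", phone: "555" })` reports `customField` as missing and `phone` as added. Anything under `missing` needs a re-map or a `??` fallback.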
Step 5
Right-click any node → Pin Data. n8n will use the pinned data instead of pulling fresh from the source. This isolates downstream issues from trigger issues.
Right-click the trigger node → "Pin Data." n8n caches the current sample.
Now you can re-execute the workflow against the pinned data repeatedly without re-pulling from the source.
Useful when the source is slow (rate-limited) or non-deterministic (varying data).
Also useful for the inverse: pin known-good data, then deliberately break it (delete a field) to verify your fallback logic works.
Remember to unpin when you finish debugging. Pinned data applies to manual test runs (production executions do not use it), so a forgotten pin means every later test runs against stale sample data.
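The break-it-on-purpose check can also be rehearsed outside the editor. The sketch below assumes a mapping step with the same `??` fallback shown in Step 4; `mapRecord`, `withoutField`, and the sample record are hypothetical stand-ins for your own nodes and pinned data.

```javascript
// Hypothetical mapping step mirroring a `??` fallback in an n8n expression.
function mapRecord(json) {
  return {
    email: json.email ?? "unknown",
    plan: json.customField ?? "default",
  };
}

// Returns a copy of the sample with one field removed; the original
// pinned sample is left untouched.
function withoutField(sample, field) {
  const { [field]: _removed, ...rest } = sample;
  return rest;
}

const pinnedSample = { email: "ada@example.com", customField: "pro" };
const broken = withoutField(pinnedSample, "customField");

console.log(mapRecord(pinnedSample).plan); // "pro"
console.log(mapRecord(broken).plan);       // "default"
```

If the second call threw or produced `undefined` instead of the fallback, the downstream logic would not survive the field disappearing in production.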
Step 6
If the workflow handles batches, the source or destination may be rate-limiting. Add Wait nodes or split into smaller batches.
Look at the failing execution. Was it processing many records?
If yes and the error is "429 Too Many Requests" or similar: the app that the failing node calls (source or destination) is rate-limiting you.
Fix 1: add a "Wait" node between the loop and the action, set to 1-5 seconds.
Fix 2: use "Split In Batches" (named "Loop Over Items" in recent n8n versions) earlier in the workflow, so it processes 50 records at a time instead of 500.
Fix 3: check the destination app for batch endpoints. HubSpot, Mailchimp, and others support batch operations that count as one API call.
For pagination: if the trigger node has a "Limit" field, it caps how many records it pulls per execution. Increase it OR set up a scheduled trigger that handles the next batch.
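Fix 1 and Fix 2 combine into one pattern: chunk the records, send a chunk, pause, repeat. Here is a minimal sketch of that pattern in plain JavaScript. `chunk`, `sleep`, and `processInBatches` are hypothetical helpers, and `send` stands in for whatever call writes a batch to the destination.

```javascript
// Splits an array into batches of at most `size` items.
function chunk(items, size) {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError("size must be a positive integer");
  }
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Sends batches one at a time, pausing between them (not after the last).
// `send` is an assumed async function supplied by the caller.
async function processInBatches(items, size, waitMs, send) {
  const results = [];
  const batches = chunk(items, size);
  for (const [index, batch] of batches.entries()) {
    results.push(await send(batch));
    if (index < batches.length - 1) await sleep(waitMs);
  }
  return results;
}
```

With 500 records, a batch size of 50, and a 2000 ms wait, this makes 10 calls with 9 pauses, adding 18 seconds to the run. That is usually a fair trade for staying under the limit.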
Step 7
If a node is set to 'Continue on Fail,' downstream nodes may run with empty input and produce garbage. Audit per-node failure settings.
Sometimes the workflow does not fail — it succeeds while producing wrong data. The culprit is often "Continue on Fail" on a critical node.
Click each node in the failing workflow → Settings → look for the "Continue on Fail" toggle (in newer n8n versions, the "On Error" dropdown).
For nodes where partial failure is acceptable (e.g., enrichment lookups that may or may not find a match), Continue on Fail is right.
For nodes where failure should halt the workflow (e.g., the primary write to your CRM), Continue on Fail should be OFF. Otherwise downstream nodes run with null inputs.
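One way to stop garbage from reaching a critical write is a guard placed just before it. The sketch below assumes items shaped like n8n's `{ json: {...} }` and assumes that a node which failed with Continue on Fail passes along an item with an `error` key. Verify that against a real failed execution, since the exact shape can differ by node. `assertUsableInput` is a hypothetical helper.

```javascript
// Throws if the input is empty, carries an upstream error, or lacks a
// required field, so the critical write never runs on bad data.
function assertUsableInput(items, requiredFields) {
  if (!Array.isArray(items) || items.length === 0) {
    throw new Error(
      "No input items: an upstream node probably failed or returned nothing"
    );
  }
  items.forEach((item, index) => {
    const json = (item && item.json) || {};
    // Assumption: failed-but-continued nodes emit an `error` key.
    if (json.error !== undefined) {
      throw new Error(
        `Item ${index} carries an upstream error: ${JSON.stringify(json.error)}`
      );
    }
    for (const field of requiredFields) {
      const value = json[field];
      if (value === undefined || value === null || value === "") {
        throw new Error(`Item ${index} is missing required field "${field}"`);
      }
    }
  });
  return items;
}

// In an n8n Code node this could be used as:
//   return assertUsableInput($input.all(), ["email"]);
```

Because the guard throws, the execution shows up as failed in the Executions list instead of succeeding quietly with null inputs.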
Common mistakes
Fixing the symptom, not the cause
What goes wrong: Workflow fails. You see '401 Unauthorized.' You re-auth the credential. It works. Two weeks later, same error. You re-auth again. You never investigate why the OAuth token keeps expiring early (often: a vendor account permission change or a service-account suspension).
How to avoid: Always ask: why is this happening? If a credential expires twice in a month, that is not normal — find the upstream cause.
Editing the live workflow during debug
What goes wrong: You make changes to a live failing workflow while debugging. Now you do not know whether the original issue is fixed or whether your debugging changes broke something else. Two days of compound mess.
How to avoid: Always duplicate the failing workflow before debugging. Disable the original. Debug on the copy. Promote the working copy once fixed.
Ignoring intermittent failures
What goes wrong: Workflow fails 2x/week. You assume it is 'just flaky.' You skip investigation. Three months later, you realize 50 records were missed and the gap is unrecoverable.
How to avoid: Every intermittent failure has a cause. Common causes: rate limits, conditional data shapes, race conditions. Investigate at the 5-failure mark, not the 50-failure mark.
No diagnostic baseline
What goes wrong: Workflow starts failing. You compare to 'how it used to be' from memory. Memory is wrong. You change the wrong thing.
How to avoid: For every active workflow, screenshot the working state (sample data, node settings) once it is stable. Store in a docs folder. Compare against this when debugging.
Not using Pin Data
What goes wrong: You debug a flaky downstream node by repeatedly re-executing the trigger. The trigger pulls slightly different data each time. You cannot isolate the bug.
How to avoid: Pin Data on the trigger to lock the input while debugging downstream. Unpin when you finish so later test runs pull fresh data.
Recap
Done — what's next
How to set up error workflows in n8n
Hand it off
Troubleshooting once is a project. Maintaining 20 workflows without slow-burn breakage is a job. EverestX automation specialists run a weekly Executions audit on managed accounts and catch failures in the first 1-3 errors instead of the 50th. Typically $400-800/mo at $14-16/hr.
Investigate after the 3rd-5th failure, not the 50th. Patterns become clear after a handful of data points. The compounding cost of a workflow that fails 2x/week for 3 months is far more than the cost of a 1-hour investigation today.
When a workflow succeeds but produces wrong data, check three places: (1) Continue on Fail toggles upstream: did a node fail and pass null downstream? (2) Field mappings: did an expression reference the wrong field? (3) Source data shape: did the schema change?
You can retry a failed execution: Executions → click the failed execution → 'Retry' button. n8n re-runs from the first failed node using the original input data. Useful for backfilling missed events after a credential rotation or vendor outage.
Subscribe to the vendor's API changelog or developer newsletter. For high-criticality integrations, set up a weekly heartbeat workflow that pulls sample data and compares it against a known-good schema. That alerts you to drift within a week instead of after records have gone missing.
Cloud has workflow versioning (Pro+). Self-hosted with Postgres can rely on database backups. Otherwise: export workflows to Git on every major edit. Version control is essential for production stacks of 10+ workflows.