How to troubleshoot n8n workflow failures

Your workflow ran fine for weeks. Now it fails — or worse, it succeeds but produces garbage. This is the diagnostic sequence specialists run to isolate the root cause in 15-30 minutes instead of an afternoon.

~1 hr · Intermediate · Updated May 26, 2026

Who this is for

Anyone with an n8n workflow that started failing or producing wrong data. The diagnostic tree below covers ~90% of real-world failure modes. If three modes match, you probably have compounding issues; fix one at a time.

What you'll need

Access to the failing workflow and its Executions history
Knowledge of which apps/credentials the workflow uses
About 60-90 minutes for thorough diagnosis

Step 1

Read the error message carefully

Open Executions → click the failed execution → click the red node. The error panel shows the exact failure. Most of debugging is reading it carefully.

Executions tab → click any red row → the editor opens with the failing node highlighted.

Click the red node. The output panel shows: error type, error message, stack trace (if applicable), and HTTP status (for API nodes).

Common error patterns:

- '401 Unauthorized' / 'Invalid credentials' → credential expired or rotated. Re-auth.

- '429 Too Many Requests' → rate limited. Add Wait node or split into batches.

- 'Cannot read property X of undefined' (newer Node.js versions word it as 'Cannot read properties of undefined (reading X)') → the expression reads a field on an object that does not exist on this record.

- 'No data returned' → trigger output empty; check the source app.

Resist the urge to immediately fix without reading. The error message tells you exactly what is wrong 80% of the time.
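The pattern list above can be turned into a small triage helper. This is a minimal sketch, not part of n8n: `classifyError` is a hypothetical function name, and the matching rules only cover the four patterns listed in this step.

```javascript
// Hypothetical helper: maps an error message and optional HTTP status
// to the likely cause from the list above. Illustrative, not exhaustive.
function classifyError(message, httpStatus) {
  const text = String(message || "").toLowerCase();

  if (httpStatus === 401 || text.includes("unauthorized") || text.includes("invalid credentials")) {
    return "credentials: re-authorize or update the key";
  }
  if (httpStatus === 429 || text.includes("too many requests")) {
    return "rate limit: add a Wait node or smaller batches";
  }
  // Matches both the older "property X" and newer "properties" wording.
  if (/cannot read propert(y|ies)/.test(text)) {
    return "expression: a referenced field is missing on this record";
  }
  if (text.includes("no data returned")) {
    return "empty trigger output: check the source app";
  }
  return "unknown: read the full message and stack trace";
}

console.log(classifyError("Cannot read properties of undefined (reading 'email')"));
console.log(classifyError("Request failed", 429));
```

Anything that falls through to "unknown" is the case where reading the full message and stack trace matters most.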

Step 2

Check whether it is a single-execution issue or a pattern

Executions list, filter by Error status, sort by date. One-off failure vs. systematic — the fix is different.

Executions tab → filter by Status = Error. Sort by date descending.

If you see 1 error in the last 100 runs: probably a transient issue (API blip, timed-out request). Add retry-on-fail to the node and move on.

If you see errors clustered in the last 24 hours: something changed recently. Check: did you edit the workflow? Did the source app push a schema change? Did a credential expire?

If errors are constant since day 1: the workflow was never working right. Walk through the build step by step.

If errors are intermittent across weeks: rate limits, conditional data shapes, or upstream flakiness. Add retries + better field-existence checks.
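The four cases above can be sketched as a classifier over an exported execution list. This assumes each record has a `status` of "success" or "error" and an ISO `startedAt` timestamp; those field names and the `classifyFailurePattern` helper are assumptions for illustration, so adjust them to whatever your export actually contains.

```javascript
// Hypothetical classifier for the one-off / cluster / constant / intermittent split.
// Assumes records shaped like { status: "success" | "error", startedAt: ISO string }.
function classifyFailurePattern(executions, now = new Date()) {
  const errors = executions.filter((e) => e.status === "error");

  if (errors.length === 0) return "healthy";
  // Every run failed: the workflow never worked right.
  if (errors.length === executions.length) return "constant";
  // A single failure among successes: likely transient.
  if (errors.length === 1) return "one-off";

  const dayMs = 24 * 60 * 60 * 1000;
  const allRecent = errors.every(
    (e) => now.getTime() - new Date(e.startedAt).getTime() <= dayMs
  );
  // All failures in the last 24 hours: something changed recently.
  return allRecent ? "recent-cluster" : "intermittent";
}

const now = new Date("2026-05-26T12:00:00Z");
const sample = [
  { status: "success", startedAt: "2026-05-20T08:00:00Z" },
  { status: "error", startedAt: "2026-05-26T02:00:00Z" },
  { status: "error", startedAt: "2026-05-26T09:30:00Z" },
];
console.log(classifyFailurePattern(sample, now)); // "recent-cluster"
```

The thresholds are deliberately crude. The point is to force a decision about which bucket you are in before choosing a fix.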

Step 3

Audit credentials for the failing nodes

Settings → Credentials → find the credential the failing node uses → check the "Used by" tab and the credential type for OAuth expiry hints.

In the failing execution, note which credentials the failing node uses (visible in the node config).

Go to Settings → Credentials. Click the relevant credential.

For OAuth credentials: look for an "Expired" or "Token needs refresh" indicator. Re-authorize from the credential settings.

For API key credentials: check the vendor portal — was the key rotated, deleted, or scoped down? Update the credential value if so.

Test the credential with the credential's built-in test button. Most credentials have a "Test connection" feature; use it.

Step 4

Check for schema drift in source data

The source app changed its API response shape. Your expressions reference fields that no longer exist. Re-pull sample data and re-map.

Open the workflow. Click the trigger node. Click "Execute Step" or "Listen for Test Event" to pull fresh sample data.

Compare the new sample to what your downstream nodes expect. Are field names different? Is a previously-required field now optional (or vice versa)?

Common cause: vendor API updates (Shopify, HubSpot, Stripe). New fields added, deprecated fields removed.

Fix: re-map field references in downstream nodes. Use `??` fallbacks for fields that may not exist: `{{ $json.customField ?? "default" }}`.

Set a calendar reminder to re-pull sample data quarterly for production workflows. Schema drift is the #2 cause of slow-burn breakage after credential expiry.
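Comparing a fresh sample against a saved one by eye is error-prone on large payloads. The sketch below lists which field paths disappeared and which are new. `fieldPaths` and `diffFields` are hypothetical helpers, the sample records are invented, and arrays are treated as single leaf fields to keep the example short.

```javascript
// Collect dotted paths for every leaf field in a sample record.
// Arrays and null values count as leaves; empty objects contribute no paths.
function fieldPaths(value, prefix = "") {
  if (value === null || typeof value !== "object" || Array.isArray(value)) {
    return prefix ? [prefix] : [];
  }
  return Object.entries(value).flatMap(([key, child]) =>
    fieldPaths(child, prefix ? `${prefix}.${key}` : key)
  );
}

// Report fields present in the saved baseline but missing from the fresh
// sample (removed), and fields that only exist in the fresh sample (added).
function diffFields(baseline, fresh) {
  const before = new Set(fieldPaths(baseline));
  const after = new Set(fieldPaths(fresh));
  return {
    removed: [...before].filter((p) => !after.has(p)).sort(),
    added: [...after].filter((p) => !before.has(p)).sort(),
  };
}

// Invented sample data for illustration only.
const baseline = { id: 1, customer: { email: "ada@example.com", phone: "555-0100" } };
const fresh = { id: 1, customer: { email: "ada@example.com" }, currency: "USD" };

console.log(diffFields(baseline, fresh));
// { removed: [ 'customer.phone' ], added: [ 'currency' ] }
```

Every path under `removed` is a candidate for a broken expression downstream; `added` fields are usually harmless but worth a glance.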

Step 5

Test in isolation with Pin Data

Right-click any node → Pin Data. n8n will use the pinned data instead of pulling fresh from the source. This isolates downstream issues from trigger issues.

Right-click the trigger node → "Pin Data." n8n caches the current sample.

Now you can re-execute the workflow against the pinned data repeatedly without re-pulling from the source.

Useful when the source is slow (rate-limited) or non-deterministic (varying data).

Also useful for the inverse: pin known-good data, then deliberately break it (delete a field) to verify your fallback logic works.

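Pinned data applies only to manual (test) executions; production executions ignore it. Still, unpin once you finish debugging. Otherwise every later manual test runs against a stale sample and can hide schema changes.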
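The "deliberately break it" check can also be rehearsed outside the editor. This is a minimal sketch under assumptions: `mapContact` stands in for whatever mapping your downstream node performs, and the sample record is invented. It uses the same `??` fallback shown in Step 4, plus `?.` for a nested field.

```javascript
// Hypothetical mapping step with fallbacks, written as a plain function.
function mapContact(json) {
  return {
    // ?. guards against a missing parent object; ?? supplies the fallback.
    email: json.customer?.email ?? null,
    plan: json.customField ?? "default",
  };
}

// Invented known-good sample, standing in for pinned data.
const pinned = { customer: { email: "ada@example.com" }, customField: "pro" };

// Break copies of the sample by deleting one field at a time.
const withoutCustomField = { ...pinned };
delete withoutCustomField.customField;

const withoutCustomer = { ...pinned };
delete withoutCustomer.customer;

console.log(mapContact(pinned));             // { email: 'ada@example.com', plan: 'pro' }
console.log(mapContact(withoutCustomField)); // { email: 'ada@example.com', plan: 'default' }
console.log(mapContact(withoutCustomer));    // { email: null, plan: 'pro' }
```

If any of the broken copies throws instead of returning a fallback, the live workflow would fail the same way when the source drops that field.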

Step 6

Check for rate limiting and pagination

If the workflow handles batches, the source or destination may be rate-limiting. Add Wait nodes or split into smaller batches.

Look at the failing execution. Was it processing many records?

If yes and the error is "429 Too Many Requests" or similar: the destination app is rate-limiting.

Fix 1: add a "Wait" node between the loop and the action, set to 1-5 seconds.

Fix 2: use "Split In Batches" (named "Loop Over Items" in newer n8n versions) earlier, and process 50 at a time instead of 500.

Fix 3: check the destination app for batch endpoints. HubSpot, Mailchimp, and others support batch operations that count as one API call.

For pagination: if the trigger node has a "Limit" field, it caps how many records it pulls per execution. Increase it OR set up a scheduled trigger that handles the next batch.
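To see what Fix 1 and Fix 2 cost in run time before changing the workflow, the arithmetic can be sketched in a few lines. `chunk` and `planRun` are hypothetical helpers; the 500-record, 50-per-batch, 2-second figures are example inputs taken from the ranges above, not recommendations.

```javascript
// Split a list of records into batches of at most `size` items.
function chunk(items, size) {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError("size must be a positive integer");
  }
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Estimate how many batches a run needs and how long the waits add up to.
function planRun(totalRecords, batchSize, waitSeconds) {
  const batches = Math.ceil(totalRecords / batchSize);
  // A wait is only needed between batches, not after the last one.
  const totalWaitSeconds = Math.max(0, batches - 1) * waitSeconds;
  return { batches, totalWaitSeconds };
}

const records = Array.from({ length: 500 }, (_, i) => ({ id: i + 1 }));
console.log(chunk(records, 50).length); // 10
console.log(planRun(500, 50, 2));       // { batches: 10, totalWaitSeconds: 18 }
```

If the estimated wait pushes the run past your schedule interval, a batch endpoint (Fix 3) is usually the better option than longer waits.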

Step 7

Check Continue on Fail settings

If a node is set to 'Continue on Fail,' downstream nodes may run with empty input and produce garbage. Audit per-node failure settings.

Sometimes the workflow does not fail — it succeeds while producing wrong data. The culprit is often "Continue on Fail" on a critical node.

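Click each node in the failing workflow → Settings → look for the "Continue on Fail" toggle. Newer n8n versions expose the same choice as an "On Error" dropdown (Stop Workflow, Continue, Continue using error output).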

For nodes where partial failure is acceptable (e.g., enrichment lookups that may or may not find a match), Continue on Fail is right.

For nodes where failure should halt the workflow (e.g., the primary write to your CRM), Continue on Fail should be OFF. Otherwise downstream nodes run with null inputs.
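Where Continue on Fail has to stay on, a guard before the critical write can stop bad items from flowing through silently. The sketch below assumes items shaped like n8n items (`{ json: {...} }`) and assumes a failed upstream node leaves an `error` field on the item it passes along; verify that against a real failed execution before relying on it. `assertUsableItems` is a hypothetical name.

```javascript
// Hypothetical guard: throws if upstream items are empty, carry an error,
// or lack a required field. Assumes n8n-style items: { json: { ... } }.
function assertUsableItems(items, requiredFields) {
  if (items.length === 0) {
    throw new Error("Upstream node returned no items");
  }
  items.forEach((item, index) => {
    const json = item.json ?? {};
    // Assumption: a node that failed with Continue on Fail leaves `error` here.
    if (json.error !== undefined) {
      throw new Error(`Item ${index} carries an upstream error: ${JSON.stringify(json.error)}`);
    }
    for (const field of requiredFields) {
      if (json[field] === undefined || json[field] === null || json[field] === "") {
        throw new Error(`Item ${index} is missing required field "${field}"`);
      }
    }
  });
  return items;
}

// Invented items for illustration.
const items = [
  { json: { email: "ada@example.com", plan: "pro" } },
  { json: { error: "Request failed with status code 404" } },
];

try {
  assertUsableItems(items, ["email"]);
} catch (err) {
  console.log(err.message);
}
```

Inside a Code node, the same check could run over `$input.all()` so the execution fails loudly instead of writing nulls to the destination.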

Common mistakes

What goes wrong (and how to avoid it)

Fixing the symptom, not the cause

What goes wrong: Workflow fails. You see '401 Unauthorized.' You re-auth the credential. It works. Two weeks later, same error. You re-auth again. You never investigate why the OAuth token keeps expiring early (often: a vendor account permission change or a service-account suspension).

How to avoid: Always ask: why is this happening? If a credential expires twice in a month, that is not normal. Find the upstream cause.

Editing the live workflow during debug

What goes wrong: You make changes to a live failing workflow while debugging. Now you do not know whether the original issue is fixed or whether your debugging changes broke something else. Two days of compound mess.

How to avoid: Always duplicate the failing workflow before debugging. Disable the original. Debug on the copy. Promote the working copy once fixed.

Ignoring intermittent failures

What goes wrong: Workflow fails 2x/week. You assume it is 'just flaky.' You skip investigation. Three months later, you realize 50 records were missed and the gap is unrecoverable.

How to avoid: Every intermittent failure has a cause. Common causes: rate limits, conditional data shapes, race conditions. Investigate at the 5-failure mark, not the 50-failure mark.

No diagnostic baseline

What goes wrong: Workflow starts failing. You compare to 'how it used to be' from memory. Memory is wrong. You change the wrong thing.

How to avoid: For every active workflow, screenshot the working state (sample data, node settings) once it is stable. Store in a docs folder. Compare against this when debugging.

Not using Pin Data

What goes wrong: You debug a flaky downstream node by repeatedly re-executing the trigger. The trigger pulls slightly different data each time. You cannot isolate the bug.

How to avoid: Pin Data on the trigger to lock the input while debugging downstream. Unpin when you finish so later manual tests pull fresh data; production executions ignore pinned data either way.

Recap

What to take away

Read the error message carefully — it tells you 80% of the answer.
Distinguish one-off failures (transient) from patterns (systemic).
Expired OAuth tokens and rotated API keys are the #1 cause of slow-burn breakage.
Schema drift is the #2 cause of slow-burn breakage.
Debug on duplicates, never live workflows. Pin Data to isolate.

Frequently Asked Questions

How long should I wait before debugging an intermittent failure?

Investigate after the 3rd-5th failure, not the 50th. Patterns become clear after a handful of data points. The compounding cost of a workflow that fails 2x/week for 3 months is far more than the cost of a 1-hour investigation today.

My workflow succeeded but wrote wrong data. Where do I start?

Three places: (1) Continue on Fail toggles upstream — did a node fail and pass null downstream? (2) Field mappings — did an expression reference the wrong field? (3) Source data shape — did the schema change?

Can I re-run a failed execution after fixing the issue?

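Yes. Executions → click failed → 'Retry' button. n8n re-runs from the first failed node using the original input data, and lets you choose between the currently saved workflow (which includes your fix) and the original version. Useful for backfilling missed events after a credential rotation or vendor outage.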

How do I find out if a vendor changed their API?

Subscribe to the vendor's API changelog or developer newsletter. For high-criticality integrations, set up a weekly heartbeat workflow that pulls sample data and compares it against a known-good schema, so drift surfaces within a week instead of after records go missing.

Is there an n8n equivalent of "version history"?

Cloud has workflow versioning (Pro+). Self-hosted with Postgres can rely on database backups. Otherwise: export workflows to Git on every major edit. Version control is essential for production stacks of 10+ workflows.

Related tutorials

How to set up error workflows in n8n

How to manage credentials in n8n

How to set up webhooks in n8n

When to hire an n8n specialist — an honest checklist

How to troubleshoot a failing Zap (step-by-step debug)
