How to set up AI Agent nodes in n8n

Most 'AI Agent' tutorials build a demo that breaks the second real users touch it. This walks the n8n Agent node end-to-end — model selection, memory, tool wiring, and the guardrails that make it production-ready.

~2-4 hrAdvancedUpdated May 26, 2026

Who this is forOperators or technical founders who want an AI agent that actually does work — not a chat demo. If you are integrating AI into lead routing, content drafting, or customer support, this gets you the n8n-native patterns.

What you'll need

n8n on Pro plan or higher (Cloud) or self-hosted with LangChain nodes enabled
An API key for at least one LLM provider (OpenAI, Anthropic, Mistral, or local Ollama)
A clear use case scoped to one job (not "do everything")
Familiarity with prompts and at least one previous n8n workflow built
About 2-4 hours for the first production-grade agent

Step 1

Scope the agent to one job

Write a one-sentence job description for the agent. "Classify support tickets by urgency and route to the right Slack channel." NOT "be a support assistant."

AI agents fail when scope is broad. "Customer support assistant" is a product, not a workflow.

Scope down to one job: "When a Zendesk ticket arrives, classify urgency (P1/P2/P3) and route to #support-p1, #support-p2, or #support-p3."

Or: "When a new lead arrives, generate a 3-sentence summary of their company website and append to the HubSpot contact record."

Or: "When a Slack DM arrives in #ai-helpdesk, look up the user in HubSpot, classify intent, and reply with the right doc link."

If your sentence contains "and also," split into two agents. One job per agent.

Step 2

Pick the model and provider

GPT-4o or Claude Sonnet for reasoning-heavy work. GPT-4o-mini or Haiku for high-volume classification. Local Ollama for sensitive data.

For reasoning, planning, or multi-step tool use: GPT-4o or Claude 3.5 Sonnet. Cost: ~$3-5 per 1M input tokens, $10-15 per 1M output tokens.

For high-volume classification, routing, or short summaries: GPT-4o-mini or Claude Haiku. Cost: ~$0.15-0.50 per 1M input tokens. 10-20x cheaper than the big models.

For sensitive data that cannot leave your infrastructure: local Ollama with Llama 3.1, Mistral, or Qwen. No per-token cost; infrastructure cost only.

Set up the model credential in Settings → Credentials → "OpenAI" / "Anthropic" / "Ollama." Test with a one-prompt completion before wiring into an agent.

Step 3

Add the AI Agent node and configure the system prompt

Add an "AI Agent" node (under LangChain in self-hosted). Set the system prompt to the agent's job description plus output format.

In the workflow editor, add the "AI Agent" node downstream of your trigger.

Set "Agent Type" to "Tools Agent" (most common) or "ReAct Agent" (more explicit reasoning, more tokens).

Set the Chat Model to the credential you just created.

Write the system prompt. Be specific. Bad: "You are a helpful assistant." Good: "You are a support ticket classifier. Given a ticket title and body, output a JSON object: {urgency: \"P1\"|\"P2\"|\"P3\", reason: \"...\"}. Use P1 only for outages or security incidents. Use P2 for blocked workflows. Use P3 for everything else."

Set "Output Parser" to "Structured Output Parser" if you need reliable JSON. Define the schema explicitly.

Step 4

Wire up tools

Tools are sub-workflows or HTTP requests the agent can call. Each tool needs a name, description, and parameters.

Below the AI Agent node, attach Tool nodes (HTTP Request Tool, Workflow Tool, Code Tool, or any integration with Tool support).

Each tool needs: name (snake_case), description (the agent reads this to decide when to use it), and parameter schema.

Example: `lookup_contact_in_hubspot` with description "Use this when you need to find an existing contact by email. Takes one parameter: email (string)."

The agent reads the descriptions and picks tools dynamically. Better descriptions = better tool selection. Bad descriptions = the agent guesses and burns tokens.

Test each tool independently FIRST. The agent will not save you from a broken HTTP Request node.

Step 5

Add memory (if needed) and conversation context

For multi-turn agents, attach a Memory node (Redis, Postgres, Window Buffer). For one-shot agents, skip memory.

Memory is for agents that need to remember earlier turns of a conversation. Examples: chat support bots, multi-step interview agents.

If your agent is one-shot (input → tool calls → output, no follow-up), skip memory entirely. Memory adds cost and complexity.

For multi-turn agents on Cloud or non-DB self-hosted: use "Window Buffer Memory" with a max of 10 messages.

For multi-turn agents on self-hosted with Postgres or Redis: use "Postgres Chat Memory" or "Redis Chat Memory." Pass a sessionId (usually user email or conversation ID) so different users get different memory.

Memory tokens cost real money. Cap the window. Do not let memory grow unbounded.

Step 6

Test, then add guardrails

Test with 10 real inputs (including weird ones). Add token budget caps, retry limits, and fallback paths for when the agent fails.

Collect 10 real example inputs — including 2-3 weird ones (unicode, very long, intentionally adversarial).

Run each through the agent. Inspect the tool calls, intermediate steps, and final output. The intermediate steps are where bugs hide.

Set a max-iterations limit on the AI Agent node (default is 15; consider 5-8 for production). This prevents runaway loops.

Add a downstream IF node: if the agent output is unexpected or fails parsing, route to a fallback path (manual review queue, default response). Never let an agent failure halt a critical workflow.

Monitor token usage. Add a separate workflow that pulls executions weekly and surfaces top-spend workflows so you can catch runaway costs.

Step 7

Activate with monitoring + cost alerts

Activate the workflow. Set an OpenAI/Anthropic budget alert at the provider level. Pipe agent execution metrics to a dashboard.

Activate the workflow.

Set a budget alert at the LLM provider (OpenAI: Settings → Limits → soft + hard caps. Anthropic: Settings → Usage limits).

Build a "AI Agent Monitor" workflow that runs daily, pulls execution data, and posts a Slack message: "Yesterday: 142 runs, $4.20 spent, 2 failures." This is your early-warning system for cost spikes.

For the first 14 days, manually inspect 10% of agent outputs daily. Look for hallucinations, wrong tool calls, or weird edge cases.

After 14 days of stable behavior, drop daily inspection to weekly. Continue cost monitoring forever.

Common mistakes

What goes wrong (and how to avoid it)

Defaulting to GPT-4o for high-volume tasks
What goes wrong: You wire up classification of 1,000 support tickets/day against GPT-4o. Cost: ~$50/day = $1,500/mo. The same job on GPT-4o-mini would cost $50/mo. You burn $1,450/mo of avoidable cost.
How to avoid: Always start with the smallest viable model (mini/Haiku). Only upgrade to the big model on tasks where small-model quality demonstrably fails.
No max-iterations limit
What goes wrong: Agent gets stuck in a tool-call loop, makes 50 LLM calls trying to satisfy an impossible request. One execution burns $5 of tokens. If 20 users hit this in a day, that is $100/day of agent waste.
How to avoid: Set max iterations to 5-8 on the AI Agent node. Add a downstream IF check: if iterations were exhausted, route to a manual fallback.
Vague tool descriptions
What goes wrong: Tool description says 'Use this to look up customer data.' Agent calls it for every request including ones where it should not. 80% of tool calls are unnecessary, doubling token spend and making the agent slower.
How to avoid: Tool descriptions must include WHEN to use the tool, not just WHAT it does. Bad: "Looks up customers." Good: "Use this only when the user asks about an account or order they already have. Do not use for general questions."
Unbounded memory window
What goes wrong: Window memory grows to 200 messages over a long conversation. Every new turn sends 200 messages to the LLM. Cost per turn balloons from $0.05 to $5. User does not notice; you do at month-end.
How to avoid: Cap memory at 10-20 messages. For longer conversations, summarize older turns periodically (a separate workflow does this).
No fallback path for agent failures
What goes wrong: Agent returns malformed JSON on 5% of inputs. Downstream node fails to parse. Workflow halts. Those 5% of users get no response. Customer-facing breakage.
How to avoid: Always add an IF node downstream of the agent: if output looks wrong, route to a fallback (manual review, default message, escalation channel). Never let agent failure halt the workflow silently.
No cost monitoring
What goes wrong: Agent works fine for a month. Costs creep from $50 → $300 → $2,000 as usage grows. You discover at month-end when the credit card hits.
How to avoid: Set provider-level hard budget caps. Build a daily Slack digest of token spend per workflow. Review weekly. A 5-minute habit prevents 5-figure surprises.

Recap

What to take away

Scope each agent to ONE job. Multiple jobs = multiple agents.
Default to mini/Haiku models — only upgrade where small models demonstrably fail.
Tool descriptions are critical: tell the agent WHEN to use each tool.
Cap memory windows. Cap max iterations. Both control cost.
Always have a fallback path for agent failures. Always monitor token spend.

Done — what's next

How to build your first workflow in n8n

Read the next tutorial

Hand it off

AI agents are a different skill from traditional automation. Most teams that try to DIY AI agents burn 60-120 hours and $2,000-5,000 in tokens before getting one production-ready. An EverestX automation specialist with AI experience will scope, build, and monitor your first agent — typically $400-800 for the build + $200-500/mo for ongoing tuning at $14-16/hr.

See specialist rates

Frequently Asked Questions

What is the difference between AI Agent and a plain LLM node?

A plain LLM node sends one prompt, gets one response. An AI Agent can iterate: call tools, evaluate responses, call more tools, and synthesize a final output. Agents are right when the job needs reasoning across multiple steps. LLM nodes are right for single-shot completions.

How much does a production n8n AI agent cost to run?

Wildly variable. A 1,000-runs/day classification agent on GPT-4o-mini might cost $30-80/mo. The same volume on GPT-4o might cost $500-1,500/mo. Multi-tool agents with memory cost 5-10x more than simple classifiers. Always model token cost before activating.

Can I use local LLMs (Ollama) with the AI Agent node?

Yes, on self-hosted n8n. Set up the Ollama credential pointing to your Ollama instance. Llama 3.1 8B works for simple classification; Llama 3.1 70B works for reasoning at the cost of much heavier hardware (16GB+ VRAM).

How do I prevent prompt injection in user inputs?

Three layers: (1) Wrap user input in clear delimiters in your system prompt (`User input is between <<<>>> below`). (2) Use a structured output parser so even if injection happens, the output schema constrains the damage. (3) For sensitive actions (writing to DB, sending email), require explicit confirmation tool calls the user input cannot trigger.

Should I use AI Agent for everything or just hard tasks?

Just hard tasks. For deterministic work (if/else, field mapping, simple transforms), regular n8n nodes are faster, cheaper, and more reliable. Use AI Agent when the job genuinely needs natural language understanding or open-ended reasoning.

How to set up AI Agent nodes in n8n

Scope the agent to one job

Pick the model and provider

Add the AI Agent node and configure the system prompt

Wire up tools

Add memory (if needed) and conversation context

Test, then add guardrails

Activate with monitoring + cost alerts

What goes wrong (and how to avoid it)

What to take away

Frequently Asked Questions

Related tutorials

How to build your first workflow in n8n

How to set up error workflows in n8n

How to troubleshoot n8n workflow failures

When to hire an n8n specialist — an honest checklist

How to set up multi-step Zaps in Zapier

How to set up AI Agent nodes in n8n

Scope the agent to one job

Pick the model and provider

Add the AI Agent node and configure the system prompt

Wire up tools

Add memory (if needed) and conversation context

Test, then add guardrails

Activate with monitoring + cost alerts

What goes wrong (and how to avoid it)

What to take away

Frequently Asked Questions

Related tutorials

How to build your first workflow in n8n

How to set up error workflows in n8n

How to troubleshoot n8n workflow failures

When to hire an n8n specialist — an honest checklist

How to set up multi-step Zaps in Zapier