Loading tutorials…
Loading tutorials…
Most 'AI Agent' tutorials build a demo that breaks the second real users touch it. This walks the n8n Agent node end-to-end — model selection, memory, tool wiring, and the guardrails that make it production-ready.
Who this is forOperators or technical founders who want an AI agent that actually does work — not a chat demo. If you are integrating AI into lead routing, content drafting, or customer support, this gets you the n8n-native patterns.
What you'll need
Step 1
Write a one-sentence job description for the agent. "Classify support tickets by urgency and route to the right Slack channel." NOT "be a support assistant."
AI agents fail when scope is broad. "Customer support assistant" is a product, not a workflow.
Scope down to one job: "When a Zendesk ticket arrives, classify urgency (P1/P2/P3) and route to #support-p1, #support-p2, or #support-p3."
Or: "When a new lead arrives, generate a 3-sentence summary of their company website and append to the HubSpot contact record."
Or: "When a Slack DM arrives in #ai-helpdesk, look up the user in HubSpot, classify intent, and reply with the right doc link."
If your sentence contains "and also," split into two agents. One job per agent.
Step 2
GPT-4o or Claude Sonnet for reasoning-heavy work. GPT-4o-mini or Haiku for high-volume classification. Local Ollama for sensitive data.
For reasoning, planning, or multi-step tool use: GPT-4o or Claude 3.5 Sonnet. Cost: ~$3-5 per 1M input tokens, $10-15 per 1M output tokens.
For high-volume classification, routing, or short summaries: GPT-4o-mini or Claude Haiku. Cost: ~$0.15-0.50 per 1M input tokens. 10-20x cheaper than the big models.
For sensitive data that cannot leave your infrastructure: local Ollama with Llama 3.1, Mistral, or Qwen. No per-token cost; infrastructure cost only.
Set up the model credential in Settings → Credentials → "OpenAI" / "Anthropic" / "Ollama." Test with a one-prompt completion before wiring into an agent.
Step 3
Add an "AI Agent" node (under LangChain in self-hosted). Set the system prompt to the agent's job description plus output format.
In the workflow editor, add the "AI Agent" node downstream of your trigger.
Set "Agent Type" to "Tools Agent" (most common) or "ReAct Agent" (more explicit reasoning, more tokens).
Set the Chat Model to the credential you just created.
Write the system prompt. Be specific. Bad: "You are a helpful assistant." Good: "You are a support ticket classifier. Given a ticket title and body, output a JSON object: {urgency: \"P1\"|\"P2\"|\"P3\", reason: \"...\"}. Use P1 only for outages or security incidents. Use P2 for blocked workflows. Use P3 for everything else."
Set "Output Parser" to "Structured Output Parser" if you need reliable JSON. Define the schema explicitly.
Step 4
Tools are sub-workflows or HTTP requests the agent can call. Each tool needs a name, description, and parameters.
Below the AI Agent node, attach Tool nodes (HTTP Request Tool, Workflow Tool, Code Tool, or any integration with Tool support).
Each tool needs: name (snake_case), description (the agent reads this to decide when to use it), and parameter schema.
Example: `lookup_contact_in_hubspot` with description "Use this when you need to find an existing contact by email. Takes one parameter: email (string)."
The agent reads the descriptions and picks tools dynamically. Better descriptions = better tool selection. Bad descriptions = the agent guesses and burns tokens.
Test each tool independently FIRST. The agent will not save you from a broken HTTP Request node.
Step 5
For multi-turn agents, attach a Memory node (Redis, Postgres, Window Buffer). For one-shot agents, skip memory.
Memory is for agents that need to remember earlier turns of a conversation. Examples: chat support bots, multi-step interview agents.
If your agent is one-shot (input → tool calls → output, no follow-up), skip memory entirely. Memory adds cost and complexity.
For multi-turn agents on Cloud or non-DB self-hosted: use "Window Buffer Memory" with a max of 10 messages.
For multi-turn agents on self-hosted with Postgres or Redis: use "Postgres Chat Memory" or "Redis Chat Memory." Pass a sessionId (usually user email or conversation ID) so different users get different memory.
Memory tokens cost real money. Cap the window. Do not let memory grow unbounded.
Step 6
Test with 10 real inputs (including weird ones). Add token budget caps, retry limits, and fallback paths for when the agent fails.
Collect 10 real example inputs — including 2-3 weird ones (unicode, very long, intentionally adversarial).
Run each through the agent. Inspect the tool calls, intermediate steps, and final output. The intermediate steps are where bugs hide.
Set a max-iterations limit on the AI Agent node (default is 15; consider 5-8 for production). This prevents runaway loops.
Add a downstream IF node: if the agent output is unexpected or fails parsing, route to a fallback path (manual review queue, default response). Never let an agent failure halt a critical workflow.
Monitor token usage. Add a separate workflow that pulls executions weekly and surfaces top-spend workflows so you can catch runaway costs.
Step 7
Activate the workflow. Set an OpenAI/Anthropic budget alert at the provider level. Pipe agent execution metrics to a dashboard.
Activate the workflow.
Set a budget alert at the LLM provider (OpenAI: Settings → Limits → soft + hard caps. Anthropic: Settings → Usage limits).
Build a "AI Agent Monitor" workflow that runs daily, pulls execution data, and posts a Slack message: "Yesterday: 142 runs, $4.20 spent, 2 failures." This is your early-warning system for cost spikes.
For the first 14 days, manually inspect 10% of agent outputs daily. Look for hallucinations, wrong tool calls, or weird edge cases.
After 14 days of stable behavior, drop daily inspection to weekly. Continue cost monitoring forever.
Common mistakes
Defaulting to GPT-4o for high-volume tasks
What goes wrong: You wire up classification of 1,000 support tickets/day against GPT-4o. Cost: ~$50/day = $1,500/mo. The same job on GPT-4o-mini would cost $50/mo. You burn $1,450/mo of avoidable cost.
How to avoid: Always start with the smallest viable model (mini/Haiku). Only upgrade to the big model on tasks where small-model quality demonstrably fails.
No max-iterations limit
What goes wrong: Agent gets stuck in a tool-call loop, makes 50 LLM calls trying to satisfy an impossible request. One execution burns $5 of tokens. If 20 users hit this in a day, that is $100/day of agent waste.
How to avoid: Set max iterations to 5-8 on the AI Agent node. Add a downstream IF check: if iterations were exhausted, route to a manual fallback.
Vague tool descriptions
What goes wrong: Tool description says 'Use this to look up customer data.' Agent calls it for every request including ones where it should not. 80% of tool calls are unnecessary, doubling token spend and making the agent slower.
How to avoid: Tool descriptions must include WHEN to use the tool, not just WHAT it does. Bad: "Looks up customers." Good: "Use this only when the user asks about an account or order they already have. Do not use for general questions."
Unbounded memory window
What goes wrong: Window memory grows to 200 messages over a long conversation. Every new turn sends 200 messages to the LLM. Cost per turn balloons from $0.05 to $5. User does not notice; you do at month-end.
How to avoid: Cap memory at 10-20 messages. For longer conversations, summarize older turns periodically (a separate workflow does this).
No fallback path for agent failures
What goes wrong: Agent returns malformed JSON on 5% of inputs. Downstream node fails to parse. Workflow halts. Those 5% of users get no response. Customer-facing breakage.
How to avoid: Always add an IF node downstream of the agent: if output looks wrong, route to a fallback (manual review, default message, escalation channel). Never let agent failure halt the workflow silently.
No cost monitoring
What goes wrong: Agent works fine for a month. Costs creep from $50 → $300 → $2,000 as usage grows. You discover at month-end when the credit card hits.
How to avoid: Set provider-level hard budget caps. Build a daily Slack digest of token spend per workflow. Review weekly. A 5-minute habit prevents 5-figure surprises.
Recap
Done — what's next
How to build your first workflow in n8n
Read the next tutorial
Hand it off
AI agents are a different skill from traditional automation. Most teams that try to DIY AI agents burn 60-120 hours and $2,000-5,000 in tokens before getting one production-ready. An EverestX automation specialist with AI experience will scope, build, and monitor your first agent — typically $400-800 for the build + $200-500/mo for ongoing tuning at $14-16/hr.
See specialist rates
A plain LLM node sends one prompt, gets one response. An AI Agent can iterate: call tools, evaluate responses, call more tools, and synthesize a final output. Agents are right when the job needs reasoning across multiple steps. LLM nodes are right for single-shot completions.
Wildly variable. A 1,000-runs/day classification agent on GPT-4o-mini might cost $30-80/mo. The same volume on GPT-4o might cost $500-1,500/mo. Multi-tool agents with memory cost 5-10x more than simple classifiers. Always model token cost before activating.
Yes, on self-hosted n8n. Set up the Ollama credential pointing to your Ollama instance. Llama 3.1 8B works for simple classification; Llama 3.1 70B works for reasoning at the cost of much heavier hardware (16GB+ VRAM).
Three layers: (1) Wrap user input in clear delimiters in your system prompt (`User input is between <<<>>> below`). (2) Use a structured output parser so even if injection happens, the output schema constrains the damage. (3) For sensitive actions (writing to DB, sending email), require explicit confirmation tool calls the user input cannot trigger.
Just hard tasks. For deterministic work (if/else, field mapping, simple transforms), regular n8n nodes are faster, cheaper, and more reliable. Use AI Agent when the job genuinely needs natural language understanding or open-ended reasoning.
n8n
You have n8n running. Now build one real workflow that does something useful. This walks a real trigger → action chain end-to-end with the expression-mapping detail most tutorials skip.
n8n
Workflows fail silently by default. By the time someone notices the missing data, the gap is unrecoverable. This walks the proper error-handling pattern — error workflow, alerts, retries, and the monitoring that catches the rest.
n8n
Your workflow ran fine for weeks. Now it fails — or worse, it succeeds but produces garbage. This is the diagnostic sequence specialists run to isolate the root cause in 15-30 minutes instead of an afternoon.
n8n
DIY n8n is great until you have 15 workflows and a credentials audit you keep deferring. This is the honest framework: when the cost of self-managing exceeds the cost of a specialist, and how to tell which side you are on.
Zapier
One trigger, three or four actions. Easy to draw on a whiteboard, easy to break in production. This walks through chaining, naming, and the error scenarios that hit you on day 30, not day 1.