How to set up Screaming Frog and run your first crawl

Screaming Frog only earns its keep when the crawl matches how Googlebot actually sees your site. This tutorial walks through the install, license activation, memory tuning, and the configuration choices that first-time users most often get wrong.

~2-4 hr · Intermediate · Updated May 26, 2026

Who this is for

Owners or in-house marketers who just bought the £259/yr license (or hit the 500-URL free-tier ceiling) and need to produce a defensible technical audit. If your last crawl returned 8,000 issues you couldn't prioritize, this is the reset.

What you'll need

Screaming Frog SEO Spider 21.x installed (free up to 500 URLs, £259/yr for unlimited)
A machine with at least 8 GB RAM (16 GB recommended for sites over 50K URLs)
Admin/edit access to the site (or at least to robots.txt and DNS) in case crawling is blocked
Google Search Console verified for the target domain (used later for cross-validation)
About 3 hours for first-pass setup plus 1-6 hours for the initial crawl

Step 1

Install Screaming Frog and activate the license

Download from screamingfrog.co.uk → install → Licence → Enter Licence. Confirm you see "Licenced" in the top-right and the 500-URL cap is gone.

Download Screaming Frog SEO Spider from screamingfrog.co.uk. Use the latest 21.x build — earlier versions ship with an older Chromium for JS rendering and miss roughly 8-12% of modern JS frameworks.

Install. On macOS, drag the app to Applications; on Windows, run the .exe and accept the install path. Linux users: use the .deb or .rpm package shipped on the same download page.

Open the app. Without a licence you're capped at 500 URLs per crawl — useful for testing, not for any real audit. Go to Licence → Enter Licence and paste the username + key from your purchase email.

Restart the app. Confirm the title bar shows 'Licenced' and the URL counter no longer flashes red at 500. If it still caps, the licence didn't apply — re-paste and watch for trailing whitespace in the key.

Step 2

Tune memory allocation before the first crawl

Configuration → System → Memory Allocation. Raise the heap to 4 GB minimum, 8-12 GB for sites over 50K URLs. Restart after changing.

Screaming Frog is RAM-bound. The default 2 GB heap tends to choke on sites over roughly 20K URLs and can derail the crawl partway through, usually as a silent slowdown rather than a hard error.

Open Configuration → System → Memory Allocation. Set the RAM ceiling to roughly 50% of your machine's physical RAM. On a 16 GB MacBook, allocate 8 GB. On a 32 GB workstation, allocate 12-16 GB.

Switch storage mode from 'Memory Storage' to 'Database Storage' (Configuration → System → Storage Mode → Database Storage) for any site over 100K URLs. Database mode trades a small speed penalty for the ability to crawl multi-million-URL sites without hitting RAM limits.

Restart the app. Memory and storage changes only take effect on restart — a frequent reason users think the change 'didn't work.'
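
The sizing rules in this step reduce to simple arithmetic. As a rough sketch only: the function name `recommend_settings` and its return keys are made up for illustration, and the thresholds are just the rules of thumb quoted above (about 50% of physical RAM, a 4 GB practical minimum, Database Storage above 100K URLs), not values read from Screaming Frog itself.

```python
def recommend_settings(physical_ram_gb: int, estimated_urls: int) -> dict:
    """Apply this step's rules of thumb to one machine and one site.

    Hypothetical helper: it only restates the guidance in the text.
    """
    # Roughly 50% of physical RAM goes to the crawler's heap.
    heap_gb = physical_ram_gb // 2
    # Database Storage is suggested above 100K URLs, Memory Storage otherwise.
    storage_mode = "Database Storage" if estimated_urls > 100_000 else "Memory Storage"
    return {
        "heap_gb": heap_gb,
        "storage_mode": storage_mode,
        # The text treats a 4 GB heap as the practical minimum.
        "meets_4gb_minimum": heap_gb >= 4,
    }
```

For a 16 GB laptop and a 30K-URL site, `recommend_settings(16, 30_000)` suggests an 8 GB heap in Memory Storage; for a 32 GB workstation and a 250K-URL site it suggests 16 GB in Database Storage, the top of the 12-16 GB range mentioned above.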

Step 3

Configure the Spider tab

Configuration → Spider → Crawl. Decide what to crawl (internal HTML, images, CSS, JS, PDFs) and what to extract (titles, meta, headings, canonicals, hreflang).

Configuration → Spider → Crawl tab controls what URL types the crawler follows. For a standard SEO audit, leave the defaults: internal HTML pages + images + CSS + JS. Disable PDFs and SWF unless you specifically need them — they triple crawl time on document-heavy sites.

Check 'Crawl All Subdomains' if your canonical setup uses subdomains (shop.example.com, blog.example.com). Leave it OFF if everything lives on one host; otherwise you'll waste crawl time and memory on staging.example.com and similar non-canonical hosts.

Check 'Crawl Outside of Start Folder' only if you want to follow internal links that escape your starting path. For a /blog audit, leave this OFF — it confines the crawl to /blog and produces a cleaner report.
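
To make the two scope checkboxes concrete, here is a small sketch of the kind of per-URL decision they imply. It is an approximation for reasoning about scope, not Screaming Frog's actual logic; `in_crawl_scope` and the `root_domain` argument are hypothetical names introduced for this example.

```python
from urllib.parse import urlsplit


def in_crawl_scope(
    url: str,
    start_url: str,
    root_domain: str,
    crawl_all_subdomains: bool = False,
    crawl_outside_start_folder: bool = False,
) -> bool:
    """Approximate whether a discovered URL falls inside the crawl scope."""
    target = urlsplit(url)
    start = urlsplit(start_url)
    target_host = (target.hostname or "").lower()
    start_host = (start.hostname or "").lower()

    # A different host is only in scope when subdomain crawling is on
    # and the host actually belongs to the same root domain.
    if target_host != start_host:
        is_same_site = target_host == root_domain or target_host.endswith("." + root_domain)
        if not (crawl_all_subdomains and is_same_site):
            return False

    if crawl_outside_start_folder:
        return True

    # Otherwise the URL path must sit at or below the start folder.
    folder = start.path.rstrip("/")
    return folder == "" or target.path == folder or target.path.startswith(folder + "/")
```

With both options off and a start URL of https://example.com/blog, a post under /blog/ is in scope, while /pricing and anything on staging.example.com are not.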

Move to the Extraction tab. Check: Page Titles, Meta Description, Meta Keywords, H1, H2, Indexability, Canonical Link Element, Pagination, Hreflang, Word Count, Structured Data. Uncheck anything you don't use — extraction time scales linearly with checked items.

Step 4

Set robots.txt handling and user-agent

Configuration → robots.txt → Settings. Choose "Respect robots.txt" for a Googlebot-like crawl, or "Ignore" only if you own the site and need to crawl blocked paths.

Configuration → robots.txt → Settings. The default is 'Respect robots.txt,' which is correct for most audits — you want to mirror what Googlebot can actually crawl.

If you need to crawl Disallow'd paths (e.g., to audit staging or a hidden admin section you own), switch to 'Ignore robots.txt.' Never do this on a site you don't control — it's a fast path to getting your IP banned.

Configuration → User-Agent. The default is 'Screaming Frog SEO Spider.' Switch to 'Googlebot (Smartphone)' for the most realistic SEO audit — many sites serve different HTML based on user-agent, and the SF default can show you something Googlebot would never see.

If your firewall blocks Googlebot UA from non-Google IPs, stay on the SF default and allowlist 'Screaming Frog SEO Spider' in your WAF instead.
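
Before crawling, it can help to see which paths your robots.txt hides from a compliant crawler. The sketch below uses Python's standard `urllib.robotparser`; the robots.txt content, URLs, and the helper name `blocked_urls` are invented for illustration. The standard-library parser does simple prefix matching, so it may not agree with Googlebot on every edge case (wildcard rules, for instance).

```python
from urllib.robotparser import RobotFileParser

# Example rules only; substitute the contents of your own robots.txt.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /staging/
"""


def blocked_urls(robots_txt: str, user_agent: str, urls: list[str]) -> list[str]:
    """Return the URLs that the given user-agent is not allowed to fetch."""
    parser = RobotFileParser()
    # parse() accepts the file as a list of lines, so no network call is made.
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]
```

Calling `blocked_urls(ROBOTS_TXT, "Googlebot", [...])` with a mix of blog and admin URLs returns only the admin ones. Those are the pages a 'Respect robots.txt' crawl will skip.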

Step 5

Run the first crawl and watch the queue

Paste your apex domain (https://example.com) into the URL bar → Start. Watch the URLs Discovered counter and the Status tab for early errors.

Paste the full URL of your site, including https://, into the input at the top. Use the hostname you actually canonicalize (https://www.example.com or https://example.com, not both).

Click Start. The URLs Discovered counter should climb in the first 10 seconds. If it stays at 1, your seed URL returned a non-200 status or got blocked at the firewall.

Watch the Status tab on the right rail. Status code 200 is good; 3xx redirects are normal for up to 5-10% of URLs; 4xx and 5xx responses above 2% mean something is misconfigured (robots.txt, WAF, or a genuinely broken site).
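
The thresholds above are easy to check on a list of status codes, for example one copied out of a crawl export. This is a minimal sketch: the function names are hypothetical, the 10% redirect limit is the upper end of the range quoted above, and treating 4xx and 5xx as one combined share is an assumption about how the 2% figure is meant.

```python
from collections import Counter


def status_shares(status_codes: list[int]) -> dict[str, float]:
    """Share of URLs per status class, e.g. {'2xx': 0.9, '3xx': 0.05}."""
    total = len(status_codes)
    if total == 0:
        return {}
    classes = Counter(f"{code // 100}xx" for code in status_codes)
    return {cls: count / total for cls, count in classes.items()}


def crawl_warnings(
    status_codes: list[int],
    redirect_limit: float = 0.10,
    error_limit: float = 0.02,
) -> list[str]:
    """Flag a crawl whose redirect or error share exceeds the rule of thumb."""
    shares = status_shares(status_codes)
    warnings = []
    if shares.get("3xx", 0.0) > redirect_limit:
        warnings.append("3xx share is above the redirect limit")
    # Assumption: 4xx and 5xx are counted together against the 2% limit.
    error_share = shares.get("4xx", 0.0) + shares.get("5xx", 0.0)
    if error_share > error_limit:
        warnings.append("4xx/5xx share is above the error limit")
    return warnings
```

A crawl with 90 OK responses, 5 redirects, and 5 not-found pages produces one warning (errors at 5%), which is the signal to pause and check robots.txt or the WAF before letting the crawl run for hours.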

For a 5K-URL site, expect 15-45 minutes. For 50K URLs, plan on 2-6 hours. For 500K+, run overnight in Database Storage mode and check progress in the morning.

Step 6

Validate the crawl against GSC before acting on it

Open Google Search Console → Pages → Indexed count. Compare against Screaming Frog's "Internal HTML → 200 OK" count. Should be within 25%.

Once the crawl is complete, open the Internal tab → filter 'HTML' → count the rows that returned 200 OK. This is the crawled URL universe you will compare against GSC.

Open Google Search Console → Pages report → note the 'Indexed' count.

Compare. If GSC says 4,800 indexed and Screaming Frog says 4,950 crawled, you're aligned. If GSC says 4,800 indexed and SF says 18,000 crawled, your crawl scope is too broad — most likely you're following parameter URLs or staging.

If SF crawled fewer URLs than GSC has indexed, you're missing orphan pages. Use the Sitemap source (Configuration → Spider → Crawl → 'Crawl Linked XML Sitemaps') to pick up URLs your internal linking doesn't reach.

Don't act on the audit until the two systems reconcile. Acting on a misaligned crawl is the most expensive mistake in technical SEO — you'll fix issues that don't matter and miss the ones that do.
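
The reconciliation check can be written down as a small calculation. This is a sketch under stated assumptions: `compare_counts` is a hypothetical helper, the 25% tolerance is the figure used in this step, and measuring divergence relative to the GSC count (rather than the crawl count) is a choice the text does not specify.

```python
def compare_counts(gsc_indexed: int, sf_crawled: int, tolerance: float = 0.25) -> tuple[float, str]:
    """Return (relative divergence, verdict) for a crawl vs. the GSC indexed count."""
    if gsc_indexed <= 0:
        raise ValueError("gsc_indexed must be positive")
    # Positive means the crawl found more URLs than GSC has indexed.
    divergence = (sf_crawled - gsc_indexed) / gsc_indexed
    if abs(divergence) <= tolerance:
        verdict = "aligned"
    elif divergence > 0:
        verdict = "crawl scope too broad"
    else:
        verdict = "crawl is missing pages"
    return divergence, verdict
```

With the numbers from this step, 4,800 indexed against 4,950 crawled is a divergence of about 3% and counts as aligned, while 4,800 against 18,000 is 275% over and points to parameter URLs or staging hosts leaking into the crawl.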

Common mistakes

What goes wrong (and how to avoid it)

Running with the default 2 GB heap on a 50K-URL site
What goes wrong: The crawl silently stalls at around 35K URLs. The progress counter still climbs slowly, but new URLs stop processing correctly. Output is incomplete and you don't realize until you compare against GSC. Estimated cost: 8-15 hours of redo work plus 2 days of slipped audit timeline.
How to avoid: Before any crawl over 20K URLs, raise the heap in Configuration → System → Memory Allocation to at least 50% of physical RAM. Restart the app. Re-run.
Not switching to Database Storage on large sites
What goes wrong: Memory mode crashes around 250K URLs even with 16 GB allocated. You lose the entire crawl mid-way and have to start over. On a site you bill against, this can mean a missed deadline costing $1,500-3,000 in agency reputation damage.
How to avoid: Configuration → System → Storage Mode → Database Storage. Pick a dedicated SSD path with at least 50 GB free. Restart. Database mode handles multi-million-URL crawls reliably.
Leaving 'Respect robots.txt' off and getting blocked at the WAF
What goes wrong: You configure SF to ignore robots.txt 'just to be thorough.' Cloudflare flags the crawl as malicious bot traffic and IP-bans your office. The entire team loses access to the site for 24-48 hours.
How to avoid: Respect robots.txt by default. If you need to crawl Disallow'd paths, allowlist your SF source IP in Cloudflare/WAF first AND coordinate with the dev team so they know the crawl is authorized.
Crawling with the default Screaming Frog user-agent on a Googlebot-cloaked site
What goes wrong: Some sites (especially older WordPress installs with caching plugins) serve different HTML to bots vs. browsers. Crawling as 'Screaming Frog SEO Spider' shows you the user version; Googlebot sees a different page. You audit content that Googlebot never sees and ship fixes that don't move rankings. Cost: 1-3 months of wasted SEO effort plus the opportunity cost on whatever real issue went unfixed.
How to avoid: Switch User-Agent to 'Googlebot (Smartphone)' in Configuration → User-Agent. Re-crawl. Compare the two reports — if they differ meaningfully, you have a cloaking or caching issue worth investigating.
Crawling parameter-heavy URLs without exclusion rules
What goes wrong: On an ecommerce site, faceted nav creates millions of /?color=red&size=md&sort=price-asc URLs. SF crawls all of them. Your 'audit' returns 800K rows and 90% are duplicate variants. Hours wasted, real issues drowned in noise. Lost revenue from delayed fix: $5,000-15,000 in unfixed canonical issues.
How to avoid: Configuration → URL Rewriting → Remove Parameters. List the parameters to strip by name (for example utm_source, utm_medium, sort, filter, color, size) so the variants collapse before they hit the crawl. Re-crawl and confirm the URL count drops to a sane number.
Treating the issue count as a goal
What goes wrong: The Overview tab shows '8,247 issues.' You assign someone to 'fix all of them.' They work for three weeks on missing alt text and short meta descriptions while broken canonicals on 200 product pages cost you $40K/quarter in lost revenue.
How to avoid: Sort by URLs affected, not by issue type. The top 10 issues by URL count almost always represent 80% of the SEO impact. Ignore the headline number.
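
For the parameter-heavy case above, you can preview offline how much a stripping rule would shrink a URL list before re-crawling. This is a minimal sketch, not a reproduction of Screaming Frog's rewriting: the helper names are hypothetical, and the wildcard matching (`utm_*`) is a convenience of this script, done with Python's `fnmatch`.

```python
from fnmatch import fnmatchcase
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Parameter names (or wildcard patterns) to drop; taken from the example above.
STRIP_PATTERNS = ["utm_*", "sort", "filter", "color", "size"]


def strip_parameters(url: str, patterns: list[str] = STRIP_PATTERNS) -> str:
    """Remove query parameters whose names match any pattern."""
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if not any(fnmatchcase(name, pattern) for pattern in patterns)
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


def count_unique_after_stripping(urls: list[str]) -> int:
    """How many distinct URLs remain once the parameters are stripped."""
    return len({strip_parameters(url) for url in urls})
```

Running `count_unique_after_stripping` over an exported URL list gives a quick estimate of the "sane number" to expect from the re-crawl. If the count barely moves, the duplicates are coming from parameters that are not yet on the list.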

Recap

What to take away

Activate the licence before any real crawl — the 500-URL free tier is testing-only.
Memory allocation matters: at least 4 GB heap for medium sites, Database Storage for 100K+ URLs.
Respect robots.txt unless you own the site AND have a reason to ignore it.
Switch User-Agent to Googlebot (Smartphone) for the most realistic audit.
Validate against GSC before acting — if URL counts diverge by more than 25%, fix scope first.

Done — what's next

How to find broken links with Screaming Frog

Hand it off

Configuring Screaming Frog once is a project. Running it monthly, triaging the issues, and shipping the fixes is a job. A vetted technical SEO specialist on EverestX will own the crawl schedule, prioritize fixes by revenue impact, and hand you a weekly report — typically $400-900/mo at $14-16/hr depending on site size.

Frequently Asked Questions

Do I need the paid licence, or is the free tier enough?

Free tier caps at 500 URLs per crawl, no JS rendering, no scheduled crawls, no API access. Fine for a single landing page audit. For any site over 500 URLs or any real audit work, the £259/yr licence pays for itself the first time you run a JS-rendered crawl or hit the URL cap mid-audit.

How much RAM do I actually need?

8 GB minimum machine RAM (4 GB allocated heap) for sites up to 50K URLs. 16 GB machine (8-12 GB heap) for 50K-100K URLs. Above 100K URLs, switch to Database Storage mode on a machine with 16 GB+ RAM. Above 1M URLs, plan on a dedicated 32 GB workstation.

Should I use Memory Storage or Database Storage?

Memory Storage is faster but capped by RAM. Use it for sites under 100K URLs. Database Storage is slightly slower but scales to multi-million URL sites without crashing. Use it for ecommerce sites, news sites, or anything over 100K URLs.

Why does Screaming Frog show fewer URLs than Google Search Console?

Three usual reasons: (1) Your internal linking doesn't reach orphan pages — enable 'Crawl Linked XML Sitemaps' under Configuration → Spider → Crawl. (2) JavaScript-rendered links aren't being followed — enable JS rendering in Configuration → Spider → Rendering. (3) Your robots.txt blocks paths that are still indexed (orphan-indexed) — diagnose via GSC's Page Indexing report.

How often should I re-crawl?

Active sites with weekly deploys: monthly full crawl + weekly partial crawls of changed sections. Static brochure sites: quarterly. After any major release: full crawl within 48 hours and compare against the previous baseline.

Can I crawl a competitor's site?

Technically yes — Screaming Frog doesn't require ownership. But respect their robots.txt, crawl politely (Configuration → Speed → 2 threads, 5 URLs/sec), and use a clearly-identifiable user-agent so you're not mistaken for malicious traffic. Some sites will rate-limit or block you regardless.
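
A polite speed limit also sets a floor on how long the crawl takes, which is worth knowing before you start. The arithmetic is sketched below; `estimated_crawl_hours` is a hypothetical helper, and the result is a best case, since slow server responses can push the real rate below the configured limit.

```python
def estimated_crawl_hours(url_count: int, urls_per_second: float = 5.0) -> float:
    """Best-case crawl duration in hours at a fixed URL rate."""
    if urls_per_second <= 0:
        raise ValueError("urls_per_second must be positive")
    seconds = url_count / urls_per_second
    return seconds / 3600
```

At 5 URLs/sec, a 50,000-URL site takes at least 10,000 seconds, or just under 2.8 hours.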

Related tutorials

How to find broken links with Screaming Frog

How to configure JavaScript rendering in Screaming Frog

How to set up Ahrefs Site Audit the right way

When to hire a technical SEO specialist — the honest checklist
