Screaming Frog only earns its keep when the crawl matches how Googlebot actually sees your site. This walks through the install, license activation, memory tuning, and configuration choices that 90% of first-time users get wrong.
Who this is for
Owners or in-house marketers who just bought the £259/yr licence (or hit the 500-URL free-tier ceiling) and need to produce a defensible technical audit. If your last crawl returned 8,000 issues you couldn't prioritize, this is the reset.
Step 1
Download from screamingfrog.co.uk → install → Licence → Enter Licence. Confirm you see "Licenced" in the top-right and the 500-URL cap is gone.
Download Screaming Frog SEO Spider from screamingfrog.co.uk. Use the latest 21.x build — earlier versions ship with an older Chromium for JS rendering and miss roughly 8-12% of modern JS frameworks.
Install. On macOS, drag the app to Applications; on Windows, run the .exe and accept the install path. Linux users: use the .deb or .rpm package shipped on the same download page.
Open the app. Without a licence you're capped at 500 URLs per crawl — useful for testing, not for any real audit. Go to Licence → Enter Licence and paste the username + key from your purchase email.
Restart the app. Confirm the title bar shows 'Licenced' and the URL counter no longer flashes red at 500. If it still caps, the licence didn't apply — re-paste and watch for trailing whitespace in the key.
Step 2
Configuration → System → Memory Allocation. Raise the heap to 4 GB minimum, 8-12 GB for sites over 50K URLs. Restart after changing.
Screaming Frog is RAM-bound. The default 2 GB heap will choke on any site over 20K URLs and corrupt the crawl partway through — usually with a silent slowdown rather than a hard error.
Open Configuration → System → Memory Allocation. Set the RAM ceiling to roughly 50% of your machine's physical RAM. On a 16 GB MacBook, allocate 8 GB. On a 32 GB workstation, allocate 12-16 GB.
Switch storage mode from 'Memory Storage' to 'Database Storage' (Configuration → System → Storage Mode → Database Storage) for any site over 100K URLs. Database mode trades a small speed penalty for the ability to crawl multi-million-URL sites without hitting RAM limits.
Restart the app. Memory and storage changes only take effect on restart — a frequent reason users think the change 'didn't work.'
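The sizing rules in this step can be sketched as a small helper. This is only an illustration of the rules of thumb above (about 50% of physical RAM, a 4 GB floor, Database Storage past 100K URLs); the function name and return shape are hypothetical and nothing here talks to Screaming Frog itself.

```python
def recommend_memory_settings(physical_ram_gb: float, expected_urls: int) -> dict:
    """Suggest a heap size and storage mode from the tutorial's rules of thumb."""
    # Roughly half of physical RAM, but never below the 4 GB minimum.
    # Note: on machines with less than 8 GB, the 4 GB floor exceeds the 50% rule.
    heap_gb = max(4, round(physical_ram_gb * 0.5))
    # Database Storage is advised for any site over 100K URLs.
    storage_mode = "Database Storage" if expected_urls > 100_000 else "Memory Storage"
    return {"heap_gb": heap_gb, "storage_mode": storage_mode}


if __name__ == "__main__":
    print(recommend_memory_settings(16, 30_000))
    print(recommend_memory_settings(32, 250_000))
```

For a 32 GB workstation this returns the top of the 12-16 GB range; lower it by hand if the machine runs other heavy applications during the crawl.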
Step 3
Configuration → Spider → Crawl. Decide what to crawl (internal HTML, images, CSS, JS, PDFs) and what to extract (titles, meta, headings, canonicals, hreflang).
Configuration → Spider → Crawl tab controls what URL types the crawler follows. For a standard SEO audit, leave the defaults: internal HTML pages + images + CSS + JS. Disable PDFs and SWF unless you specifically need them — they triple crawl time on document-heavy sites.
Check 'Crawl All Subdomains' if your canonical setup uses subdomains (shop.example.com, blog.example.com). Leave it OFF if everything lives on one host; otherwise you'll waste crawl time on staging.example.com and similar non-canonical hosts.
Check 'Crawl Outside of Start Folder' only if you want to follow internal links that escape your starting path. For a /blog audit, leave this OFF — it confines the crawl to /blog and produces a cleaner report.
Move to the Extraction tab. Check: Page Titles, Meta Description, Meta Keywords, H1, H2, Indexability, Canonical Link Element, Pagination, Hreflang, Word Count, Structured Data. Uncheck anything you don't use — extraction time scales linearly with checked items.
Step 4
Configuration → robots.txt → Settings. Choose "Respect robots.txt" for a Googlebot-like crawl, or "Ignore" only if you own the site and need to crawl blocked paths.
Configuration → robots.txt → Settings. The default is 'Respect robots.txt,' which is correct for most audits — you want to mirror what Googlebot can actually crawl.
If you need to crawl Disallow'd paths (e.g., to audit staging or a hidden admin section you own), switch to 'Ignore robots.txt.' Never do this on a site you don't control — it's a fast path to getting your IP banned.
Configuration → User-Agent. The default is 'Screaming Frog SEO Spider.' Switch to 'Googlebot (Smartphone)' for the most realistic SEO audit — many sites serve different HTML based on user-agent, and the SF default can show you something Googlebot would never see.
If your firewall blocks Googlebot UA from non-Google IPs, stay on the SF default and allowlist 'Screaming Frog SEO Spider' in your WAF instead.
Step 5
Paste your canonical start URL (for example, https://example.com) into the URL bar → Start. Watch the URLs Discovered counter and the Status tab for early errors.
Paste the full URL of your site, including https://, into the input at the top. Use the host you actually canonicalize (https://www.example.com or https://example.com, not both).
Click Start. The URLs Discovered counter should climb in the first 10 seconds. If it stays at 1, your seed URL returned a non-200 status or got blocked at the firewall.
Watch the Status tab on the right rail. Status code 200 is good; 3xx redirects are normal at up to 5-10% of URLs; 4xx and 5xx responses above 2% mean something is misconfigured (robots.txt, WAF, or a genuinely broken site).
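If you export the status codes seen so far, the thresholds above can be checked with a short script. This is a rough sketch: the function names are hypothetical, the input is assumed to be a plain list of integer status codes, and the 2% error limit is assumed to apply to 4xx and 5xx combined.

```python
from collections import Counter


def status_class_shares(status_codes):
    """Return the share of responses per status class, e.g. {'2xx': 0.9}."""
    total = len(status_codes)
    if total == 0:
        return {}
    classes = Counter(f"{code // 100}xx" for code in status_codes)
    return {cls: count / total for cls, count in classes.items()}


def early_warnings(status_codes, redirect_limit=0.10, error_limit=0.02):
    """Flag redirect or error shares that exceed the tutorial's thresholds."""
    shares = status_class_shares(status_codes)
    warnings = []
    if shares.get("3xx", 0.0) > redirect_limit:
        warnings.append("redirect share is above the normal 5-10% range")
    # Assumption: the 2% limit covers 4xx and 5xx responses together.
    error_share = shares.get("4xx", 0.0) + shares.get("5xx", 0.0)
    if error_share > error_limit:
        warnings.append("4xx/5xx share is above 2%: check robots.txt, WAF, or the site")
    return warnings


if __name__ == "__main__":
    sample = [200] * 90 + [301] * 5 + [404] * 5  # made-up sample, not real crawl data
    print(early_warnings(sample))
```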
For a 5K-URL site, expect 15-45 minutes. For 50K URLs, plan on 2-6 hours. For 500K+, run overnight in Database Storage mode and check progress in the morning.
Step 6
Open Google Search Console → Pages → Indexed count. Compare against Screaming Frog's "Internal HTML → 200 OK" count. The two counts should be within 25% of each other.
Once the crawl is complete, open the Internal tab → filter 'HTML' → look at the row count. This is your crawled-and-indexable URL universe.
Open Google Search Console → Pages report → note the 'Indexed' count.
Compare. If GSC says 4,800 indexed and Screaming Frog says 4,950 crawled, you're aligned. If GSC says 4,800 indexed and SF says 18,000 crawled, your crawl scope is too broad — most likely you're following parameter URLs or staging.
If SF crawled fewer URLs than GSC has indexed, you're missing orphan pages. Use the Sitemap source (Configuration → Spider → Crawl → 'Crawl Linked XML Sitemaps') to pick up URLs your internal linking doesn't reach.
Don't act on the audit until the two systems reconcile. Acting on a misaligned crawl is the most expensive mistake in technical SEO — you'll fix issues that don't matter and miss the ones that do.
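The reconciliation check in this step comes down to one ratio. The sketch below assumes the 25% tolerance is measured relative to the GSC indexed count; the function name and the returned labels are hypothetical.

```python
def reconcile_counts(gsc_indexed: int, sf_crawled: int, tolerance: float = 0.25) -> str:
    """Compare the GSC indexed count with the Screaming Frog crawled count."""
    if gsc_indexed <= 0:
        raise ValueError("gsc_indexed must be a positive count")
    # Relative gap, measured against the GSC figure (an assumption of this sketch).
    gap = (sf_crawled - gsc_indexed) / gsc_indexed
    if abs(gap) <= tolerance:
        return "aligned"
    if gap > 0:
        return "crawl scope too broad: check parameter URLs and staging hosts"
    return "crawl missing pages: check orphan pages and XML sitemaps"


if __name__ == "__main__":
    print(reconcile_counts(4_800, 4_950))
    print(reconcile_counts(4_800, 18_000))
```

With the figures from the text, 4,800 against 4,950 is a gap of about 3% and counts as aligned, while 4,800 against 18,000 falls far outside the tolerance.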
Common mistakes
Running with the default 2 GB heap on a 50K-URL site
What goes wrong: The crawl silently stalls at around 35K URLs. The progress counter still climbs slowly, but new URLs stop processing correctly. Output is incomplete and you don't realize until you compare against GSC. Estimated cost: 8-15 hours of redo work plus 2 days of slipped audit timeline.
How to avoid: Before any crawl over 20K URLs, raise the heap in Configuration → System → Memory Allocation to at least 50% of physical RAM. Restart the app. Re-run.
Not switching to Database Storage on large sites
What goes wrong: Memory mode crashes around 250K URLs even with 16 GB allocated. You lose the entire crawl mid-way and have to start over. On a site you bill against, this can mean a missed deadline costing $1,500-3,000 in agency reputation damage.
How to avoid: Configuration → System → Storage Mode → Database Storage. Pick a dedicated SSD path with at least 50 GB free. Restart. Database mode handles multi-million-URL crawls reliably.
Leaving 'Respect robots.txt' off and getting blocked at the WAF
What goes wrong: You configure SF to ignore robots.txt 'just to be thorough.' Cloudflare flags the crawl as malicious bot traffic and IP-bans your office. The entire team loses access to the site for 24-48 hours.
How to avoid: Respect robots.txt by default. If you need to crawl Disallow'd paths, allowlist your SF source IP in Cloudflare/WAF first AND coordinate with the dev team so they know the crawl is authorized.
Crawling with the default Screaming Frog user-agent on a Googlebot-cloaked site
What goes wrong: Some sites (especially older WordPress installs with caching plugins) serve different HTML to bots vs. browsers. Crawling as 'Screaming Frog SEO Spider' shows you the user version; Googlebot sees a different page. You audit content that Googlebot never sees and ship fixes that don't move rankings. Cost: 1-3 months of wasted SEO effort plus the opportunity cost on whatever real issue went unfixed.
How to avoid: Switch User-Agent to 'Googlebot (Smartphone)' in Configuration → User-Agent. Re-crawl. Compare the two reports — if they differ meaningfully, you have a cloaking or caching issue worth investigating.
Crawling parameter-heavy URLs without exclusion rules
What goes wrong: On an ecommerce site, faceted nav creates millions of /?color=red&size=md&sort=price-asc URLs. SF crawls all of them. Your 'audit' returns 800K rows and 90% are duplicate variants. Hours wasted, real issues drowned in noise. Lost revenue from delayed fix: $5,000-15,000 in unfixed canonical issues.
How to avoid: Configuration → URL Rewriting → Remove Parameters (regex). Strip common parameters like utm_*, sort, filter, color, size before they hit the crawl. Re-crawl and confirm URL count drops to a sane number.
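To estimate how much an exclusion list would shrink a crawl before you re-run it, you can apply the same stripping to an exported URL list. This is a sketch using only the Python standard library, not Screaming Frog's own rewriting logic; the pattern list mirrors the parameters named above, and the wildcard matching on utm_* is an assumption of this example.

```python
from fnmatch import fnmatchcase
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Parameter names to drop; "utm_*" is matched as a shell-style wildcard here.
STRIP_PATTERNS = ["utm_*", "sort", "filter", "color", "size"]


def strip_parameters(url: str, patterns=STRIP_PATTERNS) -> str:
    """Return the URL with every query parameter matching a pattern removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not any(fnmatchcase(key, pattern) for pattern in patterns)
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


def count_unique_after_stripping(urls) -> int:
    """Count how many distinct URLs remain once parameters are stripped."""
    return len({strip_parameters(url) for url in urls})


if __name__ == "__main__":
    print(strip_parameters("https://example.com/shoes?color=red&size=md&sort=price-asc&page=2"))
```

Parameters not on the list, such as page in the example, are kept, so paginated URLs still count as distinct pages.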
Treating the issue count as a goal
What goes wrong: The Overview tab shows '8,247 issues.' You assign someone to 'fix all of them.' They work for three weeks on missing alt text and short meta descriptions while broken canonicals on 200 product pages cost you $40K/quarter in lost revenue.
How to avoid: Sort by URLs affected, not by issue type. The top 10 issues by URL count almost always represent 80% of the SEO impact. Ignore the headline number.
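Ranking by URLs affected is a one-line sort once the issue counts are exported. The sketch below assumes a simple mapping of issue name to affected-URL count; the function name, the issue names, and the numbers in the usage example are illustrative, not real crawl output.

```python
def rank_issues(issue_counts: dict, top_n: int = 10) -> list:
    """Return the top_n issues ordered by number of URLs affected, largest first."""
    ranked = sorted(issue_counts.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_n]


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    sample = {
        "Missing alt text": 1200,
        "Canonical points to non-indexable URL": 200,
        "Meta description too short": 640,
    }
    for name, urls_affected in rank_issues(sample, top_n=2):
        print(f"{urls_affected:>6}  {name}")
```

A raw URL count does not capture revenue impact, so treat the ranked list as a starting point and check which page types each issue touches.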
Hand it off
Configuring Screaming Frog once is a project. Running it monthly, triaging the issues, and shipping the fixes is a job. A vetted technical SEO specialist on EverestX will own the crawl schedule, prioritize fixes by revenue impact, and hand you a weekly report — typically $400-900/mo at $14-16/hr depending on site size.
Free tier caps at 500 URLs per crawl, no JS rendering, no scheduled crawls, no API access. Fine for a single landing page audit. For any site over 500 URLs or any real audit work, the £259/yr licence pays for itself the first time you run a JS-rendered crawl or hit the URL cap mid-audit.
8 GB minimum machine RAM (4 GB allocated heap) for sites up to 50K URLs. 16 GB machine (8-12 GB heap) for 50K-200K URLs. Above 200K URLs, switch to Database Storage mode and 16 GB+ machine RAM. Above 1M URLs, plan on a dedicated 32 GB workstation.
Memory Storage is faster but capped by RAM. Use it for sites under 100K URLs. Database Storage is slightly slower but scales to multi-million URL sites without crashing. Use it for ecommerce sites, news sites, or anything over 100K URLs.
If Screaming Frog crawls fewer URLs than GSC has indexed, there are three usual reasons: (1) Your internal linking doesn't reach orphan pages; enable 'Crawl Linked XML Sitemaps' under Configuration → Spider → Crawl. (2) JavaScript-rendered links aren't being followed; enable JS rendering in Configuration → Spider → Rendering. (3) Your robots.txt blocks paths that are still indexed (orphan-indexed); diagnose via GSC's Page Indexing report.
How often to crawl depends on how often the site changes. Active sites with weekly deploys: monthly full crawl + weekly partial crawls of changed sections. Static brochure sites: quarterly. After any major release: full crawl within 48 hours and compare against the previous baseline.
Technically, you can crawl a site you don't own; Screaming Frog doesn't require ownership. But respect the site's robots.txt, crawl politely (Configuration → Speed → 2 threads, 5 URLs/sec), and use a clearly identifiable user-agent so you're not mistaken for malicious traffic. Some sites will rate-limit or block you regardless.