Custom Extraction is the most under-used Screaming Frog feature and the closest thing to a superpower it offers. Pull prices, schema fields, review counts, hreflang variants — anything on the page, at crawl scale.
Who this is for: Technical SEOs and ecom marketers who need page-level data Screaming Frog doesn't extract by default. If you've ever opened 200 product pages by hand to copy prices into a spreadsheet, this is the answer.
What you'll need
Step 1
Configuration → Custom → Custom Extraction. Add up to 100 extractors (Spider edition) or unlimited (Cluster).
Configuration → Custom → Custom Extraction. This is where you define every selector you want SF to extract on every crawled URL.
The Spider licence supports up to 100 custom extractors per crawl. Cluster/Enterprise removes the cap.
Each extractor has: a Name (column header in your CSV), an Extractor type (CSSPath, XPath, or Regex), the Selector itself, and an Extract dropdown (Extract HTML, Extract Inner HTML, Extract Text, Function Value, Attribute Value).
Add extractors one at a time. Test each in the sample URL test pane before running the full crawl.
Step 2
CSSPath = simplest for visible elements. XPath = most powerful for nested or attribute-based selection. Regex = pattern matching in raw HTML.
CSSPath: use for straightforward element selection. Example: `.product-price` extracts every element with class product-price. Easy to read, fragile to class renames.
XPath: use for complex queries — selecting by attribute, by position, by parent/child relationships. Example: `//meta[@property='og:image']/@content` extracts the OG image URL from every page.
Regex: use for patterns that don't map to DOM structure — e.g., extracting a SKU pattern that appears in body text. Example: `SKU-\d{6}` matches SKU-123456.
When in doubt, start with CSSPath. Switch to XPath when you need attribute selection or parent traversal. Use regex only when DOM selection won't work.
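Before committing a regex extractor, it can help to try the pattern against a snippet of raw HTML outside Screaming Frog. The sketch below uses Python's `re` module with the `SKU-\d{6}` pattern from above; the HTML fragment and the `find_skus` helper are invented for illustration, and Python's regex flavour may differ from SF's in edge cases.

```python
import re

# Same pattern as the example above: the literal "SKU-" followed by six digits.
SKU_PATTERN = re.compile(r"SKU-\d{6}")


def find_skus(raw_html: str) -> list[str]:
    """Return every SKU-like token found in the raw HTML, in document order."""
    return SKU_PATTERN.findall(raw_html)


# Hypothetical page fragment, invented for this example.
sample_html = '<p>Item SKU-123456 ships today.</p><span data-sku="SKU-654321"></span>'
skus = find_skus(sample_html)
print(skus)
```

Note that the pattern also matches the first six digits of a longer number such as `SKU-1234567`, so tighten it if your SKUs can vary in length.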
Step 3
Right-click element → Inspect → right-click in Elements panel → Copy → Copy selector / Copy XPath. Test the selector in DevTools before pasting into SF.
Open a representative page in Chrome. Right-click the element you want to extract → Inspect.
In the Elements panel, right-click the highlighted HTML → Copy → Copy selector (for CSS) or Copy XPath (for XPath).
Test the selector in DevTools' Console: `document.querySelectorAll('.product-price')` for CSS, or `$x("//meta[@property='og:image']/@content")` for XPath. The result should be exactly the elements you want.
If DevTools returns 0 elements, your selector is wrong. Refine it. If it returns 50 elements when you wanted 1, narrow it with `:first-child`, `[data-attr='unique']`, or a more specific path.
Once the selector works in DevTools, paste it into Screaming Frog's extractor. Use the 'Test' button in SF to validate against a sample URL.
Step 4
Extract Text for visible content. Extract HTML for the full element. Function Value for count(). Attribute Value for href, src, content.
Extract Text: returns the visible text content of the matched element. Use for product prices, headings, button labels.
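Extract HTML: returns the full matched element, including its own tag and any child tags. Extract Inner HTML returns only what sits inside the element. Use either when you need formatted or structured content (e.g., schema JSON-LD).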
Function Value: lets you use XPath functions like `count(//a)` to count elements, or `concat()` to combine values. Powerful but advanced.
Attribute Value: pulls a specific attribute. Use with selectors like `//meta[@property='og:image']` and extract type 'Attribute Value' with 'content' as the attribute. Most common use: meta tag content extraction.
Wrong Extract type is the #1 reason extractors return blank columns. If your selector finds elements in DevTools but the SF column is empty, change the Extract type.
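To see why the Extract type matters, here is a rough Python analogue of three of the options, using the standard library's `xml.etree.ElementTree`. It is a sketch under assumptions: the page fragment is hypothetical and deliberately well-formed (ElementTree needs valid XML, which real HTML often is not), and this is not how Screaming Frog works internally.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed page fragment invented for this example.
SAMPLE = """
<html>
  <head>
    <meta property="og:image" content="https://example.com/hero.png"/>
  </head>
  <body>
    <span class="product-price">19.99</span>
    <a href="/a">A</a>
    <a href="/b">B</a>
  </body>
</html>
""".strip()

root = ET.fromstring(SAMPLE)

meta = root.find(".//meta[@property='og:image']")
price = root.find(".//span[@class='product-price']")

# Analogue of Extract Text: works for the span, empty for the meta tag.
price_text = price.text
meta_text = meta.text  # None, because <meta> has no text content

# Analogue of Attribute Value with attribute "content".
meta_content = meta.get("content")

# Analogue of Function Value with count(//a).
link_count = len(root.findall(".//a"))

print(price_text, meta_text, meta_content, link_count)
```

The `meta_text` result is the blank-column case: the selector matched, but there was no text to return.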
Step 5
In the Custom Extraction config, paste a sample URL into the test field. Hit Test. Each extractor should return real data.
Configuration → Custom → Custom Extraction. At the bottom is a URL input and a Test button.
Paste a representative URL. Click Test. SF fetches the URL and runs every configured extractor against it.
Each extractor should return data in the right-hand pane. Blank = selector wrong OR Extract type wrong OR element doesn't exist on this URL.
Test 3-5 representative URLs (e.g., a product page, a category page, an article) to catch extractors that work on some templates but not others.
Don't launch a 50K-URL crawl with untested extractors. Blank columns across thousands of URLs are a recoverable problem only if you catch them before acting on the data.
Step 6
After crawl, Internal tab → scroll right to find your custom extractor columns. Export → CSV.
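Run a full crawl with the configured extractors. They run in the same pass as the standard extraction, so no separate crawl is needed. Per-URL overhead is small for a handful of extractors but grows with the number of extractors and matches.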
After completion, open the Internal tab. Scroll right past the standard columns (Title 1, Meta Description, H1, etc.) — your custom extractors appear as separate columns named after their config.
Filter the URLs to the template you care about (e.g., URL contains '/product/'). The extractor columns should show data on those URLs.
Export → CSV. Open in Sheets/Excel. The custom columns are now joined to URL, status, and other crawl data in one analyzable file.
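Once the CSV is exported, a quick fill-rate check per template shows whether an extractor fired where it should. The sketch below assumes the URL column is called `Address` and the extractor column `Product Price 1`; both names, the sample rows, and the `fill_rate_by_template` helper are assumptions for illustration, so check them against your own export headers.

```python
import csv
import io
from collections import defaultdict
from urllib.parse import urlsplit


def fill_rate_by_template(csv_text: str, url_col: str, extractor_col: str) -> dict[str, float]:
    """Share of rows with a non-blank extractor value, grouped by first URL path segment."""
    filled = defaultdict(int)
    total = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        # "https://example.com/product/x" -> "product"; the root URL -> "(home)"
        parts = [p for p in urlsplit(row[url_col]).path.split("/") if p]
        segment = parts[0] if parts else "(home)"
        total[segment] += 1
        if (row.get(extractor_col) or "").strip():
            filled[segment] += 1
    return {segment: filled[segment] / total[segment] for segment in total}


# Hypothetical export rows; column names are assumed, not taken from a real crawl.
SAMPLE_CSV = """Address,Status Code,Product Price 1
https://example.com/product/red-shoe,200,$49.00
https://example.com/product/blue-shoe,200,
https://example.com/category/shoes,200,
https://example.com/,200,
"""

rates = fill_rate_by_template(SAMPLE_CSV, "Address", "Product Price 1")
print(rates)
```

A low rate on the template the extractor targets (here, `product`) is the signal to revisit the selector. Low rates on other templates are expected.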
Common use cases unlocked: price tracking across product catalog, schema JSON-LD field validation, hreflang verification at scale, review-count extraction for snippet eligibility, OG image verification, internal link audit by anchor.
Step 7
Schema JSON-LD: `//script[@type="application/ld+json"]`. OG image: `//meta[@property="og:image"]/@content`. Price: site-specific CSS.
JSON-LD schema: XPath `//script[@type="application/ld+json"]` with Extract Inner HTML. Returns the full schema block per page — paste into Google's Rich Results Test for validation.
OG image URL: XPath `//meta[@property="og:image"]` with Extract Attribute Value, attribute name `content`.
Hreflang variants: XPath `//link[@rel="alternate"][@hreflang]` with Extract HTML. Returns all hreflang links per page.
Canonical URL: XPath `//link[@rel="canonical"]` with Extract Attribute Value, attribute `href`. (SF extracts canonical by default, but this is useful for custom analysis.)
Product price: site-specific. Inspect, find the price class/element. Example for many ecom sites: CSSPath `.product-price` with Extract Text.
Review count: site-specific. Example: CSSPath `.review-count` with Extract Text, then post-process the CSV to strip 'reviews' suffix.
Author byline: CSSPath `.byline-author` with Extract Text. Useful for content audits.
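Two of the recipes above usually need post-processing after export: JSON-LD blocks and review counts. The following is a minimal sketch, assuming each CSV cell holds either one JSON-LD block or text like "1,204 reviews"; the helper names `schema_types` and `review_count` are hypothetical, and nested structures such as `@graph` are not handled.

```python
import json
import re
from typing import Optional


def schema_types(json_ld: str) -> list[str]:
    """Return the @type values declared at the top level of a JSON-LD block."""
    try:
        data = json.loads(json_ld)
    except json.JSONDecodeError:
        return []  # malformed block: worth flagging in the audit
    items = data if isinstance(data, list) else [data]
    types: list[str] = []
    for item in items:
        if not isinstance(item, dict):
            continue
        value = item.get("@type")
        if isinstance(value, list):
            types.extend(value)
        elif value:
            types.append(value)
    return types


def review_count(cell: str) -> Optional[int]:
    """Pull the first number out of text such as '1,204 reviews'."""
    match = re.search(r"\d[\d,]*", cell)
    return int(match.group().replace(",", "")) if match else None


# Hypothetical cell values, invented for this example.
print(schema_types('{"@context": "https://schema.org", "@type": "Product", "name": "Shoe"}'))
print(review_count("1,204 reviews"))
```

Returning an empty list for unparseable JSON keeps a batch run going while still making broken schema blocks easy to filter for.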
Common mistakes
Using DevTools' Copy Selector verbatim
What goes wrong: The auto-generated selector includes nth-child paths like `body > div:nth-child(2) > section > article`. Any layout change breaks it. Your extractor returns blanks for a week before you notice. Cost: re-running the crawl and re-extracting can take 4-8 hours of cycle time per audit.
How to avoid: Use the auto-generated selector as a starting point, then simplify to class- or data-attribute-based selection. Talk to the dev team about which classes are stable.
Not validating selectors against multiple templates
What goes wrong: Your product-price selector works on /product/* URLs but returns empty on /category/*. The crawl 'completes' but 60% of your data is missing. You make decisions on partial data. Pricing analysis off by 40-60%.
How to avoid: Test extractors against 3-5 URLs spanning different templates (product, category, blog, home). If the selector should only fire on one template, that's fine — verify it does.
Wrong Extract type returning blanks
What goes wrong: Selector is correct (finds elements in DevTools) but Extract type is wrong. SF returns empty columns. You assume the selector is broken and spend hours re-writing it.
How to avoid: When SF columns are empty but DevTools shows the elements, change the Extract type. Common mistake: using 'Extract Text' on a `<meta>` tag (which has no text content) instead of 'Extract Attribute Value' with the attribute `content`.
Extracting too much per page
What goes wrong: 100 extractors each pulling 50 elements per page. The crawl slows by 3x. Memory use balloons. CSV exports become 200-MB files that crash Excel. Audit cycles slow from days to weeks.
How to avoid: Limit to extractors you'll actually analyze. 5-15 well-chosen extractors > 50 'nice to have' ones. Add more in subsequent crawls as needs emerge.
Regex extractors with greedy patterns
What goes wrong: A regex like `<div.*</div>` is greedy: it matches from the first `<div>` to the LAST `</div>` it can reach on the page. Returns 100KB of HTML per URL. CSV file balloons; analysis becomes impossible.
How to avoid: Use a non-greedy quantifier: `<div.*?</div>` stops at the first `</div>` after each `<div`. It still cuts nested divs short, so the better fix is not to parse HTML with regex at all. Use XPath, which understands DOM structure.
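The difference is easy to reproduce outside SF. This sketch uses Python's `re` module on an invented one-line HTML string. Treat it as a demonstration of greedy versus non-greedy matching in general, not of SF's exact regex engine.

```python
import re

# Hypothetical single-line fragment, invented for this example.
html = "<div>first</div><p>between</p><div>second</div>"

# Greedy: runs from the first <div to the last </div> it can reach.
greedy = re.findall(r"<div.*</div>", html)

# Non-greedy: stops at the first </div> after each <div.
lazy = re.findall(r"<div.*?</div>", html)

# Nested divs still break the non-greedy version: the outer div is cut short.
nested = re.findall(r"<div.*?</div>", "<div><div>inner</div></div>")

print(greedy)
print(lazy)
print(nested)
```

The nested case returns a truncated element, which is the practical argument for DOM-aware selectors over regex.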
Not exporting and analyzing within 48 hours of the crawl
What goes wrong: The crawl data is fresh but no one analyzes it. By the time someone opens the CSV, the site has shipped 5 changes and the data is stale. Custom extraction loses its value if it isn't acted on quickly.
How to avoid: Schedule analysis sessions for the day after each crawl completes. Custom Extraction is leverage; it only pays off when the data drives decisions within the data's lifespan.
Recap
Hand it off
Custom Extraction is where Screaming Frog stops being a generic auditor and becomes a programmable data layer for your site. A vetted technical SEO specialist on EverestX will build the extractor library for your stack, document them, and own the recurring extraction → analysis cycle — typically $500-900/mo at $14-16/hr.
CSS gets you 80% of common use cases. XPath gets you the remaining 20% — attribute selection, parent/child traversal, position-based selection. Most extractor libraries are a mix. Start with CSS; learn XPath as you hit its limits.
Yes. XPath `//script[@type="application/ld+json"]` with Extract Inner HTML returns the full schema block. Post-process in Sheets or paste into Google's Rich Results Test to validate. This is the fastest way to audit schema coverage at scale.
Spider tier: 100 per crawl. Cluster/Enterprise: unlimited. In practice, 10-20 is the sweet spot; beyond that the crawl slows noticeably and the CSV becomes unwieldy.
Three usual reasons: (1) Extract type is wrong — try Attribute Value for `<meta>` and `<link>` tags; (2) The element is JS-rendered and you don't have JS rendering enabled in SF; (3) The element exists only on some templates and you're looking at the wrong template's row.
Yes — Configuration → Authentication → Forms-Based Authentication. Log SF in to your site, then run the crawl with extractors against logged-in URLs. Useful for auditing member-only sections.