
How to use Custom Extraction in Screaming Frog

Custom Extraction is the most under-used Screaming Frog feature and the closest thing to a superpower it offers. Pull prices, schema fields, review counts, hreflang variants — anything on the page, at crawl scale.

~2 hr · Advanced · Updated May 26, 2026

Who this is for

Technical SEOs and ecom marketers who need page-level data Screaming Frog doesn't extract by default. If you've ever opened 200 product pages by hand to copy prices into a spreadsheet, this is the answer.

What you'll need

Screaming Frog SEO Spider 21+ with a valid licence (Custom Extraction is gated above the free tier)
Familiarity with at least one of: XPath, CSS selectors, or regex
Browser DevTools to inspect element structure on the pages you want to extract from
About 1-2 hours for first-time setup of 3-5 extractors

Step 1

Open the Custom Extraction configuration

Configuration → Custom → Custom Extraction. Add up to 100 extractors (Spider edition) or unlimited (Cluster).

Configuration → Custom → Custom Extraction. This is where you define every selector you want SF to extract on every crawled URL.

The Spider licence supports up to 100 custom extractors per crawl. Cluster/Enterprise removes the cap.

Each extractor has: a Name (column header in your CSV), an Extractor type (CSSPath, XPath, or Regex), the Selector itself, and an Extract dropdown (Extract HTML, Extract Inner HTML, Extract Text, Function Value, Attribute Value).

Add extractors one at a time. Test each in the sample URL test pane before running the full crawl.

Step 2

Pick the right extractor type for your use case

CSSPath = simplest for visible elements. XPath = most powerful for nested or attribute-based selection. Regex = pattern matching in raw HTML.

CSSPath: use for straightforward element selection. Example: `.product-price` extracts every element with class product-price. Easy to read, fragile to class renames.

XPath: use for complex queries — selecting by attribute, by position, by parent/child relationships. Example: `//meta[@property='og:image']/@content` extracts the OG image URL from every page.

Regex: use for patterns that don't map to DOM structure — e.g., extracting a SKU pattern that appears in body text. Example: `SKU-\d{6}` matches SKU-123456.

When in doubt, start with CSSPath. Switch to XPath when you need attribute selection or parent traversal. Use regex only when DOM selection won't work.
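A regex is easy to sanity-check outside Screaming Frog before you commit it to a crawl. The sketch below uses Python's `re` module purely as an illustration (Screaming Frog runs its own regex engine, so treat this as a rough check, not a guarantee of identical behaviour). The sample text is made up.

```python
import re

# Made-up body text containing one valid SKU and two near-misses.
text = "Ships today. SKU-123456 replaces SKU-98765 and SKU-6543210."

# The pattern from the example above: SKU- followed by six digits.
loose = re.findall(r"SKU-\d{6}", text)

# Adding a word boundary rejects longer digit runs such as SKU-6543210.
strict = re.findall(r"SKU-\d{6}\b", text)

print(loose)   # ['SKU-123456', 'SKU-654321']
print(strict)  # ['SKU-123456']
```

The loose pattern quietly truncates a seven-digit code to six digits. If your SKUs have a fixed length, anchoring the end of the pattern is a cheap safeguard.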

Step 3

Build your selector with browser DevTools first

Right-click element → Inspect → right-click in Elements panel → Copy → Copy selector / Copy XPath. Test the selector in DevTools before pasting into SF.

Open a representative page in Chrome. Right-click the element you want to extract → Inspect.

In the Elements panel, right-click the highlighted HTML → Copy → Copy selector (for CSS) or Copy XPath (for XPath).

Test the selector in DevTools' Console: `document.querySelectorAll('.product-price')` for CSS, or `$x("//meta[@property='og:image']/@content")` for XPath. The result should be exactly the elements you want.

If DevTools returns 0 elements, your selector is wrong, so refine it. If it returns 50 elements when you wanted 1, narrow it with `:first-child`, `[data-attr='unique']`, or a more specific path.

Once the selector works in DevTools, paste it into Screaming Frog's extractor. Use the 'Test' button in SF to validate against a sample URL.

Step 4

Choose the right Extract type

Extract Text for visible content. Extract HTML for the full element. Function Value for count(). Attribute Value for href, src, content.

Extract Text: returns the visible text content of the matched element. Use for product prices, headings, button labels.

Extract HTML: returns the whole matched element, including its own tag and any child tags. Extract Inner HTML returns only what sits inside the element, which is what you want for content such as schema JSON-LD.

Function Value: lets you use XPath functions like `count(//a)` to count elements, or `concat()` to combine values. Powerful but advanced.

Attribute Value: pulls a specific attribute. Use with selectors like `//meta[@property='og:image']` and extract type 'Attribute Value' with 'content' as the attribute. Most common use: meta tag content extraction.

Wrong Extract type is the #1 reason extractors return blank columns. If your selector finds elements in DevTools but the SF column is empty, change the Extract type.
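The difference between the Extract types is easier to see on a concrete element. The following is a minimal sketch using Python's standard `xml.etree.ElementTree` on a hypothetical, well-formed snippet. It only approximates what each option returns and is not how Screaming Frog implements extraction.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed snippet standing in for a crawled page.
PAGE = (
    '<html><head>'
    '<meta property="og:image" content="https://example.com/img.jpg" />'
    '</head><body>'
    '<div class="product-price"><span>$</span>19.99</div>'
    '</body></html>'
)

root = ET.fromstring(PAGE)
price = root.find(".//div[@class='product-price']")
meta = root.find(".//meta[@property='og:image']")

# Roughly "Extract Text": the text only, with tags dropped.
text_value = "".join(price.itertext())

# Roughly "Extract Inner HTML": the contents without the element's own tag.
inner_html = (price.text or "") + "".join(
    ET.tostring(child, encoding="unicode") for child in price
)

# Roughly "Extract HTML": the element including its own tag.
outer_html = ET.tostring(price, encoding="unicode")

# A <meta> tag has no text, so text extraction is empty;
# the value lives in the content attribute.
meta_text = "".join(meta.itertext())
meta_content = meta.get("content")

print(repr(text_value))    # '$19.99'
print(repr(inner_html))    # '<span>$</span>19.99'
print(repr(meta_text))     # ''
print(repr(meta_content))  # 'https://example.com/img.jpg'
```

The empty `meta_text` is the blank-column case described above: the selector matched, but the chosen extraction had nothing to return.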

Step 5

Validate extractors with the test pane before crawling

In the Custom Extraction config, paste a sample URL into the test field. Hit Test. Each extractor should return real data.

Configuration → Custom → Custom Extraction. At the bottom is a URL input and a Test button.

Paste a representative URL. Click Test. SF fetches the URL and runs every configured extractor against it.

Each extractor should return data in the right-hand pane. Blank = selector wrong OR Extract type wrong OR element doesn't exist on this URL.

Test 3-5 representative URLs (e.g., a product page, a category page, an article) to catch extractors that work on some templates but not others.

Don't launch a 50K-URL crawl with untested extractors. Blank columns across thousands of URLs are recoverable only if you catch them before anyone acts on the data.

Step 6

Run the crawl and export the extracted columns

After crawl, Internal tab → scroll right to find your custom extractor columns. Export → CSV.

Run a full crawl with the configured extractors. They run in the same pass as the standard extraction, so no second crawl is needed. A large set of extractors still adds processing time per URL.

After completion, open the Internal tab. Scroll right past the standard columns (Title 1, Meta Description, H1, etc.) — your custom extractors appear as separate columns named after their config.

Filter the URLs to the template you care about (e.g., URL contains '/product/'). The extractor columns should show data on those URLs.

Export → CSV. Open in Sheets/Excel. The custom columns are now joined to URL, status, and other crawl data in one analyzable file.

Common use cases unlocked: price tracking across product catalog, schema JSON-LD field validation, hreflang verification at scale, review-count extraction for snippet eligibility, OG image verification, internal link audit by anchor.
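Once the CSV is exported, a quick script can confirm that an extractor actually fired on the template you care about. This is a sketch under assumptions: the column names (`Address`, `Price 1`) and the sample rows are hypothetical, so match them to the headers in your own export.

```python
import csv
import io

# Hypothetical export; real column names depend on your extractor names.
SAMPLE_EXPORT = """Address,Status Code,Price 1
https://example.com/product/a,200,$19.99
https://example.com/product/b,200,
https://example.com/category/shoes,200,
"""

def blank_rate(rows, column, url_contains):
    """Share of matching URLs where the extractor column is empty."""
    matching = [r for r in rows if url_contains in r["Address"]]
    if not matching:
        return None  # no URLs of this template in the export
    blanks = sum(1 for r in matching if not r[column].strip())
    return blanks / len(matching)

rows = list(csv.DictReader(io.StringIO(SAMPLE_EXPORT)))
product_blank_rate = blank_rate(rows, "Price 1", "/product/")
print(f"Blank price on product pages: {product_blank_rate:.0%}")
```

For a real export, replace `io.StringIO(SAMPLE_EXPORT)` with an opened file. A high blank rate on the template the extractor was written for is a signal to re-test the selector before using the data.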

Step 7

Common extractor recipes you should keep on hand

Schema JSON-LD: `//script[@type="application/ld+json"]`. OG image: `//meta[@property="og:image"]/@content`. Price: site-specific CSS.

JSON-LD schema: XPath `//script[@type="application/ld+json"]` with Extract Inner HTML. Returns the full schema block per page — paste into Google's Rich Results Test for validation.

OG image URL: XPath `//meta[@property="og:image"]` with Extract Attribute Value, attribute name `content`.

Hreflang variants: XPath `//link[@rel="alternate"][@hreflang]` with Extract HTML. Returns all hreflang links per page.

Canonical URL: XPath `//link[@rel="canonical"]` with Extract Attribute Value, attribute `href`. (SF extracts canonical by default, but this is useful for custom analysis.)

Product price: site-specific. Inspect, find the price class/element. Example for many ecom sites: CSSPath `.product-price` with Extract Text.

Review count: site-specific. Example: CSSPath `.review-count` with Extract Text, then post-process the CSV to strip 'reviews' suffix.

Author byline: CSSPath `.byline-author` with Extract Text. Useful for content audits.
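The review-count recipe above leaves cleanup for after the export. One way to do that post-processing is sketched here in Python. The input strings are invented examples of what a `.review-count` element might contain, and the function name is hypothetical.

```python
import re

def parse_review_count(raw):
    """Pull the first integer out of text like '1,204 reviews'."""
    match = re.search(r"\d[\d,]*", raw)
    if match is None:
        return None  # blank cell or no number present
    return int(match.group().replace(",", ""))

# Invented examples of extracted cell values.
samples = ["1,204 reviews", "(37)", "Be the first to review", ""]
counts = [parse_review_count(s) for s in samples]
print(counts)  # [1204, 37, None, None]
```

Returning `None` for cells without a number keeps "no reviews found" distinct from a genuine count of zero.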

Common mistakes

What goes wrong (and how to avoid it)

Using DevTools' Copy Selector verbatim
What goes wrong: The auto-generated selector includes nth-child paths like `body > div:nth-child(2) > section > article`. Any layout change breaks it. Your extractor returns blanks for a week before you notice. Cost: re-running the crawl and re-extracting can cost 4-8 hours of cycle time per audit.
How to avoid: Use the auto-generated selector as a starting point, then simplify to class- or data-attribute-based selection. Talk to the dev team about which classes are stable.
Not validating selectors against multiple templates
What goes wrong: Your product-price selector works on /product/* URLs but returns empty on /category/*. The crawl 'completes' but 60% of your data is missing. You make decisions on partial data. Pricing analysis off by 40-60%.
How to avoid: Test extractors against 3-5 URLs spanning different templates (product, category, blog, home). If the selector should only fire on one template, that's fine — verify it does.
Wrong Extract type returning blanks
What goes wrong: Selector is correct (finds elements in DevTools) but Extract type is wrong. SF returns empty columns. You assume the selector is broken and spend hours re-writing it.
How to avoid: When SF columns are empty but DevTools shows the elements, change the Extract type. Common mistake: using 'Extract Text' on a `<meta>` tag (which has no text content) instead of 'Extract Attribute Value' with the attribute `content`.
Extracting too much per page
What goes wrong: 100 extractors each pull 50 elements per page. The crawl slows by 3x, memory use balloons, and CSV exports become 200 MB files that crash Excel. Audit cycles slow from days to weeks.
How to avoid: Limit to extractors you'll actually analyze. 5-15 well-chosen extractors > 50 'nice to have' ones. Add more in subsequent crawls as needs emerge.
Regex extractors with greedy patterns
What goes wrong: A regex like `<div.*</div>` is greedy and matches from the first `<div>` to the LAST `</div>` it can reach. Returns 100KB of HTML per URL. The CSV file balloons; analysis becomes impossible.
How to avoid: Use a non-greedy regex: `<div.*?</div>` stops at the first `</div>` it reaches. That still cuts nested `<div>` elements short, so the better fix is not to use regex for HTML parsing at all. Use XPath, which understands DOM structure.
Not exporting and analyzing within 48 hours of the crawl
What goes wrong: The crawl data is fresh but no one analyzes it. By the time someone opens the CSV, the site has shipped 5 changes and the data is stale. Custom extraction loses its value if it isn't acted on quickly.
How to avoid: Schedule analysis sessions for the day after each crawl completes. Custom Extraction is leverage; it only pays off when the data drives decisions within the data's lifespan.
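The greedy-regex mistake above is easy to reproduce. The sketch below uses Python's `re` module on a made-up, single-line HTML fragment. Screaming Frog uses its own regex engine, so this illustrates greedy versus non-greedy matching in general and is not a test of SF itself.

```python
import re

# Made-up fragment with two separate <div> elements.
html = '<div class="a">one</div><p>gap</p><div class="b">two</div>'

# Greedy: .* runs to the last </div>, swallowing everything in between.
greedy = re.findall(r"<div.*</div>", html)

# Non-greedy: .*? stops at the first </div> after each <div.
lazy = re.findall(r"<div.*?</div>", html)

print(len(greedy))  # 1
print(lazy)         # ['<div class="a">one</div>', '<div class="b">two</div>']
```

The greedy version returns the whole fragment as a single match, which is the ballooning output described above scaled down to one line.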

Recap

What to take away

Configuration → Custom → Custom Extraction. Up to 100 extractors per crawl on Spider tier.
CSSPath for simple cases, XPath for attribute/parent selection, regex only when DOM selection fails.
Build selectors in DevTools first. Verify they return the right element count before pasting into SF.
Wrong Extract type = blank columns. Use Attribute Value for `<meta>` and `<link>` tags.
Always validate against 3-5 sample URLs before launching the full crawl.


Frequently Asked Questions

Do I need to know XPath, or is CSS Selector enough?

CSS gets you 80% of common use cases. XPath gets you the remaining 20% — attribute selection, parent/child traversal, position-based selection. Most extractor libraries are a mix. Start with CSS; learn XPath as you hit its limits.

Can I extract structured data (JSON-LD) from every page?

Yes. XPath `//script[@type="application/ld+json"]` with Extract Inner HTML returns the full schema block. Post-process in Sheets or paste into Google's Rich Results Test to validate. This is the fastest way to audit schema coverage at scale.
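For post-processing outside Sheets, the extracted block can be parsed as JSON and checked for the fields you expect. A minimal sketch follows, assuming one JSON object per cell. The sample value and the required-field checklist are hypothetical, and pages with several JSON-LD blocks or an `@graph` wrapper would need extra handling.

```python
import json

# Hypothetical cell value from a JSON-LD extractor column.
EXTRACTED = """
{"@context": "https://schema.org", "@type": "Product",
 "name": "Trail Shoe",
 "offers": {"@type": "Offer", "price": "89.00", "priceCurrency": "USD"}}
"""

# Assumed checklist for this sketch; adjust to the schema types you audit.
REQUIRED_FIELDS = ("name", "offers", "aggregateRating")

def missing_fields(raw, required=REQUIRED_FIELDS):
    """Return the required top-level keys absent from a JSON-LD block."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return ["<invalid JSON>"]
    if not isinstance(data, dict):
        return ["<not a single JSON object>"]
    return [field for field in required if field not in data]

print(missing_fields(EXTRACTED))  # ['aggregateRating']
```

Running this over every row of the export gives a per-URL list of gaps. It complements the Rich Results Test and does not replace it, since it checks only for the presence of keys.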

How many extractors can I run at once?

Spider tier: 100 per crawl. Cluster/Enterprise: unlimited. In practice, 10-20 is the sweet spot. Beyond that, the crawl slows noticeably and the CSV becomes unwieldy.

Why are my extractor columns empty when DevTools shows the elements?

Three usual reasons: (1) Extract type is wrong — try Attribute Value for `<meta>` and `<link>` tags; (2) The element is JS-rendered and you don't have JS rendering enabled in SF; (3) The element exists only on some templates and you're looking at the wrong template's row.

Can Custom Extraction handle authenticated pages?

Yes — Configuration → Authentication → Forms-Based Authentication. Log SF in to your site, then run the crawl with extractors against logged-in URLs. Useful for auditing member-only sections.

Related tutorials

How to set up Screaming Frog and run your first crawl

How to configure JavaScript rendering in Screaming Frog

How to generate an XML sitemap with Screaming Frog

How to run Ahrefs Batch Analysis on 200 URLs (and prioritize them)

When to hire a technical SEO specialist — the honest checklist