Crazy Egg's A/B Test editor is the easiest on the market, and that's the problem. Ease tempts operators to ship tests on 100-visitor samples and call a 12% lift 'a win.' This setup prevents that and ships only tests that actually moved the needle.
Who this is for
Operators ready to graduate from qualitative CRO (Snapshots, Recordings) to A/B testing. You need at least 5,000 visits/month per page being tested and a clear hypothesis. Below that, qualitative CRO is more economical.
Step 1
A hypothesis is: 'We believe [change] will cause [outcome] because [reason]. We'll know we're right if [metric] moves by [amount] within [time].' Skip this and you'll ship tests for which there's no clear win condition.
Bad hypothesis: 'Test a new headline.' This produces a test with no clear winner — even if the new headline lifts CTR 3%, you can't tell if that's noise.
Good hypothesis: 'We believe changing the hero headline from <current> to <new> will lift hero CTA click-through rate by 15%+ because the new copy speaks to the top job-to-be-done from our user interviews. We'll declare a winner if click-through rate moves 15% with 95% confidence within 4 weeks.'
The hypothesis sets: (1) the variant, (2) the primary metric, (3) the win threshold (effect size you'd ship for), (4) the time bound, (5) the confidence threshold.
Without these, you'll either ship false positives (small lifts that disappear post-test) or kill tests too early because you can't tell what 'winning' looks like.
Pre-register the hypothesis in a doc. Date it. Don't change the win conditions after you've seen results — that's p-hacking and it's how 80% of self-run A/B tests produce phantom wins.
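One way to make the pre-registration hard to fudge is to record it as an immutable object. The sketch below is only an illustration: the `Hypothesis` class, its field names, and the registration date are hypothetical, and the values come from the example hypothesis above.

```python
from dataclasses import dataclass, FrozenInstanceError
from datetime import date


@dataclass(frozen=True)  # frozen: fields cannot be reassigned after creation
class Hypothesis:
    change: str               # (1) the variant
    primary_metric: str       # (2) the primary metric
    min_relative_lift: float  # (3) win threshold, e.g. 0.15 for 15%
    max_weeks: int            # (4) the time bound
    confidence: float         # (5) the confidence threshold
    reason: str
    registered_on: date


hero_test = Hypothesis(
    change="Hero headline: <current> -> <new>",
    primary_metric="hero CTA click-through rate",
    min_relative_lift=0.15,
    max_weeks=4,
    confidence=0.95,
    reason="New copy speaks to the top job-to-be-done from user interviews",
    registered_on=date(2026, 1, 5),  # hypothetical date
)

try:
    hero_test.min_relative_lift = 0.05  # moving the goalposts after seeing data
except FrozenInstanceError:
    print("Win conditions are locked; register a new hypothesis instead.")
```

A dated doc does the same job; the point is that the five win conditions are written down once and never edited.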
Step 2
Use an A/B test sample-size calculator (or Crazy Egg's built-in one). Plug in baseline conversion rate, minimum detectable effect, and confidence level. Most tests need 5,000-50,000 visits per variant.
Go to a calculator (Optimizely's, AB Testguide's, or whatever Crazy Egg's editor surfaces). Inputs: baseline conversion rate (e.g., 2%), minimum detectable effect (MDE — the smallest lift you'd ship for, e.g., 10%), statistical power (default 80%), confidence level (default 95%).
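Output: required sample size per variant. Example: 2% baseline + 10% relative MDE + 95% confidence + 80% power = roughly 80,000 visits per variant (standard two-sided two-proportion formula), or about 160,000 total to call a winner. Exact figures vary by calculator, but a low baseline combined with a small MDE always needs a large sample.
Compare against your traffic: if the page gets 10K visits/month, this test needs ~16 months at a 50/50 split, which is not practical. Raising the MDE to 25% cuts the requirement to roughly 14,000 per variant, or about 3 months. Decide if that's acceptable.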
If the math says the test will take 4+ months: either (a) target a larger MDE (only ship if it's a HUGE win), (b) test on higher-traffic pages, or (c) skip the test and ship based on Snapshot + Recording evidence instead.
Sample-size math is the gate that separates real A/B testing from theater. Skip it and 80% of your tests will be inconclusive after running for a month.
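If you want to sanity-check a calculator, the arithmetic can be sketched in a few lines. This assumes the standard two-sided two-proportion normal approximation with a relative MDE; it is not necessarily the method Crazy Egg or any named calculator uses, and the function names are illustrative.

```python
from math import ceil
from statistics import NormalDist


def visits_per_variant(baseline, relative_mde, confidence=0.95, power=0.80):
    """Approximate visits needed per variant (two-sided two-proportion z-test)."""
    p1 = baseline
    p2 = baseline * (1 + relative_mde)  # rate the variant must reach
    z_alpha = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    z_power = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_power) ** 2 * variance / (p2 - p1) ** 2)


def months_to_finish(per_variant, monthly_visits, variants=2):
    """Calendar months needed when traffic is split evenly across variants."""
    return per_variant * variants / monthly_visits


n = visits_per_variant(baseline=0.02, relative_mde=0.10)
print(n, round(months_to_finish(n, monthly_visits=10_000), 1))
```

Under these assumptions a 2% baseline with a 10% relative MDE comes out at roughly 80,700 visits per variant. Calculators that use one-sided or sequential tests will report different numbers, so treat this as a cross-check rather than the final word.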
Step 3
A/B Tests → New Test. Crazy Egg has a Visual Editor (drag-and-drop) and a Code Editor (paste custom HTML/CSS/JS). Pick based on the change complexity.
In the left sidebar, click A/B Tests → + New Test.
Name the test clearly: 'Pricing headline V2 — Jan 2026 — primary metric: signup CTR'. Future-you and your team need the metric and date in the name.
Enter the target URL. Crazy Egg loads the live page in the editor.
Visual Editor: hover any element → click to edit text, change colors, hide elements, or swap images. Best for headline/copy/color tests where no new layout is required.
Code Editor: paste custom HTML, CSS, or JS that runs only for the variant. Best for: new sections, structural layout changes, or anything visual editor can't reach.
Add 1-3 variants. KEEP IT TO 2 VARIANTS (A vs B) for your first 10 tests. An A/B/C/D test splits your traffic 4 ways, so each version gets 25% instead of 50%, which doubles or triples the time to significance. Stick to A/B until you have enough traffic for extra variants to matter.
Preview each variant in the editor. Test the variant on mobile and desktop separately — visual edits commonly break responsively.
Step 4
Traffic split = 50/50 between original and variant. Audience = the segment you'll test on. Default 'all visitors' is usually wrong — at least exclude returning visitors and internal traffic.
Set traffic allocation: 50% to control, 50% to variant. Even splits reach significance fastest. Skewed splits (90/10) only make sense for risk-mitigation tests on high-stakes pages, and they need roughly 3x the total traffic to detect the same effect.
Set audience: by default, all visitors who land on the URL. Refine: exclude returning visitors (their behavior is anchored on the original; muddies the read), exclude internal team IPs (already blocked at account level but double-check here), exclude bot user agents.
For paid-traffic-only tests: filter audience to UTM source = google/cpc + facebook/ads. Tests run only on the audience that drove the question.
For device-segmented tests: run separate tests for desktop and mobile. Behavior differs so much between them that merged tests usually show "winner on desktop, loser on mobile" results that net to zero.
Set test duration: minimum 1 full week (captures the day-of-week effect). Most tests should run 2-4 weeks to reach sample size and account for weekly traffic cycles.
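The cost of a skewed split and the full-week rule can both be put into numbers. This sketch assumes the usual result that the variance of a difference in rates scales with 1/n_control + 1/n_variant; the function names and the traffic figures are hypothetical.

```python
from math import ceil


def split_penalty(variant_share):
    """How many times more total traffic an uneven split needs vs. 50/50."""
    control_share = 1 - variant_share
    return (1 / variant_share + 1 / control_share) / 4


def full_weeks(total_visits_needed, daily_visits, minimum_weeks=1):
    """Round the run time up to whole weeks so every weekday is covered equally."""
    days = total_visits_needed / daily_visits
    return max(minimum_weeks, ceil(days / 7))


print(round(split_penalty(0.5), 2))  # even split, the reference point
print(round(split_penalty(0.1), 2))  # 90/10 split
print(full_weeks(total_visits_needed=20_000, daily_visits=1_200))
```

Rounding up to whole weeks matters because stopping mid-week overweights some weekdays in the result.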
Step 5
Crazy Egg supports URL goals (visit /thank-you), click goals (clicked .signup-btn), and custom event goals. Pick ONE primary goal. Secondary metrics are observational, not decision-driving.
Open the test → Goals tab.
Primary goal: the single metric that determines winner. URL goal: 'visited /signup/success' or '/checkout/complete'. Click goal: 'clicked .signup-btn'. Custom event: requires firing CE2.tracker('event', 'goalName') from your app.
Pick the conversion goal closest to revenue, not the easiest one to move. A 'clicked the hero CTA' goal is easy to move but tells you nothing about purchase rate. A 'completed checkout' goal is harder to move but tells you what matters.
For ecommerce: primary goal should be transaction completion (visit /thank-you with order ID), not add-to-cart. Tests that win on ATC often lose on purchases — the variant attracted more clicks but lower-quality clicks.
Add 1-2 secondary goals for observation (e.g., add-to-cart rate as a secondary alongside purchase as primary). Secondary goals show you the funnel impact but don't determine winner.
NEVER add new goals after the test starts. Mid-test goal additions invalidate the statistical model. If you missed a goal, run a new test.
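To see why the primary goal should sit closest to revenue, compare the same test read on two goals. The counts below are hypothetical and exist only to illustrate the pattern; `ArmResult` and `relative_lift` are illustrative names.

```python
from dataclasses import dataclass


@dataclass
class ArmResult:
    visitors: int
    add_to_carts: int  # secondary goal (observational)
    purchases: int     # primary goal (decides the winner)

    def rate(self, count):
        return count / self.visitors


def relative_lift(control_rate, variant_rate):
    return (variant_rate - control_rate) / control_rate


# Hypothetical counts, for illustration only.
control = ArmResult(visitors=10_000, add_to_carts=800, purchases=200)
variant = ArmResult(visitors=10_000, add_to_carts=960, purchases=190)

atc_lift = relative_lift(control.rate(control.add_to_carts),
                         variant.rate(variant.add_to_carts))
purchase_lift = relative_lift(control.rate(control.purchases),
                              variant.rate(variant.purchases))
print(f"add-to-cart lift: {atc_lift:+.0%}, purchase lift: {purchase_lift:+.0%}")
```

Read on add-to-cart alone, this variant looks like a clear winner; read on purchases, it is not. That is the case for keeping add-to-cart as a secondary goal only.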
Step 6
Test the variant in 5 browsers and 3 devices before launch. A broken variant in Safari that you didn't catch costs the test and the traffic.
In the Crazy Egg editor, click Preview. Open the preview URL in: Chrome desktop, Safari desktop, Firefox desktop, Chrome Android, Safari iOS.
Verify on each: (1) variant renders correctly, (2) no broken layout/overflow, (3) all clickable elements still clickable, (4) page still loads in <3 seconds with the variant code injected.
Check load performance during preview (for example, in the browser DevTools Performance panel or Lighthouse, alongside the Network tab): variant code shouldn't add more than 50-100ms to LCP. A slow variant will appear to underperform purely because of load speed.
Test the goal-firing path: if your goal is 'visit /thank-you', complete the flow in preview mode → verify Crazy Egg records the conversion. If goal-firing is broken, the test will show 0 conversions for the variant and look like a catastrophic loss.
Pre-launch QA takes 30 minutes and saves the 2-4 weeks of test run time. Skip it and you'll usually catch a broken variant 5 days in — then have to restart the test and lose all the data.
Step 7
Launch. Don't peek. Wait until sample size is hit AND test has run at least 7 days. Read results only after both conditions are true.
Launch the test. From this point, you wait.
DO NOT peek at results daily. Early-stage results swing wildly: at 200 visits per variant, you might see a 30% lift with no statistical significance. Looking at it tempts you to ship the variant early, ship what turns out to be the loser, or kill the test, all based on noise.
Set a calendar reminder for the projected end date (based on your sample-size math).
When the test reaches required sample size AND has run at least 1 full week: open the test → Results tab.
Check: (a) confidence level reached 95% or higher, (b) primary goal moved in the predicted direction, (c) effect size meets the win threshold you pre-registered in your hypothesis (15%+ in the Step 1 example). All three = ship the winner.
If confidence is 85% and you stopped at sample size: the test is inconclusive. Either run another 2 weeks or accept that this change didn't produce a meaningful effect. Don't ship based on 85% confidence: it leaves too much room for a false positive.
After shipping: monitor the metric for 2-4 weeks post-launch. Real wins should hold. If the lift evaporates within a month, the test was a false positive (often a novelty effect — users behaved differently in the test window because the change was new).
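The read-out rules above can be written down as a small decision function. This is a sketch under stated assumptions: the confidence figure here comes from a pooled two-proportion z-test, which may not match how Crazy Egg's Results tab computes it, and the function names and example counts are hypothetical.

```python
from statistics import NormalDist


def confidence_of_difference(conv_a, n_a, conv_b, n_b):
    """Two-sided confidence (1 - p-value) from a pooled two-proportion z-test."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = (pooled * (1 - pooled) * (1 / n_a + 1 / n_b)) ** 0.5
    z = (p_b - p_a) / se
    return 1 - 2 * (1 - NormalDist().cdf(abs(z)))


def decide(confidence, observed_lift, min_lift, visits_per_variant,
           required_visits, days_run, min_confidence=0.95):
    # Both gates must pass before results are read at all.
    if visits_per_variant < required_visits or days_run < 7:
        return "keep running"
    # A lift at or above the (positive) threshold is also in the predicted direction.
    if confidence >= min_confidence and observed_lift >= min_lift:
        return "ship variant"
    return "inconclusive"


conf = confidence_of_difference(conv_a=400, n_a=20_000, conv_b=480, n_b=20_000)
lift = (480 / 20_000) / (400 / 20_000) - 1
print(decide(conf, lift, min_lift=0.15, visits_per_variant=20_000,
             required_visits=20_000, days_run=21))
```

Writing the rule down before launch removes the temptation to reinterpret a borderline result afterwards.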
Common mistakes
Skipping sample-size math and stopping at the first promising result
What goes wrong: Test runs 5 days, variant shows 12% lift at 70% confidence. Operator declares 'winner,' ships the variant, and the lift evaporates within 2 weeks. Worse, the operator now has a documented 'win' in their playbook that informs future decisions but isn't real. Repeat this 3-4 times per quarter and you've shipped 4 false wins, none of which actually moved conversion. ~$5,000-15,000 in design+dev cost on changes that did nothing.
How to avoid: Pre-compute sample size before launching. Run until BOTH sample size is reached AND test has run at least 7 days. Don't stop early no matter how exciting the early result looks.
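A quick simulation shows why stopping at the first promising result is so costly. The sketch below runs an A/A test, where both arms have the same true conversion rate, so any "significant" result is a false win. The traffic numbers, rate, and seed are arbitrary assumptions chosen for illustration.

```python
import random
from statistics import NormalDist


def significant(conv_a, conv_b, n, z_crit):
    """Pooled two-proportion z-test with n visits in each arm."""
    pooled = (conv_a + conv_b) / (2 * n)
    if pooled in (0, 1):
        return False
    se = (pooled * (1 - pooled) * 2 / n) ** 0.5
    return abs(conv_b / n - conv_a / n) / se > z_crit


def simulate(runs=300, days=14, daily_visits=150, rate=0.05, seed=1):
    """A/A test: both arms share the same true rate, so every 'win' is false."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(0.975)  # 95% confidence, two-sided
    peek_wins = final_wins = 0
    for _ in range(runs):
        conv_a = conv_b = 0
        flagged = False
        for day in range(1, days + 1):
            conv_a += sum(rng.random() < rate for _ in range(daily_visits))
            conv_b += sum(rng.random() < rate for _ in range(daily_visits))
            sig = significant(conv_a, conv_b, day * daily_visits, z_crit)
            flagged = flagged or sig  # a daily peeker would stop at the first "win"
        peek_wins += flagged
        final_wins += sig  # a patient reader checks only on the last day
    return peek_wins / runs, final_wins / runs


peek_rate, final_rate = simulate()
print(f"false wins with daily peeking: {peek_rate:.0%}, "
      f"reading once at the end: {final_rate:.0%}")
```

In runs like this, the daily-peeking rate typically lands well above the nominal 5%, while reading once at the end stays close to it.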
Running A/B/C/D tests before you have the traffic
What goes wrong: Test splits traffic 4 ways. Each variant gets 25% of your already-small sample. Time-to-significance triples. After 8 weeks, all variants are statistically tied. The test produced no decision. 8 weeks of CRO momentum gone, plus design time on 3 variants ($3K-6K) you can't use.
How to avoid: Stick to A/B (2 variants) until you have at least 30K monthly visits per page. Above that, A/B/C can work. Multivariate (full factorial) needs 100K+/mo per page.
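The traffic cost of extra variants can be estimated before you build them. This sketch assumes the standard two-proportion formula plus a Bonferroni correction for comparing each variant against control; the correction is an assumption made here, not something the tutorial or Crazy Egg specifies, and the baseline and traffic figures are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def visits_per_arm(baseline, relative_mde, arms, confidence=0.95, power=0.80):
    """Per-arm sample size; alpha is split across variant-vs-control comparisons."""
    comparisons = arms - 1
    alpha = (1 - confidence) / comparisons  # Bonferroni correction
    p1, p2 = baseline, baseline * (1 + relative_mde)
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    return ceil(z ** 2 * (p1 * (1 - p1) + p2 * (1 - p2)) / (p2 - p1) ** 2)


def weeks_needed(baseline, relative_mde, arms, weekly_visits):
    """Whole weeks needed when traffic is split evenly across all arms."""
    total = visits_per_arm(baseline, relative_mde, arms) * arms
    return ceil(total / weekly_visits)


# Roughly 30K visits/month is about 7,500 per week.
ab = weeks_needed(0.05, 0.20, arms=2, weekly_visits=7_500)
abcd = weeks_needed(0.05, 0.20, arms=4, weekly_visits=7_500)
print(ab, abcd)
```

Each added arm costs twice: the traffic is divided further, and each comparison needs a stricter threshold to keep the overall false-positive rate in check.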
Choosing easy-to-move goals instead of revenue-aligned ones
What goes wrong: Test optimizes for 'hero CTA click rate.' Variant wins by 30%. Conversion to purchase is unchanged or worse — variant attracted more low-intent clicks. Operator ships the variant based on the CTR win. Revenue stays flat or declines. On a $30K/mo revenue site, a test that 'wins' CTR but breaks purchase rate can cost $2K-8K/mo until someone notices and reverts.
How to avoid: Primary goal must be the most revenue-aligned metric reachable from the test page. Click-rate is observational; purchase-rate is decision-driving.
Adding goals or changing audience mid-test
What goes wrong: Halfway through the test, operator notices a secondary metric and adds it as a new goal. Or filters audience to a different segment. Both invalidate the statistical model — the test is now measuring a different population than the one the sample-size math assumed. Result is no longer trustworthy. Wasted 2-4 weeks of traffic + design+dev cost.
How to avoid: Lock test configuration at launch. Pre-register goals and audience in writing. If you missed something, run a new test — don't modify the running one.
Testing two unrelated changes in one variant
What goes wrong: Variant changes hero headline AND CTA color AND adds testimonials. Variant wins by 15%. You can't tell which change caused the lift. Next test, you copy the headline but skip testimonials. No lift. You wasted 1-2 future tests trying to attribute the win.
How to avoid: One change per variant. If you have 3 hypotheses, run 3 sequential A/B tests. Each clean test produces an attributable result.
Not validating wins for novelty effect
What goes wrong: Variant wins by 18%. Ship it. After 3 weeks of being live, conversion rate returns to baseline. The 'win' was a novelty effect — users responded to the new thing because it was new, not because it was better. You shipped, declared victory, and recorded a phantom $5K-20K/year of 'CRO impact' that isn't real.
How to avoid: After shipping a winner, monitor the primary metric for 4 weeks post-launch. Real wins hold. Novelty wins decay within 2-3 weeks. If the lift evaporates, revert and rethink.
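The post-launch check can be as simple as comparing weekly conversion rates against the pre-test baseline. The weekly figures below are hypothetical, and real weekly rates are noisy, so treat this as a rough screen rather than a statistical test.

```python
def lift_holds(baseline_rate, weekly_rates, min_relative_lift):
    """True if every post-launch week keeps at least the minimum lift over baseline."""
    floor = baseline_rate * (1 + min_relative_lift)
    return all(rate >= floor for rate in weekly_rates)


def decaying(weekly_rates):
    """True if the rate falls week after week, the typical novelty pattern."""
    return all(later < earlier
               for earlier, later in zip(weekly_rates, weekly_rates[1:]))


# Hypothetical post-launch conversion rates, one value per week.
real_win = [0.0236, 0.0231, 0.0238, 0.0234]
novelty = [0.0236, 0.0221, 0.0208, 0.0201]

print(lift_holds(0.02, real_win, 0.10), decaying(real_win))
print(lift_holds(0.02, novelty, 0.10), decaying(novelty))
```

A lift that holds and does not trend down is worth recording as a win; a steady slide back toward baseline is the signal to revert and rethink.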
Hand it off
A/B testing done well typically lifts site conversion 10-30% over 12 months. A/B testing done poorly burns design+dev cycles on false-positive 'wins.' A specialist runs the sample-size math, locks the design, and ships only tests with defensible results. Most engagements run $400-1,200/mo at $14-16/hr — net positive on the first real win, which usually arrives within 60-90 days.
Run each test for a minimum of 1 full week (to capture day-of-week effects). Typically it takes 2-4 weeks to reach the required sample size. The exact answer comes from the sample-size calculator: baseline conversion rate + minimum detectable effect + 95% confidence + 80% power gives the required visits per variant. Multiply that by the number of variants and divide by the page's daily traffic to get the test duration in days.
When a shipped winner's lift disappears, there are three common causes: (1) Stopped the test too early on a false positive. At 70% confidence, ~1 in 4 'wins' are noise. (2) Novelty effect: users responded to the change because it was new, not because it was better. (3) Audience drift: the variant won on the test audience but underperforms on the broader live audience.
Running Crazy Egg alongside another A/B testing tool on the same page is technically possible, but don't. Two tools on one page can cause variant interference (one tool's variant interacts with the other's) and confuse the analytics. Pick one A/B testing tool per site. Google Optimize sunset in 2023, so most operators are choosing between Crazy Egg, VWO, Convert, and Optimizely.
Yes, by 50-200ms typically. Crazy Egg's variant code runs after the page starts loading, so the variant 'pops' in. For minor text/color changes, this is invisible. For structural changes, you may see a flash of unstyled content (FOUC). Use the 'flicker control' setting in the test config to mitigate.
You can run separate tests for desktop and mobile, and you usually should. Behavior differs so much between them that merged tests often show winners on one device and losers on the other. Create separate tests with device-segmented audiences. This doubles your test count but produces actionable per-device results.
Ending the test in Crazy Egg removes the variant code injection. The site reverts to the original. If you want to keep the winning variant, you need to implement the change in your actual site code — Crazy Egg's variant editor is for testing, not permanent deployment. Forgetting this and ending a 'winning' test reverts the lift.