ICE Scoring: The Complete Guide to Experiment Prioritization
The complete guide to ICE scoring: score Impact, Confidence, and Ease with anchors, fix the Confidence mistake almost everyone makes, learn why ICE needs ROTI, and grab a free template.
Every growth team has the same problem: too many experiment ideas, not enough time to run them all.
You have a backlog of 30 hypotheses. Your sprint is two weeks. You can realistically run three. Which three do you pick?
If your answer is "the ones the CEO likes most" or "whatever feels most urgent this week," you are not alone. But you are leaving wins on the table. ICE scoring fixes that. It takes the politics, the gut feelings, and the HiPPO-driven decisions out of experiment prioritization and replaces them with a simple, repeatable system.
This guide covers what ICE scoring is, how to score each dimension with anchors that stop the inflation, the one thing almost everyone gets wrong about Confidence, why ICE alone is not enough, and a free template you can use today.
What is ICE scoring?
ICE scoring is a prioritization framework that ranks experiments on three dimensions, Impact, Confidence, and Ease, each scored 1 to 10, to decide what to run first. It was popularized by Sean Ellis, who coined the term "growth hacking."
- Impact: how much will this move the metric if it works?
- Confidence: how sure are you about your impact estimate?
- Ease: how easy is this to build and measure?
ICE Score = (Impact + Confidence + Ease) / 3
Highest score runs first. Simple, consistent, defensible.
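If you keep the backlog in a script or notebook instead of a spreadsheet, the formula translates directly. This is a minimal Python sketch. The function name `ice_score` and the 1-to-10 range check are illustrative choices, not part of any standard ICE definition.

```python
def ice_score(impact: int, confidence: int, ease: int) -> float:
    """Averaged ICE score: (Impact + Confidence + Ease) / 3, each input scored 1-10."""
    for name, value in (("impact", impact), ("confidence", confidence), ("ease", ease)):
        # Reject scores outside the anchored 1-10 scale instead of silently averaging them.
        if not 1 <= value <= 10:
            raise ValueError(f"{name} must be between 1 and 10, got {value}")
    return (impact + confidence + ease) / 3


print(round(ice_score(7, 8, 9), 1))  # 8.0
```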
Why ICE scoring works
Most teams prioritize by whoever argues loudest. The result: expensive experiments that confirm what everyone already knew, while high-leverage ideas rot in the backlog. ICE forces three useful behaviors:
- It makes assumptions explicit. Scoring Confidence a 4 forces the question: why only 4? What evidence is missing? That surfaces gaps before you spend sprint capacity.
- It levels the playing field. A junior analyst with data and a sharp hypothesis can outscore a VP with a hunch. The framework removes hierarchy from the math.
- It creates a shared language. Everyone scoring the same way stops the "which idea is better" debate and starts the useful one: how do we raise our confidence in this hypothesis?
How to score each dimension (with anchors)
The fastest way to ruin ICE is fuzzy scoring, where one person's 7 is another's 4. Anchor each dimension so a score means the same thing across the team.
Impact (1 to 10)
If the hypothesis is right, how much does it move your north star metric?
| Score | Meaning |
|---|---|
| 9-10 | Step-change (2x or more) |
| 7-8 | Significant (30-50% uplift likely) |
| 5-6 | Moderate (10-30% uplift) |
| 3-4 | Small (5-10% uplift) |
| 1-2 | Marginal or hard to measure |
Always tie Impact to a specific metric. "Improve the funnel" is not a metric. "Increase free-to-paid from 8% to 10%" is.
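Impact is anchored to uplift on a specific metric, so the lookup can be made mechanical. The sketch below is a hypothetical helper, not part of ICE itself. It turns the 8% to 10% free-to-paid example into a relative uplift (25%) and maps that onto the anchor table.

Two assumptions are worth flagging:
- Lower bounds are treated as inclusive.
- The table leaves the range from 50% up to 2x unassigned, so the sketch places it in the 7-8 band.

The function returns a band rather than a single number, so the scorer still picks the final value within it.

```python
def relative_uplift_pct(baseline: float, target: float) -> float:
    """Relative change from baseline to target, in percent (8% -> 10% is a 25% uplift)."""
    return (target - baseline) / baseline * 100


def impact_band(uplift_pct: float) -> tuple[int, int]:
    """Map an expected relative uplift (percent) to an Impact score band."""
    if uplift_pct >= 100:  # step-change: 2x or more
        return (9, 10)
    if uplift_pct >= 30:   # significant; also absorbs the 50-100% gap (assumption)
        return (7, 8)
    if uplift_pct >= 10:   # moderate
        return (5, 6)
    if uplift_pct >= 5:    # small
        return (3, 4)
    return (1, 2)          # marginal


print(impact_band(relative_uplift_pct(8, 10)))  # (5, 6)
```

The function does not try to capture the "hard to measure" half of the 1-2 anchor. That remains a judgment call.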
Confidence (1 to 10)
| Score | Meaning |
|---|---|
| 9-10 | Strong data or multiple validated precedents |
| 7-8 | Some data or one clear validated example |
| 5-6 | Logical reasoning, limited hard data |
| 3-4 | Mostly intuition or anecdote |
| 1-2 | Pure guess |
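The Confidence anchors describe kinds of evidence. One way to keep scoring honest is to have scorers name the evidence behind their Impact estimate and derive the band from it.

In this Python sketch:
- The hypothetical `Evidence` categories simply mirror the table rows.
- Taking the strongest single piece of evidence is an assumption. A team might instead require two independent sources before granting 9-10.
- There is deliberately no input for how much the scorer believes in the idea.

```python
from enum import Enum


class Evidence(Enum):
    """Hypothetical evidence categories, one per row of the Confidence table."""
    STRONG_DATA = "strong data or multiple validated precedents"
    SOME_DATA = "some data or one clear validated example"
    REASONING = "logical reasoning, limited hard data"
    ANECDOTE = "mostly intuition or anecdote"
    GUESS = "pure guess"


CONFIDENCE_BANDS = {
    Evidence.STRONG_DATA: (9, 10),
    Evidence.SOME_DATA: (7, 8),
    Evidence.REASONING: (5, 6),
    Evidence.ANECDOTE: (3, 4),
    Evidence.GUESS: (1, 2),
}


def confidence_band(evidence: list[Evidence]) -> tuple[int, int]:
    """Band for the strongest evidence cited; citing nothing counts as a pure guess."""
    if not evidence:
        return CONFIDENCE_BANDS[Evidence.GUESS]
    # Tuples compare element by element, so max() picks the highest band.
    return max(CONFIDENCE_BANDS[e] for e in evidence)


# Interviews suggest confusion, but there is no A/B data yet.
print(confidence_band([Evidence.ANECDOTE, Evidence.REASONING]))  # (5, 6)
```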
Ease (1 to 10)
| Score | Meaning |
|---|---|
| 9-10 | Hours: copy change, toggle, config |
| 7-8 | A day or two of work |
| 5-6 | One sprint |
| 3-4 | Multiple sprints, cross-team dependencies |
| 1-2 | Major engineering project |
Ease should account for the full cost: design, engineering, QA, legal review, and measurement setup. Overestimating Ease, which really means underestimating that cost, is one of the most common ICE mistakes.
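To make the full-cost rule concrete, the sketch below sums estimated hours across every function involved before it picks an Ease band. Every number in it is an assumption to adjust for your team:
- an 8-hour working day;
- a two-week sprint of 10 working days;
- "multiple sprints" capped at three before a test counts as a major project;
- hours summed as if the work ran sequentially.

The `cross_team` flag, which caps Ease at 3-4, is likewise a hypothetical rule based on the table's note about cross-team dependencies.

```python
HOURS_PER_DAY = 8   # assumption: one working day
SPRINT_DAYS = 10    # assumption: a two-week sprint of working days


def ease_band(cost_hours: dict[str, float], cross_team: bool = False) -> tuple[int, int]:
    """Map the full cost (hours per function, summed) to an Ease band."""
    days = sum(cost_hours.values()) / HOURS_PER_DAY
    if days <= 1:
        band = (9, 10)            # hours: copy change, toggle, config
    elif days <= 2:
        band = (7, 8)             # a day or two of work
    elif days <= SPRINT_DAYS:
        band = (5, 6)             # one sprint
    elif days <= 3 * SPRINT_DAYS:
        band = (3, 4)             # multiple sprints (three-sprint cap is an assumption)
    else:
        band = (1, 2)             # major engineering project
    if cross_team:
        band = min(band, (3, 4))  # assumption: cross-team work never scores above 3-4
    return band


# Hypothetical estimates for the onboarding checklist, in hours.
checklist = {"design": 16, "engineering": 80, "qa": 16, "legal": 0, "measurement": 8}
print(ease_band(checklist, cross_team=True))  # (3, 4)
```

Engineering alone (80 hours, exactly one sprint) would land in the 5-6 band. Design, QA, and measurement push the total past a sprint, which is the point of counting the full cost.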
The Confidence trap (the part almost everyone gets wrong)
Here is the correction that separates teams who get value from ICE from teams who just fill a spreadsheet. Most people score Confidence as "how sure am I this will succeed." That is wrong, and it quietly double-counts Impact.
Confidence is certainty about your Impact estimate, not optimism about the outcome. "I really believe this will work" is not Confidence; that belief is already baked into the Impact number. A high Confidence score means your impact estimate rests on real evidence: past experiment results, user research, analytics, a validated competitor precedent. A low Confidence score means the estimate is a guess, however exciting.
Score it the way you would defend it to a skeptic: not "how much do I want this to win" but "how sure am I that my predicted effect size is the right one." Teams that make this one correction stop over-ranking moonshots with weak foundations and stop burying well-researched, moderate-impact tests. A low Confidence score is not a reason to kill an experiment; it is a signal to do discovery first.
What ICE misses: time to learn
ICE has a blind spot: it says nothing about how long an experiment takes to produce a learning. Ease captures the effort to build and measure, not the wait for a result. A high-ICE build that takes six weeks to read out can therefore score the same as a test that answers the question in three days. They are not equal. The faster one lets you run the next experiment sooner, and that compounding is the entire point of an experimentation program.
This is where ROTI (Return On Time Invested) completes the picture. ROTI scores how much you learn per unit of time. Run ICE and ROTI together and the rank order changes: a moderate-ICE painted-door test that resolves in days often beats a higher-ICE build that takes a quarter. ICE tells you what is worth doing; ROTI tells you what is worth doing now. For more on building the ranked queue, see the growth experiment template.
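This guide does not give a formula for ROTI. The following is only one possible proxy, chosen for illustration: divide the ICE score by the number of days until the experiment yields a readable result. The `Candidate` class, the `roti` function, and the sample numbers are all hypothetical. Under that assumption, a moderate-ICE painted-door test outranks a higher-ICE build that takes a quarter.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    ice: float             # averaged ICE score, 1-10
    days_to_learn: float   # calendar days until a result can be read out


def roti(c: Candidate) -> float:
    """Hypothetical ROTI proxy: ICE points per day of waiting for a learning."""
    return c.ice / c.days_to_learn


def rank_by_roti(candidates: list[Candidate]) -> list[Candidate]:
    """Order candidates by the proxy, highest first."""
    return sorted(candidates, key=roti, reverse=True)


backlog = [
    Candidate("full feature build", ice=7.5, days_to_learn=90),
    Candidate("painted-door test", ice=6.0, days_to_learn=4),
]
for c in rank_by_roti(backlog):
    print(f"{c.name}: ICE {c.ice}, ROTI proxy {roti(c):.2f}")
```

A straight ratio like this rewards speed heavily. If that overcorrects for your program, ROTI could instead serve as a tiebreaker among experiments with similar ICE scores.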
ICE scoring in practice: three worked examples
These are illustrative, not real company data.
Experiment A: countdown timer on the pricing page. Impact 7 (urgency lifts conversion in many tests), Confidence 8 (multiple validated precedents), Ease 9 (a few hours). ICE = 8.0.
Experiment B: in-app onboarding checklist. Impact 9 (activation is the biggest drop-off), Confidence 6 (interviews suggest confusion, no A/B data yet), Ease 4 (two sprints, cross-team). ICE = 6.3.
Experiment C: rewrite the homepage hero headline. Impact 6, Confidence 5 (audit flagged weak clarity, no testing), Ease 9 (copy change). ICE = 6.7.
Prioritization: run A (8.0), then C (6.7). B goes to the backlog until discovery raises its Confidence. Notice the highest-Impact idea (B) is not first. That is the power of ICE: it balances ambition with pragmatism.
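The arithmetic behind the three examples is easy to check in code. This sketch reuses the illustrative scores above, and the dictionary layout is just one convenient choice. The formula only produces the order A, C, B. Parking B until discovery raises its Confidence is a judgment the team makes on top of that ranking.

```python
experiments = {
    "A: pricing-page countdown timer": (7, 8, 9),  # (Impact, Confidence, Ease)
    "B: in-app onboarding checklist": (9, 6, 4),
    "C: homepage hero headline": (6, 5, 9),
}


def ice(impact: int, confidence: int, ease: int) -> float:
    return (impact + confidence + ease) / 3


# Sort experiment names by ICE score, highest first.
ranked = sorted(experiments, key=lambda name: ice(*experiments[name]), reverse=True)
for name in ranked:
    print(f"{ice(*experiments[name]):.1f}  {name}")
```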
Free ICE scoring template
| ID | Hypothesis | I | C | E | ICE | Status |
|---|---|---|---|---|---|---|
| EXP-001 | Add urgency to pricing, conversion rises >20% | 7 | 8 | 9 | 8.0 | Active |
| EXP-002 | Onboarding checklist, D7 retention up >15% | 9 | 6 | 4 | 6.3 | Backlog |
| EXP-003 | Rewrite hero headline, signups up >10% | 6 | 5 | 9 | 6.7 | Backlog |
The growth experiment template includes ICE plus ROTI scoring already filled in across 24 pre-built templates, and the experiment database keeps every experiment ranked as you run it, so you skip the spreadsheet maintenance.
ICE vs PIE vs RICE
- PIE (Potential, Importance, Ease): better for CRO teams focused on high-traffic pages.
- RICE (Reach, Impact, Confidence, Effort): better for product teams comparing features of very different scope.
- ICE wins for most growth teams: faster to score (three dimensions), works across any channel, easy to calibrate, low overhead.
Five common ICE mistakes
- Inflating Confidence without data. Confidence reflects evidence, not optimism (see the Confidence trap above).
- Ignoring dependencies. A three-team coordination is not an 8 on Ease.
- Scoring in isolation. Score independently, then compare and discuss divergences.
- Using ICE to kill ideas. A low score is a prioritization signal, not a death sentence.
- Never revisiting scores. Update them as you learn.
How to run an ICE scoring session (about 30 minutes)
- Prep (5 min): everyone reviews the backlog beforehand.
- Score independently (10 min): no discussion yet.
- Compare (10 min): flag experiments where scores diverge by more than 3 points.
- Discuss outliers (5 min): resolve the gap in thinking.
- Commit (2 min): lock the top three to five for the sprint.
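If each person submits scores independently, the Compare step can be automated. This sketch uses one interpretation of "diverge by more than 3 points": the gap between the highest and lowest score any scorer gave on a single dimension. The scorer names and scores are made up, and `flag_divergence` is a hypothetical helper.

```python
DIMENSIONS = ("impact", "confidence", "ease")


def flag_divergence(
    scores: dict[str, dict[str, tuple[int, int, int]]],
    threshold: int = 3,
) -> list[tuple[str, str, int]]:
    """Return (experiment, dimension, spread) wherever max - min exceeds the threshold."""
    flags = []
    for experiment, by_scorer in scores.items():
        for i, dimension in enumerate(DIMENSIONS):
            values = [triple[i] for triple in by_scorer.values()]
            spread = max(values) - min(values)
            if spread > threshold:
                flags.append((experiment, dimension, spread))
    return flags


# Independent (Impact, Confidence, Ease) scores per hypothetical scorer.
session = {
    "EXP-002": {"ana": (9, 6, 4), "ben": (8, 2, 5), "chen": (9, 5, 3)},
    "EXP-003": {"ana": (6, 5, 9), "ben": (5, 6, 8), "chen": (7, 4, 9)},
}
print(flag_divergence(session))  # [('EXP-002', 'confidence', 4)]
```

The 4-point spread on EXP-002's Confidence is exactly the kind of outlier the discussion step should resolve.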
Frequently asked questions
How is the ICE score calculated?
Score Impact, Confidence, and Ease each from 1 to 10, then take the average: (Impact + Confidence + Ease) / 3. Some teams multiply instead of average to penalize ideas that are weak on any single dimension. Rank experiments by the result, highest first.
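The difference between averaging and multiplying shows up when two ideas share the same average. In this illustrative sketch, with hypothetical scores and function names chosen for the example, a balanced idea and a lopsided one both average 6. The product exposes the lopsided idea's Ease of 1.

```python
def ice_average(impact: int, confidence: int, ease: int) -> float:
    return (impact + confidence + ease) / 3


def ice_product(impact: int, confidence: int, ease: int) -> int:
    # Multiplying lets a single weak dimension drag the whole score down.
    return impact * confidence * ease


for label, scores in {"balanced": (6, 6, 6), "lopsided": (10, 7, 1)}.items():
    print(label, ice_average(*scores), ice_product(*scores))
# balanced 6.0 216
# lopsided 6.0 70
```

Products run from 1 to 1,000, so thresholds set for the averaged scale do not carry over. Pick one method and use it consistently across the backlog.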
What is a good ICE score?
On the averaged 1-to-10 scale, a score of 8 or above marks a strong candidate to run now, 6 up to 8 is a solid backlog, and below 6 usually needs more discovery (often to raise Confidence) before it earns a sprint slot. Treat the number as a ranking signal within your own backlog, not an absolute benchmark.
What is the difference between ICE and RICE?
ICE scores Impact, Confidence, and Ease. RICE adds Reach (how many users are affected) and replaces Ease with Effort. RICE suits product teams comparing features with very different reach; ICE is faster and works across any growth channel.
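For comparison, RICE as popularized by Intercom is usually computed as (Reach × Impact × Confidence) / Effort. Reach is counted in people or events per period, Confidence is a percentage, and Effort is in person-months. Dividing by Effort plays the role that a high Ease score plays in ICE. The sketch below uses that commonly cited form with hypothetical inputs.

```python
def rice_score(reach: float, impact: float, confidence: float, effort: float) -> float:
    """RICE = (Reach x Impact x Confidence) / Effort.

    reach: people or events per period (e.g. users per quarter)
    impact: Intercom-style scale (3 massive, 2 high, 1 medium, 0.5 low, 0.25 minimal)
    confidence: a fraction, e.g. 0.8 for 80%
    effort: person-months; must be positive
    """
    if effort <= 0:
        raise ValueError("effort must be positive")
    return reach * impact * confidence / effort


print(rice_score(reach=500, impact=2, confidence=0.8, effort=2))  # 400.0
```

Reach is an absolute count, so RICE scores are only comparable within one backlog and one time period.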
Does a low ICE score mean I should drop the idea?
No. A low score, especially one driven by low Confidence, is a signal to do discovery first (user research, analytics, a cheap validation test) and re-score, not to kill the idea.
What does ICE leave out?
Time to learn. ICE does not capture how quickly an experiment produces a result. Pair it with ROTI (Return On Time Invested) so fast, cheap tests that let you run the next experiment sooner are ranked appropriately.