The Experiment
Performing power analysis and sample size estimation is an important part of experimental design. Without these calculations, the sample size may be too large or too small. If the sample size is too small, the experiment will lack the precision to provide reliable answers to the questions it is investigating; in that case, it would be wise to alter or abandon the experiment. If the sample size is too large, time and resources will be wasted, often for minimal gain.
HOW
GATHER
For each test, we will gather the following data points before any code or resources are used.
- What is the primary key performance indicator (KPI)?
- By how much do we believe our hypothesis will affect that KPI (effect size)?
- Our standard is 1%, 2%, or 3%.
- What is the acceptable minimum confidence level: 90%, 95%, or 99%? The significance level (alpha) is 1 minus the confidence level, i.e. 0.10, 0.05, or 0.01.
- What is the acceptable minimum power level: 70%, 80%, or 90%?
- Power is usually set between 0.80 and 0.90.
- Think of power as the strength of the experiment. Statistical power is the probability that the test will detect an effect that actually exists; it equals 1 minus beta, the probability of missing that effect (a Type II error).
- What is the current traffic volume on the page being tested?
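The definition of power in the list above can be checked directly by simulation: run many imaginary experiments in which a real effect exists and count how often the test detects it. The Python sketch below is a minimal illustration under assumptions the original does not state: a normally distributed metric with standard deviation 1, an effect size expressed in standard-deviation units, equal group sizes, and a two-sided z-test. The function name simulated_power is hypothetical.

```python
import random
from statistics import NormalDist, fmean


def simulated_power(effect_size: float, n_per_group: int, confidence: float,
                    trials: int = 2000, seed: int = 1) -> float:
    """Estimate power by simulation: the share of repeated experiments in
    which a two-sided z-test detects a true difference of `effect_size`
    (in standard-deviation units) between control and variant.
    Assumes a normal metric with known standard deviation 1."""
    rng = random.Random(seed)
    alpha = 1 - confidence                          # significance level
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)    # two-sided critical value
    se = (2 / n_per_group) ** 0.5                   # std. error of the mean difference
    detections = 0
    for _ in range(trials):
        control = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        variant = [rng.gauss(effect_size, 1.0) for _ in range(n_per_group)]
        z = (fmean(variant) - fmean(control)) / se
        if abs(z) > z_crit:                         # the test "detects" a change
            detections += 1
    return detections / trials
```

For example, simulated_power(0.3, 200, 0.95) should come out near 0.85, close to what the normal-approximation formula predicts. With effect_size=0 there is no real effect, so the detection rate falls to roughly 1 minus the confidence level, which is the false-positive rate.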
WHY
CALCULATE
With these data points we can run the calculation. Effect size, sample size, significance level, and power are linked: enter any three of the four quantities and the fourth is calculated. The basic idea is to leave out the argument you want to calculate. To calculate power, leave the power argument out; to calculate sample size, leave ‘n’ out. Whichever parameter is left out is determined from the other three.
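The "leave one out" idea can be sketched in Python. This is a minimal illustration, not the calculator behind the results in this article: it assumes a two-sided, two-sample test, a standardized effect size (difference in means divided by the standard deviation), n observations per group, and the normal approximation instead of the exact t distribution. The function name solve_missing is hypothetical; whichever argument is left as None is the one solved for.

```python
from math import sqrt
from statistics import NormalDist
from typing import Optional

_Z = NormalDist()  # standard normal distribution


def solve_missing(effect_size: Optional[float] = None,
                  n: Optional[float] = None,
                  alpha: Optional[float] = None,
                  power: Optional[float] = None) -> float:
    """Pass exactly three of the four quantities; the one left as None is
    returned. Normal approximation to a two-sided, two-sample test with a
    standardized effect size and n observations per group. Only the
    rejection tail in the direction of the effect is counted."""
    missing = [name for name, v in (("effect_size", effect_size), ("n", n),
                                    ("alpha", alpha), ("power", power))
               if v is None]
    if len(missing) != 1:
        raise ValueError("leave exactly one argument as None")
    if power is None:
        # power = Phi(d * sqrt(n/2) - z_{1-alpha/2})
        return _Z.cdf(effect_size * sqrt(n / 2) - _Z.inv_cdf(1 - alpha / 2))
    z_power = _Z.inv_cdf(power)
    if n is None:
        # n = 2 * ((z_{1-alpha/2} + z_{power}) / d)^2, per group, not rounded
        return 2 * ((_Z.inv_cdf(1 - alpha / 2) + z_power) / effect_size) ** 2
    if effect_size is None:
        # smallest standardized effect detectable with this n, alpha, power
        return (_Z.inv_cdf(1 - alpha / 2) + z_power) * sqrt(2 / n)
    # alpha is the missing value: invert the power equation for z_{1-alpha/2}
    z_alpha = effect_size * sqrt(n / 2) - z_power
    return 2 * (1 - _Z.cdf(z_alpha))
```

For example, solve_missing(effect_size=0.2, alpha=0.05, power=0.80) returns about 392.4, which rounds up to 393 per group. For conversion-rate KPIs, the effect size would first need to be converted to a standardized scale, which this sketch does not do.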
Power Analysis for Checkout: Guest Checkout (Current Results)
What is the Power of our current test results?
- Sample Size (n) = 30,946
- Effect Size (d) = 1.8%
- Power = unknown (to be calculated)
- Significance level (alpha) = 0.12, i.e. an 88% confidence level
Power = ~90%
This tells us that, if the true effect is as large as the assumed 1.8%, there is roughly a 90% probability that our test will detect it.
However, the confidence level is only 88% (alpha = 0.12), below the 90% to 99% minimums listed above.
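Written out, the two numbers describe two different error rates (standard definitions: alpha is the probability of a false positive, beta the probability of missing a real effect of the assumed size):

```latex
\begin{aligned}
\alpha &= 1 - \text{confidence} = 1 - 0.88 = 0.12 \\
\beta  &= 1 - \text{power} \approx 1 - 0.90 = 0.10
\end{aligned}
```

So the test has about a 12% chance of declaring a difference that is not real, and about a 10% chance of missing a real difference as large as the one assumed.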
What do we do?
- We could accept 88% as “good enough”.
- We could re-run our power analysis with a smaller effect size, which will increase the required sample size, and continue running the test until that sample size is reached.
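To see how strongly a smaller effect size drives up the required sample, the sketch below tabulates n per group for a two-sided, two-sample test using the normal approximation and a standardized effect size. The effect sizes in the grid and the function name n_per_group are hypothetical illustrations, not values from the checkout test. Because n scales with 1/d^2, halving the assumed effect size roughly quadruples the required sample; raising the confidence level from 88% to 95% also increases it.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(effect_size: float, confidence: float, power: float) -> int:
    """Required observations per group, rounded up. Assumes a two-sided,
    two-sample test, normal approximation, standardized effect size."""
    z = NormalDist().inv_cdf
    alpha = 1 - confidence
    return ceil(2 * ((z(1 - alpha / 2) + z(power)) / effect_size) ** 2)


if __name__ == "__main__":
    # Hypothetical planning grid: smaller assumed effects and higher
    # confidence both push the required sample size up.
    for d in (0.04, 0.02, 0.01):
        for conf in (0.88, 0.95):
            print(f"d={d:<5} confidence={conf:.0%} power=90% -> "
                  f"n per group = {n_per_group(d, conf, 0.90):,}")
```

Dividing the required n per group by the page's daily traffic per group gives a rough estimate of how many more days the test would need to run.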