Hypothesis Testing Explained Simply

In statistics, a hypothesis is a claim about a population. A hypothesis test always involves two hypotheses: the Null Hypothesis and the Alternative Hypothesis. Hypothesis testing is a core statistical method that is used across the sciences and in data analysis.

Effective Difference and Statistical Difference

Before any testing, there needs to be a definition of the "effective difference": the smallest difference that actually matters. For example, if dating agency "A" has a success rate of 25.00%, and dating agency "B" has a success rate of 25.01%, then the difference is too small to matter, and there is no need to do a hypothesis test. However, if airline "A" has a plane crash rate of 0.00001%, and airline "B" has a plane crash rate of 0.00002%, then the risk has doubled. That difference matters a great deal even though it is tiny in absolute terms, and hypothesis testing is appropriate.
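The arithmetic behind the two examples can be sketched as follows. This is a minimal illustration, not a rule for deciding what counts as an effective difference; the helper name `differences` is hypothetical, and the rates are the ones quoted above, written as fractions.

```python
def differences(rate_a, rate_b):
    """Return (absolute, relative) difference of rate_b with respect to rate_a."""
    absolute = rate_b - rate_a
    relative = absolute / rate_a
    return absolute, relative

# Rates from the text, written as fractions rather than percentages.
dating_abs, dating_rel = differences(0.2500, 0.2501)
airline_abs, airline_rel = differences(0.0000001, 0.0000002)

print(f"dating agencies: absolute {dating_abs:.6f}, relative {dating_rel:.2%}")
print(f"airlines:        absolute {airline_abs:.9f}, relative {airline_rel:.0%}")
```

The dating agencies differ more in absolute terms, but the airlines differ far more in relative terms, which is why the second difference is the one that matters.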

Step 1: State the Null Hypothesis and Alternative Hypothesis

The Null Hypothesis is stated first, and it assumes that there is no difference or no change, e.g. "Dating agency A has the same rate of success as dating agency B", or "the new drug is no more effective than the old drug". The Alternative Hypothesis is the competing claim, e.g. "the new drug is more effective than the old drug".

Step 2: Define Confidence Required

In statistics, there is no such thing as 100% certainty, so a level of uncertainty must be allowed. The acceptable level depends upon the application. A level of 5% uncertainty is fairly typical, although the medical industry often uses 1% uncertainty. These numbers (5% and 1%) are called the "alpha risk" - the probability of concluding that there is a difference when there actually is none. There is also a "beta risk" - the probability of concluding that there is no difference, when a real difference exists.
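The meaning of the alpha risk can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: the function name `false_positive_rate`, the true rate of 25%, the group size of 500 and the number of simulated trials are all hypothetical choices, and the comparison uses a normal approximation for two proportions. Both groups are drawn from the same true rate, so every "difference" that is declared is a false alarm.

```python
import math
import random
from statistics import NormalDist

def false_positive_rate(true_rate, n, alpha, trials, seed=0):
    """Share of simulated trials that declare a difference when none exists."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    rejections = 0
    for _ in range(trials):
        # Both groups share the same true rate, so the Null Hypothesis holds.
        x1 = sum(rng.random() < true_rate for _ in range(n))
        x2 = sum(rng.random() < true_rate for _ in range(n))
        pooled = (x1 + x2) / (2 * n)
        se = math.sqrt(pooled * (1 - pooled) * 2 / n)
        if se > 0 and abs(x1 - x2) / n / se > z_crit:
            rejections += 1
    return rejections / trials

rate = false_positive_rate(true_rate=0.25, n=500, alpha=0.05, trials=2000)
print(f"share of trials that wrongly found a difference: {rate:.3f}")
```

With an alpha risk of 5%, the printed share should land close to 0.05, although it will vary a little with the seed and the number of trials.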

Step 3: Calculate Sample Sizes Required

The sample size calculation is performed for a planned trial. It is often not necessary when looking at historical data. It depends on five other factors:

The data distribution type (Gaussian, Binomial, Poisson, Ratio etc)
The level of uncertainty required (alpha risk)
The level of uncertainty required (beta risk)
The standard deviation in the samples
The effective difference stated.

Step 4: Select Appropriate Statistical Test

The type of statistical test to be performed depends on what the measurable parameter is and how it is distributed. To compare dating agency A against dating agency B, the parameter could be "proportion of customers that got dates", or "proportion of members that got married". To compare drug A to drug B, median survival time (or median recovery time) is commonly used.
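Once the parameter is known to be a proportion, the sample size calculation from Step 3 can be made concrete. The sketch below uses one common normal-approximation formula for comparing two proportions; other formulas exist and give slightly different answers. The function name, the example rates of 25% and 30%, and the beta risk of 20% are assumptions chosen for illustration. For proportions, the standard deviation is not supplied separately because it follows from the rates themselves.

```python
import math
from statistics import NormalDist

def sample_size_two_proportions(p1, p2, alpha=0.05, beta=0.20):
    """Approximate sample size per group to detect p1 versus p2 (two-sided)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # from the alpha risk
    z_beta = NormalDist().inv_cdf(1 - beta)        # from the beta risk
    variance = p1 * (1 - p1) + p2 * (1 - p2)       # binomial variances
    effective_difference = p1 - p2
    n = (z_alpha + z_beta) ** 2 * variance / effective_difference ** 2
    return math.ceil(n)

# Hypothetical planning values: agency A at 25%, and a 30% rate worth detecting.
n_per_group = sample_size_two_proportions(0.25, 0.30)
print(f"customers needed per agency: {n_per_group}")
```

Halving the effective difference roughly quadruples the required sample size, which is why the effective difference has to be stated before the trial is planned.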

Step 5: Perform Statistical Analysis

Data from the trial (or historical data) are then subjected to an appropriate test. In the examples in "Step 4", the test would be "Test of two proportions" for the dating agencies. Selecting the correct statistical test is critical - using the wrong test will often result in incorrect conclusions. The analysis will produce a number, often called the "p-value", which is the probability of seeing results at least as extreme as those found if the Null Hypothesis were true, that is, if there were no real difference between the two populations.
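A test of two proportions for the dating agencies can be sketched as below. This is a minimal version of the pooled two-proportion z-test using a normal approximation, which is reasonable only for fairly large samples; the function name and the customer counts are hypothetical and not taken from any real trial.

```python
import math

def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Return (z, two-sided p-value) for a pooled two-proportion z-test."""
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    # Under the Null Hypothesis both groups share one rate, so pool the data.
    pooled = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Probability of a result at least this extreme if there is no difference.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical data: 250 of 1000 customers matched at A, 300 of 1000 at B.
z, p_value = two_proportion_z_test(250, 1000, 300, 1000)
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```

The function assumes that the pooled rate is strictly between 0 and 1; with no successes or no failures at all the standard error is zero and the test is not defined.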

Step 6: Make Decision About Validity of Null Hypothesis

The p-value produced in step 5 is considered. If the p-value is low (less than the chosen alpha risk), then the Null Hypothesis is rejected. A p-value of 0.04 means that, if there were really no difference between the two populations, results at least this extreme would occur about 4% of the time. At a 5% alpha risk this is low enough to reject the Null Hypothesis and conclude that there is likely to be an underlying difference. If the p-value is 0.15, such results would occur about 15% of the time with no underlying difference, and so the analyst will "fail to reject the Null Hypothesis", as this number is too high.
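The decision rule itself is a single comparison, sketched below with the two p-values from the text. The function name `decide` is hypothetical, and the default alpha risk of 5% is the typical value mentioned in Step 2.

```python
def decide(p_value, alpha=0.05):
    """Compare a p-value with the alpha risk chosen in Step 2."""
    if p_value < alpha:
        return "reject the Null Hypothesis"
    return "fail to reject the Null Hypothesis"

for p in (0.04, 0.15):
    print(f"p-value {p:.2f} at alpha 0.05: {decide(p)}")

# The same p-value can lead to a different decision under a stricter alpha.
print(f"p-value 0.04 at alpha 0.01: {decide(0.04, alpha=0.01)}")
```

The last line shows why the alpha risk has to be fixed before the data are analysed: a p-value of 0.04 rejects the Null Hypothesis at 5% uncertainty but not at the 1% level often used in medicine.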

Summary of Hypothesis Testing

A hypothesis is a claim made about a population. Hypothesis testing is used in many branches of science, and in most academic disciplines that work with data. There are six main steps to performing a hypothesis test, starting with a Null Hypothesis, stating that "nothing is different" or "nothing has changed". A probability (the p-value) is calculated, from which the Null Hypothesis is either rejected or not rejected. The alpha and beta risks chosen in advance determine how much confidence can be placed in the result.