Alpha threshold

The scientific method tests hypotheses. We don’t actually prove our hypothesis; we set out to disprove its opposite (called the null hypothesis). And we don’t even get to actually disprove the null hypothesis either; the best we can manage is to say that, were the null hypothesis true, results like ours would be very unlikely. How unlikely is very unlikely? That is in the eye of the beholder, and the beholder sets this level by defining the alpha threshold. In medicine, for historical and arbitrary reasons, we set alpha to be 0.05. That is, we are willing to accept a 5% chance of a false positive study result, aka Type I error. If the p-value of a study is below alpha (p < 0.05, typically), the study results are said to be statistically significant. In some cases, it may make sense to set alpha even lower than 0.05; but that of course increases the risk of failing to reject a false null hypothesis, a Type II error. As in many situations in clinical medicine, setting alpha is a compromise. One cannot "err on the side of caution" with a uniformly high or low threshold. The conservative approach depends on the clinical details.


Incidence is a measure of the number of new cases of a disease that crop up in a population over a specified time interval. It is often confused with prevalence.


Prevalence is the fraction of the population with the disease, or the ratio of the number of people with the disease to the number of individuals in the population. Prevalence should not be confused with incidence – the latter is a measure of new occurrences of a disease within a given time interval, whereas prevalence counts total cases, regardless of when they were contracted. Prevalence is typically greater than incidence, unless the disease is highly fatal or resolves quickly: at steady state, prevalence is approximately incidence multiplied by the average duration of the disease.
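The distinction can be shown with a quick arithmetic sketch in Python; all of the figures below are invented purely for illustration:

```python
# Hypothetical town used to illustrate prevalence vs. incidence
# (every number here is made up for the example).
population = 10_000
existing_cases = 400        # people living with the disease on Jan 1
new_cases_this_year = 50    # cases diagnosed during the year

prevalence = existing_cases / population      # a snapshot fraction: 4%
incidence = new_cases_this_year / population  # new cases per person per year: 0.5%

print(f"Prevalence: {prevalence:.1%}")
print(f"Incidence:  {incidence:.1%} per year")
```

Note that prevalence here exceeds incidence, as is typical for a chronic, non-fatal condition.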


The mean is the average value of all of the data points. Note that only numerical, not categorical, data can have a mean. The "mean result" of a series of 6 patients, 2 of whom were rated "bad", 2 "ok", and 2 "good" is not "ok"; rather, in that case, the term "mean" is meaningless (no pun intended).


The median is the value of the "middle number" of the sorted data set; by definition, half of the values lie at or below the median, and half at or above. If there are 101 data points, the median is the value of sorted item number 51 (50 are below it and 50 are above). The median is also the 50th percentile. The median of the set (1,2,3,4,5,6,7,8,9) is obviously 5. The key to finding the median is sorting the set; the median is much less obvious when the set is unsorted: (8,5,6,1,2,7,4,9,3).


The mode is defined as the most frequently occurring value or item within a distribution.
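All three measures above can be computed with Python's standard `statistics` module. This sketch reuses the unsorted set from the median example, plus a made-up set of categorical ratings chosen to have a clear mode:

```python
import statistics

# The unsorted set from the median example above.
data = [8, 5, 6, 1, 2, 7, 4, 9, 3]

# Both the mean and the median of this set are 5;
# statistics.median sorts the data internally.
print(statistics.mean(data))
print(statistics.median(data))

# The mode works on categorical data too ("good" appears most often here).
ratings = ["bad", "bad", "ok", "good", "good", "good"]
print(statistics.mode(ratings))

# By contrast, statistics.mean(ratings) would raise a TypeError:
# categorical data has no mean, as noted above.
```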


P-value

This is a test statistic representing the probability of obtaining results at least as extreme as those observed, assuming the null hypothesis (typically, the assumption that the two groups are not different) is true. Hence, the lower the value, the more likely that the two groups are indeed different.

Statistical significance

When we say two sample means are different and that the difference is statistically significant, we are saying that the probability that the two samples come from the same underlying population is so low that we can ignore that possibility and consider that the samples represent groups that are indeed distinct. Ordinarily, one sets a standard for significance, called the alpha threshold, and then calculates a p-value of the results. The p-value is compared with the alpha threshold, and if the p-value is below alpha, the results are said to be statistically significant.
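A minimal sketch of this procedure, using an exact two-sided binomial test on a hypothetical coin-flip experiment (17 heads in 20 flips; the numbers are invented, and the test is done by hand with the standard library rather than a stats package):

```python
from math import comb

# Null hypothesis: the coin is fair (p = 0.5).
n, heads = 20, 17   # hypothetical experiment: 17 heads in 20 flips
alpha = 0.05        # the alpha threshold, set before looking at the data

# P(X >= 17) under the null; doubled for a two-sided test,
# since the p = 0.5 binomial distribution is symmetric.
tail = sum(comb(n, k) for k in range(heads, n + 1)) / 2**n
p_value = 2 * tail

print(f"p-value = {p_value:.4f}")
print("statistically significant" if p_value < alpha else "not significant")
```

Here the p-value (about 0.0026) falls below alpha, so we reject the null hypothesis that the coin is fair.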

Type I error

A false positive. Example: A small (n=2) study of ankle fractures showed that ORIF leads to faster union than casting. In fact, if a larger study were undertaken, time to union would be the same. This is a Type I error.

Type II error

A false negative. Example: A small (n=2) study of ankle fractures showed that ORIF and casting have the same wound infection complication rate; zero in both groups. In fact, if a larger study were undertaken, wound infections would be found only in the surgical group. This is a Type II error. The power of a study is its ability to detect differences if they truly are present. Power increases with sample size. The small sample size caused the error here.
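The effect of sample size on this particular error can be sketched directly. Assuming a hypothetical true infection risk of 10% per surgical patient (a number invented for illustration), the chance that a study observes zero infections in the surgical group, and so misses the real difference, shrinks rapidly as n grows:

```python
# Probability that a study sees ZERO infections in the surgical group and
# therefore wrongly reports "no difference". The 10% per-patient infection
# risk is a hypothetical figure chosen for illustration.
true_risk = 0.10

for n in (2, 10, 50, 100):
    p_all_clear = (1 - true_risk) ** n  # every surgical patient heals cleanly
    print(f"n = {n:3d}: P(study sees no infections) = {p_all_clear:.4g}")
```

With n = 2 the study misses the difference over 80% of the time; by n = 100 that risk is negligible, which is exactly what "power increases with sample size" means.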