Several definitions must be mastered in order to use, understand and apply the methods of biostatistics. The three groups of terms below are commonly confused with one another.

Prevalence vs incidence

Prevalence and incidence both describe the frequency of disease. (Here “frequency” carries no special technical meaning; it is used only in its everyday English sense.)

Prevalence is defined as the total number of cases of a disease present in a population at a given time. Incidence is the number of new cases of a disease that arise in a population over a specified time interval. It is easy to see that prevalence exceeds incidence when the disease is neither curable nor fatal: cases simply accumulate. Incidence exceeds prevalence in the converse setting, when the disease is short-lived: because either the disease resolves or the patient dies and leaves the group, the number of active cases at any one moment is lower than the number of new cases arising over the interval.
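The contrast can be made concrete with a toy count. The sketch below is a minimal illustration under assumed, hypothetical conditions: 10 new cases arise in each 365-day year for 5 years, and each case either never resolves (chronic) or lasts exactly 14 days (acute). The function names `point_prevalence` and `incidence` are our own, not from any library.

```python
import math

def point_prevalence(onsets, durations, t):
    """Number of cases active at time t (onset <= t < onset + duration)."""
    return sum(1 for onset, dur in zip(onsets, durations) if onset <= t < onset + dur)

def incidence(onsets, start, end):
    """Number of new cases whose onset falls in the interval [start, end)."""
    return sum(1 for onset in onsets if start <= onset < end)

# Hypothetical population: onset days for 10 new cases per 365-day year,
# spaced 36 days apart within each of 5 years.
onsets = [year * 365 + k * 36 for year in range(5) for k in range(10)]

chronic = [math.inf] * len(onsets)  # neither cured nor fatal: cases never leave
acute = [14] * len(onsets)          # each case ends (recovery or death) after 14 days

last_year = (4 * 365, 5 * 365)
print(incidence(onsets, *last_year))                # 10 new cases in the final year
print(point_prevalence(onsets, chronic, 5 * 365))  # 50: every case has accumulated
print(point_prevalence(onsets, acute, 1790))       # 1: only the most recent case is active
```

Incidence is the same in both scenarios; only the duration of illness changes, and that alone pushes prevalence above or below incidence.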

Mean, Median and Mode

When analyzing a specific set of data, certain terms are useful in describing it. The mean is the arithmetic average of all the data points: their sum divided by their number. The median is the value of the “middle” item of the sorted data set; if there are 101 data points, the median is the value of sorted item number 51 (50 are below it and 50 are above). With an even number of points, the median is conventionally the average of the two middle values. If the distribution (the arrangement of the data points) is symmetric about the mean, then the median equals the mean. If the distribution is skewed, on the other hand, the two will generally differ.
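Both definitions are short enough to write out directly. This is a plain Python sketch (the helper names `sample_mean` and `sample_median` are our own) that follows the “sorted item number 51 of 101” rule and compares a symmetric data set with a skewed one.

```python
def sample_mean(values):
    """Arithmetic mean: sum of the values divided by how many there are."""
    return sum(values) / len(values)

def sample_median(values):
    """Middle value of the sorted data; average of the two middle values if n is even."""
    s = sorted(values)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]  # for n = 101, index 50 (0-based) is sorted item number 51
    return (s[mid - 1] + s[mid]) / 2

symmetric = list(range(1, 102))  # 1..101, symmetric about its mean
print(sample_mean(symmetric), sample_median(symmetric))  # 51.0 51

skewed = [1, 2, 3, 4, 100]  # one large value stretches the right tail
print(sample_mean(skewed), sample_median(skewed))  # 22.0 3

print(sample_median([1, 2, 3, 4]))  # 2.5: even n, average of 2 and 3
```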

Consider the weights of all American men. Because the upper limit is essentially boundless but the lower limit is fixed (put another way, a man can be 300 pounds heavier than the mean but not 300 pounds lighter), it should be clear that the distribution is not symmetric. There are more data points (light men) below the mean than there are (heavy men) above it, but the mean is a weighted average, pun intended: one heavy man can drive the mean up by more than one light man can pull it down. Since more than half of the men fall below the mean, the median is lower than the mean. The mode is defined as the most frequently occurring value or item within the distribution. (As a mental exercise, list a set of 11 numbers whose mean, median and mode are all the same.)
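A quick check with Python’s standard-library `statistics` module shows the same pattern. The 11 weights below are made up for illustration (not real data), with one very heavy man producing a long right tail; the last lines give one possible answer to the exercise.

```python
from statistics import mean, median, mode

# Hypothetical weights (pounds) for 11 men, already sorted.
weights = [150, 160, 165, 170, 170, 175, 180, 190, 210, 260, 450]

avg = mean(weights)
print(round(avg, 1))     # 207.3: pulled upward by the 450-lb man
print(median(weights))   # 175: the 6th of 11 sorted values
print(mode(weights))     # 170: the only value that appears twice

# Most of the men sit below the mean, so the median falls below it.
below = sum(w < avg for w in weights)
above = sum(w > avg for w in weights)
print(below, above)      # 8 3

# One answer to the exercise: a symmetric set of 11 numbers peaked at 5.
example = [1, 2, 3, 4, 5, 5, 5, 6, 7, 8, 9]
print(mean(example), median(example), mode(example))  # all three equal 5
```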

Statistical significance vs Clinical significance

Statistical significance is a bit of a misnomer; Bernstein (J Am Acad Orthop Surg. 2004 Mar-Apr;12(2):80-8) has argued for the more descriptively accurate term “statistically distinct”.

When we say that two sample means differ and that the difference is statistically significant, we are saying this: if the two samples actually came from the same population (so that the difference we found reflects chance alone rather than a real distinction), a difference at least as large as the one observed would be so unlikely that we can set that possibility aside and conclude that the samples represent groups that are indeed distinct. (Note that statistical significance is always associated with a p-value, which is that probability. Note further that means which differ with a p-value of 0.09 may not be “statistically significant” for most medical journals, which seem to worship at the altar of p < 0.05, yet the 0.05 cutoff is only a convention, and p = 0.09 is just modestly weaker evidence than p = 0.05. It does not, however, translate into 10-to-1 odds that the groups truly are distinct: a p-value is the probability of data like ours assuming no real difference, not the probability that a real difference exists.)
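Written as a formula under the usual two-sided convention (the notation here is ours, not the source’s), with $\bar{x}_1, \bar{x}_2$ the observed sample means, $\bar{X}_1, \bar{X}_2$ the sample means that repeated sampling would produce, and $H_0$ the hypothesis that the two groups share the same true mean:

```latex
p = \Pr\!\left( \lvert \bar{X}_1 - \bar{X}_2 \rvert \ge \lvert \bar{x}_1 - \bar{x}_2 \rvert \;\middle|\; H_0 \right)
```

The conditioning on $H_0$ is the key point: the p-value is computed assuming no true difference, so it cannot by itself give the probability that $H_0$ is false.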

For example, we may measure the mean time to union for ankle fractures treated with surgery and with casting. Obviously, we don’t assess every broken ankle in the world, only a sample. If we find that the mean time to union is 49 days in the casted group and 42 days in the operated group, and that the difference is significant at the 0.05 level, we are saying that if casting and operating truly led to union in the same time, a difference at least this large (7 days) would turn up by chance less than 5% of the time. Now, that says nothing about clinical significance. Is 7 days clinically significant? That is in the eye of the beholder; there is no statistical test for it.
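To make the ankle example concrete, here is a sketch of an exact two-sided permutation test for a difference in means. This is one way of obtaining a p-value, swapped in for illustration; it is not necessarily what any particular study would use (a t-test is more common). All times to union below are hypothetical, chosen only so that each pair of groups has means of 49 and 42 days.

```python
from itertools import combinations
from statistics import mean

def permutation_p_value(group_a, group_b):
    """Exact two-sided permutation test for a difference in means.

    If there is no real difference, group labels are interchangeable, so we
    try every split of the pooled data into groups of the original sizes and
    count how often the mean difference is at least as large as observed.
    """
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    observed = abs(mean(group_a) - mean(group_b))
    extreme = total = 0
    for idx in combinations(range(len(pooled)), n_a):
        chosen = set(idx)
        a = [pooled[i] for i in chosen]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        # Small tolerance guards against floating-point ties.
        if abs(mean(a) - mean(b)) >= observed - 1e-9:
            extreme += 1
    return extreme / total

# Hypothetical days to union; both comparisons show the same 7-day difference.
casting = [45, 47, 49, 51, 53]        # mean 49
surgery = [38, 40, 42, 44, 46]        # mean 42
casting_noisy = [29, 39, 49, 59, 69]  # mean 49, far more spread
surgery_noisy = [22, 32, 42, 52, 62]  # mean 42, far more spread

print(round(permutation_p_value(casting, surgery), 4))  # 0.0159 (4 of 252 splits)
print(permutation_p_value(casting_noisy, surgery_noisy) > 0.05)  # True
```

The 7-day difference, and so its clinical meaning, is identical in both comparisons; only the spread of the data changes whether it is “statistically significant.”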