Let's say that we have two boards, Board A and Board B. The scores in the two boards are as follows:
Board A | Board B |
100 | 98 |
100 | 83 |
98 | 82 |
95 | 81 |
93 | 80 |
Looking at the above scores, it seems that the test paper for Board A was a lot easier. If we compare the scores directly, Board A students have an unfair advantage. That's why we need to normalise the scores in order to properly judge the relative merits of the students from different boards.
The proposed normalisation scheme (according to the newspaper) is as follows:
- The ratio of Board A's highest mark to Board B's highest mark constitutes a multiplication factor
- This multiplication factor is applied to all of Board B's scores
Board A | Board B | Normalised Board B |
100 | 98 | 98x(100/98) = 100 |
100 | 83 | 83x(100/98) = 84.7 |
98 | 82 | 83.7 |
95 | 81 | 82.7 |
93 | 80 | 81.6 |
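The proposed scheme is simple enough to reproduce in a few lines. This is a minimal sketch that reproduces the table above, assuming Board A is the reference board. The function name `max_ratio_normalise` is hypothetical, not something from the newspaper.

```python
def max_ratio_normalise(reference, scores):
    """Scale every score by max(reference) / max(scores), as in the proposed scheme."""
    factor = max(reference) / max(scores)  # decided entirely by the two top marks
    return [s * factor for s in scores]


board_a = [100, 100, 98, 95, 93]
board_b = [98, 83, 82, 81, 80]
print([round(x, 1) for x in max_ratio_normalise(board_a, board_b)])
# prints [100.0, 84.7, 83.7, 82.7, 81.6]
```

Because the factor depends only on the two top marks, a single student decides it for everyone. If Board B's lone 98 were removed, the factor would jump from about 1.02 (100/98) to about 1.20 (100/83).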
Now we can see why this method is so broken. Although a casual glance tells us that Board B's test paper was a lot harder, the scores after normalisation have hardly changed: none of them rises by more than 2 marks! This is because a single student in Board B scored 98. This one data point is an exception to the rule, yet it has influenced the process so much as to render the normalisation completely meaningless.
This is an example of broken statistics. The top mark is usually an outlier, and it's a bad idea to base statistics about a whole data set on its outlier values.
I'm pretty amazed that they adopted this method of normalisation. Surely some statistician must have brought up this issue??
So what can be done?
I'm not a statistician, but here are some ideas that come to mind.
Fitting to a normal curve
This works by mapping the top mark to 100 and the bottom mark to 0, then placing the intermediate scores according to a normal distribution with mean 50 and some experimentally obtained standard deviation. The two normalised distributions can then be compared.
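The description above leaves the exact mapping open, so the sketch below shows only one possible reading. Each intermediate score is turned into its rank position p between 0 (bottom) and 1 (top), with tied scores sharing their average rank. That p is then pushed through the inverse CDF of a normal distribution with mean 50. The top and bottom marks are pinned to 100 and 0 because the inverse CDF is infinite at p = 0 and p = 1. The helper name `fit_to_normal_curve` and the default `sigma=15.0` are assumptions: the default only stands in for the "experimentally obtained" standard deviation.

```python
from statistics import NormalDist


def fit_to_normal_curve(scores, sigma=15.0):
    """Hypothetical helper: rank-based mapping of scores onto N(50, sigma), clipped to 0-100.

    sigma=15.0 is a placeholder, not an experimentally obtained value.
    """
    lo, hi = min(scores), max(scores)
    n = len(scores)
    curve = NormalDist(mu=50.0, sigma=sigma)
    result = []
    for x in scores:
        if x == hi:        # top mark is pinned to 100
            result.append(100.0)
        elif x == lo:      # bottom mark is pinned to 0
            result.append(0.0)
        else:
            below = sum(1 for s in scores if s < x)
            equal = sum(1 for s in scores if s == x)
            # Average 0-based rank of x scaled to (0, 1); it is strictly inside
            # the interval because x is neither the minimum nor the maximum.
            p = (below + (equal - 1) / 2) / (n - 1)
            # Clamp in case a large sigma pushes the value outside 0-100.
            result.append(min(100.0, max(0.0, curve.inv_cdf(p))))
    return result


board_a = fit_to_normal_curve([100, 100, 98, 95, 93])
board_b = fit_to_normal_curve([98, 83, 82, 81, 80])
```

Under this reading, Board A's 98 and Board B's 82 both land on 50, since each is the middle-ranked score on its board.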
Drawbacks: This only works if the score distribution is normal! Usually it is not. The distribution is generally skewed towards higher marks, since many more people pass a test than fail it. A common mistake is to take a non-normal distribution and force it onto a normal curve.
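A quick sanity check before fitting a normal curve is to measure how lopsided the scores are. The sketch below uses the moment coefficient of skewness, g1 = m3 / m2^1.5. This is one standard estimator among several, chosen here because it needs only the standard library. The function name is hypothetical. A value near 0 is consistent with symmetry. A negative value means the long tail points towards low marks, with most scores bunched at the high end, which is the shape described above.

```python
from statistics import fmean


def sample_skewness(scores):
    """Moment coefficient of skewness g1 = m3 / m2**1.5 (hypothetical helper)."""
    mean = fmean(scores)
    m2 = fmean((x - mean) ** 2 for x in scores)  # second central moment
    m3 = fmean((x - mean) ** 3 for x in scores)  # third central moment
    if m2 == 0:
        raise ValueError("all scores are equal; skewness is undefined")
    return m3 / m2 ** 1.5
```

Board B from the example comes out positive, because its lone 98 forms a long tail at the top. A typical pass-heavy exam would be expected to come out negative. Either way, a clearly non-zero value is a warning against assuming normality.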
Percentiles
Another commonly used scheme is percentiles. The percentile is the percentage of people who scored below you. So the 95th percentile means that 95% of the people who took the test scored below you. In other words, you are in the top 5%. Then, instead of comparing the absolute marks, you compare the percentiles.
This is like comparing rank, except that it accounts for the fact that different numbers of students might have taken the two tests.
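Under the definition above (the percentage of all test takers, yourself included, who scored strictly below you), the percentile can be computed directly. This is a minimal sketch; the name `percentile_rank` is hypothetical.

```python
def percentile_rank(score, scores):
    """Percentage of all test takers who scored strictly below `score`."""
    below = sum(1 for s in scores if s < score)
    return 100 * below / len(scores)


board_a = [100, 100, 98, 95, 93]
board_b = [98, 83, 82, 81, 80]
print([percentile_rank(s, board_b) for s in board_b])
# prints [80.0, 60.0, 40.0, 20.0, 0.0]
```

Comparing across boards, Board A's 95 and Board B's 81 both sit at the 20th percentile, so this scheme treats them as equivalent.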
Drawbacks: A big drawback of percentiles is that they can break down in regions where scores are densely packed. Take Board B from the example above:
Score | Percentile |
98 | 80 |
83 | 60 |
82 | 40 |
81 | 20 |
80 | 0 |
As you can see, only 3 marks (80 to 83) separate the 0th percentile from the 60th percentile, while the 15 marks between 83 and 98 are worth only 20 percentile points. Of course, the effect is pronounced in this example because the sample size is so small. The same thing happens to a lesser degree in larger samples wherever the data is very dense.
Conclusion
Neither of the above solutions is particularly satisfying. Both introduce distortions by mapping one distribution onto another. In one case we map a non-normal distribution onto a normal one; in the other, we map it onto a linear (uniform) one.
The ideal solution would be to find out the actual distribution of test scores. Once that is done, both sets of scores can be equalised using the parameters of that distribution and then compared. Since the distribution will be the same for both sets of test scores, the mapping will not introduce any distortion and the comparison will be fair.
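Which distribution test scores actually follow is left open here, so the sketch below shows only the mechanics of the last step. Suppose both boards have been fitted with the same family of distribution, with CDFs F_A and F_B. A Board B score x can then be moved onto Board A's scale as F_A^-1(F_B(x)). The helper name `equate` is hypothetical. `NormalDist` from Python's `statistics` module is used purely as a stand-in family because it provides `.cdf()` and `.inv_cdf()`. As argued above, the real family would have to be found from data.

```python
from statistics import NormalDist


def equate(score_b, dist_b, dist_a):
    """Hypothetical helper: map a Board B score onto Board A's scale.

    dist_a and dist_b must be fitted distributions of the SAME family,
    each exposing .cdf() and .inv_cdf().
    """
    return dist_a.inv_cdf(dist_b.cdf(score_b))


board_a = [100, 100, 98, 95, 93]
board_b = [98, 83, 82, 81, 80]
# Placeholder family only: scores are usually NOT normal, as discussed above.
dist_a = NormalDist.from_samples(board_a)
dist_b = NormalDist.from_samples(board_b)
top_b_on_a = equate(98, dist_b, dist_a)
```

With the normal placeholder, Board B's 98 maps to roughly 102.7 on Board A's scale, above the maximum possible mark. That is one symptom of fitting the wrong family, and it is why the choice of distribution matters.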