Wednesday, April 6, 2011

Random Thought: Statistics in science

I was talking with a friend of mine about the research project she was working on, and we came to an agreement: statistics should not be done by the researcher. There are two main reasons why I think the statistical analysis of a data set should not be done by the researcher who gathered it.



The first reason is that, unless you are in the field of statistics or mathematics, it is unlikely that you understand the reasoning behind the tests being run. How many scientists know the difference between the Fisher's LSD and Tukey's HSD post-hoc tests? How many will actually test their data for normality before running parametric tests that assume normally distributed data? As a rule, most researchers I know simply take the data, run ANOVAs, and accept the output. Many don't have any statistical education beyond a few undergraduate-level courses. A statistician will be able to run the appropriate tests for the appropriate reasons. They are, by far, better equipped to handle the data.

The second reason is that the statistician has no bias. They will analyze the data without caring what the results say. No matter what the ideal is, researchers do want their research to lean one way more than another. Most of the time, they wish for the research to show that the research target has an effect. Publishing a paper saying that 'x' channel is related to 'y' illness is easier than publishing one that says there is no relation between the two. This bias may not even be conscious. A statistician has no stake in the matter. When a researcher removes an 'outlier', are they doing so for a better fit, or so that the statistics will tell the story that they wish to tell? When they choose the tests that need to be run (or if they run multiple tests), which results do they use?
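To put a rough number on the worry about running multiple tests, here is a minimal sketch. It assumes every test is independent, that there is no real effect anywhere, and that each test uses a 0.05 significance level; none of those figures come from an actual study, and the function names are just illustrative. Under those assumptions, the chance that at least one test comes out "significant" purely by accident is 1 - (1 - alpha)^m for m tests.

```python
def pairwise_comparisons(num_groups: int) -> int:
    """Number of distinct pairs among num_groups groups: k(k-1)/2."""
    return num_groups * (num_groups - 1) // 2


def familywise_error_rate(num_tests: int, alpha: float = 0.05) -> float:
    """Chance of at least one false positive across num_tests tests.

    Assumes the tests are independent and that no real effect exists,
    which is a simplification (pairwise comparisons share data).
    """
    if num_tests < 0:
        raise ValueError("num_tests must be non-negative")
    return 1.0 - (1.0 - alpha) ** num_tests


if __name__ == "__main__":
    # Hypothetical group counts, chosen only for illustration.
    for groups in (3, 5, 10):
        tests = pairwise_comparisons(groups)
        rate = familywise_error_rate(tests)
        print(f"{groups} groups, {tests} comparisons: "
              f"chance of a spurious hit is about {rate:.2f}")
```

The more comparisons that are run without any correction, the easier it becomes to find something publishable by chance, which is exactly where a disinterested statistician would step in with an adjusted procedure.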

I'm not suggesting that researchers should dump oodles of raw data upon unsuspecting statisticians and merely wait for the result. There does have to be communication between the two. The statistician may have to know how the data was collected, what the data means, and what type of data it is (i.e., nominal, ordinal, interval, or ratio). Sometimes the researcher needs the output in a certain unit, which requires additional computations, and this needs to be made known to the statistician.

My suggestion is that each department of a research institution have dedicated staff statisticians. They should not be employed directly by the researchers, in order to avoid any form of financial coercion. Ideally, there should be at least one statistician per lab, but I realize that this is not likely. There can, of course, be an appeals process whereby the case is given to another statistician and the analysis redone. Statisticians should also be rotated among the researchers so that bias cannot form. The benefit is not only to researchers. This kind of system means an increase in employment, which is always a plus for the economy. It could also increase interest in mathematics as a career, which is a plus for academia. Good things all around, really.
