The Top Ten Reasons to be a Research Skeptic

If you’re like me, you often find that articles in the published empirical literature on management, organization, and international business (among many other fields) are confusing, frustrating, and inconclusive. The overwhelming majority of work in these fields embraces a “soft” social science research model, which I call the Generally Accepted Soft Social Science Publishing Process (the GASSSPP). The GASSSPP suffers from a number of deficiencies that not only make this kind of work less than scientific, but actually defeat the objectives of science. (If you’d like to know more about the reasons or references for my skepticism, click here for a paper on this I published in Russia, or go to my Social Science Research Network link for a short free book on the subject.) Here are my top ten reasons for being a skeptic about the GASSSPP literature:

1. Nearly all conclusions in the empirical GASSSPP literature are incorrectly based on statistical significance. Statistical significance means virtually nothing. A result is not meaningful because it “achieves significance” at p < .05 (or at p < .01 or p < .001, for that matter). Contrary to common belief, the p level does not give the odds that a study outcome is due to chance; it is only the probability of obtaining a result at least as extreme as the one observed if the null hypothesis were true. Statistical significance does not specify the likelihood that an outcome would replicate if the study were repeated, nor does the p level predict the number of statistics that should be significant by chance. Rejection of a null hypothesis does not mean the alternative is correct. These and other myths about statistical significance permeate the literature, and are the basis of GASSSPP “evidence.”

2. The only valid basis for a statistical conclusion is effect size (how big is a difference, or how strong is a relationship), which is almost completely ignored in the GASSSPP literature.

3. GASSSPP research is virtually never replicated. Owing to the mistaken beliefs about p levels (see item 1 above), studies are almost never repeated, even though replication is a core requirement of real science. Even if a GASSSPP study accidentally stumbled onto what might be an important result, lack of replication assures that we’ll never know; it also assures that junk results will never be flushed out of the system.

4. Measurement is the heart of science, but it is treated very casually in the GASSSPP. One recent study concluded that the validity of measures in the journals is declining, and that virtually no attention is paid to the validity of measures. Reliability, of course, is not the same as validity, but in GASSSPP journals it is usually treated as if it were: consistency is literally considered the same as accuracy, even if it means simply repeating the same mistake.

5. Peer review does not establish the quality of research nearly so much as it enforces the norms of a discipline. When these norms fail to do what science does, as with the GASSSPP, peer review enforces making the same mistakes throughout the discipline. If peer review actually ensured accuracy, the majority of GASSSPP research would never have been published in the first place, owing to erroneous interpretation of statistical significance.

6. Most GASSSPP researchers believe a null hypothesis is the same as a scientific hypothesis. They are not the same, and not even necessarily related.

7. The most likely explanation for the outcome of any small-sample study is sampling error, the extent of which published investigation has shown to be “grossly underestimated” by most researchers. Because the majority of published research is in the “small sample” category, sampling error is also the most likely explanation for observed GASSSPP “results.”

8. Most GASSSPP literature reviews (where every article begins) incorrectly treat every study as if it has equal weight. The only correct way to cumulate small-sample studies is through meta-analysis; failing that, a conservative evaluation of the effect sizes from each study is the only indication of whether the literature shows anything.

9. Thomas Kuhn never said there are legitimate models of science other than “normal” science. He actually refers to “normal science” as the careful, conservative (often dull) research that incrementally makes progress in understanding a subject of interest. It is when normal science creates a body of results not supportive of established models that breakthrough thinking is needed, and this is what Kuhn refers to as a “paradigm shift.” But he never said that there are other models of science which are just as “scientific” as normal science. Researchers who believe that the GASSSPP constitutes a legitimate type of science should read Kuhn closely (and perhaps for the first time).

10. Professional organizations and journal editors (including the “top” journals) have shown little or no interest in addressing these and other serious weaknesses in the GASSSPP literature over the past 50 years, even though compelling evidence on these errors has been published, much of it in our top journals!
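
Items 1 and 2 can be made concrete with a little arithmetic. The sketch below is an illustration only, under assumptions that are mine and not drawn from any study: two equal-sized groups, a known standard deviation of 1, and a simple two-sided z test. The function name two_sided_p and the example effect sizes and group sizes are hypothetical. For a fixed standardized effect size d, the p level is driven by sample size, so a trivial effect can “achieve significance” while a large one does not.

```python
from math import sqrt
from statistics import NormalDist


def two_sided_p(d, n_per_group):
    """Two-sided p for a two-sample z test with known unit variance (an assumption).

    d is the standardized mean difference (the effect size);
    n_per_group is the number of observations in each of the two groups.
    """
    z = d * sqrt(n_per_group / 2)  # test statistic grows with sqrt(n)
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical scenarios: (label, effect size d, observations per group)
scenarios = [
    ("trivial effect, huge sample", 0.05, 10_000),
    ("trivial effect, modest sample", 0.05, 100),
    ("large effect, tiny sample", 0.80, 5),
]

for label, d, n in scenarios:
    print(f"{label}: d = {d:.2f}, n = {n} per group, p = {two_sided_p(d, n):.4f}")
```

Under these assumptions the first scenario falls well below p < .001, while the third does not reach p < .05 even though its effect is sixteen times larger. The p level here says more about the sample size than about the size of the effect.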
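
The point in item 8 about weighting can be sketched in the same spirit. The code below shows one common form of meta-analysis, a fixed-effect, inverse-variance pooling of standardized mean differences. It is a minimal illustration, not the only meta-analytic method and not a procedure taken from the sources mentioned here. The three studies are invented for the example, the function names are hypothetical, and the variance formula is the usual large-sample approximation for d.

```python
def d_variance(d, n1, n2):
    """Approximate sampling variance of a standardized mean difference d."""
    return (n1 + n2) / (n1 * n2) + d * d / (2 * (n1 + n2))


def pooled_effect(studies):
    """Fixed-effect, inverse-variance pooled estimate and its standard error.

    studies is a list of (d, n1, n2) tuples, one per study.
    """
    weights = [1 / d_variance(d, n1, n2) for d, n1, n2 in studies]
    total = sum(weights)
    estimate = sum(w * d for w, (d, _, _) in zip(weights, studies)) / total
    standard_error = (1 / total) ** 0.5
    return estimate, standard_error


# Invented data: (effect size d, group 1 size, group 2 size)
studies = [
    (0.90, 8, 8),      # small study reporting a large effect
    (0.10, 200, 200),  # large study reporting a small effect
    (0.05, 150, 150),  # large study reporting a small effect
]

unweighted = sum(d for d, _, _ in studies) / len(studies)
estimate, standard_error = pooled_effect(studies)
print(f"equal-weight average of d: {unweighted:.2f}")
print(f"inverse-variance pooled d: {estimate:.2f} (SE {standard_error:.2f})")
```

With these invented numbers, treating every study as equal gives an average d of 0.35, while the weighted estimate is roughly 0.10, because the one small study with the large effect carries very little weight.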

For those who may not believe my arguments, let me refer you to William H. Starbuck (2006), The Production of Knowledge: The Challenge of Social Science Research. New York: Oxford (ISBN 0-19-928853-4). Starbuck and I are very much of the same mind on the GASSSPP, except that he is more willing to tolerate some forms of statistical significance testing because of its deeply ingrained usage in the social sciences, despite awareness of its flaws (a position similar to that of William Hays in his 1963 edition of Statistics for Psychologists). I disagree, and still maintain what I presented to the Research Methods Division at the 1992 national Academy of Management meetings: “Statistical significance testing will not die or change of its own accord—we must kill it.” I believe that now more strongly than ever.