What is management junk science? It is the body of social-science-based literature that typifies nearly all of the empirical research in the scholarly business and management journals, including the “top” journals. I refer to this set of methods and assumptions as the Generally Accepted Soft Social Science Publishing Process, or the GASSSPP.
The name “GASSSPP” needs a little explanation. It has three sources. First, with all due respect to the accounting profession, it is Generally Accepted in the same way as the Generally Accepted Accounting Principles (GAAP) followed in the US. GAAP is based on convention and adoption, not science. Second, the late Paul Meehl and others refer to the “‘Soft’ Social Sciences” as those where measures of variables typically cannot be externally verified; for example, the study of the relationship between job satisfaction and individual motivation is a “soft” field because we have no firm standards to measure either variable. Third, the term “Publishing Process” refers to the primary reason for doing this kind of work: the GASSSPP is a means for researchers to get their work published and to build a resume filled with a history of serious academic publication. It may outwardly appear to follow the same rules as the real sciences, but close inspection of the GASSSPP shows that it is neither science nor scientific.
The reason that GASSSPP research is junk science is that it follows terminally flawed social-science research methods and assumptions: we come up with an idea, collect data presumably relevant to that idea, and then decide whether we’ve found anything on the basis of “statistical significance,” that is, by referring to the p level. The p level is the probability of getting data at least as extreme as what we observed if our null hypothesis is true, or P(D|H). In and of itself, p means very little, and it is never proof of the “correctness” of the hypothesis being tested. This misuse of p is the root problem with the GASSSPP, and it has become the foundation stone on which an entire mythology of research practice and interpretation has been constructed since roughly 1930. Peer review does not bestow the status of science on this body of work; it makes the problems worse by reinforcing the unscientific practices inherent in the GASSSPP. The core deficiencies are summarized in my Top Ten Reasons to be a Research Skeptic, but let me list the major flaws and assumptions in the GASSSPP mythology here:
1. p tells us the odds that rejection of our null hypothesis is due to chance, i.e., P(H|D). In fact, it only tells us the probability of the data we observed given our null hypothesis, which is P(D|H). In nearly all of our journals, P(D|H) is incorrectly interpreted as P(H|D), and they are entirely and materially different.
2. Statistical significance establishes existence of a statistical effect. In fact, significance and effect are independent of each other for any sample of reasonable size, and large samples assure statistical significance even when there is no meaningful effect.
3. p < .05 proves we have support for an hypothesis. In fact, p alone is never proof of anything, and the .05 level is a convention lacking any scientific basis whatsoever.
4. p < .05 is a “significant” outcome, p < .01 is “very significant,” and p < .001 is “highly significant.” In fact, there is no linear scale of outcome strength as a function of the p level, and these all-too-common statements are always incorrect.
5. p is the appropriate metric for those interested in theory development, and effect sizes matter only when practical application is the issue. In fact, p alone is never the appropriate metric to evaluate outcomes, and it establishes neither practical nor theoretical importance.
6. The p level indicates the likelihood that an outcome would not replicate if the study were repeated. In fact, p provides absolutely no information about replication.
7. The p level predicts the number of statistical outcomes that would be significant by chance. In fact, this would be true only if one is certain that P(type II error) = 0, and it never is.
8. A null hypothesis is a scientific hypothesis. In fact, it is not—the null hypothesis is an artifact used to construct an empirical question (and is never used in real science), where a scientific hypothesis is usually a tentative explanation of a phenomenon based on limited evidence; the two are completely unrelated.
9. Rejecting a null hypothesis means the alternative is correct. In fact, it does not—if a specific alternative is true, that must be demonstrated independently.
10. Reliability can be substituted for validity. In fact, they are absolutely not the same, but GASSSPP journals usually treat them as if they were: consistency is taken for accuracy, even if it means simply repeating the same mistake. One study (Scandura & Williams, 2000) concluded that the validity of measures in several top journals has declined in recent years rather than improved.
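To make the second point in the list above concrete, here is a minimal sketch of how sample size alone can drive the p level. It assumes a simple two-sample z-test with a known standard deviation of 1 and equal group sizes, and it assumes the observed mean difference is exactly 0.02 standard deviations, a hypothetical value chosen to stand in for an effect too small to matter. The function names are mine, for illustration only.

```python
import math


def two_sided_p(z: float) -> float:
    """Two-sided p-value for a standard normal test statistic z."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def p_for_observed_difference(diff_in_sd: float, n_per_group: int) -> float:
    """p-value of a two-sample z-test (known SD = 1, equal group sizes)
    when the observed mean difference is exactly diff_in_sd."""
    standard_error = math.sqrt(2.0 / n_per_group)
    return two_sided_p(diff_in_sd / standard_error)


if __name__ == "__main__":
    tiny_effect = 0.02  # hypothetical: 2% of a standard deviation
    for n in (100, 10_000, 100_000, 1_000_000):
        p = p_for_observed_difference(tiny_effect, n)
        print(f"n per group = {n:>9,}   p = {p:.3g}")
```

The effect is the same in every row; only n changes. Under these assumptions the result is nowhere near p < .05 with 100 cases per group and far below p < .001 with a million, which is why the p level by itself says nothing about whether an effect is meaningful.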
In the overwhelming majority of GASSSPP research, Type II error is treated as if it doesn’t matter; neither does statistical power in a test. It comes as a shock to most researchers to learn that the real “accuracy” of an outcome from obtaining p < .05 is about 50%, equal to that from flipping a coin; not only that, but since large samples nearly guarantee statistical significance, when very large samples are used that level of “accuracy” drops to less than 50%. With very large samples, a researcher would get better results by literally flipping a coin and not bothering with statistical analysis at all. It is widely believed that p levels are appropriate for theory testing, even if we accept that they are useless for assessment of practical results; there is absolutely no truth to this belief, either.
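One way to see why Type II error and power cannot be ignored is to work out, with Bayes' rule, how often a "significant" result comes from a true null hypothesis. The sketch below is illustrative only: the prior probabilities and power values are hypothetical inputs I have picked for the example, not figures from any study, and the calculation conditions on the event "p < alpha" rather than on the exact data observed.

```python
def prob_null_given_significant(alpha: float, power: float, prior_null: float) -> float:
    """P(null is true | result is significant), by Bayes' rule.

    alpha      -- P(significant | null true), the Type I error rate
    power      -- P(significant | null false), i.e. 1 - P(Type II error)
    prior_null -- assumed share of tested null hypotheses that are true
    """
    false_positives = alpha * prior_null
    true_positives = power * (1.0 - prior_null)
    return false_positives / (false_positives + true_positives)


if __name__ == "__main__":
    # (alpha, power, prior_null): all three values are hypothetical
    scenarios = [
        (0.05, 0.80, 0.50),
        (0.05, 0.50, 0.50),
        (0.05, 0.20, 0.90),
    ]
    for alpha, power, prior_null in scenarios:
        share = prob_null_given_significant(alpha, power, prior_null)
        print(f"alpha={alpha}, power={power}, prior_null={prior_null}: "
              f"P(null | significant) = {share:.3f}")
```

The same alpha of .05 yields very different answers depending on power and on how many of the tested nulls are true to begin with, so the p level cannot be read as the probability that a finding is wrong. The share of significant results that are "due to chance" equals alpha only in the special case the formula makes visible.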
Does all of this mean that everything in the scholarly journals is management junk science (hereinafter, MJS)? No. There is some useful work being done, but not in the GASSSPP research. The greater problem is that GASSSPP methods are now dominant in business and management research; and since the GASSSPP is completely structured on myths and falsehoods, it is a scientific black hole, sucking in everything that gets to the publishing “event horizon.” I think there are some published works that are probably excellent ideas and should become much more prominent in our theory and perhaps in practice. I say “probably,” however, because under the GASSSPP we’ll never know if this work was inspired insight or a singular confluence of luck and circumstance. The same is true if an article is the dumbest damn thing anybody ever cooked up. One of the numerous flaws of the GASSSPP is the lack of replication studies, which are a hallmark of real science. Without them, the GASSSPP guarantees we’ll never know what anything really is: once a piece gets peer-reviewed and published, it’s “truth.”
There are many who will read this blog and already know most of the flaws in the GASSSPP and the problems with MJS. There are others who will read the Top Ten and some of my comments about MJS and will think this is all garbage. I’ve provided quite a number of references on the well-documented flaws of the GASSSPP on the reference pages, for those who may not be familiar with this area of the research methods literature.
But it is not my objective to be antagonistic. I want to be provocative, but for purposes of moving our research into the realm of science, so that the “junk” label no longer applies. I’m approaching the end of my active career, and when I decided to take up doctoral studies in the late 1960s I thought that by the end of the last century the protoscience research model we began to adopt about 1960 would have matured into something much more scientifically valid. That is clearly not the case: not only have we made almost no progress toward that status, but in some cases we have gotten worse.
I’ll expand on some of those Top Ten reasons and their implications as time goes by, and for the time being I want to give everyone some questions to consider that relate to these problems and may suggest a way forward. But my most immediate and important objective is to provide some concrete recommendations about what we can do to elevate the quality and the status of our research. This may seem like starting at the end, making recommendations before stating the basis for them, but I don’t think so. For those who are already knowledgeable about the shortcomings of the GASSSPP, I’m more anxious to correspond and brainstorm with them about what we can do to improve our research than I am to convince people that what is going on in most of our journals is MJS. The latter is a foregone conclusion; what to do about it and how to do it is a wide open opportunity in our discipline.
Some Questions
1. How did we (and most of social science) institutionalize a research paradigm that falls far short of real science? (If this question puzzles or troubles the reader, I suggest reading Kuhn (1962, or any later edition), pp. vii-viii.)
2. If peer review is the primary assurance of research quality, as is widely believed, how has it failed to recognize, let alone correct, the obvious flaws in the GASSSPP in our literature?
3. Measurement is often regarded as one of the cornerstones of science (and I personally agree with that point of view). Then why do real sciences like physics and biology, where “objective” measures are easier to obtain than in the GASSSPP, put so much effort into measurement while the social sciences do not? (A shorter version of this question might be: If measurement is harder in the social sciences than in the “exact” sciences, why do social scientists pay less attention to it than real scientists?)
4. In the real sciences, a “theory” is a proffered explanation for a phenomenon that has been built up from a body of associated research which seems to support that explanation. In the GASSSPP a “theory” seems to be born fully developed, without the bothersome problem of a clear research path that points to it. How is that possible?
Corollary: How weak can “weak theory” be before it is no longer “theory?”
5. Why do we see retractions of research findings for reasons of error in the exact sciences, but never see them in GASSSPP research?
6. How can an allegedly professional research discipline that is nearly obsessed with measures of its “impact” be so unconcerned with repeated findings that it has no impact on its profession?
Corollary: Can anyone demonstrate that questions (5) or (6) are not so?
7. What is the real difference between a multiple correlation study and a multiple regression study? (This is admittedly not fundamental to the flaws of the GASSSPP, but is certainly related, and is a question I think we should all ponder given the nearly universal adoption of multiple regression methodology in the scholarly journals.)
I’d love to have readers contribute some questions of their own.