One of the reasons that I label the overall body of GASSSPP business-school research as “junk science” is that most of what is reported as findings is nothing more than the confirmation of the well-known fact that at some low level, everything is related to everything else. Many statisticians and methodologists have commented on this over many years (Blalock, 1960; Hays, 1963; Meehl, 1986, 1990; among others). Several have given this phenomenon different names, including “noise” (Starbuck, 2006), “ambient correlation noise” (Lykken, 1968), and my all-time favorite, the “crud factor” (Meehl, 1990). Borrowing the idea of the signal-to-noise ratio from electrical engineering (where “signal” refers to what is real in our findings, and “noise” is all of the random, happenstance, and never-to-be-tested-again results we turn out), one reason I regard the body of GASSSPP research as junk is because noise is clearly the dominant component.
A major self-inflicted wound in GASSSPP research is the continued application of the “test” of whether a statistic is statistically significantly different from zero. For example, in the current game of revelation by regression, we test our b coefficients or betas to see if they are significantly different from zero, which of course they are, and so we claim support for an hypothesis because a b coefficient of 0.03 in a study with an N of 12,000 achieves p < .01. We then pat ourselves on the back because we have had the great insight to propose this relationship. (The only problem is that the b coefficient is the slope of the relationship between that variable and the dependent variable, meaning that for every unit change in the independent variable, a change of b occurs in the dependent variable; in this case, a b of .03 is essentially a flat line: virtually no change, and no meaningful effect.) But anyone who wants to challenge this discovery of nothing has to have an argument that counters the standard test found in nearly every statistics text.
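The arithmetic behind such a “finding” is easy to check. Below is a minimal sketch in pure Python, with one assumption of mine for illustration: that the reported 0.03 is on the correlation scale, i.e., a standardized coefficient. With N = 12,000 it sails past p < .01 while explaining less than a tenth of one percent of the variance.

```python
import math

def corr_t_pvalue(r, n):
    """Two-sided p-value for H0: rho = 0, using the textbook statistic
    t = r * sqrt((n - 2) / (1 - r^2)); with df near 12,000 the t
    distribution is effectively standard normal, so the normal tail
    is used for p."""
    t = r * math.sqrt((n - 2) / (1.0 - r * r))
    p = 2.0 * (1.0 - 0.5 * (1.0 + math.erf(abs(t) / math.sqrt(2.0))))
    return t, p

t, p = corr_t_pvalue(0.03, 12000)
print(f"t = {t:.2f}, p = {p:.4f}, variance explained = {0.03**2:.2%}")
# t ≈ 3.29, p ≈ 0.001: "significant," yet r^2 is 0.09% of the variance
```

Nothing in the calculation depends on the substance of the relationship; with a large enough N, essentially any nonzero coefficient clears the bar.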
I propose there is a substitute. In a few words, the value of the crud factor has been discovered: it is a correlation coefficient of +0.09, and that value, not zero, should be the test standard. Where did this come from? Webster and Starbuck (1988) analyzed 14,897 correlations published in complete matrices in three of our top journals: Administrative Science Quarterly (2,583), the Academy of Management Journal (5,740), and the Journal of Applied Psychology (6,574). The data came from 261 studies covered in 12 major literature reviews, all originating from studies of five primary industrial-organizational (I/O) psychology variables: (1) job satisfaction, (2) absenteeism, (3) turnover, (4) job performance, and (5) leadership. Starbuck (2006) then combined the three resulting distributions into a single graphic (Figure 2.6, p. 49), which I had hoped to be able to reproduce here. Every doctoral student and every researcher in the world should study Starbuck’s figure, and I recommend reading the original Webster and Starbuck (1988) study in its entirety. The three distributions are nearly identical, each with a single mode at about r = +0.09.
The figure below illustrates what Starbuck found. It is an adaptation of Figure 2.6 (which, for reasons unexplained, Oxford would not give permission to publish); the adaptation is that a single irregular heavy curve encompasses all three individual curves in the original figure, so that no individual curve lies outside the heavy curve.
In some ways, I like my adaptation better than the original, since it shows how little difference there is between the individual curves—they are essentially identical. With a nearly perfect symmetry around the mode, this composite curve is an excellent illustration of the “normal curve of error,” in that most GASSSPP findings are rather weak, with the probability of a strong positive or negative correlation being low.
The degree to which these distributions overlap is obviously enormous, and while Webster and Starbuck (1988) did not include a goodness-of-fit test among their analyses, simple observation suggests that such a test would be unable to distinguish the three distributions. The authors’ analyses of the effect sizes, i.e., the strength of the correlations, led to the general conclusion that our theoretical ability to explain these five I/O variables had either not improved at all or had declined toward zero over the roughly 40 years the studies spanned. There were numerous potential explanations that might account for this, but with respect to NHST, they concluded that “Statistical significance is a very dangerous criterion. It probably causes more harm than good, by inducing researchers who have few observations to discount strong relationships and encouraging those who have many observations to highlight weak relationships. Moreover, a researcher can be certain of rejecting any point null hypothesis, and point null hypotheses usually look quite implausible if one treats them as genuine descriptions of phenomena….” (1988: 122).
Starbuck also notes (2006: 49) that “Finding significant correlations is absurdly easy in this population of variables, especially when researchers make two-tailed tests with a null hypothesis of no correlation. Choosing two variables utterly at random, a researcher has 2-to-1 odds of finding a significant correlation on the first try, and 24-to-1 odds of finding a significant correlation within three tries….” He concludes that the social sciences are “drowning in statistically significant but meaningless noise.” To that, I can only add “Amen!”
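Starbuck’s two odds figures hang together arithmetically, assuming independent tries. He does not state the per-try probability he used; the value of about 0.658 below is my own back-calculation, an assumption for illustration, and it reproduces both the 2-to-1 first-try odds and the 24-to-1 within-three-tries odds:

```python
def odds_within(k, p):
    """Odds of at least one 'significant' result within k independent
    tries, given per-try probability p of significance."""
    prob = 1.0 - (1.0 - p) ** k
    return prob / (1.0 - prob)

p_try = 0.658  # per-try probability implied by the "2-to-1" figure (my estimate)
print(round(odds_within(1, p_try), 1))  # ≈ 1.9, i.e., about 2-to-1
print(round(odds_within(3, p_try), 1))  # ≈ 24.0, Starbuck's 24-to-1
```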
I therefore propose to editors and reviewers that we actually do something that real science does, i.e., learn from prior research, and adopt the value of r = +0.09 as the appropriate baseline value for tests of “significantly different from.” Should there ever be an editorial board of a GASSSPP journal with sufficient courage, the board could require that correlations be more than one standard deviation from +0.09 (not having the original data, I don’t know what that value is). Since roughly 68 percent of normally distributed data fall within one standard deviation of the mean, this editorial policy might contribute significantly (dare I use that word?) to reducing the noise in our findings.
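A minimal sketch of what testing against the crud factor could look like, using the standard Fisher z-transform test of H0: rho = rho0 (the one-standard-deviation rule above would require the original data; the values r = 0.15 and N = 200 here are hypothetical). The point the sketch makes: a correlation comfortably “significant” against zero can fail to clear the +0.09 baseline.

```python
import math

def norm_two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic."""
    return 2.0 * (1.0 - 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0))))

def corr_test(r, n, rho0):
    """Fisher z-transform test of H0: rho = rho0; the transformed
    statistic is approximately N(0, 1) with SE = 1/sqrt(n - 3)."""
    z = (math.atanh(r) - math.atanh(rho0)) * math.sqrt(n - 3)
    return norm_two_sided_p(z)

r, n = 0.15, 200  # a hypothetical published finding
print(f"vs zero: p = {corr_test(r, n, 0.0):.3f}")   # ≈ 0.034: "significant"
print(f"vs crud: p = {corr_test(r, n, 0.09):.3f}")  # ≈ 0.393: noise-level
```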
In my view, the results of such learning would be illuminating, to say the least. For one thing, I would guess that easily 75 percent of our published articles would have to be retracted for failing to beat the “explanatory level” of the crud factor. Rather than having to cite every piece of peer-reviewed noise as reported, researchers would have a basis for disagreeing with published conclusions, which might enable us to actually make progress in several areas of inquiry (perhaps starting with the five that Webster and Starbuck reviewed, about which, after more than 50 years of research, we know very little). We know that “voting” on an argument by simply tallying the confirming and disconfirming papers and summarizing them is neither statistically nor scientifically valid (Hunter & Schmidt, 2004; Hunter, Schmidt, & Jackson, 1982), yet we continue to do it; testing against the crud factor would open the door to all manner of signals that are now totally submerged in the noise. We might even be able to convince an editor or two that real replication studies should be published in important cases, something believers in the GASSSPP seem to consider unnecessary.
References for this page
Blalock, H. M., Jr. (1960). Social Statistics. New York: McGraw-Hill.
Hays, W. L. (1963). Statistics. New York: Holt, Rinehart and Winston.
Hunter, J. E., & Schmidt, F. L. (2004). Meta-analysis: Correcting error and bias in research findings. Thousand Oaks, CA: Sage.
Hunter, J. E., Schmidt, F. L., & Jackson, G. B. (1982). Meta-analysis: Cumulating research findings across studies. Beverly Hills, CA: Sage.
Lykken, D. T. (1968). Statistical significance in psychological research. Psychological Bulletin, 70(3), 151-159.
Meehl, P. E. (1986). What social scientists don’t understand. In D. W. Fiske & R. A. Shweder (Eds.), Metatheory in social science: Pluralisms and subjectivities. Chicago: University of Chicago Press.
Meehl, P. E. (1990). Why summaries of research on psychological theories are often uninterpretable. Psychological Reports, 66 (Monograph Supplement 1-Vol. 66), 195-244.
Starbuck, W. H. (2006). The production of knowledge: The challenge of social science research. New York: Oxford University Press.
Webster, J., & Starbuck, W. H. (1988). Theory building in industrial and organizational psychology. In C. L. Cooper & I. T. Robertson (Eds.), International Review of Industrial and Organizational Psychology 1988 (pp. 93-138). New York: Wiley.