Revelation by Regression?

Anyone who looks at a reasonably well-rated academic journal from the past few years will find that the nearly universal statistical method is now the fitting of a regression model.  One of the things I find most intriguing in the antiprogress of the GASSSPP has been the growth of an apparent belief that if one dumps a bunch of measures into a regression model (or better, several variants of that model), cranks out the model(s), and looks at the p values of the b or beta coefficients and the overall model R-squared, previously hidden truths about the relationships in the model will be revealed.  I refer to this nonsense as revelation by regression.

Because the practice has become so ingrained, I have had numerous dubious opportunities, at my university and at professional meetings, to attend presentations where exploratory research (which has no reason to be evaluated this way) is subjected to a regression analysis, and where the huge N virtually assures that any nonzero coefficient will be statistically significant.  To no one’s surprise, the researcher proudly proclaims that a b of 0.03 that is significant at p < .05 “supports Hypothesis x” (and I wish I were making this up, but this is a direct quote from one of these presentations). The “test” that yielded the magic .05 asks only whether the coefficient differs from zero, and one of the things we have known for many decades is that in real data nothing is ever exactly zero.  Lykken (1968) referred to this politely as “ambient correlation noise”; I prefer the late Paul Meehl’s (1990) designation, the “crud factor.”  If one borrows the electrical engineering idea of the signal-to-noise ratio, GASSSPP research results are principally noise: the “discovery” of the crud factor for a particular set of measures.  This is the only supposedly legitimate scholarly discipline I know of where one can literally build a career by explaining nothing.
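The arithmetic behind this complaint is easy to demonstrate.  Here is a minimal simulation (my sketch, not the presenter’s analysis; it assumes Python with numpy and scipy) in which a trivially small coefficient of 0.03 sails past p < .05 simply because N is huge:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

n = 100_000               # a "huge N"
true_b = 0.03             # a crud-factor-sized effect

x = rng.normal(size=n)
y = true_b * x + rng.normal(size=n)   # y is almost entirely noise

# OLS slope and the usual t-test of b against zero
result = stats.linregress(x, y)
print(f"b = {result.slope:.3f}, r^2 = {result.rvalue**2:.4f}, p = {result.pvalue:.1e}")
# Typical output: b near 0.03, r^2 near 0.0009, p far below .05 --
# "significant," yet explaining less than a tenth of a percent of the variance.
```

With N = 100,000 the standard error of a correlation is roughly 1/√N ≈ 0.003, so even ambient correlations of 0.03 produce t statistics near 10.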

What is the difference between a regression problem and a multiple-correlation problem?  The answer is: what the authors think they know.  When I learned regression, it was a technique to be applied where existing knowledge suggested that one or more variables were “predicted” by one or more others, and that was the idea to be tested.  It was the next logical step in understanding relationships suggested by a number (and often a variety) of prior examinations of variables that seemed related to each other, but where we had no idea which might be “dependent” or “independent” in that set.  Regression has now, somehow, become the way we go about taking a first look at a question.

There is no such thing as revelation by regression, and the misuse of regression modeling that is now nearly standard practice in our literature speaks volumes, but only about how little we know about interpreting our results.  I recommend two things if regression is going to be applied to a problem: (1) Run SPSS or SAS and treat every variable as dependent in turn; I predict the results will be humbling (a sketch of the exercise follows below).  (2) Along with Starbuck (2006), I think we should be required to specify one model and run it, and let our “theory” sink or swim on the outcome.  The type of “model shopping” routinely published in our top journals is yet another betrayal of the extent to which we really don’t know what we’re doing with regression, or with science in general.  Of course, researchers will do the model shopping before they publish their “scientifically objective” results, consistent with the kinds of scientific malpractice Bedeian et al. (2010) report, but at least we wouldn’t have to read about all the misfires.
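For recommendation (1), the exercise need not wait for SPSS or SAS.  A sketch in Python (my illustration, with hypothetical variable names and simulated data; it assumes pandas and statsmodels) refits the same model with each variable taking its turn as the dependent variable:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Placeholder measures; the column names and data are hypothetical.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(500, 4)),
                  columns=["satisfaction", "commitment", "intent", "tenure"])

# Recommendation (1): refit the same model with every variable
# taking its turn as the dependent variable, and compare the fits.
for dv in df.columns:
    X = sm.add_constant(df.drop(columns=dv))
    fit = sm.OLS(df[dv], X).fit()
    print(f"DV = {dv:12s}  R^2 = {fit.rsquared:.3f}")
```

If the R-squared values come out broadly similar, nothing in the data itself privileges one variable as “dependent”; that choice was the researcher’s, which is exactly the humbling point.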

References for this page

Bedeian, A. G., Taylor, S. G., & Miller, A. N. (2010). Management science on the credibility bubble: Cardinal sins and various misdemeanors. Academy of Management Learning & Education, 9(4), 715-725.

Lykken, D. T. (1968). Statistical significance in psychological research. Psychological Bulletin, 70(3), 151-159.

Meehl, P. E. (1990). Why summaries of research on psychological theories are often uninterpretable. Psychological Reports, 66, 195-244 (Monograph Supplement 1-V66).

Starbuck, W. H. (2006). The production of knowledge: The challenge of social science research. New York: Oxford University Press.