Regressions Smessions

Jim Manzi has a terrific takedown of the usual "let's pretend humans are widgets" logic of regression analysis, which has long been the gold standard in social science research. Looking at a piece in the Harvard Business Review, he writes:

What is so striking to me about this article is how unvarnished Knott is in claiming that she has discovered a tool to do exactly what I say is so hard: make useful, reliable and non-obvious predictions for the effect of interventions in social systems. She writes that "Using standard regression analysis, the calculation tells us in a very precise way how productive each of the inputs is in generating output. It tells us, for instance, how much a 1% increase in R&D spending would increase a firm's revenue." Knott asserts that RQ allows the management of a company "to see how changes in your R&D expenditure affect the bottom line and, most important, your company's market value." She even names names, providing a table of what she thinks each of the top 20 public corporations in America should have spent on R&D, and how much more each would be worth if they followed her recommendations.
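In its simplest reading, "how much a 1% increase in R&D spending would increase a firm's revenue" is the slope of a log-log regression, read as an elasticity. Here is a minimal sketch under that assumption. The functional form, the numbers and the helper names (`ols_slope`, `simulate_firms`) are all hypothetical, not Knott's actual RQ specification:

```python
import math
import random

def ols_slope(xs, ys):
    """Slope of a simple least-squares fit of ys on xs."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    return cov / var

def simulate_firms(n, elasticity, seed=0):
    """Hypothetical firms whose revenue depends ONLY on R&D plus noise."""
    rng = random.Random(seed)
    rd, revenue = [], []
    for _ in range(n):
        r = math.exp(rng.gauss(3.0, 1.0))      # R&D spend, log-normally distributed
        noise = rng.gauss(0.0, 0.2)            # everything else, as pure random noise
        rd.append(r)
        revenue.append(10.0 * r ** elasticity * math.exp(noise))
    return rd, revenue

rd, revenue = simulate_firms(5000, elasticity=0.1)

# In a log-log fit, the slope is the % change in revenue per 1% change in R&D.
beta = ols_slope([math.log(r) for r in rd], [math.log(v) for v in revenue])
print(f"estimated elasticity: {beta:.3f}")
print(f"implied revenue change for +1% R&D: {(1.01 ** beta - 1) * 100:.3f}%")
```

The estimate lands near the true 0.1 only because this simulated world contains nothing but R&D and noise, which is exactly the assumption in dispute.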

Bold stuff!

The problem really boils down to causality. Put simply, it's essentially never the case that X, by itself, causes Y. Likely there's a feedback loop between X and Y, and there's usually also a Z (and an A, B, C and D) that influence X, Y and each other in subtle ways. And then there's the minor fact that the symbolic models suffer from a foundational rupture from the, you know, actual phenomena (a point going back to good ole Hume).
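A toy simulation shows the Z problem in miniature. The variables and coefficients are invented for illustration, and X is deliberately given zero true effect on Y. A naive regression of Y on X still finds a sizable slope because Z drives both; controlling for Z (here via the residualize-then-regress trick) removes the spurious effect, but only because the simulation hands us Z to control for:

```python
import random

def ols_slope(xs, ys):
    """Slope of a simple least-squares fit of ys on xs."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    return cov / var

def residualize(vs, zs):
    """Strip the linear effect of zs out of vs (Frisch-Waugh-Lovell step)."""
    b = ols_slope(zs, vs)
    mz = sum(zs) / len(zs)
    mv = sum(vs) / len(vs)
    return [v - mv - b * (z - mz) for v, z in zip(vs, zs)]

rng = random.Random(1)
n = 5000
z = [rng.gauss(0, 1) for _ in range(n)]        # hidden factor (hypothetical)
x = [zi + rng.gauss(0, 1) for zi in z]         # X is partly driven by Z
y = [2.0 * zi + rng.gauss(0, 1) for zi in z]   # Y depends on Z only; X has no effect

naive = ols_slope(x, y)                                     # biased by Z
controlled = ols_slope(residualize(x, z), residualize(y, z))  # same as regressing Y on X and Z
print(f"naive slope: {naive:.2f}, slope controlling for Z: {controlled:.2f}")
```

In real corporate data, of course, nobody hands you the full list of Zs.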

The consequences are clear, and Manzi spells out

why attempts to use methods like those Knott employs (e.g., two-step Instrumental Variable models) to try to isolate the causal impact of variable X on corporate performance can't perform the magic of somehow overcoming the problem of never including so much of the relevant data. I summarized the conclusion as: "There's just no way out of the problem that what makes companies do well or badly is very, very complicated, and therefore isolating the impact of any one variable by lining up some descriptors for a few hundred companies and looking for patterns is like trying to grab liquid mercury."
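The instrumental-variable point can also be made concrete with a toy simulation. Everything here (the variable names, coefficients, and the `leak` parameter) is hypothetical and not drawn from Knott's data. Two-stage least squares recovers the true effect only when the instrument is unrelated to everything left out of the model; once the instrument "leaks" into the unobserved factor, the estimate is biased, and with a single instrument the data alone cannot reveal the leak:

```python
import random

TRUE_EFFECT = 0.5  # hypothetical true effect of x on y

def ols_fit(xs, ys):
    """Return (intercept, slope) of a simple least-squares fit of ys on xs."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    slope = cov / var
    return my - slope * mx, slope

def two_stage_least_squares(w, x, y):
    """Point estimate of the effect of x on y, using w as the instrument.

    Only the coefficient is returned; standard errors from a naive
    second-stage regression would be wrong and are not computed here.
    """
    a, b = ols_fit(w, x)                  # first stage: x on w
    x_hat = [a + b * wi for wi in w]      # the part of x "explained" by w
    return ols_fit(x_hat, y)[1]           # second stage: y on fitted x

def simulate(n, leak, seed):
    """Toy data: z is never observed; leak > 0 makes the instrument w depend on z."""
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]
    w = [leak * zi + rng.gauss(0, 1) for zi in z]
    x = [wi + zi + rng.gauss(0, 1) for wi, zi in zip(w, z)]
    y = [TRUE_EFFECT * xi + 2.0 * zi + rng.gauss(0, 1) for xi, zi in zip(x, z)]
    return w, x, y

results = {}
for leak in (0.0, 0.5):
    w, x, y = simulate(20000, leak, seed=2)
    results[leak] = (ols_fit(x, y)[1], two_stage_least_squares(w, x, y))
    ols, iv = results[leak]
    print(f"leak={leak}: OLS={ols:.2f}  2SLS={iv:.2f}  truth={TRUE_EFFECT}")
```

With `leak=0.0` the 2SLS estimate should sit near 0.5 while OLS is pulled upward by z; with `leak=0.5` both are off, and nothing in the observed w, x and y says which world you are in.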

Since at least one Stag Staffer has formal training in these methods, I'd be very curious to hear what he has to say on the matter.