Multiple Regression

 

Problems with multiple regression are well documented; see, for example, Mosteller and Tukey (1977)[1] and Freedman (1983)[2]. And the results of a stepwise regression, of course, depend on which direction you take your steps. But the best way to convince oneself of the problem is to do the validation you should have urged on your students in the first place: divide the data set into halves, fit a regression model to one half, and then see whether this same model produces even halfway satisfactory results with the second.

            Jaques Cuze
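
The split-half check described in the letter above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the letter writer's own procedure: it uses NumPy, simulated pure-noise data in the spirit of Freedman's screening example, and a simple correlation screen standing in for stepwise selection. The names split_half_check, r_squared and n_keep are hypothetical.

```python
import numpy as np


def r_squared(y, y_hat):
    """Coefficient of determination; it can be negative on held-out data."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def split_half_check(X, y, n_keep=5, seed=0):
    """Screen and fit on one random half, then score the same model on the other.

    X: (n, p) predictor matrix; y: (n,) response.
    n_keep: number of predictors kept by the screening step (an assumption
    made for this sketch; it stands in for a stepwise procedure).
    Returns (R^2 on fitting half, R^2 on holdout half, kept column indices).
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    half = len(y) // 2
    fit, hold = idx[:half], idx[half:]

    # Screening sees the fitting half only: keep the predictors most
    # strongly correlated with y there.
    Xc = X[fit] - X[fit].mean(axis=0)
    yc = y[fit] - y[fit].mean()
    corr = (Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))
    kept = np.argsort(-np.abs(corr))[:n_keep]

    # Ordinary least squares with an intercept, fitting half only.
    A_fit = np.column_stack([np.ones(len(fit)), X[fit][:, kept]])
    beta, *_ = np.linalg.lstsq(A_fit, y[fit], rcond=None)

    # Apply the same coefficients, unchanged, to the holdout half.
    A_hold = np.column_stack([np.ones(len(hold)), X[hold][:, kept]])
    return (r_squared(y[fit], A_fit @ beta),
            r_squared(y[hold], A_hold @ beta),
            kept)


# Illustration on simulated noise: y is unrelated to every column of X.
rng = np.random.default_rng(1)
X = rng.standard_normal((100, 50))
y = rng.standard_normal(100)
r2_fit, r2_hold, kept = split_half_check(X, y)
print(f"R^2 on fitting half: {r2_fit:.2f}")
print(f"R^2 on holdout half: {r2_hold:.2f}")
```

Because the screening step and the fit both see only the first half, any apparent fit there that does not carry over to the second half is a warning that the model is describing noise.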

 

 

Statistical Inference vs. Scientific Inference

There may once have been an excuse for sloppiness: "How else can the data be analyzed?" Cheap and powerful computational resources should put paid to that. We now can tailor the statistical inferences to the randomness in the design, without compromise.

But there can be a downside to honest statistical inference. In my opinion, we have allowed (encouraged?) researchers to rely too heavily on their statistical analyses to support generalization or inference to a 'target population.' Truth is, statistical inference and scientific inference are not the same. The first contributes to the second, but usually is not sufficient. Researchers have to make the scientific case. For an excellent review of the issues, see the recent paper by MacKay & Oldford, Statistical Science, 15, 254-278.

Statisticians now should assist the empirical disciplines to integrate study design and statistical analysis in their research methodology training. We may have to wait patiently while the older generation dies off and a new generation is trained before we see widespread adoption of what we know to be more appropriate use of inferential statistics.

Cliff Lunneborg



[1] Mosteller F, Tukey JW. Data Analysis and Regression: A Second Course in Statistics. Addison-Wesley, Menlo Park, 1977.

[2] Freedman DA. A note on screening regression equations. Amer. Statist. 1983; 37: 152-155.