Problems with multiple regression are well documented; see, for example, Mosteller and Tukey [1977][1] and Freedman (1983)[2]. And the results of a stepwise regression, of course, depend on the direction in which you take your steps. But the best way to convince oneself of the problem is to do the validation you should have urged on your students in the first place. Divide the data set into halves; fit a regression model to one half and then see whether this same model produces even halfway satisfactory results with the second.
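The split-half check can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names (`fit_ols`, `r_squared`, `split_half_check`), the random split, the choice of R-squared as the yardstick, and the pure-noise demo data are all hypothetical and not taken from the references above.

```python
import random


def fit_ols(X, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    rows = [[1.0] + list(r) for r in X]
    p = len(rows[0])
    # Augmented matrix [X'X | X'y]
    A = [
        [sum(r[i] * r[j] for r in rows) for j in range(p)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]


def r_squared(coefs, X, y):
    """R-squared of the given coefficients on (X, y); can be negative on new data."""
    preds = [coefs[0] + sum(b * x for b, x in zip(coefs[1:], r)) for r in X]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, preds))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def split_half_check(X, y, seed=0):
    """Fit on a random half; return (R-squared on that half, R-squared on the other)."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    half = len(idx) // 2
    first, second = idx[:half], idx[half:]
    X1, y1 = [X[i] for i in first], [y[i] for i in first]
    X2, y2 = [X[i] for i in second], [y[i] for i in second]
    coefs = fit_ols(X1, y1)
    return r_squared(coefs, X1, y1), r_squared(coefs, X2, y2)


# Demo on made-up data: 10 predictors that are pure noise, unrelated to y.
rng = random.Random(1)
n, p = 40, 10
X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [rng.gauss(0, 1) for _ in range(n)]
fit_r2, holdout_r2 = split_half_check(X, y)
print(f"R-squared on fitting half: {fit_r2:.2f}")
print(f"R-squared on held-out half: {holdout_r2:.2f}")
```

With many predictors and few cases, the fitting half can show a respectable R-squared even when no real relationship exists; the held-out half is where that usually falls apart.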
Jaques Cuze
There may once have been an excuse for sloppiness: "How
else can the data be analyzed?" Cheap and powerful computational resources
should put paid to that. We now can tailor the statistical inferences to the randomness
in the design, without compromise.
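One reading of "tailoring the inference to the randomness in the design" is a randomization (permutation) test, in which the reference distribution is built by re-running the random assignment on the observed data. The sketch below assumes a simple two-group design with complete random assignment and a difference in means as the statistic; the function name `randomization_test`, its parameters, and the example numbers are hypothetical, chosen only for illustration.

```python
import random


def mean(values):
    return sum(values) / len(values)


def randomization_test(treated, control, n_resamples=10000, seed=0):
    """Monte Carlo two-sided randomization test for a difference in means.

    Assumes units were assigned to the two groups completely at random.
    Returns (observed difference, approximate p-value).
    """
    rng = random.Random(seed)
    observed = mean(treated) - mean(control)
    pooled = list(treated) + list(control)
    n_t = len(treated)
    extreme = 0
    for _ in range(n_resamples):
        # Re-randomize: shuffle the responses and re-split into two groups.
        rng.shuffle(pooled)
        diff = mean(pooled[:n_t]) - mean(pooled[n_t:])
        # Small tolerance guards against floating-point ties.
        if abs(diff) >= abs(observed) - 1e-12:
            extreme += 1
    # Count the observed assignment itself, so p is never exactly zero.
    return observed, (extreme + 1) / (n_resamples + 1)


# Made-up responses for two small groups.
obs, p_value = randomization_test([5.1, 6.3, 7.0, 6.8], [4.2, 5.0, 4.9, 5.5])
print(f"observed difference: {obs:.2f}, p-value: {p_value:.3f}")
```

The resulting p-value refers only to the random assignment actually used in the study; it makes no claim about sampling from a wider population.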
But there can be a downside to honest statistical inference. In my opinion, we
have allowed (encouraged?) researchers to rely too heavily on their statistical
analyses to support generalization or inference to a 'target population.' Truth
is, statistical inference and scientific inference are not the same. The first
contributes to the second, but usually is not sufficient. Researchers have to
make the scientific case. For an excellent review of the issues, see the recent
MacKay & Oldford, Statistical Science, 15, 254-278.
Statisticians should now help the empirical disciplines integrate study
design and statistical analysis into their research methodology training. We may
have to wait patiently while the older generation dies off and a new generation
is trained before we see widespread adoption of what we know to be a more
appropriate use of inferential statistics.
Cliff Lunneborg