
Science and Statistics (Deep Dive)

2026-06-11 · 6:18 Deep Dive · Source

Chapters

George E. P. Box, R. A. Fisher Memorial Lecture, 1974. Published in the Journal of the American Statistical Association, December 1976, Volume 71, Number 356.

Aspects of scientific method are discussed: in particular, its representation as a motivated iteration in which, in succession, practice confronts theory, and theory, practice. Rapid progress requires sufficient flexibility to profit from such confrontations, and the ability to devise parsimonious but effective models, to worry selectively about model inadequacies and to employ mathematics skillfully but appropriately. The development of statistical methods at Rothamsted Experimental Station by Sir Ronald Fisher is used to illustrate these themes.

1. Introduction
Fisher is introduced as a scientist, not a statistician. More than half of his published papers were on subjects other than statistics and mathematics.

2. Aspects of Scientific Method

2.1 Iteration Between Theory and Practice
Science is a motivated iteration: facts → tentative theory → deductions → discrepancies → modified theory. The discrepancy between what the tentative theory predicts and what the facts show acts as an error signal, and that signal drives learning.

2.2 Flexibility
The good scientist must have the flexibility and courage to seek out, recognize, and exploit errors, especially his own. "Using Bacon's analogy, he must not be like Pygmalion and fall in love with his model."

2.3 Parsimony
"Since all models are wrong the scientist cannot obtain a 'correct' one by excessive elaboration. On the contrary following William of Occam he should seek an economical description of natural phenomena. Just as the ability to devise simple but evocative models is the signature of the great scientist so overelaboration and overparameterization is often the mark of mediocrity."

2.4 Worrying Selectively
"Since all models are wrong the scientist must be alert to what is importantly wrong. It is inappropriate to be concerned about mice when there are tigers abroad."

2.5 Role of Mathematics in Science
Pure mathematics is conditional: "given that A is true, does B necessarily follow?" It has nothing to do with truth in real life. Applied mathematics makes assumptions that are knowingly false but useful. "The statistician knows, for example, that in nature there never was a normal distribution, there never was a straight line, yet with normal and linear assumptions, known to be false, he can often derive results which match, to a useful approximation, those found in the real world." "We cannot know that any statistical technique we develop is useful unless we use it."

3. Fisher, a Scientist
Detailed case study of Fisher at Rothamsted (1919–1927):
- 3.2 Weighing the Baby: Fisher plotted his own children's weights as data. The baby in his textbook was his second son, Harry.
- 3.3 Find the Lady: The famous "lady tasting tea" experiment was real. Dr. Muriel Bristol, an algologist, correctly identified whether milk or tea was added first.
- 3.4 From Soil Bacteria to Nonlinear Design: Conversations at the tea urn led to pioneering work on nonlinear design.
- 3.5 From Cotton to Extreme Values: Tippett's yarn-breaking problem → extreme value theory.
- 3.6 From Dung to Orthogonal Polynomials and Residual Analysis: 67 years of manure data → orthogonal polynomials, analysis of variance, and the sponsor/critic model of data analysis. "In the inferential stage, the analyst acts as a sponsor for the model. Conditional on the assumption of its truth he selects the best statistical procedures... Having completed the analysis, however, he must switch his role from sponsor to critic."
- 3.7 Weeds and the Education Acts: Low wheat yields in 1870–1880 are explained by weeds. Once education became compulsory, boys were no longer hand-weeding foxtail grass.
- 3.8 From Rainfall and Wheat Yield to Distributed Lags: Fisher devised distributed lag models before economists did, using orthogonal polynomials to keep them parsimonious.
- 3.9 From Fertilizer and Potatoes to the Analysis of Variance: The first complete ANOVA table appears "quite suddenly and unannounced in the middle of the paper after the discussion of agricultural questions."
- 3.10 Mice, Tigers, and Randomization: Why randomization matters. It is the act of randomization, not the use of distribution-free tests, that produces valid inference.
- 3.11 From Muck Raking to Group Theory: The iterative sequence runs from analysis of records → analysis of experiments → design of experiments.

4. Perils of the Open Loop

4.1 Cookbookery and Mathematistry
Two diseases arise when the theory-practice loop is broken:
- Cookbookery: forcing all problems into one or two routine techniques without thinking about objectives or assumptions.
- Mathematistry: "development of theory for theory's sake, which since it seldom touches down with practice, has a tendency to redefine the problem rather than solve it."

"Fisher's apparently ambivalent attitude towards mathematicians has often been remarked... He himself was an artist in the use of mathematics and emphasized the importance of mathematical training for statisticians—the more mathematics known the greater the potential to be a good statistician. Why then did he sometimes seem to refer so slightingly to mathematicians? The answer I think is that his real target was 'mathematistry.'"

5. Conclusion
Fisher was an applied statistician, a mathematical statistician, a data analyst, and a designer of investigations. "It is surely because he was all of these that he was much more than the sum of the parts."
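
The remark in 2.5 that normal assumptions, "known to be false," can still match the real world "to a useful approximation" can be illustrated with a small sketch. It compares an exact binomial probability with its normal approximation. The values n = 100, p = 0.5, k = 55 and the function names are illustrative assumptions, not taken from Box's lecture.

```python
from math import comb
from statistics import NormalDist


def binomial_cdf(k, n, p):
    """Exact P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def normal_approx_cdf(k, n, p):
    """Normal approximation to P(X <= k), using a continuity correction."""
    mu = n * p
    sigma = (n * p * (1 - p)) ** 0.5
    return NormalDist(mu, sigma).cdf(k + 0.5)


# Hypothetical illustrative values (not from the paper).
n, p, k = 100, 0.5, 55
exact = binomial_cdf(k, n, p)
approx = normal_approx_cdf(k, n, p)
print(f"exact={exact:.4f}  normal approx={approx:.4f}  gap={abs(exact - approx):.4f}")
```

A count of successes is discrete and bounded, so it cannot be normally distributed, yet the two probabilities come out close. That is the sense in which a false assumption can still be useful.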
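
The point in 3.10, that the act of randomization is what justifies the inference, can be sketched as a small exact randomization test. This is a minimal sketch under stated assumptions: the yields are made-up numbers for a hypothetical two-treatment experiment with random assignment to plots, and `randomization_p_value` is a hypothetical helper name, not something from Box's lecture.

```python
from itertools import combinations
from statistics import mean


def randomization_p_value(treated, control):
    """One-sided exact randomization p-value for mean(treated) - mean(control).

    The reference set is every way the treatment labels could have been
    assigned to the plots by the randomization.
    """
    pooled = list(treated) + list(control)
    n_treated = len(treated)
    n_control = len(pooled) - n_treated
    total = sum(pooled)
    observed = mean(treated) - mean(control)

    at_least_as_large = 0
    n_assignments = 0
    for idx in combinations(range(len(pooled)), n_treated):
        t_sum = sum(pooled[i] for i in idx)
        diff = t_sum / n_treated - (total - t_sum) / n_control
        n_assignments += 1
        # Small tolerance guards against floating-point ties.
        if diff >= observed - 1e-12:
            at_least_as_large += 1
    return at_least_as_large / n_assignments


# Made-up yields for illustration only.
treated = [29, 31, 34, 36]
control = [20, 22, 25, 27]
p_value = randomization_p_value(treated, control)
print(f"one-sided randomization p-value: {p_value:.4f}")
```

No distributional assumption about the yields is used here. The reference distribution comes entirely from the assignments that the randomization itself could have produced.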