Primate Info Net


Books Received
Primate-Science / PrimateLit


EXPERIMENTAL DESIGN AND DATA ANALYSIS FOR BIOLOGISTS


Gerry P. Quinn
Monash University


Michael J. Keough
University of Melbourne



Cambridge University Press 2002



EXCERPT FROM PREFACE

Statistical analysis is at the core of most modern biology, and many biological
hypotheses, even deceptively simple ones, are matched by complex statistical models. 
Prior to the development of modern desktop computers, determining whether the data fit 
these complex models was the province of professional statisticians. Many biologists 
instead opted for simpler models whose structure had been simplified quite arbitrarily. 
Now, with immensely powerful statistical software available to most of us, these complex 
models can be fitted, creating a new set of demands and problems for biologists.

We need to:


* know the pitfalls and assumptions of particular statistical models,


* be able to identify the type of model appropriate for the sampling design and kind of data that we plan to collect,


* be able to interpret the output of analyses using these models, and


* be able to design experiments and sampling programs optimally, i.e. with the best possible use of our limited time and resources.


The analysis may be done by professional statisticians, rather than statistically trained biologists, 
especially in large research groups or multidisciplinary teams. In these situations, 
we need to be able to speak a common language:


* frame our questions in such a way as to get a sensible answer,


* be aware of biological considerations that may cause statistical problems; 
we cannot expect a statistician to be aware of the biological idiosyncrasies of our particular study, 
but if he or she lacks that information, we may get misleading or incorrect advice, and


* understand the advice or analyses that we receive, and be able to translate that back into biology.


This book aims to place biologists in a better position to do these things. It arose from our involvement in designing
and analyzing our own data, but also providing advice to students and colleagues, and teaching classes in design
and analysis. As part of these activities, we became aware, first, of our limitations, prompting us to read more widely in 
the primary statistical literature, and second, and more importantly, of the complexity of the statistical models 
underlying much biological research. In particular, we continually encountered experimental designs that were not described
comprehensively in many of our favorite texts. This book describes many of the common designs used in biological research, 
and we present the statistical models underlying those designs, with enough information to highlight their benefits 
and pitfalls.


Our emphasis here is on dealing with biological data - how to design sampling programs that represent the best use of our 
resources, how to avoid mistakes that make analyzing our data difficult, and how to analyze the data when they are collected. 
We emphasize the problems associated with real world biological situations.



CONTENTS (10 pages)


Preface     xv


1. Introduction     1
1.1 Scientific method      1
   1.1.1 Pattern description      2
   1.1.2 Models     2
   1.1.3 Hypotheses and tests      3
   1.1.4 Alternatives to falsification     4
   1.1.5 Role of statistical analysis     5
1.2 Experiments and other tests     5
1.3 Data, observations and variables     7
1.4 Probability     7
1.5 Probability distributions     9
   1.5.1 Distributions for variables     10
   1.5.2 Distributions for statistics     12


2. Estimation     14
2.1 Samples and populations     14
2.2 Common parameters and statistics     15
   2.2.1 Center (location) of distribution     15
   2.2.2 Spread or variability     16
2.3 Standard errors and confidence intervals for the mean     17
   2.3.1 Normal distributions and the Central Limit Theorem     17
   2.3.2 Standard error of the sample mean     18
   2.3.3 Confidence intervals for population mean     19
   2.3.4 Interpretation of confidence intervals for population mean     20
   2.3.5 Standard errors for other statistics     20
2.4 Methods for estimating parameters     23
   2.4.1 Maximum likelihood (ML)     23
   2.4.2 Ordinary least squares (OLS)     24
   2.4.3 ML vs OLS estimation     25
2.5 Resampling methods for estimation     25
   2.5.1 Bootstrap     25
   2.5.2 Jackknife     26
2.6 Bayesian inference - estimation     27
   2.6.1 Bayesian estimation     27
   2.6.2 Prior knowledge and probability     28
   2.6.3 Likelihood function     28
   2.6.4 Posterior probability     28
   2.6.5 Examples     29
   2.6.6 Other comments     29


3. Hypothesis testing     32
3.1 Statistical hypothesis testing      32
   3.1.1 Classical statistical hypothesis testing     32
   3.1.2 Associated probability and Type I error     34
   3.1.3 Hypothesis tests for a single population     35
   3.1.4 One- and two-tailed tests     37
   3.1.5 Hypotheses for two populations     37
   3.1.6 Parametric tests and their assumptions     39
3.2 Decision errors     42
   3.2.1 Type I and II errors     42
   3.2.2 Asymmetry and scalable decision criteria     44
3.3 Other testing methods     45
   3.3.1 Robust parametric tests     45
   3.3.2 Randomization (permutation) tests     45
   3.3.3 Rank-based non-parametric tests      46
3.4 Multiple testing     48
   3.4.1 The problem     48
   3.4.2 Adjusting significance levels and/or P values     49
3.5 Combining results from statistical tests     50
   3.5.1 Combining P values     50
   3.5.2 Meta-analysis     50
3.6 Critique of statistical hypothesis testing     51
   3.6.1 Dependence on sample size and stopping rules     51
   3.6.2 Sample space - relevance of data not observed     52
   3.6.3 P values as measure of evidence      53
   3.6.4 Null hypothesis always false     53
   3.6.5 Arbitrary significance levels     53
   3.6.6 Alternatives to statistical hypothesis testing     53
3.7 Bayesian hypothesis testing     54


4. Graphical exploration of data      58
4.1 Exploratory data analysis     58
   4.1.1 Exploring samples      58
4.2 Analysis with graphs       62
   4.2.1 Assumptions of parametric linear models      62
4.3 Transforming data      64
   4.3.1 Transformations and distributional assumptions     65
   4.3.2 Transformations and linearity      67
   4.3.3 Transformations and additivity     67
4.4 Standardizations      67
4.5 Outliers      68
4.6 Censored and missing data      68
   4.6.1 Missing data      68
   4.6.2 Censored (truncated) data     69
4.7 General issues and hints for analysis     71
   4.7.1 General issues      71


5. Correlation and regression      72
5.1 Correlation analysis     72
   5.1.1 Parametric correlation model     72
   5.1.2 Robust correlation     76
   5.1.3 Parametric and non-parametric confidence regions      76
5.2 Linear models      77
5.3 Linear regression analysis      78
   5.3.1 Simple (bivariate) linear regression      78
   5.3.2 Linear model for regression     80
   5.3.3 Estimating model parameters      85
   5.3.4 Analysis of variance      88
   5.3.5 Null hypotheses in regression      89
   5.3.6 Comparing regression models      90
   5.3.7 Variance explained      91
   5.3.8 Assumptions of regression analysis      92
   5.3.9 Regression diagnostics      94
   5.3.10 Diagnostic graphics      96
   5.3.11 Transformations      98
   5.3.12 Regression through the origin      98
   5.3.13 Weighted least squares      99
   5.3.14 X random (Model II regression)      100
   5.3.15 Robust regression      104
5.4 Relationship between regression and correlation     106
5.5 Smoothing     107
   5.5.1 Running means     107
   5.5.2 LO(W)ESS     107
   5.5.3 Splines     108
   5.5.4 Kernels     108
   5.5.5 Other issues     109
5.6 Power of tests in correlation and regression     109
5.7 General issues and hints for analysis     110
   5.7.1 General issues     110
   5.7.2 Hints for analysis     110


6. Multiple and complex regression     111
6.1 Multiple linear regression analysis      111
   6.1.1 Multiple linear regression model     114
   6.1.2 Estimating model parameters     119
   6.1.3 Analysis of variance     119
   6.1.4 Null hypotheses and model comparisons     121
   6.1.5 Variance explained     122
   6.1.6 Which predictors are important?     122
   6.1.7 Assumptions of multiple regression     124
   6.1.8 Regression diagnostics     125
   6.1.9 Diagnostic graphics     125
   6.1.10 Transformations      127
   6.1.11 Collinearity     127
   6.1.12 Interactions in multiple regression     130
   6.1.13 Polynomial regression     133
   6.1.14 Indicator (dummy) variables     135
   6.1.15 Finding the "best" regression model     137
   6.1.16 Hierarchical partitioning     141
   6.1.17 Other issues in multiple linear regression     142
6.2 Regression trees     143
6.3 Path analysis and structural equation modeling     145
6.4 Nonlinear models     150
6.5 Smoothing and response surfaces     152
6.6 General issues and hints for analysis     153
   6.6.1 General issues     153
   6.6.2 Hints for analysis     154


7. Design and power analysis     155
7.1 Sampling     155
   7.1.1 Sampling designs     155
   7.1.2 Size of sample     157
7.2 Experimental design     157
   7.2.1 Replication     158
   7.2.2 Controls     160
   7.2.3 Randomization     161
   7.2.4 Independence     163
   7.2.5 Reducing unexplained variance     164
7.3 Power analysis     164
   7.3.1 Using power to plan experiments (a priori power analysis)     166
   7.3.2 Post hoc power calculation     168
   7.3.3 The effect size      168
   7.3.4 Using power analyses     170
7.4 General issues and hints for analysis    171
   7.4.1 General issues     171
   7.4.2 Hints for analysis     172


8. Comparing groups or treatments - analysis of variance      173
8.1 Single factor (one way) designs     173
   8.1.1 Types of predictor variables (factors)     176
   8.1.2 Linear model for single factor analyses     178
   8.1.3 Analysis of variance      184
   8.1.4 Null hypotheses     186
   8.1.5 Comparing ANOVA models     187
   8.1.6 Unequal sample sizes (unbalanced designs)     187
8.2 Factor effects      188
   8.2.1 Random effects: variance components     188
   8.2.2 Fixed effects     190
8.3 Assumptions     191
   8.3.1 Normality     192
   8.3.2 Variance homogeneity     193
   8.3.3 Independence     193
8.4 ANOVA diagnostics     194
8.5 Robust ANOVA     195
   8.5.1 Tests with heterogeneous variances     195
   8.5.2 Rank-based ("non-parametric") tests     195
   8.5.3 Randomization tests     196
8.6 Specific comparisons of means     196
   8.6.1 Planned comparisons or contrasts     197
   8.6.2 Unplanned pairwise comparisons     199
   8.6.3 Specific contrasts versus unplanned pairwise comparisons     201
8.7 Tests for trends     202
8.8 Testing equality of group variances     203
8.9 Power of single factor ANOVA     204
8.10 General issues and hints for analysis     206
   8.10.1 General issues     206
   8.10.2 Hints for analysis     206


9. Multifactor analysis of variance     208
9.1 Nested (hierarchical) designs     208
   9.1.1 Linear models for nested analyses     210
   9.1.2 Analysis of variance      214
   9.1.3 Null hypotheses     215
   9.1.4 Unequal sample sizes (unbalanced designs)      216
   9.1.5 Comparing ANOVA models     216
   9.1.6 Factor effects in nested models     216
   9.1.7 Assumptions for nested models     218
   9.1.8 Specific comparisons for nested designs     219
   9.1.9 More complex designs     219
   9.1.10 Design and power     219
9.2 Factorial designs     221
   9.2.1 Linear models for factorial designs     225
   9.2.2 Analysis of variance     230
   9.2.3 Null hypotheses     232
   9.2.4 What are main effects and interactions really measuring?     237
   9.2.5 Comparing ANOVA models     241
   9.2.6 Unbalanced designs     241
   9.2.7 Factor effects     247
   9.2.8 Assumptions     249
   9.2.9 Robust factorial ANOVAs     250
   9.2.10 Specific comparisons on main effects     250
   9.2.11 Interpreting interactions     251
   9.2.12 More complex designs     255
   9.2.13 Power and design in factorial ANOVA     259
9.3 Pooling in multifactor designs     260
9.4 Relationship between factorial and nested designs     261
9.5 General issues and hints for analysis     261
   9.5.1 General issues     261
   9.5.2 Hints for analysis     261


10. Randomized blocks and simple repeated measures: unreplicated two factor designs     262
10.1 Unreplicated two factor experimental designs      262
   10.1.1 Randomized complete block (RCB) designs      262
   10.1.2 Repeated measures (RM) designs      265
10.2 Analyzing RCB and RM designs      268
   10.2.1 Linear models for RCB and RM analyses       268
   10.2.2 Analysis of variance       272
   10.2.3 Null hypotheses      273
   10.2.4 Comparing ANOVA models      274
10.3 Interactions in RCB and RM models      274
   10.3.1 Importance of treatment by block interactions      274
   10.3.2 Checks for interaction in unreplicated designs      277
10.4 Assumptions     280
   10.4.1 Normality, independence of errors      280
   10.4.2 Variances and covariances - sphericity      280
   10.4.3 Recommended strategy     284
10.5 Robust RCB and RM analyses       284
10.6 Specific comparisons     285
10.7 Efficiency of blocking (to block or not to block?)    285
10.8 Time as a blocking factor     287
10.9 Analysis of unbalanced RCB designs      287
10.10 Power of RCB or simple RM designs      289
10.11 More complex block designs      290
   10.11.1 Factorial randomized block designs      290
   10.11.2 Incomplete block designs      292
   10.11.3 Latin square designs      292
   10.11.4 Crossover designs     296
10.12 Generalized randomized block designs      298
10.13 RCB and RM designs and statistical software      298
10.14 General issues and hints for analysis     299
   10.14.1 General issues      299
   10.14.2 Hints for analysis     300


11. Split-plot and repeated measures designs: partly nested analyses of variance      301
11.1 Partly nested designs     301
   11.1.1 Split-plot designs     301
   11.1.2 Repeated measures designs     305
   11.1.3 Reasons for using these designs     309
11.2 Analyzing partly nested designs     309
   11.2.1 Linear models for partly nested analyses     310
   11.2.2 Analysis of variance     313
   11.2.3 Null hypotheses     315
   11.2.4 Comparing ANOVA models     318
11.3 Assumptions     318
   11.3.1 Between plots/subjects     318
   11.3.2 Within plots/subjects and multisample sphericity     318
11.4 Robust partly nested analyses     320
11.5 Specific comparisons     320
   11.5.1 Main effects     320
   11.5.2 Interactions     321
   11.5.3 Profile (i.e. trend) analysis     321
11.6 Analysis of unbalanced partly nested designs      322
11.7 Power for partly nested designs     323
11.8 More complex designs     323
   11.8.1 Additional between-plots/subjects factors     324
   11.8.2 Additional within-plots/subjects factors     329
   11.8.3 Additional between-plots/subjects and within-plots/subjects factors       332
   11.8.4 General comments about complex designs     335
11.9 Partly nested designs and statistical software     335
11.10 General issues and hints for analysis
   11.10.1 General issues
   11.10.2 Hints for individual analyses


12. Analyses of covariance      339
12.1 Single factor analysis of covariance (ANCOVA)     339
   12.1.1 Linear models for analysis of covariance      342
   12.1.2 Analysis of (co)variance     347
   12.1.3 Null hypotheses      347
   12.1.4 Comparing ANCOVA models      348
12.2 Assumptions of ANCOVA      348
   12.2.1 Linearity        348
   12.2.2 Covariate values similar across groups      349
   12.2.3 Fixed covariate (X)      349
12.3 Homogeneous slopes     349
   12.3.1 Testing for homogeneous within-group regression slopes     349
   12.3.2 Dealing with heterogeneous within-group regression slopes     350
   12.3.3 Comparing regression lines     352
12.4 Robust ANCOVA      352
12.5 Unequal sample sizes (unbalanced designs)     353
12.6 Specific comparisons of adjusted means      353
   12.6.1 Planned contrasts     353
   12.6.2 Unplanned comparisons     353
12.7 More complex designs     353
   12.7.1 Designs with two or more covariates     353
   12.7.2 Factorial designs     354
   12.7.3 Nested designs with one covariate     355
   12.7.4 Partly nested models with one covariate     356
12.8 General issues and hints for analysis     357
   12.8.1 General issues     357
   12.8.2 Hints for analysis      358


13. Generalized linear models and logistic regression     359
13.1 Generalized linear models     359
13.2 Logistic regression     360
   13.2.1 Simple logistic regression      360
   13.2.2 Multiple logistic regression     365
   13.2.3 Categorical predictors     368
   13.2.4 Assumptions of logistic regression     368
   13.2.5 Goodness-of-fit and residuals     368
   13.2.6 Model diagnostics      370
   13.2.7 Model selection      370
   13.2.8 Software for logistic regression      371
13.3 Poisson regression      371
13.4 Generalized additive models     372
13.5 Models for correlated data     375
   13.5.1 Multi-level (random effects) models     376
   13.5.2 Generalized estimating equations     377
13.6 General issues and hints for analysis     378
   13.6.1 General issues      378
   13.6.2 Hints for analysis     379


14. Analyzing frequencies     380
14.1 Single variable goodness-of-fit tests     381
14.2 Contingency tables     381
   14.2.1 Two way tables      381
   14.2.2 Three way tables     388
14.3 Log-linear models      393
   14.3.1 Two way tables      394
   14.3.2 Log linear models for three way tables     395
   14.3.3 More complex tables     400
14.4 General issues and hints for analysis     400
   14.4.1 General issues     400
   14.4.2 Hints for analysis     400


15. Introduction to multivariate analyses     401
15.1 Multivariate data     401
15.2 Distributions and associations      402
15.3 Linear combinations, eigenvectors and eigenvalues     405
   15.3.1 Linear combinations of variables     405
   15.3.2 Eigenvalues      405
   15.3.3 Eigenvectors      406
   15.3.4 Derivation of components     409
15.4 Multivariate distance and dissimilarity measures      409
   15.4.1 Dissimilarity measures for continuous variables     412
   15.4.2 Dissimilarity measures for dichotomous (binary) variables     413
   15.4.3 General dissimilarity measures for mixed variables     413
   15.4.4 Comparison of dissimilarity measures      414
15.5 Comparing distance and/or dissimilarity matrices      414
15.6 Data standardization     415
15.7 Standardization, association and dissimilarity      417
15.8 Multivariate graphics      417
15.9 Screening multivariate data sets      418
   15.9.1 Multivariate outliers      419
   15.9.2 Missing observations      419
15.10 General issues and hints for analysis      423
   15.10.1 General issues      423
   15.10.2 Hints for analysis      424


16. Multivariate analysis of variance and discriminant analysis     425
16.1 Multivariate analysis of variance (MANOVA)     425
   16.1.1 Single factor MANOVA     426
   16.1.2 Specific comparisons     432
   16.1.3 Relative importance of each response variable     432
   16.1.4 Assumptions of MANOVA     433
   16.1.5 Robust MANOVA     434
   16.1.6 More complex designs     434
16.2 Discriminant function analysis     435
   16.2.1 Description and hypothesis testing     437
   16.2.2 Classification and prediction     439
   16.2.3 Assumptions of discriminant function analysis     441
   16.2.4 More complex designs     441
16.3 MANOVA vs discriminant function analysis     441
16.4 General issues and hints for analysis      441
   16.4.1 General issues     441
   16.4.2 Hints for analysis     441


17. Principal components and correspondence analysis     443
17.1 Principal components analysis     443
   17.1.1 Deriving components     447
   17.1.2 Which association matrix to use?      450
   17.1.3 Interpreting the components      451
   17.1.4 Rotation of components      451
   17.1.5 How many components to retain?      452
   17.1.6 Assumptions      453
   17.1.7 Robust PCA       454
   17.1.8 Graphical representations      454
   17.1.9 Other uses of components      456
17.2 Factor analysis     458
17.3 Correspondence analysis     459
   17.3.1 Mechanics     459
   17.3.2 Scaling and joint plots       461
   17.3.3 Reciprocal averaging       462
   17.3.4 Use of CA with ecological data      462
   17.3.5 Detrending      463
17.4 Canonical correlation analysis      463
17.5 Redundancy analysis     466
17.6 Canonical correspondence analysis      467
17.7 Constrained and partial "ordination"       468
17.8 General issues and hints for analysis      471
   17.8.1 General issues      471
   17.8.2 Hints for analysis     471


18. Multidimensional scaling and cluster analysis     473
18.1 Multidimensional scaling      473
   18.1.1 Classical scaling - principal coordinates analysis (PCoA)     474
   18.1.2 Enhanced multidimensional scaling     476
   18.1.3 Dissimilarities and testing hypotheses about groups of objects      482
   18.1.4 Relating MDS to original variables      487
   18.1.5 Relating MDS to covariates      487
18.2 Classification      488
   18.2.1 Cluster analysis      488
18.3 Scaling (ordination) and clustering for biological data      491
18.4 General issues and hints for analysis      493
   18.4.1 General issues     493
   18.4.2 Hints for analysis     493


19. Presentation of results     494
19.1 Presentation of analyses      494
   19.1.1 Linear models      494
   19.1.2 Other analyses        497
19.2 Layout of tables     497
19.3 Displaying summaries of the data      498
   19.3.1 Bar graph      500
   19.3.2 Line graph (category plot)      502
   19.3.3 Scatterplots      502
   19.3.4 Pie charts     503
19.4 Error bars      504
   19.4.1 Alternative approaches      506
19.5 Oral presentations      507
   19.5.1 Slides, computers, or overheads?      507
   19.5.2 Graphics packages       508
   19.5.3 Working with color      508
   19.5.4 Scanned images     509
   19.5.5 Information content      509
19.6 General issues and hints      510


References      511


Index     527



WHERE TO ORDER:


Cambridge University Press
40 West 20th Street
New York, NY 10011-4211, USA


Phone:  1-800-872-7423
Fax:   914-937-4712
Web site: http://www.cambridge.org


Price:
$110.00 (Hardbound)   ISBN: 0-521-81128-7
$45.00 (Paperback)    ISBN: 0-521-00976-6

URL: http://www.primate.wisc.edu/pin/review/expdesign.html
Page last modified: September 12, 2002
Maintained by the WPRC Library
