The methods for making statistical inferences in scientific analysis have diversified even within the frequentist branch of statistics, but comparison has been elusive. We approximate analytically and numerically the performance of Neyman-Pearson hypothesis testing, Fisher significance testing, information criteria, and evidential statistics (Royall, 1997). This last approach is implemented in the form of evidence functions: statistics for comparing two models by estimating, based on data, their relative distance to the generating process (i.e., truth) (Lele, 2004). A consequence of this definition is the salient property that the probabilities of misleading or weak evidence, error probabilities analogous to Type 1 and Type 2 errors in hypothesis testing, all approach 0 as sample size increases. Our comparison of these approaches focuses primarily on the frequency with which errors are made, both when models are correctly specified and when they are misspecified, but it also considers ease of interpretation. The error rates in evidential analysis all decrease to 0 as sample size increases, even under model misspecification. Neyman-Pearson testing, on the other hand, exhibits great difficulties under misspecification. The real Type 1 and Type 2 error rates can be less than, equal to, or greater than the nominal rates, depending on the nature of the misspecification. Under some reasonable circumstances, the probability of Type 1 error is an increasing function of sample size that can even approach 1! In contrast, under model misspecification an evidential analysis retains the desirable properties of always having a greater probability of selecting the best model over an inferior one and of having the probability of selecting the best model increase monotonically with sample size. We show that the evidence function concept fulfills the seeming objectives of model selection in ecology, in both a statistical and a scientific sense, and that evidence functions are intuitive and easily grasped. We find that consistent information criteria are evidence functions but that the MSE-minimizing (or efficient) information criteria (e.g., AIC, AICc, TIC) are not. The error properties of the MSE-minimizing criteria switch between those of evidence functions and those of Neyman-Pearson tests, depending on the models being compared.

In the twentieth century, the bulk of scientific statistical inference was conducted with Neyman-Pearson (NP) hypothesis tests, a term we take broadly to encompass significance testing, P-values, generalized likelihood ratio tests, and other special cases, adaptations, or generalizations. The central difficulty with interpreting NP tests is that the Type 1 error probability (usually denoted α) remains fixed regardless of sample size, rendering problematic the question of what constitutes evidence for the model serving as the null hypothesis (Aho et al., 2014; Murtaugh, 2014; Spanos, 2014). This fixed null error rate lies at the core of why model selection procedures based on hypothesis testing (such as stepwise regression and multiple comparisons) have always had the reputation of being jury-rigged contraptions that have never been fully satisfactory (Gelman et al., 2012). An additional problem with hypothesis tests arises from the "Type 3" error of model misspecification, in which neither the null nor the alternative hypothesis model adequately describes the data (Mosteller, 1948).
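To make the contrast concrete, here is a minimal simulation sketch (our illustration, not an analysis from the paper) of evidential versus Neyman-Pearson error behavior when both candidate models are misspecified. The evidence function used is the log-likelihood ratio, the canonical choice for two simple models (Royall, 1997). The generating distribution (Normal with mean 0.3), the candidate models M0 and M1, the strong-evidence threshold ln 8, and the nominal α = 0.05 are all assumptions chosen for illustration.

```python
# A minimal sketch (not from the paper) contrasting evidential and
# Neyman-Pearson error behavior under model misspecification.
# Assumed setup: the truth is Normal(0.3, 1); the candidate models are
# M0: Normal(0, 1) and M1: Normal(1, 1).  M0 is the better model
# (smaller Kullback-Leibler distance to the truth), but neither is true.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
TRUE_MEAN, MU0, MU1 = 0.3, 0.0, 1.0
K = np.log(8)      # conventional threshold for "strong" evidence (Royall, 1997)
ALPHA = 0.05       # nominal NP Type 1 error rate
N_SIM = 5000       # Monte Carlo replicates per sample size

for n in (10, 50, 250, 1250):
    x = rng.normal(TRUE_MEAN, 1.0, size=(N_SIM, n))

    # Evidence function: log-likelihood ratio of M1 to M0, an estimate of
    # the difference in KL distances from the generating process.
    llr = (norm.logpdf(x, MU1, 1) - norm.logpdf(x, MU0, 1)).sum(axis=1)
    misleading = np.mean(llr >= K)   # strong evidence for the worse model

    # NP test of M0 vs M1 at fixed alpha: reject when the sample mean
    # exceeds the alpha-level critical value computed under M0.
    crit = norm.ppf(1 - ALPHA) / np.sqrt(n)
    reject = np.mean(x.mean(axis=1) > crit)  # rejecting the better model

    print(f"n={n:5d}  P(misleading evidence)={misleading:.3f}  "
          f"P(NP rejects M0)={reject:.3f}")
```

As n increases, the probability of misleading evidence (strong evidence for the KL-inferior model M1) shrinks toward 0, while the NP test's probability of rejecting the better model M0 at fixed α climbs toward 1, mirroring the error behavior described above.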
Brian Dennis 1*, José Miguel Ponciano 2, Mark L. Taper 3 and Subhash R. Lele 4

1 Department of Fish and Wildlife Sciences and Department of Statistical Science, University of Idaho, Moscow, ID, United States
2 Biology Department, University of Florida, Gainesville, FL, United States
3 Department of Ecology, Montana State University, Bozeman, MT, United States
4 Department of Mathematical and Statistical Sciences, University of Alberta, Edmonton, AB, Canada