In statistics, stepwise regression is a method of fitting regression models in which the choice of predictive variables is carried out by an automatic procedure. In each step, a variable is considered for addition to or subtraction from the set of explanatory variables based on some prespecified criterion. Usually, this takes the form of a forward, backward, or combined sequence of F-tests or t-tests.

The frequent practice of fitting the final selected model followed by reporting estimates and confidence intervals without adjusting them to take the model-building process into account has led to calls to stop using stepwise model building altogether, or at least to make sure model uncertainty is correctly reflected by using prespecified, automatic criteria together with more complex standard error estimates that remain unbiased.

[Figure caption: In this example from engineering, necessity and sufficiency are usually determined by F-tests. For additional consideration, when planning an experiment, computer simulation, or scientific survey to collect data for this model, one must keep in mind the number of parameters, P, to estimate and adjust the sample size accordingly. For K variables, P = 1 (Start) + K (Stage I) + (K² − K)/2 (Stage II) + 3K (Stage III) = 0.5K² + 3.5K + 1. For K < 17, an efficient design of experiments exists for this type of model, a Box–Behnken design, augmented with positive and negative axial points of length min(2, (int(1.5 + K/4))^(1/2)), plus point(s) at the origin. There are more efficient designs, requiring fewer runs, even for K > 16.]

Main approaches

The main approaches for stepwise regression are:

Forward selection, which involves starting with no variables in the model, testing the addition of each variable using a chosen model fit criterion, adding the variable (if any) whose inclusion gives the most statistically significant improvement of the fit, and repeating this process until none improves the model to a statistically significant extent (a code sketch of this procedure follows the list).

Backward elimination, which involves starting with all candidate variables, testing the deletion of each variable using a chosen model fit criterion, deleting the variable (if any) whose loss gives the most statistically insignificant deterioration of the model fit, and repeating this process until no further variables can be deleted without a statistically significant loss of fit.

Bidirectional elimination, a combination of the above, testing at each step for variables to be included or excluded.
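The greedy add-one-variable logic of forward selection is straightforward to express in code. The following is a minimal sketch using p-values as the fit criterion, assuming statsmodels is available; the function name forward_select, the toy data, and the 0.05 entry threshold are ours for illustration, not a standard API.

import numpy as np
import pandas as pd
import statsmodels.api as sm

def forward_select(X, y, alpha_in=0.05):
    """Greedy forward selection: add the best predictor while it clears alpha_in."""
    selected = []
    while True:
        remaining = [c for c in X.columns if c not in selected]
        if not remaining:
            break
        # p-value of each candidate when added to the current model
        pvals = {c: sm.OLS(y, sm.add_constant(X[selected + [c]])).fit().pvalues[c]
                 for c in remaining}
        best = min(pvals, key=pvals.get)
        if pvals[best] >= alpha_in:
            break  # no remaining candidate clears the entry threshold
        selected.append(best)
    return selected

# Toy data: y depends on x0 and x1 only; x2..x9 are pure noise.
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(200, 10)), columns=[f"x{i}" for i in range(10)])
y = 2.0 * X["x0"] - X["x1"] + rng.normal(size=200)
print(forward_select(X, y))  # typically ['x0', 'x1'], occasionally plus a noise column

Backward elimination runs the same kind of loop in reverse, starting from the full model and deleting the variable whose removal hurts the fit least, while that loss stays insignificant.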
Alternatives

Further information: Model selection

A widely used algorithm was first proposed by Efroymson (1960). This is an automatic procedure for statistical model selection in cases where there is a large number of potential explanatory variables and no underlying theory on which to base the model selection. The procedure is used primarily in regression analysis, though the basic approach is applicable in many forms of model selection. It is a variation on forward selection: at each stage in the process, after a new variable is added, a test is made to check whether some variables can be deleted without appreciably increasing the residual sum of squares (RSS). The procedure terminates when the measure is (locally) maximized, or when the available improvement falls below some critical value. A sketch of this add-then-prune loop follows.
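The following is a rough sketch of such an Efroymson-style loop. For brevity it judges both entry and removal by p-values rather than the original F-to-enter/F-to-remove statistics built on the RSS, so it illustrates the control flow rather than faithfully reimplementing the 1960 algorithm; all names and thresholds are ours.

import statsmodels.api as sm

def stepwise_select(X, y, alpha_in=0.05, alpha_out=0.10, max_iter=100):
    """Bidirectional stepwise on a pandas DataFrame X (as in the previous sketch):
    forward step, then test whether any selected variable can be dropped."""
    selected = []
    for _ in range(max_iter):  # guard against add/drop cycling
        changed = False
        # Forward step: add the most significant remaining candidate, if any.
        remaining = [c for c in X.columns if c not in selected]
        if remaining:
            pvals = {c: sm.OLS(y, sm.add_constant(X[selected + [c]])).fit().pvalues[c]
                     for c in remaining}
            best = min(pvals, key=pvals.get)
            if pvals[best] < alpha_in:
                selected.append(best)
                changed = True
        # Backward step: drop the weakest selected variable if it no longer helps.
        if selected:
            fit = sm.OLS(y, sm.add_constant(X[selected])).fit()
            worst = fit.pvalues[selected].idxmax()
            if fit.pvalues[worst] > alpha_out:
                selected.remove(worst)
                changed = True
        if not changed:
            break
    return selected

Keeping the removal threshold looser than the entry threshold (alpha_out > alpha_in) is a common convention that prevents a just-added variable from being deleted again in the same pass.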
One of the main issues with stepwise regression is that it searches a large space of possible models. Hence it is prone to overfitting the data. In other words, stepwise regression will often fit much better in sample than it does on new out-of-sample data. Extreme cases have been noted where models have achieved statistical significance working on random numbers. This problem can be mitigated if the criterion for adding (or deleting) a variable is stiff enough. The key line in the sand is at what can be thought of as the Bonferroni point: namely, how significant the best spurious variable should be based on chance alone. On a t-statistic scale, this occurs at about \sqrt{2\log p}, where p is the number of predictors. Unfortunately, this means that many variables which actually carry signal will not be included. This fence turns out to be the right trade-off between over-fitting and missing signal. If we look at the risk of different cutoffs, then using this bound will be within a 2\log p factor of the best possible risk. Any other cutoff will end up having a larger such risk inflation. A short simulation below makes the threshold concrete.
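The simulation below is our own toy illustration, not taken from the sources cited here. It screens p = 100 pure-noise predictors against a pure-noise response (the first step of forward selection) and counts how many clear the naive 5% cutoff versus the Bonferroni point \sqrt{2\log p} ≈ 3.03.

import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, p, trials = 50, 100, 200
t_bonf = np.sqrt(2 * np.log(p))        # Bonferroni point, ~3.03 for p = 100
t_05 = stats.t.ppf(0.975, df=n - 2)    # naive two-sided 5% cutoff, ~2.01

naive = bonf = 0
for _ in range(trials):
    X = rng.normal(size=(n, p))
    y = rng.normal(size=n)             # response is pure noise, unrelated to X
    # Univariate t-statistics via correlations: t = r * sqrt((n-2)/(1-r^2))
    Xc = (X - X.mean(0)) / X.std(0)
    yc = (y - y.mean()) / y.std()
    r = Xc.T @ yc / n
    t = r * np.sqrt((n - 2) / (1 - r**2))
    naive += (np.abs(t) > t_05).sum()
    bonf += (np.abs(t) > t_bonf).sum()

print("average spurious 'significant' predictors per trial:")
print(f"  |t| > {t_05:.2f} (naive 5% point):   {naive / trials:.1f}")   # about 5
print(f"  |t| > {t_bonf:.2f} (Bonferroni point): {bonf / trials:.1f}")  # well under 1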
Model accuracy

Main article: Cross-validation (statistics)

A way to test for errors in models created by stepwise regression is to not rely on the model's F-statistic, significance, or multiple R, but instead assess the model against a set of data that was not used to create the model. This is often done by building a model based on a sample of the dataset available (e.g., 70%) – the "training set" – and using the remainder of the dataset (e.g., 30%) as a validation set to assess the accuracy of the model. Accuracy is then often measured as the actual standard error (SE), MAPE (mean absolute percentage error), or mean error between the predicted value and the actual value in the hold-out sample. This method is particularly valuable when data are collected in different settings (e.g., different times, social vs. solitary situations) or when models are assumed to be generalizable.
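A minimal sketch of this hold-out check on synthetic data follows; the 70/30 split and the SE/MAPE/mean-error measures come from the text above, while the data and coefficients are ours for illustration.

import numpy as np

rng = np.random.default_rng(2)
n = 300
X = rng.normal(size=(n, 3))
y = 20 + X @ np.array([2.0, -1.0, 0.5]) + rng.normal(size=n)

split = int(0.7 * n)                   # 70% training set, 30% validation set
X_tr, X_val = X[:split], X[split:]
y_tr, y_val = y[:split], y[split:]

# Ordinary least squares fit on the training set only
A = np.column_stack([np.ones(split), X_tr])
beta, *_ = np.linalg.lstsq(A, y_tr, rcond=None)

pred = np.column_stack([np.ones(n - split), X_val]) @ beta
resid = y_val - pred
print(f"hold-out SE:         {resid.std(ddof=1):.3f}")
print(f"hold-out MAPE:       {np.mean(np.abs(resid / y_val)) * 100:.2f}%")  # assumes y stays away from 0
print(f"hold-out mean error: {resid.mean():+.3f}")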
Criticism

Stepwise regression procedures are used in data mining, but are controversial. Several points of criticism have been made.

The tests themselves are biased, since they are based on the same data. Wilkinson and Dallal (1981) computed percentage points of the multiple correlation coefficient by simulation and showed that a final regression obtained by forward selection, said by the F-procedure to be significant at 0.1%, was in fact only significant at 5%.

When estimating the degrees of freedom, the number of the candidate independent variables from the best fit selected may be smaller than the total number of final model variables, causing the fit to appear better than it is when adjusting the r² value for the number of degrees of freedom. It is important to consider how many degrees of freedom have been used in the entire model, not just count the number of independent variables in the resulting fit.

Models that are created may be over-simplifications of the real models of the data.

Such criticisms, based upon limitations of the relationship between a model, the procedure, and the data set used to fit it, are usually addressed by verifying the model on an independent data set, as in the PRESS procedure.
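For ordinary least squares, the PRESS (prediction sum of squares) statistic can be computed without refitting the model n times, via the standard identity that the leave-one-out prediction residual is e_i / (1 − h_ii), where h_ii is the i-th diagonal entry of the hat matrix. The snippet below is an illustrative sketch on synthetic data of our own.

import numpy as np

rng = np.random.default_rng(3)
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])  # intercept + 2 predictors
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=n)

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
e = y - X @ beta                               # ordinary (in-sample) residuals
H = X @ np.linalg.inv(X.T @ X) @ X.T           # hat (projection) matrix
press = np.sum((e / (1 - np.diag(H))) ** 2)    # leave-one-out sum of squared errors
print(f"PRESS = {press:.2f}  vs  in-sample SSE = {np.sum(e**2):.2f}")

Because each leverage h_ii lies strictly between 0 and 1 here, PRESS always exceeds the in-sample sum of squared errors, which is exactly the gap between apparent and predictive fit that these criticisms concern.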
Critics regard the procedure as a paradigmatic example of data dredging, intense computation often being an inadequate substitute for subject area expertise. Additionally, the results of stepwise regression are often used incorrectly without adjusting them for the occurrence of model selection. Especially the practice of fitting the final selected model as if no model selection had taken place, and reporting estimates and confidence intervals as if least-squares theory were valid for them, has been described as a scandal. Widespread incorrect usage and the availability of alternatives such as ensemble learning, leaving all variables in the model, or using expert judgement to identify relevant variables have led to calls to totally avoid stepwise model selection.

See also

Freedman's paradox
Logistic regression
Least-angle regression
Occam's razor
Regression validation
Lasso (statistics)

References

Box–Behnken designs from a handbook on engineering statistics at NIST.
Chatfield, C. (1995). "Model uncertainty, data mining and statistical inference." J. R. Statist. Soc. A, 158(3), 419–466.
Copas, J. B. (1983). "Regression, prediction and shrinkage." J. Roy. Statist. Soc. Series B, 45, 311–354.
Donoho, David L., & Johnstone, Iain M. (1994). "Ideal spatial adaptation by wavelet shrinkage." Biometrika, 81(3), 425–455. doi:10.1093/biomet/81.3.425
Draper, N., & Smith, H. (1981). Applied Regression Analysis, 2d Edition. New York: John Wiley & Sons, Inc.
Efron, B., & Tibshirani, R. J. (1998). An Introduction to the Bootstrap. Chapman & Hall/CRC.
Efroymson, M. A. (1960). "Multiple regression analysis." In Ralston, A., & Wilf, H. S. (eds.), Mathematical Methods for Digital Computers. New York: Wiley.
Flom, P. L., & Cassell, D. L. (2007). "Stopping stepwise: Why stepwise and similar selection methods are bad, and what you should use." NESUG 2007.
Foster, Dean P., & George, Edward I. (1994). "The Risk Inflation Criterion for Multiple Regression." Annals of Statistics, 22(4), 1947–1975. doi:10.1214/aos/1176325766
Harrell, F. E. (2001). Regression Modeling Strategies: With Applications to Linear Models, Logistic Regression, and Survival Analysis. New York: Springer-Verlag.
Hocking, R. R. (1976). "The Analysis and Selection of Variables in Linear Regression." Biometrics, 32.
Hurvich, C. M., & Tsai, C. L. (1990). "The impact of model selection on inference in linear regression." American Statistician, 44, 214–217.
Knecht, W. R. (2005). Pilot willingness to take off into marginal weather, Part II: Antecedent overfitting with forward stepwise logistic regression (Technical Report DOT/FAA/AM-O5/15). Federal Aviation Administration.
Mark, Jonathan, & Goldberg, Michael A. (2001). "Multiple regression analysis and mass assessment: A review of the issues." The Appraisal Journal, Jan., 89–109.
Mayers, J. H., & Forgy, E. W. (1963). "The development of numerical credit evaluation systems." Journal of the American Statistical Association, 58(303), 799–806.
Rencher, A. C., & Pun, F. C. (1980). "Inflation of R² in Best Subset Regression." Technometrics, 22, 49–54.
Roecker, Ellen B. (1991). "Prediction error and its estimation for subset-selected models." Technometrics, 33, 459–468.
SAS Institute Inc. (1989). SAS/STAT User's Guide, Version 6, Fourth Edition, Volume 2. Cary, NC: SAS Institute Inc.
Wilkinson, L., & Dallal, G. E. (1981). "Tests of significance in forward selection regression with an F-to-enter stopping rule." Technometrics, 23, 377–380.