Statistical indicators of the deviation of a sample

In statistics, robust measures of scale are methods that quantify the statistical dispersion in a sample of numerical data while resisting outliers. The most common such robust statistics are the interquartile range (IQR) and the median absolute deviation (MAD). These are contrasted with conventional or non-robust measures of scale, such as the sample standard deviation, which are greatly influenced by outliers.

These robust statistics are particularly used as estimators of a scale parameter, and have the advantages of both robustness and superior efficiency on contaminated data, at the cost of inferior efficiency on clean data from distributions such as the normal distribution. To illustrate robustness, the standard deviation can be made arbitrarily large by increasing exactly one observation (it has a breakdown point of 0, as it can be contaminated by a single point), a defect that is not shared by robust statistics.

IQR and MAD

One of the most common robust measures of scale is the interquartile range (IQR), the difference between the 75th percentile and the 25th percentile of a sample; this is the 25% trimmed range, an example of an L-estimator. Other trimmed ranges, such as the interdecile range (10% trimmed range), can also be used. For a Gaussian distribution, IQR is related to σ as:

    σ ≈ 0.7413 IQR = IQR / 1.349

Another familiar robust measure of scale is the median absolute deviation (MAD), the median of the absolute values of the differences between the data values and the overall median of the data set; for a Gaussian distribution, MAD is related to σ as:

    σ ≈ 1.4826 MAD ≈ MAD / 0.6745

See Median absolute deviation § Relation to standard deviation for details.
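As a concrete illustration, both estimators can be computed with the Python standard library alone. This is a minimal sketch; the data values below are invented, with one gross outlier to show the resistance of the MAD:

```python
import statistics

def mad(data):
    """Median absolute deviation: median of |x - median(data)|."""
    m = statistics.median(data)
    return statistics.median(abs(x - m) for x in data)

def iqr(data):
    """Interquartile range: 75th percentile minus 25th percentile."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    return q3 - q1

data = [2.1, 2.3, 2.2, 2.4, 2.2, 99.0]  # one gross outlier
print(mad(data))               # ~0.1 -- barely affected by the outlier
print(1.4826 * mad(data))      # rescaled to estimate sigma under normality
print(statistics.stdev(data))  # the outlier inflates the standard deviation
```

Note that `statistics.quantiles` interpolates between order statistics, so the IQR-based σ estimate is only meaningful for samples large enough that the quartiles fall well inside the data.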
Estimation

Robust measures of scale can be used as estimators of properties of the population, either for parameter estimation or as estimators of their own expected value.

For example, robust estimators of scale are used to estimate the population standard deviation, generally by multiplying by a scale factor to make it an unbiased, consistent estimator; see scale parameter: estimation. For example, dividing the IQR by 2√2 erf⁻¹(1/2) (approximately 1.349) makes it an unbiased, consistent estimator of the population standard deviation if the data follow a normal distribution.

In other situations, it makes more sense to think of a robust measure of scale as an estimator of its own expected value, interpreted as an alternative to the population standard deviation as a measure of scale. For example, the MAD of a sample from a standard Cauchy distribution is an estimator of the population MAD, which in this case is 1, whereas the population variance does not exist.

Efficiency

These robust estimators typically have inferior statistical efficiency compared to conventional estimators for data drawn from a distribution without outliers (such as a normal distribution), but have superior efficiency for data drawn from a mixture distribution or from a heavy-tailed distribution, for which non-robust measures such as the standard deviation should not be used.

For example, for data drawn from the normal distribution, the MAD is 37% as efficient as the sample standard deviation, while the Rousseeuw–Croux estimator Q_n is 88% as efficient as the sample standard deviation.
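The 1.349 scale factor for the IQR can be checked numerically. A sketch, with the sample size, true σ, and seed all chosen arbitrarily:

```python
import random
import statistics

random.seed(0)
sigma = 2.0
sample = [random.gauss(0, sigma) for _ in range(100_000)]

q1, _, q3 = statistics.quantiles(sample, n=4)
sigma_hat = (q3 - q1) / 1.349  # IQR rescaled to estimate sigma
print(sigma_hat)               # close to the true sigma of 2.0
```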
Absolute pairwise differences

Rousseeuw and Croux propose alternatives to the MAD, motivated by two weaknesses of it:

- It is inefficient (37% efficiency) at Gaussian distributions.
- It computes a symmetric statistic about a location estimate, thus not dealing with skewness.

They propose two alternative statistics based on pairwise differences, S_n and Q_n, defined as:

    S_n := 1.1926 med_i ( med_j |x_i − x_j| ),
    Q_n := c_n · (first quartile of |x_i − x_j| : i < j),

where c_n is a constant depending on n.

These can be computed in O(n log n) time and O(n) space.

Neither of these requires location estimation, as they are based only on differences between values. They are both more efficient than the MAD under a Gaussian distribution: S_n is 58% efficient, while Q_n is 82% efficient.

For a sample from a normal distribution, S_n is approximately unbiased for the population standard deviation even down to very modest sample sizes (<1% bias for n = 10).

For a large sample from a normal distribution, 2.22 Q_n is approximately unbiased for the population standard deviation. For small or moderate samples, the expected value of Q_n under a normal distribution depends markedly on the sample size, so finite-sample correction factors (obtained from a table or from simulations) are used to calibrate the scale of Q_n.
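A naive sketch of both statistics follows. It runs in quadratic time rather than using the O(n log n) algorithm from the paper, ignores the finite-sample corrections (c_n is taken as 1), and uses a rough index for the first quartile of the pairwise differences:

```python
import random
import statistics

def S_n(x):
    # 1.1926 * med_i( med_{j != i} |x_i - x_j| ); O(n^2) for clarity
    inner = [statistics.median(abs(xi - xj) for j, xj in enumerate(x) if j != i)
             for i, xi in enumerate(x)]
    return 1.1926 * statistics.median(inner)

def Q_n(x):
    # first quartile of the pairwise absolute differences, with the
    # finite-sample constant c_n taken as 1 for simplicity
    diffs = sorted(abs(xi - xj) for i, xi in enumerate(x) for xj in x[i + 1:])
    return diffs[len(diffs) // 4]

random.seed(1)
sample = [random.gauss(0, 1) for _ in range(500)]
print(S_n(sample))         # roughly 1 for a standard normal sample
print(2.22 * Q_n(sample))  # roughly 1 as well, per the text above
```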
The biweight midvariance

Like S_n and Q_n, the biweight midvariance aims to be robust without sacrificing too much efficiency. It is defined as

    n Σ_{i=1}^{n} (x_i − Q)² (1 − u_i²)⁴ I(|u_i| < 1) / ( Σ_i (1 − u_i²)(1 − 5u_i²) I(|u_i| < 1) )²,

where I is the indicator function, Q is the sample median of the X_i, and

    u_i = (x_i − Q) / (9 · MAD).

Its square root is a robust estimator of scale, since data points are downweighted as their distance from the median increases, with points more than 9 MAD units from the median having no influence at all.
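The definition translates directly into code. A sketch, assuming the MAD of the data is nonzero; the sample below appends one wild point to normal data to show the downweighting:

```python
import math
import random
import statistics

def biweight_midvariance(x):
    """Biweight midvariance: points more than 9 MAD from the
    median receive zero weight."""
    Q = statistics.median(x)
    mad = statistics.median(abs(v - Q) for v in x)  # assumed nonzero
    n = len(x)
    u = [(v - Q) / (9 * mad) for v in x]
    num = n * sum((v - Q) ** 2 * (1 - ui ** 2) ** 4
                  for v, ui in zip(x, u) if abs(ui) < 1)
    den = sum((1 - ui ** 2) * (1 - 5 * ui ** 2)
              for ui in u if abs(ui) < 1) ** 2
    return num / den

random.seed(2)
sample = [random.gauss(0, 1) for _ in range(2000)] + [1000.0]  # one wild point
print(math.sqrt(biweight_midvariance(sample)))  # near 1 despite the outlier
```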
Extensions

Mizera & Müller (2004) propose a robust depth-based estimator for location and scale simultaneously. They propose a new measure named the Student median.

Confidence intervals

A robust confidence interval is a robust modification of confidence intervals, meaning that one modifies the non-robust calculations of the confidence interval so that they are not badly affected by outlying or aberrant observations in a data-set.

Example

In the process of weighing 1000 objects, under practical conditions, it is easy to believe that the operator might make a mistake in procedure and so report an incorrect mass (thereby making one type of systematic error). Suppose there were 100 objects and the operator weighed them all, one at a time, and repeated the whole process ten times. Then the operator can calculate a sample standard deviation for each object, and look for outliers. Any object with an unusually large standard deviation probably has an outlier in its data. These can be removed by various non-parametric techniques. If the operator repeated the process only three times, simply taking the median of the three measurements and using σ would give a confidence interval. The 200 extra weighings served only to detect and correct for operator error and did nothing to improve the confidence interval. With more repetitions, one could use a truncated mean, discarding the largest and smallest values and averaging the rest. A bootstrap calculation could be used to determine a confidence interval narrower than that calculated from σ, and so obtain some benefit from a large amount of extra work.

These procedures are robust against procedural errors which are not modeled by the assumption that the balance has a fixed known standard deviation σ. In practical applications where the occasional operator error can occur, or the balance can malfunction, the assumptions behind simple statistical calculations cannot be taken for granted. Before trusting the results of 100 objects weighed just three times each to have confidence intervals calculated from σ, it is necessary to test for and remove a reasonable number of outliers (testing the assumption that the operator is careful, and correcting for the fact that the operator is not perfect), and to test the assumption that the data really have a normal distribution with standard deviation σ.
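The truncated-mean-plus-bootstrap idea can be sketched as follows. The percentile bootstrap used here is one of several variants, and the weighing values are invented, with one simulated operator error:

```python
import random
import statistics

def truncated_mean(values):
    # discard the largest and smallest values, average the rest
    v = sorted(values)
    return statistics.fmean(v[1:-1])

def bootstrap_ci(values, estimator, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for any estimator."""
    rng = random.Random(seed)
    stats = sorted(estimator(rng.choices(values, k=len(values)))
                   for _ in range(n_boot))
    return stats[int(n_boot * alpha / 2)], stats[int(n_boot * (1 - alpha / 2))]

# ten weighings of one object; the 55.00 is a simulated operator error
weighings = [50.01, 49.98, 50.02, 50.00, 49.99, 50.03, 49.97, 50.01, 55.00, 50.00]
lo, hi = bootstrap_ci(weighings, truncated_mean)
print(lo, hi)  # an interval near the true mass of about 50
```

Because the truncated mean here discards only one value from each end, a bootstrap resample containing the erroneous weighing twice can still pull the upper limit upward; heavier trimming or the median would resist this further.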
Computer simulation

The theoretical analysis of such an experiment is complicated, but it is easy to set up a spreadsheet which draws random numbers from a normal distribution with standard deviation σ to simulate the situation; this can be done in Microsoft Excel using =NORMINV(RAND(),0,σ), as discussed in Wittwer (2004), and the same techniques can be used in other spreadsheet programs such as OpenOffice.org Calc and gnumeric.

After removing obvious outliers, one could subtract the median from the other two values for each object, and examine the distribution of the 200 resulting numbers. It should be normal with mean near zero and standard deviation a little larger than σ. A simple Monte Carlo spreadsheet calculation would reveal typical values for the standard deviation (around 105 to 115% of σ). Or, one could subtract the mean of each triplet from the values, and examine the distribution of the 300 resulting values. The mean is identically zero, but the standard deviation should be somewhat smaller (around 75 to 85% of σ).
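The same experiment can be sketched outside a spreadsheet. This simulation (σ = 1 and the seed are arbitrary choices) reproduces both quoted ranges:

```python
import random
import statistics

random.seed(0)
sigma = 1.0
med_resid, mean_resid = [], []
for _ in range(100):                     # 100 objects, three weighings each
    t = [random.gauss(0, sigma) for _ in range(3)]
    med = statistics.median(t)
    med_resid.extend(v - med for v in t if v != med)  # 200 numbers in total
    m = statistics.fmean(t)
    mean_resid.extend(v - m for v in t)               # 300 numbers in total

print(statistics.pstdev(med_resid) / sigma)   # typically around 1.05 to 1.15
print(statistics.pstdev(mean_resid) / sigma)  # typically around 0.75 to 0.85
```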
See also

- Heteroscedasticity-consistent standard errors
- Interquartile Range
- Mean Absolute Deviation

References

- "Interquartile Range", NIST. Retrieved 2022-03-30.
- Rousseeuw, Peter J.; Croux, Christophe (December 1993), "Alternatives to the Median Absolute Deviation", Journal of the American Statistical Association, 88 (424): 1273–1283, doi:10.2307/2291267, JSTOR 2291267.
- Mizera, I.; Müller, C. H. (2004), "Location-scale depth", Journal of the American Statistical Association, 99 (468): 949–966, doi:10.1198/016214504000001312.
- Wittwer, J.W., "Monte Carlo Simulation in Excel: A Practical Guide", June 1, 2004.

Categories: Robust statistics | Statistical deviation and dispersion | Scale statistics