
Empirical statistical laws

An empirical statistical law or (in popular terminology) a law of statistics represents a type of behaviour that has been found across a number of datasets and, indeed, across a range of types of datasets. Many of these observations have been formulated and proved as statistical or probabilistic theorems, and the term "law" has been carried over to these theorems. There are other statistical and probabilistic theorems that also have "law" as part of their names but are not obviously derived from empirical observations. However, both types of "law" may be considered instances of a scientific law in the field of statistics. What distinguishes an empirical statistical law from a formal statistical theorem is the way these patterns simply appear in natural distributions, without prior theoretical reasoning about the data.

Examples

There are several such popular "laws of statistics".

The Pareto principle is a popular example of such a "law". It states that roughly 80% of the effects come from 20% of the causes, and is thus also known as the 80/20 rule. In business, the 80/20 rule says that 80% of your business comes from just 20% of your customers. In software engineering, it is often said that 80% of the errors are caused by just 20% of the bugs. Roughly 20% of the world's population produces about 80% of worldwide GDP, and 80% of healthcare expenses in the US are incurred by 20% of the population.

Zipf's law, described as an "empirical statistical law" of linguistics, is another example. According to the "law", given some dataset of text, the frequency of a word is inversely proportional to its frequency rank. In other words, the second most common word should appear about half as often as the most common word, and the fifth most common word should appear about one-fifth as often. However, what sets Zipf's law apart as an "empirical statistical law" rather than just a theorem of linguistics is that it applies to phenomena outside of its field, too. For example, a ranked list of US metropolitan populations also follows Zipf's law, and even forgetting follows Zipf's law. This act of summarizing several natural data patterns with simple rules is a defining characteristic of these "empirical statistical laws".

Examples of empirically inspired statistical laws that have a firm theoretical basis include:
Statistical regularity
Law of large numbers
Law of truly large numbers
Central limit theorem
Regression toward the mean

Examples of "laws" with a weaker foundation include:
Safety in numbers
Benford's law

Examples of "laws" that are general observations rather than results with a theoretical background include:
Rank–size distribution

Examples of supposed "laws" which are incorrect include:
Law of averages
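The two rules of thumb described in this article, Zipf's law and the Pareto principle, can be checked numerically on any list of counts. The following is a minimal Python sketch, assuming the simplest form of Zipf's law (count at rank r equals the top count divided by r). The function names `zipf_expected_counts`, `ranked_counts` and `top_share` are hypothetical helpers introduced here for illustration, and the toy corpus is made up rather than real data.

```python
from collections import Counter


def zipf_expected_counts(top_count, n_ranks):
    """Counts an idealized Zipf distribution predicts for ranks 1..n_ranks.

    Assumes the simplest form of the law (exponent 1): the count at rank r
    is the count of the top-ranked item divided by r.
    """
    return [top_count / rank for rank in range(1, n_ranks + 1)]


def ranked_counts(words):
    """Observed word counts, sorted from most to least frequent."""
    return [count for _, count in Counter(words).most_common()]


def top_share(values, fraction):
    """Share of the total contributed by the largest `fraction` of values."""
    ordered = sorted(values, reverse=True)
    k = max(1, round(len(ordered) * fraction))  # always keep at least one item
    return sum(ordered[:k]) / sum(ordered)


# Made-up toy corpus (not real data) with word counts 60, 30, 20, 15, 12.
toy_words = ["a"] * 60 + ["b"] * 30 + ["c"] * 20 + ["d"] * 15 + ["e"] * 12
observed = ranked_counts(toy_words)
expected = zipf_expected_counts(observed[0], len(observed))

# Compare each observed count with the idealized Zipf prediction.
for rank, (obs, exp) in enumerate(zip(observed, expected), start=1):
    print(f"rank {rank}: observed {obs}, Zipf prediction {exp:.1f}")

# Pareto-style check on made-up values: share held by the top 20% of items.
print(top_share([80, 5, 5, 5, 5], 0.2))
```

On real data the fit is only approximate, so a comparison like this is better read as a rough diagnostic than as a test of whether a dataset "obeys" either law.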
See also

Laws of chance
Category: Statistical laws
Law (mathematics)

Notes

1. Kitcher & Salmon (2009) p.51
2. Bunkley, Nick (2008-03-03). "Joseph Juran, 103, Pioneer in Quality Control, Dies". The New York Times. ISSN 0362-4331. Retrieved 2017-05-05.
3. Staff, Investopedia (2010-11-04). "80-20 Rule". Investopedia. Retrieved 2017-05-05.
4. Rooney, Paula (2002-10-03). "Microsoft's CEO: 80-20 Rule Applies To Bugs, Not Just Features". CRN. Retrieved 2017-05-05.
5. 1992 Human Development Report. United Nations Development Program. New York: Oxford University Press. 1992.
6. "Chart 1: Percent of Total Health Care Expenses Incurred by Different Percentiles of U.S. Population: 2002". Research in Action, Issue 19. Rockville, MD: Agency for Healthcare Research and Quality. June 2006.
7. Gelbukh & Sidorov (2008)
8. Gabaix, Xavier (2011). "The Area and Population of Cities: New Insights from a Different Perspective on Cities" (PDF). American Economic Review. 101 (5): 2205–2225. arXiv:1001.5289. doi:10.1257/aer.101.5.2205. S2CID 4998367.
9. Anderson, John R.; Schooler, Lael J. (November 1991). "Reflections of the Environment in Memory" (PDF). Psychological Science. 2 (6): 396–408. doi:10.1111/j.1467-9280.1991.tb00174.x. S2CID 8511110.

References

Kitcher, P., Salmon, W.C. (Editors) (2009). Scientific Explanation. University of Minnesota Press. ISBN 978-0-8166-5765-0.
Gelbukh, A., Sidorov, G. (2008). Zipf and Heaps Laws' Coefficients Depend on Language. In: Computational Linguistics and Intelligent Text Processing (pp. 332–335), Springer. ISBN 978-3-540-41687-6.

