Talk:Divergence (statistics)

It seems to me that a statistical distance, when it refers to a comparison of two distributions, merely satisfies a somewhat tighter definition than what is required of a divergence. Currently the two articles give conflicting definitions of a distance, and this could be clarified by discussing them in a common framework. I think it would be okay to also explain comparison of a point to a distribution within the article Divergence (statistics), leaving no need for a Statistical distance article except as a redirect.

By contrast, statistical distances are a grab-bag with various properties. It is thus valuable to distinguish them, so that one article is the grab-bag of functions used as "distances", and the other discusses the ones with geometric properties. This is especially important because these two concepts are frequently conflated (as the history of this article and its discussions shows), so having two articles helps keep the concepts distinct.
I have reviewed both the translated monograph Methods of Information Geometry (ISBN 0-8218-0531-2) and the paper (doi:10.1007/978-3-642-10677-4_21), which are cited to substantiate the definition of divergence provided in this article. (Both of these are by a single, Japanese-speaking author, Amari.) Only the monograph uses the word "divergence" for this kind of function, and it notes in a footnote on the same page (p. 54) that this word is not used in the original Japanese text. The additional requirement in both the monograph and the paper in NIPS is the positive definiteness condition described by Memming.
What is the origin of the || notation used in this article? Divergences and distance measures in statistics are traditionally written D(P, Q) or similar, rather than D(P||Q). While the || notation is not completely unknown in the statistics research literature, it is very rare. The || notation is not used in any of the cited references, nor in the Knowledge page on statistical distance, nor does it agree with the definition of || in the Knowledge page List of mathematical symbols. I suggest reverting to historical notation, which is simpler and more widely understood.
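For readers comparing the options, here are the notational variants that appear in this thread, collected side by side (this is only a summary of what is used above and below, not a survey of the literature):

```latex
D(P, Q), \qquad D(P \parallel Q), \qquad D_{\text{KL}}(P \parallel Q), \qquad D(x, y), \qquad D[x : y]
```

All of these denote the same kind of object, a non-symmetric function of an ordered pair of distributions; only the separator differs.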
Amari, Shun'ichi (2009). "Divergence, Optimization, Geometry". In Leung, C. S.; Lee, M.; Chan, J. H. (eds.). Neural Information Processing: The 16th International Conference (ICONIP 2009), Bangkok, Thailand, 1–5 December 2009. Lecture Notes in Computer Science, vol. 5863. Berlin, Heidelberg: Springer. pp. 185–193. doi:10.1007/978-3-642-10677-4_21.
The notation used, while appreciably terse, is also cryptic and unapproachable. It should be revised, either to enhance its readability or to link to a page that helps disambiguate the meaning of the notation used.
The examples given for f-divergence are quite unprincipled. Some are even downright wrong. I have no idea where the name "Chernoff alpha-divergence" came from. And the "exponential divergence" is simply wrong: the generating function f(x) = (ln x)^2 is not even convex!
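A quick convexity check supports this point (my own arithmetic, not taken from the cited sources):

```latex
f(x) = (\ln x)^2, \qquad f'(x) = \frac{2\ln x}{x}, \qquad f''(x) = \frac{2(1 - \ln x)}{x^{2}},
```

so f''(x) < 0 for all x > e, and f is not convex on (0, āˆž); it therefore cannot serve as the generating function of an f-divergence.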
What do you mean by infinity here? Is it that you have some division by zero, or is it the limit of a sequence? The definition of a divergence only requires positive (semi-)definiteness, which has no "upper limit" on the real line.
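To make the "infinity" concrete, here is a minimal sketch (my own illustration with made-up distributions, not code from the article): the Kullback–Leibler sum contains a term p log(p/q), which diverges whenever Q assigns zero probability to an outcome that P gives positive probability.

```python
import math

def kl_divergence(p, q):
    """Discrete KL divergence: sum_i p_i * log(p_i / q_i).

    Terms with p_i == 0 contribute 0 (the usual 0*log 0 convention);
    a term with p_i > 0 and q_i == 0 makes the whole sum +infinity.
    """
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue
        if qi == 0.0:
            return math.inf
        total += pi * math.log(pi / qi)
    return total

print(kl_divergence([0.5, 0.5], [0.9, 0.1]))  # finite, about 0.51 nats
print(kl_divergence([0.5, 0.5], [1.0, 0.0]))  # inf: Q puts no mass on the second outcome
```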
commonly used by whom? It seems mostly the galaxy of information geometers swirling around Amari. A third-party review article would help. My personal impression is that "divergence" is used for any positive-definite function.
Neither of the sources relevant to the definition, which is the core of this article, backs up the content of the article. I submit that this article either needs new sources, or else needs to be heavily modified or removed.
To follow up: divergences are a special kind of statistical difference (notably inducing a positive definite metric), with important geometric interpretations and a central role in information geometry.
I'll have a shot at rewriting to define correctly, and hopefully avoid further confusion by clarifying both the loose use (which should be discussed at statistical distance) and the information geometry sense in this article.
I haven't read the reference yet, but it doesn't seem necessary to put it in the definition. At least, a good explanation is required here. --
As it currently stands, the article is fully committed to divergence as used in information geometry. For example, it even excludes the total variation distance.
I agree that just D(P, Q) is simpler, and probably more appropriate for this level of article, so I wouldn't object if someone changes it (and may do so myself).
The double-bar notation seems to be common for the Kullback–Leibler divergence, particularly in information theory, to emphasize the asymmetry. It's not used in Kullback & Leibler (1951), but is now common, and is a notation used on Knowledge for Kullback–Leibler divergence. There's a discussion at Origin of the notation for statistical divergence, but it's inconclusive.
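A quick worked example of the asymmetry being emphasized (numbers made up for illustration only): for P = (0.5, 0.5) and Q = (0.9, 0.1),

```latex
D_{\text{KL}}(P \parallel Q) = 0.5\ln\tfrac{0.5}{0.9} + 0.5\ln\tfrac{0.5}{0.1} \approx 0.51 \text{ nats},
\qquad
D_{\text{KL}}(Q \parallel P) = 0.9\ln\tfrac{0.9}{0.5} + 0.1\ln\tfrac{0.1}{0.5} \approx 0.37 \text{ nats},
```

so the two orderings genuinely give different values, which is what the asymmetric notation is meant to signal.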
The positive-definiteness is actually the crucial property that connects divergences to information geometry! See the discussion at Special:Permalink/1056280310#Definitions_Incompatible_with_Source_Material. However, I agree that the earlier article didn't sufficiently explain this. I've had a shot at explaining it both more intuitively and more formally. This is hopefully clearer!
Amari, Shun-ichi; Nagaoka, Hiroshi (2000). Methods of Information Geometry. Oxford University Press. ISBN 0-8218-0531-2.
I don't have strong feelings about the choice of notation (so long as different notations are mentioned and explained!).
I've restored the positive definiteness condition, and explained it both more intuitively and more formally. This should correct the article, and also address the original confusion.
I also have serious misgivings about the article's quality. I point out several problems:
429: 1033: 1024: 1008: 965: 940: 926: 696: 670: 571: 551: 542: 482: 473: 324: 316: 293: 272: 264: 257: 229: 202: 175: 167: 688: 662: 563: 534: 465: 285: 194: 628:
The term "divergence" has been used loosely historically (as I've outlined in Special:Permalink/889324520#History), often just as a term for statistical distance, and continues to be used loosely. However, in the context of information geometry, which is the topic of this article, the positive definiteness is essential to the geometry (essentially it means that infinitesimally, the divergence looks like squared Euclidean distance, and thus generalizes its properties; more formally, they generalize Hessian manifolds, which are affine and have local potential functions, not just infinitesimal positive definiteness).
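For readers following along, the "infinitesimally looks like squared Euclidean distance" claim is the usual second-order picture (standard information-geometry material, stated here from memory rather than quoted from the cited pages): a divergence vanishes together with its first-order term on the diagonal, and its second-order term defines a Riemannian metric,

```latex
D(p \parallel p + dp) = \frac{1}{2} \sum_{i,j} g_{ij}(p)\, dp_i\, dp_j + O(\lVert dp\rVert^{3}),
```

and positive definiteness of the matrix (g_ij(p)) is exactly what makes this leading term a squared distance.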
Notation is inconsistent in the literature; more formal math often just uses D(x, y), while information theory literature tends to use D_KL(P ∄ Q), and one also sees e.g. D[x : y] (Amari 2016). I'll add a section discussing notation.
Proposal to rename the article to "Divergence (information geometry)"
Thank you for the careful reading of references and clarification! This error was introduced in an earlier edit, which removed the positive definiteness condition that was included in the initial burst of revisions (Special:Permalink/339269004); skepticism about the necessity of positive definiteness was noted in Special:Permalink/340755871#positive_definiteness_in_the_definition, as you note.
A general f-divergence does not allow a quadratic expansion for D(p, p + dp). The most general theorem I can find is Theorem 7.11, which requires f ∈ C²(0, āˆž) and lim sup_{x→∞} f''(x) < āˆž.
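For context, the expansion at issue is the local chi-squared behaviour of an f-divergence (my own sketch of the standard statement, not a quotation of the theorem mentioned above): when the generator f is twice differentiable at 1 with f''(1) > 0,

```latex
D_f(p \parallel p + dp) = \frac{f''(1)}{2} \sum_i \frac{(dp_i)^2}{p_i} + o(\lVert dp\rVert^{2}),
```

and the point above is that without smoothness assumptions on f (such as the C² and lim sup conditions just quoted), no such second-order expansion is available.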
Kullback–Leibler divergence is not an example of a divergence

Kullback–Leibler divergence sometimes takes the value +āˆž, and so cannot be a divergence (as defined in the article). ???
Again, where are you going to put total variation distance?
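As a concrete version of this worry (my own note, not a claim from the article): total variation distance is itself an f-divergence, but its generator is not smooth at 1,

```latex
\mathrm{TV}(P, Q) = \frac{1}{2}\sum_i |p_i - q_i| = D_f(P \parallel Q) \quad \text{with } f(x) = \tfrac{1}{2}\,|x - 1|,
```

so it satisfies positive definiteness but has no second-order expansion, and hence does not fit the information-geometric definition used in the article.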
I've changed the notation to consistently use commas; thanks for raising this!
The third property of divergence is given in the text as: "The matrix g (see definition in the 'geometrical properties' section) is strictly positive-definite everywhere on S."