It seems to me that a statistical distance, when it refers to a comparison of two distributions, merely satisfies a somewhat tighter definition than what is required of a divergence. Currently the two articles give conflicting definitions of a distance, and this could be clarified by discussing them in a common framework. I think it would be fine to also explain comparison of a point to a distribution within the article Divergence (statistics), leaving no need for a Statistical distance article except as a redirect.
By contrast, statistical distances are a grab-bag with various properties. It is thus valuable to distinguish the two, so that one article is the grab-bag of functions used as "distances" and the other discusses the ones with geometric properties. This is especially important because these two concepts are frequently conflated (as the history of this article and its discussions shows), so having two articles helps keep them distinct.
I have reviewed both the translated monograph
Methods of Information Geometry (ISBN 0-8218-0531-2) and the paper (doi:10.1007/978-3-642-10677-4_21), which are cited to substantiate the definition of divergence provided in this article. (Both are by a single Japanese-speaking author, Amari.) Only the
What is the origin of the || notation used in this article? Divergences and distance measures in statistics are traditionally written D(P,Q) or similar, rather than D(P||Q). While the || notation is not completely unknown in the statistics research literature, it is very rare. The || notation is not
Amari, Shun'ichi (2009). Leung, C.S.; Lee, M.; Chan, J.H. (eds.). Divergence, Optimization, Geometry. The 16th
International Conference on Neural Information Processing (ICONIP 2009), Bangkok, Thailand, 1–5 December 2009. Lecture Notes in Computer Science, vol. 5863. Berlin, Heidelberg: Springer.
The notation used, while appreciably terse, is also cryptic and unapproachable. It should be revised, either to enhance its readability or to link to a page that disambiguates the notation.
The examples given for f-divergence are quite unprincipled. Some are even downright wrong. I have no idea where the name "Chernoff alpha-divergence" came from. And the "exponential divergence" is simply wrong: the generating function
What do you mean by infinity here? Is it a division by zero, or the limit of a sequence? The definition of a divergence only requires positive (semi-)definiteness, which imposes no upper limit on the real line.
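For what it's worth, the infinity is a genuine value of the sum rather than a notational accident: with the usual conventions 0·log(0/q) = 0 and p·log(p/0) = +∞, the Kullback–Leibler divergence is +∞ exactly when P puts mass where Q puts none. A throwaway sketch (function name hypothetical):

```python
import math

# Hypothetical illustration: D_KL(P || Q) over finite distributions,
# with the conventions 0*log(0/q) = 0 and p*log(p/0) = +inf.
def kl(p, q):
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0:
            continue            # 0 * log(0/q) := 0
        if qi == 0:
            return math.inf     # p_i > 0 while q_i = 0
        total += pi * math.log(pi / qi)
    return total

print(kl([0.5, 0.5], [0.5, 0.5]))  # 0.0
print(kl([0.5, 0.5], [1.0, 0.0]))  # inf: P has mass where Q has none
```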
Commonly used by whom? It seems to be mostly the galaxy of information geometers swirling around Amari; a third-party review article would help. My personal impression is that "divergence" is used for any positive-definite function.
Neither of the sources relevant to the definition at the core of this article backs up the article's content. I submit that this article either needs new sources, or else needs to be heavily modified or removed.
To follow up: divergences are a special kind of statistical distance (notably inducing a positive-definite metric), with important geometric interpretations and a central role in
, which is the topic of this article, the positive definiteness is essential to the geometry (essentially it means that infinitesimally, the divergence looks like
However, I agree that the earlier article didn't sufficiently explain this. I've had a shot at explaining it both more intuitively and more formally, in
I'll have a shot at rewriting to define it correctly, and hopefully avoid further confusion by clarifying both the loose use (which should be discussed at
I haven't read the reference yet, but it doesn't seem necessary to put it in the definition. At least, a good explanation is required here. --
As it currently stands, the article is fully committed to divergence as used in information geometry. For example, it even excludes the
is simpler, and probably more appropriate for this level of article, so I wouldn't object if someone changes it (and may do so myself).
The additional requirement in both the monograph and the NIPS paper is the positive definiteness condition described by Memming.
, to emphasize the asymmetry. It's not used in Kullback & Leibler (1951), but is now common, and is a notation used on Knowledge for
The positive-definiteness is actually the crucial property that connects divergences to information geometry! See discussion at
monograph uses the word "divergence" for this kind of function, and it notes in a footnote on the same page (p. 54) that this
Amari, Shun-ichi; Nagaoka, Hiroshi (2000). Methods of Information Geometry. Oxford
University Press. ISBN 0-8218-0531-2.
I don't have strong feelings about the choice of notation (so long as different notations are mentioned and explained!).
I've restored the positive definiteness condition, and explained it both more intuitively and more formally, in
, which removed the positive definiteness condition that was included in the initial burst of revisions (
, which are affine and have local potential functions, not just infinitesimal positive definiteness).
on
Knowledge. If you would like to participate, please visit the project page, where you can join
I suggest reverting to the historical notation, which is simpler and more widely understood.
I also have serious misgivings about the article's quality. I point out several problems:
The term "divergence" has been used loosely historically (as I've outlined in
Notation is inconsistent in the literature; more formal math often just uses
This should correct the article, and also address the original confusion.
Special:Permalink/1056280310#Definitions_Incompatible_with_Source_Material
, and thus generalizes its properties; more formally, they generalize
proposal to rename the article to "Divergence (information geometry)"
); skepticism about the necessity of positive definiteness was noted in
(see the definition in the "geometrical properties" section) is strictly
, nor does it agree with the definition of || in the Knowledge page
Special:Permalink/340755871#positive_definiteness_in_the_definition
Thank you for the careful reading of references and clarification!
used in any of the cited references, nor in the
Knowledge page on
A general f-divergence does not allow a quadratic expansion for
and so cannot be a divergence (as defined in the article). ???
Kullback–Leibler divergence is not an example of a divergence
Again, where are you going to put total variation distance?
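To make that concrete: total variation is the f-divergence generated by f(t) = |t − 1|/2, whose kink at t = 1 rules out any quadratic (second-order) expansion there. A minimal numeric sketch of that claim (not from the article):

```python
# Total variation is the f-divergence with generator f(t) = |t - 1|/2.
# The kink at t = 1 means f has no second-order expansion there, which
# is what excludes it from the smooth "divergence" class discussed here.
f = lambda t: abs(t - 1) / 2

h = 1e-6
left_slope = (f(1) - f(1 - h)) / h    # one-sided derivative from the left
right_slope = (f(1 + h) - f(1)) / h   # one-sided derivative from the right
assert abs(left_slope - (-0.5)) < 1e-9
assert abs(right_slope - 0.5) < 1e-9  # the two one-sided slopes disagree
```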
). The most general theorem I can find is Theorem 7.11 in
: I've changed the notation to consistently use commas in
The third property of divergence is given in the text as,
\limsup_{x\to\infty} f''(x) < \infty
Kullback–Leibler divergence sometimes takes the value
) and the information geometry sense in this article.
(Amari 2016). I'll add a section discussing notation.
, a collaborative effort to improve the coverage of
, while information theory literature tends to use
Origin of the notation for statistical divergence
pp. 185–193. doi:10.1007/978-3-642-10677-4_21.
The double-bar notation seems to be common for
Definitions Incompatible with Source Material
D_{\text{KL}}(P\parallel Q)
today there is a commonly used definition
is not used in the original Japanese text
: I've added a discussion of notation in
Closing, as no support over 2.5 years.
positive definiteness in the definition
f \in C^{2}(0,\infty)
Special:Permalink/1052789591#Notation
This article is within the scope of
, and continues to be used loosely.
Special:Permalink/889324520#History
It is of interest to the following
f(x) = (\ln x)^{2}
Knowledge:WikiProject Statistics
This article has been rated as
Template:WikiProject Statistics
04:36, 21 November 2021 (UTC)
This error was introduced in
04:50, 21 November 2021 (UTC)
04:41, 21 November 2021 (UTC)
and see a list of open tasks.
21:23, 6 November 2021 (UTC)
), often just as a term for
02:49, 31 October 2021 (UTC)
01:34, 31 October 2021 (UTC)
List of mathematical symbols
21:29, 6 November 2021 (UTC)
10:33, 4 December 2014 (UTC)
This is hopefully clearer!
18:13, 29 January 2010 (UTC)
However, in the context of
Special:Permalink/339269004
There's a discussion at:
Kullback–Leibler divergence
Kullback–Leibler divergence
20:05, 15 August 2017 (UTC)
squared Euclidean distance
; thanks for raising this!
13:57, 25 July 2023 (UTC)
, and one also sees e.g.
22:52, 9 March 2012 (UTC)
10:29, 5 June 2023 (UTC)
+\infty
20:26, 30 May 2022 (UTC)
23:43, 25 May 2022 (UTC)
Total variation distance
, but it's inconclusive.
10:18, 28 May 2015 (UTC)
; hopefully this helps.
Divergence (statistics)
pony in a strange land
D(P,Q)
D(x,y)
WikiProject Statistics
This article is rated
is not even convex!
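The non-convexity is easy to check by hand: for f(x) = (ln x)², f″(x) = 2(1 − ln x)/x², which is negative for x > e. A throwaway numeric confirmation (helper name hypothetical):

```python
import math

# f(x) = (ln x)^2, the generator claimed above to be non-convex on (0, inf).
f = lambda x: math.log(x) ** 2

def second_difference(g, x, h=1e-4):
    """Central second difference; approximates g''(x)."""
    return (g(x + h) - 2 * g(x) + g(x - h)) / h**2

# f''(x) = 2(1 - ln x)/x^2 changes sign at x = e:
assert second_difference(f, 2.0) > 0    # convex below e
assert second_difference(f, 10.0) < 0   # concave above e, so not convex
```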
statistical distance
information geometry
statistical distance
statistical distance
information geometry
Statistical distance
Statistics articles
495:I agree that just
information theory
, particularly in
content assessment
, which requires
—Nils von Barth (
—Nils von Barth (
Hessian manifolds
—Nils von Barth (
—Nils von Barth (
—Nils von Barth (
D
—Nils von Barth (
comment added by
—Nils von Barth (
positive-definite
AntonyRichardLee
importance scale
be merged into
I propose that
Merger proposal
, as you note.
Olli Niemitalo
everywhere on
Low-importance
the discussion
Low-importance
Cosmia Nebula
— Preceding
WikiProjects
Please see
much older
|| Notation
132.3.57.68
The matrix
Start-class
Thatsme314
translated
Statistics
statistics
Statistics
unsigned
Notation
GKSmyth
GKSmyth
GKSmyth
Klbrain
Memming
on the
Formal
nbarth
nbarth
nbarth
nbarth
nbarth
nbarth
nbarth
scale.
WP:RM