
Triplet loss


Triplet loss is a loss function for machine learning algorithms where a reference input (called the anchor) is compared to a matching input (called positive) and a non-matching input (called negative). The distance from the anchor to the positive is minimized, and the distance from the anchor to the negative input is maximized. An early formulation equivalent to triplet loss was introduced (without the idea of using anchors) for metric learning from relative comparisons by M. Schultz and T. Joachims in 2003.

By enforcing the order of distances, triplet loss models embed samples so that a pair of samples with the same label is closer in distance than a pair with different labels. Unlike t-SNE, which preserves embedding orders via probability distributions, triplet loss works directly on embedded distances. Therefore, in its common implementation, it needs soft-margin treatment with a slack variable α in its hinge-loss-style formulation. It is often used for learning similarity for the purpose of learning embeddings, such as learning to rank, word embeddings, thought vectors, and metric learning.

Consider the task of training a neural network to recognize faces (e.g. for admission to a high-security zone). A classifier trained to classify an instance would have to be retrained every time a new person is added to the face database. This can be avoided by posing the problem as a similarity learning problem instead of a classification problem: the network is trained (using a contrastive loss) to output a distance which is small if the image belongs to a known person and large if the image belongs to an unknown person. However, if we want to output the closest images to a given image, we want to learn a ranking and not just a similarity. A triplet loss is used in this case, as sketched below.
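The contrast between the two setups can be sketched as follows. This is only an illustration: the embedding function here is a hypothetical stand-in (a fixed random projection), and the threshold value is arbitrary; a real system would use a network trained with a contrastive or triplet loss.

```python
import numpy as np

def embed(image: np.ndarray, dim: int = 64) -> np.ndarray:
    # Stand-in for a trained embedding network (illustration only):
    # a fixed random linear projection of the flattened image.
    x = image.ravel().astype(float)
    proj = np.random.default_rng(0).standard_normal((dim, x.size)) / np.sqrt(x.size)
    return proj @ x

def is_same_person(image_a, image_b, threshold=0.8):
    # Verification: accept if the embedding distance falls below a threshold.
    # Enrolling a new person only means storing new reference embeddings;
    # no classifier has to be retrained.
    return np.linalg.norm(embed(image_a) - embed(image_b)) < threshold

def rank_gallery(query, gallery):
    # Ranking: sort gallery images by embedding distance to the query.
    q = embed(query)
    dists = [np.linalg.norm(q - embed(g)) for g in gallery]
    return np.argsort(dists)  # indices of gallery images, nearest first
```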
[Figure] Effect of triplet loss minimization in training: the positive is moved closer to the anchor than the negative.

The loss function can be described by means of the Euclidean distance function

\mathcal{L}(A, P, N) = \max\left( \left\| \operatorname{f}(A) - \operatorname{f}(P) \right\|_2 - \left\| \operatorname{f}(A) - \operatorname{f}(N) \right\|_2 + \alpha,\ 0 \right)

where A is an anchor input, P is a positive input of the same class as A, N is a negative input of a different class from A, α is a margin between positive and negative pairs, and \operatorname{f} is an embedding.

The indices are for individual input vectors given as a triplet. The triplet is formed by drawing an anchor input, a positive input that describes the same entity as the anchor, and a negative input that does not describe the same entity as the anchor. These inputs are then run through the network, and the outputs are used in the loss function.

The individual losses can then be combined into a cost function, the sum of all losses, which is minimized in the posed optimization problem:

\mathcal{J} = \sum_{i=1}^{M} \mathcal{L}\left( A^{(i)}, P^{(i)}, N^{(i)} \right)
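A minimal NumPy sketch of the loss and the summed cost above, assuming the embedding \operatorname{f} has already been applied so that the inputs are embedding vectors; the margin value and array shapes are illustrative only.

```python
import numpy as np

def triplet_loss(f_a, f_p, f_n, alpha=0.2):
    # L(A, P, N) = max(||f(A) - f(P)||_2 - ||f(A) - f(N)||_2 + alpha, 0)
    # f_a, f_p, f_n are embedding vectors of anchor, positive and negative.
    d_pos = np.linalg.norm(f_a - f_p)
    d_neg = np.linalg.norm(f_a - f_n)
    return max(d_pos - d_neg + alpha, 0.0)

def triplet_cost(anchors, positives, negatives, alpha=0.2):
    # J = sum_i L(A^(i), P^(i), N^(i)) over M triplets of embeddings.
    return sum(
        triplet_loss(a, p, n, alpha)
        for a, p, n in zip(anchors, positives, negatives)
    )

# Example with random vectors standing in for network outputs:
rng = np.random.default_rng(0)
A, P, N = (rng.standard_normal((5, 8)) for _ in range(3))
print(triplet_cost(A, P, N))
```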
Comparison and Extensions

In computer vision tasks such as re-identification, a prevailing belief has been that the triplet loss is inferior to using surrogate losses (i.e., typical classification losses) followed by separate metric learning steps. Recent work showed that, for models trained from scratch as well as for pretrained models, a special version of triplet loss doing end-to-end deep metric learning outperforms most other published methods as of 2017.

Additionally, triplet loss has been extended to simultaneously maintain a series of distance orders by optimizing a continuous relevance degree with a chain (i.e., ladder) of distance inequalities. This leads to the Ladder Loss, which has been demonstrated to offer performance enhancements of visual-semantic embedding in learning to rank tasks.
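Schematically, and only as an illustration rather than the exact formulation from the Ladder Loss paper, the chained constraints for candidates x_1, \dots, x_K sorted by decreasing relevance to an anchor A have the following flavor, with per-step margins \alpha_k:

\left\| \operatorname{f}(A) - \operatorname{f}(x_1) \right\|_2 + \alpha_1 \le \left\| \operatorname{f}(A) - \operatorname{f}(x_2) \right\|_2, \quad \left\| \operatorname{f}(A) - \operatorname{f}(x_2) \right\|_2 + \alpha_2 \le \left\| \operatorname{f}(A) - \operatorname{f}(x_3) \right\|_2, \quad \dots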
In Natural Language Processing, triplet loss is one of the loss functions considered for BERT fine-tuning in the SBERT architecture.

Other extensions involve specifying multiple negatives (multiple negatives ranking loss).
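The exact form of that loss differs between implementations; a common variant (a sketch under that assumption, not a formulation taken from this article's sources) scores one positive against several negatives with a softmax over scaled cosine similarities:

```python
import numpy as np

def multiple_negatives_ranking_loss(f_a, f_p, f_negs, scale=20.0):
    # Cross-entropy over cosine similarities of the anchor to one positive
    # and several negatives: one common way to specify multiple negatives.
    def cos(u, v):
        return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

    scores = np.array([cos(f_a, f_p)] + [cos(f_a, n) for n in f_negs]) * scale
    # Numerically stable log-softmax; the loss is the negative
    # log-probability assigned to the positive (index 0).
    log_probs = scores - scores.max() - np.log(np.sum(np.exp(scores - scores.max())))
    return -log_probs[0]
```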

See also

Siamese neural network
t-distributed stochastic neighbor embedding
Learning to rank
Similarity learning

References

Chechik, G.; Sharma, V.; Shalit, U.; Bengio, S. (2010). "Large Scale Online Learning of Image Similarity Through Ranking". Journal of Machine Learning Research. 11: 1109–1135.
Schroff, F.; Kalenichenko, D.; Philbin, J. (June 2015). "FaceNet: A unified embedding for face recognition and clustering". 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). pp. 815–823. arXiv:1503.03832. doi:10.1109/CVPR.2015.7298682. ISBN 978-1-4673-6964-0.
Schultz, M.; Joachims, T. (2004). "Learning a distance metric from relative comparisons". Advances in Neural Information Processing Systems. 16: 41–48.
Ailon, Nir; Hoffer, Elad (2014-12-20). "Deep metric learning using Triplet network". arXiv:1412.6622.
Hermans, Alexander; Beyer, Lucas; Leibe, Bastian (2017-03-22). "In Defense of the Triplet Loss for Person Re-Identification". arXiv:1703.07737.
Zhou, Mo; Niu, Zhenxing; Wang, Le; Gao, Zhanning; Zhang, Qilin; Hua, Gang (2020-04-03). "Ladder Loss for Coherent Visual-Semantic Embedding". Proceedings of the AAAI Conference on Artificial Intelligence. 34 (7): 13050–13057. doi:10.1609/aaai.v34i07.7006.
Reimers, Nils; Gurevych, Iryna (2019-08-27). "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks". arXiv:1908.10084.
