Triplet loss

Triplet loss is a loss function for machine learning algorithms where a reference input (called the anchor) is compared to a matching input (called the positive) and a non-matching input (called the negative). The distance from the anchor to the positive is minimized, and the distance from the anchor to the negative is maximized. An early formulation equivalent to triplet loss was introduced (without the idea of using anchors) for metric learning from relative comparisons by M. Schultz and T. Joachims in 2003.

By enforcing this order of distances, triplet loss models produce embeddings in which a pair of samples with the same label is closer in distance than a pair of samples with different labels. Unlike t-SNE, which preserves embedding orders via probability distributions, triplet loss works directly on embedded distances. Therefore, in its common implementation, it needs soft margin treatment with a slack variable α (the margin) in its hinge-loss-style formulation. It is often used for learning similarity for the purpose of learning embeddings, such as learning to rank, word embeddings, thought vectors, and metric learning.

Consider the task of training a neural network to recognize faces (e.g., for admission to a high-security zone). A classifier trained to classify an instance would have to be retrained every time a new person is added to the face database. This can be avoided by posing the problem as a similarity learning problem instead of a classification problem. Here the network is trained (using a contrastive loss) to output a distance which is small if the image belongs to a known person and large if the image belongs to an unknown person. However, if we want to output the closest images to a given image, we need to learn a ranking and not just a similarity. A triplet loss is used in this case.

[Figure: Effect of triplet loss minimization in training: the positive is moved closer to the anchor than the negative.]

The loss function can be described by means of a Euclidean distance function:

    \mathcal{L}(A, P, N) = \max\left( \| f(A) - f(P) \|_2 - \| f(A) - f(N) \|_2 + \alpha,\ 0 \right)

where A is an anchor input, P is a positive input of the same class as A, N is a negative input of a different class from A, α is a margin between positive and negative pairs, and f is an embedding.
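To make the formula concrete, here is a minimal sketch of the per-triplet computation in NumPy. It is illustrative only: the function name triplet_loss, the margin value of 0.2, and the toy embedding vectors are arbitrary choices, and the embeddings are assumed to have already been produced by the network f.

    import numpy as np

    def triplet_loss(f_a, f_p, f_n, margin=0.2):
        """Hinge-style triplet loss for one (anchor, positive, negative) triplet.

        f_a, f_p, f_n are the embeddings f(A), f(P), f(N); margin is the
        slack variable alpha from the formula above.
        """
        d_pos = np.linalg.norm(f_a - f_p)  # ||f(A) - f(P)||_2
        d_neg = np.linalg.norm(f_a - f_n)  # ||f(A) - f(N)||_2
        return max(d_pos - d_neg + margin, 0.0)

    # Toy 3-dimensional embeddings (made up for illustration).
    anchor   = np.array([0.0, 1.0, 0.0])
    positive = np.array([0.1, 0.9, 0.0])
    negative = np.array([1.0, 0.0, 0.5])
    print(triplet_loss(anchor, positive, negative))
    # 0.0 here: the negative is already farther from the anchor than the
    # positive by more than the margin, so this triplet contributes no loss.

Only triplets that violate the margin produce a non-zero loss (and hence a gradient); triplets that already satisfy the ordering are ignored.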
This loss can then be used in a cost function, that is, the sum of all losses, which can then be used for minimization of the posed optimization problem:

    \mathcal{J} = \sum_{i=1}^{M} \mathcal{L}\left( A^{(i)}, P^{(i)}, N^{(i)} \right)

The indices are for the individual input vectors given as a triplet. A triplet is formed by drawing an anchor input, a positive input that describes the same entity as the anchor, and a negative input that does not describe the same entity as the anchor. These inputs are then run through the network, and the outputs are used in the loss function.
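A vectorized sketch of this cost over a batch of M triplets, again in NumPy and again purely illustrative: the batch size, embedding dimension, and random inputs are made up, and the three matrices are assumed to hold embeddings already computed by the network.

    import numpy as np

    def total_triplet_cost(f_anchors, f_positives, f_negatives, margin=0.2):
        """Cost J: sum of per-triplet losses over M triplets.

        Each argument is an (M, d) array whose i-th row is f(A^(i)),
        f(P^(i)) or f(N^(i)) respectively.
        """
        d_pos = np.linalg.norm(f_anchors - f_positives, axis=1)  # M anchor-positive distances
        d_neg = np.linalg.norm(f_anchors - f_negatives, axis=1)  # M anchor-negative distances
        per_triplet = np.maximum(d_pos - d_neg + margin, 0.0)    # hinge applied per triplet
        return per_triplet.sum()

    # Toy batch: M = 4 triplets of 8-dimensional random embeddings.
    rng = np.random.default_rng(0)
    A, P, N = (rng.normal(size=(4, 8)) for _ in range(3))
    print(total_triplet_cost(A, P, N))

In practice this sum is minimized with stochastic gradient methods, with the gradient flowing back through the embedding network shared by all three branches.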
Comparison and Extensions

In computer vision tasks such as re-identification, a prevailing belief has been that the triplet loss is inferior to using surrogate losses (i.e., typical classification losses) followed by separate metric learning steps. Recent work showed that, for models trained from scratch as well as for pretrained models, a special version of triplet loss doing end-to-end deep metric learning outperforms most other published methods as of 2017.

Additionally, triplet loss has been extended to simultaneously maintain a series of distance orders by optimizing a continuous relevance degree with a chain (i.e., ladder) of distance inequalities. This leads to the Ladder Loss, which has been demonstrated to offer performance enhancements of visual-semantic embedding in learning-to-rank tasks.

In Natural Language Processing, triplet loss is one of the loss functions considered for BERT fine-tuning in the SBERT architecture.

Other extensions involve specifying multiple negatives (multiple negatives ranking loss).
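The phrase "multiple negatives ranking loss" commonly refers to a softmax-based variant in which each anchor is scored against its own positive and, as negatives, the positives paired with the other anchors in the batch. The sketch below illustrates that idea only; it is not the exact loss of any work cited above, and the scaling factor, batch size, and random inputs are arbitrary.

    import numpy as np

    def multiple_negatives_ranking_loss(f_anchors, f_positives, scale=20.0):
        """Softmax-style loss with in-batch negatives.

        f_anchors, f_positives: (M, d) arrays of embeddings for M
        (anchor, positive) pairs; for anchor i, every positive j != i acts
        as a negative. 'scale' is a temperature-like factor applied to the
        cosine similarities.
        """
        a = f_anchors / np.linalg.norm(f_anchors, axis=1, keepdims=True)
        p = f_positives / np.linalg.norm(f_positives, axis=1, keepdims=True)
        scores = scale * (a @ p.T)                    # (M, M) similarity matrix
        scores -= scores.max(axis=1, keepdims=True)   # for numerical stability
        log_softmax = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_softmax))         # true pairs sit on the diagonal

    # Toy batch of 4 (anchor, positive) pairs with 8-dimensional embeddings.
    rng = np.random.default_rng(1)
    A, P = rng.normal(size=(4, 8)), rng.normal(size=(4, 8))
    print(multiple_negatives_ranking_loss(A, P))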
See also

Siamese neural network
t-distributed stochastic neighbor embedding
Learning to rank
Similarity learning

References

Chechik, G.; Sharma, V.; Shalit, U.; Bengio, S. (2010). "Large Scale Online Learning of Image Similarity Through Ranking". Journal of Machine Learning Research. 11: 1109–1135.
Schultz, M.; Joachims, T. (2004). "Learning a distance metric from relative comparisons". Advances in Neural Information Processing Systems. 16: 41–48.
Ailon, Nir; Hoffer, Elad (2014-12-20). "Deep metric learning using Triplet network". arXiv:1412.6622.
Schroff, F.; Kalenichenko, D.; Philbin, J. (June 2015). "FaceNet: A unified embedding for face recognition and clustering". 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). pp. 815–823. arXiv:1503.03832. doi:10.1109/CVPR.2015.7298682. ISBN 978-1-4673-6964-0.
Hermans, Alexander; Beyer, Lucas; Leibe, Bastian (2017-03-22). "In Defense of the Triplet Loss for Person Re-Identification". arXiv:1703.07737.
Reimers, Nils; Gurevych, Iryna (2019-08-27). "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks". arXiv:1908.10084.
Zhou, Mo; Niu, Zhenxing; Wang, Le; Gao, Zhanning; Zhang, Qilin; Hua, Gang (2020-04-03). "Ladder Loss for Coherent Visual-Semantic Embedding". Proceedings of the AAAI Conference on Artificial Intelligence. 34 (7): 13050–13057. doi:10.1609/aaai.v34i07.7006.