Mixed-precision arithmetic

Mixed-precision arithmetic is a form of floating-point arithmetic that uses numbers with varying widths in a single operation.

Overview

A common usage of mixed-precision arithmetic is for operating on inaccurate numbers with a small width and expanding them to a larger, more accurate representation. For example, two half-precision or bfloat16 (16-bit) floating-point numbers may be multiplied together to result in a more accurate single-precision (32-bit) float. In this way, mixed-precision arithmetic approximates arbitrary-precision arithmetic, albeit with a low number of possible precisions.

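The effect can be illustrated with NumPy's float16 and float32 types (an illustrative sketch, not taken from the cited sources; the input values are arbitrary):

    import numpy as np

    # Two coarse 16-bit inputs.
    a = np.float16(3.1416)
    b = np.float16(2.7183)

    # A pure half-precision product is rounded back to float16.
    low = a * b                           # dtype float16

    # Mixed precision: widen the inputs and keep the product in float32.
    high = np.float32(a) * np.float32(b)  # dtype float32

    print(low, high)  # the float32 product keeps digits the float16 product rounds away
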
Iterative algorithms (like gradient descent) are good candidates for mixed-precision arithmetic. In an iterative algorithm like square root, a coarse integral guess can be made and refined over many iterations until the error in precision makes it such that the smallest addition or subtraction to the guess is still too coarse to be an acceptable answer. When this happens, the precision can be increased to something more precise, which allows for smaller increments to be used for the approximation.

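One way to realize this idea in code is a Newton iteration for the square root that starts in 32-bit precision and escalates to 64-bit precision once 32-bit increments become too coarse (an illustrative sketch assuming NumPy; the tolerance and iteration cap are arbitrary choices):

    import numpy as np

    def mixed_precision_sqrt(x, tol=1e-12):
        # Refine a coarse guess in 32-bit precision first (assumes x > 0).
        guess = np.float32(x) / np.float32(2.0)
        for _ in range(20):
            new_guess = np.float32(0.5) * (guess + np.float32(x) / guess)
            if new_guess == guess:   # float32 steps can no longer improve the guess
                break
            guess = new_guess

        # Switch to 64-bit precision, where smaller increments are available.
        guess = np.float64(guess)
        while abs(guess * guess - x) > tol * x:
            guess = 0.5 * (guess + x / guess)
        return guess

    print(mixed_precision_sqrt(2.0))  # close to 1.4142135623730951
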
Supercomputers such as Summit utilize mixed-precision arithmetic to be more efficient with regards to memory and processing time, as well as power consumption.

Floating point format

A floating-point number is typically packed into a single bit-string, as the sign bit, the exponent field, and the significand or mantissa, from left to right. As an example, a standard 32-bit float ("FP32", "float32", or "binary32") is packed as a 1-bit sign, an 8-bit exponent, and a 23-bit significand.

The IEEE 754 binary floats are:

Type                    Sign  Exponent  Significand  Total bits  Exponent bias  Bits precision  Decimal digits
Half (IEEE 754-2008)    1     5         10           16          15             11              ~3.3
Single                  1     8         23           32          127            24              ~7.2
Double                  1     11        52           64          1023           53              ~15.9
x86 extended precision  1     15        64           80          16383          64              ~19.2
Quad                    1     15        112          128         16383          113             ~34.0

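The packing can be inspected directly with Python's struct module (an illustrative sketch, not from the cited sources):

    import struct

    def fp32_fields(x):
        # Reinterpret the float's 4-byte IEEE 754 binary32 encoding as an integer,
        # then slice out the sign, exponent, and significand fields.
        bits = struct.unpack(">I", struct.pack(">f", x))[0]
        sign        = bits >> 31            # 1 bit
        exponent    = (bits >> 23) & 0xFF   # 8 bits, stored with a bias of 127
        significand = bits & 0x7FFFFF       # 23 explicitly stored bits
        return sign, exponent, significand

    print(fp32_fields(-6.25))  # (1, 129, 4718592), i.e. -1.5625 * 2**(129 - 127)
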
Machine learning

Mixed-precision arithmetic is used in the field of machine learning, since gradient descent algorithms can use coarse and efficient half-precision floats for certain tasks, but can be more accurate if they use more precise but slower single-precision floats. Some platforms, including Nvidia, Intel, and AMD CPUs and GPUs, provide mixed-precision arithmetic for this purpose, using coarse floats when possible, but expanding them to higher precision when necessary.

Automatic mixed precision

PyTorch implements automatic mixed-precision (AMP), which performs autocasting, gradient scaling, and loss scaling. The weights are stored in a master copy at a high precision, usually in FP32.

Autocasting means automatically converting a floating-point number between different precisions, such as from FP32 to FP16, during training. For example, matrix multiplications can often be performed in FP16 without loss of accuracy, even if the master copy of the weights is stored in FP32. The low-precision weights are used during the forward pass.

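A minimal forward pass under autocasting might look like the following sketch (it assumes a CUDA device; the model and tensor sizes are placeholders):

    import torch

    model = torch.nn.Linear(1024, 1024).cuda()   # master-copy weights stay in FP32
    x = torch.randn(32, 1024, device="cuda")

    with torch.autocast(device_type="cuda", dtype=torch.float16):
        # Inside the context, matrix multiplications run in FP16, while
        # precision-sensitive operations are kept in FP32 automatically.
        y = model(x)

    print(y.dtype)             # torch.float16
    print(model.weight.dtype)  # torch.float32 -- the master copy is unchanged
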
Gradient scaling means multiplying the gradients by a constant factor during training, typically before the weight optimizer update. This is done to prevent the gradients from underflowing to zero when using low-precision data types like FP16. Mathematically, if the unscaled gradient is g, the scaled gradient is sg, where s is the scaling factor. Within the optimizer update, the scaled gradient is cast to a higher precision before it is scaled down (no longer underflowing, as it is in a higher precision) to update the weights.

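The underflow problem, and how scaling avoids it, can be seen on a single FP16 value (a toy illustration, not from the cited sources; the magnitudes are arbitrary):

    import torch

    g = torch.tensor(1e-8)     # a small gradient value, representable in FP32
    print(g.half())            # tensor(0., dtype=torch.float16): it underflows in FP16

    s = 65536.0                # scaling factor s = 2**16
    scaled = (g * s).half()    # the scaled gradient survives in FP16
    print(scaled)              # a nonzero FP16 value near 6.55e-4

    # Cast back to higher precision before dividing by s, as in the optimizer update.
    print(scaled.float() / s)  # approximately 1e-8 again, recovered without underflow
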
Loss scaling means multiplying the loss function by a constant factor during training, typically before backpropagation. This is done to prevent the gradients from underflowing to zero when using low-precision data types. If the unscaled loss is L, the scaled loss is kL, where k is the scaling factor. Since gradient scaling and loss scaling are mathematically equivalent, by

    \frac{\partial (k\mathcal{L})}{\partial \mathbf{w}} = k \frac{\partial \mathcal{L}}{\partial \mathbf{w}},

loss scaling is an implementation of gradient scaling.

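The identity can be checked numerically with automatic differentiation (a toy check, not from the cited sources):

    import torch

    w = torch.tensor([1.0, 2.0], requires_grad=True)
    x = torch.tensor([3.0, 4.0])
    k = 128.0

    (w * x).sum().backward()        # gradient of the unscaled loss
    g_unscaled = w.grad.clone()

    w.grad = None
    (k * (w * x).sum()).backward()  # gradient of the scaled loss
    g_scaled = w.grad.clone()

    print(torch.allclose(g_scaled, k * g_unscaled))  # True: d(kL)/dw == k * dL/dw
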
PyTorch AMP uses exponential backoff to automatically adjust the scale factor for loss scaling. That is, it periodically increases the scale factor. Whenever the gradients contain a NaN (indicating overflow), the weight update is skipped, and the scale factor is decreased.

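Putting the pieces together, a typical AMP training step with PyTorch's GradScaler looks roughly like this sketch (the model, data, and optimizer are placeholders; a CUDA device is assumed):

    import torch

    model = torch.nn.Linear(1024, 10).cuda()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    scaler = torch.cuda.amp.GradScaler()   # maintains the dynamic scale factor

    x = torch.randn(32, 1024, device="cuda")
    target = torch.randint(0, 10, (32,), device="cuda")

    for _ in range(10):
        optimizer.zero_grad()
        with torch.autocast(device_type="cuda", dtype=torch.float16):
            loss = torch.nn.functional.cross_entropy(model(x), target)
        scaler.scale(loss).backward()  # backpropagate the scaled loss
        scaler.step(optimizer)         # unscales in FP32; skips the step if gradients overflowed
        scaler.update()                # grows the scale periodically, shrinks it after an overflow
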
References

Abdelfattah, Ahmad; Anzt, Hartwig; Boman, Erik G.; Carson, Erin; Cojean, Terry; Dongarra, Jack; Gates, Mark; Grützmacher, Thomas; Higham, Nicholas J.; Li, Sherry; Lindquist, Neil; Liu, Yang; Loe, Jennifer; Luszczek, Piotr; Nayak, Pratik; Pranesh, Sri; Rajamanickam, Siva; Ribizel, Tobias; Smith, Barry; Swirydowicz, Kasia; Thomas, Stephen; Tomov, Stanimire; Tsai, Yaohung M.; Yamazaki, Ichitaro; Yang, Ulrike Meier (2020). "A Survey of Numerical Methods Utilizing Mixed Precision Arithmetic". arXiv:2007.06674.

Micikevicius, Paulius; Narang, Sharan; Alben, Jonah; Diamos, Gregory; Elsen, Erich; Garcia, David; Ginsburg, Boris; Houston, Michael; Kuchaiev, Oleksii (2018-02-15). "Mixed Precision Training". arXiv:1710.03740.

Holt, Kris (8 June 2018). "The US again has the world's most powerful supercomputer". Engadget. Retrieved 20 July 2018.

"Difference Between Single-, Double-, Multi-, Mixed-Precision". NVIDIA Blog. 15 November 2019. Retrieved 30 December 2020.

"Mixed-Precision Training of Deep Neural Networks". NVIDIA Technical Blog. 2017-10-11. Retrieved 2024-09-10.

"What Every User Should Know About Mixed Precision Training in PyTorch". PyTorch. Retrieved 2024-09-10.

"Mixed Precision — PyTorch Training Performance Guide". residentmario.github.io. Retrieved 2024-09-10.
