
Subnormal number

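In computer science, subnormal numbers are the subset of denormalized numbers (sometimes called denormals) that fill the underflow gap around zero in floating-point arithmetic. Any non-zero number with magnitude smaller than the smallest positive normal number is subnormal, while denormal can also refer to numbers outside that range.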
In some older documents, "denormal" is used to refer exclusively to subnormal numbers. This usage persists in various standards documents, especially when discussing hardware that is incapable of representing any other denormalized numbers, but the discussion here uses the term "subnormal" in line with the 2008 revision of IEEE 754.

Some applications need to contain code to avoid subnormal numbers, either to maintain accuracy, or in order to avoid the performance penalty in some processors. For instance, in audio processing applications, subnormal values usually represent a signal so quiet that it is out of the human hearing range. Because of this, a common measure to avoid subnormals on processors where there would be a performance penalty is to cut the signal to zero once it reaches subnormal levels or mix in an extremely quiet noise signal. Other methods of preventing subnormal numbers include adding a DC offset, quantizing numbers, adding a Nyquist signal, etc.
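The "cut the signal to zero" measure used in audio code can be sketched in C as follows. This is only an illustration under stated assumptions: the helper names `flush_subnormal` and `flush_buffer` are hypothetical, samples are assumed to be IEEE 754 binary32 `float` values, and the threshold is simply `FLT_MIN`, the smallest positive normal float.

```c
#include <float.h>
#include <stddef.h>

/* Return 0 for any sample whose magnitude is below the smallest normal
   float (FLT_MIN), so that later processing stages never receive a
   subnormal input. NaN compares false and is passed through unchanged. */
float flush_subnormal(float x)
{
    return (x < FLT_MIN && x > -FLT_MIN) ? 0.0f : x;
}

/* Apply the flush to a whole sample buffer in place. */
void flush_buffer(float *samples, size_t count)
{
    for (size_t i = 0; i < count; ++i) {
        samples[i] = flush_subnormal(samples[i]);
    }
}
```

A filter with feedback would typically call such a helper on its state variables once per block, since decaying state is where subnormal values tend to appear.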
Some systems handle subnormal values in hardware, in the same way as normal values. Others leave the handling of subnormal values to system software ("assist"), only handling normal values and zero in hardware. Handling subnormal values in software always leads to a significant decrease in performance. When subnormal values are entirely computed in hardware, implementation techniques exist to allow their processing at speeds comparable to normal numbers. However, the speed of computation remains significantly reduced on many modern x86 processors.

Some implementations of floating-point units do not directly support subnormal numbers in hardware, but rather trap to some kind of software support. While this may be transparent to the user, it can result in calculations that produce or consume subnormal numbers being much slower than similar calculations on normal numbers.
A denormalized floating point value has a significand with a leading digit of zero. Of these, the subnormal numbers represent values which if normalized would have exponents below the smallest representable exponent (the exponent having a limited range).

In binary interchange formats, subnormal numbers are encoded with a biased exponent of 0, but are interpreted with the value of the smallest allowed exponent, which is one greater (i.e., as if it were encoded as a 1). In decimal interchange formats they require no special encoding because the format supports unnormalized numbers directly.
Subnormal numbers provide the guarantee that addition and subtraction of floating-point numbers never underflow to zero; two nearby floating-point numbers always have a representable non-zero difference. Without gradual underflow, the subtraction a − b can underflow and produce zero even though the values are not equal.

In a subnormal number, since the exponent is the least that it can be, zero is the leading significant digit ($0.m_1 m_2 m_3 \ldots m_{p-2} m_{p-1}$), allowing the representation of numbers closer to zero than the smallest normal number. A floating-point number may be recognized as subnormal whenever its exponent is the least value possible.
Subnormal numbers were by far the most controversial feature in the K-C-S format proposal that was eventually adopted, but the Intel 8087 implementation demonstrated that subnormal numbers could be supported in a practical implementation.

The effect of FTZ is to return zero instead of a subnormal float for operations that would result in a subnormal float, even if the input arguments are not themselves subnormal.
An unaugmented floating-point system would contain only normalized numbers (indicated in red). Allowing denormalized numbers (blue) extends the system's range.

Most compilers already provide the _MM_SET_DENORMALS_ZERO_MODE macro by default; otherwise, a short code snippet can be used to define it (the definition for FTZ is analogous).

Because the ABI fixes the default denormalization mode, well-behaved software should save and restore the denormalization mode before returning to the caller or calling code in other libraries.

Notice that for a binary radix, the leading binary digit of a normal number is always 1.
"Representation" rather than "number" may be used when clarity is required.

In extreme cases, instructions involving subnormal operands may take as many as 100 additional clock cycles, causing the fastest instructions to run as much as six times slower.

In a normal floating-point value, there are no leading zeros in the significand (also commonly called mantissa); rather, leading zeros are removed by adjusting the exponent (for example, the number 0.0123 would be written as $1.23 \times 10^{-2}$).

The production of a subnormal number is sometimes called gradual underflow.
The term "number" is used rather loosely, to describe a particular sequence of digits, rather than a mathematical abstraction; see Floating Point for details of how real numbers relate to floating point representations.

The subnormal floats are a linearly spaced set of values, which span the gap between the negative and positive normal floats.
Mathematical real numbers may be approximated by multiple floating point representations. One representation is defined as normal, and others are defined as subnormal, denormal, or unnormal by their relationship to normal.
In IEEE 754-2008, denormal numbers are renamed subnormal numbers and are supported in both binary and decimal formats.

Definition of the denormals-are-zero macros:

    #define _MM_DENORMALS_ZERO_MASK 0x0040
    #define _MM_DENORMALS_ZERO_ON   0x0040
    #define _MM_DENORMALS_ZERO_OFF  0x0000

    #define _MM_SET_DENORMALS_ZERO_MODE(mode) _mm_setcsr((_mm_getcsr() & ~_MM_DENORMALS_ZERO_MASK) | (mode))
    #define _MM_GET_DENORMALS_ZERO_MODE()     (_mm_getcsr() & _MM_DENORMALS_ZERO_MASK)

For the scalar FPU and in the AArch64 SIMD, the flush-to-zero behavior is optional and controlled by the FZ bit of the control register.
For other x86-SSE platforms where the C library has not yet implemented FE_DFL_DISABLE_SSE_DENORMS_ENV, setting the DAZ and FTZ bits of the MXCSR register directly with _mm_setcsr may work.

By filling the underflow gap with subnormal numbers, significant digits are lost, but not as abruptly as when using the flush to zero on underflow approach (discarding all significant digits when underflow is reached).

Subnormal numbers were implemented in the Intel 8087 while the IEEE 754 standard was being written.

A non-C99-compliant method of enabling the DAZ and FTZ flags on targets supporting SSE is given below, but is not widely supported. It is known to work on Mac OS X since at least 2006.
Researchers showed that the speed difference between subnormal and normal operands provides a timing side channel that allows a malicious web site to extract page content from another site inside a web browser.
A subtraction that underflows to zero in this way can, in turn, lead to division by zero errors.
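Intel's C and Fortran compilers enable the DAZ (denormals-are-zero) and FTZ (flush-to-zero) flags for SSE by default for optimization levels higher than -O0. The effect of DAZ is to treat subnormal input arguments to floating-point operations as zero.

The fesetenv-based method (Mac OS X):

    #include <fenv.h>
    #pragma STDC FENV_ACCESS ON
    // Sets DAZ and FTZ, clobbering other CSR settings.
    // See https://opensource.apple.com/source/Libm/Libm-287.1/Source/Intel/, fenv.c and fenv.h.
    fesetenv(FE_DFL_DISABLE_SSE_DENORMS_ENV);
    // fesetenv(FE_DFL_ENV) // Disable both, clobbering other CSR settings.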
Since the SSE2 processor extension, Intel has provided functionality in CPU hardware that rounds subnormal numbers to zero.

No other denormalized numbers exist in the IEEE binary floating point formats.

The older documents that use "denormal" in this exclusive sense are especially standards documents, such as the initial releases of IEEE 754 and the C language.
The speed difference between subnormal and normal arithmetic can be a security risk.

In IEEE binary floating point formats, subnormals are represented by having a zero exponent field with a non-zero significand field.
Gradual underflow allows a calculation to lose precision slowly when the result is small.

See also various papers on William Kahan's web site for examples of where subnormal numbers help improve the results of calculations.

Other denormalized numbers do exist in some other formats, including the IEEE decimal floating point formats.
AArch32 NEON (SIMD) FPU always uses a flush-to-zero mode, which is the same as FTZ + DAZ.

clang and gcc have varying default states for the DAZ and FTZ flags depending on platform and optimization level.

Mathematically speaking, the normalized floating-point numbers of a given sign are roughly logarithmically spaced, and as such a finite-sized set of normal floats cannot include zero.

On ARM, the control register that holds the FZ bit is FPSCR in Arm32 and FPCR in AArch64.
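The significand (or mantissa) of an IEEE floating-point number is the part of a floating-point number that represents the significant digits.

Further reading: Eric Schwarz, Martin Schmookler and Son Dao Trong (June 2003). "Hardware Implementations of Denormalized Numbers". Proceedings 16th IEEE Symposium on Computer Arithmetic (Arith16). IEEE Computer Society. pp. 104–111. ISBN 0-7695-1894-X.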
In casual discussions the terms subnormal and denormal are often used interchangeably, in part because there are no denormalized IEEE binary numbers outside the subnormal range.

For a positive normalised number, the significand can be represented as $m_0.m_1 m_2 m_3 \ldots m_{p-2} m_{p-1}$ (where $m$ represents a significant digit, and $p$ is the precision) with non-zero $m_0$.

Some ARM processors have hardware handling of subnormals.

The _MM_SET_DENORMALS_ZERO_MODE and _MM_SET_FLUSH_ZERO_MODE macros wrap a more readable interface for the _mm_getcsr and _mm_setcsr calls.

Division by zero errors caused by a difference underflowing to zero cannot occur when gradual underflow is used.
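The guarantee that unequal values have a non-zero difference can be illustrated with two normal doubles that lie close to the underflow threshold. This is a minimal sketch assuming IEEE 754 binary64 arithmetic in its default mode (gradual underflow enabled, no FTZ or DAZ); `difference_is_subnormal` is a hypothetical helper.

```c
#include <float.h>
#include <stdbool.h>

/* True when a and b are unequal and a - b is a non-zero value smaller
   in magnitude than DBL_MIN, i.e. a subnormal. This is exactly the case
   that gradual underflow preserves and that an abrupt flush to zero
   would turn into 0. */
bool difference_is_subnormal(double a, double b)
{
    double diff = a - b;
    return a != b && diff != 0.0 && diff < DBL_MIN && diff > -DBL_MIN;
}
```

For example, with `a = 1.5 * DBL_MIN` and `b = DBL_MIN` the difference is `0.5 * DBL_MIN`, which is subnormal, so an expression such as `1.0 / (a - b)` guarded by `a != b` does not divide by zero under these assumptions.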
The default denormalization behavior is mandated by the ABI.
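Since the default mode is fixed by the ABI, code that changes it would normally put the caller's setting back before returning. The C sketch below shows one way this might look on x86 with SSE, using the `_mm_getcsr` and `_mm_setcsr` intrinsics from `<xmmintrin.h>` and the bit values 0x8000 (FTZ) and 0x0040 (DAZ). The function names and the `HAVE_MXCSR` guard macro are hypothetical; on other targets the sketch falls back to a plain sum. It also assumes the processor supports the DAZ bit, and it does not prevent a compiler from moving floating-point operations across the mode changes.

```c
#include <stddef.h>

#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <xmmintrin.h>   /* _mm_getcsr, _mm_setcsr */
#define HAVE_MXCSR 1
#else
#define HAVE_MXCSR 0
#endif

/* Current MXCSR value, or 0 on targets without that register. */
unsigned int current_mxcsr(void)
{
#if HAVE_MXCSR
    return _mm_getcsr();
#else
    return 0;
#endif
}

/* Sum an array with FTZ and DAZ enabled, then restore the caller's mode. */
double sum_flushing_subnormals(const double *values, size_t count)
{
#if HAVE_MXCSR
    unsigned int saved = _mm_getcsr();   /* save the caller's MXCSR */
    _mm_setcsr(saved | 0x8040);          /* FTZ (0x8000) and DAZ (0x0040) */
#endif
    double total = 0.0;
    for (size_t i = 0; i < count; ++i) {
        total += values[i];
    }
#if HAVE_MXCSR
    _mm_setcsr(saved);                   /* restore before returning */
#endif
    return total;
}
```

Restoring the saved value rather than clearing the two bits keeps whatever mode the caller had selected, including the case where the caller had already enabled FTZ or DAZ.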
Disabling subnormal floats at the code level

Intel SSE

Setting the flags directly in the MXCSR register (0x0040 is DAZ, 0x8000 is FTZ):

    #include <xmmintrin.h>
    _mm_setcsr(_mm_getcsr() | 0x0040);  // DAZ
    _mm_setcsr(_mm_getcsr() | 0x8000);  // FTZ
    _mm_setcsr(_mm_getcsr() | 0x8040);  // Both
    _mm_setcsr(_mm_getcsr() & ~0x8040); // Disable both

Using the macro interface:

    // To enable DAZ
    #include <pmmintrin.h>
    _MM_SET_DENORMALS_ZERO_MODE(_MM_DENORMALS_ZERO_ON);
    // To enable FTZ
    #include <xmmintrin.h>
    _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);

ARM

One way to set the FZ bit on AArch64 can be:

    #include <stdint.h>
    #if defined(__arm64__) || defined(__aarch64__)
        uint64_t fpcr;
        asm( "mrs %0,   fpcr" : "=r"( fpcr ));             // Load the FPCR register
        asm( "msr fpcr, %0"   :: "r"( fpcr | (1 << 24) )); // Set the 24th bit (FZ) to 1
    #endif

See also

Logarithmic number system
