Knowledge (XXG)

Double-precision floating-point format

Source ๐Ÿ“

366: 637:
Between 2=4,503,599,627,370,496 and 2=9,007,199,254,740,992 the representable numbers are exactly the integers. For the next range, from 2 to 2, everything is multiplied by 2, so the representable numbers are the even ones, etc. Conversely, for the previous range from 2 to 2, the spacing is 0.5, etc.
1913:
data encoding format supports numeric values, and the grammar to which numeric expressions must conform has no limits on the precision or range of the numbers so encoded. However, RFC 8259 advises that, since IEEE 754 binary64 numbers are widely implemented, good interoperability can be achieved by
348:
precision (2 โ‰ˆ 1.11 ร— 10). If a decimal string with at most 15 significant digits is converted to the IEEE 754 double-precision format, giving a normal number, and then converted back to a decimal string with the same number of digits, the final result should match the original
1298:
floating-point standard does not specify endianness. Theoretically, this means that even standard IEEE floating-point data written by one machine might not be readable by another. However, on modern standard computers (i.e., implementing IEEE 754), one may safely assume that the endianness is the
1864:
provides the types SHORT-FLOAT, SINGLE-FLOAT, DOUBLE-FLOAT and LONG-FLOAT. Most implementations provide SINGLE-FLOATs and DOUBLE-FLOATs with the other types appropriate synonyms. Common Lisp provides exceptions for catching floating-point underflows and overflows, and the inexact floating-point
1260: 356:
having an implicit integer bit of value 1 (except for special data, see the exponent encoding below). With the 52 bits of the fraction (F) significand appearing in the memory format, the total precision is therefore 53 bits (approximately 16 decimal digits, 53
648:
The 11 bit width of the exponent allows the representation of numbers between 10 and 10, with full 15โ€“17 decimal digits precision. By compromising precision, the subnormal representation allows even smaller values up to about 5 ร— 10.
632: 1759:
Using double-precision floating-point variables is usually slower than working with their single precision counterparts. One area of computing where this is a particular issue is parallel code running on GPUs. For example, when using
349:
string. If an IEEE 754 double-precision number is converted to a decimal string with at least 17 significant digits, and then converted back to double-precision representation, the final result must match the original number.
505: 1131: 295:
Double-precision binary floating-point is a commonly used format on PCs, due to its wider range over single-precision floating point, in spite of its performance and bandwidth cost. It is commonly known simply as
1150: 1286:
processors that have mixed-endian floating-point representation for double-precision numbers: each of the two 32-bit words is stored as little-endian, but the most significant word is stored first.
2118: 960: 341:: an exponent value of 1023 represents the actual zero. Exponents range from โˆ’1022 to +1023 because exponents of โˆ’1023 (all 0s) and +1024 (all 1s) are reserved for special numbers. 1877:
before version 1.2, every implementation had to be IEEE 754 compliant. Version 1.2 allowed implementations to bring extra precision in intermediate computations for platforms like
1830:
type corresponds to double precision. However, on 32-bit x86 with extended precision by default, some compilers may not conform to the C standard or the arithmetic may suffer from
735: 887: 813: 1775:
Additionally, many mathematical functions (e.g., sin, cos, atan2, log, exp and sqrt) need more computations to give accurate double-precision results, and are therefore slower.
1290:
floating point stores little-endian 16-bit words in big-endian order. Because there have been many floating-point formats with no network standard representation for them, the
641:
The spacing as a fraction of the numbers in the range from 2 to 2 is 2. The maximum relative rounding error when rounding a number to the nearest representable one (the
516: 1810:(or when SSE2 is not used, for compatibility purpose) and with extended precision used by default, software may have difficulties to fulfill some requirements. 392: 1865:
exception, as per IEEE 754. No infinities and NaNs are described in the ANSI standard, however, several implementations do provide these as extensions.
1278:
Although many processors use little-endian storage for all types of data (integer, floating point), there are a number of hardware architectures where
2345: 173: 400: 277: 2105: 1070: 2716: 2350: 1802:
Doubles are implemented in many programming languages in different ways such as the following. On processors with only dynamic precision, such as
1768:
platform, calculations with double precision can take, depending on hardware, from 2 to 32 times as long to complete compared to those done using
661:
representation, with the zero offset being 1023; also known as exponent bias in the IEEE 754 standard. Examples of such representations would be:
183: 2335: 1769: 1710:
family processors, use the most significant bit of the significand field to indicate a quiet NaN; this is what is recommended by IEEE 754. The
153: 64: 2323: 2224: 146: 1958: 1255:{\displaystyle (-1)^{\text{sign}}\times 2^{1-1023}\times 0.{\text{fraction}}=(-1)^{\text{sign}}\times 2^{-1022}\times 0.{\text{fraction}}} 2008: 1898: 108:. Before the widespread adoption of IEEE 754-1985, the representation and properties of floating-point data types depended on the 177: 2474: 167: 157: 2062: 2591: 2396: 2328: 2290: 251: 225: 210: 2777: 2772: 2491: 2421: 2269: 1291: 270: 2746: 1984: 1922: 1874: 1299:
same for floating-point numbers as for integers, making the conversion straightforward regardless of data type. Small
2767: 2501: 2369: 2156: 1926: 1743:
Value = 2 ร— 1.Fraction โ€“ Note that Fraction must not be converted to decimal here = 2 ร— (15 5555 5555 5555
2679: 2631: 2543: 2521: 2516: 2444: 2310: 1885:
was introduced to enforce strict IEEE 754 computations. Strict floating point has been restored in Java 17.
740: 126: 42: 2553: 2217: 1784:
Integers from −2 to 2 (−9,007,199,254,740,992 to 9,007,199,254,740,992) can be exactly represented.
1294:
standard uses big-endian IEEE 754 as its representation. It may therefore appear strange that the widespread
2706: 2621: 2029: 919: 263: 220: 1282:
numbers are represented in big-endian form while integers are represented in little-endian form. There are
365: 2449: 2305: 2264: 2259: 691: 323: 129: 94: 45: 846: 772: 2439: 2414: 109: 2241: 230: 101: 2711: 2689: 2616: 2469: 2461: 2381: 2210: 1699: 345: 2694: 2674: 2626: 2601: 2386: 2355: 189: 1914:
implementations processing JSON if they expect no more precision or range than binary64 offers.
627:{\displaystyle (-1)^{\text{sign}}\left(1+\sum _{i=1}^{52}b_{52-i}2^{-i}\right)\times 2^{e-1023}} 1951: 2581: 2511: 2486: 2300: 2295: 2004: 330:
The sign bit determines the sign of the number (including when this number is zero, which is
2726: 2611: 2409: 1722: 1707: 1283: 1137: 1010: 2731: 2596: 2548: 2481: 1300: 642: 74: 1787:
Integers between 2 and 2 = 18,014,398,509,481,984 round to a multiple of 2 (even number).
112:
and computer model, and upon decisions made by programming-language implementers. E.g.,
2684: 2506: 2496: 2404: 1952:"Lecture Notes on the Status of IEEE Standard 754 for Binary Floating-Point Arithmetic" 1822:. Double precision is not required by the standards (except by the optional annex F of 1739:= 1021 Exponent Bias = 1023 (constant value; see above) Fraction = 5 5555 5555 5555 1279: 377: 215: 2761: 2606: 1064:
Except for the above exceptions, the entire double-precision number is described by:
658: 372: 338: 86: 53: 2563: 2538: 2181: 2054: 1819: 2741: 2736: 2586: 2533: 2360: 2135: 1861: 1058: 1002: 353: 331: 320: 89:. IEEE 754 specifies additional floating-point formats, including 32-bit base-2 57: 17: 500:{\displaystyle (-1)^{\text{sign}}(1.b_{51}b_{50}...b_{0})_{2}\times 2^{e-1023}} 2646: 2641: 2558: 2526: 2431: 2374: 1894: 1272: 117: 1126:{\displaystyle (-1)^{\text{sign}}\times 2^{e-1023}\times 1.{\text{fraction}}} 371:
The real value assumed by a given 64-bit double-precision datum with a given
2721: 2699: 2656: 2651: 2318: 2274: 2233: 205: 1976: 1790:
Integers between 2 and 2 = 36,028,797,018,963,968 round to a multiple of 4.
2636: 1882: 1831: 1751:= 0.333333333333333314829616256247390992939472198486328125 โ‰ˆ 1/3 1295: 1034: 314: 308: 136: 113: 71: 1518:โ‰™ +2 ร— 2 = 2 โ‰ˆ 4.9406564584124654 ร— 10 (Min. subnormal positive double) 657:
The double-precision binary floating-point exponent is encoded using an
1842: 1711: 105: 1761: 1303:
using special floating-point formats may be another matter, however.
337:
The exponent field is an 11-bit unsigned integer from 0 to 2047, in
2085: 1681:
0 10000000000 1001001000011111101101010100010001000010110100011000
1661:
0 01111111101 0101010101010101010101010101010101010101010101010101
1644:
0 11111111111 1111111111111111111111111111111111111111111111111111
1631:
0 11111111111 1000000000000000000000000000000000000000000000000001
1618:
0 11111111111 0000000000000000000000000000000000000000000000000001
1605:
1 11111111111 0000000000000000000000000000000000000000000000000000
1592:
0 11111111111 0000000000000000000000000000000000000000000000000000
1579:
1 00000000000 0000000000000000000000000000000000000000000000000000
1566:
0 00000000000 0000000000000000000000000000000000000000000000000000
1549:
0 11111111110 1111111111111111111111111111111111111111111111111111
1536:
0 00000000001 0000000000000000000000000000000000000000000000000000
1523:
0 00000000000 1111111111111111111111111111111111111111111111111111
1510:
0 00000000000 0000000000000000000000000000000000000000000000000001
1485:
0 01111111000 1000000000000000000000000000000000000000000000000000
1464:
0 10000000011 0111000000000000000000000000000000000000000000000000
1443:
0 10000000001 1000000000000000000000000000000000000000000000000000
1422:
0 10000000001 0100000000000000000000000000000000000000000000000000
1405:
0 10000000001 0000000000000000000000000000000000000000000000000000
1384:
0 10000000000 1000000000000000000000000000000000000000000000000000
1367:
1 10000000000 0000000000000000000000000000000000000000000000000000
1354:
0 10000000000 0000000000000000000000000000000000000000000000000000
1341:
0 01111111111 0000000000000000000000000000000000000000000000000010
1328:
0 01111111111 0000000000000000000000000000000000000000000000000001
1315:
0 01111111111 0000000000000000000000000000000000000000000000000000
235: 1544:โ‰™ +2 ร— 1 โ‰ˆ 2.2250738585072014 ร— 10 (Min. normal positive double) 1531:โ‰™ +2 ร— (1 โˆ’ 2) โ‰ˆ 2.2250738585072009 ร— 10 (Max. subnormal double) 2254: 1910: 1901:
shall be done using double-precision floating-point arithmetic.
1807: 1765: 1336:โ‰™ +2 ร— (1 + 2) โ‰ˆ 1.0000000000000002, the smallest number > 1 291:
IEEE 754 double-precision binary floating-point format: binary64
2206: 2136:"The JavaScript Object Notation (JSON) Data Interchange Format" 2055:"Bug 323 โ€“ optimized code gives strange floating point results" 2249: 2030:"Nvidia's New Titan V Pushes 110 Teraflops From A Single Chip" 1878: 1823: 1803: 1703: 1695: 1287: 1042: 63:
Double precision may be chosen when the range or precision of
49: 2202: 1845:
provides several integer and real types, and the 64-bit type
1557:โ‰™ +2 ร— (1 + (1 โˆ’ 2)) โ‰ˆ 1.7976931348623157 ร— 10 (Max. double) 1826:, covering IEEE 754 arithmetic), but on most systems, the 1702:
and depend on the processor. Most processors, such as the
2086:"JEP 306: Restore Always-Strict Floating-Point Semantics" 1731:
Given the hexadecimal representation 3FD5 5555 5555 5555
1725:, because of the odd number of bits in the significand. 77:, the 64-bit base-2 format is officially referred to as 1639:โ‰™ NaN (qNaN on most processors, such as x86 and ARM) 1626:โ‰™ NaN (sNaN on most processors, such as x86 and ARM) 1153: 1073: 922: 849: 775: 694: 519: 403: 380: 344:
The 53-bit significand precision gives from 15 to 17
2005:"pack โ€“ convert a list into a binary representation" 1714:
processors use the bit to indicate a signaling NaN.
2665: 2574: 2460: 2430: 2395: 2283: 2240: 1793:Integers between 2 and 2 round to a multiple of 2. 1254: 1144:= 0) the double-precision number is described by: 1125: 954: 881: 807: 729: 626: 499: 386: 361:(2) โ‰ˆ 15.955). The bits are laid out as follows: 2138:. Internet Engineering Task Force. December 2017 2113:(5th ed.). Ecma International. p. 29, ยง8.5 1755:Execution speed with double-precision arithmetic 2182:"Documentation - The Zig Programming Language" 2218: 271: 93:and, more recently, base-10 representations ( 8: 2157:"Data Types - The Rust Programming Language" 1849:, accessible via Fortran's intrinsic module 2225: 2211: 2203: 2107:ECMA-262 ECMAScript Language Specification 278: 264: 122: 1247: 1232: 1219: 1198: 1180: 1167: 1152: 1118: 1100: 1087: 1072: 946: 927: 921: 873: 854: 848: 799: 780: 774: 718: 699: 693: 612: 591: 575: 565: 554: 533: 518: 485: 472: 462: 443: 433: 417: 402: 379: 104:to provide floating-point data types was 52:in computer memory; it represents a wide 1942: 1779:Precision limitations on integer values 1652:โ‰™ NaN (an alternative encoding of NaN) 1061:. All bit patterns are valid encoding. 243: 197: 135: 125: 955:{\displaystyle 2^{2046-1023}=2^{1023}} 116:'s double-precision data type was the 56:of numeric values by using a floating 31:Double-precision floating-point format 1964:from the original on 8 February 2012. 7: 1747:ร— 2) = 2 ร— 15 5555 5555 5555 1349:โ‰™ +2 ร— (1 + 2) โ‰ˆ 1.0000000000000004 730:{\displaystyle 2^{1-1023}=2^{-1022}} 300:. The IEEE 754 standard specifies a 1853:, corresponds to double precision. 882:{\displaystyle 2^{1029-1023}=2^{6}} 808:{\displaystyle 2^{1023-1023}=2^{0}} 2065:from the original on 30 April 2018 1818:C and C++ offer a wide variety of 25: 2124:from the original on 2012-03-13. 1950:William Kahan (1 October 1997). 1721:rounds down, instead of up like 1698:are not completely specified in 1669:โ‰™ +2 ร— (1 + 2 + 2 + ... + 2) โ‰ˆ / 1271:This section is an excerpt from 364: 326:: 53 bits (52 explicitly stored) 2011:from the original on 2009-02-18 1987:from the original on 2018-07-03 352:The format is written with the 226:IBM floating-point architecture 1216: 1206: 1164: 1154: 1084: 1074: 1057:is the fractional part of the 530: 520: 469: 423: 414: 404: 1: 2291:Arbitrary-precision or bignum 1735:, Sign = 0 Exponent = 3FD 27:64-bit computer number format 1975:Savard, John J. G. (2018) , 1897:standard, all arithmetic in 1273:Endianness ยง Floating point 2794: 1696:Encodings of qNaN and sNaN 1270: 346:significant decimal digits 2632:Strongly typed identifier 1613:โ‰™ โˆ’โˆž (negative infinity) 1600:โ‰™ +โˆž (positive infinity) 1308:Double-precision examples 394:and a 52-bit fraction is 1977:"Floating-Point Formats" 984:have a special meaning: 2707:Parametric polymorphism 1001:is used to represent a 739:(smallest exponent for 221:Microsoft Binary Format 120:floating-point format. 67:would be insufficient. 48:, usually occupying 64 1256: 1127: 956: 883: 809: 731: 628: 570: 501: 388: 95:decimal floating point 1685:= 4009 21FB 5444 2D18 1665:= 3FD5 5555 5555 5555 1648:โ‰™ 7FFF FFFF FFFF FFFF 1635:โ‰™ 7FF8 0000 0000 0001 1622:โ‰™ 7FF0 0000 0000 0001 1609:โ‰™ FFF0 0000 0000 0000 1596:โ‰™ 7FF0 0000 0000 0000 1583:โ‰™ 8000 0000 0000 0000 1570:โ‰™ 0000 0000 0000 0000 1553:โ‰™ 7FEF FFFF FFFF FFFF 1540:โ‰™ 0010 0000 0000 0000 1527:โ‰™ 000F FFFF FFFF FFFF 1514:โ‰™ 0000 0000 0000 0001 1501:= 0.01171875 (3/256) 1489:โ‰™ 3F88 0000 0000 0000 1468:โ‰™ 4037 0000 0000 0000 1447:โ‰™ 4018 0000 0000 0000 1426:โ‰™ 4014 0000 0000 0000 1409:โ‰™ 4010 0000 0000 0000 1388:โ‰™ 4008 0000 0000 0000 1371:โ‰™ C000 0000 0000 0000 1358:โ‰™ 4000 0000 0000 0000 1345:โ‰™ 3FF0 0000 0000 0002 1332:โ‰™ 3FF0 0000 0000 0001 1319:โ‰™ 3FF0 0000 0000 0000 1257: 1128: 1033:is used to represent 957: 884: 810: 732: 629: 550: 502: 389: 110:computer manufacturer 102:programming languages 2778:Floating point types 1937:Notes and references 1893:As specified by the 1151: 1071: 920: 847: 773: 692: 517: 401: 378: 2773:Computer arithmetic 2712:Primitive data type 2617:Recursive data type 2470:Algebraic data type 2346:Quadruple precision 964:(highest exponent) 252:Arbitrary precision 2675:Abstract data type 2356:Extended precision 2315:Reduced precision 1881:. Thus a modifier 1252: 1123: 952: 879: 805: 727: 645:) is therefore 2. 624: 497: 384: 236:G.711 8-bit floats 190:Extended precision 33:(sometimes called 2768:Binary arithmetic 2755: 2754: 2487:Associative array 2351:Octuple precision 2161:doc.rust-lang.org 2084:Darcy, Joseph D. 1693: 1692: 1676: 1675: 1656: 1655: 1561: 1560: 1505: 1504: 1379: 1378: 1250: 1222: 1201: 1170: 1138:subnormal numbers 1121: 1090: 1011:subnormal numbers 968: 967: 653:Exponent encoding 536: 420: 387:{\displaystyle e} 288: 287: 100:One of the first 16:(Redirected from 2785: 2727:Type constructor 2612:Opaque data type 2544:Record or Struct 2341:Double precision 2336:Single precision 2227: 2220: 2213: 2204: 2197: 2196: 2194: 2192: 2178: 2172: 2171: 2169: 2167: 2153: 2147: 2146: 2144: 2143: 2132: 2126: 2125: 2123: 2112: 2102: 2096: 2095: 2093: 2092: 2081: 2075: 2074: 2072: 2070: 2051: 2045: 2044: 2042: 2041: 2026: 2020: 2019: 2017: 2016: 2001: 1995: 1994: 1993: 1992: 1972: 1966: 1965: 1963: 1956: 1947: 1932: 1852: 1848: 1829: 1820:arithmetic types 1770:single precision 1728:In more detail: 1723:single precision 1678: 1677: 1658: 1657: 1563: 1562: 1507: 1506: 1381: 1380: 1312: 1311: 1301:embedded systems 1261: 1259: 1258: 1253: 1251: 1248: 1240: 1239: 1224: 1223: 1220: 1202: 1199: 1191: 1190: 1172: 1171: 1168: 1132: 1130: 1129: 1124: 1122: 1119: 1111: 1110: 1092: 1091: 1088: 1032: 1025: 1000: 993: 983: 976: 961: 959: 958: 953: 951: 950: 938: 937: 911: 904: 888: 886: 885: 880: 878: 877: 865: 864: 838: 831: 814: 812: 811: 806: 804: 803: 791: 790: 764: 757: 736: 734: 733: 728: 726: 725: 710: 709: 683: 676: 664: 663: 633: 631: 630: 625: 623: 622: 604: 600: 599: 598: 586: 585: 569: 564: 538: 537: 534: 506: 504: 503: 498: 496: 495: 477: 476: 467: 466: 448: 447: 438: 437: 422: 421: 418: 393: 391: 390: 385: 368: 280: 273: 266: 123: 91:single precision 81:; it was called 65:single precision 21: 18:Double precision 2793: 2792: 2788: 2787: 2786: 2784: 2783: 2782: 2758: 2757: 2756: 2751: 2732:Type conversion 2667: 2661: 2597:Enumerated type 2570: 2456: 2450:null-terminated 2426: 2391: 2279: 2236: 2231: 2201: 2200: 2190: 2188: 2180: 2179: 2175: 2165: 2163: 2155: 2154: 2150: 2141: 2139: 2134: 2133: 2129: 2121: 2115:The Number Type 2110: 2104: 2103: 2099: 2090: 2088: 2083: 2082: 2078: 2068: 2066: 2053: 2052: 2048: 2039: 2037: 2028: 2027: 2023: 2014: 2012: 2003: 2002: 1998: 1990: 1988: 1974: 1973: 1969: 1961: 1954: 1949: 1948: 1944: 1939: 1930: 1920: 1907: 1891: 1871: 1859: 1851:iso_fortran_env 1850: 1846: 1840: 1832:double rounding 1827: 1816: 1800: 1798:Implementations 1781: 1757: 1752: 1750: 1746: 1742: 1738: 1734: 1720: 1706:family and the 1688: 1684: 1672: 1668: 1664: 1651: 1647: 1638: 1634: 1625: 1621: 1612: 1608: 1599: 1595: 1586: 1582: 1573: 1569: 1556: 1552: 1543: 1539: 1530: 1526: 1517: 1513: 1500: 1496: 1492: 1488: 1479: 1475: 1471: 1467: 1458: 1454: 1450: 1446: 1437: 1433: 1429: 1425: 1416: 1412: 1408: 1399: 1395: 1391: 1387: 1374: 1370: 1361: 1357: 1348: 1344: 1335: 1331: 1322: 1318: 1310: 1305: 1304: 1276: 1268: 1228: 1215: 1176: 1163: 1149: 1148: 1136:In the case of 1096: 1083: 1069: 1068: 1031: 1027: 1024: 1020: 999: 995: 992: 988: 982: 978: 975: 971: 942: 923: 918: 917: 910: 906: 903: 899: 869: 850: 845: 844: 837: 833: 830: 826: 795: 776: 771: 770: 763: 759: 756: 752: 714: 695: 690: 689: 682: 678: 675: 671: 655: 643:machine epsilon 608: 587: 571: 543: 539: 529: 515: 514: 481: 468: 458: 439: 429: 413: 399: 398: 376: 375: 373:biased exponent 360: 293: 284: 231:PMBus Linear-11 28: 23: 22: 15: 12: 11: 5: 2791: 2789: 2781: 2780: 2775: 2770: 2760: 2759: 2753: 2752: 2750: 2749: 2744: 2739: 2734: 2729: 2724: 2719: 2714: 2709: 2704: 2703: 2702: 2692: 2687: 2685:Data structure 2682: 2677: 2671: 2669: 2663: 2662: 2660: 2659: 2654: 2649: 2644: 2639: 2634: 2629: 2624: 2619: 2614: 2609: 2604: 2599: 2594: 2589: 2584: 2578: 2576: 2572: 2571: 2569: 2568: 2567: 2566: 2556: 2551: 2546: 2541: 2536: 2531: 2530: 2529: 2519: 2514: 2509: 2504: 2499: 2494: 2489: 2484: 2479: 2478: 2477: 2466: 2464: 2458: 2457: 2455: 2454: 2453: 2452: 2442: 2436: 2434: 2428: 2427: 2425: 2424: 2419: 2418: 2417: 2412: 2401: 2399: 2393: 2392: 2390: 2389: 2384: 2379: 2378: 2377: 2367: 2366: 2365: 2364: 2363: 2353: 2348: 2343: 2338: 2333: 2332: 2331: 2326: 2324:Half precision 2321: 2311:Floating point 2308: 2303: 2298: 2293: 2287: 2285: 2281: 2280: 2278: 2277: 2272: 2267: 2262: 2257: 2252: 2246: 2244: 2238: 2237: 2232: 2230: 2229: 2222: 2215: 2207: 2199: 2198: 2173: 2148: 2127: 2097: 2076: 2046: 2034:Tom's Hardware 2021: 1996: 1967: 1941: 1940: 1938: 1935: 1919: 1916: 1906: 1903: 1890: 1887: 1870: 1867: 1858: 1855: 1839: 1836: 1815: 1812: 1799: 1796: 1795: 1794: 1791: 1788: 1785: 1780: 1777: 1756: 1753: 1748: 1744: 1740: 1736: 1732: 1730: 1718: 1691: 1690: 1686: 1682: 1674: 1673: 1670: 1666: 1662: 1654: 1653: 1649: 1645: 1641: 1640: 1636: 1632: 1628: 1627: 1623: 1619: 1615: 1614: 1610: 1606: 1602: 1601: 1597: 1593: 1589: 1588: 1584: 1580: 1576: 1575: 1571: 1567: 1559: 1558: 1554: 1550: 1546: 1545: 1541: 1537: 1533: 1532: 1528: 1524: 1520: 1519: 1515: 1511: 1503: 1502: 1498: 1494: 1490: 1486: 1482: 1481: 1477: 1473: 1469: 1465: 1461: 1460: 1456: 1452: 1448: 1444: 1440: 1439: 1435: 1431: 1427: 1423: 1419: 1418: 1414: 1413:โ‰™ +2 ร— 1 = 100 1410: 1406: 1402: 1401: 1397: 1393: 1389: 1385: 1377: 1376: 1375:โ‰™ โˆ’2 ร— 1 = โˆ’2 1372: 1368: 1364: 1363: 1359: 1355: 1351: 1350: 1346: 1342: 1338: 1337: 1333: 1329: 1325: 1324: 1320: 1316: 1309: 1306: 1280:floating-point 1277: 1269: 1267: 1264: 1263: 1262: 1246: 1243: 1238: 1235: 1231: 1227: 1218: 1214: 1211: 1208: 1205: 1197: 1194: 1189: 1186: 1183: 1179: 1175: 1166: 1162: 1159: 1156: 1134: 1133: 1117: 1114: 1109: 1106: 1103: 1099: 1095: 1086: 1082: 1079: 1076: 1051: 1050: 1029: 1022: 1018: 997: 990: 980: 973: 970:The exponents 966: 965: 962: 949: 945: 941: 936: 933: 930: 926: 915: 913: 908: 901: 892: 891: 889: 876: 872: 868: 863: 860: 857: 853: 842: 840: 835: 828: 819: 818: 817:(zero offset) 815: 802: 798: 794: 789: 786: 783: 779: 768: 766: 761: 754: 745: 744: 741:normal numbers 737: 724: 721: 717: 713: 708: 705: 702: 698: 687: 685: 680: 673: 654: 651: 635: 634: 621: 618: 615: 611: 607: 603: 597: 594: 590: 584: 581: 578: 574: 568: 563: 560: 557: 553: 549: 546: 542: 532: 528: 525: 522: 508: 507: 494: 491: 488: 484: 480: 475: 471: 465: 461: 457: 454: 451: 446: 442: 436: 432: 428: 425: 416: 412: 409: 406: 383: 358: 328: 327: 318: 312: 292: 289: 286: 285: 283: 282: 275: 268: 260: 257: 256: 255: 254: 246: 245: 241: 240: 239: 238: 233: 228: 223: 218: 216:TensorFloat-32 213: 208: 200: 199: 195: 194: 193: 192: 187: 180: 170: 160: 150: 140: 139: 133: 132: 127:Floating-point 43:floating-point 26: 24: 14: 13: 10: 9: 6: 4: 3: 2: 2790: 2779: 2776: 2774: 2771: 2769: 2766: 2765: 2763: 2748: 2745: 2743: 2740: 2738: 2735: 2733: 2730: 2728: 2725: 2723: 2720: 2718: 2715: 2713: 2710: 2708: 2705: 2701: 2698: 2697: 2696: 2693: 2691: 2688: 2686: 2683: 2681: 2678: 2676: 2673: 2672: 2670: 2664: 2658: 2655: 2653: 2650: 2648: 2645: 2643: 2640: 2638: 2635: 2633: 2630: 2628: 2625: 2623: 2620: 2618: 2615: 2613: 2610: 2608: 2607:Function type 2605: 2603: 2600: 2598: 2595: 2593: 2590: 2588: 2585: 2583: 2580: 2579: 2577: 2573: 2565: 2562: 2561: 2560: 2557: 2555: 2552: 2550: 2547: 2545: 2542: 2540: 2537: 2535: 2532: 2528: 2525: 2524: 2523: 2520: 2518: 2515: 2513: 2510: 2508: 2505: 2503: 2500: 2498: 2495: 2493: 2490: 2488: 2485: 2483: 2480: 2476: 2473: 2472: 2471: 2468: 2467: 2465: 2463: 2459: 2451: 2448: 2447: 2446: 2443: 2441: 2438: 2437: 2435: 2433: 2429: 2423: 2420: 2416: 2413: 2411: 2408: 2407: 2406: 2403: 2402: 2400: 2398: 2394: 2388: 2385: 2383: 2380: 2376: 2373: 2372: 2371: 2368: 2362: 2359: 2358: 2357: 2354: 2352: 2349: 2347: 2344: 2342: 2339: 2337: 2334: 2330: 2327: 2325: 2322: 2320: 2317: 2316: 2314: 2313: 2312: 2309: 2307: 2304: 2302: 2299: 2297: 2294: 2292: 2289: 2288: 2286: 2282: 2276: 2273: 2271: 2268: 2266: 2263: 2261: 2258: 2256: 2253: 2251: 2248: 2247: 2245: 2243: 2242:Uninterpreted 2239: 2235: 2228: 2223: 2221: 2216: 2214: 2209: 2208: 2205: 2187: 2183: 2177: 2174: 2162: 2158: 2152: 2149: 2137: 2131: 2128: 2120: 2116: 2109: 2108: 2101: 2098: 2087: 2080: 2077: 2064: 2060: 2056: 2050: 2047: 2035: 2031: 2025: 2022: 2010: 2006: 2000: 1997: 1986: 1982: 1978: 1971: 1968: 1960: 1957:. p. 4. 1953: 1946: 1943: 1936: 1934: 1928: 1924: 1917: 1915: 1912: 1904: 1902: 1900: 1896: 1888: 1886: 1884: 1880: 1876: 1868: 1866: 1863: 1856: 1854: 1844: 1837: 1835: 1833: 1825: 1821: 1813: 1811: 1809: 1805: 1797: 1792: 1789: 1786: 1783: 1782: 1778: 1776: 1773: 1771: 1767: 1763: 1754: 1729: 1726: 1724: 1717:By default, / 1715: 1713: 1709: 1705: 1701: 1697: 1680: 1679: 1660: 1659: 1643: 1642: 1630: 1629: 1617: 1616: 1604: 1603: 1591: 1590: 1578: 1577: 1565: 1564: 1548: 1547: 1535: 1534: 1522: 1521: 1509: 1508: 1484: 1483: 1472:โ‰™ +2 ร— 1.0111 1463: 1462: 1442: 1441: 1421: 1420: 1404: 1403: 1383: 1382: 1366: 1365: 1362:โ‰™ +2 ร— 1 = 2 1353: 1352: 1340: 1339: 1327: 1326: 1323:โ‰™ +2 ร— 1 = 1 1314: 1313: 1307: 1302: 1297: 1293: 1289: 1285: 1281: 1274: 1265: 1244: 1241: 1236: 1233: 1229: 1225: 1212: 1209: 1203: 1195: 1192: 1187: 1184: 1181: 1177: 1173: 1160: 1157: 1147: 1146: 1145: 1143: 1139: 1115: 1112: 1107: 1104: 1101: 1097: 1093: 1080: 1077: 1067: 1066: 1065: 1062: 1060: 1056: 1048: 1044: 1040: 1036: 1019: 1016: 1012: 1008: 1004: 987: 986: 985: 963: 947: 943: 939: 934: 931: 928: 924: 916: 914: 897: 894: 893: 890: 874: 870: 866: 861: 858: 855: 851: 843: 841: 824: 821: 820: 816: 800: 796: 792: 787: 784: 781: 777: 769: 767: 750: 747: 746: 742: 738: 722: 719: 715: 711: 706: 703: 700: 696: 688: 686: 669: 666: 665: 662: 660: 659:offset-binary 652: 650: 646: 644: 639: 619: 616: 613: 609: 605: 601: 595: 592: 588: 582: 579: 576: 572: 566: 561: 558: 555: 551: 547: 544: 540: 526: 523: 513: 512: 511: 492: 489: 486: 482: 478: 473: 463: 459: 455: 452: 449: 444: 440: 434: 430: 426: 410: 407: 397: 396: 395: 381: 374: 369: 367: 362: 355: 350: 347: 342: 340: 335: 333: 325: 322: 319: 316: 313: 310: 307: 306: 305: 303: 299: 290: 281: 276: 274: 269: 267: 262: 261: 259: 258: 253: 250: 249: 248: 247: 242: 237: 234: 232: 229: 227: 224: 222: 219: 217: 214: 212: 209: 207: 204: 203: 202: 201: 196: 191: 188: 185: 181: 179: 176:(binary128), 175: 171: 169: 165: 161: 159: 155: 151: 148: 144: 143: 142: 141: 138: 134: 131: 128: 124: 121: 119: 115: 111: 107: 103: 98: 96: 92: 88: 87:IEEE 754-1985 84: 80: 76: 73: 68: 66: 61: 59: 55: 54:dynamic range 51: 47: 46:number format 44: 40: 36: 32: 19: 2512:Intersection 2340: 2189:. Retrieved 2185: 2176: 2164:. Retrieved 2160: 2151: 2140:. Retrieved 2130: 2114: 2106: 2100: 2089:. Retrieved 2079: 2067:. Retrieved 2058: 2049: 2038:. Retrieved 2036:. 2017-12-08 2033: 2024: 2013:. Retrieved 1999: 1989:, retrieved 1980: 1970: 1945: 1921: 1918:Rust and Zig 1908: 1892: 1872: 1860: 1841: 1817: 1801: 1774: 1758: 1727: 1716: 1694: 1497:= 0.00000011 1141: 1135: 1063: 1054: 1052: 1046: 1038: 1014: 1006: 969: 895: 822: 748: 667: 656: 647: 640: 636: 509: 370: 363: 351: 343: 336: 329: 301: 297: 294: 244:Alternatives 166:(binary64), 163: 156:(binary32), 99: 90: 82: 78: 69: 62: 38: 34: 30: 29: 2742:Type theory 2737:Type system 2587:Bottom type 2534:Option type 2475:generalized 2361:Long double 2306:Fixed point 2186:ziglang.org 2059:gcc.gnu.org 1933:data type. 1862:Common Lisp 1857:Common Lisp 1430:โ‰™ +2 ร— 1.01 1059:significand 1021:11111111111 1003:signed zero 989:00000000000 900:11111111110 827:10000000101 753:01111111111 672:00000000001 354:significand 339:biased form 321:Significand 304:as having: 186:(binary256) 58:radix point 2762:Categories 2647:Empty type 2642:Type class 2592:Collection 2549:Refinement 2527:metaobject 2375:signedness 2234:Data types 2142:2022-02-01 2091:2021-09-12 2040:2018-11-05 2015:2009-02-04 1991:2018-07-16 1899:JavaScript 1895:ECMAScript 1889:JavaScript 1493:โ‰™ +2 ร— 1.1 1451:โ‰™ +2 ร— 1.1 1392:โ‰™ +2 ร— 1.1 1266:Endianness 178:decimal128 149:(binary16) 118:64-bit MBF 2722:Subtyping 2717:Interface 2700:metaclass 2652:Unit type 2622:Semaphore 2602:Exception 2507:Inductive 2497:Dependent 2462:Composite 2440:Character 2422:Reference 2319:Minifloat 2275:Bit array 2191:10 August 2166:10 August 1981:quadibloc 1929:have the 1814:C and C++ 1242:× 1234:− 1226:× 1210:− 1193:× 1185:− 1174:× 1158:− 1113:× 1105:− 1094:× 1078:− 1041:= 0) and 1017:โ‰  0); and 1009:= 0) and 932:− 859:− 785:− 720:− 704:− 617:− 606:× 593:− 580:− 552:∑ 524:− 490:− 479:× 408:− 324:precision 317:: 11 bits 206:Minifloat 182:256-bit: 174:Quadruple 172:128-bit: 168:decimal64 158:decimal32 2747:Variable 2637:Top type 2502:Equality 2410:physical 2387:Rational 2382:Interval 2329:bfloat16 2119:Archived 2069:30 April 2063:Archived 2009:Archived 1985:archived 1959:Archived 1883:strictfp 1806:without 1700:IEEE 754 1296:IEEE 754 1249:fraction 1200:fraction 1120:fraction 315:Exponent 309:Sign bit 302:binary64 211:bfloat16 162:64-bit: 152:32-bit: 145:16-bit: 137:IEEE 754 114:GW-BASIC 79:binary64 75:standard 72:IEEE 754 2690:Generic 2666:Related 2582:Boolean 2539:Product 2415:virtual 2405:Address 2397:Pointer 2370:Integer 2301:Decimal 2296:Complex 2284:Numeric 1843:Fortran 1838:Fortran 1712:PA-RISC 1476:= 10111 912:=2046: 839:=1029: 765:=1023: 311:: 1 bit 184:Octuple 130:formats 106:Fortran 70:In the 41:) is a 39:float64 2680:Boxing 2668:topics 2627:Stream 2564:tagged 2522:Object 2445:String 1847:real64 1828:double 1762:NVIDIA 1053:where 332:signed 298:double 164:Double 154:Single 83:double 2575:Other 2559:Union 2492:Class 2482:Array 2265:Tryte 2122:(PDF) 2111:(PDF) 1962:(PDF) 1955:(PDF) 1689:โ‰ˆ pi 1587:โ‰™ โˆ’0 1574:โ‰™ +0 1480:= 23 1455:= 110 1434:= 101 1049:โ‰  0), 198:Other 2695:Kind 2657:Void 2517:List 2432:Text 2270:Word 2260:Trit 2255:Byte 2193:2024 2168:2024 2071:2018 1925:and 1923:Rust 1911:JSON 1909:The 1905:JSON 1875:Java 1869:Java 1808:SSE2 1766:CUDA 1459:= 6 1438:= 5 1417:= 4 1400:= 3 1396:= 11 1237:1022 1221:sign 1188:1023 1169:sign 1108:1023 1089:sign 1045:(if 1043:NaNs 1037:(if 1013:(if 1005:(if 977:and 948:1023 935:1023 929:2046 862:1023 856:1029 788:1023 782:1023 723:1022 707:1023 684:=1: 620:1023 535:sign 493:1023 419:sign 147:Half 50:bits 35:FP64 2554:Set 2250:Bit 1931:f64 1927:Zig 1879:x87 1873:On 1824:C99 1804:x86 1764:'s 1708:ARM 1704:x86 1292:XDR 1288:VAX 1284:ARM 1028:7ff 996:000 979:7ff 972:000 907:7fe 834:405 760:3ff 679:001 510:or 357:log 334:). 97:). 85:in 37:or 2764:: 2184:. 2159:. 2117:. 2061:. 2057:. 2032:. 2007:. 1983:, 1979:, 1834:. 1772:. 1749:16 1745:16 1741:16 1737:16 1733:16 1687:16 1667:16 1650:16 1637:16 1624:16 1611:16 1598:16 1585:16 1572:16 1555:16 1542:16 1529:16 1516:16 1491:16 1470:16 1449:16 1428:16 1411:16 1390:16 1373:16 1360:16 1347:16 1334:16 1321:16 1245:0. 1196:0. 1116:1. 1030:16 998:16 981:16 974:16 909:16 836:16 762:16 743:) 681:16 577:52 567:52 445:50 435:51 427:1. 359:10 60:. 2226:e 2219:t 2212:v 2195:. 2170:. 2145:. 2094:. 2073:. 2043:. 2018:. 1719:3 1683:2 1671:3 1663:2 1646:2 1633:2 1620:2 1607:2 1594:2 1581:2 1568:2 1551:2 1538:2 1525:2 1512:2 1499:2 1495:2 1487:2 1478:2 1474:2 1466:2 1457:2 1453:2 1445:2 1436:2 1432:2 1424:2 1415:2 1407:2 1398:2 1394:2 1386:2 1369:2 1356:2 1343:2 1330:2 1317:2 1275:. 1230:2 1217:) 1213:1 1207:( 1204:= 1182:1 1178:2 1165:) 1161:1 1155:( 1142:e 1140:( 1102:e 1098:2 1085:) 1081:1 1075:( 1055:F 1047:F 1039:F 1035:โˆž 1026:= 1023:2 1015:F 1007:F 994:= 991:2 944:2 940:= 925:2 905:= 902:2 898:= 896:e 875:6 871:2 867:= 852:2 832:= 829:2 825:= 823:e 801:0 797:2 793:= 778:2 758:= 755:2 751:= 749:e 716:2 712:= 701:1 697:2 677:= 674:2 670:= 668:e 614:e 610:2 602:) 596:i 589:2 583:i 573:b 562:1 559:= 556:i 548:+ 545:1 541:( 531:) 527:1 521:( 487:e 483:2 474:2 470:) 464:0 460:b 456:. 453:. 450:. 441:b 431:b 424:( 415:) 411:1 405:( 382:e 279:e 272:t 265:v 20:)

Index

Double precision
floating-point
number format
bits
dynamic range
radix point
single precision
IEEE 754
standard
IEEE 754-1985
decimal floating point
programming languages
Fortran
computer manufacturer
GW-BASIC
64-bit MBF
Floating-point
formats
IEEE 754
Half
Single
decimal32
Double
decimal64
Quadruple
decimal128
Octuple
Extended precision
Minifloat
bfloat16

Text is available under the Creative Commons Attribution-ShareAlike License. Additional terms may apply.

โ†‘