XOP instruction set

Computer instruction set introduced by AMD in 2009

The XOP (eXtended Operations) instruction set, announced by AMD on May 1, 2009, is an extension to the 128-bit SSE core instructions in the x86 and AMD64 instruction set for the Bulldozer processor core, which was released on October 12, 2011. However, AMD removed support for XOP from the Zen microarchitecture onward.

The XOP instruction set contains several different types of vector instructions, since it was originally intended as a major upgrade to SSE. Most of the instructions are integer instructions, but it also contains floating-point permutation and floating-point fraction extraction instructions. The instruction types are described in the sections below.

History

Main article: SSE5

XOP is a revised subset of what was originally intended as SSE5. It was changed to be similar to, but not overlapping with, AVX. Parts that overlapped with AVX were removed or moved to separate standards such as FMA4 (floating-point vector multiply–accumulate) and CVT16 (half-precision floating-point conversion, implemented as F16C by Intel).

All SSE5 instructions that were equivalent or similar to instructions in the AVX and FMA4 instruction sets announced by Intel have been changed to use the coding proposed by Intel. Integer instructions without equivalents in AVX were classified as the XOP extension. The XOP instructions have an opcode byte 8F (hexadecimal). Otherwise, they use a coding scheme almost identical to AVX with the 3-byte VEX prefix.

Commentators have seen this as evidence that Intel has not allowed AMD to use any part of the large VEX coding space. AMD has been forced to use different codes in order to avoid using any code combination that Intel might possibly be using in its development pipeline for something else. The XOP coding scheme is as close to the VEX scheme as technically possible without risking that the AMD codes overlap with future Intel codes. This inference is speculative, since no public information is available about negotiations between the two companies on this issue.

The use of the 8F byte requires that the m-bits (see VEX coding scheme) have a value greater than or equal to 8. This avoids overlap with existing instructions (see note 1). The C4 byte used in the VEX scheme has no such restriction. This may prevent the use of the m-bits for other purposes in the future in the XOP scheme, but not in the VEX scheme. Another possible problem is that the pp bits have the value 00 in the XOP scheme, while they have the value 01 in the VEX scheme for instructions that have no legacy equivalent. This may complicate the use of the pp bits for other purposes in the future.

A similar compatibility issue is the difference between the FMA3 and FMA4 instruction sets. Intel initially proposed FMA4 in AVX/FMA specification version 3 to supersede the 3-operand FMA proposed by AMD in SSE5. After AMD adopted FMA4, Intel canceled FMA4 support and reverted to FMA3 in AVX/FMA specification version 5 (see FMA history).

In March 2015, AMD explicitly revealed its plans for Zen in the description of a patch for the GNU Binutils package. Zen is AMD's third-generation x86-64 architecture, and its first iteration is designated znver1 (Zen, version 1). According to the patch, Zen would not support the TBM, FMA4, XOP and LWP instructions developed specifically for the "Bulldozer" family of microarchitectures.

Integer vector multiply–accumulate instructions

These are integer versions of the FMA instruction set. They are all four-operand instructions similar to FMA4, and they all operate on signed integers. In each row, the second mnemonic is the saturating variant.

Instruction              Description and operation
VPMACSWW, VPMACSSWW      Multiply Accumulate (with Saturation) Word to Word
                         2x8 words (a0-a7, b0-b7) + 8 words (c0-c7) → 8 words (r0-r7)
                         r0 = a0 * b0 + c0, r1 = a1 * b1 + c1, ...
VPMACSWD, VPMACSSWD      Multiply Accumulate (with Saturation) Low Word to Doubleword
                         2x8 words (a0-a7, b0-b7) + 4 doublewords (c0-c3) → 4 doublewords (r0-r3)
                         r0 = a0 * b0 + c0, r1 = a2 * b2 + c1, ...
VPMACSDD, VPMACSSDD      Multiply Accumulate (with Saturation) Doubleword to Doubleword
                         2x4 doublewords (a0-a3, b0-b3) + 4 doublewords (c0-c3) → 4 doublewords (r0-r3)
                         r0 = a0 * b0 + c0, r1 = a1 * b1 + c1, ...
VPMACSDQL, VPMACSSDQL    Multiply Accumulate (with Saturation) Low Doubleword to Quadword
                         2x4 doublewords (a0-a3, b0-b3) + 2 quadwords (c0-c1) → 2 quadwords (r0-r1)
                         r0 = a0 * b0 + c0, r1 = a2 * b2 + c1
VPMACSDQH, VPMACSSDQH    Multiply Accumulate (with Saturation) High Doubleword to Quadword
                         2x4 doublewords (a0-a3, b0-b3) + 2 quadwords (c0-c1) → 2 quadwords (r0-r1)
                         r0 = a1 * b1 + c0, r1 = a3 * b3 + c1
VPMADCSWD, VPMADCSSWD    Multiply Add Accumulate (with Saturation) Word to Doubleword
                         2x8 words (a0-a7, b0-b7) + 4 doublewords (c0-c3) → 4 doublewords (r0-r3)
                         r0 = a0 * b0 + a1 * b1 + c0, r1 = a2 * b2 + a3 * b3 + c1, ...

Integer vector horizontal addition

Horizontal addition instructions add adjacent values in the input vector to each other. The output size in the instruction names below describes how wide each horizontal addition is. For instance, horizontal byte to word adds two bytes at a time and returns the result as a vector of words. Byte to quadword, by contrast, adds eight bytes together at a time and returns the result as a vector of quadwords. The mnemonics containing U treat their inputs as unsigned. Six additional horizontal addition and subtraction instructions can be found in SSSE3. However, those operate on two input vectors and only combine values two at a time.

Instruction              Description and operation
VPHADDBW, VPHADDUBW      Horizontal add two signed/unsigned bytes to word
                         16 bytes (a0-a15) → 8 words (r0-r7)
                         r0 = a0+a1, r1 = a2+a3, r2 = a4+a5, ...
VPHADDBD, VPHADDUBD      Horizontal add four signed/unsigned bytes to doubleword
                         16 bytes (a0-a15) → 4 doublewords (r0-r3)
                         r0 = a0+a1+a2+a3, r1 = a4+a5+a6+a7, ...
VPHADDBQ, VPHADDUBQ      Horizontal add eight signed/unsigned bytes to quadword
                         16 bytes (a0-a15) → 2 quadwords (r0-r1)
                         r0 = a0+a1+a2+a3+a4+a5+a6+a7, ...
VPHADDWD, VPHADDUWD      Horizontal add two signed/unsigned words to doubleword
                         8 words (a0-a7) → 4 doublewords (r0-r3)
                         r0 = a0+a1, r1 = a2+a3, r2 = a4+a5, ...
VPHADDWQ, VPHADDUWQ      Horizontal add four signed/unsigned words to quadword
                         8 words (a0-a7) → 2 quadwords (r0-r1)
                         r0 = a0+a1+a2+a3, r1 = a4+a5+a6+a7
VPHADDDQ, VPHADDUDQ      Horizontal add two signed/unsigned doublewords to quadword
                         4 doublewords (a0-a3) → 2 quadwords (r0-r1)
                         r0 = a0+a1, r1 = a2+a3
VPHSUBBW                 Horizontal subtract two signed bytes to word
                         16 bytes (a0-a15) → 8 words (r0-r7)
                         r0 = a0-a1, r1 = a2-a3, r2 = a4-a5, ...
VPHSUBWD                 Horizontal subtract two signed words to doubleword
                         8 words (a0-a7) → 4 doublewords (r0-r3)
                         r0 = a0-a1, r1 = a2-a3, r2 = a4-a5, ...
VPHSUBDQ                 Horizontal subtract two signed doublewords to quadword
                         4 doublewords (a0-a3) → 2 quadwords (r0-r1)
                         r0 = a0-a1, r1 = a2-a3

Integer vector compare

This set of vector compare instructions all take an immediate as an extra argument. The immediate controls what kind of comparison is performed, and eight comparisons are possible for each instruction. The vectors are compared element by element. Each comparison that evaluates to true sets all corresponding bits in the destination to 1, and each false comparison sets all the same bits to 0. This result can be used directly in the VPCMOV instruction for a vectorized conditional move.

Instruction   Description
VPCOMB        Compare Vector Signed Bytes
VPCOMW        Compare Vector Signed Words
VPCOMD        Compare Vector Signed Doublewords
VPCOMQ        Compare Vector Signed Quadwords
VPCOMUB       Compare Vector Unsigned Bytes
VPCOMUW       Compare Vector Unsigned Words
VPCOMUD       Compare Vector Unsigned Doublewords
VPCOMUQ       Compare Vector Unsigned Quadwords

Immediate   Comparison
000         Less Than
001         Less Than or Equal
010         Greater Than
011         Greater Than or Equal
100         Equal
101         Not Equal
110         False
111         True

Vector conditional move

VPCMOV works as a bitwise variant of the blend instructions in SSE4. Like the AVX instruction VPBLENDVB, it is a four-operand instruction with three source operands and a destination. The third operand acts as a selector. For each bit in it, 1 selects the same bit in the first source, and 0 selects the same bit in the second source. Used together with the XOP vector comparison instructions above, VPCMOV can implement a vectorized ternary move. If the second input is the same as the destination, it implements a conditional move (CMOV).

Instruction   Description
VPCMOV        Vector Conditional Move

Integer vector shift and rotate instructions

The shift instructions here differ from those in SSE2 in that they can shift each element by a different amount. The amounts come from a vector register interpreted as packed signed integers. The sign indicates the direction of the shift or rotate: positive values cause a left shift and negative values a right shift. Intel has specified a different, incompatible set of variable vector shift instructions in AVX2.

Instruction   Description
VPROTB        Packed Rotate Bytes
VPROTW        Packed Rotate Words
VPROTD        Packed Rotate Doublewords
VPROTQ        Packed Rotate Quadwords
VPSHAB        Packed Shift Arithmetic Bytes
VPSHAW        Packed Shift Arithmetic Words
VPSHAD        Packed Shift Arithmetic Doublewords
VPSHAQ        Packed Shift Arithmetic Quadwords
VPSHLB        Packed Shift Logical Bytes
VPSHLW        Packed Shift Logical Words
VPSHLD        Packed Shift Logical Doublewords
VPSHLQ        Packed Shift Logical Quadwords

Vector permute

VPPERM is a single instruction that combines the SSSE3 instructions PALIGNR and PSHUFB and adds more to both. Some compare it to the AltiVec instruction VPERM. It takes three registers as input: the first two are source registers and the third is the selector register. Each byte in the selector selects one of the bytes in one of the two input registers for the output. The selector can also apply effects to the selected byte, such as setting it to 0, reversing its bit order, or repeating its most-significant bit. Any of these effects, or the selected input byte itself, can in addition be inverted.

The VPERMIL2PD and VPERMIL2PS instructions are two-source versions of the VPERMILPD and VPERMILPS instructions in AVX. This means that, like VPPERM, they can select output from any of the fields in the two inputs.

Instruction   Description
VPPERM        Packed Permute Byte
VPERMIL2PD    Permute Two-Source Double-Precision Floating-Point
VPERMIL2PS    Permute Two-Source Single-Precision Floating-Point

Floating-point fraction extraction

These instructions extract the fractional part of floating-point values, that is, the part that would be lost in conversion to integer.

Instruction   Description
VFRCZPD       Extract Fraction Packed Double-Precision Floating-Point
VFRCZPS       Extract Fraction Packed Single-Precision Floating-Point
VFRCZSD       Extract Fraction Scalar Double-Precision Floating-Point
VFRCZSS       Extract Fraction Scalar Single-Precision Floating-Point

CPUs with XOP

- AMD:
  - "Heavy Equipment" processors:
    - Bulldozer-based processors, Q4 2011
    - Piledriver-based processors, Q4 2012
    - Steamroller-based processors, Q1 2014
    - Excavator-based processors (including "v2"), 2015

See also

- AVX
- CVT16
- FMA4
- SSE5
- x86

Notes

1. Byte value 0x8F is an existing opcode for a POP instruction. This instruction uses the ModR/M byte, which follows the opcode, but it does not make use of the "reg" (register) field, which is bits 3-5. Some opcodes that don't use "reg" multiplex instructions by using these bits to signify eight different instructions (0x80-0x83 and 0xD0-0xDF, among others), but 0x8F does not. This means that, for a standard POP instruction, bits 3-5 should always be zero. The m-bits are bits 0-4, so requiring a value of 8 or higher guarantees that bit 3 or bit 4 of the byte following 0x8F is set. Both of those bits lie inside the "reg" field, so the XOP prefix cannot be mistaken for a POP instruction.
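The lane-wise operations listed in the instruction tables can be illustrated with a small software model. The Python sketch below is an illustrative assumption, not AMD's reference pseudocode. All function names are hypothetical, and a vector is modelled as a list of lane values with lane 0 first. Some behaviour is not spelled out in the tables and is assumed here: wrap-around in the non-saturating multiply–accumulate forms, the permitted range of shift counts, and rotate counts taken modulo the lane width. These assumptions are also marked in the comments.

```python
import math

def wrap_signed(x, bits):
    """Reduce x to a two's-complement signed value of the given lane width."""
    x &= (1 << bits) - 1
    return x - (1 << bits) if x >> (bits - 1) else x

def saturate_signed(x, bits):
    """Clamp x to the signed range of the given lane width."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return max(lo, min(hi, x))

# VPMACS*: r_i = a[j] * b[j] + c_i with j = stride * i + offset.
# VPMACSWW/VPMACSDD: stride 1; VPMACSWD/VPMACSDQL: stride 2, offset 0;
# VPMACSDQH: stride 2, offset 1. Assumption: non-saturating forms wrap.
def vpmacs(a, b, c, out_bits, stride=1, offset=0, saturate=False):
    fix = saturate_signed if saturate else wrap_signed
    return [fix(a[stride * i + offset] * b[stride * i + offset] + acc, out_bits)
            for i, acc in enumerate(c)]

# VPMADC(S)WD: r_i = a_2i * b_2i + a_2i+1 * b_2i+1 + c_i, in 32-bit lanes.
def vpmadcswd(a, b, c, saturate=False):
    fix = saturate_signed if saturate else wrap_signed
    return [fix(a[2 * i] * b[2 * i] + a[2 * i + 1] * b[2 * i + 1] + acc, 32)
            for i, acc in enumerate(c)]

# VPHADD*: sum each run of `group` adjacent lanes (2 for BW/WD/DQ,
# 4 for BD/WQ, 8 for BQ). Signed and unsigned forms differ only in how the
# input lanes are interpreted; the wider output lanes cannot overflow.
def vphadd(lanes, group):
    return [sum(lanes[i:i + group]) for i in range(0, len(lanes), group)]

# VPHSUB*: r_i = a_2i - a_2i+1.
def vphsub(lanes):
    return [lanes[i] - lanes[i + 1] for i in range(0, len(lanes), 2)]

# VPCOM* immediate (low 3 bits) -> predicate, following the immediate table.
VPCOM_PREDICATES = [
    lambda x, y: x < y,    # 000 Less Than
    lambda x, y: x <= y,   # 001 Less Than or Equal
    lambda x, y: x > y,    # 010 Greater Than
    lambda x, y: x >= y,   # 011 Greater Than or Equal
    lambda x, y: x == y,   # 100 Equal
    lambda x, y: x != y,   # 101 Not Equal
    lambda x, y: False,    # 110 False
    lambda x, y: True,     # 111 True
]

def vpcom(a, b, imm, bits):
    """True lanes become all ones, false lanes all zeros."""
    pred = VPCOM_PREDICATES[imm & 0b111]
    return [(1 << bits) - 1 if pred(x, y) else 0 for x, y in zip(a, b)]

def vpcmov(src1, src2, selector, bits):
    """Bitwise select per lane: selector 1 bits take src1, 0 bits take src2."""
    mask = (1 << bits) - 1
    return [((x & s) | (y & ~s)) & mask for x, y, s in zip(src1, src2, selector)]

# Variable per-lane shifts and rotates: a positive count moves bits left,
# a negative count right. Assumption: shift counts lie in (-bits, bits).
def vpshl(lanes, counts, bits):
    """Logical shift of unsigned lanes; vacated bits are filled with zeros."""
    mask = (1 << bits) - 1
    return [((v << c) & mask) if c >= 0 else (v & mask) >> -c
            for v, c in zip(lanes, counts)]

def vpsha(lanes, counts, bits):
    """Arithmetic shift of signed lanes; right shifts replicate the sign bit."""
    return [wrap_signed(v << c, bits) if c >= 0 else v >> -c
            for v, c in zip(lanes, counts)]

def vprot(lanes, counts, bits):
    """Rotate unsigned lanes; assumption: count is taken modulo the lane width."""
    mask = (1 << bits) - 1
    out = []
    for v, c in zip(lanes, counts):
        c %= bits          # rotating right by n equals rotating left by bits - n
        v &= mask
        out.append(((v << c) | (v >> (bits - c))) & mask)
    return out

# VFRCZ*: the fraction discarded by truncation toward zero (sign of input kept).
def vfrcz(values):
    return [math.modf(x)[0] for x in values]
```

For instance, vpcmov(a, b, vpcom(a, b, 0b010, 16), 16) keeps the larger of each pair of signed words. This is the vectorized ternary move described for VPCMOV. The selected lanes come back as unsigned bit patterns, which wrap_signed converts back to signed values.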

Text is available under the Creative Commons Attribution-ShareAlike License. Additional terms may apply.
