. These techniques devote runtime resources to discovering implicit parallelism in a single thread. They are used in systems that have evolved continuously (with backward compatibility) from single-core processors. Such systems usually have a 'few' cores (e.g. 2, 4, 8) and may be complemented by a manycore
, an improved 520-core variant of the SW26010 with 512-bit SIMD (adding support for half-precision), used in a prototype for an exascale system (and, in the future, a 10-exascale system); according to DatacenterDynamics, China is rumored to already operate two separate exascale systems in secret
have over 5 million CPU cores. Where coprocessors such as GPUs are also used, their cores are not included in the core count; if they were, quite a few more computers would reach those thresholds.
565:, once one of the fastest supercomputers in the world, using a custom manycore architecture. As of November 2018, it was the world's third fastest supercomputer (as ranked by the
739:"The Cell architecture is like nothing we have ever seen in commodity microprocessors, it is closer in design to multiprocessor vector supercomputers"
Olofsson, Andreas; Nordström, Tomas; Ul-Abdin, Zain (2014). "Kickstarting High-performance Energy-efficient Manycore Architectures with Epiphany".
and local memories gives software the opportunity to explicitly optimise the spatial layout of tasks (e.g. as seen in tooling developed for
A number of computers built from multicore processors have one million or more individual CPU cores. Examples include:
, and being suitable only for highly parallel code (high throughput, but extremely poor single-thread performance).
is an issue limiting the scaling of multicore processors. Manycore processors may bypass this with methods such as
serial code, and therefore place more emphasis on high single-thread performance (e.g. devoting more silicon to
467:, a manycore processor designed for running convolutional neural nets for embedded vision applications
518:, a massively parallel (1 million CPU cores) manycore processor (ARM-based) built as part of the
850:"Eyeriss: An Energy-Efficient Reconfigurable Accelerator for Deep Convolutional Neural Networks"
179:, and for higher throughput (or lower power consumption) at the expense of latency and lower
45:(from a few tens of cores to thousands or more). Manycore processors are used extensively in
Manycore processors may have more in common (conceptually) with technologies originating in
IEEE International Solid-State Circuits Conference, ISSCC 2016, Digest of Technical Papers
512:, with 20,480,000 processing elements total plus the 1,250 Intel Xeon D host processors.
444:, a manycore processor using message passing aimed at low power applications
450:, a 260-core manycore processor used in the then top 1 supercomputer
Barker, J; Bowden, J (2013). "Manycore Parallelism through OpenMP".
GPUs may be considered a form of manycore processor having multiple
873:, published on Feb 19, 2010 (more than one dead link in the slide)
253:, or read-only/non-coherent caches. A manycore processor using a
775:. IWOMP. Lecture Notes in Computer Science, vol 8122. Springer.
821:"China's Exascale Prototype Supercomputer Tests AI Workloads"
660:"The Future of Many Core Computing: A tale of two processors"
175:
in being optimized from the outset for a higher degree of
876:
796:"A First Peek At China's Sunway Exascale Supercomputer"
OpenMP in the Era of Low Power Devices and Accelerators
execution units, and larger, more general caches), and
190:, by contrast, are usually designed to efficiently run
561:, a massively parallel (10 million CPU cores) Chinese
Specific computers with 5 million or more CPU cores
387:Epiphany Architecture, a manycore chip using PGAS
27:Multi-core processor with a large number of cores
573:manycore processors, each containing 256 cores.
871:Architecting solutions for the Manycore future
481:Specific manycore computers with 1M+ CPU cores
438:with a manycore network on a chip architecture
2840:
897:
569:list), obtaining its performance from 40,960
8:
844:Chen, Yu-Hsin; Krishna, Tushar; Emer, Joel;
428:, a manycore processor using message passing
1902:Computer performance by orders of magnitude
95:. Unsourced material may be challenged and
41:, containing numerous simpler, independent
751:"OEMs show systems with Intel MIC chips"
399:, a 100-core DSP/GPP processor based on
794:Morgan, Timothy Prickett (2021-02-10).
650:
348:Asynchronous array of simple processors
171:Manycore processors are distinct from
336:, which can be described as manycore
7:
1873:Floating-point operations per second
676:Hendry, Gilbert; Kretschmann, Mark.
555:ARM-based cores, 7,630,848 in total.
422:accelerator for data-intensive tasks
93:adding citations to reliable sources
57:Contrast with multicore architecture
343:Massively parallel processor array
723:from the original on 2021-12-21.
307:Partitioned global address space
251:partitioned global address space
609:Multiprocessor system on a chip
354:Specific manycore architectures
954:Deterministic finite automaton
819:Hemsoth, Nicole (2021-04-19).
749:Rick Merritt (June 20, 2011),
717:"IBM SyNAPSE Deep Dive Part 3"
37:designed for a high degree of
715:Amir, Arnon (June 11, 2015).
658:Mattson, Tim (January 2010).
372:coprocessor, which has MIC (
1885:Synaptic updates per second
781:10.1007/978-3-642-40698-0_4
508:developed by ExaScaler and
328:Classes of manycore systems
287:Suitable programming models
599:High-performance computing
266:high-performance computing
51:high-performance computing
300:or other APIs supporting
293:Message passing interface
181:single-thread performance
186:The broader category of
629:Embarrassingly parallel
426:Teraflops Research Chip
281:shader processing units
614:Vision processing unit
410:vision processing unit
200:out-of-order execution
619:Memory access pattern
374:Many Integrated Cores
188:multi-core processors
173:multi-core processors
35:multi-core processors
33:are special kinds of
877:Eyeriss architecture
678:"IBM Cell Processor"
584:Multi-core processor
224:heterogeneous system
177:explicit parallelism
. pp. 262–263.
735:"cell architecture"
520:Human Brain Project
401:HyperX Architecture
39:parallel processing
31:Manycore processors
634:Massively parallel
366:2,048-core modules
47:embedded computers
825:The Next Platform
800:The Next Platform
559:Sunway TaihuLight
504:, dawn light), a
452:Sunway TaihuLight
406:Movidius Myriad 2
389:scratchpad memory
338:vector processors
274:vector processors
255:network on a chip
243:scratchpad memory
719:. IBM Research.
604:Computer cluster
589:Vector processor
397:hx3100 Processor
755:www.eetimes.com
624:Cache coherency
580:
529:
483:
356:
330:
302:compute kernels
289:
239:message passing
235:Cache coherency
43:processor cores
865:External links
533:supercomputers
528:
525:
524:
523:
513:
510:PEZY Computing
482:
479:
478:
477:
475:AI accelerator
468:
462:
461:
460:
448:Sunway SW26010
445:
439:
436:AI accelerator
429:
423:
413:
403:
394:Coherent Logix
391:
382:
377:
376:) architecture
367:
364:PEZY Computing
846:Sze, Vivienne
563:supercomputer
560:
557:
554:
553:Fujitsu A64FX
550:
549:supercomputer
547:, a Japanese
506:supercomputer
473:, a manycore
418:, a manycore
417:
414:
411:
408:, a manycore
212:shared memory
853:
839:
828:. Retrieved
824:
814:
803:. Retrieved
799:
789:
772:
766:
754:
744:
729:
710:
689:
671:
653:
531:Quite a few
530:
501:
484:
442:Green arrays
373:
359:ZettaScaler
362:, Japanese
312:Actor model
218:(such as a
216:accelerator
208:superscalar
830:2021-11-18
805:2021-11-18
645:References
230:Motivation
701:1412.5538
516:SpiNNaker
471:Graphcore
432:TrueNorth
259:TrueNorth
204:pipelines
202:, deeper
194:parallel
848:(2016).
759:EE Times
721:Archived
578:See also
540:Frontier
494:Japanese
385:Adapteva
370:Xeon Phi
322:Dataflow
270:clusters
268:such as
571:SW26010
498:Hepburn
490:Gyoukou
465:Eyeriss
457:SW52020
222:) in a
206:, more
567:TOP500
551:using
545:Fugaku
416:Kalray
380:Tilera
317:OpenMP
298:OpenCL
696:arXiv
681:(PDF)
663:(PDF)
gyōkō
: 暁光
434:, an
420:PCI-e
412:(VPU)
639:CUDA
594:SIMD
334:GPUs
272:and
192:both
49:and
777:doi
261:).
247:DMA
220:GPU
196:and
852:.
823:.
798:.
757:,
753:,
500::
276:.
249:,
245:,
241:,
226:.
183:.
53:.
833:.
808:.
783:.
779::
737:.
704:.
698::
683:.
665:.
522:.
492:(