
Massively parallel processor array


A massively parallel processor array, also known as a multi purpose processor array (MPPA), is a type of integrated circuit which has a massively parallel array of hundreds or thousands of CPUs and RAM memories. These processors pass work to one another through a reconfigurable interconnect of channels. By harnessing a large number of processors working in parallel, an MPPA chip can accomplish more demanding tasks than conventional chips. MPPAs are based on a software parallel programming model for developing high-performance embedded system applications.

Architecture

An MPPA is a MIMD (Multiple Instruction streams, Multiple Data) architecture, with distributed memory accessed locally, not shared globally. Each processor is strictly encapsulated, accessing only its own code and memory. Point-to-point communication between processors is directly realized in the configurable interconnect.

The MPPA's massive parallelism and its distributed memory MIMD architecture distinguish it from multicore and manycore architectures, which have fewer processors and an SMP or other shared memory architecture, mainly intended for general-purpose computing. It is also distinguished from GPGPUs with SIMD architectures, used for HPC applications.

Programming

An MPPA application is developed by expressing it as a hierarchical block diagram or workflow, whose basic objects run in parallel, each on its own processor. Likewise, large data objects may be broken up and distributed into local memories with parallel access. Objects communicate over a parallel structure of dedicated channels. The objective is to maximize aggregate throughput while minimizing local latency, optimizing performance and efficiency. An MPPA's model of computation is similar to a Kahn process network or communicating sequential processes (CSP).

Applications

MPPAs are used in high-performance embedded systems and hardware acceleration of desktop computer and server applications, such as video compression, image processing, medical imaging, network processing, software-defined radio and other compute-intensive streaming media applications, which otherwise would use FPGA, DSP and/or ASIC chips.

Examples

MPPAs developed in companies include ones designed at: Ambric, PicoChip, Intel, IntellaSys, GreenArrays, ASOCS, Tilera, Kalray, Coherent Logix, Tabula, and Adapteva. The Aspex (Ericsson) Linedancer differs in that it was a massively wide SIMD array rather than an MPPA. Strictly speaking, it could qualify as SIMT because each of its 4096 3,000-gate cores has its own content-addressable memory.

Fabricated MPPAs developed in universities include: 36-core and 167-core Asynchronous Array of Simple Processors (AsAP) arrays from the University of California, Davis, 16-core RAW from MIT, and 16-core and 24-core arrays from Fudan University.

The Chinese Sunway project developed its own 260-core SW26010 manycore chip for the TaihuLight supercomputer, which was, as of 2016, the world's fastest supercomputer.

Anton 3 processors, designed by D. E. Shaw Research for molecular dynamics simulations, contain arrays of 576 processors arranged in a 12×24 tiled grid of pairs of cores; a routed network links these tiles together and extends off-chip to other nodes in a full system.

See also

Manycore
AI accelerator
Asynchronous array of simple processors
SW26010
Array processor
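As an illustration of the programming model described in this article, where objects run in parallel and communicate only over dedicated channels, the following is a minimal Python sketch. It is not code for any actual MPPA toolchain: threads stand in for processors, bounded `queue.Queue` instances stand in for point-to-point channels, and the names `source`, `scale`, `accumulate`, `run_pipeline` and the `DONE` marker are hypothetical. A Kahn process network formally assumes unbounded channels with blocking reads; the bounded queues used here are an assumption, chosen because they also block writers when a channel is full.

```python
import queue
import threading

DONE = object()  # hypothetical end-of-stream marker, not part of any MPPA toolchain


def source(values, out_ch):
    """Emit each value on the output channel, then signal end of stream."""
    for v in values:
        out_ch.put(v)  # blocks while the bounded channel is full
    out_ch.put(DONE)


def scale(factor, in_ch, out_ch):
    """Multiply every incoming value by a factor held in local state only."""
    while True:
        v = in_ch.get()  # blocking read, as in a Kahn process network
        if v is DONE:
            out_ch.put(DONE)
            return
        out_ch.put(v * factor)


def accumulate(in_ch, result):
    """Sum the incoming stream and append the total to the result list."""
    total = 0
    while True:
        v = in_ch.get()
        if v is DONE:
            result.append(total)
            return
        total += v


def run_pipeline(values, factor=3, depth=2):
    """Wire source -> scale -> accumulate with two point-to-point channels."""
    a = queue.Queue(maxsize=depth)
    b = queue.Queue(maxsize=depth)
    result = []
    workers = [
        threading.Thread(target=source, args=(values, a)),
        threading.Thread(target=scale, args=(factor, a, b)),
        threading.Thread(target=accumulate, args=(b, result)),
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return result[0]


if __name__ == "__main__":
    print(run_pipeline([1, 2, 3, 4]))  # (1 + 2 + 3 + 4) * 3 = 30
```

Each stage touches only its own arguments and the channels it was given, mirroring the strict encapsulation of MPPA processors. Synchronization comes entirely from the blocking channel operations, so the stages need no locks or shared variables.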


Text is available under the Creative Commons Attribution-ShareAlike License. Additional terms may apply.