
QPACE


QPACE (QCD Parallel Computing on the Cell Broadband Engine) is a massively parallel and scalable supercomputer designed for applications in lattice quantum chromodynamics.

Overview

The QPACE supercomputer is a research project carried out by several academic institutions in collaboration with the IBM Research and Development Laboratory in Böblingen, Germany, and other industrial partners including Eurotech, Knürr, and Xilinx. The academic design team of about 20 junior and senior scientists, mostly physicists, came from the University of Regensburg (project lead), the University of Wuppertal, DESY Zeuthen, the Jülich Research Centre, and the University of Ferrara. The main goal was the design of an application-optimized scalable architecture that beats industrial products in terms of compute performance, price-performance ratio, and energy efficiency. The project officially started in 2008. Two installations were deployed in the summer of 2009, and the final design was completed in early 2010. Since then QPACE has been used for calculations of lattice QCD. The system architecture is also suitable for other applications that mainly rely on nearest-neighbor communication, e.g., lattice Boltzmann methods.

In November 2009 QPACE was the leading architecture on the Green500 list of the most energy-efficient supercomputers in the world. The title was defended in June 2010, when the architecture achieved an energy signature of 773 MFLOPS per Watt in the Linpack benchmark. In the Top500 list of the most powerful supercomputers, QPACE ranked #110-#112 in November 2009 and #131-#133 in June 2010.

QPACE was funded by the German Research Foundation (DFG) in the framework of SFB/TRR-55 and by IBM. Additional contributions were made by Eurotech, Knürr, and Xilinx.

Architecture

In 2008 IBM released the PowerXCell 8i multi-core processor, an enhanced version of the IBM Cell Broadband Engine used, e.g., in the PlayStation 3. The processor received much attention in the scientific community due to its outstanding floating-point performance. It is one of the building blocks of the IBM Roadrunner cluster, which was the first supercomputer architecture to break the PFLOPS barrier. Cluster architectures based on the PowerXCell 8i typically rely on IBM BladeCenter blade servers interconnected by industry-standard networks such as InfiniBand. For QPACE an entirely different approach was chosen: a custom-designed network co-processor implemented on Xilinx Virtex-5 FPGAs is used to connect the compute nodes. FPGAs are re-programmable semiconductor devices that allow for a customized specification of the functional behavior. The QPACE network processor is tightly coupled to the PowerXCell 8i via a Rambus-proprietary I/O interface.

The smallest building block of QPACE is the node card, which hosts the PowerXCell 8i and the FPGA. Node cards are mounted on backplanes, each of which can host up to 32 node cards. One QPACE rack houses up to eight backplanes, with four backplanes each mounted to the front and back side. The maximum number of node cards per rack is 256. QPACE relies on a water-cooling solution to achieve this packaging density.

Sixteen node cards are monitored and controlled by a separate administration card, called the root card. One more administration card per rack, called the superroot card, is used to monitor and control the power supplies. The root cards and superroot cards are also used for synchronization of the compute nodes.

Node card

The heart of QPACE is the IBM PowerXCell 8i multi-core processor. Each node card hosts one PowerXCell 8i, 4 GB of DDR2 SDRAM with ECC, one FPGA, and seven network transceivers. A single 1 Gigabit Ethernet transceiver connects the node card to the I/O network. Six 10 Gigabit transceivers are used for passing messages between neighboring nodes in a three-dimensional toroidal mesh.

Networks

The QPACE network co-processor is implemented on a Xilinx Virtex-5 FPGA, which is directly connected to the I/O interface of the PowerXCell 8i. The functional behavior of the FPGA is defined by a hardware description language and can be changed at any time at the cost of rebooting the node card. Most entities of the QPACE network co-processor are coded in VHDL.

The QPACE network co-processor connects the PowerXCell 8i to three communications networks:

- The torus network is a high-speed communication path that allows for nearest-neighbor communication in a three-dimensional toroidal mesh. The torus network relies on the physical layer of 10 Gigabit Ethernet, while a custom-designed communications protocol optimized for small message sizes is used for message passing. A unique feature of the torus network design is the support for zero-copy communication between the private memory areas, called the Local Stores, of the Synergistic Processing Elements (SPEs) by direct memory access. The latency for communication between two SPEs on neighboring nodes is 3 μs. The peak bandwidth per link and direction is about 1 GB/s.
- Switched 1 Gigabit Ethernet is used for file I/O and maintenance.
- The global signals network is a simple 2-wire system arranged as a tree network. This network is used for evaluation of global conditions and for synchronization of the nodes.

Cooling

The compute nodes of the QPACE supercomputer are cooled by water. Roughly 115 Watt have to be dissipated from each node card. The cooling solution is based on a two-component design. Each node card is mounted to a thermal box, which acts as a large heat sink for heat-critical components. The thermal box interfaces to a coldplate, which is connected to the water-cooling circuit. The performance of the coldplate allows for the removal of the heat of up to 32 nodes. The node cards are mounted on both sides of the coldplate, i.e., 16 nodes each are mounted on the top and bottom of the coldplate. The efficiency of the cooling solution allows for the cooling of the compute nodes with warm water. The QPACE cooling solution also influenced other supercomputer designs such as SuperMUC.

Installations

Two identical installations of QPACE with four racks each have been operating since 2009:

- Jülich Research Centre
- University of Wuppertal

The aggregate peak performance is about 200 TFLOPS in double precision and 400 TFLOPS in single precision. The installations are operated by the University of Regensburg, the Jülich Research Centre, and the University of Wuppertal.
See also

- QPACE2, a follow-up project to QPACE
- Supercomputer
- Cell (microprocessor)
- Torus interconnect
- FPGA
- Lattice QCD

References

- G. Goldrian et al., "QPACE: Quantum Chromodynamics Parallel Computing on the Cell Broadband Engine", Computing in Science and Engineering 10 (2008) 46
- H. Baier et al., "QPACE - a QCD parallel computer based on Cell processors", Proceedings of Science (LAT2009) 001
- S. Williams et al., "The Potential of the Cell Processor for Scientific Computing", Proceedings of the 3rd conference on Computing frontiers (2006) 9
- G. Bilardi et al., "The Potential of On-Chip Multiprocessing for QCD Machines", Lecture Notes in Computer Science 3769 (2005) 386
- L. Biferale et al., "Lattice Boltzmann fluid-dynamics on the QPACE supercomputer", Procedia Computer Science 1 (2010) 1075
- I. Ouda, K. Schleupen, "Application Note: FPGA to IBM Power Processor Interface Setup", IBM Research report, 2008
- S. Solbrig, "Synchronization in QPACE", STRONGnet Conference, Cyprus, 2010
- B. Michel et al., "Aquasar: Der Weg zu optimal effizienten Rechenzentren" (in German), 2011
- The Green500 list, November 2009, http://www.green500.org/lists/green200911
- The Green500 list, June 2010, http://www.green500.org/lists/green201006
- The Top500 list, November 2009 (archived copy, October 17, 2012; retrieved January 17, 2013)
- The Top500 list, June 2010 (archived copy, October 17, 2012; retrieved January 17, 2013)
