QPACE (QCD Parallel Computing on the Cell Broadband Engine) is a massively parallel and scalable supercomputer designed for applications in lattice quantum chromodynamics.

Overview

The QPACE supercomputer is a research project carried out by several academic institutions in collaboration with the IBM Research and Development Laboratory in Böblingen, Germany, and other industrial partners including Eurotech, Knürr, and Xilinx. The academic design team of about 20 junior and senior scientists, mostly physicists, came from the University of Regensburg (project lead), the University of Wuppertal, DESY Zeuthen, Jülich Research Centre, and the University of Ferrara. The main goal was the design of an application-optimized scalable architecture that beats industrial products in terms of compute performance, price-performance ratio, and energy efficiency. The project officially started in 2008. Two installations were deployed in the summer of 2009. The final design was completed in early 2010. Since then QPACE has been used for calculations of lattice QCD. The system architecture is also suitable for other applications that mainly rely on nearest-neighbor communication, e.g., lattice Boltzmann methods.

In November 2009 QPACE was the leading architecture on the Green500 list of the most energy-efficient supercomputers in the world. The title was defended in June 2010, when the architecture achieved an energy signature of 773 MFLOPS per Watt in the Linpack benchmark. In the Top500 list of most powerful supercomputers, QPACE ranked #110-#112 in November 2009, and #131-#133 in June 2010.

QPACE was funded by the German Research Foundation (DFG) in the framework of SFB/TRR-55 and by IBM. Additional contributions were made by Eurotech, Knürr, and Xilinx.

Architecture

In 2008 IBM released the PowerXCell 8i multi-core processor, an enhanced version of the IBM Cell Broadband Engine used, e.g., in the PlayStation 3. The processor received much attention in the scientific community due to its outstanding floating-point performance. It is one of the building blocks of the IBM Roadrunner cluster, which was the first supercomputer architecture to break the PFLOPS barrier. Cluster architectures based on the PowerXCell 8i typically rely on IBM BladeCenter blade servers interconnected by industry-standard networks such as InfiniBand. For QPACE an entirely different approach was chosen. A custom-designed network co-processor implemented on Xilinx Virtex-5 FPGAs is used to connect the compute nodes. FPGAs are re-programmable semiconductor devices that allow for a customized specification of the functional behavior. The QPACE network co-processor is tightly coupled to the PowerXCell 8i via a Rambus-proprietary I/O interface.

The smallest building block of QPACE is the node card, which hosts the PowerXCell 8i and the FPGA. Node cards are mounted on backplanes, each of which can host up to 32 node cards. One QPACE rack houses up to eight backplanes, with four backplanes each mounted to the front and back side. The maximum number of node cards per rack is 256. QPACE relies on a water-cooling solution to achieve this packaging density.

Sixteen node cards are monitored and controlled by a separate administration card, called the root card. One more administration card per rack, called the superroot card, is used to monitor and control the power supplies. The root cards and superroot cards are also used for synchronization of the compute nodes.

Node card

The heart of QPACE is the IBM PowerXCell 8i multi-core processor. Each node card hosts one PowerXCell 8i, 4 GB of DDR2 SDRAM with ECC, one Xilinx Virtex-5 FPGA, and seven network transceivers. A single 1 Gigabit Ethernet transceiver connects the node card to the I/O network. Six 10 Gigabit transceivers are used for passing messages between neighboring nodes in a three-dimensional toroidal mesh.

The QPACE network co-processor is implemented on a Xilinx Virtex-5 FPGA, which is directly connected to the I/O interface of the PowerXCell 8i. The functional behavior of the FPGA is defined by a hardware description language and can be changed at any time at the cost of rebooting the node card. Most entities of the QPACE network co-processor are coded in VHDL.

Networks

The QPACE network co-processor connects the PowerXCell 8i to three communications networks:

The torus network is a high-speed communication path that allows for nearest-neighbor communication in a three-dimensional toroidal mesh. The torus network relies on the physical layer of 10 Gigabit Ethernet, while a custom-designed communications protocol optimized for small message sizes is used for message passing. A unique feature of the torus network design is the support for zero-copy communication between the private memory areas, called the Local Stores, of the Synergistic Processing Elements (SPEs) by direct memory access. The latency for communication between two SPEs on neighboring nodes is 3 μs. The peak bandwidth per link and direction is about 1 GB/s.

Switched 1 Gigabit Ethernet is used for file I/O and maintenance.

The global signals network is a simple 2-wire system arranged as a tree network. This network is used for evaluation of global conditions and synchronization of the nodes.

Cooling

The compute nodes of the QPACE supercomputer are cooled by water. Roughly 115 watts have to be dissipated from each node card. The cooling solution is based on a two-component design. Each node card is mounted to a thermal box, which acts as a large heat sink for heat-critical components. The thermal box interfaces to a coldplate, which is connected to the water-cooling circuit. The performance of the coldplate allows for the removal of the heat from up to 32 nodes. The node cards are mounted on both sides of the coldplate, i.e., 16 nodes each are mounted on the top and bottom of the coldplate. The efficiency of the cooling solution allows for the cooling of the compute nodes with warm water. The QPACE cooling solution also influenced other supercomputer designs such as SuperMUC.

Installations

Two identical installations of QPACE with four racks have been operating since 2009:
Jülich Research Centre
University of Wuppertal

The aggregate peak performance is about 200 TFLOPS in double precision, and 400 TFLOPS in single precision. The installations are operated by the University of Regensburg, Jülich Research Centre, and the University of Wuppertal.

See also

QPACE2, a follow-up project to QPACE
Supercomputer
Cell (microprocessor)
Torus interconnect
FPGA
Lattice QCD

References

L. Biferale et al., Lattice Boltzmann fluid-dynamics on the QPACE supercomputer, Procedia Computer Science 1 (2010) 1075
The Green500 list, November 2009, http://www.green500.org/lists/green200911
The Green500 list, June 2010, http://www.green500.org/lists/green201006
The Top500 list, November 2009. Archived from the original on October 17, 2012. Retrieved January 17, 2013.
The Top500 list, June 2010. Archived from the original on October 17, 2012. Retrieved January 17, 2013.
G. Bilardi et al., The Potential of On-Chip Multiprocessing for QCD Machines, Lecture Notes in Computer Science 3769 (2005) 386
S. Williams et al., The Potential of the Cell Processor for Scientific Computing, Proceedings of the 3rd conference on Computing frontiers (2006) 9
G. Goldrian et al., QPACE: Quantum Chromodynamics Parallel Computing on the Cell Broadband Engine, Computing in Science and Engineering 10 (2008) 46
I. Ouda, K. Schleupen, Application Note: FPGA to IBM Power Processor Interface Setup, IBM Research report, 2008
H. Baier et al., QPACE - a QCD parallel computer based on Cell processors, Proceedings of Science (LAT2009), 001
S. Solbrig, Synchronization in QPACE, STRONGnet Conference, Cyprus, 2010
B. Michel et al., 2011
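The packaging figures quoted under Architecture and Installations can be cross-checked with a little arithmetic. The sketch below is only illustrative: the constant and function names are made up for this example, and it assumes that every rack is fully populated and that the "about 200 TFLOPS" figure is the sum over both installations.

```python
# Figures quoted in the article; all names here are illustrative only.
NODE_CARDS_PER_BACKPLANE = 32
BACKPLANES_PER_RACK = 8
NODE_CARDS_PER_ROOT_CARD = 16
RACKS_PER_INSTALLATION = 4
INSTALLATIONS = 2
AGGREGATE_PEAK_DP_TFLOPS = 200.0  # "about 200 TFLOPS" in double precision


def node_cards_per_rack() -> int:
    """Maximum number of node cards in one rack."""
    return NODE_CARDS_PER_BACKPLANE * BACKPLANES_PER_RACK


def root_cards_per_rack() -> int:
    """Root cards needed if each one manages sixteen node cards."""
    return node_cards_per_rack() // NODE_CARDS_PER_ROOT_CARD


def total_node_cards() -> int:
    """Node cards across both installations (assumes full racks)."""
    return node_cards_per_rack() * RACKS_PER_INSTALLATION * INSTALLATIONS


def peak_dp_gflops_per_node() -> float:
    """Implied double-precision peak per node card, in GFLOPS."""
    return AGGREGATE_PEAK_DP_TFLOPS * 1000.0 / total_node_cards()


if __name__ == "__main__":
    print(node_cards_per_rack())   # 256, matching the stated maximum per rack
    print(root_cards_per_rack())   # 16
    print(total_node_cards())      # 2048
    print(peak_dp_gflops_per_node())
```

Under these assumptions the implied per-node peak comes out just under 100 GFLOPS in double precision, which is only as accurate as the rounded aggregate figure.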
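The heat load that the cooling design has to handle follows directly from the numbers in the Cooling section. The following is a rough sketch, not a thermal model: it counts only the roughly 115 W per node card, ignores power supplies and administration cards, and uses hypothetical names throughout.

```python
WATTS_PER_NODE_CARD = 115   # "roughly 115 Watt" per node card
NODES_PER_COLDPLATE = 32    # 16 on the top, 16 on the bottom
NODE_CARDS_PER_RACK = 256   # maximum per rack


def coldplate_heat_load_w(nodes: int = NODES_PER_COLDPLATE) -> int:
    """Heat in watts that one coldplate must remove for `nodes` node cards."""
    if not 0 <= nodes <= NODES_PER_COLDPLATE:
        raise ValueError("a coldplate serves between 0 and 32 node cards")
    return nodes * WATTS_PER_NODE_CARD


def rack_heat_load_kw() -> float:
    """Node-card heat in kilowatts for a fully populated rack."""
    return NODE_CARDS_PER_RACK * WATTS_PER_NODE_CARD / 1000


if __name__ == "__main__":
    print(coldplate_heat_load_w())  # 3680 W for a full coldplate
    print(rack_heat_load_kw())
```

A fully loaded coldplate therefore carries about 3.7 kW, and a full rack close to 30 kW from the node cards alone.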
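Nearest-neighbor communication in a three-dimensional toroidal mesh means that each node has exactly six neighbors, one in each direction along each axis, with wraparound at the edges. This matches the six 10 Gigabit transceivers per node card. A minimal sketch of the neighbor calculation is given below; the function name and the 4 x 4 x 4 torus size are hypothetical, since the article does not state the actual torus dimensions.

```python
from typing import List, Tuple

Coord = Tuple[int, int, int]


def torus_neighbors(coord: Coord, dims: Coord) -> List[Coord]:
    """Return the six nearest neighbors of `coord` in a 3D torus of size `dims`.

    Neighbors are listed per axis, first in the +1 and then in the -1 direction.
    """
    if len(coord) != 3 or len(dims) != 3:
        raise ValueError("coord and dims must have three components")
    if any(d <= 0 for d in dims):
        raise ValueError("torus dimensions must be positive")
    if any(not 0 <= c < d for c, d in zip(coord, dims)):
        raise ValueError("coord lies outside the torus")

    neighbors = []
    for axis in range(3):
        for step in (1, -1):
            n = list(coord)
            # The modulo implements the wraparound link that closes the torus.
            n[axis] = (n[axis] + step) % dims[axis]
            neighbors.append((n[0], n[1], n[2]))
    return neighbors


if __name__ == "__main__":
    # A corner node of a hypothetical 4 x 4 x 4 torus wraps around to index 3.
    print(torus_neighbors((0, 0, 0), (4, 4, 4)))
```

Note that for an axis of length 2 the two directions lead to the same neighbor, so the six links are distinct only when every dimension is at least 3.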
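The torus figures of 3 μs latency and about 1 GB/s peak bandwidth per link and direction show why a protocol optimized for small messages matters. A common first-order estimate, used here purely as an assumption and not as a description of the QPACE protocol, treats transfer time as latency plus message size divided by bandwidth. The names below are illustrative, and 1 GB is taken as 10^9 bytes.

```python
LATENCY_S = 3e-6            # SPE-to-SPE latency between neighboring nodes
PEAK_BANDWIDTH_BPS = 1e9    # about 1 GB/s per link and direction (10^9 bytes/s assumed)


def transfer_time_s(message_bytes: float) -> float:
    """Estimated one-way transfer time under a simple latency-plus-size/bandwidth model."""
    if message_bytes < 0:
        raise ValueError("message size must be non-negative")
    return LATENCY_S + message_bytes / PEAK_BANDWIDTH_BPS


def effective_bandwidth_bps(message_bytes: float) -> float:
    """Bytes per second actually achieved for one message of the given size."""
    if message_bytes <= 0:
        raise ValueError("message size must be positive")
    return message_bytes / transfer_time_s(message_bytes)


def half_peak_message_bytes() -> float:
    """Message size at which the model reaches half of the peak bandwidth."""
    return LATENCY_S * PEAK_BANDWIDTH_BPS


if __name__ == "__main__":
    for size in (128, 1024, 3000, 65536):
        print(size, transfer_time_s(size), effective_bandwidth_bps(size))
```

In this model a message of about 3000 bytes reaches half of the peak bandwidth, and shorter messages are dominated by latency, which is the regime a small-message protocol is meant to serve.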
Text is available under the Creative Commons Attribution-ShareAlike License. Additional terms may apply.