Cell software development

Software development for the Cell microprocessor involves a mixture of conventional development practices for the PowerPC-compatible PPU core and novel software development challenges with regard to the functionally reduced SPU coprocessors.

Linux on Cell

An open source software-based strategy was adopted to accelerate the development of a Cell BE ecosystem and to provide an environment to develop Cell applications, including a GCC-based Cell compiler, binutils and a port of the Linux operating system.

Octopiler

Octopiler is IBM's prototype compiler to allow software developers to write code for Cell processors.

Software portability

Adapting VMX for SPU

Differences between VMX and SPU

The VMX (Vector Multimedia Extensions) technology is conceptually similar to the vector model provided by the SPU processors, but there are many significant differences.

VMX to SPU comparison (unfinished)

| feature | VMX | SPU |
| --- | --- | --- |
| word size | 32 bits | 32 bits |
| number of registers | 32 | 128 |
| register width | 128-bit quadword | 128-bit quadword |
| integer formats | 8, 16, 32 | 8, 16, 32, 64 |
| saturation support | yes | no |
| byte ordering | big (default), little | big endian |
| floating point modes | Java, non-Java | single precision, IEEE double |
| memory alignment | quadword only | quadword only |

The VMX Java mode conforms to the Java Language Specification 1 subset of the default IEEE Standard, extended to include IEEE and C9X compliance where the Java standard falls silent. In a typical implementation, non-Java mode converts denormal values to zero, but Java mode traps into an emulator when the processor encounters such a value.

The IBM PPE Vector/SIMD manual does not define operations for double-precision floating point, though IBM has published material implying certain double-precision performance numbers associated with the Cell PPE VMX technology.

Intrinsics

Compilers for Cell provide intrinsics to expose useful SPU instructions in C and C++. Instructions that differ only in the type of operand (such as a, ai, ah, ahi, fa, and dfa for addition) are typically represented by a single C/C++ intrinsic which selects the proper instruction based on the type of the operand.

Porting VMX code for SPU

There is a large body of code that has been developed for other IBM Power microprocessors and could potentially be adapted and recompiled to run on the SPU. This code base includes VMX code that runs under the PowerPC version of Apple's Mac OS X, where it is better known as Altivec. Depending on how many VMX-specific features are involved, the adaptation can range anywhere from straightforward, to onerous, to completely impractical. The most important workloads for the SPU generally map quite well.

In some cases it is possible to port existing VMX code directly. If the VMX code is highly generic (makes few assumptions about the execution environment), the translation can be relatively straightforward. The two processors specify a different binary code format, so recompilation is required at a minimum. Even where instructions exist with the same behaviors, they do not have the same instruction names, so this must be mapped as well. IBM provides compiler intrinsics which take care of this mapping transparently as part of the development toolkit.

In many cases, however, a directly equivalent instruction does not exist. The workaround might be obvious or it might not. For example, if saturation behavior is required on the SPU, it can be coded by adding additional SPU instructions to accomplish this (with some loss of efficiency). At the other extreme, if Java floating-point semantics are required, this is almost impossible to achieve on the SPU processor. Achieving the same computation on the SPU might require an entirely different algorithm to be written from scratch.

The most important conceptual similarity between VMX and the SPU architecture is supporting the same vectorization model. For this reason, most algorithms adapted to Altivec will usually adapt successfully to the SPU architecture as well.

Local store exploitation

Transferring data between the local stores of different SPUs can have a large performance cost. The local stores of individual SPUs can be exploited using a variety of strategies.

Applications with high locality, such as dense matrix computations, represent an ideal workload class for the local stores in Cell BE.

Streaming computations can be efficiently accommodated using software pipelining of memory block transfers using a multi-buffering strategy.

The software cache offers a solution for random accesses.

More sophisticated applications can use multiple strategies for different data types.

References

- The Cell Project at IBM Research
- Optimizing Compiler for a CELL Processor
- Using advanced compiler technology to exploit the performance of the Cell Broadband Engine architecture
- Compiler Technology for Scalable Architectures
- "An Open Source Environment for Cell Broadband Engine System Software" (PDF). June 2007.
- IBM Research Project - Compiler Technology for Scalable Architectures
- IBM Systems Journal - Using advanced compiler technology to exploit the performance of the Cell Broadband Engine architecture, 2017-10-23, archived from the original on 2006-04-11
- IBM's Octopiler, or, why the PS3 is running late, ArsTechnica, 2006-02-26
- "Synergistic Processing in Cell's Multicore Architecture" (PDF). March 2006.
- "Using advanced compiler technology to exploit the performance of the Cell Broadband Engine architecture" (PDF). January 2006.
- "Cell GC: Using the Cell Synergistic Processor as a Garbage Collection Coprocessor" (PDF). March 2008.
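
The multi-buffering strategy mentioned for streaming computations can be illustrated with a minimal, portable C sketch. It is not SPU code: `fetch_chunk`, `stream_sum`, and the `CHUNK` size are hypothetical names chosen for illustration, and a plain synchronous `memcpy` stands in for the asynchronous block transfer that real hardware would issue. The sketch only shows the buffer rotation, in which the next block is requested before the current one is processed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CHUNK 4  /* elements per block transfer (hypothetical size) */

/* Stand-in for an asynchronous transfer from main memory into local store.
   Assumption: a real implementation would start a DMA here and return. */
static void fetch_chunk(float *local, const float *remote, size_t n) {
    assert(n <= CHUNK);
    memcpy(local, remote, n * sizeof *local);
}

/* Sum `total` floats from `remote` using two alternating local buffers. */
float stream_sum(const float *remote, size_t total) {
    float buf[2][CHUNK];
    size_t nchunks = (total + CHUNK - 1) / CHUNK;
    float sum = 0.0f;
    if (nchunks == 0) return sum;

    /* Prime the pipeline with the first block. */
    fetch_chunk(buf[0], remote, total < CHUNK ? total : CHUNK);

    for (size_t i = 0; i < nchunks; i++) {
        size_t cur = i & 1;
        size_t n = (i + 1 == nchunks) ? total - i * CHUNK : CHUNK;

        /* Request the next block into the other buffer before computing. */
        if (i + 1 < nchunks) {
            size_t off = (i + 1) * CHUNK;
            size_t next_n = (total - off < CHUNK) ? total - off : CHUNK;
            fetch_chunk(buf[cur ^ 1], remote + off, next_n);
        }

        /* On real hardware, a wait for the transfer into buf[cur]
           would go here. */
        for (size_t j = 0; j < n; j++) sum += buf[cur][j];
    }
    return sum;
}
```

Because the stand-in transfer is synchronous, this version gains nothing in speed. The benefit appears only when the transfer runs concurrently with the inner loop.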

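The software cache idea for random accesses can likewise be sketched in portable C. The following is a minimal direct-mapped cache, assuming a hypothetical `sw_cache` type and made-up sizes (`LINE_WORDS`, `NUM_LINES`). The `backing` array stands in for main memory and `memcpy` for the block transfer into local store. It does not reproduce the software cache of any particular Cell toolkit.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define LINE_WORDS 4  /* words per cache line (hypothetical) */
#define NUM_LINES  8  /* lines kept in local store (hypothetical) */

typedef struct {
    const int *backing;                  /* stands in for main memory */
    size_t     backing_words;
    int        data[NUM_LINES][LINE_WORDS];
    size_t     tag[NUM_LINES];           /* memory line held in each slot */
    int        valid[NUM_LINES];
    unsigned   hits, misses;
} sw_cache;

void sw_cache_init(sw_cache *c, const int *backing, size_t words) {
    memset(c, 0, sizeof *c);
    c->backing = backing;
    c->backing_words = words;
}

/* Read one word through the cache, fetching a whole line on a miss. */
int sw_cache_read(sw_cache *c, size_t index) {
    assert(index < c->backing_words);
    size_t line = index / LINE_WORDS;
    size_t slot = line % NUM_LINES;      /* direct-mapped placement */

    if (!c->valid[slot] || c->tag[slot] != line) {
        size_t base = line * LINE_WORDS;
        size_t n = c->backing_words - base;
        if (n > LINE_WORDS) n = LINE_WORDS;
        /* memcpy stands in for the transfer into local store */
        memcpy(c->data[slot], c->backing + base, n * sizeof(int));
        c->tag[slot] = line;
        c->valid[slot] = 1;
        c->misses++;
    } else {
        c->hits++;
    }
    return c->data[slot][index % LINE_WORDS];
}
```

Direct mapping keeps the lookup to a divide, a modulo, and one comparison, at the price of conflict misses when two frequently used lines share a slot.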
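
The remark that saturation behavior can be recovered on the SPU with additional instructions can be made concrete with a scalar C sketch. This is portable C, not SPU intrinsics, and the function names are hypothetical. Each lane is added in a wider type and then clamped, which shows the extra compare-and-select work that a saturating instruction would otherwise perform in one step.

```c
#include <assert.h>
#include <stdint.h>

#define LANES 8  /* eight 16-bit lanes fill one 128-bit quadword */

/* Saturating add of two signed 16-bit values. */
static int16_t sat_add_i16(int16_t a, int16_t b) {
    int32_t wide = (int32_t)a + (int32_t)b;   /* cannot overflow in 32 bits */
    if (wide > INT16_MAX) wide = INT16_MAX;   /* extra step: clamp high */
    if (wide < INT16_MIN) wide = INT16_MIN;   /* extra step: clamp low */
    return (int16_t)wide;
}

/* Lane-wise saturating add over one quadword's worth of 16-bit values. */
void vec_sat_add_i16(int16_t out[LANES],
                     const int16_t a[LANES],
                     const int16_t b[LANES]) {
    assert(out && a && b);
    for (int i = 0; i < LANES; i++) {
        out[i] = sat_add_i16(a[i], b[i]);
    }
}
```

The two clamps per lane are the "loss of efficiency" in miniature: the result matches a saturating add, but it takes several operations instead of one.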

Text is available under the Creative Commons Attribution-ShareAlike License. Additional terms may apply.