, the SRF is shared between all the various ALU clusters. The key innovation demonstrated with Stanford's Imagine chip is that the compiler automates and allocates memory in an optimal way, fully transparent to the programmer. The dependencies between kernel functions and data are known through the programming model, which enables the compiler to perform flow analysis and optimally pack the SRFs. Commonly, this cache and DMA management can take up the majority of a project's schedule, something the stream processor (or at least Imagine) fully automates. Tests done at Stanford showed that the compiler scheduled memory as well as or better than laborious hand tuning.
(AoS) topology. This means that if some algorithm is applied to the location of each particle in turn, it must skip over the memory locations containing the other attributes. If these attributes are not needed, this results in wasteful usage of the CPU cache. Additionally, a SIMD instruction will typically expect the data it operates on to be contiguous in memory; the elements may also need to be
machine. In the native programming model, all DMA and program scheduling is left up to the programmer. The hardware provides a fast ring bus among the processors for local communication. Because the local memory for instructions and data is limited, the only programs that can exploit this architecture
More modern stream processing frameworks provide a FIFO-like interface to structure data as a literal stream. This abstraction provides a means to specify data dependencies implicitly while enabling the runtime/hardware to take full advantage of that knowledge for efficient computation. One of the
In this paradigm, the whole dataset is defined, rather than each component block being defined separately. The description of the set of data is assumed to be in the first two rows. After that, the result is inferred from the sources and the kernel. For simplicity, there is a 1:1 mapping between input and output
is a specific type of temporal locality common in signal and media processing applications where data is produced once, read once or twice later in the application, and never read again. Intermediate streams passed between kernels as well as intermediate data within kernel functions can capture this
The stream processor is usually equipped with a fast, efficient, proprietary memory bus (crossbar switches are now common; multi-buses have been employed in the past). The exact number of memory lanes depends on the market range. As this is written, there are still 64-bit wide interconnections
Although there are various degrees of flexibility allowed by the model, stream processors usually impose some limitations on the kernel or stream size. For example, consumer hardware often lacks the ability to perform high-precision math, lacks complex indirection chains or presents lower limits on
Another sample code fragment detects weddings among a flow of external "events" such as church bells ringing, the appearance of a man in a tuxedo or morning suit, a woman in a flowing white gown and rice flying through the air. A "complex" or "composite" event is what one infers from the individual
Instead of holding the data in the structure, it holds only pointers (memory locations) to the data. A shortcoming is that if multiple attributes of an object are to be operated on, they might now be distant in memory and so result in a cache miss. The aligning and any needed padding lead to
based, which means they conceptually perform only one operation at a time. As the computing needs of the world evolved, the amount of data to be managed increased very quickly. It was obvious that the sequential programming model could not cope with the increased need for processing power. Various
improved this with full-duplex communications, getting a GPU (and possibly a generic stream processor) to work can still take a long time. This means it is usually counter-productive to use them for small datasets. Because changing the kernel is a rather expensive operation, the stream
Historically, CPUs began implementing various tiers of memory access optimizations because of their ever-increasing performance relative to the slowly growing external memory bandwidth. As this gap widened, large amounts of die area were dedicated to hiding memory latencies. Since fetching
is an early (circa 1985) graphics processor capable of combining three source streams of 16 component bit vectors in 256 ways to produce an output stream consisting of 16 component bit vectors. Total input stream bandwidth is up to 42 million bits per second. Output stream bandwidth is up to 28
query (a query that executes forever processing arriving data based on timestamps and window duration). This code fragment illustrates a JOIN of two data streams, one for stock orders, and one for the resulting stock trades. The query outputs a stream of all Orders matched by a Trade within one
is a very widespread and heavily used practice on stream processors, with GPUs featuring pipelines exceeding 200 stages. The cost for switching settings is dependent on the setting being modified but it is now considered to always be expensive. To avoid those problems at various levels of the
For stream processors, the usage of structures is encouraged. From an application point of view, all the attributes can be defined with some flexibility. Taking GPUs as reference, there is a set of attributes (at least 16) available. For each attribute, the application can state the number of
While stream processing is a branch of SIMD/MIMD processing, they must not be confused. Although SIMD implementations can often work in a "streaming" manner, their performance is not comparable: the model envisions a very different usage pattern which allows far greater performance by itself.
The most immediate challenge in the realm of parallel processing does not lie as much in the type of hardware architecture used, but in how easy it will be to program the system in question in a real-world environment with acceptable performance. Machines like
Imagine use a straightforward
It has been noted that when applied on generic processors such as a standard CPU, only a 1.5x speedup can be reached. By contrast, ad-hoc stream processors easily reach over 10x performance, mainly attributed to more efficient memory access and higher levels of parallel processing.
R2xx/NV2x: kernel stream operations became explicitly under the programmer's control but only for vertex processing (fragments were still using old paradigms). No branching support severely hampered flexibility but some types of algorithms could be run (notably, low-precision fluid
Although those two paradigms were efficient, real-world implementations were plagued with limitations from memory alignment problems to synchronization issues and limited parallelism. Only few SIMD processors survived as stand-alone components; most were embedded in standard CPUs.
around (entry-level). Most mid-range models use a fast 128-bit crossbar switch matrix (4 or 2 segments), while high-end models deploy huge amounts of memory (actually up to 512 MB) with a slightly slower crossbar that is 256 bits wide. By contrast, standard processors from
Clusters can be numerous because inter-cluster communication is assumed to be rare. Internally, however, each cluster can efficiently exploit only a much smaller number of ALUs, because intra-cluster communication is common and thus needs to be highly efficient.
Cal2Many, a code generation framework from
Halmstad University, Sweden. It takes CAL code as input and generates different target-specific languages, including sequential C, Chisel, parallel C targeting the Epiphany architecture, and ajava & astruct targeting the Ambric architecture.
(DMA) when dependencies become known. The elimination of manual DMA management reduces software complexity, and the associated elimination of hardware-cached I/O reduces the die area that must be devoted to servicing by specialized computational units such as
The catch, however, is that the packed SIMD register holds a fixed amount of data, so no further parallelism can be extracted. The speed-up is somewhat limited by the assumption we made of performing four parallel operations (please note this is common for both
project, called
Merrimac, is aimed at developing a stream-based supercomputer. Merrimac intends to use a stream architecture and advanced interconnection networks to provide more performance per unit cost than cluster-based scientific computers built from the same
Beginning from a whole-system point of view, stream processors usually exist in a controlled environment. GPUs do exist on an add-in board (this seems also to apply to
Imagine). CPUs continue to do the job of managing system resources, running applications, and such.
, where one kernel function is applied to all elements in the stream, is typical. Since the kernel and stream abstractions expose data dependencies, compiler tools can fully automate and optimize on-chip management tasks. Stream processing hardware can use
, FPGA). Applications can be developed in any combination of C, C++, and Java for the CPU, and Verilog or VHDL for FPGAs. CUDA is currently used for Nvidia GPGPUs. Auto-Pipe also handles coordination of TCP connections between multiple machines.
Although an order of magnitude speedup can be reasonably expected (even from mainstream GPUs when computing in a streaming manner), not all applications benefit from this. Communication latencies are actually the biggest problem. Although
An implementation of this paradigm can "unroll" a loop internally. This allows throughput to scale with chip complexity, easily utilizing hundreds of ALUs. The elimination of complex data patterns makes much of this extra power available.
) but less so for general-purpose processing with more randomized data access (such as databases). By sacrificing some flexibility in the model, the implications allow easier, faster and more efficient execution. Depending on the context,
between programmer, tools and hardware. Programmers beat tools in mapping algorithms to parallel hardware, and tools beat programmers in figuring out the smartest memory allocation schemes, etc. Of particular concern are MIMD designs such as
Because of the SIMD nature of the stream processor's execution units (ALUs clusters), read/write operations are expected to happen in bulk, so memories are optimized for high bandwidth rather than low latency (this is a difference from
pipeline, many techniques have been deployed, such as "über shaders" and "texture atlases". Those techniques are game-oriented because of the nature of GPUs, but the concepts are interesting for generic stream processing as well.
For each record we can only read from the input, perform operations on it, and write to the output. It is permissible to have multiple inputs and multiple outputs, but never a piece of memory that is both readable and writable.
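A minimal C++ sketch of this contract (the `kernel` function and its operation are illustrative, not from any particular framework): each output record is computed solely from the corresponding input record, with no memory that is both read and written.

```cpp
#include <cassert>
#include <vector>

// Hypothetical kernel: may only read its input record and write its output.
float kernel(float record) { return record * 2.0f + 1.0f; }

// Uniform streaming: the same kernel is applied to every record.
std::vector<float> run_stream(const std::vector<float>& input) {
    std::vector<float> output;
    output.reserve(input.size());
    for (float record : input)
        output.push_back(kernel(record)); // records are independent
    return output;
}
```

Because no record depends on any other, the loop body could execute on hundreds of ALUs at once without synchronization.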
Most programming languages for stream processors start with Java, C or C++ and add extensions which provide specific instructions to allow application developers to tag kernels and/or streams. This also applies to most
, is a flexible architecture intended to be both fast and energy efficient. The project, originally conceived in 1996, included an architecture, software tools, a VLSI implementation and a development board, and was funded by
or adhere to a stream programming model. With a suitable algorithm the performance of the Cell can rival that of pure stream processors; however, this nearly always requires a complete redesign of algorithms and
stream processing projects included the
Stanford Real-Time Programmable Shading Project started in 1999. A prototype called Imagine was developed in 2002. A project called Merrimac ran until about 2004.
efforts have been spent on finding alternative ways to perform massive amounts of computations but the only solution was to exploit some level of parallel execution. The result of those efforts was
Memory access patterns are much more predictable. While arrays do exist, their dimension is fixed at kernel invocation. The closest thing to multiple pointer indirection is an
exists in a kernel if the same function is applied to all records of an input stream and a number of records can be processed simultaneously without waiting for results from previous records.
, is a hardware architecture that can function like a stream processor with appropriate software support. It consists of a controlling processor, the PPE (Power Processing Element, an IBM
Most (90%) of a stream processor's work is done on-chip, requiring only 1% of the global data to be stored to memory. This is where knowing the kernel temporaries and dependencies pays off.
191:, the number of arithmetic operations per I/O or global memory reference. In many signal processing applications today it is well over 50:1 and increasing with algorithmic complexity.
(SRF). This is conceptually a large cache in which stream data is stored to be transferred to external memory in bulk. As a cache-like software-controlled structure to the various
information and opcodes to those few ALUs is expensive, very little die area is dedicated to actual mathematical machinery (as a rough estimation, consider it to be less than 10%).
By way of illustration, the following code fragments demonstrate detection of patterns within event streams. The first is an example of processing a data stream using a continuous
components and the format of the components (but only primitive data types are supported for now). The various attributes are then attached to a memory block, possibly defining a
R3xx/NV4x: flexible branching support, although some limitations still exist on the number of operations to be executed and on recursion depth, as well as on array manipulation.
This three-tiered data access pattern makes it easy to keep temporary data away from slow memories, thus making the silicon implementation highly efficient and power-saving.
. By moving the memory location of the data out of the structure, the data can be better organised for efficient access in a stream and for SIMD instructions to operate on. A
1838:: a high-level programming language for writing (dataflow) actors, which are stateful operators that transform input streams of data objects (tokens) into output streams.
Stream processing is essentially a compromise, driven by a data-centric model that works very well for traditional DSP or GPU-type applications (such as image, video and
. The number of jump instructions is also decreased, as the loop is run fewer times. These gains result from the parallel execution of the four mathematical operations.
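A sketch of the packed (SWAR) version of the two-array addition, assuming a hypothetical `vector_sum` that stands in for a real packed-add instruction:

```cpp
#include <array>
#include <cassert>

using vec4 = std::array<float, 4>; // one packed register: 4 components

// A single packed operation adds four components at once.
vec4 vector_sum(const vec4& a, const vec4& b) {
    return { a[0] + b[0], a[1] + b[1], a[2] + b[2], a[3] + b[3] };
}

// The loop body now executes 100 times instead of 400: one instruction
// decode (and one loop-condition jump) per four additions.
void add_arrays_packed(std::array<vec4, 100>& result,
                       const std::array<vec4, 100>& source0,
                       const std::array<vec4, 100>& source1) {
    for (int el = 0; el < 100; el++)    // for each vector
        result[el] = vector_sum(source0[el], source1[el]);
}
```

On real hardware `vector_sum` would map to a SIMD intrinsic or instruction; written in plain C++ here, it only illustrates where the 4x reduction in decoded instructions comes from.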
This is the sequential paradigm that is most familiar. Variations do exist (such as inner loops, structures and such), but they ultimately boil down to that construct.
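For the two-array example of 100 4-component vectors, the sequential construct is simply one addition per iteration (a sketch; names are illustrative):

```cpp
#include <cassert>

// Conventional, sequential paradigm: one addition per iteration,
// so two arrays of 100 4-component vectors take 400 iterations.
void add_arrays(float result[], const float source0[], const float source1[]) {
    for (int i = 0; i < 100 * 4; i++)
        result[i] = source0[i] + source1[i];
}
```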
A similar architecture exists on stream processors but thanks to the new programming model, the amount of transistors dedicated to management is actually very little.
. Programmers often create representations of entities in memory, for example, the location of a particle in 3D space, the colour of the ball and its size, as below:
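A sketch of such an array-of-structures record (field names are illustrative; `std::uint8_t` stands in for an 8-bit channel type):

```cpp
#include <cstdint>

// A particle in a three-dimensional space.
struct particle_t {
    float x, y, z;          // position; not even an array!
    std::uint8_t color[3];  // 8 bit per channel, say we care about RGB only
    float size;
    // ... and many other attributes may follow...
};
```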
rapidly evolved in both speed and functionality. Since these early days, dozens of stream processing languages have been developed, as well as specialized hardware.
The stream processing paradigm simplifies parallel software and hardware by restricting the parallel computation that can be performed. Given a sequence of data (a
) and a set of SIMD coprocessors, called SPEs (Synergistic Processing Elements), each with independent program counters and instruction memory, in effect a
IBM Spade - Stream Processing Application Declarative Engine (B. Gedik, et al. SPADE: the System S declarative stream processing engine. ACM SIGMOD 2008.)
all the various attributes in a single set of parameters (usually this looks like a structure or a "magic global variable"), performs the operations and
872:, for which the programmer needs to deal with application partitioning across multiple cores and deal with process synchronization and load balancing.
, a programming paradigm which allowed applying one instruction to multiple instances of (different) data. Most of the time, SIMD was being used in a
, much information is actually not taken into account here, such as the number of vector components and their data format. This is done for clarity.
between 'consecutive' elements of the same attributes, effectively allowing interleaved data. When the GPU begins the stream processing, it will
, and optimal local on-chip memory reuse is attempted, in order to minimize the loss in bandwidth associated with external memory interaction.
Commercial implementations are either general purpose or tied to specific hardware by a vendor. Examples of general purpose languages include:
1806:, an application development environment for streaming applications that allows authoring of applications for heterogeneous systems (CPU,
To keep those ALUs fetched with data, each ALU is equipped with local register files (LRFs), which are basically its usable registers.
2007. The family contains four members ranging from 30 GOPS to 220 16-bit GOPS (billions of operations per second), all fabricated at
Memeti, Suejb; Pllana, Sabri (October 2018). "HSTREAM: A Directive-Based
Language Extension for Heterogeneous Stream Computing".
Apart from specifying streaming applications in high-level languages, models of computation (MoCs) also have been widely used as
second of the Order being placed. The output stream is sorted by timestamp, in this case, the timestamp from the Orders stream.
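A batch-style C++ sketch of this windowed JOIN (record types and field names are hypothetical; a real continuous query would process arriving events incrementally rather than over complete vectors):

```cpp
#include <string>
#include <vector>
#include <cassert>

// Hypothetical event records; timestamps in seconds.
struct Order { std::string id;       double ts; };
struct Trade { std::string order_id; double ts; };

// Emit every Order matched by a Trade within one second of placement.
// Orders are assumed sorted by timestamp, so the output is too.
std::vector<std::string> join_within_1s(const std::vector<Order>& orders,
                                        const std::vector<Trade>& trades) {
    std::vector<std::string> out;
    for (const Order& o : orders)
        for (const Trade& t : trades)
            if (t.order_id == o.id && t.ts >= o.ts && t.ts - o.ts <= 1.0) {
                out.push_back(o.id);  // matched within the 1-second window
                break;
            }
    return out;
}
```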
increased memory usage. Overall, memory management may be more complicated if structures are added and removed, for example.
Internally, a stream processor features some clever communication and management circuits but what's interesting is the
BeepBeep, a simple and lightweight Java-based event stream processing library from the Formal
Computer Science Lab at
SPar - C++ domain-specific language for expressing stream parallelism from the
Application Modelling Group (GMAP) at
RaftLib - open source C++ stream processing template library originally from the Stream Based
Supercomputing Lab at
Java extension that enables a simple expression of stream programming, the Actor model, and the MapReduce algorithm
Free Edition, enables a simple expression of stream programming, the actor model, and the MapReduce algorithm on
Batch file-based processing (emulates some of actual stream processing, but much lower performance in general)
Stream processing is especially suitable for applications that exhibit three application characteristics:
, which is however guaranteed to finally read or write from a specific memory area (inside a stream).
R8xx: Supports append/consume buffers and atomic operations. This generation is the state of the art.
In wireless signal processing, each record could be a sequence of samples received from an antenna.
scheduling. This in itself is a result of the research at MIT and Stanford in finding an optimal
Pre-R2xx/NV2x: no explicit support for stream processing. Kernel operations were hidden in the
In graphics, each record might be the vertex, normal, and color information for a triangle;
When multiple of these structures exist in memory they are placed end to end creating an
architecture also incurs penalties for small streams, a behaviour referred to as the
"A Programmable 512 GOPS Stream Processor for Signal, Image, and Video Processing"
data but this does not need to be. Applied kernels can also be much more complex.
In a video encoder, each record may be 256 pixels forming a macroblock of data; or
2018 IEEE International Conference on Computational Science and Engineering (CSE)
You can see, however, that this method reduces the number of decoded instructions from
HSTREAM: a directive-based language extension for heterogeneous stream computing
simplest and most efficient stream processing modalities to date for C++, is
, or sequences of events in time, as the central input and output objects of
1786:, which can be considered stream programming languages to a certain degree.
design may be tuned for maximum efficiency or a trade-off for flexibility.
. Various generations to be noted from a stream processing point of view:
Consider a simple program adding up two arrays containing 100 4-component
Basic computers started from a sequential execution paradigm. Traditional
together as a data flow graph using C++ stream operators. As an example:
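A sketch along the lines of RaftLib's documented hello-world (it requires the RaftLib headers to build, and the exact API should be checked against the RaftLib project):

```cpp
#include <raft>
#include <raftio>
#include <cstdlib>
#include <string>

class hi : public raft::kernel
{
public:
    hi() : raft::kernel()
    {
        output.addPort< std::string >( "0" );
    }

    virtual raft::kstatus run()
    {
        output[ "0" ].push( std::string( "Hello World\n" ) );
        return( raft::stop );
    }
};

int main( int argc, char **argv )
{
    /** instantiate print kernel **/
    raft::print< std::string > p;
    /** instantiate hello world kernel **/
    hi hello;
    /** make a map object **/
    raft::map m;
    /** add kernels to map, both hello and p are executed concurrently **/
    m += hello >> p;
    /** execute the map **/
    m.exe();
    return( EXIT_SUCCESS );
}
```

The `>>` operator declares the data dependency between the two kernels, which is what lets the runtime schedule them concurrently.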
single-threaded model with automated dependencies, memory allocation and
) is applied to each element in the stream. Kernel functions are usually
, Universities of Stanford, Rice, California (Davis) and Reservoir Labs.
, for example). This also allows for efficient memory bus negotiations.
1886:, which provides separation of coordination and algorithmic programming
environment. By using more complicated structures, one could also have
In image processing, each record might be a single pixel from an image;
Embiot, a lightweight embedded streaming analytics agent from Telchemy
the results to some memory area for later processing (or retrieving).
are widespread, consumer-grade stream processors designed mainly by
in a 130 nanometer process. The devices target the high end of the
Non-commercial examples of stream programming languages include:
FCUDA: Enabling Efficient Compilation of CUDA Kernels onto FPGAs
locality directly using the stream processing programming model.
PeakStream unveils multicore and CPU/GPU programming solution
This is actually oversimplified. It assumes the instruction
Programming paradigm for parallel processing of data streams
Kapasi, Dally, Rixner, Khailany, Owens, Ahn and Mattson,
(Streams and Iteration in a Single Assignment Language).
Eclipse Streamsheets - spreadsheet for stream processing
' Jacket, a commercialization of a GPU engine for MATLAB
project, was announced during a feature presentation at
could be considered stream processing in a broad sense.
WaveScript functional stream processing, also from MIT.
Auto-Pipe, from the Stream Based Supercomputing Lab at
array-of-structures (AoS) and structure-of-arrays (SoA)
During the 1980s stream processing was explored within
"Merrimac - Stanford Streaming Supercomputer Project"
, a commercialization of the Imagine work at Stanford
and provided too little flexibility for general use.
Khailany, Dally, Rixner, Kapasi, Owens and Towles:
game engine for PlayStation 3, Xbox360, Wii, and PC
Pontifical Catholic University of Rio Grande do Sul
the number of instructions which can be executed.
"Stanford Real-Time Programmable Shading Project"
"Stream processing in General-Purpose Processors"
"Exploring VLSI Scalability of Stream Processors"
, Stanford University and Stream Processors, Inc.
Brook+ (AMD hardware optimized implementation of
TStreams, Hewlett-Packard Cambridge Research Lab
Floodgate, a stream processor provided with the
A drawback of SIMD programming was the issue of
Datastreams - Data streaming analytics platform
, a "directive" vision of Many-Core programming
Parallel SIMD paradigm, packed registers (SWAR)
also researched stream-enhanced processors as
for these systems includes components such as
Examples of records within streams include:
(Compute Unified Device Architecture) from
Models of computation for stream processing
"The Imagine - Image and Signal Processor"
Stream programming libraries and languages
works. Although this is what happens with
. Stream processing systems aim to expose
TStreams: How to Write a Parallel Program
TStreams: A Model of Parallel Computation
brand name for product line targeting HPC
brand name for product line targeting HPC
have only a single 64-bit wide data bus.
ACOTES programming model: language from
(SoA), as shown below, can allow this.
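A sketch of the structure-of-arrays layout for the same particle attributes (field names illustrative; `std::vector` stands in for whatever contiguous storage the application uses):

```cpp
#include <cstdint>
#include <vector>

// Structure of arrays: one contiguous array per attribute, so a kernel that
// touches only positions streams through memory without skipping other fields.
struct particles_t {
    std::vector<float> x, y, z;        // positions, SIMD-friendly
    std::vector<std::uint8_t> r, g, b; // one array per colour channel
    std::vector<float> size;
};
```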
simple events: a wedding is happening.
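A minimal C++ sketch of such composite-event inference (event names are hypothetical; a real engine would also apply a time window rather than scan a complete batch):

```cpp
#include <set>
#include <string>
#include <vector>

// A composite event is inferred once every constituent simple event
// has been observed in the incoming flow.
bool wedding_detected(const std::vector<std::string>& events) {
    const std::set<std::string> required = {
        "church_bells", "man_in_tuxedo", "woman_in_white_gown", "rice_flying"
    };
    std::set<std::string> seen;
    for (const std::string& e : events)
        if (required.count(e))
            seen.insert(e);       // accumulate matching simple events
    return seen == required;      // all present => composite event
}
```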
Continuous operator stream processing
, a commercial spin-off of Stanford's
Comparison to prior parallel paradigms
IEEE Journal of Solid-State Circuits:
S-Net coordination language from the
7:
"GitHub - walmartlabs/Mupd8: Muppet"
, which enables linking independent
Parallel stream paradigm (SIMD/MIMD)
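A sketch of the stream version of the two-array addition, in the same fictional style as the document's other examples (syntax purely illustrative):

```
// This is a fictional language for demonstration purposes.
elements = stream_gather(source0, source1);   // pack both streams together
kernel   = instantiate_kernel("@arg0 + @arg1"); // per-element operation
result   = kernel.invoke(elements);           // runtime parallelizes freely
```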
Vendor-specific languages include:
Polytechnic University of Catalonia
A SHORT INTRO TO STREAM PROCESSING
Microsoft Azure - Stream analytics
Washington University in St. Louis
Washington University in St. Louis
effectively either require a tiny
numElements * componentsPerElement
for efficient implementation. The
Université du Québec à Chicoutimi
models and process-based models.
Conventional, sequential paradigm
"Programmable Stream Processors"
Partitioned global address space
Shallows, an open source project
. Stream processing encompasses
, Stanford and Rice University.
Functional reactive programming
Technical University of Munich
Generic processor architecture
field-programmable gate arrays
; and hardware components for
, for expressing computation;
Data Stream Management System
Amazon Web Services - Kinesis
PeakStream, a spinout of the
Imagine, headed by Professor
(i.e. 400 numbers in total).
. An example is the language
, for example, to initiate a
for data streams and rely on
distributed stream processing
Real Time Streaming Protocol
Stream processing services:
University of Hertfordshire
Sony Computer Entertainment
Hardware-in-the-loop issues
), a series of operations (
. IEEE. pp. 138–145.
Eventador SQLStreamBuilder
Gummaraju and Rosenblum,
Molecular modeling on GPU
WSO2 stream processor by
, a commercialization of
graphics processing units
digital signal processing
graphics processing units
stream management systems
million bits per second.
Research group web site
IBM streaming analytics
Google Cloud - Dataflow
Event-Based Processing
Programming model notes
, for distribution and
event stream processing
10.1109/CSE.2018.00026
, Stanford University.
Flow-based programming
Stream Processors, Inc
University of Waterloo
multifunction printers
Stream Processors, Inc
instruction intrinsics
arithmetic logic units
data stream processing
Hardware acceleration
project (acquired by
on December 18, 2013
Throughput Computing
University of Denver
Brook language from
Stream Register File
dataflow programming
direct memory access
floating-point units
streaming algorithms
reactive programming
dataflow programming
programming paradigm
(Technical report).
(Technical report).
Streaming algorithm
Real-time computing
Dimension reduction
Toshiba Corporation
Stanford University
short stream effect
structure of arrays
array of structures
Stanford University
parallel processing
Parallel computing
Data stream mining
DUP language from
CAL Actor Language
video surveillance
video conferencing
programming models
978-1-5386-7649-3
Walmartlabs Mupd8
stream processing
library from the
shading languages
, an alliance of
market including
Texas Instruments
in the Commodore
1583:
1582:
1575:
1480:indirection chain
1259:"Hello World
865:layering of tasks
836:
835:
737:"@arg0"
189:Compute intensity
140:Uniform streaming
36:stream processing
16:(Redirected from
3294:
3248:
3247:
3222:Software lockout
3021:Computer cluster
2956:Vector processor
2911:Array processing
2896:Flynn's taxonomy
2803:Memory coherence
2578:Computer network
2517:
2510:
2503:
2494:
2485:
2484:
2471:
2465:
2464:
2455:
2449:
2448:
2439:
2433:
2428:
2422:
2421:
2401:
2385:
2379:
2374:
2368:
2363:
2357:
2356:
2354:
2352:
2343:. Archived from
2333:
2327:
2326:
2324:
2322:
2308:
2302:
2301:
2299:
2297:
2282:
2276:
2269:
2263:
2256:
2250:
2243:
2237:
2230:
2224:
2219:
2213:
2208:
2192:Vector processor
1769:memory footprint
1578:
1571:
1567:
1564:
1558:
1534:
1533:
1526:
1434:
1431:
1428:
1425:
1422:
1419:
1416:
1413:
1410:
1407:
1404:
1401:
1398:
1395:
1392:
1389:
1386:
1383:
1380:
1377:
1374:
1371:
1368:
1365:
1362:
1359:
1356:
1353:
1350:
1347:
1344:
1341:
1338:
1335:
1332:
1329:
1326:
1323:
1320:
1317:
1314:
1311:
1308:
1305:
1302:
1299:
1296:
1293:
1290:
1287:
1284:
1281:
1278:
1275:
1272:
1269:
1266:
1263:
1260:
1257:
1254:
1251:
1248:
1245:
1242:
1239:
1236:
1233:
1230:
1227:
1224:
1221:
1218:
1215:
1212:
1209:
1206:
1203:
1200:
1197:
1194:
1191:
1188:
1185:
1182:
1179:
1176:
1173:
1170:
1167:
1164:
1161:
1158:
1155:
1152:
1149:
1146:
1143:
1140:
1137:
1134:
1131:
1128:
1125:
1122:
1119:
1116:
1113:
1110:
1107:
1104:
1101:
1062:
1059:
1056:
1053:
1050:
1047:
1044:
1041:
1038:
1035:
1032:
1029:
1026:
1023:
1020:
1017:
1014:
1011:
1008:
1005:
1002:
999:
996:
993:
990:
987:
984:
981:
978:
948:
945:
942:
939:
936:
933:
930:
927:
924:
921:
918:
915:
912:
909:
906:
903:
900:
897:
894:
891:
888:
885:
831:
828:
822:
802:
801:
794:
765:
762:
759:
756:
753:
750:
747:
744:
741:
738:
735:
732:
729:
726:
723:
720:
717:
714:
711:
708:
705:
666:
659:
656:
653:
650:
647:
644:
641:
638:
635:
632:
629:
626:
623:
620:
617:
614:
611:
608:
605:
602:
599:
596:
593:
578:
575:
572:
569:
566:
563:
560:
557:
554:
551:
548:
545:
542:
539:
536:
533:
530:
527:
524:
521:
469:
466:
463:
460:
457:
454:
451:
448:
445:
442:
439:
436:
435:"gown"
433:
430:
427:
424:
421:
418:
415:
412:
409:
406:
403:
400:
397:
394:
391:
388:
385:
382:
379:
368:
365:
362:
359:
356:
353:
350:
347:
344:
341:
338:
335:
332:
329:
326:
323:
320:
317:
314:
311:
308:
305:
302:
299:
296:
293:
290:
287:
284:
281:
278:
275:
272:
269:
266:
263:
260:
257:
254:
251:
248:
245:
195:Data parallelism
131:kernel functions
32:computer science
21:
3302:
3301:
3297:
3296:
3295:
3293:
3292:
3291:
3262:
3261:
3260:
3255:
3236:
3180:
3086:Coarray Fortran
3042:
3026:Beowulf cluster
2882:
2832:
2823:Synchronization
2808:Cache coherence
2798:Multiprocessing
2786:
2750:
2731:Cost efficiency
2726:Gustafson's law
2694:
2638:
2587:
2563:Multiprocessing
2553:Cloud computing
2526:
2521:
2490:
2488:
2473:
2472:
2468:
2457:
2456:
2452:
2441:
2440:
2436:
2429:
2425:
2418:
2387:
2386:
2382:
2375:
2371:
2364:
2360:
2350:
2348:
2335:
2334:
2330:
2320:
2318:
2310:
2309:
2305:
2295:
2293:
2284:
2283:
2279:
2270:
2266:
2257:
2253:
2244:
2240:
2231:
2227:
2220:
2216:
2209:
2205:
2201:
2196:
2127:
1968:in August 2009)
1779:
1608:
1579:
1568:
1562:
1559:
1548:
1535:
1531:
1524:
1453:
1441:
1436:
1435:
1432:
1429:
1426:
1423:
1420:
1417:
1414:
1411:
1408:
1405:
1402:
1399:
1396:
1393:
1390:
1387:
1384:
1381:
1378:
1375:
1372:
1369:
1366:
1363:
1360:
1357:
1354:
1351:
1348:
1345:
1342:
1339:
1336:
1333:
1330:
1327:
1324:
1321:
1318:
1315:
1312:
1309:
1306:
1303:
1300:
1297:
1294:
1291:
1288:
1285:
1282:
1279:
1276:
1273:
1270:
1267:
1264:
1261:
1258:
1255:
1252:
1249:
1246:
1243:
1240:
1237:
1234:
1231:
1228:
1225:
1222:
1219:
1216:
1213:
1210:
1207:
1204:
1201:
1198:
1195:
1192:
1189:
1186:
1183:
1180:
1177:
1174:
1171:
1168:
1165:
1162:
1159:
1156:
1153:
1150:
1147:
1144:
1141:
1138:
1135:
1132:
1129:
1126:
1123:
1120:
1117:
1115:<cstdlib>
1114:
1111:
1108:
1105:
1102:
1099:
1093:compute kernels
1064:
1063:
1060:
1057:
1054:
1051:
1048:
1045:
1042:
1039:
1036:
1033:
1030:
1027:
1024:
1021:
1018:
1015:
1012:
1009:
1006:
1003:
1000:
997:
994:
991:
988:
985:
982:
979:
976:
950:
949:
946:
943:
940:
937:
934:
931:
928:
925:
922:
919:
916:
913:
910:
907:
904:
901:
898:
895:
892:
889:
886:
883:
856:
832:
826:
823:
812:
803:
799:
792:
767:
766:
763:
760:
757:
754:
751:
748:
745:
742:
739:
736:
733:
730:
727:
724:
721:
718:
715:
712:
709:
706:
703:
700:
664:
661:
660:
657:
654:
651:
648:
645:
642:
639:
636:
633:
630:
627:
624:
621:
618:
615:
612:
609:
606:
603:
600:
597:
594:
591:
588:
580:
579:
576:
573:
570:
567:
564:
561:
558:
555:
552:
549:
546:
543:
540:
537:
534:
531:
528:
525:
522:
519:
516:
476:
471:
470:
467:
464:
461:
458:
455:
452:
449:
446:
443:
440:
437:
434:
431:
428:
425:
422:
419:
416:
413:
410:
407:
404:
401:
398:
395:
393:"man"
392:
389:
386:
383:
380:
377:
370:
369:
366:
363:
360:
357:
354:
351:
348:
345:
342:
339:
336:
333:
330:
327:
324:
321:
318:
315:
312:
309:
306:
303:
300:
297:
294:
291:
288:
285:
282:
279:
276:
273:
270:
267:
264:
261:
258:
255:
252:
249:
246:
243:
232:
171:
95:query languages
75:data processing
38:(also known as
28:
23:
22:
15:
12:
11:
5:
3300:
3298:
3290:
3289:
3284:
3279:
3274:
3264:
3263:
3257:
3256:
3254:
3253:
3241:
3238:
3237:
3235:
3234:
3229:
3224:
3219:
3217:Race condition
3214:
3209:
3204:
3199:
3194:
3188:
3186:
3182:
3181:
3179:
3178:
3173:
3168:
3163:
3158:
3153:
3148:
3143:
3138:
3133:
3128:
3123:
3118:
3113:
3108:
3103:
3098:
3093:
3088:
3083:
3078:
3073:
3068:
3063:
3058:
3052:
3050:
3044:
3043:
3041:
3040:
3035:
3030:
3029:
3028:
3018:
3012:
3011:
3010:
3005:
3000:
2995:
2990:
2985:
2975:
2974:
2973:
2968:
2961:Multiprocessor
2958:
2953:
2948:
2943:
2938:
2937:
2936:
2931:
2926:
2925:
2924:
2919:
2914:
2903:
2892:
2890:
2884:
2883:
2881:
2880:
2875:
2874:
2873:
2868:
2863:
2853:
2848:
2842:
2840:
2834:
2833:
2831:
2830:
2825:
2820:
2815:
2810:
2805:
2800:
2794:
2792:
2788:
2787:
2785:
2784:
2779:
2774:
2769:
2764:
2758:
2756:
2752:
2751:
2749:
2748:
2743:
2738:
2733:
2728:
2723:
2718:
2713:
2708:
2702:
2700:
2696:
2695:
2693:
2692:
2690:Hardware scout
2687:
2681:
2676:
2671:
2665:
2660:
2654:
2648:
2646:
2644:Multithreading
2640:
2639:
2637:
2636:
2631:
2626:
2621:
2616:
2611:
2606:
2601:
2595:
2593:
2589:
2588:
2586:
2585:
2583:Systolic array
2580:
2575:
2570:
2565:
2560:
2555:
2550:
2545:
2540:
2534:
2532:
2528:
2527:
2522:
2520:
2519:
2512:
2505:
2497:
2487:
2486:
2466:
2450:
2434:
2423:
2416:
2380:
2369:
2358:
2341:Group web site
2328:
2316:Group web site
2303:
2277:
2264:
2251:
2238:
2225:
2214:
2202:
2200:
2197:
2195:
2194:
2189:
2184:
2179:
2174:
2169:
2164:
2159:
2154:
2149:
2144:
2139:
2134:
2128:
2126:
2123:
2122:
2121:
2118:
2117:
2116:
2110:
2107:
2104:
2101:
2094:
2093:
2090:
2087:
2078:
2077:
2072:
2067:
2062:
2053:
2052:
2047:
2041:
2038:
2017:
2016:
2009:
2000:
1991:
1973:
1972:
1969:
1955:
1952:
1941:
1935:
1928:
1925:
1919:
1909:
1908:
1902:
1899:
1893:
1889:StreamIt from
1887:
1880:
1877:
1868:
1862:
1856:
1853:
1843:
1839:
1833:
1827:
1821:
1811:
1800:
1778:
1775:
1774:
1773:
1740:Cell processor
1736:
1730:
1727:AMD FireStream
1724:
1723:
1722:
1719:
1716:
1712:
1691:
1652:
1644:
1621:
1607:
1604:
1581:
1580:
1538:
1536:
1529:
1523:
1520:
1452:
1449:
1440:
1437:
1121:<string>
1109:<raftio>
1098:
975:
882:
855:
852:
834:
833:
806:
804:
797:
791:
788:
702:
699:
696:
590:
587:
584:
518:
515:
512:
475:
472:
376:
242:
231:
228:
223:
222:
219:
216:
213:
206:
205:
198:
192:
170:
167:
87:software stack
26:
24:
14:
13:
10:
9:
6:
4:
3:
2:
3299:
3288:
3285:
3283:
3280:
3278:
3275:
3273:
3270:
3269:
3267:
3252:
3243:
3242:
3239:
3233:
3230:
3228:
3225:
3223:
3220:
3218:
3215:
3213:
3210:
3208:
3205:
3203:
3200:
3198:
3195:
3193:
3190:
3189:
3187:
3183:
3177:
3174:
3172:
3169:
3167:
3164:
3162:
3159:
3157:
3154:
3152:
3149:
3147:
3144:
3142:
3139:
3137:
3134:
3132:
3129:
3127:
3124:
3122:
3119:
3117:
3114:
3112:
3109:
3107:
3106:Global Arrays
3104:
3102:
3099:
3097:
3094:
3092:
3089:
3087:
3084:
3082:
3079:
3077:
3074:
3072:
3069:
3067:
3064:
3062:
3059:
3057:
3054:
3053:
3051:
3049:
3045:
3039:
3036:
3034:
3033:Grid computer
3031:
3027:
3024:
3023:
3022:
3019:
3016:
3013:
3009:
3006:
3004:
3001:
2999:
2996:
2994:
2991:
2989:
2986:
2984:
2981:
2980:
2979:
2976:
2972:
2969:
2967:
2964:
2963:
2962:
2959:
2957:
2954:
2952:
2949:
2947:
2944:
2942:
2939:
2935:
2932:
2930:
2927:
2923:
2920:
2918:
2915:
2912:
2909:
2908:
2907:
2904:
2902:
2899:
2898:
2897:
2894:
2893:
2891:
2889:
2885:
2879:
2876:
2872:
2869:
2867:
2864:
2862:
2859:
2858:
2857:
2854:
2852:
2849:
2847:
2844:
2843:
2841:
2839:
2835:
2829:
2826:
2824:
2821:
2819:
2816:
2814:
2811:
2809:
2806:
2804:
2801:
2799:
2796:
2795:
2793:
2789:
2783:
2780:
2778:
2775:
2773:
2770:
2768:
2765:
2763:
2760:
2759:
2757:
2753:
2747:
2744:
2742:
2739:
2737:
2734:
2732:
2729:
2727:
2724:
2722:
2719:
2717:
2714:
2712:
2709:
2707:
2704:
2703:
2701:
2697:
2691:
2688:
2685:
2682:
2680:
2677:
2675:
2672:
2669:
2666:
2664:
2661:
2658:
2655:
2653:
2650:
2649:
2647:
2645:
2641:
2635:
2632:
2630:
2627:
2625:
2622:
2620:
2617:
2615:
2612:
2610:
2607:
2605:
2602:
2600:
2597:
2596:
2594:
2590:
2584:
2581:
2579:
2576:
2574:
2571:
2569:
2566:
2564:
2561:
2559:
2556:
2554:
2551:
2549:
2546:
2544:
2541:
2539:
2536:
2535:
2533:
2529:
2525:
2518:
2513:
2511:
2506:
2504:
2499:
2498:
2495:
2491:
2482:
2481:
2476:
2470:
2467:
2462:
2461:
2454:
2451:
2446:
2445:
2438:
2435:
2432:
2427:
2424:
2419:
2413:
2409:
2405:
2400:
2395:
2391:
2384:
2381:
2378:
2373:
2370:
2367:
2362:
2359:
2346:
2342:
2338:
2332:
2329:
2317:
2313:
2307:
2304:
2292:
2288:
2281:
2278:
2274:
2268:
2265:
2261:
2255:
2252:
2248:
2242:
2239:
2235:
2229:
2226:
2223:
2218:
2215:
2212:
2207:
2204:
2198:
2193:
2190:
2188:
2185:
2183:
2180:
2178:
2175:
2173:
2170:
2168:
2165:
2163:
2160:
2158:
2155:
2153:
2150:
2148:
2145:
2143:
2140:
2138:
2135:
2133:
2130:
2129:
2124:
2119:
2114:
2113:
2111:
2108:
2105:
2102:
2099:
2098:
2097:
2091:
2088:
2086:
2083:
2082:
2081:
2076:
2073:
2071:
2068:
2066:
2063:
2061:
2058:
2057:
2056:
2051:
2048:
2046:
2042:
2039:
2037:
2033:
2029:
2028:complex event
2026:- a combined
2025:
2022:
2021:
2020:
2014:
2011:StreamC from
2010:
2008:
2004:
2001:
1999:
1995:
1992:
1990:
1986:
1982:
1978:
1977:
1976:
1970:
1967:
1964:(acquired by
1963:
1959:
1956:
1953:
1951:in June 2007)
1950:
1946:
1942:
1939:
1936:
1933:
1929:
1926:
1923:
1920:
1917:
1914:
1913:
1912:
1906:
1903:
1900:
1898:
1894:
1892:
1888:
1885:
1881:
1878:
1876:
1872:
1869:
1867:
1863:
1861:
1857:
1854:
1852:
1848:
1844:
1840:
1837:
1834:
1832:
1828:
1826:
1822:
1820:
1816:
1812:
1809:
1805:
1801:
1799:
1795:
1792:
1791:
1790:
1787:
1785:
1776:
1770:
1765:
1761:
1757:
1753:
1749:
1745:
1741:
1737:
1734:
1731:
1728:
1725:
1720:
1717:
1713:
1710:
1706:
1705:
1703:
1699:
1695:
1692:
1689:
1685:
1681:
1677:
1673:
1669:
1665:
1661:
1657:
1653:
1649:
1645:
1642:
1638:
1634:
1630:
1626:
1625:William Dally
1622:
1618:
1614:
1610:
1609:
1605:
1603:
1600:
1596:
1594:
1589:
1577:
1574:
1566:
1556:
1555:the talk page
1552:
1546:
1544:
1539:This section
1537:
1528:
1527:
1521:
1519:
1516:
1513:
1509:
1507:
1503:
1498:
1495:
1493:
1489:
1483:
1481:
1476:
1474:
1470:
1469:Intel Pentium
1464:
1460:
1457:
1450:
1448:
1446:
1438:
1205:"0"
1096:
1094:
1090:
1084:
1082:
1078:
1074:
1068:
973:
971:
970:
965:
961:
960:
955:
880:
878:
873:
871:
866:
862:
853:
851:
849:
845:
840:
830:
827:February 2023
820:
816:
810:
807:This section
805:
796:
795:
789:
787:
783:
779:
775:
771:
716:streamElement
697:
695:
693:
689:
683:
681:
677:
672:
670:
585:
583:
513:
511:
509:
504:
500:
499:parallelism.
498:
494:
490:
485:
481:
473:
374:
240:
237:
230:Code examples
229:
227:
220:
217:
214:
211:
210:
209:
202:
201:Data locality
199:
196:
193:
190:
187:
186:
185:
182:
180:
176:
168:
166:
164:
160:
155:
153:
149:
145:
144:scoreboarding
141:
137:
133:
132:
127:
122:
120:
116:
112:
108:
104:
100:
96:
92:
88:
84:
80:
76:
73:
69:
65:
61:
57:
53:
49:
45:
41:
37:
33:
19:
2845:
2791:Coordination
2721:Amdahl's law
2657:Simultaneous
2489:
2478:
2469:
2459:
2453:
2443:
2437:
2426:
2389:
2383:
2372:
2361:
2349:. Retrieved
2345:the original
2340:
2331:
2319:. Retrieved
2315:
2306:
2294:. Retrieved
2290:
2280:
2267:
2254:
2241:
2228:
2217:
2206:
2112:IBM streams
2095:
2085:Apache Flink
2079:
2075:Apache Spark
2065:Apache Storm
2060:Apache Kafka
2054:
2018:
1974:
1910:
1895:Siddhi from
1788:
1780:
1743:
1733:Nvidia Tesla
1715:simulation).
1686:and digital
1663:
1658:family from
1655:
1597:
1592:
1584:
1569:
1563:January 2008
1560:
1549:Please help
1540:
1517:
1514:
1510:
1501:
1499:
1496:
1484:
1479:
1477:
1465:
1461:
1458:
1454:
1442:
1427:EXIT_SUCCESS
1103:<raft>
1085:
1080:
1076:
1072:
1069:
1065:
967:
957:
951:
874:
864:
857:
837:
824:
813:Please help
808:
784:
780:
776:
772:
768:
731:streamKernel
684:
679:
675:
673:
662:
581:
505:
501:
477:
371:
233:
224:
207:
200:
194:
188:
183:
172:
169:Applications
156:
139:
129:
125:
123:
107:acceleration
54:which views
47:
43:
39:
35:
29:
3227:Scalability
2988:distributed
2871:Concurrency
2838:Programming
2679:Cooperative
2668:Speculative
2604:Instruction
2285:Eric Chan.
2070:Apache Apex
2050:Apache NiFi
2036:Software AG
1916:AccelerEyes
1651:technology.
1588:PCI Express
680:numElements
450:Rice_Flying
444:Church_Bell
331:'1'
72:distributed
60:computation
3266:Categories
3232:Starvation
2971:asymmetric
2706:PRAM model
2674:Preemptive
2399:1809.09387
2199:References
2034:engine by
1690:equipment.
1599:Pipelining
1545:to readers
1043:colorGreen
980:particle_t
890:particle_t
817:by adding
665:vector_sum
637:vector_sum
247:DataStream
109:including
103:scheduling
2966:symmetric
2711:PEM model
1958:RapidMind
1817:based on
1772:software.
1492:DDR SDRAM
1473:Athlon 64
1034:colorBlue
337:FOLLOWING
256:TimeStamp
179:processor
136:pipelined
3197:Deadlock
3185:Problems
3151:pthreads
3131:OpenHMPP
3056:Ateji PX
3017:computer
2888:Hardware
2755:Elements
2741:Slowdown
2652:Temporal
2634:Pipeline
2377:Merrimac
2351:March 9,
2321:March 9,
2296:March 9,
2125:See also
2040:Wallaroo
2005:- C for
2003:Intel Ct
1938:OpenHMPP
1932:Gamebryo
1922:Ateji PX
1831:Stanford
1794:Ateji PX
1648:Stanford
1646:Another
1606:Examples
1471:to some
1445:dataflow
1400:>>
1118:#include
1112:#include
1106:#include
1100:#include
1081:scatters
1025:colorRed
1016:unsigned
920:unsigned
844:AT&T
790:Research
761:elements
728:instance
707:elements
414:FOLLOWED
328:INTERVAL
3156:RaftLib
3136:OpenACC
3111:GPUOpen
3101:C++ AMP
3076:Charm++
2818:Barrier
2762:Process
2746:Speedup
2531:General
2366:Imagine
1983:) from
1760:PowerPC
1664:Imagine
1656:Storm-1
1613:Blitter
1541:may be
1223:kstatus
1214:virtual
1184:addPort
1089:RaftLib
964:aligned
688:AltiVec
655:source1
649:source0
574:source1
568:source0
508:vectors
468:Wedding
429:Clothes
405:Clothes
364:orderId
352:orderId
268:orderId
56:streams
50:) is a
3249:
3126:OpenCL
3121:OpenMP
3066:Chapel
2983:shared
2978:Memory
2913:(SIMT)
2856:Models
2767:Thread
2699:Theory
2670:(SpMT)
2624:Memory
2609:Thread
2592:Levels
2480:GitHub
2414:
1998:Nvidia
1949:Google
1819:OpenMP
1754:, and
1702:Nvidia
1488:Rambus
1424:return
1346:string
1271:return
1265:"
1253:string
1235:output
1196:string
1178:output
1169:kernel
1148:public
1142:kernel
1133:public
1077:gather
1073:stride
977:struct
956:in an
954:arrays
887:struct
755:invoke
749:kernel
743:result
722:kernel
643:result
562:result
465:ACTION
456:WITHIN
432:EQUALS
423:Person
408:EQUALS
399:Person
390:EQUALS
387:Gender
381:Person
358:Trades
346:Orders
334:SECOND
316:Trades
310:Orders
304:amount
292:amount
286:Orders
280:ticker
274:Orders
262:Orders
250:Orders
244:SELECT
126:stream
117:, and
70:, and
3287:GPGPU
3096:Dryad
3061:Boost
2782:Array
2772:Fiber
2686:(CMT)
2659:(SMT)
2573:GPGPU
2394:arXiv
2024:Apama
1981:Brook
1966:Intel
1945:Brook
1842:etc..
1808:GPGPU
1742:from
1668:ISSCC
1637:Intel
1633:DARPA
1617:Amiga
1397:hello
1364:hello
1334:print
1124:class
1049:float
986:float
935:float
926:color
896:float
713:array
462:hours
325:RANGE
298:Trade
163:SISAL
46:, or
3161:ROCm
3091:CUDA
3081:Cilk
3048:APIs
3008:COMA
3003:NUMA
2934:MIMD
2929:MISD
2906:SIMD
2901:SISD
2629:Loop
2619:Data
2614:Task
2412:ISBN
2353:2017
2323:2017
2298:2017
2182:SIMT
2045:WSO2
2030:and
1994:CUDA
1897:WSO2
1849:and
1764:MIMD
1738:The
1700:and
1694:GPUs
1672:TSMC
1654:The
1639:and
1611:The
1506:ALUs
1490:and
1373:raft
1349:>
1337:<
1328:raft
1316:argv
1310:char
1304:argc
1295:main
1280:stop
1274:raft
1241:push
1217:raft
1199:>
1187:<
1163:raft
1136:raft
1055:size
1019:byte
938:size
923:byte
870:Cell
690:and
616:<
544:<
497:MIMD
493:SWAR
489:SIMD
484:SISD
482:are
480:CPUs
378:WHEN
319:OVER
313:JOIN
307:FROM
93:and
3176:ZPL
3171:TBB
3166:UPC
3146:PVM
3116:MPI
3071:HPX
2998:UMA
2599:Bit
2404:doi
1989:ATI
1985:AMD
1891:MIT
1798:JVM
1756:IBM
1744:STI
1709:API
1698:AMD
1676:DSP
1627:of
1421:();
1418:exe
1379:map
1340:std
1301:int
1292:int
1268:));
1247:std
1226:run
1190:std
861:DMA
694:).
692:SSE
678:to
619:100
598:int
592:for
547:400
526:int
520:for
438:AND
396:AND
236:SQL
30:In
3268::
2477:.
2410:.
2402:.
2339:.
2314:.
2289:.
1962:Sh
1871:Sh
1750:,
1682:,
1635:,
1595:.
1394:+=
1376:::
1361:hi
1343:::
1331:::
1313:**
1289:};
1277:::
1262:\n
1250:::
1229:()
1220:::
1208:);
1193:::
1172:()
1166:::
1157:()
1154:hi
1139:::
1127:hi
1061:};
947:};
719:()
658:);
628:++
625:el
613:el
601:el
556:++
447:OR
420:BY
343:ON
154:.
121:.
113:,
66:,
42:,
34:,
2516:e
2509:t
2502:v
2483:.
2420:.
2406::
2396::
2355:.
2325:.
2300:.
1987:/
1643:.
1576:)
1570:(
1565:)
1561:(
1557:.
1547:.
1433:}
1430:;
1415:.
1412:m
1406:;
1403:p
1391:m
1385:;
1382:m
1367:;
1355:;
1352:p
1322:{
1319:)
1307:,
1298:(
1286:}
1283:;
1256:(
1244:(
1238:.
1232:{
1211:}
1202:(
1181:.
1175:{
1160::
1151::
1145:{
1130::
1058:;
1052:*
1046:;
1040:*
1037:,
1031:*
1028:,
1022:*
1013:;
1010:z
1007:*
1004:,
1001:y
998:*
995:,
992:x
989:*
983:{
941:;
929:;
914:;
911:z
908:,
905:y
902:,
899:x
893:{
829:)
825:(
821:.
811:.
764:)
758:(
752:.
746:=
740:)
734:(
725:=
710:=
652:,
646:,
640:(
631:)
622:;
610:;
607:0
604:=
595:(
577:;
571:+
565:=
559:)
553:i
550:;
541:i
538:;
535:0
532:=
529:i
523:(
459:2
453:)
441:(
426:.
417:-
402:.
384:.
367:;
361:.
355:=
349:.
340:)
322:(
301:.
295:,
289:.
283:,
277:.
271:,
265:.
259:,
253:.
20:)