, the SRF is shared between all the various ALU clusters. The key innovation demonstrated with Stanford's Imagine chip is that the compiler automates and allocates memory in an optimal way, fully transparent to the programmer. The dependencies between kernel functions and data are known through the programming model, which enables the compiler to perform flow analysis and optimally pack the SRFs. Commonly, this cache and DMA management can take up the majority of a project's schedule, something the stream processor (or at least Imagine) fully automates. Tests done at Stanford showed that the compiler scheduled memory as well as or better than laborious hand tuning.
(AoS) topology. This means that if some algorithm is applied to the location of each particle in turn, it must skip over the memory locations containing the other attributes. If these attributes are not needed, this results in wasteful usage of the CPU cache. Additionally, a SIMD instruction will typically expect the data it operates on to be contiguous in memory; the elements may also need to be
machine. In the native programming model, all DMA and program scheduling is left up to the programmer. The hardware provides a fast ring bus among the processors for local communication. Because the local memory for instructions and data is limited, the only programs that can exploit this architecture
More modern stream processing frameworks provide a FIFO-like interface to structure data as a literal stream. This abstraction provides a means to specify data dependencies implicitly while enabling the runtime/hardware to take full advantage of that knowledge for efficient computation. One of the
In this paradigm, the whole dataset is defined, rather than each component block being defined separately. The description of the set of data is assumed to be in the first two rows. After that, the result is inferred from the sources and the kernel. For simplicity, there is a 1:1 mapping between input and output
is a specific type of temporal locality common in signal and media processing applications where data is produced once, read once or twice later in the application, and never read again. Intermediate streams passed between kernels as well as intermediate data within kernel functions can capture this
The stream processor is usually equipped with a fast, efficient, proprietary memory bus (crossbar switches are now common; multi-buses have been employed in the past). The exact number of memory lanes depends on the market range. As this is written, there are still 64-bit wide interconnections
Although there are various degrees of flexibility allowed by the model, stream processors usually impose some limitations on the kernel or stream size. For example, consumer hardware often lacks the ability to perform high-precision math, lacks complex indirection chains or presents lower limits on
Another sample code fragment detects weddings among a flow of external "events" such as church bells ringing, the appearance of a man in a tuxedo or morning suit, a woman in a flowing white gown and rice flying through the air. A "complex" or "composite" event is what one infers from the individual
Instead of holding the data in the structure, it holds only pointers (memory locations) to the data. A shortcoming is that if multiple attributes of an object are to be operated on, they might now be distant in memory and so result in a cache miss. The aligning and any needed padding lead to
based, which means they conceptually perform only one operation at a time. As the computing needs of the world evolved, the amount of data to be managed increased very quickly. It was obvious that the sequential programming model could not cope with the increased need for processing power. Various
improved this with full-duplex communications, getting a GPU (and possibly a generic stream processor) to work can still take a long time. This means it is usually counter-productive to use them for small datasets. Because changing the kernel is a rather expensive operation, the stream
Historically, CPUs began implementing various tiers of memory access optimizations because of their ever-increasing performance relative to the slowly growing external memory bandwidth. As this gap widened, large amounts of die area were dedicated to hiding memory latencies. Since fetching
is an early (circa 1985) graphics processor capable of combining three source streams of 16 component bit vectors in 256 ways to produce an output stream consisting of 16 component bit vectors. Total input stream bandwidth is up to 42 million bits per second. Output stream bandwidth is up to 28
query (a query that executes forever processing arriving data based on timestamps and window duration). This code fragment illustrates a JOIN of two data streams, one for stock orders, and one for the resulting stock trades. The query outputs a stream of all Orders matched by a Trade within one
is a very widespread and heavily used practice on stream processors, with GPUs featuring pipelines exceeding 200 stages. The cost for switching settings is dependent on the setting being modified but it is now considered to always be expensive. To avoid those problems at various levels of the
For stream processors, the usage of structures is encouraged. From an application point of view, all the attributes can be defined with some flexibility. Taking GPUs as reference, there is a set of attributes (at least 16) available. For each attribute, the application can state the number of
While stream processing is a branch of SIMD/MIMD processing, they must not be confused. Although SIMD implementations can often work in a "streaming" manner, their performance is not comparable: the model envisions a very different usage pattern which allows far greater performance by itself.
The most immediate challenge in the realm of parallel processing does not lie as much in the type of hardware architecture used, but in how easy it will be to program the system in question in a real-world environment with acceptable performance. Machines like
Imagine use a straightforward
It has been noted that when applied on generic processors such as a standard CPU, only a 1.5x speedup can be reached. By contrast, ad-hoc stream processors easily reach over 10x performance, mainly attributed to more efficient memory access and higher levels of parallel processing.
R2xx/NV2x: kernel stream operations became explicitly under the programmer's control but only for vertex processing (fragments were still using old paradigms). No branching support severely hampered flexibility but some types of algorithms could be run (notably, low-precision fluid
Although those two paradigms were efficient, real-world implementations were plagued with limitations from memory alignment problems to synchronization issues and limited parallelism. Only few SIMD processors survived as stand-alone components; most were embedded in standard CPUs.
around (entry-level). Most mid-range models use a fast 128-bit crossbar switch matrix (4 or 2 segments), while high-end models deploy huge amounts of memory (actually up to 512 MB) with a slightly slower crossbar that is 256 bits wide. By contrast, standard processors from
Clusters can be numerous because inter-cluster communication is assumed to be rare. Internally, however, each cluster can efficiently exploit only a much smaller number of ALUs, because intra-cluster communication is common and thus needs to be highly efficient.
Cal2Many, a code generation framework from
Halmstad University, Sweden. It takes CAL code as input and generates different target-specific languages, including sequential C, Chisel, parallel C targeting the Epiphany architecture, and ajava & astruct targeting the Ambric architecture.
(DMA) when dependencies become known. The elimination of manual DMA management reduces software complexity, and the associated elimination of hardware-cached I/O reduces the die area that must be devoted to servicing by specialized computational units such as
The catch, however, is that the packed SIMD register holds a fixed amount of data, so no further parallelism can be extracted. The speed-up is somewhat limited by the assumption we made of performing four parallel operations (please note this is common for both
project, called
Merrimac, is aimed at developing a stream-based supercomputer. Merrimac intends to use a stream architecture and advanced interconnection networks to provide more performance per unit cost than cluster-based scientific computers built from the same
Beginning from a whole-system point of view, stream processors usually exist in a controlled environment. GPUs do exist on an add-in board (this seems also to apply to
Imagine). CPUs continue to do the job of managing system resources, running applications, and such.
, where one kernel function is applied to all elements in the stream, is typical. Since the kernel and stream abstractions expose data dependencies, compiler tools can fully automate and optimize on-chip management tasks. Stream processing hardware can use
, FPGA). Applications can be developed in any combination of C, C++, and Java for the CPU, and Verilog or VHDL for FPGAs. CUDA is currently used for Nvidia GPGPUs. Auto-Pipe also handles coordination of TCP connections between multiple machines.
Although an order of magnitude speedup can be reasonably expected (even from mainstream GPUs when computing in a streaming manner), not all applications benefit from this. Communication latencies are actually the biggest problem. Although
An implementation of this paradigm can "unroll" a loop internally. This allows throughput to scale with chip complexity, easily utilizing hundreds of ALUs. The elimination of complex data patterns makes much of this extra power available.
) but less so for general-purpose processing with more randomized data access (such as databases). By sacrificing some flexibility in the model, the implications allow easier, faster and more efficient execution. Depending on the context,
between programmer, tools and hardware. Programmers beat tools in mapping algorithms to parallel hardware, and tools beat programmers in figuring out the smartest memory allocation schemes, etc. Of particular concern are MIMD designs such as
Because of the SIMD nature of the stream processor's execution units (ALUs clusters), read/write operations are expected to happen in bulk, so memories are optimized for high bandwidth rather than low latency (this is a difference from
pipeline, many techniques have been deployed, such as "über shaders" and "texture atlases". Those techniques are game-oriented because of the nature of GPUs, but the concepts are interesting for generic stream processing as well.
For each record we can only read from the input, perform operations on it, and write to the output. It is permissible to have multiple inputs and multiple outputs, but never a piece of memory that is both readable and writable.
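A minimal C++ sketch of this contract (the `kernel` function and its operation are illustrative, not from any particular framework): each output record is computed solely from the corresponding input record, with no memory that is both read and written.

```cpp
#include <cassert>
#include <vector>

// Hypothetical kernel: may only read its input record and write its output.
float kernel(float record) { return record * 2.0f + 1.0f; }

// Uniform streaming: the same kernel is applied to every record.
std::vector<float> run_stream(const std::vector<float>& input) {
    std::vector<float> output;
    output.reserve(input.size());
    for (float record : input)
        output.push_back(kernel(record)); // records are independent
    return output;
}
```

Because no record depends on any other, the loop body could execute on hundreds of ALUs at once without synchronization.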
Most programming languages for stream processors start with Java, C or C++ and add extensions which provide specific instructions to allow application developers to tag kernels and/or streams. This also applies to most
, is a flexible architecture intended to be both fast and energy efficient. The project, originally conceived in 1996, included an architecture, software tools, a VLSI implementation and a development board, and was funded by
or adhere to a stream programming model. With a suitable algorithm the performance of the Cell can rival that of pure stream processors; however, this nearly always requires a complete redesign of algorithms and
stream processing projects included the
Stanford Real-Time Programmable Shading Project started in 1999. A prototype called Imagine was developed in 2002. A project called Merrimac ran until about 2004.
efforts have been spent on finding alternative ways to perform massive amounts of computations but the only solution was to exploit some level of parallel execution. The result of those efforts was
Memory access patterns are much more predictable. While arrays do exist, their dimension is fixed at kernel invocation. The closest thing to multiple pointer indirection is an
exists in a kernel if the same function is applied to all records of an input stream and a number of records can be processed simultaneously without waiting for results from previous records.
, is a hardware architecture that can function like a stream processor with appropriate software support. It consists of a controlling processor, the PPE (Power Processing Element, an IBM
Most (90%) of a stream processor's work is done on-chip, requiring only 1% of the global data to be stored to memory. This is where knowing the kernel temporaries and dependencies pays off.
191:, the number of arithmetic operations per I/O or global memory reference. In many signal processing applications today it is well over 50:1 and increasing with algorithmic complexity.
(SRF). This is conceptually a large cache in which stream data is stored to be transferred to external memory in bulk. As a cache-like software-controlled structure to the various
information and opcodes to those few ALUs is expensive, very little die area is dedicated to actual mathematical machinery (as a rough estimation, consider it to be less than 10%).
By way of illustration, the following code fragments demonstrate detection of patterns within event streams. The first is an example of processing a data stream using a continuous
components and the format of the components (but only primitive data types are supported for now). The various attributes are then attached to a memory block, possibly defining a
R3xx/NV4x: flexible branching support, although some limitations still exist on the number of operations to be executed and on recursion depth, as well as on array manipulation.
This three-tiered data access pattern makes it easy to keep temporary data away from slow memories, thus making the silicon implementation highly efficient and power-saving.
. By moving the memory location of the data out of the structure, the data can be better organised for efficient access in a stream and for SIMD instructions to operate on. A
1838:: a high-level programming language for writing (dataflow) actors, which are stateful operators that transform input streams of data objects (tokens) into output streams.
Stream processing is essentially a compromise, driven by a data-centric model that works very well for traditional DSP or GPU-type applications (such as image, video and
. The number of jump instructions is also decreased, as the loop is run fewer times. These gains result from the parallel execution of the four mathematical operations.
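A sketch of the packed (SWAR) version of the two-array addition, assuming a hypothetical `vector_sum` that stands in for a real packed-add instruction:

```cpp
#include <array>
#include <cassert>

using vec4 = std::array<float, 4>; // one packed register: 4 components

// A single packed operation adds four components at once.
vec4 vector_sum(const vec4& a, const vec4& b) {
    return { a[0] + b[0], a[1] + b[1], a[2] + b[2], a[3] + b[3] };
}

// The loop body now executes 100 times instead of 400: one instruction
// decode (and one loop-condition jump) per four additions.
void add_arrays_packed(std::array<vec4, 100>& result,
                       const std::array<vec4, 100>& source0,
                       const std::array<vec4, 100>& source1) {
    for (int el = 0; el < 100; el++)    // for each vector
        result[el] = vector_sum(source0[el], source1[el]);
}
```

On real hardware `vector_sum` would map to a SIMD intrinsic or instruction; written in plain C++ here, it only illustrates where the 4x reduction in decoded instructions comes from.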
This is the sequential paradigm that is most familiar. Variations do exist (such as inner loops, structures and such), but they ultimately boil down to that construct.
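For the two-array example of 100 4-component vectors, the sequential construct is simply one addition per iteration (a sketch; names are illustrative):

```cpp
#include <cassert>

// Conventional, sequential paradigm: one addition per iteration,
// so two arrays of 100 4-component vectors take 400 iterations.
void add_arrays(float result[], const float source0[], const float source1[]) {
    for (int i = 0; i < 100 * 4; i++)
        result[i] = source0[i] + source1[i];
}
```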
A similar architecture exists on stream processors but thanks to the new programming model, the amount of transistors dedicated to management is actually very little.
. Programmers often create representations of entities in memory, for example, the location of a particle in 3D space, the colour of the ball and its size, as below:
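A sketch of such an array-of-structures record (field names are illustrative; `std::uint8_t` stands in for an 8-bit channel type):

```cpp
#include <cstdint>

// A particle in a three-dimensional space.
struct particle_t {
    float x, y, z;          // position; not even an array!
    std::uint8_t color[3];  // 8 bit per channel, say we care about RGB only
    float size;
    // ... and many other attributes may follow...
};
```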
rapidly evolved in both speed and functionality. Since these early days, dozens of stream processing languages have been developed, as well as specialized hardware.
The stream processing paradigm simplifies parallel software and hardware by restricting the parallel computation that can be performed. Given a sequence of data (a
) and a set of SIMD coprocessors, called SPEs (Synergistic Processing Elements), each with independent program counters and instruction memory, in effect a
IBM Spade - Stream Processing Application Declarative Engine (B. Gedik, et al. SPADE: the System S declarative stream processing engine. ACM SIGMOD 2008.)
all the various attributes in a single set of parameters (usually this looks like a structure or a "magic global variable"), performs the operations and
872:, for which the programmer needs to deal with application partitioning across multiple cores and deal with process synchronization and load balancing.
, a programming paradigm which allowed applying one instruction to multiple instances of (different) data. Most of the time, SIMD was being used in a
, much information is actually not taken into account here, such as the number of vector components and their data format. This is done for clarity.
between 'consecutive' elements of the same attributes, effectively allowing interleaved data. When the GPU begins the stream processing, it will
, and optimal local on-chip memory reuse is attempted, in order to minimize the loss in bandwidth associated with external memory interaction.
Commercial implementations are either general purpose or tied to specific hardware by a vendor. Examples of general purpose languages include:
1806:, an application development environment for streaming applications that allows authoring of applications for heterogeneous systems (CPU,
To keep those ALUs fetched with data, each ALU is equipped with local register files (LRFs), which are basically its usable registers.
2007. The family contains four members ranging from 30 GOPS to 220 16-bit GOPS (billions of operations per second), all fabricated at
Memeti, Suejb; Pllana, Sabri (October 2018). "HSTREAM: A Directive-Based
Language Extension for Heterogeneous Stream Computing".
Apart from specifying streaming applications in high-level languages, models of computation (MoCs) also have been widely used as
second of the Order being placed. The output stream is sorted by timestamp, in this case, the timestamp from the Orders stream.
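A batch-style C++ sketch of this windowed JOIN (record types and field names are hypothetical; a real continuous query would process arriving events incrementally rather than over complete vectors):

```cpp
#include <string>
#include <vector>
#include <cassert>

// Hypothetical event records; timestamps in seconds.
struct Order { std::string id;       double ts; };
struct Trade { std::string order_id; double ts; };

// Emit every Order matched by a Trade within one second of placement.
// Orders are assumed sorted by timestamp, so the output is too.
std::vector<std::string> join_within_1s(const std::vector<Order>& orders,
                                        const std::vector<Trade>& trades) {
    std::vector<std::string> out;
    for (const Order& o : orders)
        for (const Trade& t : trades)
            if (t.order_id == o.id && t.ts >= o.ts && t.ts - o.ts <= 1.0) {
                out.push_back(o.id);  // matched within the 1-second window
                break;
            }
    return out;
}
```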
increased memory usage. Overall, memory management may be more complicated if structures are added and removed, for example.
Internally, a stream processor features some clever communication and management circuits but what's interesting is the
BeepBeep, a simple and lightweight Java-based event stream processing library from the Formal
Computer Science Lab at
SPar - C++ domain-specific language for expressing stream parallelism from the
Application Modelling Group (GMAP) at
RaftLib - open source C++ stream processing template library originally from the Stream Based
Supercomputing Lab at
Java extension that enables a simple expression of stream programming, the Actor model, and the MapReduce algorithm
Free Edition, enables a simple expression of stream programming, the actor model, and the MapReduce algorithm on
Batch file-based processing (emulates some of actual stream processing, but much lower performance in general)
Stream processing is especially suitable for applications that exhibit three application characteristics:
, which is however guaranteed to finally read or write from a specific memory area (inside a stream).
R8xx: Supports append/consume buffers and atomic operations. This generation is the state of the art.
In wireless signal processing, each record could be a sequence of samples received from an antenna.
scheduling. This in itself is a result of the research at MIT and Stanford in finding an optimal
Pre-R2xx/NV2x: no explicit support for stream processing. Kernel operations were hidden in the
In graphics, each record might be the vertex, normal, and color information for a triangle;
When multiple of these structures exist in memory they are placed end to end creating an
architecture also incurs penalties for small streams, a behaviour referred to as the
"A Programmable 512 GOPS Stream Processor for Signal, Image, and Video Processing"
data but this does not need to be. Applied kernels can also be much more complex.
In a video encoder, each record may be 256 pixels forming a macroblock of data; or
2018 IEEE International Conference on Computational Science and Engineering (CSE)
You can see, however, that this method reduces the number of decoded instructions from
HSTREAM: a directive-based language extension for heterogeneous stream computing
simplest and most efficient stream processing modalities to date for C++, is
, or sequences of events in time, as the central input and output objects of
1786:, which can be considered stream programming languages to a certain degree.
design may be tuned for maximum efficiency or a trade-off for flexibility.
. Various generations to be noted from a stream processing point of view:
Consider a simple program adding up two arrays containing 100 4-component
Basic computers started from a sequential execution paradigm. Traditional
together as a data flow graph using C++ stream operators. As an example:
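A sketch along the lines of RaftLib's documented hello-world (it requires the RaftLib headers to build, and the exact API should be checked against the RaftLib project):

```cpp
#include <raft>
#include <raftio>
#include <cstdlib>
#include <string>

class hi : public raft::kernel
{
public:
    hi() : raft::kernel()
    {
        output.addPort< std::string >( "0" );
    }

    virtual raft::kstatus run()
    {
        output[ "0" ].push( std::string( "Hello World\n" ) );
        return( raft::stop );
    }
};

int main( int argc, char **argv )
{
    /** instantiate print kernel **/
    raft::print< std::string > p;
    /** instantiate hello world kernel **/
    hi hello;
    /** make a map object **/
    raft::map m;
    /** add kernels to map, both hello and p are executed concurrently **/
    m += hello >> p;
    /** execute the map **/
    m.exe();
    return( EXIT_SUCCESS );
}
```

The `>>` operator declares the data dependency between the two kernels, which is what lets the runtime schedule them concurrently.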
single-threaded model with automated dependencies, memory allocation and
) is applied to each element in the stream. Kernel functions are usually
, Universities of Stanford, Rice, California (Davis) and Reservoir Labs.
, for example). This also allows for efficient memory bus negotiations.
1886:, which provides separation of coordination and algorithmic programming
environment. By using more complicated structures, one could also have
In image processing, each record might be a single pixel from an image;
Embiot, a lightweight embedded streaming analytics agent from Telchemy
the results to some memory area for later processing (or retrieving).
are widespread, consumer-grade stream processors designed mainly by
in a 130 nanometer process. The devices target the high end of the
Non-commercial examples of stream programming languages include:
FCUDA: Enabling Efficient Compilation of CUDA Kernels onto FPGAs
locality directly using the stream processing programming model.
PeakStream unveils multicore and CPU/GPU programming solution
This is actually oversimplified. It assumes the instruction
Programming paradigm for parallel processing of data streams
Kapasi, Dally, Rixner, Khailany, Owens, Ahn and Mattson,
(Streams and Iteration in a Single Assignment Language).
Eclipse Streamsheets - spreadsheet for stream processing
' Jacket, a commercialization of a GPU engine for MATLAB
project, was announced during a feature presentation at
could be considered stream processing in a broad sense.
WaveScript functional stream processing, also from MIT.
Auto-Pipe, from the Stream Based Supercomputing Lab at
array-of-structures (AoS) and structure-of-arrays (SoA)
During the 1980s stream processing was explored within
"Merrimac - Stanford Streaming Supercomputer Project"
, a commercialization of the Imagine work at Stanford
and provided too little flexibility for general use.
Khailany, Dally, Rixner, Kapasi, Owens and Towles:
game engine for PlayStation 3, Xbox360, Wii, and PC
Pontifical Catholic University of Rio Grande do Sul
the number of instructions which can be executed.
"Stanford Real-Time Programmable Shading Project"
"Stream processing in General-Purpose Processors"
"Exploring VLSI Scalability of Stream Processors"
, Stanford University and Stream Processors, Inc.
Brook+ (AMD hardware optimized implementation of
TStreams, Hewlett-Packard Cambridge Research Lab
Floodgate, a stream processor provided with the
A drawback of SIMD programming was the issue of
Datastreams - Data streaming analytics platform
, a "directive" vision of Many-Core programming
Parallel SIMD paradigm, packed registers (SWAR)
also researched stream-enhanced processors as
for these systems includes components such as
Examples of records within streams include:
(Compute Unified Device Architecture) from
Models of computation for stream processing
"The Imagine - Image and Signal Processor"
Stream programming libraries and languages
works. Although this is what happens with
. Stream processing systems aim to expose
TStreams: How to Write a Parallel Program
TStreams: A Model of Parallel Computation
brand name for product line targeting HPC
brand name for product line targeting HPC
have only a single 64-bit wide data bus.
ACOTES programming model: language from
(SoA), as shown below, can allow this.
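A sketch of the structure-of-arrays layout for the same particle attributes (field names illustrative; `std::vector` stands in for whatever contiguous storage the application uses):

```cpp
#include <cstdint>
#include <vector>

// Structure of arrays: one contiguous array per attribute, so a kernel that
// touches only positions streams through memory without skipping other fields.
struct particles_t {
    std::vector<float> x, y, z;        // positions, SIMD-friendly
    std::vector<std::uint8_t> r, g, b; // one array per colour channel
    std::vector<float> size;
};
```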
simple events: a wedding is happening.
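A minimal C++ sketch of such composite-event inference (event names are hypothetical; a real engine would also apply a time window rather than scan a complete batch):

```cpp
#include <set>
#include <string>
#include <vector>

// A composite event is inferred once every constituent simple event
// has been observed in the incoming flow.
bool wedding_detected(const std::vector<std::string>& events) {
    const std::set<std::string> required = {
        "church_bells", "man_in_tuxedo", "woman_in_white_gown", "rice_flying"
    };
    std::set<std::string> seen;
    for (const std::string& e : events)
        if (required.count(e))
            seen.insert(e);       // accumulate matching simple events
    return seen == required;      // all present => composite event
}
```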
Continuous operator stream processing
, a commercial spin-off of Stanford's
Comparison to prior parallel paradigms
IEEE Journal of Solid-State Circuits:
S-Net coordination language from the
7:
"GitHub - walmartlabs/Mupd8: Muppet"
, which enables linking independent
Parallel stream paradigm (SIMD/MIMD)
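A sketch of the stream version of the two-array addition, in the same fictional style as the document's other examples (syntax purely illustrative):

```
// This is a fictional language for demonstration purposes.
elements = stream_gather(source0, source1);   // pack both streams together
kernel   = instantiate_kernel("@arg0 + @arg1"); // per-element operation
result   = kernel.invoke(elements);           // runtime parallelizes freely
```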
Vendor-specific languages include:
Polytechnic University of Catalonia
A SHORT INTRO TO STREAM PROCESSING
Microsoft Azure - Stream analytics
Washington University in St. Louis
Washington University in St. Louis
effectively either require a tiny
numElements * componentsPerElement
for efficient implementation. The
Université du Québec à Chicoutimi
models and process-based models.
Conventional, sequential paradigm
"Programmable Stream Processors"
Partitioned global address space
Shallows, an open source project
. Stream processing encompasses
, Stanford and Rice University.
Functional reactive programming
Technical University of Munich
Generic processor architecture
field-programmable gate arrays
; and hardware components for
, for expressing computation;
Data Stream Management System
Amazon Web Services - Kinesis
PeakStream, a spinout of the
Imagine, headed by Professor
(i.e. 400 numbers in total).
. An example is the language
, for example, to initiate a
for data streams and rely on
distributed stream processing
Real Time Streaming Protocol
Stream processing services:
University of Hertfordshire
Sony Computer Entertainment
Hardware-in-the-loop issues
), a series of operations (
. IEEE. pp. 138–145.
Eventador SQLStreamBuilder
Gummaraju and Rosenblum,
Molecular modeling on GPU
WSO2 stream processor by
, a commercialization of
graphics processing units
digital signal processing
graphics processing units
stream management systems
million bits per second.
Research group web site
IBM streaming analytics
Google Cloud - Dataflow
Event-Based Processing
Programming model notes
, for distribution and
event stream processing
10.1109/CSE.2018.00026
, Stanford University.
Flow-based programming
Stream Processors, Inc
University of Waterloo
multifunction printers
Stream Processors, Inc
instruction intrinsics
arithmetic logic units
data stream processing
Hardware acceleration
project (acquired by
on December 18, 2013
Throughput Computing
University of Denver
Brook language from
Stream Register File
dataflow programming
direct memory access
floating-point units
streaming algorithms
reactive programming
dataflow programming
programming paradigm
(Technical report).
(Technical report).
Streaming algorithm
Real-time computing
Dimension reduction
Toshiba Corporation
Stanford University
short stream effect
structure of arrays
array of structures
Stanford University
parallel processing
Parallel computing
Data stream mining
DUP language from
CAL Actor Language
video surveillance
video conferencing
programming models
978-1-5386-7649-3
Walmartlabs Mupd8
stream processing
library from the
shading languages
, an alliance of
market including
Texas Instruments
in the Commodore
1583:
1582:
1575:
1480:indirection chain
1259:"Hello World
865:layering of tasks
836:
835:
737:"@arg0"
189:Compute intensity
140:Uniform streaming
36:stream processing
16:(Redirected from
3294:
3248:
3247:
3222:Software lockout
3021:Computer cluster
2956:Vector processor
2911:Array processing
2896:Flynn's taxonomy
2803:Memory coherence
2578:Computer network
2517:
2510:
2503:
2494:
2485:
2484:
2471:
2465:
2464:
2455:
2449:
2448:
2439:
2433:
2428:
2422:
2421:
2401:
2385:
2379:
2374:
2368:
2363:
2357:
2356:
2354:
2352:
2343:. Archived from
2333:
2327:
2326:
2324:
2322:
2308:
2302:
2301:
2299:
2297:
2282:
2276:
2269:
2263:
2256:
2250:
2243:
2237:
2230:
2224:
2219:
2213:
2208:
2192:Vector processor
1769:memory footprint
1578:
1571:
1567:
1564:
1558:
1534:
1533:
1526:
1434:
1431:
1428:
1425:
1422:
1419:
1416:
1413:
1410:
1407:
1404:
1401:
1398:
1395:
1392:
1389:
1386:
1383:
1380:
1377:
1374:
1371:
1368:
1365:
1362:
1359:
1356:
1353:
1350:
1347:
1344:
1341:
1338:
1335:
1332:
1329:
1326:
1323:
1320:
1317:
1314:
1311:
1308:
1305:
1302:
1299:
1296:
1293:
1290:
1287:
1284:
1281:
1278:
1275:
1272:
1269:
1266:
1263:
1260:
1257:
1254:
1251:
1248:
1245:
1242:
1239:
1236:
1233:
1230:
1227:
1224:
1221:
1218:
1215:
1212:
1209:
1206:
1203:
1200:
1197:
1194:
1191:
1188:
1185:
1182:
1179:
1176:
1173:
1170:
1167:
1164:
1161:
1158:
1155:
1152:
1149:
1146:
1143:
1140:
1137:
1134:
1131:
1128:
1125:
1122:
1119:
1116:
1113:
1110:
1107:
1104:
1101:
1062:
1059:
1056:
1053:
1050:
1047:
1044:
1041:
1038:
1035:
1032:
1029:
1026:
1023:
1020:
1017:
1014:
1011:
1008:
1005:
1002:
999:
996:
993:
990:
987:
984:
981:
978:
948:
945:
942:
939:
936:
933:
930:
927:
924:
921:
918:
915:
912:
909:
906:
903:
900:
897:
894:
891:
888:
885:
831:
828:
822:
802:
801:
794:
765:
762:
759:
756:
753:
750:
747:
744:
741:
738:
735:
732:
729:
726:
723:
720:
717:
714:
711:
708:
705:
666:
659:
656:
653:
650:
647:
644:
641:
638:
635:
632:
629:
626:
623:
620:
617:
614:
611:
608:
605:
602:
599:
596:
593:
578:
575:
572:
569:
566:
563:
560:
557:
554:
551:
548:
545:
542:
539:
536:
533:
530:
527:
524:
521:
469:
466:
463:
460:
457:
454:
451:
448:
445:
442:
439:
436:
435:"gown"
433:
430:
427:
424:
421:
418:
415:
412:
409:
406:
403:
400:
397:
394:
391:
388:
385:
382:
379:
368:
365:
362:
359:
356:
353:
350:
347:
344:
341:
338:
335:
332:
329:
326:
323:
320:
317:
314:
311:
308:
305:
302:
299:
296:
293:
290:
287:
284:
281:
278:
275:
272:
269:
266:
263:
260:
257:
254:
251:
248:
245:
195:Data parallelism
131:kernel functions
32:computer science
21:
3302:
3301:
3297:
3296:
3295:
3293:
3292:
3291:
3262:
3261:
3260:
3255:
3236:
3180:
3086:Coarray Fortran
3042:
3026:Beowulf cluster
2882:
2832:
2823:Synchronization
2808:Cache coherence
2798:Multiprocessing
2786:
2750:
2731:Cost efficiency
2726:Gustafson's law
2694:
2638:
2587:
2563:Multiprocessing
2553:Cloud computing
2526:
2521:
2490:
2488:
2473:
2472:
2468:
2457:
2456:
2452:
2441:
2440:
2436:
2429:
2425:
2418:
2387:
2386:
2382:
2375:
2371:
2364:
2360:
2350:
2348:
2335:
2334:
2330:
2320:
2318:
2310:
2309:
2305:
2295:
2293:
2284:
2283:
2279:
2270:
2266:
2257:
2253:
2244:
2240:
2231:
2227:
2220:
2216:
2209:
2205:
2201:
2196:
2127:
1968:in August 2009)
1779:
1608:
1579:
1568:
1562:
1559:
1548:
1535:
1531:
1524:
1453:
1441:
1436:
1435:
1432:
1429:
1426:
1423:
1420:
1417:
1414:
1411:
1408:
1405:
1402:
1399:
1396:
1393:
1390:
1387:
1384:
1381:
1378:
1375:
1372:
1369:
1366:
1363:
1360:
1357:
1354:
1351:
1348:
1345:
1342:
1339:
1336:
1333:
1330:
1327:
1324:
1321:
1318:
1315:
1312:
1309:
1306:
1303:
1300:
1297:
1294:
1291:
1288:
1285:
1282:
1279:
1276:
1273:
1270:
1267:
1264:
1261:
1258:
1255:
1252:
1249:
1246:
1243:
1240:
1237:
1234:
1231:
1228:
1225:
1222:
1219:
1216:
1213:
1210:
1207:
1204:
1201:
1198:
1195:
1192:
1189:
1186:
1183:
1180:
1177:
1174:
1171:
1168:
1165:
1162:
1159:
1156:
1153:
1150:
1147:
1144:
1141:
1138:
1135:
1132:
1129:
1126:
1123:
1120:
1117:
1115:<cstdlib>
1114:
1111:
1108:
1105:
1102:
1099:
1093:compute kernels
1064:
1063:
1060:
1057:
1054:
1051:
1048:
1045:
1042:
1039:
1036:
1033:
1030:
1027:
1024:
1021:
1018:
1015:
1012:
1009:
1006:
1003:
1000:
997:
994:
991:
988:
985:
982:
979:
976:
950:
949:
946:
943:
940:
937:
934:
931:
928:
925:
922:
919:
916:
913:
910:
907:
904:
901:
898:
895:
892:
889:
886:
883:
856:
832:
826:
823:
812:
803:
799:
792:
767:
766:
763:
760:
757:
754:
751:
748:
745:
742:
739:
736:
733:
730:
727:
724:
721:
718:
715:
712:
709:
706:
703:
700:
664:
661:
660:
657:
654:
651:
648:
645:
642:
639:
636:
633:
630:
627:
624:
621:
618:
615:
612:
609:
606:
603:
600:
597:
594:
591:
588:
580:
579:
576:
573:
570:
567:
564:
561:
558:
555:
552:
549:
546:
543:
540:
537:
534:
531:
528:
525:
522:
519:
516:
476:
471:
470:
467:
464:
461:
458:
455:
452:
449:
446:
443:
440:
437:
434:
431:
428:
425:
422:
419:
416:
413:
410:
407:
404:
401:
398:
395:
393:"man"
392:
389:
386:
383:
380:
377:
370:
369:
366:
363:
360:
357:
354:
351:
348:
345:
342:
339:
336:
333:
330:
327:
324:
321:
318:
315:
312:
309:
306:
303:
300:
297:
294:
291:
288:
285:
282:
279:
276:
273:
270:
267:
264:
261:
258:
255:
252:
249:
246:
243:
232:
171:
95:query languages
75:data processing
38:(also known as
28:
23:
22:
15:
12:
11:
5:
3300:
3298:
3290:
3289:
3284:
3279:
3274:
3264:
3263:
3257:
3256:
3254:
3253:
3241:
3238:
3237:
3235:
3234:
3229:
3224:
3219:
3217:Race condition
3214:
3209:
3204:
3199:
3194:
3188:
3186:
3182:
3181:
3179:
3178:
3173:
3168:
3163:
3158:
3153:
3148:
3143:
3138:
3133:
3128:
3123:
3118:
3113:
3108:
3103:
3098:
3093:
3088:
3083:
3078:
3073:
3068:
3063:
3058:
3052:
3050:
3044:
3043:
3041:
3040:
3035:
3030:
3029:
3028:
3018:
3012:
3011:
3010:
3005:
3000:
2995:
2990:
2985:
2975:
2974:
2973:
2968:
2961:Multiprocessor
2958:
2953:
2948:
2943:
2938:
2937:
2936:
2931:
2926:
2925:
2924:
2919:
2914:
2903:
2892:
2890:
2884:
2883:
2881:
2880:
2875:
2874:
2873:
2868:
2863:
2853:
2848:
2842:
2840:
2834:
2833:
2831:
2830:
2825:
2820:
2815:
2810:
2805:
2800:
2794:
2792:
2788:
2787:
2785:
2784:
2779:
2774:
2769:
2764:
2758:
2756:
2752:
2751:
2749:
2748:
2743:
2738:
2733:
2728:
2723:
2718:
2713:
2708:
2702:
2700:
2696:
2695:
2693:
2692:
2690:Hardware scout
2687:
2681:
2676:
2671:
2665:
2660:
2654:
2648:
2646:
2644:Multithreading
2640:
2639:
2637:
2636:
2631:
2626:
2621:
2616:
2611:
2606:
2601:
2595:
2593:
2589:
2588:
2586:
2585:
2583:Systolic array
2580:
2575:
2570:
2565:
2560:
2555:
2550:
2545:
2540:
2534:
2532:
2528:
2527:
2522:
2520:
2519:
2512:
2505:
2497:
2487:
2486:
2466:
2450:
2434:
2423:
2416:
2380:
2369:
2358:
2341:Group web site
2328:
2316:Group web site
2303:
2277:
2264:
2251:
2238:
2225:
2214:
2202:
2200:
2197:
2195:
2194:
2189:
2184:
2179:
2174:
2169:
2164:
2159:
2154:
2149:
2144:
2139:
2134:
2128:
2126:
2123:
2122:
2121:
2118:
2117:
2116:
2110:
2107:
2104:
2101:
2094:
2093:
2090:
2087:
2078:
2077:
2072:
2067:
2062:
2053:
2052:
2047:
2041:
2038:
2017:
2016:
2009:
2000:
1991:
1973:
1972:
1969:
1955:
1952:
1941:
1935:
1928:
1925:
1919:
1909:
1908:
1902:
1899:
1893:
1889:StreamIt from
1887:
1880:
1877:
1868:
1862:
1856:
1853:
1843:
1839:
1833:
1827:
1821:
1811:
1800:
1778:
1775:
1774:
1773:
1740:Cell processor
1736:
1730:
1727:AMD FireStream
1724:
1723:
1722:
1719:
1716:
1712:
1691:
1652:
1644:
1621:
1607:
1604:
1581:
1580:
1538:
1536:
1529:
1523:
1520:
1452:
1449:
1440:
1437:
1121:<string>
1109:<raftio>
1098:
975:
882:
855:
852:
834:
833:
806:
804:
797:
791:
788:
702:
699:
696:
590:
587:
584:
518:
515:
512:
475:
472:
376:
242:
231:
228:
223:
222:
219:
216:
213:
206:
205:
198:
192:
170:
167:
87:software stack
26:
24:
14:
13:
10:
9:
6:
4:
3:
2:
3299:
3288:
3285:
3283:
3280:
3278:
3275:
3273:
3270:
3269:
3267:
3252:
3243:
3242:
3239:
3233:
3230:
3228:
3225:
3223:
3220:
3218:
3215:
3213:
3210:
3208:
3205:
3203:
3200:
3198:
3195:
3193:
3190:
3189:
3187:
3183:
3177:
3174:
3172:
3169:
3167:
3164:
3162:
3159:
3157:
3154:
3152:
3149:
3147:
3144:
3142:
3139:
3137:
3134:
3132:
3129:
3127:
3124:
3122:
3119:
3117:
3114:
3112:
3109:
3107:
3106:Global Arrays
3104:
3102:
3099:
3097:
3094:
3092:
3089:
3087:
3084:
3082:
3079:
3077:
3074:
3072:
3069:
3067:
3064:
3062:
3059:
3057:
3054:
3053:
3051:
3049:
3045:
3039:
3036:
3034:
3033:Grid computer
3031:
3027:
3024:
3023:
3022:
3019:
3016:
3013:
3009:
3006:
3004:
3001:
2999:
2996:
2994:
2991:
2989:
2986:
2984:
2981:
2980:
2979:
2976:
2972:
2969:
2967:
2964:
2963:
2962:
2959:
2957:
2954:
2952:
2949:
2947:
2944:
2942:
2939:
2935:
2932:
2930:
2927:
2923:
2920:
2918:
2915:
2912:
2909:
2908:
2907:
2904:
2902:
2899:
2898:
2897:
2894:
2893:
2891:
2889:
2885:
2879:
2876:
2872:
2869:
2867:
2864:
2862:
2859:
2858:
2857:
2854:
2852:
2849:
2847:
2844:
2843:
2841:
2839:
2835:
2829:
2826:
2824:
2821:
2819:
2816:
2814:
2811:
2809:
2806:
2804:
2801:
2799:
2796:
2795:
2793:
2789:
2783:
2780:
2778:
2775:
2773:
2770:
2768:
2765:
2763:
2760:
2759:
2757:
2753:
2747:
2744:
2742:
2739:
2737:
2734:
2732:
2729:
2727:
2724:
2722:
2719:
2717:
2714:
2712:
2709:
2707:
2704:
2703:
2701:
2697:
2691:
2688:
2685:
2682:
2680:
2677:
2675:
2672:
2669:
2666:
2664:
2661:
2658:
2655:
2653:
2650:
2649:
2647:
2645:
2641:
2635:
2632:
2630:
2627:
2625:
2622:
2620:
2617:
2615:
2612:
2610:
2607:
2605:
2602:
2600:
2597:
2596:
2594:
2590:
2584:
2581:
2579:
2576:
2574:
2571:
2569:
2566:
2564:
2561:
2559:
2556:
2554:
2551:
2549:
2546:
2544:
2541:
2539:
2536:
2535:
2533:
2529:
2525:
2518:
2513:
2511:
2506:
2504:
2499:
2498:
2495:
2491:
2482:
2481:
2476:
2470:
2467:
2462:
2461:
2454:
2451:
2446:
2445:
2438:
2435:
2432:
2427:
2424:
2419:
2413:
2409:
2405:
2400:
2395:
2391:
2384:
2381:
2378:
2373:
2370:
2367:
2362:
2359:
2346:
2342:
2338:
2332:
2329:
2317:
2313:
2307:
2304:
2292:
2288:
2281:
2278:
2274:
2268:
2265:
2261:
2255:
2252:
2248:
2242:
2239:
2235:
2229:
2226:
2223:
2218:
2215:
2212:
2207:
2204:
2198:
2193:
2190:
2188:
2185:
2183:
2180:
2178:
2175:
2173:
2170:
2168:
2165:
2163:
2160:
2158:
2155:
2153:
2150:
2148:
2145:
2143:
2140:
2138:
2135:
2133:
2130:
2129:
2124:
2119:
2114:
2113:
2111:
2108:
2105:
2102:
2099:
2098:
2097:
2091:
2088:
2086:
2083:
2082:
2081:
2076:
2073:
2071:
2068:
2066:
2063:
2061:
2058:
2057:
2056:
2051:
2048:
2046:
2042:
2039:
2037:
2033:
2029:
2028:complex event
2026:- a combined
2025:
2022:
2021:
2020:
2014:
2011:StreamC from
2010:
2008:
2004:
2001:
1999:
1995:
1992:
1990:
1986:
1982:
1978:
1977:
1976:
1970:
1967:
1964:(acquired by
1963:
1959:
1956:
1953:
1951:in June 2007)
1950:
1946:
1942:
1939:
1936:
1933:
1929:
1926:
1923:
1920:
1917:
1914:
1913:
1912:
1906:
1903:
1900:
1898:
1894:
1892:
1888:
1885:
1881:
1878:
1876:
1872:
1869:
1867:
1863:
1861:
1857:
1854:
1852:
1848:
1844:
1840:
1837:
1834:
1832:
1828:
1826:
1822:
1820:
1816:
1812:
1809:
1805:
1801:
1799:
1795:
1792:
1791:
1790:
1787:
1785:
1776:
1770:
1765:
1761:
1757:
1753:
1749:
1745:
1741:
1737:
1734:
1731:
1728:
1725:
1720:
1717:
1713:
1710:
1706:
1705:
1703:
1699:
1695:
1692:
1689:
1685:
1681:
1677:
1673:
1669:
1665:
1661:
1657:
1653:
1649:
1645:
1642:
1638:
1634:
1630:
1626:
1625:William Dally
1622:
1618:
1614:
1610:
1609:
1605:
1603:
1600:
1596:
1594:
1589:
1577:
1574:
1566:
1556:
1555:the talk page
1552:
1546:
1544:
1539:This section
1537:
1528:
1527:
1521:
1519:
1516:
1513:
1509:
1507:
1503:
1498:
1495:
1493:
1489:
1483:
1481:
1476:
1474:
1470:
1469:Intel Pentium
1464:
1460:
1457:
1450:
1448:
1446:
1438:
1205:"0"
1096:
1094:
1090:
1084:
1082:
1078:
1074:
1068:
973:
971:
970:
965:
961:
960:
955:
880:
878:
873:
871:
866:
862:
853:
851:
849:
845:
840:
830:
827:February 2023
820:
816:
810:
807:This section
805:
796:
795:
789:
787:
783:
779:
775:
771:
716:streamElement
697:
695:
693:
689:
683:
681:
677:
672:
670:
585:
583:
513:
511:
509:
504:
500:
499:parallelism.
498:
494:
490:
485:
481:
473:
374:
240:
237:
230:Code examples
229:
227:
220:
217:
214:
211:
210:
209:
202:
201:Data locality
199:
196:
193:
190:
187:
186:
185:
182:
180:
176:
168:
166:
164:
160:
155:
153:
149:
145:
144:scoreboarding
141:
137:
133:
132:
127:
122:
120:
116:
112:
108:
104:
100:
96:
92:
88:
84:
80:
76:
73:
69:
65:
61:
57:
53:
49:
45:
41:
37:
33:
19:
2845:
2791:Coordination
2721:Amdahl's law
2657:Simultaneous
2489:
2478:
2469:
2459:
2453:
2443:
2437:
2426:
2389:
2383:
2372:
2361:
2349:. Retrieved
2345:the original
2340:
2331:
2319:. Retrieved
2315:
2306:
2294:. Retrieved
2290:
2280:
2267:
2254:
2241:
2228:
2217:
2206:
2112:IBM streams
2095:
2085:Apache Flink
2079:
2075:Apache Spark
2065:Apache Storm
2060:Apache Kafka
2054:
2018:
1974:
1910:
1895:Siddhi from
1788:
1780:
1743:
1733:Nvidia Tesla
1715:simulation).
1686:and digital
1663:
1658:family from
1655:
1597:
1592:
1584:
1569:
1563:January 2008
1560:
1549:Please help
1540:
1517:
1514:
1510:
1501:
1499:
1496:
1484:
1479:
1477:
1465:
1461:
1458:
1454:
1442:
1427:EXIT_SUCCESS
1103:<raft>
1085:
1080:
1076:
1072:
1069:
1065:
967:
957:
951:
874:
864:
857:
837:
824:
813:Please help
808:
784:
780:
776:
772:
768:
731:streamKernel
684:
679:
675:
673:
662:
581:
505:
501:
477:
371:
233:
224:
207:
200:
194:
188:
183:
172:
169:Applications
156:
139:
129:
125:
123:
107:acceleration
54:which views
47:
43:
39:
35:
29:
3227:Scalability
2988:distributed
2871:Concurrency
2838:Programming
2679:Cooperative
2668:Speculative
2604:Instruction
2285:Eric Chan.
2070:Apache Apex
2050:Apache NiFi
2036:Software AG
1916:AccelerEyes
1651:technology.
1588:PCI Express
680:numElements
450:Rice_Flying
444:Church_Bell
331:'1'
72:distributed
60:computation
3266:Categories
3232:Starvation
2971:asymmetric
2706:PRAM model
2674:Preemptive
2399:1809.09387
2199:References
2034:engine by
1690:equipment.
1599:Pipelining
1545:to readers
1043:colorGreen
980:particle_t
890:particle_t
817:by adding
665:vector_sum
637:vector_sum
247:DataStream
109:including
103:scheduling
2966:symmetric
2711:PEM model
1958:RapidMind
1817:based on
1772:software.
1492:DDR SDRAM
1473:Athlon 64
1034:colorBlue
337:FOLLOWING
256:TimeStamp
179:processor
136:pipelined
3197:Deadlock
3185:Problems
3151:pthreads
3131:OpenHMPP
3056:Ateji PX
3017:computer
2888:Hardware
2755:Elements
2741:Slowdown
2652:Temporal
2634:Pipeline
2377:Merrimac
2351:March 9,
2321:March 9,
2296:March 9,
2125:See also
2040:Wallaroo
2005:- C for
2003:Intel Ct
1938:OpenHMPP
1932:Gamebryo
1922:Ateji PX
1831:Stanford
1794:Ateji PX
1648:Stanford
1646:Another
1606:Examples
1471:to some
1445:dataflow
1400:>>
1118:#include
1112:#include
1106:#include
1100:#include
1081:scatters
1025:colorRed
1016:unsigned
920:unsigned
844:AT&T
790:Research
761:elements
728:instance
707:elements
414:FOLLOWED
328:INTERVAL
3156:RaftLib
3136:OpenACC
3111:GPUOpen
3101:C++ AMP
3076:Charm++
2818:Barrier
2762:Process
2746:Speedup
2531:General
2366:Imagine
1983:) from
1760:PowerPC
1664:Imagine
1656:Storm-1
1613:Blitter
1541:may be
1223:kstatus
1214:virtual
1184:addPort
1089:RaftLib
964:aligned
688:AltiVec
655:source1
649:source0
574:source1
568:source0
508:vectors
468:Wedding
429:Clothes
405:Clothes
364:orderId
352:orderId
268:orderId
56:streams
50:) is a
3249:
3126:OpenCL
3121:OpenMP
3066:Chapel
2983:shared
2978:Memory
2913:(SIMT)
2856:Models
2767:Thread
2699:Theory
2670:(SpMT)
2624:Memory
2609:Thread
2592:Levels
2480:GitHub
2414:
1998:Nvidia
1949:Google
1819:OpenMP
1754:, and
1702:Nvidia
1488:Rambus
1424:return
1346:string
1271:return
1265:"
1253:string
1235:output
1196:string
1178:output
1169:kernel
1148:public
1142:kernel
1133:public
1077:gather
1073:stride
977:struct
956:in an
954:arrays
887:struct
755:invoke
749:kernel
743:result
722:kernel
643:result
562:result
465:ACTION
456:WITHIN
432:EQUALS
423:Person
408:EQUALS
399:Person
390:EQUALS
387:Gender
381:Person
358:Trades
346:Orders
334:SECOND
316:Trades
310:Orders
304:amount
292:amount
286:Orders
280:ticker
274:Orders
262:Orders
250:Orders
244:SELECT
126:stream
117:, and
70:, and
3287:GPGPU
3096:Dryad
3061:Boost
2782:Array
2772:Fiber
2686:(CMT)
2659:(SMT)
2573:GPGPU
2394:arXiv
2024:Apama
1981:Brook
1966:Intel
1945:Brook
1842:etc..
1808:GPGPU
1742:from
1668:ISSCC
1637:Intel
1633:DARPA
1617:Amiga
1397:hello
1364:hello
1334:print
1124:class
1049:float
986:float
935:float
926:color
896:float
713:array
462:hours
325:RANGE
298:Trade
163:SISAL
46:, or
3161:ROCm
3091:CUDA
3081:Cilk
3048:APIs
3008:COMA
3003:NUMA
2934:MIMD
2929:MISD
2906:SIMD
2901:SISD
2629:Loop
2619:Data
2614:Task
2412:ISBN
2353:2017
2323:2017
2298:2017
2182:SIMT
2045:WSO2
2030:and
1994:CUDA
1897:WSO2
1849:and
1764:MIMD
1738:The
1700:and
1694:GPUs
1672:TSMC
1654:The
1639:and
1611:The
1506:ALUs
1490:and
1373:raft
1349:>
1337:<
1328:raft
1316:argv
1310:char
1304:argc
1295:main
1280:stop
1274:raft
1241:push
1217:raft
1199:>
1187:<
1163:raft
1136:raft
1055:size
1019:byte
938:size
923:byte
870:Cell
690:and
616:<
544:<
497:MIMD
493:SWAR
489:SIMD
484:SISD
482:are
480:CPUs
378:WHEN
319:OVER
313:JOIN
307:FROM
93:and
3176:ZPL
3171:TBB
3166:UPC
3146:PVM
3116:MPI
3071:HPX
2998:UMA
2599:Bit
2404:doi
1989:ATI
1985:AMD
1891:MIT
1798:JVM
1756:IBM
1744:STI
1709:API
1698:AMD
1676:DSP
1627:of
1421:();
1418:exe
1379:map
1340:std
1301:int
1292:int
1268:));
1247:std
1226:run
1190:std
861:DMA
694:).
692:SSE
678:to
619:100
598:int
592:for
547:400
526:int
520:for
438:AND
396:AND
236:SQL
30:In
3268::
2477:.
2410:.
2402:.
2339:.
2314:.
2289:.
1962:Sh
1871:Sh
1750:,
1682:,
1635:,
1595:.
1394:+=
1376:::
1361:hi
1343:::
1331:::
1313:**
1289:};
1277:::
1262:\n
1250:::
1229:()
1220:::
1208:);
1193:::
1172:()
1166:::
1157:()
1154:hi
1139:::
1127:hi
1061:};
947:};
719:()
658:);
628:++
625:el
613:el
601:el
556:++
447:OR
420:BY
343:ON
154:.
121:.
113:,
66:,
42:,
34:,
2516:e
2509:t
2502:v
2483:.
2420:.
2406::
2396::
2355:.
2325:.
2300:.
1987:/
1643:.
1576:)
1570:(
1565:)
1561:(
1557:.
1547:.
1433:}
1430:;
1415:.
1412:m
1406:;
1403:p
1391:m
1385:;
1382:m
1367:;
1355:;
1352:p
1322:{
1319:)
1307:,
1298:(
1286:}
1283:;
1256:(
1244:(
1238:.
1232:{
1211:}
1202:(
1181:.
1175:{
1160::
1151::
1145:{
1130::
1058:;
1052:*
1046:;
1040:*
1037:,
1031:*
1028:,
1022:*
1013:;
1010:z
1007:*
1004:,
1001:y
998:*
995:,
992:x
989:*
983:{
941:;
929:;
914:;
911:z
908:,
905:y
902:,
899:x
893:{
829:)
825:(
821:.
811:.
764:)
758:(
752:.
746:=
740:)
734:(
725:=
710:=
652:,
646:,
640:(
631:)
622:;
610:;
607:0
604:=
595:(
577:;
571:+
565:=
559:)
553:i
550:;
541:i
538:;
535:0
532:=
529:i
523:(
459:2
453:)
441:(
426:.
417:-
402:.
384:.
367:;
361:.
355:=
349:.
340:)
322:(
301:.
295:,
289:.
283:,
277:.
271:,
265:.
259:,
253:.
20:)