The IBM Stretch, designed in the late 1950s, pre-executes all unconditional branches and any conditional branches that depend on the index registers. For other conditional branches, the first two production models implemented predict untaken; subsequent models were changed to implement predictions based on the current values of the indicator bits (corresponding to today's condition codes). The Stretch designers had considered static hint bits in the branch instructions early in the project but decided against them. Misprediction recovery was provided by the lookahead unit on Stretch, and part of Stretch's reputation for less-than-stellar performance was blamed on the time required for misprediction recovery. Subsequent IBM large computer designs did not use branch prediction with speculative execution until the
The B4900, a microprogrammed COBOL machine released around 1982, was pipelined and used branch prediction. The B4900 branch prediction history state is stored back into the in-memory instructions during program execution. The B4900 implements 4-state branch prediction by using 4 semantically equivalent branch opcodes to represent each branch operator type. The opcode used indicates the history of that particular branch instruction. If the hardware determines that the branch prediction state of a particular branch needs to be updated, it rewrites the opcode with the semantically equivalent opcode that hints the proper history. This scheme obtains a 93% hit rate.
The vulnerability involves priming the branch predictors so that another process (or the kernel) will mispredict a branch and use secret data as an array index, evicting one of the attacker's cache lines. The attacker can time accesses to their own array to find out which line was evicted, turning this internal (microarchitectural) CPU state into a value the attacker can save, which carries information about values they could not read directly.
Branch prediction attempts to guess whether a conditional jump will be taken or not. Branch target prediction attempts to guess the target of a taken conditional or unconditional jump before it is computed by decoding and executing the instruction itself. Branch prediction and branch target prediction are often combined into the same circuitry.
will suffer for those branches. Once you have multiple predictors, it is beneficial to arrange that each predictor will have different aliasing patterns, so that it is more likely that at least one predictor will have no aliasing. Combined predictors with different indexing functions for the different predictors are called
The main disadvantage of the perceptron predictor is its high latency. Even after taking advantage of high-speed arithmetic tricks, the computation latency is relatively high compared to the clock period of many modern microarchitectures. In order to reduce the prediction latency, Jimenez proposed in
Predictors like gshare use multiple table entries to track the behavior of any particular branch. This multiplication of entries makes it much more likely that two branches will map to the same table entry (a situation called aliasing), which in turn makes it much more likely that prediction accuracy
A hybrid predictor, also called combined predictor, implements more than one prediction mechanism. The final prediction is based either on a meta-predictor that remembers which of the predictors has made the best predictions in the past, or a majority vote function based on an odd number of different
statement is executed three times, the decision made on the third execution might depend upon whether the previous two were taken or not. In such scenarios, a two-level adaptive predictor works more efficiently than a saturating counter. Conditional jumps that are taken every second time or have some
architectures) used single-direction static branch prediction: they always predict that a conditional jump will not be taken, so they always fetch the next sequential instruction. Only when the branch or jump is evaluated and found to be taken does the instruction pointer get set to a non-sequential
is best predicted with a special loop predictor. A conditional jump in the bottom of a loop that repeats N times will be taken N-1 times and then not taken once. If the conditional jump is placed at the top of the loop, it will be not taken N-1 times and then taken once. A conditional jump that goes
has a combined bimodal and global predictor, where the combining choice is another bimodal predictor. This processor caches the base and choice bimodal predictor counters in bits of the L2 cache otherwise used for ECC. As a result, it has effectively very large base and choice predictor tables, and
between fast branch prediction and good branch prediction is sometimes dealt with by having two branch predictors. The first branch predictor is fast and simple. The second branch predictor, which is slower, more complicated, and with bigger tables, will override a possibly wrong prediction made by
The first time a conditional jump instruction is encountered, there is not much information to base a prediction on. But the branch predictor keeps records of whether branches are taken or not taken. When it encounters a conditional jump that has been seen several times before, then it can base the
Without branch prediction, the processor would have to wait until the conditional jump instruction has passed the execute stage before the next instruction can enter the fetch stage in the pipeline. The branch predictor attempts to avoid this waste of time by trying to guess whether the conditional
instruction can choose among more than two branches. Some processors have specialized indirect branch predictors. Newer processors from Intel and AMD can predict indirect branches by using a two-level adaptive predictor. This kind of instruction contributes more than one bit to the history buffer.
When a next-line predictor points to aligned groups of 2, 4, or 8 instructions, the branch target will usually not be the first instruction fetched, and so the initial instructions fetched are wasted. Assuming, for simplicity, a uniform distribution of branch targets, 0.5, 1.5, and 3.5 instructions
Microprogrammed processors, popular from the 1960s to the 1980s and beyond, took multiple cycles per instruction, and generally did not require branch prediction. However, in addition to the IBM 3090, there are several other examples of microprogrammed designs that incorporated branch prediction.
values, 00, 01, 10, and 11, where zero means "not taken" and one means "taken". A pattern history table contains four entries per branch, one for each of the 2^2 = 4 possible branch histories, and each entry in the table contains a two-bit saturating counter of the same type as in figure 2 for
instruction. A conditional jump can either be "taken" and jump to a different place in program memory, or it can be "not taken" and continue execution immediately after the conditional jump. It is not known for certain whether a conditional jump will be taken or not taken until the condition has
The Alpha 21264 and Alpha EV8 microprocessors used a fast single-cycle next-line predictor to handle the branch target recurrence and provide a simple and fast branch prediction. Because the next-line predictor is so inaccurate, and the branch resolution recurrence takes so long, both cores have
, where the perceptron predictor chooses its weights according to the current branch's path, rather than according to the branch's PC. Many other researchers developed this concept (A. Seznec, M. Monchiero, D. Tarjan & K. Skadron, V. Desmet, Akkary et al., K. Aasaraai, Michael Black, etc.).
between different conditional jumps is part of making the predictions. The disadvantage is that the history is diluted by irrelevant information if the different conditional jumps are uncorrelated, and that the history buffer may not include any bits from the same branch if there are many other
When a branch is evaluated, the corresponding state machine is updated. Branches evaluated as not taken change the state toward strongly not taken, and branches evaluated as taken change the state toward strongly taken. The advantage of the two-bit counter scheme over a one-bit scheme is that a
A local branch predictor has a separate history buffer for each conditional jump instruction. It may use a two-level adaptive predictor. The history buffer is separate for each conditional jump instruction, while the pattern history table may be separate as well or it may be shared between all
Assume, for example, that a conditional jump is taken every third time. The branch sequence is 001001001... In this case, entry number 00 in the pattern history table will go to state "strongly taken", indicating that after two zeroes comes a one. Entry number 01 will go to state "strongly not
other regularly recurring pattern are not predicted well by the saturating counter. A two-level adaptive predictor remembers the history of the last n occurrences of the branch and uses one saturating counter for each of the possible 2^n history patterns. This method is illustrated in figure 3.
An agree predictor is a two-level adaptive predictor with globally shared history buffer and pattern history table, and an additional local saturating counter. The outputs of the local and the global predictors are XORed with each other to give the final prediction. The purpose is to reduce
The main advantage of the neural predictor is its ability to exploit long histories while requiring only linear resource growth. Classical predictors require exponential resource growth. Jimenez reports a global improvement of 5.7% over a McFarling-style hybrid predictor. He also used a
A more advanced form of static prediction presumes that backward branches will be taken and that forward branches will not. A backward branch is one that has a target address that is lower than its own address. This technique can help with prediction accuracy of loops, which are usually
many times one way and then the other way once is detected as having loop behavior. Such a conditional jump can be predicted easily with a simple counter. A loop predictor is part of a hybrid predictor where a meta-predictor detects whether the conditional jump has loop behavior.
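The "simple counter" idea can be sketched as follows. This is only an illustration, not a description of any shipping design: the class name `LoopPredictor`, its fields, and the policy of learning the trip count at the first loop exit are assumptions made for the example.

```python
class LoopPredictor:
    """Hypothetical loop predictor for a branch at the bottom of a loop.

    It counts consecutive taken outcomes and remembers how many were seen
    before the last loop exit (the learned trip count).
    """

    def __init__(self):
        self.trip_count = None  # taken outcomes per loop, unknown at first
        self.current = 0        # taken outcomes seen since the last exit

    def predict(self):
        if self.trip_count is None:
            return True  # no information yet: assume the loop continues
        return self.current < self.trip_count

    def update(self, taken):
        if taken:
            self.current += 1
        else:
            # Loop exit: remember how many times the branch was taken.
            self.trip_count = self.current
            self.current = 0


def count_misses(predictor, outcomes):
    """Feed outcomes to the predictor and count wrong predictions."""
    misses = 0
    for taken in outcomes:
        if predictor.predict() != taken:
            misses += 1
        predictor.update(taken)
    return misses
```

With a loop whose closing branch is taken four times and then falls through, this sketch mispredicts only the very first exit; every later exit is predicted from the learned count.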
(EV8, cancelled late in design) had a minimum branch misprediction penalty of 14 cycles. It was to use a complex but fast next-line predictor overridden by a combined bimodal and majority-voting predictor. The majority vote was between the bimodal and two gskew predictors.
Both CPUs evaluate branches in the decode stage and have a single cycle instruction fetch. As a result, the branch target recurrence is two cycles long, and the machine always fetches the instruction immediately after any taken branch. Both architectures define
This scheme is better than the saturating counter scheme only for large table sizes, and it is rarely as good as local prediction. The history buffer must be longer in order to make a good prediction. The size of the pattern history table grows
Using a random or pseudorandom bit (a pure guess) would guarantee every branch a 50% correct prediction rate, which cannot be improved (or worsened) by reordering instructions. (With the simplest static prediction of "assume take",
Static prediction is the simplest branch prediction technique because it does not rely on information about the dynamic history of code executing. Instead, it predicts the outcome of a branch based solely on the branch instruction.
each branch. The branch history register is used for choosing which of the four saturating counters to use. If the history is 00, then the first counter is used; if the history is 11, then the last of the four counters is used.
instruction that can preload the branch predictor entry for a given instruction with a branch target address constructed by adding the contents of a general-purpose register to an immediate displacement value.
The two-level branch predictor, also referred to as the correlation-based branch predictor, uses a two-dimensional table of counters, also called the "pattern history table". The table entries are two-bit counters.
If it is later detected that the guess was wrong, then the speculatively executed or partially executed instructions are discarded and the pipeline starts over with the correct branch, incurring a delay.
is equal to the number of stages in the pipeline from the fetch stage to the execute stage. Modern microprocessors tend to have quite long pipelines so that the misprediction delay is between 10 and 20
A global branch predictor does not keep a separate history record for each conditional jump. Instead it keeps a shared history of all conditional jumps. The advantage of a shared history is that any
processors, do only trivial "not-taken" branch prediction. Because they use branch delay slots, fetch just one instruction per cycle, and execute in-order, there is no performance loss. The later
1967:" – describes the Alpha EV8 branch predictor. This paper does an excellent job discussing how they arrived at their design from various hardware constraints and simulation studies.
conditional jump has to deviate twice from what it has done most in the past before the prediction changes. For example, a loop-closing conditional jump is mispredicted once rather than twice.
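The difference between one-bit and two-bit counters on a loop-closing branch can be checked with a small simulation. This is a minimal sketch: the class `SaturatingCounter`, the helper `count_mispredictions`, and the choice to start each counter in its strongest "taken" state are assumptions made for the example.

```python
class SaturatingCounter:
    """n-bit saturating counter; predicts taken in the upper half of its range."""

    def __init__(self, bits=2):
        self.max_state = (1 << bits) - 1
        self.threshold = 1 << (bits - 1)
        self.state = self.max_state  # assumed start: strongest "taken" state

    def predict(self):
        return self.state >= self.threshold

    def update(self, taken):
        # Move one step toward "taken" or "not taken", saturating at the ends.
        if taken:
            self.state = min(self.max_state, self.state + 1)
        else:
            self.state = max(0, self.state - 1)


def count_mispredictions(counter, outcomes):
    misses = 0
    for taken in outcomes:
        if counter.predict() != taken:
            misses += 1
        counter.update(taken)
    return misses


# A loop-closing branch: taken 9 times, then not taken once; the loop runs 3 times.
loop_branch = ([True] * 9 + [False]) * 3

one_bit_misses = count_mispredictions(SaturatingCounter(bits=1), loop_branch)
two_bit_misses = count_mispredictions(SaturatingCounter(bits=2), loop_branch)
```

In this run the two-bit counter misses only at each loop exit, while the one-bit counter also misses the first iteration of every re-entered loop.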
Most of the state-of-the-art branch predictors use a perceptron predictor (see Intel's "Championship Branch Prediction Competition"). Intel already implements this idea in one of the
Static prediction is used as a fall-back technique in some processors with dynamic branch prediction when dynamic predictors do not have sufficient information to use. Both the Motorola
Two-bit predictors were introduced by Tom McWilliams and Curt Widdoes in 1977 for the Lawrence Livermore National Lab S-1 supercomputer and independently by Jim Smith in 1979 at CDC.
parity rather than ECC on instructions in the L2 cache. The parity design is sufficient, since any instruction suffering a parity error can be invalidated and refetched from memory.
1520:, Yeh, Tse-Yu & Sharangpani, H. P., "A method and apparatus for branch prediction using a second level branch prediction table", published 2000-03-16
taken", indicating that after 01 comes a zero. The same is the case with entry number 10, while entry number 11 is never used because there are never two consecutive ones.
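The 001001001... example can be reproduced with a short simulation. This is a sketch under stated assumptions: the class `TwoLevelPredictor`, the helper `run`, the initial counter value (weakly not taken), and the convention that the most recent outcome is the low bit of the history register are all choices made for the illustration.

```python
class TwoLevelPredictor:
    """Two-level adaptive predictor for a single branch (local history)."""

    def __init__(self, history_bits=2):
        self.history_bits = history_bits
        self.history = 0  # last n outcomes, most recent in the low bit
        # One 2-bit counter per history pattern: 0 = strongly not taken,
        # 3 = strongly taken. All start at 1 (weakly not taken).
        self.counters = [1] * (1 << history_bits)

    def predict(self):
        return self.counters[self.history] >= 2

    def update(self, taken):
        c = self.counters[self.history]
        self.counters[self.history] = min(3, c + 1) if taken else max(0, c - 1)
        mask = (1 << self.history_bits) - 1
        self.history = ((self.history << 1) | int(taken)) & mask


def run(predictor, outcomes):
    """Return a list with True wherever the prediction was wrong."""
    missed = []
    for taken in outcomes:
        missed.append(predictor.predict() != taken)
        predictor.update(taken)
    return missed


predictor = TwoLevelPredictor(history_bits=2)
missed = run(predictor, [False, False, True] * 20)  # 001001001...
```

After warm-up, `predictor.counters` holds 3 for history 00, 0 for histories 01 and 10, and the untouched initial value for history 11, and no further mispredictions occur.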
prediction on the history. The branch predictor may, for example, recognize that the conditional jump is taken more often than not, or that it is taken every second time.
). One year later he developed the perceptron branch predictor. Research on neural branch predictors was developed much further by Daniel Jimenez. In 2001, the first
The advantage of the two-level adaptive predictor is that it can quickly learn to predict an arbitrary repetitive pattern. This method was invented by T.-Y. Yeh and
uses the same trivial "not-taken" branch prediction, and loses two cycles to each taken branch because the branch resolution recurrence is four cycles long.
Figure 3: Two-level adaptive branch predictor. Every entry in the pattern history table represents a 2-bit saturating counter of the type shown in figure 2.
(EV6) uses a next-line predictor overridden by a combined local predictor and global predictor, where the combining choice is made by a bimodal predictor.
Since the initial publication in 1991, this method has become very popular. Variants of this prediction method are used in most modern microprocessors.
) will be discarded. Once again, assuming a uniform distribution of branch instruction placements, 0.5, 1.5, and 3.5 instructions fetched are discarded.
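The figures 0.5, 1.5, and 3.5 follow from averaging over the possible positions inside an aligned group. A quick check, assuming (as the text does) that every position is equally likely; the function name `expected_discarded` is introduced only for this example:

```python
from fractions import Fraction


def expected_discarded(group_size):
    """Mean number of discarded instructions in an aligned fetch group.

    If the relevant instruction sits at position p (0-based) with equal
    probability, p instructions are wasted on one side of it, so the mean
    is (0 + 1 + ... + (group_size - 1)) / group_size = (group_size - 1) / 2.
    """
    return Fraction(sum(range(group_size)), group_size)


waste = {n: expected_discarded(n) for n in (2, 4, 8)}
```

The same average applies at the branch line (instructions after the branch) and at the target line (instructions before the target), because the two cases are mirror images of each other.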
Some processors allow branch prediction hints to be inserted into the code to tell whether the static prediction should be taken or not taken. The Intel
predictor was presented that was feasible to implement in hardware. The first commercial implementation of a perceptron branch predictor was in AMD's
The general rule for a two-level adaptive predictor with an n-bit history is that it can predict any repetitive sequence with any period if all n-bit
The discarded instructions at the branch and destination lines add up to nearly a complete fetch cycle, even for a single-cycle next-line predictor.
contentions in the pattern history table where two branches with opposite prediction happen to share the same entry in the pattern history table.
) records the last outcome of the branch. This is the simplest possible dynamic branch predictor, although it is not very accurate.
two-cycle secondary branch predictors that can override the prediction of the next-line predictor at the cost of a single lost fetch cycle.
Dynamic branch prediction uses information about taken or not taken branches gathered at run-time to predict the outcome of a branch.
have local branch predictors with a local 4-bit history and a local pattern history table with 16 entries for each conditional jump.
Since the branch itself will generally not be the last instruction in an aligned group, instructions after the taken branch (or its
A two-level adaptive predictor with globally shared history buffer and pattern history table is called a "gshare" predictor if it
611:. Many microprocessors have a separate prediction mechanism for return instructions. This mechanism is based on a so-called
Digest of Papers
Compcon Spring '90. Thirty-Fifth IEEE Computer Society International Conference on Intellectual Leverage
1672:. The 36th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-36). San Diego, USA. pp. 243–252.
with the size of the history buffer. Hence, the big pattern history table must be shared among all conditional jumps.
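A shared pattern history table indexed by global history can be sketched as below. The sketch assumes the indexing commonly described for gshare, in which the global history is XORed with low-order bits of the branch address; the class name `GsharePredictor`, the table size, and the initial counter values are illustrative assumptions.

```python
class GsharePredictor:
    """Global-history predictor with one pattern history table shared by all branches."""

    def __init__(self, index_bits=8):
        self.mask = (1 << index_bits) - 1
        self.history = 0                      # shared history of all branches
        self.table = [1] * (1 << index_bits)  # 2-bit counters, weakly not taken

    def _index(self, pc):
        # Assumed gshare-style hash: XOR the history with the branch address.
        return (pc ^ self.history) & self.mask

    def predict(self, pc):
        return self.table[self._index(pc)] >= 2

    def update(self, pc, taken):
        i = self._index(pc)
        self.table[i] = min(3, self.table[i] + 1) if taken else max(0, self.table[i] - 1)
        self.history = ((self.history << 1) | int(taken)) & self.mask


gshare = GsharePredictor(index_bits=8)
gshare_missed = []
for step in range(64):
    outcome = (step % 2 == 0)  # a branch that alternates taken / not taken
    gshare_missed.append(gshare.predict(0x4A0) != outcome)
    gshare.update(0x4A0, outcome)
```

Once the history register has filled with the alternating pattern, the two recurring history values select two different counters, and the alternating branch is predicted correctly.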
Brekelbaum, Edward; Rupley, Jeff; Wilkerson, Chris; Black, Bryan (December 2002). "Hierarchical scheduling windows".
Processors without this mechanism will simply predict an indirect jump to go to the same target as it did last time.
'89 benchmarks, very large bimodal predictors saturate at 93.5% correct, once every branch maps to a unique counter.
Consider the example of n = 2. This means that the last two occurrences of the branch are stored in a two-bit
can reorder instructions to get better than 50% correct prediction.) Also, it would make timing nondeterministic.
jump is most likely to be taken or not taken. The branch that is guessed to be the most likely is then fetched and
1336:"A Taxonomy of Branch Mispredictions, and Alloyed Prediction as a Robust Solution to Wrong-History Mispredictions"
been calculated and the conditional jump has passed the execution stage in the instruction pipeline (see fig. 1).
Branch prediction became more important with the introduction of pipelined superscalar processors like the Intel
) will go before this is known definitively. The purpose of the branch predictor is to improve the flow in the
1953:" – demonstrates prediction accuracy is not impaired by indexing with previous branch address.
, which is a local mirror of the call stack. The size of the return stack buffer is typically 4–16 entries.
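A return stack buffer can be modelled as a small bounded stack. The following is a sketch only: the class name `ReturnStackBuffer`, its method names, and the policy of dropping the oldest entry on overflow are assumptions for the example, and real designs differ in how they handle overflow and misspeculation.

```python
class ReturnStackBuffer:
    """Small hardware-style stack that mirrors the most recent call sites."""

    def __init__(self, capacity=16):
        self.capacity = capacity
        self.entries = []

    def on_call(self, return_address):
        # Assumed overflow policy: discard the oldest entry to make room.
        if len(self.entries) == self.capacity:
            self.entries.pop(0)
        self.entries.append(return_address)

    def predict_return(self):
        # Predicted target of the next return, or None if the buffer is empty.
        return self.entries.pop() if self.entries else None


rsb = ReturnStackBuffer(capacity=2)
for address in (0x100, 0x200, 0x300):  # three nested calls, capacity of two
    rsb.on_call(address)
predictions = [rsb.predict_return() for _ in range(3)]
```

With a capacity of two and three nested calls, the two innermost returns are predicted and the outermost one is lost, which is why deeper call chains than the buffer size cause return mispredictions.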
bits, so that the processor can fetch a prediction for every instruction before the instruction is decoded.
1065:. 2006 IEEE International Symposium on Performance Analysis of Systems and Software. IEEE. pp. 48–58.
Figure 1: Example of 4-stage pipeline. The colored boxes represent instructions independent of each other.
(EV8)) fetch each line of instructions with a pointer to the next line. This next-line predictor handles
Proceedings of the 2000 International
Conference on Parallel Architectures and Compilation Techniques
In static prediction, all decisions are made at compile time, before the execution of the program.
Proceedings of the 7th
International Symposium on High Performance Computer Architecture (HPCA-7)
As a result, making a pipeline longer increases the need for a more advanced branch predictor.
1981:" – describes the EV6 and K8 branch predictors, and pipelining considerations.
, announced in 1989, is both microprogrammed and pipelined, and performs branch prediction.
accepts branch prediction hints, but this feature was abandoned in later Intel processors.
Egan, Colin; Steven, Gordon; Quick, P.; Anguera, R.; Vintan, Lucian (December 2003).
On the SPEC'89 benchmarks, such a predictor is about as good as the local predictor.
1370:"The Agree Predictor: A Mechanism for Reducing Negative Branch History Interference"
An alloyed branch predictor combines the local and global prediction principles by
1399:"Cortex-A15 MPCore Technical Reference Manual, section 6.5.3 "Indirect predictor""
processor uses a saturating counter, though with an imperfect implementation.
1916:"Meltdown and Spectre: 'worst ever' CPU bugs affect virtually all computers"
series. These processors all rely on one-bit or simple bimodal predictors.
1788:"AMD's Zen CPU is now called Ryzen, and it might actually challenge Intel"
Proceedings 29th Annual
International Symposium on Computer Architecture
Proceedings of the 24th
International Symposium on Computer Architecture
Proceedings International Journal Conference on Neural Networks (IJCNN)
'89 benchmarks, very large local predictors saturate at 97.1% correct.
A two-level branch predictor where the second level is replaced with a
Murray, J.E.; Salett, R.M.; Hetherington, R.C.; McKeen, F.X. (1990).
backward-pointing branches, and are taken more often than not taken.
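The backward-taken, forward-not-taken rule needs nothing but the two addresses. A minimal sketch, with the function name `predict_static_btfnt` and the example addresses chosen purely for illustration:

```python
def predict_static_btfnt(branch_address: int, target_address: int) -> bool:
    """Static prediction: backward branches taken, forward branches not taken.

    A branch is backward when its target address is lower than its own address.
    """
    return target_address < branch_address


# A loop-closing branch at 0x1040 jumping back to 0x1000 is predicted taken.
loop_prediction = predict_static_btfnt(0x1040, 0x1000)
# A forward branch at 0x1040 skipping ahead to 0x1080 is predicted not taken.
skip_prediction = predict_static_btfnt(0x1040, 0x1080)
```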
Proceedings of the 35th International Symposium on Microarchitecture
local and global branch histories, possibly with some bits from the
Malishevsky, Alexey; Beck, Douglas; Schmid, Andreas; Landry, Eric.
1889:"Design Tradeoffs for the Alpha EV8 Conditional Branch Predictor"
1744:"AMD Ryzen reviews, news, performance, pricing, and availability"
Michaud, Pierre; Seznec, André; Uhlig, Richard (September 1996).
branches in between. It may use a two-level adaptive predictor.
Design Tradeoffs for the Alpha EV8 Conditional Branch Predictor
Modern processor design: fundamentals of superscalar processors
1197:"CMSC 611: Advanced Computer Architecture, Chapter 4 (Part V)"
processor include a perceptron-based neural branch predictor.
1764:"AMD Takes Computing to a New Horizon with Ryzen™ Processors"
1282:. Albuquerque, New Mexico, Puerto Rico: ACM. pp. 51–61.
1222:. Digital Western Research Lab (WRL) Technical Report, TN-36.
1009:"18-447 Computer Architecture Lecture 11: Branch Prediction"
1812:"IBM Stretch (7030) -- Aggressive Uniprocessor Parallelism"
959:"The Schemes and Performances of Dynamic Branch predictors"
is an indirect jump that reads its target address from the
Branch predictors play a critical role in achieving high
Skadron, K.; Martonosi, M.; Clark, D. W. (October 2000).
1278:(1991). "Two-Level Adaptive Training Branch Prediction".
1176:"The Pentium 4 and the G4e: an Architectural Comparison"
2002:"Branch and Loop Reorganization to Prevent Mispredicts"
proposed combined branch prediction in his 1993 paper.
This branch history register can have four different
Romanian Journal of Information Science and Technology
and other researchers. Affecting virtually all modern
the global history and branch PC, and "gselect" if it
2026:"What is Branch Prediction? – Stack Overflow Example"
will normally return to where it is called from. The
1538:"Towards a High Performance Neural Branch Predictor"
The predictor table is indexed with the instruction
1305:"Two-Level Branch Prediction using Neural Networks"
1234:"New Algorithm Improves Branch Prediction: 3/27/95"
1152:"The microarchitecture of Intel, AMD, and VIA CPUs"
branch prediction", was proposed by Lucian Vintan (
Figure 2: State diagram of 2-bit saturating counter
1987:"The microarchitecture of Intel, AMD and VIA CPUs"
1887:Seznec, A.; Felix, S.; Krishnan, V.; Sazeides, Y.
1641:"The AMD Trinity Review (A10-4600M): A New Hope"
1460:"Performance Analysis for Core 2 and K8: Part 1"
977:"Branch Prediction Techniques and Optimizations"
gshare/perceptron overriding hybrid predictors.
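The linear resource growth comes from keeping one weight per history bit instead of one counter per history pattern. The sketch below follows the perceptron scheme as it is commonly described in the literature (a dot product of weights with a +1/-1 history vector, trained on a misprediction or a low-confidence output); the class name `PerceptronPredictor`, the table size, the threshold formula, and the omission of weight saturation are assumptions made for the example rather than details of any specific processor.

```python
class PerceptronPredictor:
    """Perceptron branch predictor sketch: one weight vector per table entry."""

    def __init__(self, history_length=8, table_size=64):
        # Training threshold; this formula is an assumption taken from the
        # usual textbook presentation, not from the article text.
        self.theta = int(1.93 * history_length + 14)
        # weights[i][0] is the bias; weights[i][1:] pair with the history bits.
        self.weights = [[0] * (history_length + 1) for _ in range(table_size)]
        self.history = [-1] * history_length  # +1 = taken, -1 = not taken

    def _output(self, pc):
        w = self.weights[pc % len(self.weights)]
        return w[0] + sum(wi * xi for wi, xi in zip(w[1:], self.history))

    def predict(self, pc):
        return self._output(pc) >= 0

    def update(self, pc, taken):
        y = self._output(pc)
        t = 1 if taken else -1
        # Train on a misprediction or when the output is not yet confident.
        if (y >= 0) != taken or abs(y) <= self.theta:
            w = self.weights[pc % len(self.weights)]
            w[0] += t
            for i, xi in enumerate(self.history, start=1):
                w[i] += t * xi  # hardware would clip the weights; omitted here
        self.history = self.history[1:] + [t]


perceptron = PerceptronPredictor(history_length=8)
perceptron_missed = []
for step in range(200):
    outcome = (step % 2 == 0)  # alternating taken / not taken
    perceptron_missed.append(perceptron.predict(0x4A0) != outcome)
    perceptron.update(0x4A0, outcome)
```

Storage here grows with `history_length` (one weight per history bit), whereas a pattern history table for the same history would need 2^n counters.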
in order to utilize these fetched instructions.
Two-way branching is usually implemented with a
1063:Characterizing the branch misprediction penalty
1061:Eyerman, S.; Smith, J.E.; Eeckhout, L. (2006).
1574:"Towards a Powerful Dynamic Branch Predictor"
1485:. May 2022. pp. 7-42–7-45. SA22-7832-14.
1611:"Dynamic Branch Prediction with Perceptrons"
and possibly two or more branch predictors.
1620:. Monterrey, NL, Mexico. pp. 197–296.
1587:(3). Bucharest: Romanian Academy: 287–301.
1445:"A Look at Centrino's Core: The Pentium M"
1418:Driesen, Karel; Hölzle, Urs (1997-06-25).
The first commercial RISC processors, the
them. Global branch prediction is used in
A 1-bit saturating counter (essentially a
1364:Sprangle, E.; Chappell, R.S.; Alsup, M.;
1666:Fast Path-Based Neural Branch Prediction
and others were granted on this scheme.
as well as branch direction prediction.
1972:Reconsidering Complex Branch Predictors
Single thread indirect branch predictor
used for data and instruction caching.
processor may be using this technique.
1944:Multiple-Block Ahead Branch Predictors
1497:"IBM zEnterprise BC12 Technical Guide"
1475:z/Architecture Principles of Operation
1420:"Limits of Indirect Branch Prediction"
1102:Shen, John P.; Lipasti, Mikko (2005).
897:Indirect branch restricted speculation
fetched are discarded, respectively.
Branch prediction is not the same as
The time that is wasted in case of a
1841:"Micro-architecture of the VAX 9000"
1663:Jimenez, Daniel A. (December 2003).
use this technique as a fall-back.
32:Predication (computer architecture)
1345:. Philadelphia. pp. 199–206.
891:Indirect branch prediction barrier
863:Branch prediction analysis attacks
predictors, and are analogous to
as well. Tests indicate that the
1706:"Championship Branch Prediction"
1609:Jimenez, D. A.; Lin, C. (2001).
668:Lucian Blaga University of Sibiu
that tries to guess which way a
1309:Journal of Systems Architecture
1256:from the original on 2015-03-10
1215:McFarling, Scott (June 1993).
Prediction of function returns
processors from IBM support a
1639:Walton, Jarred (2012-05-15).
1321:10.1016/S1383-7621(03)00095-X
1217:"Combining Branch Predictors"
(two of the first commercial
The early implementations of
2000:Andrews, Jeff (2007-10-30).
1914:Gibbs, Samuel (2018-01-04).
1506:. February 2014. p. 78.
1458:Kanter, Aaron (2008-10-28).
1112:McGraw-Hill Higher Education
was made public by Google's
Piledriver microarchitecture
for branch prediction using
Overriding branch prediction
Two-level adaptive predictor
1071:10.1109/ispass.2006.1620789
930:"Dynamic Branch Prediction"
One-level branch prediction
1742:James, Dave (2017-12-06).
1729:10.1109/MICRO.2002.1176236
1678:10.1109/MICRO.2003.1253199
1572:Vintan, Lucian N. (2000).
1536:Vintan, Lucian N. (1999).
1443:Stokes, Jon (2004-02-25).
1007:Mutlu, Onur (2013-02-11).
fast-path neural predictor
Two-level neural predictor
1901:10.1109/ISCA.2002.1003587
1853:10.1109/CMPCON.1990.63652
1150:Fog, Agner (2016-12-01).
585:BRANCH PREDICTION PRELOAD
566:Indirect branch predictor
543:skewed associative caches
490:Alloyed branch prediction
466:processors, and in Intel
186:Dynamic branch prediction
1626:10.1109/HPCA.2001.903263
1351:10.1109/PACT.2000.888344
1035:Skewed branch predictors
647:Neural branch prediction
434:Global branch prediction
229:branch target prediction
194:Random branch prediction
128:Static branch prediction
117:branch target prediction
30:Not to be confused with
1956:Seznec et al. (2002). "
1942:Seznec et al. (1996). "
1481:(Fourteenth ed.).
885:Indirect branch control
870:public-key cryptography
853:Branch target predictor
827:In 2018 a catastrophic
707:multi-core processor's
660:multi-layer perceptrons
400:Local branch prediction
2006:Intel Software Network
829:security vulnerability
697:'s simulators (2003).
382:University of Michigan
342:
302:The original, non-MMX
281:
213:superscalar processors
94:speculatively executed
80:
55:if–then–else structure
1385:10.1145/264107.264210
1289:10.1145/123465.123475
1249:(4). March 27, 1995.
1242:Microprocessor Report
641:branch target buffers
628:the first predictor.
39:computer architecture
1723:. Istanbul, Turkey.
207:Next line prediction
101:branch misprediction
59:instruction pipeline
1985:Fog, Agner (2009).
1826:"S-1 Supercomputer"
957:Cheng, Chih-Cheng.
751:US patent 4,435,756
613:return stack buffer
405:conditional jumps.
396:has been proposed.
324:Two-level predictor
1977:2007-12-27 at the
1963:2008-07-20 at the
1949:2008-07-20 at the
1847:. pp. 44–53.
1315:(12–15): 557–570.
1195:Plusquellic, Jim.
858:Branch predication
605:return instruction
343:
285:Strongly not taken
282:
273:with four states:
267:saturating counter
254:Saturating counter
155:branch delay slots
81:
1970:Jimenez (2003). "
1766:(Press release).
880:Cache prefetching
18:Branch predictors
16:(Redirected from
2024:Yee, Alexander.
2008:. Archived from
1680:. Archived from
1547:. Archived from
1425:. Archived from
1014:. Archived from
982:. Archived from
932:. Archived from
925:
875:Instruction unit
775:and the earlier
753:
728:IBM 7030 Stretch
652:Machine learning
586:
557:that controls a
555:conditional jump
519:Hybrid predictor
348:
288:Weakly not taken
85:conditional jump
43:branch predictor
1979:Wayback Machine
1965:Wayback Machine
1951:Wayback Machine
1792:Ars Technica UK
1447:. pp. 2–3.
746:Burroughs B4900
724:
709:Infinity Fabric
528:Scott McFarling
521:
512:
510:Agree predictor
500:program counter
373:are different.
72:architectures.
65:in many modern
47:digital circuit
27:Digital circuit
1937:External links
1518:WO 2000/014628
1432:on 2016-05-06.
1184:. 12 May 2001.
1021:on 2015-03-25.
999:
975:Parihar, Raj.
581:z/Architecture
549:Loop predictor
394:neural network
389:
386:
355:shift register
294:Strongly taken
175:and the Intel
129:
126:
124:
123:Implementation
121:
70:microprocessor
2012:on 2018-11-11
1862:0-8186-2028-5
1687:on 2016-03-31
1554:on 2019-07-13
1368:(June 1997).
1125:0-07-057064-7
1080:1-4244-0186-0
989:on 2017-05-16
936:on 2019-07-17
637:Intel Core i7
572:indirect jump
496:concatenating
449:exponentially
371:sub-sequences
304:Intel Pentium
271:state machine
173:MPC7450 (G4e)
2014:. Retrieved
2010:the original
2005:
1990:. Retrieved
1923:. Retrieved
1920:the Guardian
1795:. Retrieved
1791:
1782:
1771:. Retrieved
1689:. Retrieved
1682:the original
1556:. Retrieved
1549:the original
1462:. p. 5.
1453:
1438:
1427:the original
1413:
1404:ARM Holdings
1274:Yeh, T.-Y.;
1269:
1258:. Retrieved
1246:
1240:
1228:
1190:
1181:Ars Technica
1179:
1170:
1159:. Retrieved
1157:. p. 36
1016:the original
1002:
991:. Retrieved
984:the original
970:
938:. Retrieved
934:the original
837:Project Zero
524:predictors.
522:
513:
493:
486:processors.
460:concatenates
291:Weakly taken
106:clock cycles
1276:Patt, Y. N.
1114:. pp.
822:Alpha 21464
807:Alpha 21264
794:, the MIPS
792:Alpha 21064
440:correlation
421:Pentium III
413:Pentium MMX
225:Alpha 21464
221:Alpha 21264
63:performance
2016:2018-11-10
1992:2009-10-01
1925:2018-05-18
1797:2016-12-14
1773:2016-12-14
1691:2018-04-08
1558:2010-12-02
1379:. Denver.
1366:Patt, Y.N.
1260:2016-02-02
1161:2017-03-22
1110:. Boston:
1042:(report).
993:2017-04-02
940:2017-03-22
910:References
798:, and the
672:perceptron
662:, called "
609:call stack
579:and later
480:Silvermont
417:Pentium II
240:delay slot
53:(e.g., an
1646:AnandTech
1593:1453-8245
800:IBM POWER
734:in 1985.
686:2003 the
625:trade-off
468:Pentium M
378:Yale Patt
260:flip-flop
201:compilers
177:Pentium 4
166:Pentium 4
149:address.
67:pipelined
1975:Archived
1961:Archived
1947:Archived
1871:24999559
1749:PCGamesN
1251:Archived
847:See also
805:The DEC
759:VAX 9000
757:The DEC
732:IBM 3090
711:and the
639:has two
601:function
504:VIA Nano
265:A 2-bit
1048:3712157
905:(STIBP)
833:Spectre
831:called
788:Pentium
722:History
713:Samsung
482:-based
426:On the
380:at the
318:address
309:On the
899:(IBRS)
893:(IBPB)
814:AMD K8
790:, DEC
716:Exynos
664:neural
478:, and
476:Core 2
419:, and
359:binary
345:If an
223:, and
215:(MIPS
51:branch
1867:S2CID
1685:(PDF)
1670:(PDF)
1614:(PDF)
1577:(PDF)
1552:(PDF)
1541:(PDF)
1500:(PDF)
1479:(PDF)
1430:(PDF)
1423:(PDF)
1373:(PDF)
1339:(PDF)
1254:(PDF)
1237:(PDF)
1220:(PDF)
1155:(PDF)
1089:72217
1085:S2CID
1044:S2CID
1019:(PDF)
1012:(PDF)
987:(PDF)
980:(PDF)
962:(PDF)
887:(IBC)
865:– on
796:R8000
781:R4000
777:SPARC
773:R3000
769:R2000
705:Ryzen
695:IA-64
577:zEC12
539:gskew
410:Intel
269:is a
217:R8000
211:Some
138:SPARC
45:is a
1857:ISBN
1589:ISSN
1120:ISBN
1075:ISBN
841:CPUs
820:The
812:The
771:and
766:MIPS
744:The
726:The
700:The
658:and
635:The
623:The
575:The
559:loop
484:Atom
472:Core
456:xors
428:SPEC
408:The
311:SPEC
146:RISC
142:MIPS
140:and
41:, a
1897:doi
1849:doi
1768:AMD
1725:doi
1674:doi
1622:doi
1504:IBM
1483:IBM
1381:doi
1347:doi
1317:doi
1284:doi
1116:455
1067:doi
1040:HAL
867:RSA
702:AMD
656:LVQ
570:An
464:AMD
37:In
2004:.
1918:.
1895:.
1891:.
1879:^
1865:.
1855:.
1843:.
1790:.
1746:.
1655:^
1643:.
1616:.
1601:^
1583:.
1579:.
1543:.
1502:.
1401:.
1375:.
1341:.
1313:49
1311:.
1307:.
1245:.
1239:.
1205:^
1178:.
1134:^
1118:.
1083:.
1073:.
1038:.
949:^
918:^
678:.
599:A
553:A
474:,
470:,
415:,
347:if
219:,
1585:3
1561:.
1407:.
1387:.
1383::
1353:.
1349::
1323:.
1319::
1292:.
1286::
1263:.
1247:9
1199:.
1164:.
1128:.
1091:.
1069::
1050:.
996:.
964:.
943:.
34:.
20:)