
Branch predictor


Stretch, designed in the late 1950s, pre-executes all unconditional branches and any conditional branches that depend on the index registers. For other conditional branches, the first two production models implemented predict untaken; subsequent models were changed to implement predictions based on the current values of the indicator bits (corresponding to today's condition codes). The Stretch designers had considered static hint bits in the branch instructions early in the project but decided against them. Misprediction recovery was provided by the lookahead unit on Stretch, and part of Stretch's reputation for less-than-stellar performance was blamed on the time required for misprediction recovery. Subsequent IBM large computer designs did not use branch prediction with speculative execution until the IBM 3090.
The B4900, a microprogrammed COBOL machine released around 1982, was pipelined and used branch prediction. The B4900 branch prediction history state is stored back into the in-memory instructions during program execution. The B4900 implements 4-state branch prediction by using 4 semantically equivalent branch opcodes to represent each branch operator type. The opcode used indicates the history of that particular branch instruction. If the hardware determines that the branch prediction state of a particular branch needs to be updated, it rewrites the opcode with the semantically equivalent opcode that hints the proper history. This scheme obtains a 93% hit rate.
The vulnerability involves priming the branch predictors so another process (or the kernel) will mispredict a branch and use secret data as an array index, evicting one of the attacker's cache lines. The attacker can time access to their own array to find out which one, turning this CPU internal (microarchitectural) state into a value the attacker can save which has information about values they could not read directly.
Branch prediction attempts to guess whether a conditional jump will be taken or not. Branch target prediction attempts to guess the target of a taken conditional or unconditional jump before it is computed by decoding and executing the instruction itself. Branch prediction and branch target prediction are often combined into the same circuitry.
will suffer for those branches. Once you have multiple predictors, it is beneficial to arrange that each predictor will have different aliasing patterns, so that it is more likely that at least one predictor will have no aliasing. Combined predictors with different indexing functions for the different predictors are called
The main disadvantage of the perceptron predictor is its high latency. Even after taking advantage of high-speed arithmetic tricks, the computation latency is relatively high compared to the clock period of many modern microarchitectures. In order to reduce the prediction latency, Jimenez proposed in
Predictors like gshare use multiple table entries to track the behavior of any particular branch. This multiplication of entries makes it much more likely that two branches will map to the same table entry (a situation called aliasing), which in turn makes it much more likely that prediction accuracy
A hybrid predictor, also called combined predictor, implements more than one prediction mechanism. The final prediction is based either on a meta-predictor that remembers which of the predictors has made the best predictions in the past, or a majority vote function based on an odd number of different
statement is executed three times, the decision made on the third execution might depend upon whether the previous two were taken or not. In such scenarios, a two-level adaptive predictor works more efficiently than a saturating counter. Conditional jumps that are taken every second time or have some
architectures) used single-direction static branch prediction: they always predict that a conditional jump will not be taken, so they always fetch the next sequential instruction. Only when the branch or jump is evaluated and found to be taken does the instruction pointer get set to a non-sequential
is best predicted with a special loop predictor. A conditional jump in the bottom of a loop that repeats N times will be taken N-1 times and then not taken once. If the conditional jump is placed at the top of the loop, it will be not taken N-1 times and then taken once. A conditional jump that goes
has a combined bimodal and global predictor, where the combining choice is another bimodal predictor. This processor caches the base and choice bimodal predictor counters in bits of the L2 cache otherwise used for ECC. As a result, it has effectively very large base and choice predictor tables, and
between fast branch prediction and good branch prediction is sometimes dealt with by having two branch predictors. The first branch predictor is fast and simple. The second branch predictor, which is slower, more complicated, and with bigger tables, will override a possibly wrong prediction made by
The first time a conditional jump instruction is encountered, there is not much information to base a prediction on. But the branch predictor keeps records of whether branches are taken or not taken. When it encounters a conditional jump that has been seen several times before, then it can base the
Without branch prediction, the processor would have to wait until the conditional jump instruction has passed the execute stage before the next instruction can enter the fetch stage in the pipeline. The branch predictor attempts to avoid this waste of time by trying to guess whether the conditional
instruction can choose among more than two branches. Some processors have specialized indirect branch predictors. Newer processors from Intel and AMD can predict indirect branches by using a two-level adaptive predictor. This kind of instruction contributes more than one bit to the history buffer.
When a next-line predictor points to aligned groups of 2, 4, or 8 instructions, the branch target will usually not be the first instruction fetched, and so the initial instructions fetched are wasted. Assuming, for simplicity, a uniform distribution of branch targets, 0.5, 1.5, and 3.5 instructions
Microprogrammed processors, popular from the 1960s to the 1980s and beyond, took multiple cycles per instruction, and generally did not require branch prediction. However, in addition to the IBM 3090, there are several other examples of microprogrammed designs that incorporated branch prediction.
values, 00, 01, 10, and 11, where zero means "not taken" and one means "taken". A pattern history table contains four entries per branch, one for each of the 2^2 = 4 possible branch histories, and each entry in the table contains a two-bit saturating counter of the same type as in figure 2 for
instruction. A conditional jump can either be "taken" and jump to a different place in program memory, or it can be "not taken" and continue execution immediately after the conditional jump. It is not known for certain whether a conditional jump will be taken or not taken until the condition has
The Alpha 21264 and Alpha EV8 microprocessors used a fast single-cycle next-line predictor to handle the branch target recurrence and provide a simple and fast branch prediction. Because the next-line predictor is so inaccurate, and the branch resolution recurrence takes so long, both cores have
, where the perceptron predictor chooses its weights according to the current branch's path, rather than according to the branch's PC. Many other researchers developed this concept (A. Seznec, M. Monchiero, D. Tarjan & K. Skadron, V. Desmet, Akkary et al., K. Aasaraai, Michael Black, etc.).
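A basic (PC-indexed, not path-based) perceptron predictor of the kind discussed in this section can be sketched in Python as follows. This is only an illustrative sketch: the class name, the table size, the history length, and the use of unbounded integer weights are assumptions made for the example, and the training threshold follows the formula commonly attributed to Jimenez and Lin rather than anything stated in this article.

```python
class PerceptronPredictor:
    """Hypothetical sketch of a PC-indexed perceptron branch predictor."""

    def __init__(self, history_len=8, num_perceptrons=64):
        self.num_perceptrons = num_perceptrons
        # Training threshold; the formula is an assumption taken from the
        # published perceptron-predictor literature.
        self.theta = int(1.93 * history_len + 14)
        # One weight vector per table entry: a bias weight plus one weight
        # per history bit. Real hardware would use small saturating weights.
        self.weights = [[0] * (history_len + 1) for _ in range(num_perceptrons)]
        # Global history encoded as +1 (taken) / -1 (not taken).
        self.history = [-1] * history_len

    def _output(self, pc):
        w = self.weights[pc % self.num_perceptrons]
        return w[0] + sum(wi * xi for wi, xi in zip(w[1:], self.history))

    def predict(self, pc):
        # A non-negative dot product means "predict taken".
        return self._output(pc) >= 0

    def update(self, pc, taken):
        y = self._output(pc)
        t = 1 if taken else -1
        # Train on a misprediction or when the output is not yet confident.
        if (y >= 0) != taken or abs(y) <= self.theta:
            w = self.weights[pc % self.num_perceptrons]
            w[0] += t
            for i, xi in enumerate(self.history):
                w[i + 1] += t * xi
        # Shift the new outcome into the global history.
        self.history = self.history[1:] + [t]


def run_alternating(steps=300, pc=0x40):
    """Feed an alternating taken/not-taken branch; return per-step miss flags."""
    predictor = PerceptronPredictor()
    misses = []
    for i in range(steps):
        taken = (i % 2 == 0)
        misses.append(predictor.predict(pc) != taken)
        predictor.update(pc, taken)
    return misses
```

The storage here grows linearly with the history length (one weight per history bit), which is the property the text contrasts with the exponential growth of pattern-history tables.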
between different conditional jumps is part of making the predictions. The disadvantage is that the history is diluted by irrelevant information if the different conditional jumps are uncorrelated, and that the history buffer may not include any bits from the same branch if there are many other
When a branch is evaluated, the corresponding state machine is updated. Branches evaluated as not taken change the state toward strongly not taken, and branches evaluated as taken change the state toward strongly taken. The advantage of the two-bit counter scheme over a one-bit scheme is that a
A local branch predictor has a separate history buffer for each conditional jump instruction. It may use a two-level adaptive predictor. The history buffer is separate for each conditional jump instruction, while the pattern history table may be separate as well or it may be shared between all
Assume, for example, that a conditional jump is taken every third time. The branch sequence is 001001001... In this case, entry number 00 in the pattern history table will go to state "strongly taken", indicating that after two zeroes comes a one. Entry number 01 will go to state "strongly not
other regularly recurring pattern are not predicted well by the saturating counter. A two-level adaptive predictor remembers the history of the last n occurrences of the branch and uses one saturating counter for each of the possible 2^n history patterns. This method is illustrated in figure 3.
An agree predictor is a two-level adaptive predictor with globally shared history buffer and pattern history table, and an additional local saturating counter. The outputs of the local and the global predictors are XORed with each other to give the final prediction. The purpose is to reduce
The main advantage of the neural predictor is its ability to exploit long histories while requiring only linear resource growth. Classical predictors require exponential resource growth. Jimenez reports a global improvement of 5.7% over a McFarling-style hybrid predictor. He also used a
A more advanced form of static prediction presumes that backward branches will be taken and that forward branches will not. A backward branch is one that has a target address that is lower than its own address. This technique can help with prediction accuracy of loops, which are usually
many times one way and then the other way once is detected as having loop behavior. Such a conditional jump can be predicted easily with a simple counter. A loop predictor is part of a hybrid predictor where a meta-predictor detects whether the conditional jump has loop behavior.
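The "simple counter" idea for loop behavior can be illustrated with a short Python sketch. It assumes a loop-closing branch that is taken several times and then not taken once; the class name, the fields, and the policy of predicting "taken" before any trip count has been learned are all assumptions made for this example.

```python
class LoopPredictor:
    """Hypothetical sketch: learns the length of a run of taken outcomes."""

    def __init__(self):
        self.trip_count = None  # learned number of taken outcomes per run
        self.current = 0        # taken outcomes seen so far in this run

    def predict(self):
        if self.trip_count is None:
            return True  # nothing learned yet: assume the loop continues
        # Predict taken until the learned run length is reached.
        return self.current < self.trip_count

    def update(self, taken):
        if taken:
            self.current += 1
        else:
            # The run ended: remember its length and start a new run.
            self.trip_count = self.current
            self.current = 0


def count_loop_misses(iterations, runs):
    """Simulate a loop-closing branch taken iterations-1 times, then not taken."""
    predictor = LoopPredictor()
    misses = 0
    for _ in range(runs):
        for taken in [True] * (iterations - 1) + [False]:
            if predictor.predict() != taken:
                misses += 1
            predictor.update(taken)
    return misses
```

With a fixed trip count, this sketch mispredicts only the first loop exit; a two-bit saturating counter would mispredict every exit.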
(EV8, cancelled late in design) had a minimum branch misprediction penalty of 14 cycles. It was to use a complex but fast next-line predictor overridden by a combined bimodal and majority-voting predictor. The majority vote was between the bimodal and two gskew predictors.
Both CPUs evaluate branches in the decode stage and have a single cycle instruction fetch. As a result, the branch target recurrence is two cycles long, and the machine always fetches the instruction immediately after any taken branch. Both architectures define
This scheme is better than the saturating counter scheme only for large table sizes, and it is rarely as good as local prediction. The history buffer must be longer in order to make a good prediction. The size of the pattern history table grows
Using a random or pseudorandom bit (a pure guess) would guarantee every branch a 50% correct prediction rate, which cannot be improved (or worsened) by reordering instructions. (With the simplest static prediction of "assume take",
Static prediction is the simplest branch prediction technique because it does not rely on information about the dynamic history of code executing. Instead, it predicts the outcome of a branch based solely on the branch instruction.
each branch. The branch history register is used for choosing which of the four saturating counters to use. If the history is 00, then the first counter is used; if the history is 11, then the last of the four counters is used.
instruction that can preload the branch predictor entry for a given instruction with a branch target address constructed by adding the contents of a general-purpose register to an immediate displacement value.
The Two-Level Branch Predictor, also referred to as Correlation-Based Branch Predictor, uses a two-dimensional table of counters, also called "Pattern History Table". The table entries are two-bit counters.
If it is later detected that the guess was wrong, then the speculatively executed or partially executed instructions are discarded and the pipeline starts over with the correct branch, incurring a delay.
is equal to the number of stages in the pipeline from the fetch stage to the execute stage. Modern microprocessors tend to have quite long pipelines so that the misprediction delay is between 10 and 20
A global branch predictor does not keep a separate history record for each conditional jump. Instead it keeps a shared history of all conditional jumps. The advantage of a shared history is that any
processors, do only trivial "not-taken" branch prediction. Because they use branch delay slots, fetch just one instruction per cycle, and execute in-order, there is no performance loss. The later
1967:" – describes the Alpha EV8 branch predictor. This paper does an excellent job discussing how they arrived at their design from various hardware constraints and simulation studies. 299:
conditional jump has to deviate twice from what it has done most in the past before the prediction changes. For example, a loop-closing conditional jump is mispredicted once rather than twice.
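The difference between the one-bit and two-bit schemes on a loop-closing branch can be sketched in Python. The state encoding (0 = strongly not taken through 3 = strongly taken), the initial states, and all names below are assumptions chosen for the example rather than details of any particular processor.

```python
class OneBitPredictor:
    """Predicts whatever the branch did last time."""

    def __init__(self):
        self.last = False  # assumed initial state: not taken

    def predict(self):
        return self.last

    def update(self, taken):
        self.last = taken


class TwoBitCounter:
    """2-bit saturating counter: 0 = strongly not taken ... 3 = strongly taken."""

    def __init__(self, state=1):  # assumed initial state: weakly not taken
        self.state = state

    def predict(self):
        return self.state >= 2

    def update(self, taken):
        # Move one step toward the observed outcome, saturating at 0 and 3.
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)


def count_mispredictions(predictor, outcomes):
    misses = 0
    for taken in outcomes:
        if predictor.predict() != taken:
            misses += 1
        predictor.update(taken)
    return misses


# A loop-closing branch: taken 9 times, then not taken once; the loop runs 3 times.
loop_outcomes = ([True] * 9 + [False]) * 3
one_bit_misses = count_mispredictions(OneBitPredictor(), loop_outcomes)
two_bit_misses = count_mispredictions(TwoBitCounter(), loop_outcomes)
```

Under these assumptions the one-bit predictor mispredicts both the exit and the re-entry of every loop run, while the two-bit counter, once warmed up, mispredicts only the exit.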
Most state-of-the-art branch predictors use a perceptron predictor (see Intel's "Championship Branch Prediction Competition"). Intel already implements this idea in one of the
Static prediction is used as a fall-back technique in some processors with dynamic branch prediction when dynamic predictors do not have sufficient information to use. Both the Motorola
Two-bit predictors were introduced by Tom McWilliams and Curt Widdoes in 1977 for the Lawrence Livermore National Lab S-1 supercomputer and independently by Jim Smith in 1979 at CDC.
parity rather than ECC on instructions in the L2 cache. The parity design is sufficient, since any instruction suffering a parity error can be invalidated and refetched from memory.
3082: 1915: 2001: 1520:, Yeh, Tse-Yu & Sharangpani, H. P., "A method and apparatus for branch prediction using a second level branch prediction table", published 2000-03-16 366:
taken", indicating that after 01 comes a zero. The same is the case with entry number 10, while entry number 11 is never used because there are never two consecutive ones.
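The n = 2 example with the branch sequence 001001001... can be traced with a small Python simulation. This is a sketch under stated assumptions: the class name is hypothetical, the counters start in a "weakly not taken" state (1), and the history register starts at 00.

```python
class TwoLevelPredictor:
    """Hypothetical sketch of a two-level adaptive predictor for one branch."""

    def __init__(self, history_bits=2):
        self.history_bits = history_bits
        self.history = 0  # branch history register, newest outcome in bit 0
        # Pattern history table: one 2-bit counter per possible history.
        self.counters = [1] * (1 << history_bits)

    def predict(self):
        return self.counters[self.history] >= 2

    def update(self, taken):
        # Update the counter selected by the current history...
        c = self.counters[self.history]
        self.counters[self.history] = min(3, c + 1) if taken else max(0, c - 1)
        # ...then shift the outcome into the history register.
        mask = (1 << self.history_bits) - 1
        self.history = ((self.history << 1) | int(taken)) & mask


def simulate(pattern, repeats):
    predictor = TwoLevelPredictor()
    misses = []
    for taken in pattern * repeats:
        misses.append(predictor.predict() != taken)
        predictor.update(taken)
    return predictor, misses


# Branch taken every third time: 0, 0, 1, 0, 0, 1, ...
predictor, misses = simulate([False, False, True], 20)
```

After warm-up the counters settle as the text describes: entry 00 is strongly taken (3), entries 01 and 10 are strongly not taken (0), and entry 11 keeps its initial value because it is never selected.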
prediction on the history. The branch predictor may, for example, recognize that the conditional jump is taken more often than not, or that it is taken every second time.
976: 2054: 670:). One year later he developed the perceptron branch predictor. The neural branch predictor research was developed much further by Daniel Jimenez. In 2001, the first 376:
The advantage of the two-level adaptive predictor is that it can quickly learn to predict an arbitrary repetitive pattern. This method was invented by T.-Y. Yeh and
uses the same trivial "not-taken" branch prediction, and loses two cycles to each taken branch because the branch resolution recurrence is four cycles long.
Figure 3: Two-level adaptive branch predictor. Every entry in the pattern history table represents a 2-bit saturating counter of the type shown in figure 2.
3052: 2618: 2435: 809:(EV6) uses a next-line predictor overridden by a combined local predictor and global predictor, where the combining choice is made by a bimodal predictor. 427: 310: 384:. Since the initial publication in 1991, this method has become very popular. Variants of this prediction method are used in most modern microprocessors. 242:) will be discarded. Once again, assuming a uniform distribution of branch instruction placements, 0.5, 1.5, and 3.5 instructions fetched are discarded. 164:
Some processors allow branch prediction hints to be inserted into the code to tell whether the static prediction should be taken or not taken. The Intel
predictor was presented that was feasible to implement in hardware. The first commercial implementation of a perceptron branch predictor was in AMD's
The general rule for a two-level adaptive predictor with an n-bit history is that it can predict any repetitive sequence with any period if all n-bit
The discarded instructions at the branch and destination lines add up to nearly a complete fetch cycle, even for a single-cycle next-line predictor.
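The figures of 0.5, 1.5, and 3.5 discarded instructions follow from averaging over a uniformly distributed position within the aligned group. The short Python sketch below reproduces that arithmetic; the function name is hypothetical and the uniform distribution is the same simplifying assumption the text makes.

```python
from fractions import Fraction


def expected_wasted_slots(group_size):
    """Average number of leading slots wasted in an aligned fetch group.

    If the branch target is equally likely to be at positions
    0 .. group_size-1, a target at position k wastes the k slots before it.
    The same average applies to the slots after a taken branch.
    """
    return Fraction(sum(range(group_size)), group_size)


# Wasted slots at the target line plus wasted slots at the branch line.
total_waste = {g: 2 * expected_wasted_slots(g) for g in (2, 4, 8)}
```

The combined waste per taken branch is group_size - 1 slots, which is why the two effects together approach a complete fetch cycle.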
contentions in the pattern history table where two branches with opposite prediction happen to share the same entry in the pattern history table.
) records the last outcome of the branch. This is the simplest possible version of a dynamic branch predictor, although it is not very accurate.
two-cycle secondary branch predictors that can override the prediction of the next-line predictor at the cost of a single lost fetch cycle.
Dynamic branch prediction uses information about taken or not taken branches gathered at run-time to predict the outcome of a branch.
have local branch predictors with a local 4-bit history and a local pattern history table with 16 entries for each conditional jump.
Since the branch itself will generally not be the last instruction in an aligned group, instructions after the taken branch (or its
A two-level adaptive predictor with globally shared history buffer and pattern history table is called a "gshare" predictor if it
3949: 3512: 2405: 2371: 2366: 2250: 832: 667: 3985: 3924: 3821: 3222: 3129: 2930: 2173: 2151: 2040: 983: 3990: 2669: 2104: 1960: 675: 611:. Many microprocessors have a separate prediction mechanism for return instructions. This mechanism is based on a so-called 1419: 3124: 2972: 2806: 2420: 2381: 2238: 1845:
Digest of Papers Compcon Spring '90. Thirty-Fifth IEEE Computer Society International Conference on Intellectual Leverage
3561: 3406: 3401: 3323: 2799: 2760: 2415: 2410: 2344: 2156: 1369: 1111: 655: 2280: 3188: 2885: 2583: 1672:. The 36th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-36). San Diego, USA. pp. 243–252. 451:
with the size of the history buffer. Hence, the big pattern history table must be shared among all conditional jumps.
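A single pattern history table shared by all conditional jumps and indexed by global history can be sketched in Python. The sketch uses the gshare-style index, assumed here to be the XOR of the branch address and the global history; the table size, the initial counter value, and the names are assumptions for illustration.

```python
class GsharePredictor:
    """Hypothetical sketch of a gshare-style global predictor."""

    def __init__(self, index_bits=10):
        self.mask = (1 << index_bits) - 1
        self.history = 0                     # global history of all branches
        self.table = [1] * (1 << index_bits) # shared 2-bit counters

    def _index(self, pc):
        # Assumed gshare hashing: XOR branch address with global history.
        return (pc ^ self.history) & self.mask

    def predict(self, pc):
        return self.table[self._index(pc)] >= 2

    def update(self, pc, taken):
        i = self._index(pc)  # index must use the history seen at prediction time
        c = self.table[i]
        self.table[i] = min(3, c + 1) if taken else max(0, c - 1)
        self.history = ((self.history << 1) | int(taken)) & self.mask


def run_gshare_alternating(steps=100, pc=0x40):
    """Feed an alternating branch; return per-step miss flags."""
    predictor = GsharePredictor()
    misses = []
    for i in range(steps):
        taken = (i % 2 == 0)
        misses.append(predictor.predict(pc) != taken)
        predictor.update(pc, taken)
    return misses
```

Because the table has 2^index_bits entries regardless of how many branches exist, different branches can collide on the same counter, which is the aliasing cost of sharing.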
Brekelbaum, Edward; Rupley, Jeff; Wilkerson, Chris; Black, Bryan (December 2002). "Hierarchical scheduling windows".
Processors without this mechanism will simply predict an indirect jump to go to the same target as it did last time.
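This "same target as last time" policy amounts to a table keyed by the address of the indirect jump. A minimal Python sketch follows, with a hypothetical class name and an unbounded dictionary standing in for what would be a small fixed-size hardware table.

```python
class LastTargetPredictor:
    """Hypothetical sketch: predict that an indirect jump repeats its last target."""

    def __init__(self):
        self.last_target = {}  # jump address -> most recent target address

    def predict(self, pc):
        # Returns None when the jump has not been seen before.
        return self.last_target.get(pc)

    def update(self, pc, target):
        self.last_target[pc] = target


predictor = LastTargetPredictor()
first_guess = predictor.predict(0x1000)   # no history yet
predictor.update(0x1000, 0x2000)
second_guess = predictor.predict(0x1000)  # repeats the last target
```

Such a predictor is always wrong when a jump alternates between targets, for example a switch statement that cycles through its cases.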
On the SPEC'89 benchmarks, very large bimodal predictors saturate at 93.5% correct, once every branch maps to a unique counter.
Consider the example of n = 2. This means that the last two occurrences of the branch are stored in a two-bit
can reorder instructions to get better than 50% correct prediction.) Also, it would make timing nondeterministic.
jump is most likely to be taken or not taken. The branch that is guessed to be the most likely is then fetched and
1336:"A Taxonomy of Branch Mispredictions, and Alloyed Prediction as a Robust Solution to Wrong-History Mispredictions" 88:
been calculated and the conditional jump has passed the execution stage in the instruction pipeline (see fig. 1).
Branch prediction became more important with the introduction of pipelined superscalar processors like the Intel
663: 84: 54: 50: 3944: 3351: 3287: 3264: 3114: 3076: 2912: 2862: 2857: 2334: 2228: 2136: 958: 57:) will go before this is known definitively. The purpose of the branch predictor is to improve the flow in the 2141: 1953:" – demonstrates prediction accuracy is not impaired by indexing with previous branch address. 1640: 3897: 3680: 3574: 3538: 3455: 3439: 3281: 3070: 3029: 3017: 2880: 2794: 2715: 2480: 2084: 884: 869: 852: 840: 708: 640: 615:, which is a local mirror of the call stack. The size of the return stack buffer is typically 4–16 entries. 320:
bits, so that the processor can fetch a prediction for every instruction before the instruction is decoded.
259: 228: 116: 1065:. 2006 IEEE International Symposium on Performance Analysis of Systems and Software. IEEE. pp. 48–58. 3703: 3675: 3585: 3550: 3299: 3293: 3275: 3009: 3003: 2907: 2811: 2702: 2641: 2503: 2146: 1517: 768: 659: 463: 381: 79:
Figure 1: Example of 4-stage pipeline. The colored boxes represent instructions independent of each other.
66: 1743: 227:(EV8)) fetch each line of instructions with a pointer to the next line. This next-line predictor handles 3877: 3786: 3532: 3244: 3062: 2821: 2789: 2747: 2659: 2460: 2275: 2265: 2255: 2245: 2215: 2198: 2063: 1241: 358: 266: 212: 93: 38: 1763: 3907: 3843: 3429: 3151: 3041: 2988: 2520: 2233: 2089: 2071: 1343:
Proceedings of the 2000 International Conference on Parallel Architectures and Compilation Techniques
In static prediction, all decisions are made at compile time, before the execution of the program.
Proceedings of the 7th International Symposium on High Performance Computer Architecture (HPCA-7)
As a result, making a pipeline longer increases the need for a more advanced branch predictor.
accepts branch prediction hints, but this feature was abandoned in later Intel processors.
Egan, Colin; Steven, Gordon; Quick, P.; Anguera, R.; Vintan, Lucian (December 2003).
On the SPEC'89 benchmarks, such a predictor is about as good as the local predictor.
495: 459: 270: 1870: 1370:"The Agree Predictor: A Mechanism for Reducing Negative Branch History Interference" 3964: 3902: 3718: 3695: 3507: 3228: 2166: 1610: 1403: 1180: 1047: 836: 558: 494:
An alloyed branch predictor combines the local and global prediction principles by
475: 1957: 1399:"Cortex-A15 MPCore Technical Reference Manual, section 6.5.3 "Indirect predictor"" 1088: 17: 3749: 3713: 3424: 3396: 3254: 3109: 1426: 1070: 821: 806: 791: 439: 420: 412: 370: 224: 220: 105: 75: 1728: 1677: 3635: 3625: 3620: 3602: 3502: 3475: 2737: 2570: 2540: 2260: 1900: 1852: 750: 671: 608: 600: 483: 479: 471: 416: 306:
processor uses a saturating counter, though with an imperfect implementation.
239: 172: 1840: 1625: 1592: 1350: 3726: 3723: 3465: 2535: 2513: 1916:"Meltdown and Spectre: 'worst ever' CPU bugs affect virtually all computers" 1645: 1365: 1275: 624: 542: 467: 377: 176: 165: 1280:
Proceedings of the 24th annual international symposium on Microarchitecture
series. These processors all rely on one-bit or simple bimodal predictors.
1788:"AMD's Zen CPU is now called Ryzen, and it might actually challenge Intel" 1384: 1288: 3741: 2613: 2560: 2032: 1986: 1748: 1033: 758: 731: 503: 455: 200: 1893:
Proceedings 29th Annual International Symposium on Computer Architecture
Proceedings of the 24th International Symposium on Computer Architecture
2550: 2508: 1545:
Proceedings International Journal Conference on Neural Networks (IJCNN)
On the SPEC'89 benchmarks, very large local predictors saturate at 97.1% correct.
A two-level branch predictor where the second level is replaced with a
Murray, J.E.; Salett, R.M.; Hetherington, R.C.; McKeen, F.X. (1990).
813: 715: 161:
backward-pointing branches, and are taken more often than not taken.
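The backward-taken, forward-not-taken rule needs only the branch address and its target address. A one-function Python sketch follows; the function name and the example addresses are hypothetical.

```python
def predict_static(branch_pc, target_pc):
    """Static rule: predict taken only for backward branches.

    A backward branch has a target address lower than its own address,
    which is typical of the jump that closes a loop.
    """
    return target_pc < branch_pc


loop_back_edge = predict_static(branch_pc=0x4010, target_pc=0x4000)  # backward
forward_skip = predict_static(branch_pc=0x4010, target_pc=0x4040)    # forward
```

No run-time state is kept, so the prediction for a given branch never changes.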
Proceedings of the 35th International Symposium on Microarchitecture
local and global branch histories, possibly with some bits from the
Malishevsky, Alexey; Beck, Douglas; Schmid, Andreas; Landry, Eric.
3023: 2555: 2525: 795: 780: 776: 772: 704: 694: 409: 336: 275: 216: 137: 74: 1889:"Design Tradeoffs for the Alpha EV8 Conditional Branch Predictor" 1744:"AMD Ryzen reviews, news, performance, pricing, and availability" 1032:
Michaud, Pierre; Seznec, André; Uhlig, Richard (September 1996).
branches in between. It may use a two-level adaptive predictor.
Design Tradeoffs for the Alpha EV8 Conditional Branch Predictor
Modern processor design: fundamentals of superscalar processors
2475: 2465: 1767: 1503: 1482: 1197:"CMSC 611: Advanced Computer Architecture, Chapter 4 (Part V)" 718:
processor include a perceptron-based neural branch predictor.
701: 1764:"AMD Takes Computing to a New Horizon with Ryzen™ Processors" 1282:. Albuquerque, New Mexico, Puerto Rico: ACM. pp. 51–61. 1222:. Digital Western Research Lab (WRL) Technical Report, TN-36. 1009:"18-447 Computer Architecture Lecture 11: Branch Prediction" 1812:"IBM Stretch (7030) -- Aggressive Uniprocessor Parallelism" 959:"The Schemes and Performances of Dynamic Branch predictors" 607:
is an indirect jump that reads its target address from the
61:. Branch predictors play a critical role in achieving high 952: 950: 1334:
Skadron, K.; Martonosi, M.; Clark, D. W. (October 2000).
1278:(1991). "Two-Level Adaptive Training Branch Prediction". 1176:"The Pentium 4 and the G4e: an Architectural Comparison" 2002:"Branch and Loop Reorganization to Prevent Mispredicts" 1145: 1143: 1141: 1139: 1137: 1135: 530:
proposed combined branch prediction in his 1993 paper.
357:. This branch history register can have four different 1604: 1602: 1581:
Romanian Journal of Information Science and Technology
and other researchers. Affecting virtually all modern
the global history and branch PC, and "gselect" if it
2026:"What is Branch Prediction? – Stack Overflow Example" 603:
will normally return to where it is called from. The
1538:"Towards a High Performance Neural Branch Predictor" 316:
The predictor table is indexed with the instruction
3917: 3866: 3802: 3740: 3694: 3646: 3601: 3521: 3448: 3417: 3322: 3243: 3207: 3161: 3061: 2987: 2921: 2871: 2782: 2773: 2746: 2701: 2668: 2640: 2631: 2451: 2354: 2343: 2214: 2070: 1658: 1656: 1305:"Two-Level Branch Prediction using Neural Networks" 1234:"New Algorithm Improves Branch Prediction: 3/27/95" 1152:"The microarchitecture of Intel, AMD, and VIA CPUs" 666:
branch prediction", was proposed by Lucian Vintan (
Figure 2: State diagram of 2-bit saturating counter
1987:"The microarchitecture of Intel, AMD and VIA CPUs" 1887:Seznec, A.; Felix, S.; Krishnan, V.; Sazeides, Y. 923: 921: 919: 1641:"The AMD Trinity Review (A10-4600M): A New Hope" 1460:"Performance Analysis for Core 2 and K8: Part 1" 977:"Branch Prediction Techniques and Optimizations" 682:gshare/perceptron overriding hybrid predictors. 157:in order to utilize these fetched instructions. 83:Two-way branching is usually implemented with a 1210: 1208: 1206: 1063:Characterizing the branch misprediction penalty 1061:Eyerman, S.; Smith, J.E.; Eeckhout, L. (2006). 2048: 1574:"Towards a Powerful Dynamic Branch Predictor" 1485:. May 2022. pp. 7-42–7-45. SA22-7832-14. 8: 1611:"Dynamic Branch Prediction with Perceptrons" 643:and possibly two or more branch predictors. 3053:Computer performance by orders of magnitude 1620:. Monterrey, NL, Mexico. pp. 197–296. 1587:(3). Bucharest: Romanian Academy: 287–301. 3518: 3158: 2779: 2637: 2351: 2055: 2041: 2033: 1445:"A Look at Centrino's Core: The Pentium M" 1418:Driesen, Karel; Hölzle, Urs (1997-06-25). 764:The first commercial RISC processors, the 462:them. Global branch prediction is used in 258:A 1-bit saturating counter (essentially a 1882: 1880: 1364:Sprangle, E.; Chappell, R.S.; Alsup, M.; 1287: 1666:Fast Path-Based Neural Branch Prediction 754:and others were granted on this scheme. 231:as well as branch direction prediction. 1972:Reconsidering Complex Branch Predictors 915: 903:Single thread indirect branch predictor 545:used for data and instruction caching. 506:processor may be using this technique. 1944:Multiple-Block Ahead Branch Predictors 1497:"IBM zEnterprise BC12 Technical Guide" 1475:z/Architecture Principles of Operation 1420:"Limits of Indirect Branch Prediction" 1102:Shen, John P.; Lipasti, Mikko (2005). 897:Indirect branch restricted speculation 235:fetched are discarded, respectively. 


Text is available under the Creative Commons Attribution-ShareAlike License. Additional terms may apply.
