GotoBLAS

Original author(s): Kazushige Goto
Final release: 2-1.13 / 5 February 2010
Type: Linear algebra library; implementation of BLAS
License: BSD License

In scientific computing, GotoBLAS and GotoBLAS2 are open source implementations of the BLAS (Basic Linear Algebra Subprograms) API with many hand-crafted optimizations for specific processor types. GotoBLAS was developed by Kazushige Goto at the Texas Advanced Computing Center. As of 2003, it was used in seven of the world's ten fastest supercomputers.

GotoBLAS remains available, but development ceased with a final version touting optimal performance on Intel's Nehalem architecture (contemporary in 2008). OpenBLAS is an actively maintained fork of GotoBLAS, developed at the Lab of Parallel Software and Computational Science, ISCAS.

GotoBLAS was written by Goto during his sabbatical leave from the Japan Patent Office in 2002. It was initially optimized for the Pentium 4 processor and managed to immediately boost the performance of a supercomputer based on that CPU from 1.5 TFLOPS to 2 TFLOPS. As of 2005, the library was available at no cost for noncommercial use. A later open source version was released under the terms of the BSD license.

GotoBLAS's matrix-matrix multiplication routine, called GEMM in BLAS terms, is highly tuned for the x86 and AMD64 processor architectures by means of handcrafted assembly code. It follows a similar decomposition into smaller "kernel" routines that other BLAS implementations use, but where earlier implementations streamed data from the L1 processor cache, GotoBLAS uses the L2 cache. The kernel used for GEMM is a routine called GEBP, for "General block-times-panel multiply", which was experimentally found to be "inherently superior" over several other kernels that were considered in the design.

Several other BLAS routines are, as is customary in BLAS libraries, implemented in terms of GEMM.

As of January 2022, the Texas Advanced Computing Center website states that GotoBLAS is no longer maintained and suggests the use of BLIS or MKL.

See also

Automatically Tuned Linear Algebra Software (ATLAS)
Intel Math Kernel Library (MKL)

References

1. Markoff, John Gregory (2005-11-28). "Writing the Fastest Code, by Hand, for Fun: A Human Computer Keeps Speeding Up Chips". New York Times. Seattle, Washington, USA. Archived from the original on 2020-03-23. Retrieved 2010-03-04.
2. Milfeld, Kent. "GotoBLAS2". Texas Advanced Computing Center. Archived from the original on 2020-03-23. Retrieved 2013-08-28.
3. Goto, Kazushige; van de Geijn, Robert A. (2008). "Anatomy of High-Performance Matrix Multiplication". ACM Transactions on Mathematical Software. 34 (3): 12:1–12:25. CiteSeerX 10.1.1.111.3873. doi:10.1145/1356052.1356053. ISSN 0098-3500. (25 pages)
4. Goto, Kazushige; van de Geijn, Robert A. (2008). "High-performance implementation of the level-3 BLAS" (PDF). ACM Transactions on Mathematical Software. 35 (1): 1–14. doi:10.1145/1377603.1377607.
5. "BLAS-LAPACK at TACC". Texas Advanced Computing Center.
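
The block-times-panel decomposition behind the GEMM routine can be illustrated with a small sketch. This is not GotoBLAS code, which is handwritten assembly; it is a minimal pure-Python model of the loop structure only. The function names `gebp` and `gemm_blocked` and the block sizes `mc` and `kc` are hypothetical choices made for this illustration. A real implementation would pick block sizes so that the copied block of A stays resident in the L2 cache, an effect plain Python lists cannot reproduce.

```python
def gebp(a_block, b_panel, c_panel):
    """Accumulate c_panel (mc x n) += a_block (mc x kc) * b_panel (kc x n).

    Illustrative stand-in for a block-times-panel kernel; the name and
    signature are assumptions, not the GotoBLAS interface.
    """
    for i, a_row in enumerate(a_block):
        c_row = c_panel[i]
        for p, a_ip in enumerate(a_row):
            b_row = b_panel[p]
            for j in range(len(c_row)):
                c_row[j] += a_ip * b_row[j]


def gemm_blocked(a, b, mc=2, kc=2):
    """Compute C = A * B by splitting the work into block-times-panel calls.

    a is m x k and b is k x n, both non-empty lists of rows.
    mc and kc are hypothetical block sizes chosen for the example.
    """
    m, k, n = len(a), len(b), len(b[0])
    if any(len(row) != k for row in a):
        raise ValueError("inner dimensions of a and b do not match")
    c = [[0.0] * n for _ in range(m)]
    for p0 in range(0, k, kc):
        # Row panel of B, kc x n, shared by every block in this pass.
        b_panel = b[p0:p0 + kc]
        for i0 in range(0, m, mc):
            # Copy ("pack") an mc x kc block of A into contiguous rows.
            a_block = [row[p0:p0 + kc] for row in a[i0:i0 + mc]]
            # The slice of c shares its row lists with c, so gebp
            # updates the result matrix in place.
            gebp(a_block, b_panel, c[i0:i0 + mc])
    return c


if __name__ == "__main__":
    a = [[1, 2, 3], [4, 5, 6]]
    b = [[7, 8], [9, 10], [11, 12]]
    print(gemm_blocked(a, b))
```

The outer loop walks the shared dimension in steps of `kc` and the inner loop walks the rows of A in steps of `mc`, so every call to `gebp` multiplies one small block of A by one wide panel of B. Changing `mc` and `kc` changes only the order of the arithmetic, not the result.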