Mamba is a deep learning architecture focused on sequence modeling. It was developed by researchers from Carnegie Mellon University and Princeton University to address some limitations of transformer models, especially in processing long sequences. It is based on the Structured State Space sequence (S4) model.

Architecture

To enable handling long data sequences, Mamba incorporates the Structured State Space sequence model (S4). S4 can effectively and efficiently model long dependencies by combining continuous-time, recurrent, and convolutional models, which enable it to handle irregularly sampled data and unbounded context while remaining computationally efficient during training and inference.
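This duality can be illustrated with a toy example. The following is a minimal NumPy sketch, not S4 itself: the matrices are random stand-ins for S4's structured parameters, and it only shows why a time-invariant SSM can be run either step by step (cheap autoregressive inference) or as one convolution (parallel training).

```python
# Minimal sketch (not S4 itself): a linear time-invariant SSM
#   x_k = A x_{k-1} + B u_k,   y_k = C x_k
# can be run as a recurrence or, equivalently, as a convolution.
# A, B, C are random stand-ins for S4's structured parameters.
import numpy as np

rng = np.random.default_rng(0)
N, L = 4, 16                       # state size, sequence length
A = 0.9 * np.eye(N)                # stable state transition (toy choice)
B = rng.standard_normal((N, 1))
C = rng.standard_normal((1, N))
u = rng.standard_normal(L)         # scalar input sequence

# Recurrent mode: O(1) work per new token, suited to autoregressive inference.
x = np.zeros((N, 1))
y_rec = []
for k in range(L):
    x = A @ x + B * u[k]
    y_rec.append((C @ x).item())

# Convolutional mode: precompute the kernel K_j = C A^j B once; the whole
# output is then a causal convolution, which parallelizes well during training.
K = [(C @ np.linalg.matrix_power(A, j) @ B).item() for j in range(L)]
y_conv = [sum(K[j] * u[k - j] for j in range(k + 1)) for k in range(L)]

assert np.allclose(y_rec, y_conv)  # both modes produce the same output
```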
Mamba introduces significant enhancements to S4, particularly in its treatment of time-variant operations. It adopts a selection mechanism that adapts structured state space model (SSM) parameters based on the input. This enables Mamba to selectively focus on relevant information within sequences, effectively filtering out less pertinent data. The model thereby transitions from a time-invariant to a time-varying framework, which affects both computation and efficiency.
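A minimal sketch of such an input-dependent recurrence follows; the projections `W_dt`, `W_B`, `W_C` are random stand-ins for Mamba's learned ones, and the input is a toy scalar sequence rather than a full channel vector.

```python
# Minimal sketch of an input-dependent ("selective") SSM step, in the spirit
# of Mamba's selection mechanism; not the real parameterization.
import numpy as np

rng = np.random.default_rng(1)
N, L = 4, 16                              # state size, sequence length
W_dt, W_B, W_C = rng.standard_normal((3, N))
A = -np.abs(rng.standard_normal(N))       # stable diagonal state matrix
u = rng.standard_normal(L)

x = np.zeros(N)
outputs = []
for k in range(L):
    # Unlike S4, the step size and the input/output maps are functions of the
    # current input, letting the model decide per token how much past state
    # to keep and how strongly to write the new token into the state.
    dt = np.log1p(np.exp(W_dt * u[k]))    # softplus keeps the step positive
    A_bar = np.exp(dt * A)                # input-dependent decay of the state
    B_bar = dt * W_B * u[k]               # input-dependent write strength
    x = A_bar * x + B_bar * u[k]
    outputs.append(float((W_C * u[k]) @ x))
# Because the parameters vary with the input, there is no fixed convolution
# kernel; Mamba therefore computes this recurrence with a parallel scan.
```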
Mamba also employs a hardware-aware algorithm that exploits GPUs through kernel fusion, parallel scan, and recomputation. The implementation avoids materializing expanded states in memory-intensive layers, thereby improving performance and memory usage. The result is significantly more efficient processing of long sequences compared to transformers.
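The scan component can be shown in plain Python. This is a minimal sketch of a tree-structured prefix scan over the affine recurrence, assuming a power-of-two sequence length; kernel fusion and recomputation are CUDA-level details not shown here.

```python
# Minimal sketch of the parallel-scan idea behind Mamba's recurrent mode
# (plain Python, not the fused CUDA kernel): the recurrence
# x_k = a_k * x_{k-1} + b_k is associative under the combine rule below, so
# prefixes can be computed in O(log L) depth given enough parallel lanes.
import numpy as np

def combine(f, g):
    # Composing affine steps (a1, b1) then (a2, b2) yields (a2*a1, a2*b1 + b2).
    a1, b1 = f
    a2, b2 = g
    return a2 * a1, a2 * b1 + b2

def scan_sequential(steps):
    x, out = 0.0, []
    for a, b in steps:
        x = a * x + b
        out.append(x)
    return out

def scan_tree(steps):
    # Recursive inclusive prefix scan; assumes len(steps) is a power of two.
    if len(steps) == 1:
        return list(steps)
    pairs = [combine(steps[i], steps[i + 1]) for i in range(0, len(steps), 2)]
    half = scan_tree(pairs)               # prefixes over adjacent pairs
    out = [steps[0]]
    for i in range(1, len(steps)):
        if i % 2 == 1:
            out.append(half[i // 2])
        else:
            out.append(combine(half[i // 2 - 1], steps[i]))
    return out

rng = np.random.default_rng(2)
steps = list(zip(0.9 * rng.random(8), rng.standard_normal(8)))
sequential = scan_sequential(steps)
parallel = [b for _, b in scan_tree(steps)]   # prefixes applied to x_0 = 0
assert np.allclose(sequential, parallel)
```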
Additionally, Mamba simplifies its architecture by integrating the SSM design with MLP blocks, resulting in a homogeneous and streamlined structure that furthers the model's capability for general sequence modeling across data types including language, audio, and genomics, while maintaining efficiency in both training and inference.
Key components

Selective state spaces (SSM): the core of Mamba. SSMs are recurrent models that selectively process information based on the current input, allowing them to focus on relevant information and discard irrelevant data.

Simplified architecture: Mamba replaces the complex attention and MLP blocks of Transformers with a single, unified SSM block. This aims to reduce computational complexity and improve inference speed.

Hardware-aware parallelism: Mamba utilizes a recurrent mode with a parallel algorithm specifically designed for hardware efficiency, potentially further enhancing its performance.
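A schematic of that contrast, as a minimal sketch with placeholder callables (`attention`, `mlp`, `mamba_block`, and `norm` are stand-ins, not real layers):

```python
# Schematic contrast of the layer layouts, with placeholder callables.
def transformer_layer(x, attention, mlp, norm):
    x = x + attention(norm(x))       # sub-block 1: token mixing via attention
    x = x + mlp(norm(x))             # sub-block 2: channel mixing via MLP
    return x

def mamba_layer(x, mamba_block, norm):
    return x + mamba_block(norm(x))  # one unified block plays both roles
```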
Comparison to Transformers

Feature          | Transformer     | Mamba
Architecture     | Attention-based | SSM-based
Complexity       | High            | Lower
Inference speed  | O(n)            | O(1)
Training speed   | O(n²)           | O(n)

Variants

Token-free language models: MambaByte

Further information: Tokenization (lexical analysis)

Operating on byte-sized tokens, transformers scale poorly, as every token must "attend" to every other token, leading to O(n²) scaling laws. As a result, transformers opt for subword tokenization to reduce the number of tokens in text; however, this leads to very large vocabulary tables and word embeddings.
MambaByte departs from standard token-based language modeling: rather than relying on breaking text into discrete units, it directly processes raw byte sequences. This eliminates the need for tokenization, potentially offering several advantages:

Language independence: tokenization often relies on language-specific rules and vocabulary, limiting applicability across diverse languages. MambaByte's byte-level representation allows it to handle different languages without language-specific adaptations.

Removal of subword tokenisation bias: with subword tokenizers, common subwords are overrepresented while rare or new words are underrepresented or split into less meaningful units. This can affect the model's understanding and generation capabilities, particularly for languages with rich morphology or tokens not well-represented in the training data.

Simplicity in preprocessing: it simplifies the preprocessing pipeline by eliminating the need for complex tokenization and vocabulary management, reducing preprocessing steps and potential errors.

Subword tokenisation also introduces failure modes in which LLMs cannot spell words, reverse certain words, or handle rare tokens; byte-level tokenisation avoids these issues.
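A minimal sketch of the byte-level encoding itself (plain Python, no model involved):

```python
# Byte-level "tokenization" as used by token-free models like MambaByte:
# no learned vocabulary, every text maps onto the fixed 256-symbol alphabet.
text = "Mamba reads raw bytes, in any language."
byte_ids = list(text.encode("utf-8"))            # token IDs in range 0-255
assert bytes(byte_ids).decode("utf-8") == text   # lossless round trip
print(len(text), "characters ->", len(byte_ids), "byte tokens")
# A subword tokenizer instead needs a vocabulary of tens of thousands of
# entries and a correspondingly large embedding table; the trade-off of the
# byte-level approach is a longer sequence, which favors architectures whose
# cost grows linearly rather than quadratically with length.
```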
Mamba Mixture of Experts (MoE)

Further information: Mixture of experts

MoE Mamba represents a pioneering integration of the Mixture of Experts (MoE) technique with the Mamba architecture, enhancing the efficiency and scalability of State Space Models (SSMs) in language modeling. The model leverages the strengths of both MoE and SSMs, achieving significant gains in training efficiency: it requires 2.2 times fewer training steps than its predecessor, Mamba, while maintaining competitive performance. By combining selective state space modeling with expert-based processing, MoE Mamba offers a promising avenue for future research in scaling SSMs to tens of billions of parameters. Its design alternates Mamba and MoE layers, allowing it to efficiently integrate the entire sequence context and apply the most relevant expert to each token.
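A minimal sketch of the alternation, with stand-in layers and a top-1 router (the real model's blocks, router, and experts are learned):

```python
# Minimal sketch of MoE-Mamba's interleaving idea: a causal running mean
# stands in for the Mamba block, and top-1 routing picks one expert per token.
import numpy as np

rng = np.random.default_rng(3)
d, L, n_experts = 8, 6, 4
experts = [rng.standard_normal((d, d)) for _ in range(n_experts)]
router = rng.standard_normal((d, n_experts))

def mamba_layer(x):
    # Placeholder sequence mixer: each position summarizes its causal context.
    return np.cumsum(x, axis=0) / np.arange(1, len(x) + 1)[:, None]

def moe_layer(x):
    # Top-1 routing: every token applies only its highest-scoring expert.
    choice = (x @ router).argmax(axis=-1)
    return np.stack([experts[e] @ tok for tok, e in zip(x, choice)])

x = rng.standard_normal((L, d))
for _ in range(3):                   # Mamba layers integrate sequence context;
    x = moe_layer(mamba_layer(x))    # MoE layers specialize per token
```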
Vision Mamba

Further information: Computer vision

Vision Mamba (Vim) integrates SSMs with visual data processing, employing bidirectional Mamba blocks for visual sequence encoding. This method reduces the computational demands typically associated with self-attention in visual tasks. Tested on ImageNet classification, COCO object detection, and ADE20k semantic segmentation, Vim demonstrates enhanced performance and efficiency and can handle high-resolution images with lower computational resources. This positions Vim as a scalable model for future advancements in visual representation learning.
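A minimal sketch of the bidirectional encoding pattern, with a placeholder scan (an exponential moving average) in place of a real Mamba block and random stand-in patch data:

```python
# Image patches are flattened into a sequence and scanned in both directions,
# so every patch sees context from both sides without pairwise attention.
import numpy as np

def ssm_scan(x, decay=0.8):
    # Placeholder causal scan, not a real Mamba block.
    out, state = [], np.zeros(x.shape[-1])
    for token in x:                        # strictly left-to-right pass
        state = decay * state + (1 - decay) * token
        out.append(state.copy())
    return np.stack(out)

rng = np.random.default_rng(4)
patches = rng.standard_normal((14 * 14, 192))   # flattened 14x14 patch tokens
forward = ssm_scan(patches)                     # context from earlier patches
backward = ssm_scan(patches[::-1])[::-1]        # context from later patches
encoded = forward + backward                    # every patch sees both sides
```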
Jamba

Further information: Jamba (language model)

Jamba is a novel architecture built on a hybrid of transformer and Mamba SSM layers, developed by AI21 Labs, with 52 billion parameters, making it the largest Mamba variant created so far. It has a context window of 256k tokens.
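A purely illustrative sketch of such a hybrid stack; the interleaving ratio and the layer callables are arbitrary assumptions, not AI21's published configuration:

```python
# Illustrative hybrid stack: mostly Mamba layers with attention interleaved.
def hybrid_stack(x, mamba_layer, attention_layer, depth=8, attn_every=4):
    for i in range(depth):
        mix = attention_layer if (i + 1) % attn_every == 0 else mamba_layer
        x = mix(x)
    return x
```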
Impact and Future Directions

Mamba LLM represents a significant potential shift in large language model architecture, offering faster, more efficient, and scalable models. Applications include language translation, content generation, long-form text analysis, audio, and speech processing.

See also

Language modeling
Transformer (machine learning model)
State-space model
Recurrent neural network
References

Gu, Albert; Dao, Tri (2023). "Mamba: Linear-Time Sequence Modeling with Selective State Spaces". arXiv:2312.00752.
Chowdhury, Hasan. "The tech powering ChatGPT won't make AI as smart as humans. Others might". Business Insider. Retrieved 13 January 2024.
Pandey, Mohit (6 December 2023). "Mamba is Here to Mark the End of Transformers". Analytics India Magazine. Retrieved 13 January 2024.
Gu, Albert; Goel, Karan; Re, Christopher (6 October 2021). "Efficiently Modeling Long Sequences with Structured State Spaces". ICLR. arXiv:2111.00396. Retrieved 13 January 2024.
Gu, Albert; Johnson, Isys; Goel, Karan; Saab, Khaled Kamal; Dao, Tri; Rudra, A.; Ré, Christopher (26 October 2021). "Combining Recurrent, Convolutional, and Continuous-time Models with Linear State-Space Layers". NeurIPS. S2CID 239998472.
Tickoo, Aneesh (10 December 2023). "Researchers from CMU and Princeton Unveil Mamba: A Breakthrough SSM Architecture Exceeding Transformer Efficiency for Multimodal Deep Learning Applications". MarkTechPost. Retrieved 13 January 2024.
Wang, Junxiong; Gangavarapu, Tushaar; Yan, Jing Nathan; Rush, Alexander M. (2024-01-24). MambaByte: Token-free Selective State Space Model. arXiv:2401.13660.
"Let's build the GPT Tokenizer". 20 February 2024. Retrieved 2024-02-23.
Pióro, Maciej; Ciebiera, Kamil; Król, Krystian; Ludziejewski, Jan; Jaszczur, Sebastian (2024-01-08). MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts. arXiv:2401.04081.
Nikhil (2024-01-13). "This AI Paper Proposes MoE-Mamba: Revolutionizing Machine Learning with Advanced State Space Models and Mixture of Experts MoEs Outperforming both Mamba and Transformer-MoE Individually". MarkTechPost. Retrieved 2024-02-23.
Zhu, Lianghui; Liao, Bencheng; Zhang, Qian; Wang, Xinlong; Liu, Wenyu; Wang, Xinggang (2024-02-10). Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model. arXiv:2401.09417.
"Introducing Jamba: AI21's Groundbreaking SSM-Transformer Model". www.ai21.com. Retrieved 2024-03-29.
Rodriguez, Jesus (2024-08-27). "Edge 425: Inside Mamba, the Most Famous SSM Model". TheSequence. Retrieved 2024-08-28.
170:Association rules
155:Anomaly detection
97:Neuro-symbolic AI
2571:
2536:Machine learning
2526:
2525:
2506:
2261:Action selection
2251:Self-driving car
2058:Stable Diffusion
2023:Speech synthesis
1988:
1852:Machine learning
1728:Gradient descent
1649:
1642:
1635:
1626:
1621:
1619:
1618:
1593:
1592:
1590:
1589:
1575:
1569:
1568:
1567:
1551:
1545:
1544:
1542:
1541:
1526:
1520:
1519:
1518:
1502:
1496:
1495:
1494:
1493:
1480:
1474:
1473:
1472:
1456:
1450:
1449:
1447:
1445:
1430:
1421:
1420:
1403:
1397:
1396:
1394:
1392:
1387:
1367:
1361:
1360:
1358:
1356:
1341:
1335:
1334:
1332:
1330:
1324:Business Insider
1315:
1309:
1308:
1306:
1294:
1131:
1106:
1101:
1091:
1086:
1081:Inference speed
1062:Attention-based
1045:
942:
935:
928:
889:Related articles
766:Confusion matrix
519:Isolation forest
464:Graphical models
243:
242:
195:Learning to rank
190:Feature learning
28:Machine learning
19:
2579:
2578:
2574:
2573:
2572:
2570:
2569:
2568:
2549:
2548:
2547:
2542:
2494:
2408:
2374:Google DeepMind
2352:
2318:Geoffrey Hinton
2277:
2214:
2140:Project Debater
2086:
1984:Implementations
1979:
1933:
1897:
1840:
1782:Backpropagation
1716:
1702:Tensor calculus
1656:
1653:
1616:
1614:
1605:
1602:
1597:
1596:
1587:
1585:
1577:
1576:
1572:
1553:
1552:
1548:
1539:
1537:
1528:
1527:
1523:
1504:
1503:
1499:
1491:
1489:
1482:
1481:
1477:
1458:
1457:
1453:
1443:
1441:
1432:
1431:
1424:
1405:
1404:
1400:
1390:
1388:
1369:
1368:
1364:
1354:
1352:
1343:
1342:
1338:
1328:
1326:
1317:
1316:
1312:
1296:
1295:
1272:
1267:
1240:
1225:
1213:
1207:
1193:
1191:Computer vision
1187:
1178:
1172:
1129:
1126:
1120:
1115:
1104:
1099:
1096:Training speed
1089:
1084:
1020:
977:
946:
917:
916:
890:
882:
881:
842:
834:
833:
794:Kernel machines
789:
781:
780:
756:
748:
747:
728:Active learning
723:
715:
714:
683:
673:
672:
598:Diffusion model
534:
524:
523:
496:
486:
485:
459:
449:
448:
404:Factor analysis
399:
389:
388:
372:
335:
325:
324:
245:
244:
228:
227:
226:
215:
214:
120:
112:
111:
77:Online learning
42:
30:
17:
12:
11:
5:
2577:
2575:
2567:
2566:
2561:
2551:
2550:
2544:
2543:
2541:
2540:
2539:
2538:
2533:
2520:
2519:
2518:
2513:
2499:
2496:
2495:
2493:
2492:
2487:
2482:
2477:
2472:
2467:
2462:
2457:
2452:
2447:
2442:
2437:
2432:
2427:
2422:
2416:
2414:
2410:
2409:
2407:
2406:
2401:
2396:
2391:
2386:
2381:
2376:
2371:
2366:
2360:
2358:
2354:
2353:
2351:
2350:
2348:Ilya Sutskever
2345:
2340:
2335:
2330:
2325:
2320:
2315:
2313:Demis Hassabis
2310:
2305:
2303:Ian Goodfellow
2300:
2295:
2289:
2287:
2283:
2282:
2279:
2278:
2276:
2275:
2270:
2269:
2268:
2258:
2253:
2248:
2243:
2238:
2233:
2228:
2222:
2220:
2216:
2215:
2213:
2212:
2207:
2202:
2197:
2192:
2187:
2182:
2177:
2172:
2167:
2162:
2157:
2152:
2147:
2142:
2137:
2132:
2131:
2130:
2120:
2115:
2110:
2105:
2100:
2094:
2092:
2088:
2087:
2085:
2084:
2079:
2078:
2077:
2072:
2062:
2061:
2060:
2055:
2050:
2040:
2035:
2030:
2025:
2020:
2015:
2010:
2005:
2000:
1994:
1992:
1985:
1981:
1980:
1978:
1977:
1972:
1967:
1962:
1957:
1952:
1947:
1941:
1939:
1935:
1934:
1932:
1931:
1926:
1921:
1916:
1911:
1905:
1903:
1899:
1898:
1896:
1895:
1894:
1893:
1886:Language model
1883:
1878:
1873:
1872:
1871:
1861:
1860:
1859:
1848:
1846:
1842:
1841:
1839:
1838:
1836:Autoregression
1833:
1828:
1827:
1826:
1816:
1814:Regularization
1811:
1810:
1809:
1804:
1799:
1789:
1784:
1779:
1777:Loss functions
1774:
1769:
1764:
1759:
1754:
1753:
1752:
1742:
1737:
1736:
1735:
1724:
1722:
1718:
1717:
1715:
1714:
1712:Inductive bias
1709:
1704:
1699:
1694:
1689:
1684:
1679:
1674:
1666:
1664:
1658:
1657:
1654:
1652:
1651:
1644:
1637:
1629:
1623:
1622:
1601:
1600:External links
1598:
1595:
1594:
1570:
1546:
1521:
1497:
1475:
1451:
1422:
1398:
1362:
1336:
1310:
1269:
1268:
1266:
1263:
1262:
1261:
1256:
1251:
1246:
1239:
1236:
1224:
1221:
1206:
1203:
1186:
1183:
1171:
1168:
1163:
1162:
1156:
1149:
1119:
1116:
1114:
1111:
1108:
1107:
1102:
1097:
1093:
1092:
1087:
1082:
1078:
1077:
1074:
1071:
1067:
1066:
1063:
1060:
1056:
1055:
1052:
1049:
1040:
1039:
1033:
1027:
1019:
1018:Key components
1016:
993:time-invariant
976:
973:
948:
947:
945:
944:
937:
930:
922:
919:
918:
915:
914:
909:
908:
907:
897:
891:
888:
887:
884:
883:
880:
879:
874:
869:
864:
859:
854:
849:
843:
840:
839:
836:
835:
832:
831:
826:
821:
816:
814:Occam learning
811:
806:
801:
796:
790:
787:
786:
783:
782:
779:
778:
773:
771:Learning curve
768:
763:
757:
754:
753:
750:
749:
746:
745:
740:
735:
730:
724:
721:
720:
717:
716:
713:
712:
711:
710:
700:
695:
690:
684:
679:
678:
675:
674:
671:
670:
664:
659:
654:
649:
648:
647:
637:
632:
631:
630:
625:
620:
615:
605:
600:
595:
590:
589:
588:
578:
577:
576:
571:
566:
561:
551:
546:
541:
535:
530:
529:
526:
525:
522:
521:
516:
511:
503:
497:
492:
491:
488:
487:
484:
483:
482:
481:
476:
471:
460:
455:
454:
451:
450:
447:
446:
441:
436:
431:
426:
421:
416:
411:
406:
400:
395:
394:
391:
390:
387:
386:
381:
376:
370:
365:
360:
352:
347:
342:
336:
331:
330:
327:
326:
323:
322:
317:
312:
307:
302:
297:
292:
287:
279:
278:
277:
272:
267:
257:
255:Decision trees
252:
246:
232:classification
222:
221:
220:
217:
216:
213:
212:
207:
202:
197:
192:
187:
182:
177:
172:
167:
162:
157:
152:
147:
142:
137:
132:
127:
125:Classification
121:
118:
117:
114:
113:
110:
109:
104:
99:
94:
89:
84:
82:Batch learning
79:
74:
69:
64:
59:
54:
49:
43:
40:
39:
36:
35:
24:
23:
15:
13:
10:
9:
6:
4:
3:
2:
2576:
2565:
2562:
2560:
2557:
2556:
2554:
2537:
2534:
2532:
2529:
2528:
2521:
2517:
2514:
2512:
2509:
2508:
2505:
2501:
2500:
2497:
2491:
2488:
2486:
2483:
2481:
2478:
2476:
2473:
2471:
2468:
2466:
2463:
2461:
2458:
2456:
2453:
2451:
2448:
2446:
2443:
2441:
2438:
2436:
2433:
2431:
2428:
2426:
2423:
2421:
2418:
2417:
2415:
2413:Architectures
2411:
2405:
2402:
2400:
2397:
2395:
2392:
2390:
2387:
2385:
2382:
2380:
2377:
2375:
2372:
2370:
2367:
2365:
2362:
2361:
2359:
2357:Organizations
2355:
2349:
2346:
2344:
2341:
2339:
2336:
2334:
2331:
2329:
2326:
2324:
2321:
2319:
2316:
2314:
2311:
2309:
2306:
2304:
2301:
2299:
2296:
2294:
2293:Yoshua Bengio
2291:
2290:
2288:
2284:
2274:
2273:Robot control
2271:
2267:
2264:
2263:
2262:
2259:
2257:
2254:
2252:
2249:
2247:
2244:
2242:
2239:
2237:
2234:
2232:
2229:
2227:
2224:
2223:
2221:
2217:
2211:
2208:
2206:
2203:
2201:
2198:
2196:
2193:
2191:
2190:Chinchilla AI
2188:
2186:
2183:
2181:
2178:
2176:
2173:
2171:
2168:
2166:
2163:
2161:
2158:
2156:
2153:
2151:
2148:
2146:
2143:
2141:
2138:
2136:
2133:
2129:
2126:
2125:
2124:
2121:
2119:
2116:
2114:
2111:
2109:
2106:
2104:
2101:
2099:
2096:
2095:
2093:
2089:
2083:
2080:
2076:
2073:
2071:
2068:
2067:
2066:
2063:
2059:
2056:
2054:
2051:
2049:
2046:
2045:
2044:
2041:
2039:
2036:
2034:
2031:
2029:
2026:
2024:
2021:
2019:
2016:
2014:
2011:
2009:
2006:
2004:
2001:
1999:
1996:
1995:
1993:
1989:
1986:
1982:
1976:
1973:
1971:
1968:
1966:
1963:
1961:
1958:
1956:
1953:
1951:
1948:
1946:
1943:
1942:
1940:
1936:
1930:
1927:
1925:
1922:
1920:
1917:
1915:
1912:
1910:
1907:
1906:
1904:
1900:
1892:
1889:
1888:
1887:
1884:
1882:
1879:
1877:
1874:
1870:
1869:Deep learning
1867:
1866:
1865:
1862:
1858:
1855:
1854:
1853:
1850:
1849:
1847:
1843:
1837:
1834:
1832:
1829:
1825:
1822:
1821:
1820:
1817:
1815:
1812:
1808:
1805:
1803:
1800:
1798:
1795:
1794:
1793:
1790:
1788:
1785:
1783:
1780:
1778:
1775:
1773:
1770:
1768:
1765:
1763:
1760:
1758:
1757:Hallucination
1755:
1751:
1748:
1747:
1746:
1743:
1741:
1738:
1734:
1731:
1730:
1729:
1726:
1725:
1723:
1719:
1713:
1710:
1708:
1705:
1703:
1700:
1698:
1695:
1693:
1690:
1688:
1685:
1683:
1680:
1678:
1675:
1673:
1672:
1668:
1667:
1665:
1663:
1659:
1650:
1645:
1643:
1638:
1636:
1631:
1630:
1627:
1613:
1609:
1604:
1603:
1599:
1584:
1580:
1574:
1571:
1566:
1561:
1557:
1550:
1547:
1536:
1532:
1525:
1522:
1517:
1512:
1508:
1501:
1498:
1487:
1486:
1479:
1476:
1471:
1466:
1462:
1455:
1452:
1440:
1436:
1429:
1427:
1423:
1418:
1414:
1410:
1402:
1399:
1386:
1381:
1377:
1373:
1366:
1363:
1351:
1347:
1340:
1337:
1325:
1321:
1314:
1311:
1305:
1300:
1293:
1291:
1289:
1287:
1285:
1283:
1281:
1279:
1277:
1275:
1271:
1264:
1260:
1257:
1255:
1252:
1250:
1247:
1245:
1242:
1241:
1237:
1235:
1232:
1230:
1222:
1220:
1218:
1212:
1204:
1202:
1199:
1192:
1184:
1182:
1177:
1169:
1167:
1160:
1157:
1153:
1150:
1146:
1143:
1142:
1141:
1137:
1135:
1125:
1117:
1112:
1103:
1098:
1095:
1094:
1088:
1083:
1080:
1079:
1075:
1072:
1069:
1068:
1064:
1061:
1059:Architecture
1058:
1057:
1053:
1050:
1047:
1046:
1037:
1034:
1031:
1028:
1025:
1022:
1021:
1017:
1015:
1012:
1007:
1005:
1001:
996:
994:
988:
986:
985:convolutional
982:
974:
972:
970:
966:
962:
958:
957:deep learning
954:
943:
938:
936:
931:
929:
924:
923:
921:
920:
913:
910:
906:
903:
902:
901:
898:
896:
893:
892:
886:
885:
878:
875:
873:
870:
868:
865:
863:
860:
858:
855:
853:
850:
848:
845:
844:
838:
837:
830:
827:
825:
822:
820:
817:
815:
812:
810:
807:
805:
802:
800:
797:
795:
792:
791:
785:
784:
777:
774:
772:
769:
767:
764:
762:
759:
758:
752:
751:
744:
741:
739:
736:
734:
733:Crowdsourcing
731:
729:
726:
725:
719:
718:
709:
706:
705:
704:
701:
699:
696:
694:
691:
689:
686:
685:
682:
677:
676:
668:
665:
663:
662:Memtransistor
660:
658:
655:
653:
650:
646:
643:
642:
641:
638:
636:
633:
629:
626:
624:
621:
619:
616:
614:
611:
610:
609:
606:
604:
601:
599:
596:
594:
591:
587:
584:
583:
582:
579:
575:
572:
570:
567:
565:
562:
560:
557:
556:
555:
552:
550:
547:
545:
544:Deep learning
542:
540:
537:
536:
533:
528:
527:
520:
517:
515:
512:
510:
508:
504:
502:
499:
498:
495:
490:
489:
480:
479:Hidden Markov
477:
475:
472:
470:
467:
466:
465:
462:
461:
458:
453:
452:
445:
442:
440:
437:
435:
432:
430:
427:
425:
422:
420:
417:
415:
412:
410:
407:
405:
402:
401:
398:
393:
392:
385:
382:
380:
377:
375:
371:
369:
366:
364:
361:
359:
357:
353:
351:
348:
346:
343:
341:
338:
337:
334:
329:
328:
321:
318:
316:
313:
311:
308:
306:
303:
301:
298:
296:
293:
291:
288:
286:
284:
280:
276:
275:Random forest
273:
271:
268:
266:
263:
262:
261:
258:
256:
253:
251:
248:
247:
240:
239:
234:
233:
225:
219:
218:
211:
208:
206:
203:
201:
198:
196:
193:
191:
188:
186:
183:
181:
178:
176:
173:
171:
168:
166:
163:
161:
160:Data cleaning
158:
156:
153:
151:
148:
146:
143:
141:
138:
136:
133:
131:
128:
126:
123:
122:
116:
115:
108:
105:
103:
100:
98:
95:
93:
90:
88:
85:
83:
80:
78:
75:
73:
72:Meta-learning
70:
68:
65:
63:
60:
58:
55:
53:
50:
48:
45:
44:
38:
37:
34:
29:
25:
21:
20:
2379:Hugging Face
2343:David Silver
1991:Audio–visual
1845:Applications
1824:Augmentation
1669:
1615:. Retrieved
1611:
1586:. Retrieved
1583:www.ai21.com
1582:
1573:
1555:
1549:
1538:. Retrieved
1535:MarkTechPost
1534:
1524:
1506:
1500:
1490:, retrieved
1484:
1478:
1460:
1454:
1442:. Retrieved
1439:MarkTechPost
1438:
1408:
1401:
1389:. Retrieved
1375:
1365:
1353:. Retrieved
1349:
1339:
1327:. Retrieved
1323:
1313:
1233:
1226:
1214:
1194:
1185:Vision Mamba
1179:
1164:
1158:
1151:
1148:adaptations.
1144:
1138:
1127:
1051:Transformer
1035:
1029:
1023:
1008:
1004:transformers
997:
989:
978:
975:Architecture
952:
951:
819:PAC learning
651:
506:
355:
350:Hierarchical
282:
236:
230:
2527:Categories
2475:Autoencoder
2430:Transformer
2298:Alex Graves
2246:OpenAI Five
2150:IBM Watsonx
1772:Convolution
1750:Overfitting
1612:TheSequence
1070:Complexity
703:Multi-agent
640:Transformer
539:Autoencoder
295:Naive Bayes
33:data mining
2553:Categories
2516:Technology
2369:EleutherAI
2328:Fei-Fei Li
2323:Yann LeCun
2236:Q-learning
2219:Decisional
2145:IBM Watson
2053:Midjourney
1945:TensorFlow
1792:Activation
1745:Regression
1740:Clustering
1617:2024-08-28
1588:2024-03-29
1565:2401.09417
1540:2024-02-23
1516:2401.04081
1492:2024-02-23
1470:2401.13660
1444:13 January
1391:13 January
1385:2111.00396
1355:13 January
1329:13 January
1304:2312.00752
1265:References
1201:learning.
1065:SSM-based
688:Q-learning
586:Restricted
384:Mean shift
333:Clustering
310:Perceptron
238:regression
140:Clustering
135:Regression
2399:MIT CSAIL
2364:Anthropic
2333:Andrew Ng
2231:AlphaZero
2075:VideoPoet
2038:AlphaFold
1975:MindSpore
1929:SpiNNaker
1924:Memristor
1831:Diffusion
1807:Rectifier
1787:Batchnorm
1767:Attention
1762:Adversary
1417:239998472
1217:AI21 Labs
981:recurrent
847:ECML PKDD
829:VC theory
776:ROC curve
708:Self-play
628:DeepDream
469:Bayes net
260:Ensembles
41:Paradigms
2507:Portals
2266:Auto-GPT
2098:Word2vec
1902:Hardware
1819:Datasets
1721:Concepts
1238:See also
1198:ImageNet
1113:Variants
1048:Feature
270:Boosting
119:Problems
2389:Meta AI
2226:AlphaGo
2210:PanGu-ÎŁ
2180:ChatGPT
2155:Granite
2103:Seq2seq
2082:Whisper
2003:WaveNet
1998:AlexNet
1970:Flux.jl
1950:PyTorch
1802:Sigmoid
1797:Softmax
1662:General
1409:NeurIPS
852:NeurIPS
669:(ECRAM)
623:AlexNet
265:Bagging
2404:Huawei
2384:OpenAI
2286:People
2256:MuZero
2118:Gemini
2113:Claude
2048:DALL-E
1960:Theano
1415:
1076:Lower
1054:Mamba
983:, and
645:Vision
501:RANSAC
379:OPTICS
374:DBSCAN
358:-means
165:AutoML
2470:Mamba
2241:SARSA
2205:LLaMA
2200:BLOOM
2185:GPT-J
2175:GPT-4
2170:GPT-3
2165:GPT-2
2160:GPT-1
2123:LaMDA
1955:Keras
1560:arXiv
1511:arXiv
1465:arXiv
1413:S2CID
1380:arXiv
1299:arXiv
1205:Jamba
1155:data.
1073:High
955:is a
953:Mamba
867:IJCAI
693:SARSA
652:Mamba
618:LeNet
613:U-Net
439:t-SNE
363:Fuzzy
340:BIRCH
2394:Mila
2195:PaLM
2128:Bard
2108:BERT
2091:Text
2070:Sora
1446:2024
1393:2024
1376:ICLR
1357:2024
1331:2024
1130:O(n)
1105:O(n)
1100:O(n)
1090:O(1)
1085:O(n)
1000:GPUs
963:and
877:JMLR
862:ICLR
857:ICML
743:RLHF
559:LSTM
345:CURE
31:and
2135:NMT
2018:OCR
2013:HWR
1965:JAX
1919:VPU
1914:TPU
1909:IPU
1733:SGD
1011:MLP
603:SOM
593:GAN
569:ESN
564:GRU
509:-NN
444:SDL
434:PGD
429:PCA
424:NMF
419:LDA
414:ICA
409:CCA
285:-NN
2555::
1610:.
1581:.
1558:,
1533:.
1509:,
1463:,
1437:.
1425:^
1411:.
1378:.
1374:.
1348:.
1322:.
1273:^
1136:.
1006:.
872:ML
1648:e
1641:t
1634:v
1620:.
1591:.
1562::
1543:.
1513::
1467::
1448:.
1419:.
1395:.
1382::
1359:.
1333:.
1307:.
1301::
941:e
934:t
927:v
507:k
356:k
283:k
241:)
229:(
Text is available under the Creative Commons Attribution-ShareAlike License. Additional terms may apply.