checkpointing that allows users to select which data needs to be protected, in order to improve efficiency and avoid space, time and energy waste. It offers a direct data interface so that users do not need to deal with files and/or directory names. All metadata is managed by FTI in a transparent fashion for the user. If desired, users can dedicate one process per node to overlap fault tolerance workload and scientific computation, so that post-checkpoint tasks are executed asynchronously.
imperative. Thus the "checkpoint/restart" capability was born, in which after a number of transactions had been processed, a "snapshot" or "checkpoint" of the state of the application could be taken. If the application failed before the next checkpoint, it could be restarted by giving it the checkpoint information and the last place in the transaction file where a transaction had successfully completed. The application could then restart at that point.
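The batch checkpoint/restart idea described above can be sketched in a few lines. This is a minimal illustration, not a reconstruction of any historical system: the function names (`save_checkpoint`, `load_checkpoint`, `run_batch`), the JSON file format, the running `balance`, and the `fail_at` parameter used to simulate a crash are all assumptions made for the example.

```python
import json
import os


def save_checkpoint(path, state):
    """Write the state to a temp file, then rename it over the old checkpoint."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())  # push the bytes to stable storage before renaming
    os.replace(tmp, path)     # readers see either the old or the new checkpoint


def load_checkpoint(path):
    """Return the last saved state, or a fresh state if no checkpoint exists."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"next_record": 0, "balance": 0}


def run_batch(transactions, ckpt_path, interval=3, fail_at=None):
    """Apply transactions in order, checkpointing every `interval` records.

    `fail_at` is a hypothetical hook that simulates a crash just before
    the record with that index is processed.
    """
    state = load_checkpoint(ckpt_path)
    for i in range(state["next_record"], len(transactions)):
        if i == fail_at:
            raise RuntimeError("simulated failure")
        state["balance"] += transactions[i]
        state["next_record"] = i + 1  # last place a transaction completed
        if state["next_record"] % interval == 0:
            save_checkpoint(ckpt_path, state)
    save_checkpoint(ckpt_path, state)
    return state["balance"]
```

If a run fails partway through, calling `run_batch` again with the same checkpoint path resumes from the record after the last checkpoint rather than from the first record. Work done between the last checkpoint and the failure is repeated, which is the cost being traded against checkpoint frequency.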
) and then continue with the execution. In case of failure, when the application restarts, it does not need to start from scratch. Rather, it reads the latest state ("the checkpoint") from stable storage and resumes execution from there. While there is ongoing debate on whether checkpointing is the dominant I/O workload on distributed computing systems, there is general consensus that it is one of the major I/O workloads.
made to application code. BLCR focuses on checkpointing parallel applications that communicate through MPI, and on compatibility with the software suite produced by the SciDAC Scalable Systems Software ISIC. Its work is broken down into four main areas: Checkpoint/Restart for Linux (CR), checkpointable MPI libraries, a resource management interface to checkpoint/restart, and development of process management interfaces.
or exit the application and, at a later time, restart it and restore the saved state. This was implemented through a "save" command or menu option in the application. In many cases it became standard practice, when the user exited the application with unsaved work, to ask whether they wanted to save that work first.
The Future Technologies Group at Lawrence Berkeley National Laboratory is developing a hybrid kernel/user implementation of checkpoint/restart called BLCR. Their goal is to provide a robust, production-quality implementation that checkpoints a wide range of applications, without requiring changes to be
As batch applications began to handle tens to hundreds of thousands of transactions, where each transaction might process one record from one file against several different files, the need for the application to be restartable at some point without the need to rerun the entire job from scratch became
FTI is a library that aims to provide computational scientists with an easy way to perform checkpoint/restart in a scalable fashion. FTI leverages local storage plus multiple replication and erasure-coding techniques to provide several levels of reliability and performance. FTI provides application-level
Checkpointing tends to be expensive, so it was generally not done with every record, but at some reasonable compromise between the cost of a checkpoint vs. the value of the computer time needed to reprocess a batch of records. Thus the number of records processed for each checkpoint might range from
This sort of functionality became extremely important for usability in applications where the particular work could not be completed in one sitting (such as playing a video game expected to take dozens of hours, or writing a book or long document amounting to hundreds or thousands of pages) or where
One of the original and now most common means of application checkpointing was a "save state" feature in interactive applications, in which the user of the application could save the state of all variables and other data to a storage medium at the time they were using it and either continue working,
algorithm. In uncoordinated checkpointing, each process checkpoints its own state independently. It must be stressed that simply forcing processes to checkpoint their state at fixed time intervals is not sufficient to ensure global consistency. The need for establishing a consistent state (i.e.,
no missing messages or duplicated messages) may force other processes to roll back to their checkpoints, which in turn may cause other processes to roll back to even earlier checkpoints, which in the most extreme case may mean that the only consistent state found is the initial state (the so-called
environment, checkpointing is a technique that helps tolerate failures that would otherwise force a long-running application to restart from the beginning. The most basic way to implement checkpointing is to stop the application, copy all the required data from memory to reliable storage (e.g.,
Bautista-Gomez, L., Tsuboi, S., Komatitsch, D., Cappello, F., Maruyama, N., & Matsuoka, S. (2011, November). FTI: high performance fault tolerance interface for hybrid systems. In
Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (p.
Some recent protocols perform collaborative checkpointing by storing fragments of the checkpoint in nearby nodes. This is helpful because it avoids the cost of storing to a parallel file system (which often becomes a bottleneck for large-scale systems) and it uses storage that is closer. This has
DMTCP (Distributed MultiThreaded Checkpointing) is a tool for transparently checkpointing the state of an arbitrary group of programs spread across many machines and connected by sockets. It does not modify the user's program or the operating system. Among the applications supported by DMTCP are
There are two main approaches to checkpointing in distributed computing systems: coordinated checkpointing and uncoordinated checkpointing. In the coordinated checkpointing approach, processes must ensure that their checkpoints are consistent. This is usually achieved by some kind of
Mirhoseini, A.; Songhori, E.M.; Koushanfar, F., "Idetic: A high-level synthesis approach for enabling long computations on transiently-powered ASICs," Pervasive
Computing and Communications (PerCom), 2013 IEEE International Conference on , vol., no., pp.216,224, 18–22 March 2013 URL:
R.E. Ahmed, R.C. Frazier, and P.N. Marinos, "Cache-Aided Rollback Error Recovery (CARER) Algorithms for Shared-Memory Multiprocessor Systems", IEEE 20th International Symposium on Fault-Tolerant Computing (FTCS-20), Newcastle upon Tyne, UK, June 26–28, 1990,
from ambient background sources. Mementos frequently senses the available energy in the system and decides whether to checkpoint the program because of impending power loss or to continue computation. If checkpointing, data will be stored in a
Bouteiller, B., Lemarinier, P., Krawezik, K., & Cappello, F. (2003, December). Coordinated checkpoint versus message log for fault tolerant MPI. In Cluster Computing, 2003. Proceedings. 2003 IEEE International Conference on (pp. 242-250).
The problem with save state is that it requires the operator of a program to request the save. For non-interactive programs, including automated or batch-processed workloads, the ability to checkpoint such applications also had to be automated.
Ansel, J., Arya, K., & Cooperman, G. (2009, May). DMTCP: Transparent checkpointing for cluster computations and the desktop. In
Parallel & Distributed Processing, 2009. IPDPS 2009. IEEE International Symposium on (pp. 1-12).
and shell scripting languages. With the use of TightVNC, it can also checkpoint and restart X Window applications, as long as they do not use extensions (e.g. no OpenGL or video). Among the Linux features supported by DMTCP are open
found use particularly in large-scale supercomputing clusters. The challenge is to ensure that, when the checkpoint is needed to recover from a failure, the nearby nodes holding its fragments are available.
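One way to reduce the risk that a neighbouring node is unavailable at recovery time is to store redundancy alongside the fragments. The sketch below is an illustration only: it uses a single XOR parity fragment, which is a simple scheme chosen for this example and not the method of any particular protocol. The function names `split_with_parity` and `recover` are hypothetical. With `k` data fragments and one parity fragment, the checkpoint survives the loss of any one fragment.

```python
def xor_bytes(a, b):
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def split_with_parity(data, k):
    """Split a checkpoint into k equal fragments plus one XOR parity fragment.

    Returns (fragments, original_length); each fragment would be sent to a
    different nearby node.
    """
    frag_len = -(-len(data) // k)                 # ceiling division
    padded = data.ljust(frag_len * k, b"\0")      # pad so fragments are equal
    frags = [padded[i * frag_len:(i + 1) * frag_len] for i in range(k)]
    parity = bytes(frag_len)
    for frag in frags:
        parity = xor_bytes(parity, frag)
    return frags + [parity], len(data)


def recover(fragments, original_length):
    """Rebuild the checkpoint; at most one fragment may be None (node lost)."""
    missing = [i for i, f in enumerate(fragments) if f is None]
    if len(missing) > 1:
        raise ValueError("single parity cannot repair more than one loss")
    fragments = list(fragments)
    if missing:
        survivors = [f for f in fragments if f is not None]
        rebuilt = bytes(len(survivors[0]))
        for f in survivors:
            rebuilt = xor_bytes(rebuilt, f)       # XOR of the rest = lost one
        fragments[missing[0]] = rebuilt
    return b"".join(fragments[:-1])[:original_length]
```

For example, after `frags, n = split_with_parity(state, 4)`, setting any single entry of `frags` to `None` still lets `recover(frags, n)` return the original bytes. Tolerating more simultaneous losses requires a stronger erasure code than single parity.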
Mementos is a software system that transforms general-purpose tasks into interruptible programs for platforms with frequent interruptions such as power outages. It was designed for batteryless embedded devices such as
Benjamin Ransford, Jacob Sorber, and Kevin Fu. 2011. Mementos: system support for long-running computation on RFID-scale devices. ACM SIGPLAN Notices 47, 4 (March 2011), 159-170. DOI=10.1145/2248487.1950386
, pipes, sockets, signal handlers, process ID and thread ID virtualization (ensuring that old pids and tids continue to work upon restart), ptys, fifos, process group IDs, session IDs, terminal attributes, and
Hargrove, P. H., & Duell, J. C. (2006, September). Berkeley lab checkpoint/restart (blcr) for linux clusters. In
Journal of Physics: Conference Series (Vol. 46, No. 1, p. 494). IOP Publishing.
Wang, Teng; Snyder, Shane; Lockwood, Glenn; Carns, Philip; Wright, Nicholas; Byna, Suren (Sep 2018). "IOMiner: Large-Scale Analytics Framework for Gaining Knowledge from I/O Logs".
Elnozahy, E. N., Alvisi, L., Wang, Y. M., & Johnson, D. B. (2002). A survey of rollback-recovery protocols in message-passing systems. ACM Computing Surveys, 34(3), 375-408.
to non-volatile memory, the optimal checkpoint locations are those with the minimum number of registers to store. Idetic has been deployed and evaluated on an energy-harvesting
25 to 200, depending on cost factors, the relative complexity of the application and the resources needed to successfully restart the application.
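The compromise between checkpoint cost and reprocessing cost can be made concrete with a toy cost model. Everything in this sketch is an assumption made for illustration: the function names, the parameters, and the simplification that each failure loses, on average, half an interval of work. It is not a formula taken from any specific system.

```python
def expected_overhead(interval, checkpoint_cost, record_cost,
                      failures, total_records):
    """Estimated overhead of checkpointing every `interval` records.

    Assumed model: one checkpoint per interval, plus, for each expected
    failure, reprocessing of about half an interval of records.
    """
    checkpoint_time = (total_records / interval) * checkpoint_cost
    rework_time = failures * (interval / 2) * record_cost
    return checkpoint_time + rework_time


def best_interval(candidates, **costs):
    """Pick the candidate interval with the lowest estimated overhead."""
    return min(candidates, key=lambda n: expected_overhead(n, **costs))
```

Under this model, short intervals spend most of their overhead writing checkpoints, while long intervals spend it redoing lost work. For instance, `best_interval([25, 50, 100, 200], checkpoint_cost=5, record_cost=1, failures=8, total_records=10000)` compares four candidate intervals under those hypothetical costs.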
Yibei Ling, Jie Mi, Xiaola Lin: A Variational Calculus Approach to Optimal Checkpoint Placement. IEEE Trans. Computers 50(7): 699-708 (2001)
, the data is retrieved from non-volatile memory and the program continues from the stored state. Mementos has been implemented on the
Plank, J. S., Beck, M., Kingsley, G., & Li, K. (1994). Libckpt: Transparent checkpointing under unix. Computer Science Department.
/mprotect (including mmap-based shared memory). DMTCP supports the OFED API for InfiniBand on an experimental basis.
the work was being done over a long period of time, such as entering rows of data into a spreadsheet.
. This is particularly important for long-running applications that are executed in failure-prone computing systems.
Walters, J. P.; Chaudhary, V. (2009-07-01). "Replication-Based Fault Tolerance for MPI Applications".
of the design. Since the checkpointing in hardware level involves sending the data of dependent
https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6526735&isnumber=6526701
787:"Comparative I/O workload characterization of two leadership class storage clusters Logs"
(ASIC) developers to automatically embed checkpoints in their designs. It targets
and the underlying technology contain a checkpoint and restore mechanism.
852:"GitHub - DMTCP/DMTCP: DMTCP: Distributed MultiThreaded CheckPointing"
257:'s state, so that applications can restart from that point in case of
2018 IEEE International Conference on Cluster Computing (CLUSTER)
A technique for inserting fault tolerance into computing systems
IEEE Transactions on Parallel and Distributed Systems
997:Distributed MultiThreaded CheckPointing (DMTCP)
666:Idetic is a set of automatic tools which helps
686:approach to locate low overhead points in the
613:Implementation for embedded and ASIC devices
249:systems. It basically consists of saving a
940:http://doi.acm.org/10.1145/2248487.1950386
884:
636:. When the energy becomes sufficient for
668:application-specific integrated circuit
992:Berkeley Lab Checkpoint/Restart (BLCR)
674:tools and adds the checkpoints at the
539:Berkeley Lab Checkpoint/Restart (BLCR)
609:is a user space checkpoint library.
265:Checkpointing in distributed systems
715:, a similar concept provided by
297:Implementations for applications
1796:Fault-tolerant computer systems
1240:Analysis of parallel algorithms
464:Fault Tolerance Interface (FTI)
627:and smart cards which rely on
241:is a technique that provides
1775:Category: Parallel computing
717:video game console emulators
582:Collaborative checkpointing
755:10.1109/CLUSTER.2018.00062
749:. IEEE. pp. 466–476.
648:. Mementos is named after
284:two-phase commit protocol
676:register-transfer level
895:10.1109/TPDS.2008.172
567:programming languages
271:distributed computing
672:high-level synthesis
276:parallel file system
684:dynamic programming
634:non-volatile memory
385:Checkpoint/Restart
764:978-1-5386-8319-4
682:code). It uses a
650:Christopher Nolan
629:harvesting energy
792:. ACM. Nov 2015.
646:microcontrollers
572:file descriptors
971:pp. 82–88.
962:Further reading
925:"Docker - CRIU"
886:10.1.1.921.6773
879:(7): 997–1010.
976:External links
860:. 2019-07-11.
708:Process image
688:state machine
290:domino effect
239:Checkpointing
713:Save states
565:, and many
255:application
723:References
644:family of
302:Save State
903:1045-9219
881:CiteSeerX
822:32). ACM.
692:registers
625:RFID tags
247:computing
1012:Cryopid2
773:53235850
702:See also
698:device.
696:RFID tag
618:Mementos
555:Open MPI
251:snapshot
982:LibCkpt
911:2086958
680:Verilog
655:Memento
269:In the
259:failure
253:of the
1002:OpenVZ
909:
901:
883:
857:GitHub
771:
761:
662:Idetic
642:MSP430
638:reboot
596:Docker
591:Docker
559:Python
907:S2CID
841:IEEE.
803:IEEE.
790:(PDF)
769:S2CID
548:DMTCP
1007:CRIU
899:ISSN
759:ISBN
607:CRIU
602:CRIU
576:mmap
563:Perl
245:for
987:FTI
891:doi
751:doi
652:'s
293:).
144:by
1792::
905:.
897:.
889:.
877:20
875:.
854:.
767:.
757:.
658:.
561:,
557:,
927:.
913:.
893::
775:.
753::
678:(