36:
2032:
efficient
Content-Defined Chunking approach. It uses a fast rolling Gear hash algorithm, skipping the minimum length, normalizing the chunk-size distribution, and last but not the least, rolling two bytes each time to speed up the CDC algorithm, which can achieve about 10X higher throughput than Rabin-based CDC approach.
2031:
The
Content-Defined Chunking algorithm needs to compute the hash value of a data stream byte by byte and split the data stream into chunks when the hash value meets a predefined value. However, comparing a string byte-by-byte will introduce the heavy computation overhead. FastCDC proposes a new and
1518:
One of the interesting use cases of the rolling hash function is that it can create dynamic, content-based chunks of a stream or file. This is especially useful when it is required to send only the changed chunks of a large file over a network: a simple byte addition at the front of the file would
2230:
Where Gear array is a pre-calculated hashing array. Here FastCDC uses Gear hashing algorithm which can calculate the rolling hashing results quickly and keep the uniform distribution of the hashing results as Rabin. Compared with the traditional Rabin hashing algorithm, it achieves a much faster
662:
Because it shares the same author as the Rabin–Karp string search algorithm, which is often explained with another, simpler rolling hash, and because this simpler rolling hash is also a polynomial, both rolling hashes are often mistaken for each other. The backup software
2027:
Chunking is a technique to divide a data stream into a set of blocks, also called chunks. Content-defined chunking (CDC) is a chunking technique in which the division of the data stream is not based on fixed chunk size, as in fixed-size chunking, but on its content.
1020:
86:
A few hash functions allow a rolling hash to be computed very quickly—the new hash value is rapidly calculated given only the old hash value, the old value removed from the window, and the new value added to the window—similar to the way a
2239:
All rolling hash functions can be computed in time linear in the number of characters and updated in constant time when characters are shifted by one position. In particular, computing the Rabin-Karp rolling hash of a string of length
659:). The hash is the result of the division of that polynomial by an irreducible polynomial over GF(2). It is possible to update a Rabin fingerprint using only the entering and the leaving byte, making it effectively a rolling hash.
1346:
288:
1686:
2231:
speed. Experiments suggest that it can generate nearly the same chunk size distribution in the much shorter time (about 1/10 of rabin-based chunking ) when segmenting the data stream.
1508:
483:
Removing and adding characters simply involves adding or subtracting the first or last term. Shifting all characters by one position to the left requires multiplying the entire sum
849:
1745:
1140:
1239:
364:
1558:
817:
1062:
754:
1971:
1382:
837:
614:
1829:
1417:
1167:
2626:
2316:
2287:
1909:
1880:
1780:
2017:
1463:
1915:
Shifting the window by one byte simply involves adding the new character to the sum and subtracting the oldest character (no longer in the window) from the sum.
1585:
2258:
1991:
1936:
1849:
1800:
1437:
1187:
1105:
1085:
778:
715:
634:
581:
561:
541:
521:
501:
474:
454:
434:
414:
391:
312:
1522:
A simple approach to making dynamic chunks is to calculate a rolling hash, and if the hash value matches an arbitrary pattern (e.g. all zeroes) in the lower
142:
103:
1560:, given the hash has a uniform probability distribution) then it’s chosen to be a chunk boundary. Each chunk will thus have an average size of
2541:
Xia, Wen; Jiang, Hong; Feng, Dan; Tian, Lei; Fu, Min; Zhou, Yukun (2014). "Ddelta: A deduplication-inspired fast delta compression approach".
1247:
655:. Instead of seeing the input as a polynomial of bytes, it is seen as a polynomial of bits, and all arithmetic is done in GF(2) (similarly to
151:
689:
Hashing by cyclic polynomial—sometimes called
Buzhash—is also simple and it has the benefit of avoiding multiplications, using
2019:. This approach will ensure that any change in the file will only affect its current and possibly the next chunk, but no other chunk.
2499:
1590:
Once the boundaries are known, the chunks need to be compared by cryptographic hash value to detect changes. The backup software
1587:
bytes. This approach ensures that unmodified data (more than a window size away from the changes) will have the same boundaries.
2579:
Xia, Wen; Zou, Xiangyu; Jiang, Hong; Zhou, Yukun; Liu, Chuanyi; Feng, Dan; Hua, Yu; Hu, Yuchong; Zhang, Yucheng (2020-06-16).
2452:
584:
118:
as its rolling hash. FastCDC (Fast
Content-Defined Chunking) uses a compute-efficient Gear fingerprint as its rolling hash.
477:
2382:
1619:
1519:
normally cause all fixed size windows to become updated, while in reality, only the first "chunk" has been modified.
2435:
1468:
51:
2489:
1015:{\displaystyle H=s^{k-1}(h(c_{1}))\oplus s^{k-2}(h(c_{2}))\oplus \ldots \oplus s(h(c_{k-1}))\oplus h(c_{k}),}
2641:
2368:-gram hashing is pairwise independent, at best, Computer Speech & Language 24 (4), pages 698–710, 2010.
1591:
1025:
where the multiplications by powers of two can be implemented by binary shifts. The result is a number in
1694:
45:
130:
2488:
Xia, Wen; Zhou, Yukun; Jiang, Hong; Feng, Dan; Hua, Yu; Hu, Yuchong; Liu, Qing; Zhang, Yucheng (2005).
1110:
1192:
317:
122:
1387:
Hashing by cyclic polynomials is strongly universal or pairwise independent: simply keep the first
2600:
2515:
1613:
option) and rsyncrypto, do content-based slicing based on this specific (unweighted) moving sum:
1598:
1529:
694:
394:
91:
function can be computed much more quickly than other low-pass filters; and similar to the way a
787:
1028:
720:
145:
is often explained using a rolling hash function that only uses multiplications and additions:
2558:
2495:
645:
367:
126:
115:
56:
2421:
1941:
1354:
822:
784:): it rotates the bits by 1 to the left, pushing the latest bit in the first position. E.g.,
2592:
2550:
1594:
uses the
Buzhash algorithm with a customizable chunk size range for splitting file streams.
664:
636:
can be multiplied to get the result of the division without actually performing a division.
589:
1807:
1390:
1145:
2581:"The Design of Fast Content-Defined Chunking for Data Deduplication Based Storage Systems"
2527:
2475:
2292:
2263:
1885:
1856:
1756:
96:
92:
1996:
1442:
523:. Shifting all characters by one position to the right requires dividing the entire sum
17:
2323:
2243:
1976:
1921:
1834:
1785:
1563:
1422:
1172:
1090:
1070:
781:
763:
700:
690:
619:
566:
546:
526:
506:
486:
459:
439:
419:
399:
376:
297:
88:
2491:
FastCDC: A Fast and
Efficient Content-Defined Chunking Approach for Data Deduplication
2453:"Data structures and file formats — Borg - Deduplicating Archiver 1.1.5 documentation"
2635:
2604:
80:
2319:
840:
649:
106:, which uses the rolling hash described below. Another popular application is the
2341:
2580:
2554:
648:
is another hash, which also interprets the input as a polynomial, but over the
2596:
757:
2562:
667:
uses a Rabin fingerprint for splitting files, with blob size varying between
2369:
1465:
consecutive bits. In practice, this can be achieved by an integer division
2289:
modular arithmetic operations, and hashing by cyclic polynomials requires
111:
2436:"Rabin Karp rolling hash - dynamic sized chunks based on hashed content"
1341:{\displaystyle H\leftarrow s(H)\oplus s^{k}(h(c_{1}))\oplus h(c_{k+1}),}
2627:
MIT 6.006: Introduction to
Algorithms 2011- Recitation 9 - Rolling Hash
2335:
1067:
Computing the hash values in a rolling fashion is done as follows. Let
283:{\displaystyle H=c_{1}a^{k-1}+c_{2}a^{k-2}+c_{3}a^{k-3}+...+c_{k}a^{0}}
2574:
2572:
114:
as its rolling hash. Low
Bandwidth Network Filesystem (LBFS) uses a
83:
where the input is hashed in a window that moves through the input.
656:
652:
107:
29:
2023:
Gear fingerprint and content-based chunking algorithm FastCDC
456:
is critical to get good hashing; in particular, the modulus
79:(also known as recursive hashing or rolling checksum) is a
2422:"Foundation - Introducing Content Defined Chunking (CDC)"
2068:← 64KB // split maximum chunk size is 64 KB
2064:← 2KB // split minimum chunk size is 2 KB
760:
mapping characters to random integers. Let the function
1882:
is a "hash value" consisting of the bottom 12 bits of
1782:
is the sum of 8196 consecutive bytes ending with byte
1566:
1532:
2585:
IEEE Transactions on
Parallel and Distributed Systems
2295:
2266:
2246:
2035:
The basic version pseudocode is provided as follows:
1999:
1979:
1944:
1924:
1888:
1859:
1837:
1810:
1788:
1759:
1697:
1622:
1471:
1445:
1425:
1393:
1357:
1250:
1195:
1175:
1148:
1113:
1093:
1073:
1031:
852:
825:
790:
766:
723:
703:
622:
592:
569:
549:
529:
509:
489:
462:
442:
422:
402:
379:
366:
are the input characters (but this function is not a
320:
300:
154:
110:
program, which uses a checksum based on Mark Adler's
2405:Jonathan D. Cohen, Recursive Hashing Functions for
2090:// buffer size is less than minimum chunk size
756:. This hash function might be simply an array or a
43:It has been suggested that this article should be
2310:
2281:
2252:
2011:
1985:
1965:
1930:
1903:
1874:
1843:
1823:
1794:
1774:
1739:
1680:
1579:
1552:
1502:
1457:
1431:
1411:
1376:
1340:
1233:
1181:
1161:
1134:
1099:
1079:
1056:
1014:
831:
811:
772:
748:
709:
628:
608:
575:
555:
535:
515:
495:
468:
448:
428:
408:
385:
358:
306:
282:
1597:Such content-defined chunking is often used for
1681:{\displaystyle S(n)=\sum _{i=n-8195}^{n}c_{i},}
697:: it presumes that there is some hash function
2360:
2358:
2356:
8:
2447:
2445:
2417:
2415:
2134:bytes, and kickstart the hash
717:from characters to integers in the interval
2409:-Grams, ACM Trans. Inf. Syst. 15 (3), 1997.
1609:Several programs, including gzip (with the
1514:Content-based slicing using a rolling hash
1503:{\displaystyle H\rightarrow H\div 2^{k-1}}
1169:is the character to be removed, rotate it
2383:"References — restic 0.9.0 documentation"
2294:
2265:
2245:
1998:
1978:
1943:
1923:
1887:
1858:
1836:
1815:
1809:
1787:
1758:
1730:
1729:
1696:
1669:
1659:
1642:
1621:
1571:
1565:
1542:
1533:
1531:
1488:
1470:
1444:
1424:
1392:
1362:
1356:
1320:
1295:
1276:
1249:
1219:
1200:
1194:
1174:
1153:
1147:
1112:
1092:
1072:
1045:
1030:
1000:
969:
932:
907:
888:
863:
851:
824:
789:
765:
737:
722:
702:
621:
597:
591:
568:
548:
528:
508:
488:
461:
441:
421:
401:
378:
350:
325:
319:
299:
274:
264:
233:
223:
204:
194:
175:
165:
153:
2352:
2523:
2513:
1973:, these programs cut the file between
1605:Content-based slicing using moving sum
2364:Daniel Lemire, Owen Kaser: Recursive
7:
373:In order to avoid manipulating huge
102:One of the main applications is the
1740:{\displaystyle H(n)=S(n)\mod 4096,}
1087:be the previous hash value. Rotate
2434:Horvath, Adam (October 24, 2012).
563:. Note that in modulo arithmetic,
476:is typically a prime number. See
143:Rabin–Karp string search algorithm
104:Rabin–Karp string search algorithm
25:
843:. The hash values are defined as
121:At best, rolling hash values are
1135:{\displaystyle H\leftarrow s(H)}
780:be a cyclic binary rotation (or
34:
1725:
1419:bits. That is, take the result
1234:{\displaystyle s^{k}(h(c_{1}))}
359:{\displaystyle c_{1},...,c_{k}}
2305:
2299:
2276:
2270:
1954:
1948:
1898:
1892:
1869:
1863:
1802:(requires 21 bits of storage),
1769:
1763:
1722:
1716:
1707:
1701:
1632:
1626:
1475:
1332:
1313:
1304:
1301:
1288:
1282:
1266:
1260:
1254:
1228:
1225:
1212:
1206:
1129:
1123:
1117:
1051:
1032:
1006:
993:
984:
981:
962:
956:
941:
938:
925:
919:
897:
894:
881:
875:
800:
794:
743:
724:
1:
2338: – Data mining technique
478:linear congruential generator
1553:{\textstyle {1 \over 2^{n}}}
1526:bits (with a probability of
2658:
2555:10.1016/j.peva.2014.07.016
2049:, data length
812:{\displaystyle s(101)=011}
49:into a new article titled
2597:10.1109/TPDS.2020.2984632
2457:borgbackup.readthedocs.io
1057:{\displaystyle [0,2^{L})}
749:{\displaystyle [0,2^{L})}
693:instead. It is a form of
393:values, all math is done
99:from the old hash value.
2235:Computational complexity
583:can be chosen to have a
52:Content-Defined Chunking
18:Content-Defined Chunking
1966:{\displaystyle H(n)==0}
1377:{\displaystyle c_{k+1}}
832:{\displaystyle \oplus }
137:Polynomial rolling hash
2543:Performance Evaluation
2476:"Rsyncrypto Algorithm"
2312:
2283:
2254:
2221:+ 1
2013:
1987:
1967:
1932:
1905:
1876:
1845:
1825:
1796:
1776:
1741:
1682:
1664:
1581:
1554:
1504:
1459:
1433:
1413:
1384:is the new character.
1378:
1342:
1235:
1183:
1163:
1136:
1101:
1081:
1058:
1016:
833:
813:
774:
750:
711:
630:
610:
609:{\displaystyle a^{-1}}
585:multiplicative inverse
577:
557:
537:
517:
497:
470:
450:
430:
410:
387:
360:
308:
284:
2387:restic.readthedocs.io
2313:
2284:
2255:
2014:
1988:
1968:
1933:
1906:
1877:
1846:
1826:
1824:{\displaystyle c_{i}}
1797:
1777:
1742:
1683:
1638:
1582:
1555:
1505:
1460:
1434:
1414:
1412:{\displaystyle L-k+1}
1379:
1343:
1236:
1184:
1164:
1162:{\displaystyle c_{1}}
1137:
1102:
1082:
1059:
1017:
834:
814:
775:
751:
712:
631:
611:
578:
558:
538:
518:
498:
480:for more discussion.
471:
451:
431:
411:
388:
361:
309:
285:
27:Type of hash function
2311:{\displaystyle O(k)}
2293:
2282:{\displaystyle O(k)}
2264:
2244:
2074:0x0000d93003530000LL
1997:
1977:
1942:
1922:
1904:{\displaystyle S(n)}
1886:
1875:{\displaystyle H(n)}
1857:
1835:
1808:
1786:
1775:{\displaystyle S(n)}
1757:
1695:
1620:
1564:
1530:
1469:
1443:
1423:
1391:
1355:
1248:
1193:
1173:
1146:
1111:
1091:
1071:
1029:
850:
823:
788:
764:
721:
701:
620:
590:
567:
547:
527:
507:
487:
460:
440:
420:
400:
377:
318:
298:
152:
123:pairwise independent
2012:{\displaystyle n+1}
1458:{\displaystyle k-1}
314:is a constant, and
2308:
2279:
2250:
2167:+ 1
2130:// Skip the first
2009:
1983:
1963:
1928:
1901:
1872:
1841:
1821:
1792:
1772:
1737:
1678:
1599:data deduplication
1580:{\textstyle 2^{n}}
1577:
1550:
1500:
1455:
1429:
1409:
1374:
1338:
1241:. Then simply set
1231:
1179:
1159:
1132:
1097:
1077:
1054:
1012:
829:
809:
770:
746:
707:
695:tabulation hashing
626:
606:
573:
553:
533:
513:
493:
466:
446:
426:
406:
383:
356:
304:
280:
131:3-wise independent
2253:{\displaystyle k}
1986:{\displaystyle n}
1931:{\displaystyle n}
1844:{\displaystyle i}
1795:{\displaystyle n}
1548:
1432:{\displaystyle H}
1182:{\displaystyle k}
1100:{\displaystyle H}
1080:{\displaystyle H}
773:{\displaystyle s}
710:{\displaystyle h}
685:Cyclic polynomial
646:Rabin fingerprint
640:Rabin fingerprint
629:{\displaystyle H}
576:{\displaystyle a}
556:{\displaystyle a}
536:{\displaystyle H}
516:{\displaystyle a}
496:{\displaystyle H}
469:{\displaystyle n}
449:{\displaystyle n}
429:{\displaystyle a}
409:{\displaystyle n}
386:{\displaystyle H}
368:Rabin fingerprint
307:{\displaystyle a}
129:. They cannot be
116:Rabin fingerprint
73:
72:
16:(Redirected from
2649:
2615:
2614:
2612:
2611:
2591:(9): 2017–2031.
2576:
2567:
2566:
2538:
2532:
2531:
2525:
2521:
2519:
2511:
2509:
2508:
2485:
2479:
2473:
2467:
2466:
2464:
2463:
2449:
2440:
2439:
2431:
2425:
2419:
2410:
2403:
2397:
2396:
2394:
2393:
2379:
2373:
2362:
2317:
2315:
2314:
2309:
2288:
2286:
2285:
2280:
2259:
2257:
2256:
2251:
2018:
2016:
2015:
2010:
1992:
1990:
1989:
1984:
1972:
1970:
1969:
1964:
1937:
1935:
1934:
1929:
1910:
1908:
1907:
1902:
1881:
1879:
1878:
1873:
1850:
1848:
1847:
1842:
1830:
1828:
1827:
1822:
1820:
1819:
1801:
1799:
1798:
1793:
1781:
1779:
1778:
1773:
1746:
1744:
1743:
1738:
1687:
1685:
1684:
1679:
1674:
1673:
1663:
1658:
1612:
1586:
1584:
1583:
1578:
1576:
1575:
1559:
1557:
1556:
1551:
1549:
1547:
1546:
1534:
1509:
1507:
1506:
1501:
1499:
1498:
1464:
1462:
1461:
1456:
1439:and dismiss any
1438:
1436:
1435:
1430:
1418:
1416:
1415:
1410:
1383:
1381:
1380:
1375:
1373:
1372:
1347:
1345:
1344:
1339:
1331:
1330:
1300:
1299:
1281:
1280:
1240:
1238:
1237:
1232:
1224:
1223:
1205:
1204:
1188:
1186:
1185:
1180:
1168:
1166:
1165:
1160:
1158:
1157:
1141:
1139:
1138:
1133:
1106:
1104:
1103:
1098:
1086:
1084:
1083:
1078:
1063:
1061:
1060:
1055:
1050:
1049:
1021:
1019:
1018:
1013:
1005:
1004:
980:
979:
937:
936:
918:
917:
893:
892:
874:
873:
838:
836:
835:
830:
818:
816:
815:
810:
779:
777:
776:
771:
755:
753:
752:
747:
742:
741:
716:
714:
713:
708:
680:
679:
673:
672:
635:
633:
632:
627:
615:
613:
612:
607:
605:
604:
582:
580:
579:
574:
562:
560:
559:
554:
542:
540:
539:
534:
522:
520:
519:
514:
502:
500:
499:
494:
475:
473:
472:
467:
455:
453:
452:
447:
435:
433:
432:
427:
416:. The choice of
415:
413:
412:
407:
392:
390:
389:
384:
365:
363:
362:
357:
355:
354:
330:
329:
313:
311:
310:
305:
289:
287:
286:
281:
279:
278:
269:
268:
244:
243:
228:
227:
215:
214:
199:
198:
186:
185:
170:
169:
68:
65:
38:
37:
30:
21:
2657:
2656:
2652:
2651:
2650:
2648:
2647:
2646:
2632:
2631:
2623:
2618:
2609:
2607:
2578:
2577:
2570:
2540:
2539:
2535:
2522:
2512:
2506:
2504:
2502:
2487:
2486:
2482:
2474:
2470:
2461:
2459:
2451:
2450:
2443:
2433:
2432:
2428:
2420:
2413:
2404:
2400:
2391:
2389:
2381:
2380:
2376:
2370:arXiv:0705.4676
2363:
2354:
2350:
2332:
2324:circular shifts
2291:
2290:
2262:
2261:
2242:
2241:
2237:
2228:
2188:<< 1 ) +
2155:<< 1 ) +
2025:
1995:
1994:
1975:
1974:
1940:
1939:
1920:
1919:
1884:
1883:
1855:
1854:
1833:
1832:
1811:
1806:
1805:
1784:
1783:
1755:
1754:
1693:
1692:
1665:
1618:
1617:
1610:
1607:
1567:
1562:
1561:
1538:
1528:
1527:
1516:
1484:
1467:
1466:
1441:
1440:
1421:
1420:
1389:
1388:
1358:
1353:
1352:
1316:
1291:
1272:
1246:
1245:
1215:
1196:
1191:
1190:
1171:
1170:
1149:
1144:
1143:
1109:
1108:
1089:
1088:
1069:
1068:
1041:
1027:
1026:
996:
965:
928:
903:
884:
859:
848:
847:
839:be the bitwise
821:
820:
786:
785:
762:
761:
733:
719:
718:
699:
698:
687:
677:
675:
670:
668:
642:
618:
617:
593:
588:
587:
565:
564:
545:
544:
525:
524:
505:
504:
485:
484:
458:
457:
438:
437:
418:
417:
398:
397:
375:
374:
346:
321:
316:
315:
296:
295:
270:
260:
229:
219:
200:
190:
171:
161:
150:
149:
139:
133:, for example.
97:rapidly updated
69:
63:
60:
39:
35:
28:
23:
22:
15:
12:
11:
5:
2655:
2653:
2645:
2644:
2642:Hash functions
2634:
2633:
2630:
2629:
2622:
2621:External links
2619:
2617:
2616:
2568:
2533:
2524:|website=
2500:
2480:
2468:
2441:
2426:
2411:
2398:
2374:
2351:
2349:
2346:
2345:
2344:
2339:
2331:
2328:
2307:
2304:
2301:
2298:
2278:
2275:
2272:
2269:
2249:
2236:
2233:
2037:
2024:
2021:
2008:
2005:
2002:
1982:
1962:
1959:
1956:
1953:
1950:
1947:
1927:
1913:
1912:
1900:
1897:
1894:
1891:
1871:
1868:
1865:
1862:
1852:
1840:
1818:
1814:
1803:
1791:
1771:
1768:
1765:
1762:
1748:
1747:
1736:
1733:
1728:
1724:
1721:
1718:
1715:
1712:
1709:
1706:
1703:
1700:
1689:
1688:
1677:
1672:
1668:
1662:
1657:
1654:
1651:
1648:
1645:
1641:
1637:
1634:
1631:
1628:
1625:
1606:
1603:
1574:
1570:
1545:
1541:
1537:
1515:
1512:
1497:
1494:
1491:
1487:
1483:
1480:
1477:
1474:
1454:
1451:
1448:
1428:
1408:
1405:
1402:
1399:
1396:
1371:
1368:
1365:
1361:
1349:
1348:
1337:
1334:
1329:
1326:
1323:
1319:
1315:
1312:
1309:
1306:
1303:
1298:
1294:
1290:
1287:
1284:
1279:
1275:
1271:
1268:
1265:
1262:
1259:
1256:
1253:
1230:
1227:
1222:
1218:
1214:
1211:
1208:
1203:
1199:
1178:
1156:
1152:
1131:
1128:
1125:
1122:
1119:
1116:
1096:
1076:
1053:
1048:
1044:
1040:
1037:
1034:
1023:
1022:
1011:
1008:
1003:
999:
995:
992:
989:
986:
983:
978:
975:
972:
968:
964:
961:
958:
955:
952:
949:
946:
943:
940:
935:
931:
927:
924:
921:
916:
913:
910:
906:
902:
899:
896:
891:
887:
883:
880:
877:
872:
869:
866:
862:
858:
855:
828:
808:
805:
802:
799:
796:
793:
782:circular shift
769:
745:
740:
736:
732:
729:
726:
706:
686:
683:
641:
638:
625:
603:
600:
596:
572:
552:
532:
512:
492:
465:
445:
425:
405:
382:
370:, see below).
353:
349:
345:
342:
339:
336:
333:
328:
324:
303:
292:
291:
277:
273:
267:
263:
259:
256:
253:
250:
247:
242:
239:
236:
232:
226:
222:
218:
213:
210:
207:
203:
197:
193:
189:
184:
181:
178:
174:
168:
164:
160:
157:
138:
135:
89:moving average
71:
70:
42:
40:
33:
26:
24:
14:
13:
10:
9:
6:
4:
3:
2:
2654:
2643:
2640:
2639:
2637:
2628:
2625:
2624:
2620:
2606:
2602:
2598:
2594:
2590:
2586:
2582:
2575:
2573:
2569:
2564:
2560:
2556:
2552:
2548:
2544:
2537:
2534:
2529:
2517:
2503:
2501:9781931971300
2497:
2493:
2492:
2484:
2481:
2477:
2472:
2469:
2458:
2454:
2448:
2446:
2442:
2437:
2430:
2427:
2423:
2418:
2416:
2412:
2408:
2402:
2399:
2388:
2384:
2378:
2375:
2371:
2367:
2361:
2359:
2357:
2353:
2347:
2343:
2340:
2337:
2334:
2333:
2329:
2327:
2325:
2321:
2320:exclusive ors
2302:
2296:
2273:
2267:
2247:
2234:
2232:
2227:
2224:
2220:
2216:
2213:
2210:
2207:
2203:
2199:
2195:
2191:
2187:
2183:
2180:
2177:
2173:
2170:
2166:
2162:
2158:
2154:
2150:
2147:
2144:
2140:
2137:
2133:
2129:
2125:
2122:
2119:
2115:
2112:
2109:
2106:
2103:
2100:
2096:
2093:
2089:
2085:
2082:
2078:
2075:
2071:
2067:
2063:
2060:
2056:
2052:
2048:
2044:
2040:
2036:
2033:
2029:
2022:
2020:
2006:
2003:
2000:
1980:
1960:
1957:
1951:
1945:
1925:
1916:
1895:
1889:
1866:
1860:
1853:
1838:
1816:
1812:
1804:
1789:
1766:
1760:
1753:
1752:
1751:
1734:
1731:
1726:
1719:
1713:
1710:
1704:
1698:
1691:
1690:
1675:
1670:
1666:
1660:
1655:
1652:
1649:
1646:
1643:
1639:
1635:
1629:
1623:
1616:
1615:
1614:
1604:
1602:
1600:
1595:
1593:
1588:
1572:
1568:
1543:
1539:
1535:
1525:
1520:
1513:
1511:
1495:
1492:
1489:
1485:
1481:
1478:
1472:
1452:
1449:
1446:
1426:
1406:
1403:
1400:
1397:
1394:
1385:
1369:
1366:
1363:
1359:
1335:
1327:
1324:
1321:
1317:
1310:
1307:
1296:
1292:
1285:
1277:
1273:
1269:
1263:
1257:
1251:
1244:
1243:
1242:
1220:
1216:
1209:
1201:
1197:
1176:
1154:
1150:
1126:
1120:
1114:
1094:
1074:
1065:
1046:
1042:
1038:
1035:
1009:
1001:
997:
990:
987:
976:
973:
970:
966:
959:
953:
950:
947:
944:
933:
929:
922:
914:
911:
908:
904:
900:
889:
885:
878:
870:
867:
864:
860:
856:
853:
846:
845:
844:
842:
826:
806:
803:
797:
791:
783:
767:
759:
738:
734:
730:
727:
704:
696:
692:
691:barrel shifts
684:
682:
666:
660:
658:
654:
651:
647:
639:
637:
623:
601:
598:
594:
586:
570:
550:
530:
510:
490:
481:
479:
463:
443:
423:
403:
396:
380:
371:
369:
351:
347:
343:
340:
337:
334:
331:
326:
322:
301:
275:
271:
265:
261:
257:
254:
251:
248:
245:
240:
237:
234:
230:
224:
220:
216:
211:
208:
205:
201:
195:
191:
187:
182:
179:
176:
172:
166:
162:
158:
155:
148:
147:
146:
144:
136:
134:
132:
128:
124:
119:
117:
113:
109:
105:
100:
98:
94:
90:
84:
82:
81:hash function
78:
67:
58:
54:
53:
48:
47:
41:
32:
31:
19:
2608:. Retrieved
2588:
2584:
2546:
2542:
2536:
2505:. Retrieved
2490:
2483:
2471:
2460:. Retrieved
2456:
2429:
2406:
2401:
2390:. Retrieved
2386:
2377:
2365:
2238:
2229:
2225:
2222:
2218:
2214:
2211:
2208:
2205:
2201:
2197:
2193:
2189:
2185:
2181:
2178:
2175:
2171:
2168:
2164:
2160:
2156:
2152:
2148:
2145:
2142:
2138:
2135:
2131:
2127:
2123:
2120:
2117:
2113:
2110:
2107:
2104:
2101:
2098:
2094:
2091:
2087:
2083:
2080:
2076:
2073:
2069:
2065:
2061:
2058:
2054:
2050:
2046:
2045:data buffer
2042:
2041:FastCDC
2038:
2034:
2030:
2026:
1917:
1914:
1851:of the file,
1749:
1608:
1596:
1589:
1523:
1521:
1517:
1386:
1350:
1066:
1024:
841:exclusive or
688:
661:
650:Galois field
643:
482:
372:
293:
140:
125:or strongly
120:
101:
93:Zobrist hash
85:
77:rolling hash
76:
74:
61:
50:
44:
2549:: 258–272.
2342:w-shingling
1611:--rsyncable
64:August 2020
2610:2020-07-22
2507:2020-07-24
2462:2018-05-24
2392:2018-05-24
2192:]
2159:]
2057:cut point
1918:For every
758:hash table
2605:215817722
2563:0166-5316
2526:ignored (
2516:cite book
2348:Footnotes
2260:requires
2184:← (
2151:← (
2039:algorithm
1653:−
1640:∑
1493:−
1482:÷
1476:→
1450:−
1398:−
1308:⊕
1270:⊕
1255:←
1118:←
988:⊕
974:−
951:⊕
948:…
945:⊕
912:−
901:⊕
868:−
827:⊕
616:by which
599:−
238:−
209:−
180:−
127:universal
2636:Category
2330:See also
2318:bitwise
2217:←
2196: !(
2163:←
2126:←
2086:←
2079:←
2072:←
1831:is byte
1189:times:
112:adler-32
2424:. 2015.
2336:MinHash
2143:MinSize
2132:MinSize
2128:MaxSize
2118:MaxSize
2099:MinSize
2066:MaxSize
2062:MinSize
2055:output:
2053:,
95:can be
57:discuss
2603:
2561:
2498:
2223:return
2209:return
2200:&
2105:return
2043:input:
1938:where
1750:where
1351:where
1142:. If
1107:once:
819:. Let
665:restic
395:modulo
294:where
2601:S2CID
2174:<
2169:while
2141:<
2136:while
657:CRC32
653:GF(2)
108:rsync
46:split
2559:ISSN
2528:help
2496:ISBN
2322:and
2206:then
2202:Mask
2190:Gear
2157:Gear
2121:then
2102:then
2070:Mask
1993:and
1732:4096
1656:8195
1592:Borg
674:and
644:The
436:and
141:The
2593:doi
2551:doi
2047:src
1727:mod
807:011
798:101
678:MiB
671:KiB
669:512
543:by
503:by
59:)
55:. (
2638::
2599:.
2589:31
2587:.
2583:.
2571:^
2557:.
2547:79
2545:.
2520::
2518:}}
2514:{{
2494:.
2455:.
2444:^
2414:^
2385:.
2355:^
2326:.
2204:)
2198:fp
2194:if
2186:fp
2182:fp
2179:do
2153:fp
2149:fp
2146:do
2116:≥
2111:if
2097:≤
2092:if
2077:fp
1958:==
1601:.
1510:.
1064:.
681:.
75:A
2613:.
2595::
2565:.
2553::
2530:)
2510:.
2478:.
2465:.
2438:.
2407:n
2395:.
2372:.
2366:n
2306:)
2303:k
2300:(
2297:O
2277:)
2274:k
2271:(
2268:O
2248:k
2226:i
2219:i
2215:i
2212:i
2176:n
2172:i
2165:i
2161:i
2139:i
2124:n
2114:n
2108:n
2095:n
2088:0
2084:i
2081:0
2059:i
2051:n
2007:1
2004:+
2001:n
1981:n
1961:0
1955:)
1952:n
1949:(
1946:H
1926:n
1911:.
1899:)
1896:n
1893:(
1890:S
1870:)
1867:n
1864:(
1861:H
1839:i
1817:i
1813:c
1790:n
1770:)
1767:n
1764:(
1761:S
1735:,
1723:)
1720:n
1717:(
1714:S
1711:=
1708:)
1705:n
1702:(
1699:H
1676:,
1671:i
1667:c
1661:n
1650:n
1647:=
1644:i
1636:=
1633:)
1630:n
1627:(
1624:S
1573:n
1569:2
1544:n
1540:2
1536:1
1524:N
1496:1
1490:k
1486:2
1479:H
1473:H
1453:1
1447:k
1427:H
1407:1
1404:+
1401:k
1395:L
1370:1
1367:+
1364:k
1360:c
1336:,
1333:)
1328:1
1325:+
1322:k
1318:c
1314:(
1311:h
1305:)
1302:)
1297:1
1293:c
1289:(
1286:h
1283:(
1278:k
1274:s
1267:)
1264:H
1261:(
1258:s
1252:H
1229:)
1226:)
1221:1
1217:c
1213:(
1210:h
1207:(
1202:k
1198:s
1177:k
1155:1
1151:c
1130:)
1127:H
1124:(
1121:s
1115:H
1095:H
1075:H
1052:)
1047:L
1043:2
1039:,
1036:0
1033:[
1010:,
1007:)
1002:k
998:c
994:(
991:h
985:)
982:)
977:1
971:k
967:c
963:(
960:h
957:(
954:s
942:)
939:)
934:2
930:c
926:(
923:h
920:(
915:2
909:k
905:s
898:)
895:)
890:1
886:c
882:(
879:h
876:(
871:1
865:k
861:s
857:=
854:H
804:=
801:)
795:(
792:s
768:s
744:)
739:L
735:2
731:,
728:0
725:[
705:h
676:8
624:H
602:1
595:a
571:a
551:a
531:H
511:a
491:H
464:n
444:n
424:a
404:n
381:H
352:k
348:c
344:,
341:.
338:.
335:.
332:,
327:1
323:c
302:a
290:,
276:0
272:a
266:k
262:c
258:+
255:.
252:.
249:.
246:+
241:3
235:k
231:a
225:3
221:c
217:+
212:2
206:k
202:a
196:2
192:c
188:+
183:1
177:k
173:a
167:1
163:c
159:=
156:H
66:)
62:(
20:)
Text is available under the Creative Commons Attribution-ShareAlike License. Additional terms may apply.