Knowledge (XXG)

Rolling hash

Source đź“ť

25: 2021:
efficient Content-Defined Chunking approach. It uses a fast rolling Gear hash algorithm, skipping the minimum length, normalizing the chunk-size distribution, and last but not the least, rolling two bytes each time to speed up the CDC algorithm, which can achieve about 10X higher throughput than Rabin-based CDC approach.
2020:
The Content-Defined Chunking algorithm needs to compute the hash value of a data stream byte by byte and split the data stream into chunks when the hash value meets a predefined value. However, comparing a string byte-by-byte will introduce the heavy computation overhead. FastCDC proposes a new and
1507:
One of the interesting use cases of the rolling hash function is that it can create dynamic, content-based chunks of a stream or file. This is especially useful when it is required to send only the changed chunks of a large file over a network: a simple byte addition at the front of the file would
2219:
Where Gear array is a pre-calculated hashing array. Here FastCDC uses Gear hashing algorithm which can calculate the rolling hashing results quickly and keep the uniform distribution of the hashing results as Rabin. Compared with the traditional Rabin hashing algorithm, it achieves a much faster
651:
Because it shares the same author as the Rabin–Karp string search algorithm, which is often explained with another, simpler rolling hash, and because this simpler rolling hash is also a polynomial, both rolling hashes are often mistaken for each other. The backup software
2016:
Chunking is a technique to divide a data stream into a set of blocks, also called chunks. Content-defined chunking (CDC) is a chunking technique in which the division of the data stream is not based on fixed chunk size, as in fixed-size chunking, but on its content.
1009: 75:
A few hash functions allow a rolling hash to be computed very quickly—the new hash value is rapidly calculated given only the old hash value, the old value removed from the window, and the new value added to the window—similar to the way a
2228:
All rolling hash functions can be computed in time linear in the number of characters and updated in constant time when characters are shifted by one position. In particular, computing the Rabin-Karp rolling hash of a string of length
648:). The hash is the result of the division of that polynomial by an irreducible polynomial over GF(2). It is possible to update a Rabin fingerprint using only the entering and the leaving byte, making it effectively a rolling hash. 1335: 277: 1675: 2220:
speed. Experiments suggest that it can generate nearly the same chunk size distribution in the much shorter time (about 1/10 of rabin-based chunking ) when segmenting the data stream.
1497: 472:
Removing and adding characters simply involves adding or subtracting the first or last term. Shifting all characters by one position to the left requires multiplying the entire sum
838: 1734: 1129: 1228: 353: 1547: 806: 1051: 743: 1960: 1371: 826: 603: 1818: 1406: 1156: 2615: 2305: 2276: 1898: 1869: 1769: 2006: 1452: 1904:
Shifting the window by one byte simply involves adding the new character to the sum and subtracting the oldest character (no longer in the window) from the sum.
1574: 2247: 1980: 1925: 1838: 1789: 1426: 1176: 1094: 1074: 767: 704: 623: 570: 550: 530: 510: 490: 463: 443: 423: 403: 380: 301: 1511:
A simple approach to making dynamic chunks is to calculate a rolling hash, and if the hash value matches an arbitrary pattern (e.g. all zeroes) in the lower
131: 92: 1549:, given the hash has a uniform probability distribution) then it’s chosen to be a chunk boundary. Each chunk will thus have an average size of 2530:
Xia, Wen; Jiang, Hong; Feng, Dan; Tian, Lei; Fu, Min; Zhou, Yukun (2014). "Ddelta: A deduplication-inspired fast delta compression approach".
1236: 644:. Instead of seeing the input as a polynomial of bytes, it is seen as a polynomial of bits, and all arithmetic is done in GF(2) (similarly to 140: 678:
Hashing by cyclic polynomial—sometimes called Buzhash—is also simple and it has the benefit of avoiding multiplications, using
2008:. This approach will ensure that any change in the file will only affect its current and possibly the next chunk, but no other chunk. 2488: 1579:
Once the boundaries are known, the chunks need to be compared by cryptographic hash value to detect changes. The backup software
1576:
bytes. This approach ensures that unmodified data (more than a window size away from the changes) will have the same boundaries.
2568:
Xia, Wen; Zou, Xiangyu; Jiang, Hong; Zhou, Yukun; Liu, Chuanyi; Feng, Dan; Hua, Yu; Hu, Yuchong; Zhang, Yucheng (2020-06-16).
2441: 573: 107:
as its rolling hash. FastCDC (Fast Content-Defined Chunking) uses a compute-efficient Gear fingerprint as its rolling hash.
466: 2371: 1608: 1508:
normally cause all fixed size windows to become updated, while in reality, only the first "chunk" has been modified.
2424: 1457: 40: 2478: 1004:{\displaystyle H=s^{k-1}(h(c_{1}))\oplus s^{k-2}(h(c_{2}))\oplus \ldots \oplus s(h(c_{k-1}))\oplus h(c_{k}),} 2630: 2357:-gram hashing is pairwise independent, at best, Computer Speech & Language 24 (4), pages 698–710, 2010. 1580: 1014:
where the multiplications by powers of two can be implemented by binary shifts. The result is a number in
1683: 34: 119: 2477:
Xia, Wen; Zhou, Yukun; Jiang, Hong; Feng, Dan; Hua, Yu; Hu, Yuchong; Liu, Qing; Zhang, Yucheng (2005).
1099: 1181: 306: 111: 1376:
Hashing by cyclic polynomials is strongly universal or pairwise independent: simply keep the first
2589: 2504: 1602:
option) and rsyncrypto, do content-based slicing based on this specific (unweighted) moving sum:
1587: 1518: 683: 383: 80:
function can be computed much more quickly than other low-pass filters; and similar to the way a
776: 1017: 709: 134:
is often explained using a rolling hash function that only uses multiplications and additions:
2547: 2484: 634: 356: 115: 104: 45: 2410: 1930: 1343: 811: 773:): it rotates the bits by 1 to the left, pushing the latest bit in the first position. E.g., 2581: 2539: 1583:
uses the Buzhash algorithm with a customizable chunk size range for splitting file streams.
653: 625:
can be multiplied to get the result of the division without actually performing a division.
578: 1796: 1379: 1134: 2570:"The Design of Fast Content-Defined Chunking for Data Deduplication Based Storage Systems" 2516: 2464: 2281: 2252: 1874: 1845: 1745: 85: 81: 1985: 1431: 512:. Shifting all characters by one position to the right requires dividing the entire sum 2312: 2232: 1965: 1910: 1823: 1774: 1552: 1411: 1161: 1079: 1059: 770: 752: 689: 679: 608: 555: 535: 515: 495: 475: 448: 428: 408: 388: 365: 286: 77: 2480:
FastCDC: A Fast and Efficient Content-Defined Chunking Approach for Data Deduplication
2442:"Data structures and file formats — Borg - Deduplicating Archiver 1.1.5 documentation" 2624: 2593: 69: 2308: 829: 638: 95:, which uses the rolling hash described below. Another popular application is the 2330: 2569: 2543: 637:
is another hash, which also interprets the input as a polynomial, but over the
2585: 746: 2551: 656:
uses a Rabin fingerprint for splitting files, with blob size varying between
2358: 1454:
consecutive bits. In practice, this can be achieved by an integer division
2278:
modular arithmetic operations, and hashing by cyclic polynomials requires
100: 2425:"Rabin Karp rolling hash - dynamic sized chunks based on hashed content" 1330:{\displaystyle H\leftarrow s(H)\oplus s^{k}(h(c_{1}))\oplus h(c_{k+1}),} 2616:
MIT 6.006: Introduction to Algorithms 2011- Recitation 9 - Rolling Hash
2324: 1056:
Computing the hash values in a rolling fashion is done as follows. Let
272:{\displaystyle H=c_{1}a^{k-1}+c_{2}a^{k-2}+c_{3}a^{k-3}+...+c_{k}a^{0}} 2563: 2561: 103:
as its rolling hash. Low Bandwidth Network Filesystem (LBFS) uses a
72:
where the input is hashed in a window that moves through the input.
645: 641: 96: 18: 2012:
Gear fingerprint and content-based chunking algorithm FastCDC
445:
is critical to get good hashing; in particular, the modulus
68:(also known as recursive hashing or rolling checksum) is a 2411:"Foundation - Introducing Content Defined Chunking (CDC)" 2057:← 64KB // split maximum chunk size is 64 KB 2053:← 2KB // split minimum chunk size is 2 KB 749:
mapping characters to random integers. Let the function
1871:
is a "hash value" consisting of the bottom 12 bits of
1771:
is the sum of 8196 consecutive bytes ending with byte
1555: 1521: 2574:
IEEE Transactions on Parallel and Distributed Systems
2284: 2255: 2235: 2024:
The basic version pseudocode is provided as follows:
1988: 1968: 1933: 1913: 1877: 1848: 1826: 1799: 1777: 1748: 1686: 1611: 1460: 1434: 1414: 1382: 1346: 1239: 1184: 1164: 1137: 1102: 1082: 1062: 1020: 841: 814: 779: 755: 712: 692: 611: 581: 558: 538: 518: 498: 478: 451: 431: 411: 391: 368: 355:
are the input characters (but this function is not a
309: 289: 143: 99:
program, which uses a checksum based on Mark Adler's
2394:Jonathan D. Cohen, Recursive Hashing Functions for 2079:// buffer size is less than minimum chunk size 745:. This hash function might be simply an array or a 32:It has been suggested that this article should be 2299: 2270: 2241: 2000: 1974: 1954: 1919: 1892: 1863: 1832: 1812: 1783: 1763: 1728: 1669: 1568: 1541: 1491: 1446: 1420: 1400: 1365: 1329: 1222: 1170: 1150: 1123: 1088: 1068: 1045: 1003: 820: 800: 761: 737: 698: 617: 597: 564: 544: 524: 504: 484: 457: 437: 417: 397: 374: 347: 295: 271: 1586:Such content-defined chunking is often used for 1670:{\displaystyle S(n)=\sum _{i=n-8195}^{n}c_{i},} 686:: it presumes that there is some hash function 2349: 2347: 2345: 8: 2436: 2434: 2406: 2404: 2123:bytes, and kickstart the hash     706:from characters to integers in the interval 2398:-Grams, ACM Trans. Inf. Syst. 15 (3), 1997. 1598:Several programs, including gzip (with the 1503:Content-based slicing using a rolling hash 1492:{\displaystyle H\rightarrow H\div 2^{k-1}} 1158:is the character to be removed, rotate it 2372:"References — restic 0.9.0 documentation" 2283: 2254: 2234: 1987: 1967: 1932: 1912: 1876: 1847: 1825: 1804: 1798: 1776: 1747: 1719: 1718: 1685: 1658: 1648: 1631: 1610: 1560: 1554: 1531: 1522: 1520: 1477: 1459: 1433: 1413: 1381: 1351: 1345: 1309: 1284: 1265: 1238: 1208: 1189: 1183: 1163: 1142: 1136: 1101: 1081: 1061: 1034: 1019: 989: 958: 921: 896: 877: 852: 840: 813: 778: 754: 726: 711: 691: 610: 586: 580: 557: 537: 517: 497: 477: 450: 430: 410: 390: 367: 339: 314: 308: 288: 263: 253: 222: 212: 193: 183: 164: 154: 142: 2341: 2512: 2502: 1962:, these programs cut the file between 1594:Content-based slicing using moving sum 2353:Daniel Lemire, Owen Kaser: Recursive 7: 362:In order to avoid manipulating huge 91:One of the main applications is the 1729:{\displaystyle H(n)=S(n)\mod 4096,} 1076:be the previous hash value. Rotate 2423:Horvath, Adam (October 24, 2012). 552:. Note that in modulo arithmetic, 465:is typically a prime number. See 132:Rabin–Karp string search algorithm 93:Rabin–Karp string search algorithm 14: 832:. The hash values are defined as 110:At best, rolling hash values are 1124:{\displaystyle H\leftarrow s(H)} 769:be a cyclic binary rotation (or 23: 1714: 1408:bits. That is, take the result 1223:{\displaystyle s^{k}(h(c_{1}))} 348:{\displaystyle c_{1},...,c_{k}} 2294: 2288: 2265: 2259: 1943: 1937: 1887: 1881: 1858: 1852: 1791:(requires 21 bits of storage), 1758: 1752: 1711: 1705: 1696: 1690: 1621: 1615: 1464: 1321: 1302: 1293: 1290: 1277: 1271: 1255: 1249: 1243: 1217: 1214: 1201: 1195: 1118: 1112: 1106: 1040: 1021: 995: 982: 973: 970: 951: 945: 930: 927: 914: 908: 886: 883: 870: 864: 789: 783: 732: 713: 1: 2327: â€“ Data mining technique 467:linear congruential generator 1542:{\textstyle {1 \over 2^{n}}} 1515:bits (with a probability of 2647: 2544:10.1016/j.peva.2014.07.016 2038:, data length 801:{\displaystyle s(101)=011} 38:into a new article titled 2586:10.1109/TPDS.2020.2984632 2446:borgbackup.readthedocs.io 1046:{\displaystyle [0,2^{L})} 738:{\displaystyle [0,2^{L})} 682:instead. It is a form of 382:values, all math is done 88:from the old hash value. 2224:Computational complexity 572:can be chosen to have a 41:Content-Defined Chunking 1955:{\displaystyle H(n)==0} 1366:{\displaystyle c_{k+1}} 821:{\displaystyle \oplus } 126:Polynomial rolling hash 2532:Performance Evaluation 2465:"Rsyncrypto Algorithm" 2301: 2272: 2243: 2210:+ 1     2002: 1976: 1956: 1921: 1894: 1865: 1834: 1814: 1785: 1765: 1730: 1671: 1653: 1570: 1543: 1493: 1448: 1422: 1402: 1373:is the new character. 1367: 1331: 1224: 1172: 1152: 1125: 1090: 1070: 1047: 1005: 822: 802: 763: 739: 700: 619: 599: 598:{\displaystyle a^{-1}} 574:multiplicative inverse 566: 546: 526: 506: 486: 459: 439: 419: 399: 376: 349: 297: 273: 2376:restic.readthedocs.io 2302: 2273: 2244: 2003: 1977: 1957: 1922: 1895: 1866: 1835: 1815: 1813:{\displaystyle c_{i}} 1786: 1766: 1731: 1672: 1627: 1571: 1544: 1494: 1449: 1423: 1403: 1401:{\displaystyle L-k+1} 1368: 1332: 1225: 1173: 1153: 1151:{\displaystyle c_{1}} 1126: 1091: 1071: 1048: 1006: 823: 803: 764: 740: 701: 620: 600: 567: 547: 527: 507: 487: 469:for more discussion. 460: 440: 420: 400: 377: 350: 298: 274: 16:Type of hash function 2300:{\displaystyle O(k)} 2282: 2271:{\displaystyle O(k)} 2253: 2233: 2063:0x0000d93003530000LL 1986: 1966: 1931: 1911: 1893:{\displaystyle S(n)} 1875: 1864:{\displaystyle H(n)} 1846: 1824: 1797: 1775: 1764:{\displaystyle S(n)} 1746: 1684: 1609: 1553: 1519: 1458: 1432: 1412: 1380: 1344: 1237: 1182: 1162: 1135: 1100: 1080: 1060: 1018: 839: 812: 777: 753: 710: 690: 609: 579: 556: 536: 516: 496: 476: 449: 429: 409: 389: 366: 307: 287: 141: 112:pairwise independent 2001:{\displaystyle n+1} 1447:{\displaystyle k-1} 303:is a constant, and 2297: 2268: 2239: 2156:+ 1     2119:// Skip the first 1998: 1972: 1952: 1917: 1890: 1861: 1830: 1810: 1781: 1761: 1726: 1667: 1588:data deduplication 1569:{\textstyle 2^{n}} 1566: 1539: 1489: 1444: 1418: 1398: 1363: 1327: 1230:. Then simply set 1220: 1168: 1148: 1121: 1086: 1066: 1043: 1001: 818: 798: 759: 735: 696: 684:tabulation hashing 615: 595: 562: 542: 522: 502: 482: 455: 435: 415: 395: 372: 345: 293: 269: 120:3-wise independent 2242:{\displaystyle k} 1975:{\displaystyle n} 1920:{\displaystyle n} 1833:{\displaystyle i} 1784:{\displaystyle n} 1537: 1421:{\displaystyle H} 1171:{\displaystyle k} 1089:{\displaystyle H} 1069:{\displaystyle H} 762:{\displaystyle s} 699:{\displaystyle h} 674:Cyclic polynomial 635:Rabin fingerprint 629:Rabin fingerprint 618:{\displaystyle H} 565:{\displaystyle a} 545:{\displaystyle a} 525:{\displaystyle H} 505:{\displaystyle a} 485:{\displaystyle H} 458:{\displaystyle n} 438:{\displaystyle n} 418:{\displaystyle a} 398:{\displaystyle n} 375:{\displaystyle H} 357:Rabin fingerprint 296:{\displaystyle a} 118:. They cannot be 105:Rabin fingerprint 62: 61: 2638: 2604: 2603: 2601: 2600: 2580:(9): 2017–2031. 2565: 2556: 2555: 2527: 2521: 2520: 2514: 2510: 2508: 2500: 2498: 2497: 2474: 2468: 2462: 2456: 2455: 2453: 2452: 2438: 2429: 2428: 2420: 2414: 2408: 2399: 2392: 2386: 2385: 2383: 2382: 2368: 2362: 2351: 2306: 2304: 2303: 2298: 2277: 2275: 2274: 2269: 2248: 2246: 2245: 2240: 2007: 2005: 2004: 1999: 1981: 1979: 1978: 1973: 1961: 1959: 1958: 1953: 1926: 1924: 1923: 1918: 1899: 1897: 1896: 1891: 1870: 1868: 1867: 1862: 1839: 1837: 1836: 1831: 1819: 1817: 1816: 1811: 1809: 1808: 1790: 1788: 1787: 1782: 1770: 1768: 1767: 1762: 1735: 1733: 1732: 1727: 1676: 1674: 1673: 1668: 1663: 1662: 1652: 1647: 1601: 1575: 1573: 1572: 1567: 1565: 1564: 1548: 1546: 1545: 1540: 1538: 1536: 1535: 1523: 1498: 1496: 1495: 1490: 1488: 1487: 1453: 1451: 1450: 1445: 1428:and dismiss any 1427: 1425: 1424: 1419: 1407: 1405: 1404: 1399: 1372: 1370: 1369: 1364: 1362: 1361: 1336: 1334: 1333: 1328: 1320: 1319: 1289: 1288: 1270: 1269: 1229: 1227: 1226: 1221: 1213: 1212: 1194: 1193: 1177: 1175: 1174: 1169: 1157: 1155: 1154: 1149: 1147: 1146: 1130: 1128: 1127: 1122: 1095: 1093: 1092: 1087: 1075: 1073: 1072: 1067: 1052: 1050: 1049: 1044: 1039: 1038: 1010: 1008: 1007: 1002: 994: 993: 969: 968: 926: 925: 907: 906: 882: 881: 863: 862: 827: 825: 824: 819: 807: 805: 804: 799: 768: 766: 765: 760: 744: 742: 741: 736: 731: 730: 705: 703: 702: 697: 669: 668: 662: 661: 624: 622: 621: 616: 604: 602: 601: 596: 594: 593: 571: 569: 568: 563: 551: 549: 548: 543: 531: 529: 528: 523: 511: 509: 508: 503: 491: 489: 488: 483: 464: 462: 461: 456: 444: 442: 441: 436: 424: 422: 421: 416: 405:. The choice of 404: 402: 401: 396: 381: 379: 378: 373: 354: 352: 351: 346: 344: 343: 319: 318: 302: 300: 299: 294: 278: 276: 275: 270: 268: 267: 258: 257: 233: 232: 217: 216: 204: 203: 188: 187: 175: 174: 159: 158: 57: 54: 27: 26: 19: 2646: 2645: 2641: 2640: 2639: 2637: 2636: 2635: 2621: 2620: 2612: 2607: 2598: 2596: 2567: 2566: 2559: 2529: 2528: 2524: 2511: 2501: 2495: 2493: 2491: 2476: 2475: 2471: 2463: 2459: 2450: 2448: 2440: 2439: 2432: 2422: 2421: 2417: 2409: 2402: 2393: 2389: 2380: 2378: 2370: 2369: 2365: 2359:arXiv:0705.4676 2352: 2343: 2339: 2321: 2313:circular shifts 2280: 2279: 2251: 2250: 2231: 2230: 2226: 2217: 2177:<< 1 ) + 2144:<< 1 ) + 2014: 1984: 1983: 1964: 1963: 1929: 1928: 1909: 1908: 1873: 1872: 1844: 1843: 1822: 1821: 1800: 1795: 1794: 1773: 1772: 1744: 1743: 1682: 1681: 1654: 1607: 1606: 1599: 1596: 1556: 1551: 1550: 1527: 1517: 1516: 1505: 1473: 1456: 1455: 1430: 1429: 1410: 1409: 1378: 1377: 1347: 1342: 1341: 1305: 1280: 1261: 1235: 1234: 1204: 1185: 1180: 1179: 1160: 1159: 1138: 1133: 1132: 1098: 1097: 1078: 1077: 1058: 1057: 1030: 1016: 1015: 985: 954: 917: 892: 873: 848: 837: 836: 828:be the bitwise 810: 809: 775: 774: 751: 750: 722: 708: 707: 688: 687: 676: 666: 664: 659: 657: 631: 607: 606: 582: 577: 576: 554: 553: 534: 533: 514: 513: 494: 493: 474: 473: 447: 446: 427: 426: 407: 406: 387: 386: 364: 363: 335: 310: 305: 304: 285: 284: 259: 249: 218: 208: 189: 179: 160: 150: 139: 138: 128: 122:, for example. 86:rapidly updated 58: 52: 49: 28: 24: 17: 12: 11: 5: 2644: 2642: 2634: 2633: 2631:Hash functions 2623: 2622: 2619: 2618: 2611: 2610:External links 2608: 2606: 2605: 2557: 2522: 2513:|website= 2489: 2469: 2457: 2430: 2415: 2400: 2387: 2363: 2340: 2338: 2335: 2334: 2333: 2328: 2320: 2317: 2296: 2293: 2290: 2287: 2267: 2264: 2261: 2258: 2238: 2225: 2222: 2026: 2013: 2010: 1997: 1994: 1991: 1971: 1951: 1948: 1945: 1942: 1939: 1936: 1916: 1902: 1901: 1889: 1886: 1883: 1880: 1860: 1857: 1854: 1851: 1841: 1829: 1807: 1803: 1792: 1780: 1760: 1757: 1754: 1751: 1737: 1736: 1725: 1722: 1717: 1713: 1710: 1707: 1704: 1701: 1698: 1695: 1692: 1689: 1678: 1677: 1666: 1661: 1657: 1651: 1646: 1643: 1640: 1637: 1634: 1630: 1626: 1623: 1620: 1617: 1614: 1595: 1592: 1563: 1559: 1534: 1530: 1526: 1504: 1501: 1486: 1483: 1480: 1476: 1472: 1469: 1466: 1463: 1443: 1440: 1437: 1417: 1397: 1394: 1391: 1388: 1385: 1360: 1357: 1354: 1350: 1338: 1337: 1326: 1323: 1318: 1315: 1312: 1308: 1304: 1301: 1298: 1295: 1292: 1287: 1283: 1279: 1276: 1273: 1268: 1264: 1260: 1257: 1254: 1251: 1248: 1245: 1242: 1219: 1216: 1211: 1207: 1203: 1200: 1197: 1192: 1188: 1167: 1145: 1141: 1120: 1117: 1114: 1111: 1108: 1105: 1085: 1065: 1042: 1037: 1033: 1029: 1026: 1023: 1012: 1011: 1000: 997: 992: 988: 984: 981: 978: 975: 972: 967: 964: 961: 957: 953: 950: 947: 944: 941: 938: 935: 932: 929: 924: 920: 916: 913: 910: 905: 902: 899: 895: 891: 888: 885: 880: 876: 872: 869: 866: 861: 858: 855: 851: 847: 844: 817: 797: 794: 791: 788: 785: 782: 771:circular shift 758: 734: 729: 725: 721: 718: 715: 695: 675: 672: 630: 627: 614: 592: 589: 585: 561: 541: 521: 501: 481: 454: 434: 414: 394: 371: 359:, see below). 342: 338: 334: 331: 328: 325: 322: 317: 313: 292: 281: 280: 266: 262: 256: 252: 248: 245: 242: 239: 236: 231: 228: 225: 221: 215: 211: 207: 202: 199: 196: 192: 186: 182: 178: 173: 170: 167: 163: 157: 153: 149: 146: 127: 124: 78:moving average 60: 59: 31: 29: 22: 15: 13: 10: 9: 6: 4: 3: 2: 2643: 2632: 2629: 2628: 2626: 2617: 2614: 2613: 2609: 2595: 2591: 2587: 2583: 2579: 2575: 2571: 2564: 2562: 2558: 2553: 2549: 2545: 2541: 2537: 2533: 2526: 2523: 2518: 2506: 2492: 2490:9781931971300 2486: 2482: 2481: 2473: 2470: 2466: 2461: 2458: 2447: 2443: 2437: 2435: 2431: 2426: 2419: 2416: 2412: 2407: 2405: 2401: 2397: 2391: 2388: 2377: 2373: 2367: 2364: 2360: 2356: 2350: 2348: 2346: 2342: 2336: 2332: 2329: 2326: 2323: 2322: 2318: 2316: 2314: 2310: 2309:exclusive ors 2291: 2285: 2262: 2256: 2236: 2223: 2221: 2216: 2213: 2209: 2205: 2202: 2199: 2196: 2192: 2188: 2184: 2180: 2176: 2172: 2169: 2166: 2162: 2159: 2155: 2151: 2147: 2143: 2139: 2136: 2133: 2129: 2126: 2122: 2118: 2114: 2111: 2108: 2104: 2101: 2098: 2095: 2092: 2089: 2085: 2082: 2078: 2074: 2071: 2067: 2064: 2060: 2056: 2052: 2049: 2045: 2041: 2037: 2033: 2029: 2025: 2022: 2018: 2011: 2009: 1995: 1992: 1989: 1969: 1949: 1946: 1940: 1934: 1914: 1905: 1884: 1878: 1855: 1849: 1842: 1827: 1805: 1801: 1793: 1778: 1755: 1749: 1742: 1741: 1740: 1723: 1720: 1715: 1708: 1702: 1699: 1693: 1687: 1680: 1679: 1664: 1659: 1655: 1649: 1644: 1641: 1638: 1635: 1632: 1628: 1624: 1618: 1612: 1605: 1604: 1603: 1593: 1591: 1589: 1584: 1582: 1577: 1561: 1557: 1532: 1528: 1524: 1514: 1509: 1502: 1500: 1484: 1481: 1478: 1474: 1470: 1467: 1461: 1441: 1438: 1435: 1415: 1395: 1392: 1389: 1386: 1383: 1374: 1358: 1355: 1352: 1348: 1324: 1316: 1313: 1310: 1306: 1299: 1296: 1285: 1281: 1274: 1266: 1262: 1258: 1252: 1246: 1240: 1233: 1232: 1231: 1209: 1205: 1198: 1190: 1186: 1165: 1143: 1139: 1115: 1109: 1103: 1083: 1063: 1054: 1035: 1031: 1027: 1024: 998: 990: 986: 979: 976: 965: 962: 959: 955: 948: 942: 939: 936: 933: 922: 918: 911: 903: 900: 897: 893: 889: 878: 874: 867: 859: 856: 853: 849: 845: 842: 835: 834: 833: 831: 815: 795: 792: 786: 780: 772: 756: 748: 727: 723: 719: 716: 693: 685: 681: 680:barrel shifts 673: 671: 655: 649: 647: 643: 640: 636: 628: 626: 612: 590: 587: 583: 575: 559: 539: 519: 499: 479: 470: 468: 452: 432: 412: 392: 385: 369: 360: 358: 340: 336: 332: 329: 326: 323: 320: 315: 311: 290: 264: 260: 254: 250: 246: 243: 240: 237: 234: 229: 226: 223: 219: 213: 209: 205: 200: 197: 194: 190: 184: 180: 176: 171: 168: 165: 161: 155: 151: 147: 144: 137: 136: 135: 133: 125: 123: 121: 117: 113: 108: 106: 102: 98: 94: 89: 87: 83: 79: 73: 71: 70:hash function 67: 56: 47: 43: 42: 37: 36: 30: 21: 20: 2597:. Retrieved 2577: 2573: 2535: 2531: 2525: 2494:. Retrieved 2479: 2472: 2460: 2449:. Retrieved 2445: 2418: 2395: 2390: 2379:. Retrieved 2375: 2366: 2354: 2227: 2218: 2214: 2211: 2207: 2203: 2200: 2197: 2194: 2190: 2186: 2182: 2178: 2174: 2170: 2167: 2164: 2160: 2157: 2153: 2149: 2145: 2141: 2137: 2134: 2131: 2127: 2124: 2120: 2116: 2112: 2109: 2106: 2102: 2099: 2096: 2093: 2090: 2087: 2083: 2080: 2076: 2072: 2069: 2065: 2062: 2058: 2054: 2050: 2047: 2043: 2039: 2035: 2034:data buffer 2031: 2030:FastCDC 2027: 2023: 2019: 2015: 1906: 1903: 1840:of the file, 1738: 1597: 1585: 1578: 1512: 1510: 1506: 1375: 1339: 1055: 1013: 830:exclusive or 677: 650: 639:Galois field 632: 471: 361: 282: 129: 114:or strongly 109: 90: 82:Zobrist hash 74: 66:rolling hash 65: 63: 50: 39: 33: 2538:: 258–272. 2331:w-shingling 1600:--rsyncable 53:August 2020 2599:2020-07-22 2496:2020-07-24 2451:2018-05-24 2381:2018-05-24 2181:] 2148:] 2046:cut point 1907:For every 747:hash table 2594:215817722 2552:0166-5316 2515:ignored ( 2505:cite book 2337:Footnotes 2249:requires 2173:← ( 2140:← ( 2028:algorithm 1642:− 1629:∑ 1482:− 1471:÷ 1465:→ 1439:− 1387:− 1297:⊕ 1259:⊕ 1244:← 1107:← 977:⊕ 963:− 940:⊕ 937:… 934:⊕ 901:− 890:⊕ 857:− 816:⊕ 605:by which 588:− 227:− 198:− 169:− 116:universal 2625:Category 2319:See also 2307:bitwise 2206:← 2185: !( 2152:← 2115:← 2075:← 2068:← 2061:← 1820:is byte 1178:times: 101:adler-32 2413:. 2015. 2325:MinHash 2132:MinSize 2121:MinSize 2117:MaxSize 2107:MaxSize 2088:MinSize 2055:MaxSize 2051:MinSize 2044:output: 2042:, 84:can be 46:discuss 2592:  2550:  2487:  2212:return 2198:return 2189:& 2094:return 2032:input: 1927:where 1739:where 1340:where 1131:. If 1096:once: 808:. Let 654:restic 384:modulo 283:where 2590:S2CID 2163:< 2158:while 2130:< 2125:while 646:CRC32 642:GF(2) 97:rsync 35:split 2548:ISSN 2517:help 2485:ISBN 2311:and 2195:then 2191:Mask 2179:Gear 2146:Gear 2110:then 2091:then 2059:Mask 1982:and 1721:4096 1645:8195 1581:Borg 663:and 633:The 425:and 130:The 2582:doi 2540:doi 2036:src 1716:mod 796:011 787:101 667:MiB 660:KiB 658:512 532:by 492:by 48:) 44:. ( 2627:: 2588:. 2578:31 2576:. 2572:. 2560:^ 2546:. 2536:79 2534:. 2509:: 2507:}} 2503:{{ 2483:. 2444:. 2433:^ 2403:^ 2374:. 2344:^ 2315:. 2193:) 2187:fp 2183:if 2175:fp 2171:fp 2168:do 2142:fp 2138:fp 2135:do 2105:≥ 2100:if 2086:≤ 2081:if 2066:fp 1947:== 1590:. 1499:. 1053:. 670:. 64:A 2602:. 2584:: 2554:. 2542:: 2519:) 2499:. 2467:. 2454:. 2427:. 2396:n 2384:. 2361:. 2355:n 2295:) 2292:k 2289:( 2286:O 2266:) 2263:k 2260:( 2257:O 2237:k 2215:i 2208:i 2204:i 2201:i 2165:n 2161:i 2154:i 2150:i 2128:i 2113:n 2103:n 2097:n 2084:n 2077:0 2073:i 2070:0 2048:i 2040:n 1996:1 1993:+ 1990:n 1970:n 1950:0 1944:) 1941:n 1938:( 1935:H 1915:n 1900:. 1888:) 1885:n 1882:( 1879:S 1859:) 1856:n 1853:( 1850:H 1828:i 1806:i 1802:c 1779:n 1759:) 1756:n 1753:( 1750:S 1724:, 1712:) 1709:n 1706:( 1703:S 1700:= 1697:) 1694:n 1691:( 1688:H 1665:, 1660:i 1656:c 1650:n 1639:n 1636:= 1633:i 1625:= 1622:) 1619:n 1616:( 1613:S 1562:n 1558:2 1533:n 1529:2 1525:1 1513:N 1485:1 1479:k 1475:2 1468:H 1462:H 1442:1 1436:k 1416:H 1396:1 1393:+ 1390:k 1384:L 1359:1 1356:+ 1353:k 1349:c 1325:, 1322:) 1317:1 1314:+ 1311:k 1307:c 1303:( 1300:h 1294:) 1291:) 1286:1 1282:c 1278:( 1275:h 1272:( 1267:k 1263:s 1256:) 1253:H 1250:( 1247:s 1241:H 1218:) 1215:) 1210:1 1206:c 1202:( 1199:h 1196:( 1191:k 1187:s 1166:k 1144:1 1140:c 1119:) 1116:H 1113:( 1110:s 1104:H 1084:H 1064:H 1041:) 1036:L 1032:2 1028:, 1025:0 1022:[ 999:, 996:) 991:k 987:c 983:( 980:h 974:) 971:) 966:1 960:k 956:c 952:( 949:h 946:( 943:s 931:) 928:) 923:2 919:c 915:( 912:h 909:( 904:2 898:k 894:s 887:) 884:) 879:1 875:c 871:( 868:h 865:( 860:1 854:k 850:s 846:= 843:H 793:= 790:) 784:( 781:s 757:s 733:) 728:L 724:2 720:, 717:0 714:[ 694:h 665:8 613:H 591:1 584:a 560:a 540:a 520:H 500:a 480:H 453:n 433:n 413:a 393:n 370:H 341:k 337:c 333:, 330:. 327:. 324:. 321:, 316:1 312:c 291:a 279:, 265:0 261:a 255:k 251:c 247:+ 244:. 241:. 238:. 235:+ 230:3 224:k 220:a 214:3 210:c 206:+ 201:2 195:k 191:a 185:2 181:c 177:+ 172:1 166:k 162:a 156:1 152:c 148:= 145:H 55:) 51:(

Index

split
Content-Defined Chunking
discuss
hash function
moving average
Zobrist hash
rapidly updated
Rabin–Karp string search algorithm
rsync
adler-32
Rabin fingerprint
pairwise independent
universal
3-wise independent
Rabin–Karp string search algorithm
Rabin fingerprint
modulo
linear congruential generator
multiplicative inverse
Rabin fingerprint
Galois field
GF(2)
CRC32
restic
barrel shifts
tabulation hashing
hash table
circular shift
exclusive or
Borg

Text is available under the Creative Commons Attribution-ShareAlike License. Additional terms may apply.

↑