Knowledge (XXG)

Rolling hash

Source đź“ť

36: 2032:
efficient Content-Defined Chunking approach. It uses a fast rolling Gear hash algorithm, skipping the minimum length, normalizing the chunk-size distribution, and last but not the least, rolling two bytes each time to speed up the CDC algorithm, which can achieve about 10X higher throughput than Rabin-based CDC approach.
2031:
The Content-Defined Chunking algorithm needs to compute the hash value of a data stream byte by byte and split the data stream into chunks when the hash value meets a predefined value. However, comparing a string byte-by-byte will introduce the heavy computation overhead. FastCDC proposes a new and
1518:
One of the interesting use cases of the rolling hash function is that it can create dynamic, content-based chunks of a stream or file. This is especially useful when it is required to send only the changed chunks of a large file over a network: a simple byte addition at the front of the file would
2230:
Where Gear array is a pre-calculated hashing array. Here FastCDC uses Gear hashing algorithm which can calculate the rolling hashing results quickly and keep the uniform distribution of the hashing results as Rabin. Compared with the traditional Rabin hashing algorithm, it achieves a much faster
662:
Because it shares the same author as the Rabin–Karp string search algorithm, which is often explained with another, simpler rolling hash, and because this simpler rolling hash is also a polynomial, both rolling hashes are often mistaken for each other. The backup software
2027:
Chunking is a technique to divide a data stream into a set of blocks, also called chunks. Content-defined chunking (CDC) is a chunking technique in which the division of the data stream is not based on fixed chunk size, as in fixed-size chunking, but on its content.
1020: 86:
A few hash functions allow a rolling hash to be computed very quickly—the new hash value is rapidly calculated given only the old hash value, the old value removed from the window, and the new value added to the window—similar to the way a
2239:
All rolling hash functions can be computed in time linear in the number of characters and updated in constant time when characters are shifted by one position. In particular, computing the Rabin-Karp rolling hash of a string of length
659:). The hash is the result of the division of that polynomial by an irreducible polynomial over GF(2). It is possible to update a Rabin fingerprint using only the entering and the leaving byte, making it effectively a rolling hash. 1346: 288: 1686: 2231:
speed. Experiments suggest that it can generate nearly the same chunk size distribution in the much shorter time (about 1/10 of rabin-based chunking ) when segmenting the data stream.
1508: 483:
Removing and adding characters simply involves adding or subtracting the first or last term. Shifting all characters by one position to the left requires multiplying the entire sum
849: 1745: 1140: 1239: 364: 1558: 817: 1062: 754: 1971: 1382: 837: 614: 1829: 1417: 1167: 2626: 2316: 2287: 1909: 1880: 1780: 2017: 1463: 1915:
Shifting the window by one byte simply involves adding the new character to the sum and subtracting the oldest character (no longer in the window) from the sum.
1585: 2258: 1991: 1936: 1849: 1800: 1437: 1187: 1105: 1085: 778: 715: 634: 581: 561: 541: 521: 501: 474: 454: 434: 414: 391: 312: 1522:
A simple approach to making dynamic chunks is to calculate a rolling hash, and if the hash value matches an arbitrary pattern (e.g. all zeroes) in the lower
142: 103: 1560:, given the hash has a uniform probability distribution) then it’s chosen to be a chunk boundary. Each chunk will thus have an average size of 2541:
Xia, Wen; Jiang, Hong; Feng, Dan; Tian, Lei; Fu, Min; Zhou, Yukun (2014). "Ddelta: A deduplication-inspired fast delta compression approach".
1247: 655:. Instead of seeing the input as a polynomial of bytes, it is seen as a polynomial of bits, and all arithmetic is done in GF(2) (similarly to 151: 689:
Hashing by cyclic polynomial—sometimes called Buzhash—is also simple and it has the benefit of avoiding multiplications, using
2019:. This approach will ensure that any change in the file will only affect its current and possibly the next chunk, but no other chunk. 2499: 1590:
Once the boundaries are known, the chunks need to be compared by cryptographic hash value to detect changes. The backup software
1587:
bytes. This approach ensures that unmodified data (more than a window size away from the changes) will have the same boundaries.
2579:
Xia, Wen; Zou, Xiangyu; Jiang, Hong; Zhou, Yukun; Liu, Chuanyi; Feng, Dan; Hua, Yu; Hu, Yuchong; Zhang, Yucheng (2020-06-16).
2452: 584: 118:
as its rolling hash. FastCDC (Fast Content-Defined Chunking) uses a compute-efficient Gear fingerprint as its rolling hash.
477: 2382: 1619: 1519:
normally cause all fixed size windows to become updated, while in reality, only the first "chunk" has been modified.
2435: 1468: 51: 2489: 1015:{\displaystyle H=s^{k-1}(h(c_{1}))\oplus s^{k-2}(h(c_{2}))\oplus \ldots \oplus s(h(c_{k-1}))\oplus h(c_{k}),} 2641: 2368:-gram hashing is pairwise independent, at best, Computer Speech & Language 24 (4), pages 698–710, 2010. 1591: 1025:
where the multiplications by powers of two can be implemented by binary shifts. The result is a number in
1694: 45: 130: 2488:
Xia, Wen; Zhou, Yukun; Jiang, Hong; Feng, Dan; Hua, Yu; Hu, Yuchong; Liu, Qing; Zhang, Yucheng (2005).
1110: 1192: 317: 122: 1387:
Hashing by cyclic polynomials is strongly universal or pairwise independent: simply keep the first
2600: 2515: 1613:
option) and rsyncrypto, do content-based slicing based on this specific (unweighted) moving sum:
1598: 1529: 694: 394: 91:
function can be computed much more quickly than other low-pass filters; and similar to the way a
787: 1028: 720: 145:
is often explained using a rolling hash function that only uses multiplications and additions:
2558: 2495: 645: 367: 126: 115: 56: 2421: 1941: 1354: 822: 784:): it rotates the bits by 1 to the left, pushing the latest bit in the first position. E.g., 2592: 2550: 1594:
uses the Buzhash algorithm with a customizable chunk size range for splitting file streams.
664: 636:
can be multiplied to get the result of the division without actually performing a division.
589: 1807: 1390: 1145: 2581:"The Design of Fast Content-Defined Chunking for Data Deduplication Based Storage Systems" 2527: 2475: 2292: 2263: 1885: 1856: 1756: 96: 92: 1996: 1442: 523:. Shifting all characters by one position to the right requires dividing the entire sum 2323: 2243: 1976: 1921: 1834: 1785: 1563: 1422: 1172: 1090: 1070: 781: 763: 700: 690: 619: 566: 546: 526: 506: 486: 459: 439: 419: 399: 376: 297: 88: 2491:
FastCDC: A Fast and Efficient Content-Defined Chunking Approach for Data Deduplication
2453:"Data structures and file formats — Borg - Deduplicating Archiver 1.1.5 documentation" 2635: 2604: 80: 2319: 840: 649: 106:, which uses the rolling hash described below. Another popular application is the 2341: 2580: 2554: 648:
is another hash, which also interprets the input as a polynomial, but over the
2596: 757: 2562: 667:
uses a Rabin fingerprint for splitting files, with blob size varying between
2369: 1465:
consecutive bits. In practice, this can be achieved by an integer division
2289:
modular arithmetic operations, and hashing by cyclic polynomials requires
111: 2436:"Rabin Karp rolling hash - dynamic sized chunks based on hashed content" 1341:{\displaystyle H\leftarrow s(H)\oplus s^{k}(h(c_{1}))\oplus h(c_{k+1}),} 2627:
MIT 6.006: Introduction to Algorithms 2011- Recitation 9 - Rolling Hash
2335: 1067:
Computing the hash values in a rolling fashion is done as follows. Let
283:{\displaystyle H=c_{1}a^{k-1}+c_{2}a^{k-2}+c_{3}a^{k-3}+...+c_{k}a^{0}} 17: 2574: 2572: 114:
as its rolling hash. Low Bandwidth Network Filesystem (LBFS) uses a
83:
where the input is hashed in a window that moves through the input.
656: 652: 107: 29: 2023:
Gear fingerprint and content-based chunking algorithm FastCDC
456:
is critical to get good hashing; in particular, the modulus
79:(also known as recursive hashing or rolling checksum) is a 2422:"Foundation - Introducing Content Defined Chunking (CDC)" 2068:← 64KB // split maximum chunk size is 64 KB 2064:← 2KB // split minimum chunk size is 2 KB 760:
mapping characters to random integers. Let the function
1882:
is a "hash value" consisting of the bottom 12 bits of
1782:
is the sum of 8196 consecutive bytes ending with byte
1566: 1532: 2585:
IEEE Transactions on Parallel and Distributed Systems
2295: 2266: 2246: 2035:
The basic version pseudocode is provided as follows:
1999: 1979: 1944: 1924: 1888: 1859: 1837: 1810: 1788: 1759: 1697: 1622: 1471: 1445: 1425: 1393: 1357: 1250: 1195: 1175: 1148: 1113: 1093: 1073: 1031: 852: 825: 790: 766: 723: 703: 622: 592: 569: 549: 529: 509: 489: 462: 442: 422: 402: 379: 366:
are the input characters (but this function is not a
320: 300: 154: 110:
program, which uses a checksum based on Mark Adler's
2405:Jonathan D. Cohen, Recursive Hashing Functions for 2090:// buffer size is less than minimum chunk size 756:. This hash function might be simply an array or a 43:It has been suggested that this article should be 2310: 2281: 2252: 2011: 1985: 1965: 1930: 1903: 1874: 1843: 1823: 1794: 1774: 1739: 1680: 1579: 1552: 1502: 1457: 1431: 1411: 1376: 1340: 1233: 1181: 1161: 1134: 1099: 1079: 1056: 1014: 831: 811: 772: 748: 709: 628: 608: 575: 555: 535: 515: 495: 468: 448: 428: 408: 385: 358: 306: 282: 1597:Such content-defined chunking is often used for 1681:{\displaystyle S(n)=\sum _{i=n-8195}^{n}c_{i},} 697:: it presumes that there is some hash function 2360: 2358: 2356: 8: 2447: 2445: 2417: 2415: 2134:bytes, and kickstart the hash     717:from characters to integers in the interval 2409:-Grams, ACM Trans. Inf. Syst. 15 (3), 1997. 1609:Several programs, including gzip (with the 1514:Content-based slicing using a rolling hash 1503:{\displaystyle H\rightarrow H\div 2^{k-1}} 1169:is the character to be removed, rotate it 2383:"References — restic 0.9.0 documentation" 2294: 2265: 2245: 1998: 1978: 1943: 1923: 1887: 1858: 1836: 1815: 1809: 1787: 1758: 1730: 1729: 1696: 1669: 1659: 1642: 1621: 1571: 1565: 1542: 1533: 1531: 1488: 1470: 1444: 1424: 1392: 1362: 1356: 1320: 1295: 1276: 1249: 1219: 1200: 1194: 1174: 1153: 1147: 1112: 1092: 1072: 1045: 1030: 1000: 969: 932: 907: 888: 863: 851: 824: 789: 765: 737: 722: 702: 621: 597: 591: 568: 548: 528: 508: 488: 461: 441: 421: 401: 378: 350: 325: 319: 299: 274: 264: 233: 223: 204: 194: 175: 165: 153: 2352: 2523: 2513: 1973:, these programs cut the file between 1605:Content-based slicing using moving sum 2364:Daniel Lemire, Owen Kaser: Recursive 7: 373:In order to avoid manipulating huge 102:One of the main applications is the 1740:{\displaystyle H(n)=S(n)\mod 4096,} 1087:be the previous hash value. Rotate 2434:Horvath, Adam (October 24, 2012). 563:. Note that in modulo arithmetic, 476:is typically a prime number. See 143:Rabin–Karp string search algorithm 104:Rabin–Karp string search algorithm 25: 843:. The hash values are defined as 121:At best, rolling hash values are 1135:{\displaystyle H\leftarrow s(H)} 780:be a cyclic binary rotation (or 34: 1725: 1419:bits. That is, take the result 1234:{\displaystyle s^{k}(h(c_{1}))} 359:{\displaystyle c_{1},...,c_{k}} 2305: 2299: 2276: 2270: 1954: 1948: 1898: 1892: 1869: 1863: 1802:(requires 21 bits of storage), 1769: 1763: 1722: 1716: 1707: 1701: 1632: 1626: 1475: 1332: 1313: 1304: 1301: 1288: 1282: 1266: 1260: 1254: 1228: 1225: 1212: 1206: 1129: 1123: 1117: 1051: 1032: 1006: 993: 984: 981: 962: 956: 941: 938: 925: 919: 897: 894: 881: 875: 800: 794: 743: 724: 1: 2338: â€“ Data mining technique 478:linear congruential generator 1553:{\textstyle {1 \over 2^{n}}} 1526:bits (with a probability of 2658: 2555:10.1016/j.peva.2014.07.016 2049:, data length 812:{\displaystyle s(101)=011} 49:into a new article titled 2597:10.1109/TPDS.2020.2984632 2457:borgbackup.readthedocs.io 1057:{\displaystyle [0,2^{L})} 749:{\displaystyle [0,2^{L})} 693:instead. It is a form of 393:values, all math is done 99:from the old hash value. 2235:Computational complexity 583:can be chosen to have a 52:Content-Defined Chunking 1966:{\displaystyle H(n)==0} 1377:{\displaystyle c_{k+1}} 832:{\displaystyle \oplus } 137:Polynomial rolling hash 2543:Performance Evaluation 2476:"Rsyncrypto Algorithm" 2312: 2283: 2254: 2221:+ 1     2013: 1987: 1967: 1932: 1905: 1876: 1845: 1825: 1796: 1776: 1741: 1682: 1664: 1581: 1554: 1504: 1459: 1433: 1413: 1384:is the new character. 1378: 1342: 1235: 1183: 1163: 1136: 1101: 1081: 1058: 1016: 833: 813: 774: 750: 711: 630: 610: 609:{\displaystyle a^{-1}} 585:multiplicative inverse 577: 557: 537: 517: 497: 470: 450: 430: 410: 387: 360: 308: 284: 2387:restic.readthedocs.io 2313: 2284: 2255: 2014: 1988: 1968: 1933: 1906: 1877: 1846: 1826: 1824:{\displaystyle c_{i}} 1797: 1777: 1742: 1683: 1638: 1582: 1555: 1505: 1460: 1434: 1414: 1412:{\displaystyle L-k+1} 1379: 1343: 1236: 1184: 1164: 1162:{\displaystyle c_{1}} 1137: 1102: 1082: 1059: 1017: 834: 814: 775: 751: 712: 631: 611: 578: 558: 538: 518: 498: 480:for more discussion. 471: 451: 431: 411: 388: 361: 309: 285: 27:Type of hash function 2311:{\displaystyle O(k)} 2293: 2282:{\displaystyle O(k)} 2264: 2244: 2074:0x0000d93003530000LL 1997: 1977: 1942: 1922: 1904:{\displaystyle S(n)} 1886: 1875:{\displaystyle H(n)} 1857: 1835: 1808: 1786: 1775:{\displaystyle S(n)} 1757: 1695: 1620: 1564: 1530: 1469: 1443: 1423: 1391: 1355: 1248: 1193: 1173: 1146: 1111: 1091: 1071: 1029: 850: 823: 788: 764: 721: 701: 620: 590: 567: 547: 527: 507: 487: 460: 440: 420: 400: 377: 318: 298: 152: 123:pairwise independent 2012:{\displaystyle n+1} 1458:{\displaystyle k-1} 314:is a constant, and 2308: 2279: 2250: 2167:+ 1     2130:// Skip the first 2009: 1983: 1963: 1928: 1901: 1872: 1841: 1821: 1792: 1772: 1737: 1678: 1599:data deduplication 1580:{\textstyle 2^{n}} 1577: 1550: 1500: 1455: 1429: 1409: 1374: 1338: 1241:. Then simply set 1231: 1179: 1159: 1132: 1097: 1077: 1054: 1012: 829: 809: 770: 746: 707: 695:tabulation hashing 626: 606: 573: 553: 533: 513: 493: 466: 446: 426: 406: 383: 356: 304: 280: 131:3-wise independent 2253:{\displaystyle k} 1986:{\displaystyle n} 1931:{\displaystyle n} 1844:{\displaystyle i} 1795:{\displaystyle n} 1548: 1432:{\displaystyle H} 1182:{\displaystyle k} 1100:{\displaystyle H} 1080:{\displaystyle H} 773:{\displaystyle s} 710:{\displaystyle h} 685:Cyclic polynomial 646:Rabin fingerprint 640:Rabin fingerprint 629:{\displaystyle H} 576:{\displaystyle a} 556:{\displaystyle a} 536:{\displaystyle H} 516:{\displaystyle a} 496:{\displaystyle H} 469:{\displaystyle n} 449:{\displaystyle n} 429:{\displaystyle a} 409:{\displaystyle n} 386:{\displaystyle H} 368:Rabin fingerprint 307:{\displaystyle a} 129:. They cannot be 116:Rabin fingerprint 73: 72: 16:(Redirected from 2649: 2615: 2614: 2612: 2611: 2591:(9): 2017–2031. 2576: 2567: 2566: 2538: 2532: 2531: 2525: 2521: 2519: 2511: 2509: 2508: 2485: 2479: 2473: 2467: 2466: 2464: 2463: 2449: 2440: 2439: 2431: 2425: 2419: 2410: 2403: 2397: 2396: 2394: 2393: 2379: 2373: 2362: 2317: 2315: 2314: 2309: 2288: 2286: 2285: 2280: 2259: 2257: 2256: 2251: 2018: 2016: 2015: 2010: 1992: 1990: 1989: 1984: 1972: 1970: 1969: 1964: 1937: 1935: 1934: 1929: 1910: 1908: 1907: 1902: 1881: 1879: 1878: 1873: 1850: 1848: 1847: 1842: 1830: 1828: 1827: 1822: 1820: 1819: 1801: 1799: 1798: 1793: 1781: 1779: 1778: 1773: 1746: 1744: 1743: 1738: 1687: 1685: 1684: 1679: 1674: 1673: 1663: 1658: 1612: 1586: 1584: 1583: 1578: 1576: 1575: 1559: 1557: 1556: 1551: 1549: 1547: 1546: 1534: 1509: 1507: 1506: 1501: 1499: 1498: 1464: 1462: 1461: 1456: 1439:and dismiss any 1438: 1436: 1435: 1430: 1418: 1416: 1415: 1410: 1383: 1381: 1380: 1375: 1373: 1372: 1347: 1345: 1344: 1339: 1331: 1330: 1300: 1299: 1281: 1280: 1240: 1238: 1237: 1232: 1224: 1223: 1205: 1204: 1188: 1186: 1185: 1180: 1168: 1166: 1165: 1160: 1158: 1157: 1141: 1139: 1138: 1133: 1106: 1104: 1103: 1098: 1086: 1084: 1083: 1078: 1063: 1061: 1060: 1055: 1050: 1049: 1021: 1019: 1018: 1013: 1005: 1004: 980: 979: 937: 936: 918: 917: 893: 892: 874: 873: 838: 836: 835: 830: 818: 816: 815: 810: 779: 777: 776: 771: 755: 753: 752: 747: 742: 741: 716: 714: 713: 708: 680: 679: 673: 672: 635: 633: 632: 627: 615: 613: 612: 607: 605: 604: 582: 580: 579: 574: 562: 560: 559: 554: 542: 540: 539: 534: 522: 520: 519: 514: 502: 500: 499: 494: 475: 473: 472: 467: 455: 453: 452: 447: 435: 433: 432: 427: 416:. The choice of 415: 413: 412: 407: 392: 390: 389: 384: 365: 363: 362: 357: 355: 354: 330: 329: 313: 311: 310: 305: 289: 287: 286: 281: 279: 278: 269: 268: 244: 243: 228: 227: 215: 214: 199: 198: 186: 185: 170: 169: 68: 65: 38: 37: 30: 21: 2657: 2656: 2652: 2651: 2650: 2648: 2647: 2646: 2632: 2631: 2623: 2618: 2609: 2607: 2578: 2577: 2570: 2540: 2539: 2535: 2522: 2512: 2506: 2504: 2502: 2487: 2486: 2482: 2474: 2470: 2461: 2459: 2451: 2450: 2443: 2433: 2432: 2428: 2420: 2413: 2404: 2400: 2391: 2389: 2381: 2380: 2376: 2370:arXiv:0705.4676 2363: 2354: 2350: 2332: 2324:circular shifts 2291: 2290: 2262: 2261: 2242: 2241: 2237: 2228: 2188:<< 1 ) + 2155:<< 1 ) + 2025: 1995: 1994: 1975: 1974: 1940: 1939: 1920: 1919: 1884: 1883: 1855: 1854: 1833: 1832: 1811: 1806: 1805: 1784: 1783: 1755: 1754: 1693: 1692: 1665: 1618: 1617: 1610: 1607: 1567: 1562: 1561: 1538: 1528: 1527: 1516: 1484: 1467: 1466: 1441: 1440: 1421: 1420: 1389: 1388: 1358: 1353: 1352: 1316: 1291: 1272: 1246: 1245: 1215: 1196: 1191: 1190: 1171: 1170: 1149: 1144: 1143: 1109: 1108: 1089: 1088: 1069: 1068: 1041: 1027: 1026: 996: 965: 928: 903: 884: 859: 848: 847: 839:be the bitwise 821: 820: 786: 785: 762: 761: 733: 719: 718: 699: 698: 687: 677: 675: 670: 668: 642: 618: 617: 593: 588: 587: 565: 564: 545: 544: 525: 524: 505: 504: 485: 484: 458: 457: 438: 437: 418: 417: 398: 397: 375: 374: 346: 321: 316: 315: 296: 295: 270: 260: 229: 219: 200: 190: 171: 161: 150: 149: 139: 133:, for example. 97:rapidly updated 69: 63: 60: 39: 35: 28: 23: 22: 15: 12: 11: 5: 2655: 2653: 2645: 2644: 2642:Hash functions 2634: 2633: 2630: 2629: 2622: 2621:External links 2619: 2617: 2616: 2568: 2533: 2524:|website= 2500: 2480: 2468: 2441: 2426: 2411: 2398: 2374: 2351: 2349: 2346: 2345: 2344: 2339: 2331: 2328: 2307: 2304: 2301: 2298: 2278: 2275: 2272: 2269: 2249: 2236: 2233: 2037: 2024: 2021: 2008: 2005: 2002: 1982: 1962: 1959: 1956: 1953: 1950: 1947: 1927: 1913: 1912: 1900: 1897: 1894: 1891: 1871: 1868: 1865: 1862: 1852: 1840: 1818: 1814: 1803: 1791: 1771: 1768: 1765: 1762: 1748: 1747: 1736: 1733: 1728: 1724: 1721: 1718: 1715: 1712: 1709: 1706: 1703: 1700: 1689: 1688: 1677: 1672: 1668: 1662: 1657: 1654: 1651: 1648: 1645: 1641: 1637: 1634: 1631: 1628: 1625: 1606: 1603: 1574: 1570: 1545: 1541: 1537: 1515: 1512: 1497: 1494: 1491: 1487: 1483: 1480: 1477: 1474: 1454: 1451: 1448: 1428: 1408: 1405: 1402: 1399: 1396: 1371: 1368: 1365: 1361: 1349: 1348: 1337: 1334: 1329: 1326: 1323: 1319: 1315: 1312: 1309: 1306: 1303: 1298: 1294: 1290: 1287: 1284: 1279: 1275: 1271: 1268: 1265: 1262: 1259: 1256: 1253: 1230: 1227: 1222: 1218: 1214: 1211: 1208: 1203: 1199: 1178: 1156: 1152: 1131: 1128: 1125: 1122: 1119: 1116: 1096: 1076: 1053: 1048: 1044: 1040: 1037: 1034: 1023: 1022: 1011: 1008: 1003: 999: 995: 992: 989: 986: 983: 978: 975: 972: 968: 964: 961: 958: 955: 952: 949: 946: 943: 940: 935: 931: 927: 924: 921: 916: 913: 910: 906: 902: 899: 896: 891: 887: 883: 880: 877: 872: 869: 866: 862: 858: 855: 828: 808: 805: 802: 799: 796: 793: 782:circular shift 769: 745: 740: 736: 732: 729: 726: 706: 686: 683: 641: 638: 625: 603: 600: 596: 572: 552: 532: 512: 492: 465: 445: 425: 405: 382: 370:, see below). 353: 349: 345: 342: 339: 336: 333: 328: 324: 303: 292: 291: 277: 273: 267: 263: 259: 256: 253: 250: 247: 242: 239: 236: 232: 226: 222: 218: 213: 210: 207: 203: 197: 193: 189: 184: 181: 178: 174: 168: 164: 160: 157: 138: 135: 89:moving average 71: 70: 42: 40: 33: 26: 24: 14: 13: 10: 9: 6: 4: 3: 2: 2654: 2643: 2640: 2639: 2637: 2628: 2625: 2624: 2620: 2606: 2602: 2598: 2594: 2590: 2586: 2582: 2575: 2573: 2569: 2564: 2560: 2556: 2552: 2548: 2544: 2537: 2534: 2529: 2517: 2503: 2501:9781931971300 2497: 2493: 2492: 2484: 2481: 2477: 2472: 2469: 2458: 2454: 2448: 2446: 2442: 2437: 2430: 2427: 2423: 2418: 2416: 2412: 2408: 2402: 2399: 2388: 2384: 2378: 2375: 2371: 2367: 2361: 2359: 2357: 2353: 2347: 2343: 2340: 2337: 2334: 2333: 2329: 2327: 2325: 2321: 2320:exclusive ors 2302: 2296: 2273: 2267: 2247: 2234: 2232: 2227: 2224: 2220: 2216: 2213: 2210: 2207: 2203: 2199: 2195: 2191: 2187: 2183: 2180: 2177: 2173: 2170: 2166: 2162: 2158: 2154: 2150: 2147: 2144: 2140: 2137: 2133: 2129: 2125: 2122: 2119: 2115: 2112: 2109: 2106: 2103: 2100: 2096: 2093: 2089: 2085: 2082: 2078: 2075: 2071: 2067: 2063: 2060: 2056: 2052: 2048: 2044: 2040: 2036: 2033: 2029: 2022: 2020: 2006: 2003: 2000: 1980: 1960: 1957: 1951: 1945: 1925: 1916: 1895: 1889: 1866: 1860: 1853: 1838: 1816: 1812: 1804: 1789: 1766: 1760: 1753: 1752: 1751: 1734: 1731: 1726: 1719: 1713: 1710: 1704: 1698: 1691: 1690: 1675: 1670: 1666: 1660: 1655: 1652: 1649: 1646: 1643: 1639: 1635: 1629: 1623: 1616: 1615: 1614: 1604: 1602: 1600: 1595: 1593: 1588: 1572: 1568: 1543: 1539: 1535: 1525: 1520: 1513: 1511: 1495: 1492: 1489: 1485: 1481: 1478: 1472: 1452: 1449: 1446: 1426: 1406: 1403: 1400: 1397: 1394: 1385: 1369: 1366: 1363: 1359: 1335: 1327: 1324: 1321: 1317: 1310: 1307: 1296: 1292: 1285: 1277: 1273: 1269: 1263: 1257: 1251: 1244: 1243: 1242: 1220: 1216: 1209: 1201: 1197: 1176: 1154: 1150: 1126: 1120: 1114: 1094: 1074: 1065: 1046: 1042: 1038: 1035: 1009: 1001: 997: 990: 987: 976: 973: 970: 966: 959: 953: 950: 947: 944: 933: 929: 922: 914: 911: 908: 904: 900: 889: 885: 878: 870: 867: 864: 860: 856: 853: 846: 845: 844: 842: 826: 806: 803: 797: 791: 783: 767: 759: 738: 734: 730: 727: 704: 696: 692: 691:barrel shifts 684: 682: 666: 660: 658: 654: 651: 647: 639: 637: 623: 601: 598: 594: 586: 570: 550: 530: 510: 490: 481: 479: 463: 443: 423: 403: 396: 380: 371: 369: 351: 347: 343: 340: 337: 334: 331: 326: 322: 301: 275: 271: 265: 261: 257: 254: 251: 248: 245: 240: 237: 234: 230: 224: 220: 216: 211: 208: 205: 201: 195: 191: 187: 182: 179: 176: 172: 166: 162: 158: 155: 148: 147: 146: 144: 136: 134: 132: 128: 124: 119: 117: 113: 109: 105: 100: 98: 94: 90: 84: 82: 81:hash function 78: 67: 58: 54: 53: 48: 47: 41: 32: 31: 19: 2608:. Retrieved 2588: 2584: 2546: 2542: 2536: 2505:. Retrieved 2490: 2483: 2471: 2460:. Retrieved 2456: 2429: 2406: 2401: 2390:. Retrieved 2386: 2377: 2365: 2238: 2229: 2225: 2222: 2218: 2214: 2211: 2208: 2205: 2201: 2197: 2193: 2189: 2185: 2181: 2178: 2175: 2171: 2168: 2164: 2160: 2156: 2152: 2148: 2145: 2142: 2138: 2135: 2131: 2127: 2123: 2120: 2117: 2113: 2110: 2107: 2104: 2101: 2098: 2094: 2091: 2087: 2083: 2080: 2076: 2073: 2069: 2065: 2061: 2058: 2054: 2050: 2046: 2045:data buffer 2042: 2041:FastCDC 2038: 2034: 2030: 2026: 1917: 1914: 1851:of the file, 1749: 1608: 1596: 1589: 1523: 1521: 1517: 1386: 1350: 1066: 1024: 841:exclusive or 688: 661: 650:Galois field 643: 482: 372: 293: 140: 125:or strongly 120: 101: 93:Zobrist hash 85: 77:rolling hash 76: 74: 61: 50: 44: 2549:: 258–272. 2342:w-shingling 1611:--rsyncable 64:August 2020 2610:2020-07-22 2507:2020-07-24 2462:2018-05-24 2392:2018-05-24 2192:] 2159:] 2057:cut point 1918:For every 758:hash table 2605:215817722 2563:0166-5316 2526:ignored ( 2516:cite book 2348:Footnotes 2260:requires 2184:← ( 2151:← ( 2039:algorithm 1653:− 1640:∑ 1493:− 1482:÷ 1476:→ 1450:− 1398:− 1308:⊕ 1270:⊕ 1255:← 1118:← 988:⊕ 974:− 951:⊕ 948:… 945:⊕ 912:− 901:⊕ 868:− 827:⊕ 616:by which 599:− 238:− 209:− 180:− 127:universal 2636:Category 2330:See also 2318:bitwise 2217:← 2196: !( 2163:← 2126:← 2086:← 2079:← 2072:← 1831:is byte 1189:times: 112:adler-32 2424:. 2015. 2336:MinHash 2143:MinSize 2132:MinSize 2128:MaxSize 2118:MaxSize 2099:MinSize 2066:MaxSize 2062:MinSize 2055:output: 2053:, 95:can be 57:discuss 18:Buzhash 2603:  2561:  2498:  2223:return 2209:return 2200:& 2105:return 2043:input: 1938:where 1750:where 1351:where 1142:. If 1107:once: 819:. Let 665:restic 395:modulo 294:where 2601:S2CID 2174:< 2169:while 2141:< 2136:while 657:CRC32 653:GF(2) 108:rsync 46:split 2559:ISSN 2528:help 2496:ISBN 2322:and 2206:then 2202:Mask 2190:Gear 2157:Gear 2121:then 2102:then 2070:Mask 1993:and 1732:4096 1656:8195 1592:Borg 674:and 644:The 436:and 141:The 2593:doi 2551:doi 2047:src 1727:mod 807:011 798:101 678:MiB 671:KiB 669:512 543:by 503:by 59:) 55:. ( 2638:: 2599:. 2589:31 2587:. 2583:. 2571:^ 2557:. 2547:79 2545:. 2520:: 2518:}} 2514:{{ 2494:. 2455:. 2444:^ 2414:^ 2385:. 2355:^ 2326:. 2204:) 2198:fp 2194:if 2186:fp 2182:fp 2179:do 2153:fp 2149:fp 2146:do 2116:≥ 2111:if 2097:≤ 2092:if 2077:fp 1958:== 1601:. 1510:. 1064:. 681:. 75:A 2613:. 2595:: 2565:. 2553:: 2530:) 2510:. 2478:. 2465:. 2438:. 2407:n 2395:. 2372:. 2366:n 2306:) 2303:k 2300:( 2297:O 2277:) 2274:k 2271:( 2268:O 2248:k 2226:i 2219:i 2215:i 2212:i 2176:n 2172:i 2165:i 2161:i 2139:i 2124:n 2114:n 2108:n 2095:n 2088:0 2084:i 2081:0 2059:i 2051:n 2007:1 2004:+ 2001:n 1981:n 1961:0 1955:) 1952:n 1949:( 1946:H 1926:n 1911:. 1899:) 1896:n 1893:( 1890:S 1870:) 1867:n 1864:( 1861:H 1839:i 1817:i 1813:c 1790:n 1770:) 1767:n 1764:( 1761:S 1735:, 1723:) 1720:n 1717:( 1714:S 1711:= 1708:) 1705:n 1702:( 1699:H 1676:, 1671:i 1667:c 1661:n 1650:n 1647:= 1644:i 1636:= 1633:) 1630:n 1627:( 1624:S 1573:n 1569:2 1544:n 1540:2 1536:1 1524:N 1496:1 1490:k 1486:2 1479:H 1473:H 1453:1 1447:k 1427:H 1407:1 1404:+ 1401:k 1395:L 1370:1 1367:+ 1364:k 1360:c 1336:, 1333:) 1328:1 1325:+ 1322:k 1318:c 1314:( 1311:h 1305:) 1302:) 1297:1 1293:c 1289:( 1286:h 1283:( 1278:k 1274:s 1267:) 1264:H 1261:( 1258:s 1252:H 1229:) 1226:) 1221:1 1217:c 1213:( 1210:h 1207:( 1202:k 1198:s 1177:k 1155:1 1151:c 1130:) 1127:H 1124:( 1121:s 1115:H 1095:H 1075:H 1052:) 1047:L 1043:2 1039:, 1036:0 1033:[ 1010:, 1007:) 1002:k 998:c 994:( 991:h 985:) 982:) 977:1 971:k 967:c 963:( 960:h 957:( 954:s 942:) 939:) 934:2 930:c 926:( 923:h 920:( 915:2 909:k 905:s 898:) 895:) 890:1 886:c 882:( 879:h 876:( 871:1 865:k 861:s 857:= 854:H 804:= 801:) 795:( 792:s 768:s 744:) 739:L 735:2 731:, 728:0 725:[ 705:h 676:8 624:H 602:1 595:a 571:a 551:a 531:H 511:a 491:H 464:n 444:n 424:a 404:n 381:H 352:k 348:c 344:, 341:. 338:. 335:. 332:, 327:1 323:c 302:a 290:, 276:0 272:a 266:k 262:c 258:+ 255:. 252:. 249:. 246:+ 241:3 235:k 231:a 225:3 221:c 217:+ 212:2 206:k 202:a 196:2 192:c 188:+ 183:1 177:k 173:a 167:1 163:c 159:= 156:H 66:) 62:( 20:)

Index

Buzhash
split
Content-Defined Chunking
discuss
hash function
moving average
Zobrist hash
rapidly updated
Rabin–Karp string search algorithm
rsync
adler-32
Rabin fingerprint
pairwise independent
universal
3-wise independent
Rabin–Karp string search algorithm
Rabin fingerprint
modulo
linear congruential generator
multiplicative inverse
Rabin fingerprint
Galois field
GF(2)
CRC32
restic
barrel shifts
tabulation hashing
hash table
circular shift
exclusive or

Text is available under the Creative Commons Attribution-ShareAlike License. Additional terms may apply.

↑