Knowledge (XXG)

Primary clustering

Source 📝

714:
Ordered linear probing (often referred to as Robin Hood hashing) is a technique for reducing the effects of primary clustering on queries. Ordered linear probing sorts the elements within each run by their hash. Thus, a query can terminate as soon as it encounters any element whose hash is larger
747:
Graveyard hashing is a variant of ordered linear probing that eliminates the asymptotic effects of primary clustering for all operations. Graveyard hashing strategically leaves gaps within runs that future insertions can make use of. These gaps, which can be thought of as tombstones (like those
632:
Many textbooks describe the winner-keeps-winning effect (in which the longer a run becomes, the more likely it is to accrue additional elements) as the sole cause of primary clustering. However, as noted by Knuth, this is not the main cause of primary clustering.
31:. The phenomenon states that, as elements are added to a linear probing hash table, they have a tendency to cluster together into long runs (i.e., long contiguous regions of the hash table that contain no free slots). If the hash table is at a load factor of 669:
element. Some positive queries may have much larger expected running times, however. For example, if one inserts an element and then immediately queries that element, the query will take the same amount of time as did the insertion, which is
752:), are inserted into the table during semi-regular rebuilds. The gaps then speed up the insertions that take place until the next semi-regular rebuild occurs. Every operation in a graveyard hash table takes expected time 198:
The longer that a run becomes, the more likely it is to accrue additional elements. This causes a positive feedback loop that contributes to the clustering effect. However, this alone would not cause the quadratic
213:
Another way to understand primary clustering is by examining the standard deviation on the number of items that hash to a given region within the hash table. Consider a sub-region of the hash table of size
439:
Primary clustering causes performance degradation for both insertions and queries in a linear probing hash table. Insertions must travel to the end of a run, and therefore take expected time
587:(i.e., using tombstones for deletions), as long as the hash table is rebuilt (and the tombstones are cleared out) semi-frequently. It suffices to perform such a rebuild at least once every 308: 704: 578: 509: 475:. Negative queries (i.e., queries that are searching for an element that turns out not to be present) must also travel to the end of a run, and thus also take expected time 473: 429: 181: 145: 663: 542: 366: 337: 1182: 89: 622: 1186: 393: 239: 63: 779: 742: 1307: 1028: 109: 431:
will often overflow, while larger regions typically will not. This intuition is often used as the starting point for formal analyses of primary clustering.
1248: 1123: 1004: 885: 511:. Positive queries can terminate as soon as they find the element that they are searching for. As a result, the expected time to query a 1283: 1211: 1158: 1088: 1053: 930: 831: 544:. However, positive queries to recently-inserted elements (e.g., an element that was just inserted) take expected time 209:
together two runs that were already relatively long. This is what causes the quadratic blowup in expected run length.
715:
than that of the element being queried. This results in both positive and negative queries taking expected time
244: 956:"Tabulation-Based 5-Independent Hashing with Applications to Linear Probing and Second Moment Estimation" 999:. Charles Eric Leiserson, Ronald L. Rivest, Clifford Stein (Fourth ed.). Cambridge, Massachusetts. 205:
A single insertion may not only increase the length of the run that it is in by one, but may instead
20: 673: 547: 478: 442: 398: 150: 114: 1301: 1176: 1022: 936: 891: 639: 518: 342: 313: 788:
as an alternative to linear probing that empirically avoids the effects of primary clustering.
1289: 1279: 1254: 1244: 1217: 1207: 1164: 1154: 1129: 1119: 1094: 1084: 1059: 1049: 1010: 1000: 975: 926: 881: 837: 827: 785: 68: 1373: 1334: 967: 918: 873: 590: 371: 217: 34: 755: 718: 94: 28: 1367: 895: 749: 584: 940: 865: 877: 1339: 1258: 1014: 1133: 979: 910: 864:
Bender, Michael A.; Kuszmaul, Bradley C.; Kuszmaul, William (February 2022).
1293: 1168: 1098: 1063: 922: 866:"Linear Probing Revisited: Tombstones Mark the Demise of Primary Clustering" 841: 1238: 1221: 994: 915:
Proceedings of the thirty-ninth annual ACM symposium on Theory of computing
310:. On the other hand, the standard deviation on the number of such items is 1322: 1113: 1273: 1148: 1078: 870:
2021 IEEE 62nd Annual Symposium on Foundations of Computer Science (FOCS)
821: 1201: 955: 1353:
Celis, Pedro, Per-Ake Larson, and J. Ian Munro. "Robin hood hashing."
971: 368:, the number of items that hash into the region will exceed the size 1355:
26th Annual Symposium on Foundations of Computer Science (sfcs 1985)
1043: 636:
Some textbooks state that the expected time for a positive query is
147:. This causes insertions and negative queries to take expected time 91:, then the expected length of the run containing a given element 823:
The art of computer programming, volume 3, sorting and searching
395:
of the region. Intuitively, this means that regions of size
241:. The expected number of items that hash into the region is 665:, typically citing Knuth. This is true for a query to a 1083:(2nd ed.). Englewood Cliffs, N.J.: Prentice-Hall. 27:
is a phenomenon that causes performance degradation in
909:
Pagh, Anna; Pagh, Rasmus; Ruzic, Milan (2007-06-11).
758: 721: 676: 642: 593: 550: 521: 481: 445: 401: 374: 345: 316: 247: 220: 153: 117: 97: 71: 37: 1203:
An introduction to data structures with applications
826:. Reading, Mass.: Addison-Wesley. pp. 527–528. 773: 736: 698: 657: 616: 572: 536: 503: 467: 423: 387: 360: 331: 302: 233: 175: 139: 103: 83: 57: 1153:. Sudbury, Mass.: Jones and Bartlett Publishers. 583:These bounds also hold for linear probing with 1115:Data structures and algorithms with JavaScript 8: 1240:Handbook of Data Structures and Applications 1181:: CS1 maint: multiple names: authors list ( 917:. New York, NY, USA: ACM. pp. 318–327. 954:Thorup, Mikkel; Zhang, Yin (January 2012). 911:"Linear probing with constant independence" 1306:: CS1 maint: location missing publisher ( 1278:(Third ed.). Reading, Massachusetts. 1185:) CS1 maint: numeric names: authors list ( 1027:: CS1 maint: location missing publisher ( 710:Techniques for avoiding primary clustering 1338: 1206:. P. G. Sorenson. New York: McGraw-Hill. 757: 720: 687: 675: 641: 597: 592: 561: 549: 520: 492: 480: 456: 444: 412: 400: 379: 373: 344: 315: 288: 275: 260: 246: 225: 219: 164: 152: 128: 116: 96: 70: 47: 36: 797: 1299: 1174: 1020: 1233: 1231: 7: 859: 857: 855: 853: 851: 815: 813: 811: 809: 807: 805: 803: 801: 339:. It follows that, with probability 303:{\displaystyle (1-1/x)x^{2}=x^{2}-x} 191:Primary clustering has two causes: 1147:Smith, Peter, February 1- (2004). 1080:Data structures and program design 784:Many sources recommend the use of 677: 643: 551: 522: 482: 446: 402: 346: 317: 154: 118: 16:Phenomenon observed in hash tables 14: 1150:Applied data structures with C++ 183:in a linear-probing hash table. 768: 762: 731: 725: 699:{\displaystyle \Theta (x^{2})} 693: 680: 652: 646: 611: 602: 573:{\displaystyle \Theta (x^{2})} 567: 554: 531: 525: 504:{\displaystyle \Theta (x^{2})} 498: 485: 468:{\displaystyle \Theta (x^{2})} 462: 449: 424:{\displaystyle \Theta (x^{2})} 418: 405: 355: 349: 326: 320: 268: 248: 176:{\displaystyle \Theta (x^{2})} 170: 157: 140:{\displaystyle \Theta (x^{2})} 134: 121: 1: 515:element in the hash table is 1200:Tremblay, Jean-Paul (1976). 1118:. Sebastopol, CA: O'Reilly. 878:10.1109/focs52979.2021.00115 872:. IEEE. pp. 1171–1182. 820:Knuth, Donald Ervin (1997). 187:Causes of primary clustering 1390: 1272:Sedgewick, Robert (1998). 1112:McMillan, Michael (2014). 996:Introduction to algorithms 993:Cormen, Thomas H. (2022). 658:{\displaystyle \Theta (x)} 537:{\displaystyle \Theta (x)} 361:{\displaystyle \Omega (1)} 332:{\displaystyle \Theta (x)} 29:linear-probing hash tables 1077:Kruse, Robert L. (1987). 960:SIAM Journal on Computing 1340:10.1093/comjnl/17.2.135 923:10.1145/1250790.1250839 84:{\displaystyle x\geq 2} 1042:Drozdek, Adam (1995). 775: 738: 700: 659: 618: 617:{\displaystyle n/(2x)} 574: 538: 505: 469: 425: 389: 362: 333: 304: 235: 177: 141: 105: 85: 59: 1323:"Ordered hash tables" 1321:Amble, Knuth (1974). 1243:. : CRC PRESS. 2020. 776: 739: 701: 660: 628:Common misconceptions 619: 575: 539: 506: 470: 435:Effect on performance 426: 390: 388:{\displaystyle x^{2}} 363: 334: 305: 236: 234:{\displaystyle x^{2}} 196:Winner keeps winning: 178: 142: 106: 86: 60: 58:{\displaystyle 1-1/x} 1327:The Computer Journal 1045:Data structures in C 774:{\displaystyle O(x)} 756: 737:{\displaystyle O(x)} 719: 674: 640: 591: 548: 519: 479: 443: 399: 372: 343: 314: 245: 218: 151: 115: 95: 69: 35: 21:computer programming 65:for some parameter 771: 734: 696: 655: 614: 570: 534: 501: 465: 421: 385: 358: 329: 300: 231: 173: 137: 101: 81: 55: 25:primary clustering 1250:978-0-367-57200-6 1125:978-1-4493-6493-9 1006:978-0-262-04630-5 972:10.1137/100800774 887:978-1-6654-2055-6 786:quadratic probing 104:{\displaystyle u} 1381: 1358: 1351: 1345: 1344: 1342: 1318: 1312: 1311: 1305: 1297: 1269: 1263: 1262: 1235: 1226: 1225: 1197: 1191: 1190: 1180: 1172: 1144: 1138: 1137: 1109: 1103: 1102: 1074: 1068: 1067: 1039: 1033: 1032: 1026: 1018: 990: 984: 983: 951: 945: 944: 906: 900: 899: 861: 846: 845: 817: 780: 778: 777: 772: 743: 741: 740: 735: 706:in expectation. 705: 703: 702: 697: 692: 691: 664: 662: 661: 656: 623: 621: 620: 615: 601: 579: 577: 576: 571: 566: 565: 543: 541: 540: 535: 510: 508: 507: 502: 497: 496: 474: 472: 471: 466: 461: 460: 430: 428: 427: 422: 417: 416: 394: 392: 391: 386: 384: 383: 367: 365: 364: 359: 338: 336: 335: 330: 309: 307: 306: 301: 293: 292: 280: 279: 264: 240: 238: 237: 232: 230: 229: 203:Joining of runs: 182: 180: 179: 174: 169: 168: 146: 144: 143: 138: 133: 132: 110: 108: 107: 102: 90: 88: 87: 82: 64: 62: 61: 56: 51: 1389: 1388: 1384: 1383: 1382: 1380: 1379: 1378: 1364: 1363: 1362: 1361: 1352: 1348: 1320: 1319: 1315: 1298: 1286: 1275:Algorithms in C 1271: 1270: 1266: 1251: 1237: 1236: 1229: 1214: 1199: 1198: 1194: 1173: 1161: 1146: 1145: 1141: 1126: 1111: 1110: 1106: 1091: 1076: 1075: 1071: 1056: 1048:. PWS Pub. Co. 1041: 1040: 1036: 1019: 1007: 992: 991: 987: 953: 952: 948: 933: 908: 907: 903: 888: 863: 862: 849: 834: 819: 818: 799: 794: 754: 753: 717: 716: 712: 683: 672: 671: 638: 637: 630: 589: 588: 557: 546: 545: 517: 516: 488: 477: 476: 452: 441: 440: 437: 408: 397: 396: 375: 370: 369: 341: 340: 312: 311: 284: 271: 243: 242: 221: 216: 215: 189: 160: 149: 148: 124: 113: 112: 93: 92: 67: 66: 33: 32: 17: 12: 11: 5: 1387: 1385: 1377: 1376: 1366: 1365: 1360: 1359: 1346: 1333:(2): 135–142. 1313: 1284: 1264: 1249: 1227: 1212: 1192: 1159: 1139: 1124: 1104: 1089: 1069: 1054: 1034: 1005: 985: 966:(2): 293–331. 946: 931: 901: 886: 847: 832: 796: 795: 793: 790: 770: 767: 764: 761: 750:lazy deletions 733: 730: 727: 724: 711: 708: 695: 690: 686: 682: 679: 654: 651: 648: 645: 629: 626: 613: 610: 607: 604: 600: 596: 585:lazy deletions 569: 564: 560: 556: 553: 533: 530: 527: 524: 500: 495: 491: 487: 484: 464: 459: 455: 451: 448: 436: 433: 420: 415: 411: 407: 404: 382: 378: 357: 354: 351: 348: 328: 325: 322: 319: 299: 296: 291: 287: 283: 278: 274: 270: 267: 263: 259: 256: 253: 250: 228: 224: 211: 210: 200: 188: 185: 172: 167: 163: 159: 156: 136: 131: 127: 123: 120: 100: 80: 77: 74: 54: 50: 46: 43: 40: 15: 13: 10: 9: 6: 4: 3: 2: 1386: 1375: 1372: 1371: 1369: 1357:. IEEE, 1985. 1356: 1350: 1347: 1341: 1336: 1332: 1328: 1324: 1317: 1314: 1309: 1303: 1295: 1291: 1287: 1285:0-201-31452-5 1281: 1277: 1276: 1268: 1265: 1260: 1256: 1252: 1246: 1242: 1241: 1234: 1232: 1228: 1223: 1219: 1215: 1213:0-07-065150-7 1209: 1205: 1204: 1196: 1193: 1188: 1184: 1178: 1170: 1166: 1162: 1160:0-7637-2562-5 1156: 1152: 1151: 1143: 1140: 1135: 1131: 1127: 1121: 1117: 1116: 1108: 1105: 1100: 1096: 1092: 1090:0-13-195884-4 1086: 1082: 1081: 1073: 1070: 1065: 1061: 1057: 1055:0-534-93495-1 1051: 1047: 1046: 1038: 1035: 1030: 1024: 1016: 1012: 1008: 1002: 998: 997: 989: 986: 981: 977: 973: 969: 965: 961: 957: 950: 947: 942: 938: 934: 932:9781595936318 928: 924: 920: 916: 912: 905: 902: 897: 893: 889: 883: 879: 875: 871: 867: 860: 858: 856: 854: 852: 848: 843: 839: 835: 833:0-201-89683-4 829: 825: 824: 816: 814: 812: 810: 808: 806: 804: 802: 798: 791: 789: 787: 782: 765: 759: 751: 745: 728: 722: 709: 707: 688: 684: 668: 649: 634: 627: 625: 608: 605: 598: 594: 586: 581: 562: 558: 528: 514: 493: 489: 457: 453: 434: 432: 413: 409: 380: 376: 352: 323: 297: 294: 289: 285: 281: 276: 272: 265: 261: 257: 254: 251: 226: 222: 208: 204: 201: 197: 194: 193: 192: 186: 184: 165: 161: 129: 125: 98: 78: 75: 72: 52: 48: 44: 41: 38: 30: 26: 22: 1354: 1349: 1330: 1326: 1316: 1274: 1267: 1239: 1202: 1195: 1149: 1142: 1114: 1107: 1079: 1072: 1044: 1037: 995: 988: 963: 959: 949: 914: 904: 869: 822: 783: 746: 713: 666: 635: 631: 624:insertions. 582: 512: 438: 212: 206: 202: 195: 190: 24: 18: 748:created by 1259:1156995269 1015:1264174621 792:References 1302:cite book 1177:cite book 1134:876268837 1023:cite book 980:0097-5397 896:235731820 678:Θ 644:Θ 552:Θ 523:Θ 483:Θ 447:Θ 403:Θ 347:Ω 318:Θ 295:− 255:− 155:Θ 119:Θ 76:≥ 42:− 1368:Category 1294:37141168 1169:53138521 1099:13823328 1064:31077222 842:36241708 1374:Hashing 1222:1858301 941:7523004 207:connect 199:blowup. 1292:  1282:  1257:  1247:  1220:  1210:  1167:  1157:  1132:  1122:  1097:  1087:  1062:  1052:  1013:  1003:  978:  939:  929:  894:  884:  840:  830:  667:random 513:random 937:S2CID 892:S2CID 1308:link 1290:OCLC 1280:ISBN 1255:OCLC 1245:ISBN 1218:OCLC 1208:ISBN 1187:link 1183:link 1165:OCLC 1155:ISBN 1130:OCLC 1120:ISBN 1095:OCLC 1085:ISBN 1060:OCLC 1050:ISBN 1029:link 1011:OCLC 1001:ISBN 976:ISSN 927:ISBN 882:ISBN 838:OCLC 828:ISBN 1335:doi 968:doi 919:doi 874:doi 111:is 19:In 1370:: 1331:17 1329:. 1325:. 1304:}} 1300:{{ 1288:. 1253:. 1230:^ 1216:. 1179:}} 1175:{{ 1163:. 1128:. 1093:. 1058:. 1025:}} 1021:{{ 1009:. 974:. 964:41 962:. 958:. 935:. 925:. 913:. 890:. 880:. 868:. 850:^ 836:. 800:^ 781:. 744:. 580:. 23:, 1343:. 1337:: 1310:) 1296:. 1261:. 1224:. 1189:) 1171:. 1136:. 1101:. 1066:. 1031:) 1017:. 982:. 970:: 943:. 921:: 898:. 876:: 844:. 769:) 766:x 763:( 760:O 732:) 729:x 726:( 723:O 694:) 689:2 685:x 681:( 653:) 650:x 647:( 612:) 609:x 606:2 603:( 599:/ 595:n 568:) 563:2 559:x 555:( 532:) 529:x 526:( 499:) 494:2 490:x 486:( 463:) 458:2 454:x 450:( 419:) 414:2 410:x 406:( 381:2 377:x 356:) 353:1 350:( 327:) 324:x 321:( 298:x 290:2 286:x 282:= 277:2 273:x 269:) 266:x 262:/ 258:1 252:1 249:( 227:2 223:x 171:) 166:2 162:x 158:( 135:) 130:2 126:x 122:( 99:u 79:2 73:x 53:x 49:/ 45:1 39:1

Index

computer programming
linear-probing hash tables
lazy deletions
lazy deletions
quadratic probing








The art of computer programming, volume 3, sorting and searching
ISBN
0-201-89683-4
OCLC
36241708





"Linear Probing Revisited: Tombstones Mark the Demise of Primary Clustering"
doi
10.1109/focs52979.2021.00115
ISBN
978-1-6654-2055-6
S2CID
235731820

Text is available under the Creative Commons Attribution-ShareAlike License. Additional terms may apply.