Knowledge (XXG)

Pearson hashing

Source 📝

122:
will result in the same hash value; using a too complex function, on the other hand, will affect speed negatively. Using a function rather than a table also allows extending the block size. Such functions naturally have to be
113:
with a program memory size on the order of hundreds of bytes. A workaround to this is to use a simple permutation function instead of a table stored in program memory. However, using a too simple function, such as
30:. Given an input consisting of any number of bytes, it produces as output a single byte that is strongly dependent on every byte of the input. Its implementation requires only a few instructions, plus a 256-byte 101:
Two input strings differing by exactly one character never collide. E.g., applying the algorithm on the strings ABC and AEC will never produce the same value.
387: 483: 478: 367: 23: 191: 437: 94:), the permutation table can be adjusted so that those inputs yield distinct hash values, producing what is called a 395: 412: 58: 95: 46: 446: 27: 456: 435:
Lemire, Daniel (2012), "The universality of iterated hashing over variable-length strings",
404: 110: 106: 80: 472: 109:
is the suggested 256 byte lookup table, which can be prohibitively large for a small
87: 31: 35: 460: 131: 105:
One of its drawbacks when compared with other hashing algorithms designed for
62: 124: 57:
has negligible cryptographic security, so the Pearson hash function is not
91: 66: 408: 119: 42: 54: 451: 178:) may be initialized differently, e.g. to the length of the data ( 50: 118:, partly defeats the usability as a hash function as 26:
designed for fast execution on processors with 8-bit
179: 175: 76:
It executes quickly on resource-limited processors.
130:The algorithm can be described by the following 86:Given a small, privileged set of inputs (e.g., 69:, for which purposes it offers these benefits: 388:"Fast Hashing of Variable-Length Text Strings" 79:There is no simple class of inputs for which 8: 134:, which computes the hash of message  83:(identical outputs) are especially likely. 450: 378: 7: 61:, but it is useful for implementing 14: 484:Hash function (non-cryptographic) 138:using the permutation table  368:Non-cryptographic hash functions 386:Pearson, Peter K. (June 1990), 24:non-cryptographic hash function 479:Error detection and correction 1: 127:, like their table variants. 38:of the values 0 through 255. 438:Discrete Applied Mathematics 500: 251:/* Permutation of 0-255 */ 461:10.1016/j.dam.2011.11.009 396:Communications of the ACM 67:data integrity check code 197: 59:cryptographically strong 41:This hash function is a 16:Fast 8-bit hash function 186:Example implementations 73:It is extremely simple. 96:perfect hash function 49:implemented via the 409:10.1145/78973.78978 174:The hash variable ( 47:substitution cipher 45:that uses an 8-bit 51:substitution table 491: 464: 463: 454: 445:(4–5): 604–617, 432: 426: 425: 424: 423: 417: 411:, archived from 392: 383: 357: 354: 351: 348: 345: 342: 339: 336: 333: 330: 327: 324: 321: 318: 315: 312: 309: 306: 303: 300: 297: 294: 291: 288: 285: 282: 279: 276: 273: 270: 267: 264: 261: 258: 255: 252: 249: 246: 243: 240: 237: 234: 231: 228: 225: 222: 219: 216: 213: 210: 207: 204: 201: 181: 177: 164:h := T 152:h := 0 148:pearson hashing 117: 107:8-bit processors 499: 498: 494: 493: 492: 490: 489: 488: 469: 468: 467: 434: 433: 429: 421: 419: 415: 390: 385: 384: 380: 376: 364: 359: 358: 355: 352: 349: 346: 343: 340: 337: 334: 331: 328: 325: 322: 319: 316: 313: 310: 307: 304: 301: 298: 295: 292: 289: 286: 283: 280: 277: 274: 271: 268: 265: 262: 259: 256: 253: 250: 247: 244: 241: 238: 235: 232: 229: 226: 223: 220: 217: 214: 211: 208: 205: 202: 199: 196: 188: 172: 115: 111:microcontroller 20:Pearson hashing 17: 12: 11: 5: 497: 495: 487: 486: 481: 471: 470: 466: 465: 427: 377: 375: 372: 371: 370: 363: 360: 206:PearsonHashing 198: 195: 189: 187: 184: 182:) modulo 256. 144: 103: 102: 99: 88:reserved words 84: 77: 74: 15: 13: 10: 9: 6: 4: 3: 2: 496: 485: 482: 480: 477: 476: 474: 462: 458: 453: 448: 444: 440: 439: 431: 428: 418:on 2012-07-04 414: 410: 406: 402: 398: 397: 389: 382: 379: 373: 369: 366: 365: 361: 193: 190: 185: 183: 170: 167: 163: 159: 155: 151: 147: 143: 141: 137: 133: 128: 126: 121: 112: 108: 100: 97: 93: 89: 85: 82: 78: 75: 72: 71: 70: 68: 64: 60: 56: 52: 48: 44: 39: 37: 34:containing a 33: 29: 25: 21: 442: 436: 430: 420:, retrieved 413:the original 400: 394: 381: 173: 168: 165: 161: 157: 153: 149: 145: 139: 135: 129: 104: 40: 32:lookup table 19: 18: 63:hash tables 53:. An 8-bit 36:permutation 473:Categories 422:2013-07-13 403:(6): 677, 374:References 132:pseudocode 81:collisions 452:1008.1715 146:algorithm 125:bijective 116:T = 255-i 28:registers 362:See also 293:GetBytes 281:Encoding 166:end loop 154:for each 120:anagrams 92:compiler 65:or as a 305:foreach 194:, 8-bit 43:CBC-MAC 344:return 227:string 215:static 212:public 200:public 169:return 90:for a 55:cipher 447:arXiv 416:(PDF) 391:(PDF) 320:bytes 299:input 275:bytes 230:input 203:class 22:is a 347:hash 329:hash 311:byte 287:UTF8 272:byte 260:hash 257:byte 239:byte 221:Hash 218:byte 162:loop 457:doi 443:160 405:doi 475:: 455:, 441:, 401:33 399:, 393:, 317:in 302:); 254:}; 192:C# 171:h 160:C 158:in 156:c 150:is 142:: 459:: 449:: 407:: 356:} 353:} 350:; 341:} 338:; 335:T 332:= 326:{ 323:) 314:b 308:( 296:( 290:. 284:. 278:= 269:; 266:0 263:= 248:{ 245:= 242:T 236:{ 233:) 224:( 209:{ 180:C 176:h 140:T 136:C 98:.

Index

non-cryptographic hash function
registers
lookup table
permutation
CBC-MAC
substitution cipher
substitution table
cipher
cryptographically strong
hash tables
data integrity check code
collisions
reserved words
compiler
perfect hash function
8-bit processors
microcontroller
anagrams
bijective
pseudocode
C#
Non-cryptographic hash functions
"Fast Hashing of Variable-Length Text Strings"
Communications of the ACM
doi
10.1145/78973.78978
the original
Discrete Applied Mathematics
arXiv
1008.1715

Text is available under the Creative Commons Attribution-ShareAlike License. Additional terms may apply.