
Cache stampede


"Dog-piling" redirects here. For the form of online harassment, see Dogpiling (Internet).

A cache stampede is a type of cascading failure that can occur when massively parallel computing systems with caching mechanisms come under very high load. This behaviour is sometimes also called dog-piling.

To understand how cache stampedes occur, consider a web server that uses memcached to cache rendered pages for some period of time, to ease system load. Under particularly high load to a single URL, the system remains responsive as long as the resource stays cached, with requests being handled by accessing the cached copy. This minimizes the expensive rendering operation.

Under low load, cache misses result in a single recalculation of the rendering operation. The system continues as before, with average load kept very low because of the high cache hit rate.

However, under very heavy load, when the cached version of that page expires, there may be sufficient concurrency in the server farm that multiple threads of execution will all attempt to render the content of that page simultaneously. None of the concurrent servers knows that the others are doing the same rendering at the same time. If sufficiently high load is present, this may by itself be enough to bring about congestion collapse of the system by exhausting shared resources. Congestion collapse prevents the page from ever being completely re-rendered and re-cached, as every attempt to do so times out. Thus, a cache stampede reduces the cache hit rate to zero and keeps the system continuously in congestion collapse as it attempts to regenerate the resource for as long as the load remains very heavy.

To give a concrete example, assume the page in question takes 3 seconds to render and receives 10 requests per second. Then, when the cached page expires, about 30 processes will be simultaneously recomputing the rendering of the page and updating the cache with the result.
Typical cache usage

Below is a typical cache usage pattern for an item that needs to be updated every ttl units of time:

function fetch(key, ttl) {
    value ← cache_read(key)
    if (!value) {
        value ← recompute_value()
        cache_write(key, value, ttl)
    }
    return value
}

If the function recompute_value() takes a long time and the key is accessed frequently, many processes will simultaneously call recompute_value() upon expiration of the cache value.

In typical web applications, the function recompute_value() may query a database, access other services, or perform some complicated operation (which is why this particular computation is being cached in the first place). When the request rate is high, the database (or any other shared resource) will suffer from an overload of requests/queries, which may in turn cause a system collapse.
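As a concrete illustration, below is a minimal Python sketch of this unprotected pattern. The in-process dictionary cache, the 3-second sleep standing in for the expensive rendering, and the helper names cache_read/cache_write are assumptions made for the example, not part of any particular caching library.

import time

cache = {}  # key -> (value, expiry timestamp); stand-in for a shared cache such as memcached

def cache_read(key):
    entry = cache.get(key)
    if entry is None:
        return None
    value, expiry = entry
    return value if time.time() < expiry else None

def cache_write(key, value, ttl):
    cache[key] = (value, time.time() + ttl)

def recompute_value():
    time.sleep(3)          # stands in for an expensive render or database query
    return "rendered page"

def fetch(key, ttl):
    # Naive pattern: every caller that observes a miss recomputes the value,
    # so many concurrent callers can run recompute_value() at the same time.
    value = cache_read(key)
    if value is None:
        value = recompute_value()
        cache_write(key, value, ttl)
    return value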
Cache stampede mitigation

Several approaches have been proposed to mitigate cache stampedes (also known as dog-pile prevention). They can be roughly grouped into three main categories.

Locking

To prevent multiple simultaneous recomputations of the same value, upon a cache miss a process attempts to acquire a lock for that cache key and recomputes the value only if it acquires the lock.

There are different implementation options for the case when the lock is not acquired:

- Wait until the value has been recomputed
- Return a "not-found" and have the client handle the absence of the value properly
- Keep a stale item in the cache to be used while the new value is recomputed

If implemented properly, locking can prevent stampedes altogether, but it requires an extra write for the locking mechanism. Apart from doubling the number of writes, the main drawback is implementing the locking mechanism correctly, which must also handle edge cases such as failure of the process holding the lock, tuning of a time-to-live for the lock, race conditions, and so on.
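A minimal sketch of the locking approach, reusing the illustrative cache, cache_read, cache_write and recompute_value helpers from the earlier example. The atomic cache_add helper (set only if absent, in the spirit of memcached's add or Redis's SET NX), the "lock:" key prefix and the lock TTL are assumptions chosen for the example; here the "lock not acquired" case simply reports a miss, corresponding to the second option in the list above.

import time

def cache_add(key, value, ttl):
    # Emulates an atomic "set only if absent" on the shared dictionary cache;
    # a real deployment would rely on memcached add or Redis SET NX instead.
    now = time.time()
    entry = cache.get(key)
    if entry is not None and now < entry[1]:
        return False
    cache[key] = (value, now + ttl)
    return True

def cache_delete(key):
    cache.pop(key, None)

def fetch_with_lock(key, ttl, lock_ttl=30):
    value = cache_read(key)
    if value is not None:
        return value
    if cache_add("lock:" + key, "1", lock_ttl):
        # Only the process that acquired the lock recomputes the value.
        try:
            value = recompute_value()
            cache_write(key, value, ttl)
        finally:
            cache_delete("lock:" + key)
        return value
    # Lock held by another process: report "not found" (alternatively, wait
    # and retry, or serve a stale copy kept in the cache).
    return None

The lock TTL guards against a holder that dies before releasing the lock, which is one of the edge cases mentioned above.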
External recomputation

This solution moves the recomputation of the cache value from the processes that need it to an external process. The recomputation by the external process can be triggered in different ways:

- When the cache value approaches its expiration
- Periodically
- When a process needing the value encounters a cache miss

This approach requires one more moving part (the external process) that needs to be maintained and monitored. In addition, it requires unnatural code separation/duplication and is mostly suited to static cache keys (i.e., keys that are not dynamically generated, as in the case of keys indexed by an id).
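As a rough sketch of the external-recomputation idea, the loop below refreshes a single, statically known key from a background worker, so that request-serving processes only ever read the cache. The thread, the key name and the interval/TTL values are illustrative assumptions (in practice this role is often played by a cron job or a dedicated service), and the cache_write/recompute_value helpers are the ones sketched earlier.

import threading
import time

def external_refresher(key, ttl, interval):
    # Dedicated worker: recomputes and rewrites the value on a schedule,
    # independently of the processes that read it.
    while True:
        value = recompute_value()
        cache_write(key, value, ttl)
        time.sleep(interval)

# Example: refresh every 60 seconds with a TTL well above the refresh interval,
# so readers should never observe an expired entry while the worker is healthy.
threading.Thread(target=external_refresher,
                 args=("home_page", 300, 60),
                 daemon=True).start()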
Probabilistic early expiration

With this approach, each process may recompute the cache value before its expiration by making an independent probabilistic decision, where the probability of performing the early recomputation increases as we get closer to the expiration of the value. Since the probabilistic decision is made independently by each process, the effect of the stampede is mitigated, as fewer processes will expire at the same time.

The following implementation based on an exponential distribution has been shown to be optimal in terms of its effectiveness in preventing stampedes and in how early recomputations can happen:

function x-fetch(key, ttl, beta=1) {
    value, delta, expiry ← cache_read(key)
    if (!value || (time() - delta * beta * log(rand(0,1))) ≥ expiry) {
        start ← time()
        value ← recompute_value()
        delta ← time() - start
        cache_write(key, (value, delta), ttl)
    }
    return value
}

The parameter beta can be set to a value greater than 1 to favor earlier recomputations and further reduce stampedes, but the authors show that setting beta=1 works well in practice. The variable delta represents the time taken to recompute the value and is used to scale the probability distribution appropriately.

This approach is simple to implement and effectively reduces cache stampedes by automatically favoring early recomputations as the traffic rate increases. One drawback is that it takes more memory in the cache, since we need to bundle the value delta with the cache item; when the caching system does not support retrieval of the key expiration time, we also need to store the expiry (that is, time() + ttl) in the bundle.
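Translated into Python, the x-fetch pseudocode might look roughly as follows. The in-process dictionary cache and the recompute_value stub are the same kind of illustrative assumptions as in the earlier sketches; here the cached bundle stores delta and the absolute expiry time alongside the value, as described above.

import math
import random
import time

cache = {}  # key -> (value, delta, expiry timestamp)

def recompute_value():
    time.sleep(3)          # stands in for the expensive computation
    return "rendered page"

def x_fetch(key, ttl, beta=1.0):
    entry = cache.get(key)
    now = time.time()
    # Recompute on a miss, or probabilistically before the expiry is reached:
    # -delta * beta * log(u), with u uniform in (0, 1], is an exponentially
    # distributed head start, so early recomputation becomes more likely as
    # the expiry approaches and as recomputation (delta) gets slower.
    if entry is None or now - entry[1] * beta * math.log(1.0 - random.random()) >= entry[2]:
        start = time.time()
        value = recompute_value()
        delta = time.time() - start
        cache[key] = (value, delta, time.time() + ttl)
        return value
    return entry[0]

Because each caller makes this decision locally, no coordination between processes is needed, which is what keeps the approach simple to deploy.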
References

Galbraith, Patrick (2009), Developing Web Applications with Apache, MySQL, memcached, and Perl, John Wiley & Sons, p. 353, ISBN 9780470538326.
Allspaw, John; Robbins, Jesse (2010), Web Operations: Keeping the Data On Time, O'Reilly Media, pp. 128–132, ISBN 9781449394158.
Vattani, A.; Chierichetti, F.; Lowenstein, K. (2015), "Optimal Probabilistic Cache Stampede Prevention" (PDF), Proceedings of the VLDB Endowment, 8 (8), VLDB: 886–897, doi:10.14778/2757807.2757813, ISSN 2150-8097.

External links

Minimizing cache stampedes, Joshua Thijssen, 2010.
Problems and solutions for typical perl cache usage, Jonathan Swartz, 2008.

Categories: Engineering failures, Parallel computing, Computer optimization, Cache (computing)

