"Dog-piling" redirects here. For the form of online harassment, see Dogpiling (Internet).

A cache stampede is a type of cascading failure that can occur when massively parallel computing systems with caching mechanisms come under very high load. This behaviour is sometimes also called dog-piling.

To understand how cache stampedes occur, consider a web server that uses memcached to cache rendered pages for some period of time, to ease system load. Under particularly high load to a single URL, the system remains responsive as long as the resource remains cached, with requests being handled by accessing the cached copy. This minimizes the expensive rendering operation.

Under low load, cache misses result in a single recalculation of the rendering operation. The system continues as before, with average load kept very low because of the high cache hit rate.

However, under very heavy load, when the cached version of that page expires, there may be sufficient concurrency in the server farm that multiple threads of execution all attempt to render the content of that page simultaneously; none of the concurrent servers knows that the others are doing the same rendering at the same time. If the load is sufficiently high, this may by itself be enough to bring about congestion collapse of the system by exhausting shared resources. Congestion collapse then prevents the page from ever being completely re-rendered and re-cached, as every attempt to do so times out. Thus, a cache stampede reduces the cache hit rate to zero and keeps the system continuously in congestion collapse as it attempts to regenerate the resource for as long as the load remains very heavy.

To give a concrete example: if the page takes 3 seconds to render and traffic is 10 requests per second, then when the cached page expires, 30 processes end up simultaneously recomputing the rendering of the page and updating the cache with the result.
Typical cache usage

Below is a typical cache usage pattern for an item that needs to be updated every ttl units of time:

    function fetch(key, ttl) {
        value ← cache_read(key)
        if (!value) {
            value ← recompute_value()
            cache_write(key, value, ttl)
        }
        return value
    }

If recompute_value() takes a long time and the key is accessed frequently, many processes will simultaneously call recompute_value() upon expiration of the cache value.

In typical web applications, recompute_value() may query a database, access other services, or perform some complicated operation (which is why this particular computation is being cached in the first place). When the request rate is high, the database (or any other shared resource) will suffer an overload of requests/queries, which may in turn cause a system collapse.
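The fetch pattern above can be sketched as runnable Python. This is illustrative only: the TTLCache class, the module-level cache instance, and the recompute_value stub are stand-ins for a real shared cache and the expensive computation, not part of the original pseudocode.

```python
import time

class TTLCache:
    """Minimal in-memory stand-in for a shared cache with per-key TTLs."""
    def __init__(self):
        self._store = {}  # key -> (value, expiry timestamp)

    def read(self, key):
        entry = self._store.get(key)
        if entry is None or time.time() >= entry[1]:
            return None  # cache miss, or entry has expired
        return entry[0]

    def write(self, key, value, ttl):
        self._store[key] = (value, time.time() + ttl)

cache = TTLCache()

def recompute_value():
    # Placeholder for the slow rendering / database work being cached.
    return "rendered page"

def fetch(key, ttl):
    value = cache.read(key)
    if value is None:
        value = recompute_value()   # every process that misses pays this cost
        cache.write(key, value, ttl)
    return value
```

Note that nothing here coordinates concurrent callers: every process that sees the miss recomputes, which is exactly the stampede the rest of the article addresses.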
Cache stampede mitigation

Several approaches have been proposed to mitigate cache stampedes (also known as dogpile prevention). They can be roughly grouped into three main categories: locking, external recomputation, and probabilistic early expiration.

Locking

To prevent multiple simultaneous recomputations of the same value, upon a cache miss a process attempts to acquire a lock for that cache key and recomputes the value only if it acquires the lock. There are different implementation options for the case when the lock is not acquired:

- Wait until the value has been recomputed
- Return a "not-found" and have the client handle the absence of the value appropriately
- Keep a stale item in the cache to be used while the new value is recomputed

If implemented properly, locking can prevent stampedes altogether, but it requires an extra write for the locking mechanism. Apart from doubling the number of writes, the main drawback is implementing the locking mechanism correctly, which must handle edge cases such as failure of the process holding the lock, tuning a time-to-live for the lock, race conditions, and so on.
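A minimal sketch of the locking approach, choosing the stale-while-revalidate option for processes that fail to take the lock. In a real deployment the lock would be an atomic operation on the shared cache itself (for example, an add that succeeds only if the lock key is absent); here a process-local dict guarded by a mutex simulates that behaviour, and the time-to-live on the lock guards against a crashed lock holder.

```python
import threading
import time

cache = {}   # key -> (value, expiry); stale entries are kept, not deleted
locks = {}   # key -> lock expiry; simulates an atomic "add" on the cache
_guard = threading.Lock()

def try_acquire(key, lock_ttl=10.0):
    """Atomically claim the recompute lock for key; True if we got it."""
    with _guard:
        now = time.time()
        held_until = locks.get(key)
        if held_until is None or held_until < now:  # free, or holder timed out
            locks[key] = now + lock_ttl
            return True
        return False

def release(key):
    with _guard:
        locks.pop(key, None)

def fetch_with_lock(key, ttl, recompute):
    entry = cache.get(key)
    if entry is not None and time.time() < entry[1]:
        return entry[0]                  # fresh hit: no recomputation needed
    if try_acquire(key):
        try:
            value = recompute()          # only the lock holder recomputes
            cache[key] = (value, time.time() + ttl)
            return value
        finally:
            release(key)
    # Lock not acquired: serve the stale copy if one exists.
    return entry[0] if entry is not None else None
```

The extra write mentioned above is visible here as the locks update that accompanies every recomputation.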
External recomputation

This solution moves the recomputation of the cache value from the processes needing it to an external process. Recomputation by the external process can be triggered in different ways:

- When the cache value approaches its expiration
- Periodically
- When a process needing the value encounters a cache miss

This approach requires one more moving part, the external process, which needs to be maintained and monitored. In addition, it requires unnatural code separation/duplication, and it is mostly suited to static cache keys, i.e., keys that are not dynamically generated (as in the case of keys indexed by an id).
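One way to sketch the periodic-trigger variant, with a thread standing in for the external process purely for illustration (the refresher and fetch names are hypothetical, not from the original text). Readers never recompute; they only read whatever the refresher last wrote, so no stampede is possible.

```python
import threading
import time

cache = {}  # key -> value; readers only read, never recompute

def refresher(key, recompute, interval, stop):
    """External process (here: a thread) that keeps the cache entry warm."""
    while not stop.is_set():
        cache[key] = recompute()   # periodic trigger; entries need no TTL
        stop.wait(interval)

def fetch(key):
    # A miss simply means the refresher has not produced the value yet.
    return cache.get(key)

stop = threading.Event()
worker = threading.Thread(
    target=refresher, args=("page", lambda: "rendered", 1.0, stop), daemon=True
)
worker.start()
time.sleep(0.1)  # give the refresher a moment to run its first iteration
assert fetch("page") == "rendered"
stop.set()
```

The code-separation drawback noted above shows up directly: the recompute logic lives with the refresher, not with the code that consumes the value.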
Probabilistic early expiration

With this approach, each process may recompute the cache value before its expiration by making an independent probabilistic decision, where the probability of performing the early recomputation increases as the value gets closer to its expiration. Since the probabilistic decision is made independently by each process, the effect of the stampede is mitigated, as fewer processes will expire at the same time.

The following implementation based on an exponential distribution has been shown to be optimal both in its effectiveness at preventing stampedes and in how early the recomputations can happen:

    function x-fetch(key, ttl, beta=1) {
        value, delta, expiry ← cache_read(key)
        if (!value || (time() - delta * beta * log(rand(0,1))) ≥ expiry) {
            start ← time()
            value ← recompute_value()
            delta ← time() - start
            cache_write(key, (value, delta), ttl)
        }
        return value
    }

The parameter beta can be set to a value greater than 1 to favor earlier recomputations and further reduce stampedes, but the authors show that setting beta=1 works well in practice. The variable delta represents the time taken to recompute the value and is used to scale the probability distribution appropriately.

This approach is simple to implement and effectively reduces cache stampedes by automatically favoring early recomputations as the traffic rate increases. One drawback is that it takes more memory in the cache, as the value delta must be bundled with the cache item; when the caching system does not support retrieval of the key expiration time, the expiry (that is, time() + ttl) must also be stored in the bundle.
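The x-fetch pseudocode translates almost line for line into Python. This is an illustrative sketch: the module-level cache dict and recompute_value stub are placeholders for a shared cache and the real computation, and random.expovariate(1.0) supplies the -log(rand(0,1)) exponential term while avoiding log(0).

```python
import random
import time

cache = {}  # key -> (value, delta, expiry); stands in for a shared cache

def recompute_value():
    start = time.time()
    value = "rendered page"          # placeholder for the expensive work
    return value, time.time() - start  # bundle the measured delta

def x_fetch(key, ttl, beta=1.0):
    entry = cache.get(key)
    # Recompute on a miss, or probabilistically before expiry:
    # expovariate(1.0) samples -log(rand(0,1)), so the chance of an early
    # refresh grows as expiry approaches and as the recompute time delta grows.
    if entry is None or time.time() + entry[1] * beta * random.expovariate(1.0) >= entry[2]:
        value, delta = recompute_value()
        cache[key] = (value, delta, time.time() + ttl)
        return value
    return entry[0]
```

Because each caller draws its own exponential sample, concurrent callers spread their early refreshes out in time instead of all recomputing at the same instant.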
References

1. Galbraith, Patrick (2009), Developing Web Applications with Apache, MySQL, memcached, and Perl, John Wiley & Sons, p. 353, ISBN 9780470538326.
2. Allspaw, John; Robbins, Jesse (2010), Web Operations: Keeping the Data On Time, O'Reilly Media, pp. 128–132, ISBN 9781449394158.
3. Vattani, A.; Chierichetti, F.; Lowenstein, K. (2015), "Optimal Probabilistic Cache Stampede Prevention" (PDF), Proceedings of the VLDB Endowment, 8 (8), VLDB: 886–897, doi:10.14778/2757807.2757813, ISSN 2150-8097.

External links

- Minimizing cache stampedes, Joshua Thijssen, 2010
- Problems and solutions for typical perl cache usage, Jonathan Swartz, 2008

Categories: Cache (computing), Parallel computing, Engineering failures, Computer optimization