Interrupt storm

In operating systems, an interrupt storm is an event during which a processor receives an inordinate number of interrupts that consume the majority of the processor's time. Interrupt storms are typically caused by hardware devices that do not support interrupt rate limiting.

Background

Because interrupt processing is typically a non-preemptible task in time-sharing operating systems, an interrupt storm will cause sluggish response to user input, or even appear to freeze the system completely. This state is commonly known as live lock. In such a state, the system spends most of its resources processing interrupts instead of completing other work. To the end user, it does not appear to be processing anything at all, as there is often no output. An interrupt storm is sometimes mistaken for thrashing, since both have similar symptoms (unresponsive or sluggish response to user input, little or no output).

Common causes include misconfigured or faulty hardware, faulty device drivers, flaws in the operating system, and metastability in one or more components. The last condition rarely occurs outside of prototype or amateur-built hardware.

Most modern hardware and operating systems have methods for mitigating the effect of an interrupt storm. For example, most Ethernet controllers implement interrupt "rate limiting", which causes the controller to wait a programmable amount of time between each interrupt it generates. When rate limiting is not present within the device, similar functionality is usually written into the device driver, the operating system itself, or both.

The most common cause is a device "behind" another signaling an interrupt to an APIC (Advanced Programmable Interrupt Controller). Most computer peripherals generate interrupts through an APIC because the number of interrupts is almost always smaller (typically 15 for the modern PC) than the number of devices. The OS must then query each driver registered to that interrupt to ask whether the interrupt originated from its hardware. Faulty drivers may always claim "yes", causing the OS not to query the other drivers registered to that interrupt (only one interrupt can be processed at a time). The device that originally requested the interrupt therefore does not get its interrupt serviced, so a new interrupt is generated (or the existing one is not cleared) and the processor becomes swamped with continuous interrupt signals. Any operating system can live lock under an interrupt storm caused by such a fault. A kernel debugger can usually break the storm by unloading the faulty driver, allowing the driver "underneath" the faulty one to clear the interrupt, if user input is still possible.

This occurred in an older version of FreeBSD, where PCI cards that were configured to operate in ISA compatibility mode could not properly interact with the ISA interrupt routing. Either the operating system never detected the interrupts, or it was never able to clear them, resulting in an interrupt storm.

As drivers are most often implemented by a third party, most operating systems also have a polling mode that queries for pending interrupts at fixed intervals or in a round-robin fashion. This mode can be set globally, on a per-driver or per-interrupt basis, or dynamically if the OS detects a fault condition or excessive interrupt generation. A polling mode may be enabled dynamically when the number of interrupts, or the resource use caused by an interrupt, passes certain thresholds. When these thresholds are no longer exceeded, an OS may then change the interrupting driver, the interrupt, or interrupt handling globally from polling mode back to interrupt mode. Interrupt rate limiting in hardware usually makes a polling mode unnecessary, but the switch to polling can still happen during normal operation under intense I/O if the processor is unable to switch contexts quickly enough to keep pace.

History

Perhaps the first interrupt storm occurred during Apollo 11's lunar descent in 1969.

Considerations

Interrupt rate limiting must be carefully configured for optimum results. For example, an Ethernet controller with interrupt rate limiting will buffer the packets it receives from the network between interrupts. If the rate is set too low, the controller's buffer will overflow, and packets will be dropped. The rate must take into account how fast the buffer may fill between interrupts, and the interrupt latency between the interrupt and the transfer of the buffer to the system.

Interrupt mitigating

There are hardware-based and software-based approaches to the problem. For example, FreeBSD detects interrupt storms and masks problematic interrupts for some time in response.

The system used by NAPI is an example of the hardware-based approach: the system (driver) starts in the interrupt-enabled state. The interrupt handler then disables the interrupt and lets a thread or task handle the event(s). The task polls the device, processes some number of events, and then re-enables the interrupt.

Another interesting approach using hardware support is one where the device generates an interrupt when the event queue state changes from "empty" to "not empty". If there are no free DMA descriptors at the RX FIFO tail, the device drops the event. Otherwise, the event is added to the tail and the FIFO entry is marked as occupied. If at that point entry (tail−1) is free (cleared), an interrupt is generated (level interrupt) and the tail pointer is incremented. If the hardware requires the interrupt to be acknowledged, the CPU (interrupt handler) does that, handles the valid DMA descriptors at the head, and returns from the interrupt.
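
The interrupt-then-poll scheme attributed to NAPI above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the Linux NAPI API: the names Device, Driver, budget and run_burst are hypothetical, and the burst is assumed to arrive in full before the deferred poll task gets to run.

```python
from collections import deque


class Device:
    """Hypothetical device with an event queue and a maskable interrupt."""

    def __init__(self):
        self.queue = deque()
        self.irq_enabled = True
        self.irq_count = 0  # interrupts actually delivered to the CPU

    def receive(self, event, driver):
        self.queue.append(event)
        # An interrupt is raised only while the line is unmasked.
        if self.irq_enabled:
            self.irq_count += 1
            driver.on_interrupt(self)


class Driver:
    """Hypothetical driver: the handler masks the IRQ, a task polls."""

    def __init__(self, budget):
        self.budget = budget       # max events handled per poll pass
        self.poll_pending = False
        self.handled = []

    def on_interrupt(self, dev):
        # Interrupt handler: do the minimum, mask the interrupt,
        # and defer the real work to the poll task.
        dev.irq_enabled = False
        self.poll_pending = True

    def poll(self, dev):
        # Deferred task: process at most `budget` events per pass.
        for _ in range(self.budget):
            if not dev.queue:
                break
            self.handled.append(dev.queue.popleft())
        if dev.queue:
            self.poll_pending = True   # more work: stay in polling mode
        else:
            self.poll_pending = False
            dev.irq_enabled = True     # queue drained: unmask the IRQ


def run_burst(n_events, budget):
    """Return (interrupts raised, poll passes, events handled)."""
    dev, drv = Device(), Driver(budget)
    for i in range(n_events):
        dev.receive(i, drv)            # assumed: burst precedes polling
    polls = 0
    while drv.poll_pending:
        drv.poll(dev)
        polls += 1
    return dev.irq_count, polls, len(drv.handled)
```

Under these assumptions, run_burst(1000, 64) returns (1, 16, 1000): a burst of 1000 events costs a single interrupt and 16 bounded poll passes, rather than 1000 interrupts. The budget keeps each pass short so that other work can be scheduled between passes.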
See also

Broadcast storm
Inter-processor interrupt (IPI)
Non-maskable interrupt (NMI)
Programmable Interrupt Controller (PIC)


Text is available under the Creative Commons Attribution-ShareAlike License. Additional terms may apply.