
Fencing (computing)


Not to be confused with Memory barrier.

[Image: An NEC Nehalem cluster]

Fencing is the process of isolating a node of a computer cluster or protecting shared resources when a node appears to be malfunctioning.

As the number of nodes in a cluster increases, so does the likelihood that one of them will fail at some point. The failed node may have control over shared resources that need to be reclaimed, and if the node is acting erratically, the rest of the system needs to be protected from it. Fencing therefore either disables the node or disallows its access to shared storage, ensuring data integrity.
Basic concepts

A node fence (or I/O fence) is a virtual "fence" that separates a shared resource from the nodes that must not have access to it. It may separate an active node from its backup. If the backup crosses the fence and, for example, tries to control the same disk array as the primary, a data hazard may occur. Mechanisms such as STONITH are designed to prevent this condition.

Isolating a node means ensuring that I/O can no longer be done from it. Fencing is typically performed automatically by cluster infrastructure, such as shared disk file systems, in order to protect processes from other active nodes modifying the resources during node failures. Mechanisms to support fencing, such as the reserve/release mechanism of SCSI, have existed since at least 1985.
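The idea of a node fence as a gate in front of a shared resource can be sketched as follows: once a node is fenced, every I/O it attempts is refused, while unfenced nodes keep working. This is a minimal Python sketch under toy assumptions; the names `FencedDisk`, `FencedIOError`, `fence`, `unfence` and `write` are hypothetical and do not correspond to any real cluster stack.

```python
class FencedIOError(Exception):
    """Raised when a fenced node attempts I/O (hypothetical error type)."""


class FencedDisk:
    """Toy shared disk guarded by a node fence; illustrative only."""

    def __init__(self) -> None:
        self.blocks: dict[int, str] = {}
        self.fenced: set[str] = set()

    def fence(self, node: str) -> None:
        # Place the node outside the fence: all of its I/O is refused from now on.
        self.fenced.add(node)

    def unfence(self, node: str) -> None:
        # Readmit the node, e.g. after it has rejoined the cluster.
        self.fenced.discard(node)

    def write(self, node: str, block: int, data: str) -> None:
        if node in self.fenced:
            raise FencedIOError(f"{node} is fenced; write to block {block} refused")
        self.blocks[block] = data


disk = FencedDisk()
disk.write("primary", 0, "v1")
disk.fence("backup")  # the backup must not control the primary's disk array
try:
    disk.write("backup", 0, "v2")
except FencedIOError as err:
    print(err)  # backup is fenced; write to block 0 refused
print(disk.blocks)  # {0: 'v1'}
```

The check sits in the resource itself rather than in the fenced node, so it still holds if the fenced node is misbehaving.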
Fencing is required because it is impossible to distinguish between a real failure and a temporary hang. If the malfunctioning node is really down, it cannot do any damage, so in theory no action would be required (it could simply be brought back into the cluster with the usual join process). However, a malfunctioning node could itself consider the rest of the cluster to be the part that is malfunctioning; a split brain condition could then ensue and cause data corruption. The system therefore has to assume the worst case and always fence when problems occur.
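A small simulation can illustrate why the surviving side must fence instead of assuming the silent node is down. In this sketch, node B owns a shared resource and stops responding; node A cannot tell a crash from a hang and takes over. A simple epoch counter (sometimes called a fencing token) stands in for a real fencing mechanism; `SharedStore`, `fence_and_grant` and `run` are hypothetical names chosen for illustration.

```python
class SharedStore:
    """Toy shared resource that accepts writes only from the current epoch."""

    def __init__(self) -> None:
        self.epoch = 1
        self.value = None

    def fence_and_grant(self) -> int:
        # Fencing step: advancing the epoch locks out every earlier owner.
        self.epoch += 1
        return self.epoch

    def write(self, epoch: int, value: str) -> bool:
        if epoch != self.epoch:
            return False  # stale owner: the write is rejected
        self.value = value
        return True


def run(fence: bool) -> str:
    store = SharedStore()
    b_epoch = store.epoch  # node B owns the resource, then stops responding
    # Node A cannot tell whether B crashed or is only hung, and takes over.
    a_epoch = store.fence_and_grant() if fence else b_epoch
    store.write(a_epoch, "written by A")
    # B was only hung: it wakes up and completes its in-flight write.
    store.write(b_epoch, "stale write by B")
    return store.value


print(run(fence=False))  # stale write by B  (split brain: A's data is lost)
print(run(fence=True))   # written by A      (B was fenced)
```

If B had really crashed, the fencing step would have been unnecessary but harmless, which is why always fencing is the safe choice.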
Approaches to fencing

There are two classes of fencing methods: one disables the node itself, the other disallows access to resources such as shared disks. In some cases, a node that does not respond within a given time threshold is assumed to be non-operational, although there are counterexamples, e.g. a node stalled for a long time by heavy paging.
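A time-threshold detector and the paging counterexample can be sketched with an explicit simulated clock. The `now` arguments stand in for real time, and the class name `TimeoutDetector` and the 5-second threshold are hypothetical choices for illustration.

```python
class TimeoutDetector:
    """Declares a node failed when no heartbeat arrives within `threshold` seconds."""

    def __init__(self, threshold: float) -> None:
        self.threshold = threshold
        self.last_seen: dict[str, float] = {}

    def heartbeat(self, node: str, now: float) -> None:
        self.last_seen[node] = now

    def suspects(self, node: str, now: float) -> bool:
        # Silence longer than the threshold is *assumed* to mean failure.
        return now - self.last_seen[node] > self.threshold


detector = TimeoutDetector(threshold=5.0)
detector.heartbeat("node-b", now=0.0)

# node-b stalls for 12 s (for example, under heavy paging) but is still alive.
print(detector.suspects("node-b", now=12.0))  # True: declared non-operational

detector.heartbeat("node-b", now=12.5)  # the stalled node resumes and reports in
print(detector.suspects("node-b", now=13.0))  # False: it was never actually down
```

Because the detector's verdict can be wrong, a suspected node is fenced rather than simply ignored.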
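The STONITH method, which stands for "Shoot The Other Node In The Head", disables or powers off the suspected node. For instance, power fencing uses a power controller to turn off an inoperable node. The node may then restart itself and rejoin the cluster later. However, in some approaches an operator is instead informed that the node needs a manual restart.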
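The power-fencing flow described above (power off the suspect node, then either let it come back or notify an operator) can be sketched as below. `PowerController` is a hypothetical stand-in for a switched power unit, and `stonith` with its `auto_restart` and `notify` parameters is an illustrative helper, not the interface of any real fence agent.

```python
from collections.abc import Callable
from dataclasses import dataclass, field


@dataclass
class PowerController:
    """Hypothetical switched power unit with one outlet per node (True = powered)."""

    outlets: dict[str, bool] = field(default_factory=dict)

    def power_off(self, node: str) -> None:
        self.outlets[node] = False

    def power_on(self, node: str) -> None:
        self.outlets[node] = True


def stonith(pdu: PowerController, node: str, auto_restart: bool,
            notify: Callable[[str], None]) -> None:
    """Power-fence `node`, then either power it back on or ask an operator to."""
    pdu.power_off(node)  # once off, the suspect node can no longer do any I/O
    if auto_restart:
        # Power-cycle: the node boots and can rejoin the cluster later.
        pdu.power_on(node)
    else:
        notify(f"{node} was fenced; manual restart required")


pdu = PowerController({"node-a": True, "node-b": True})
messages: list[str] = []
stonith(pdu, "node-b", auto_restart=False, notify=messages.append)
print(pdu.outlets["node-b"], messages)
# False ['node-b was fenced; manual restart required']
```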
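The resource fencing approach disallows access to resources without powering off the node. This may include:

- Persistent reservation fencing, which uses SCSI-3 persistent reservations to block access to shared storage.
- Fibre Channel fencing, which disables the Fibre Channel port.
- Global network block device (GNBD) fencing, which disables access to the GNBD server.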
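When the cluster has only two nodes, the reserve/release method may be used as a two-node STONITH: upon detecting that node B has "failed", node A issues the reserve and obtains all resources (e.g. the shared disk) for itself. If node B then tries to do I/O (in case it was only temporarily hung), the I/O fails, and on node B that failure triggers code that kills the node.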
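The two-node reserve/release sequence can be modelled in a few lines. This is a toy sketch loosely based on the reserve/release idea described above, not the SCSI command set; `ReservableDisk`, `ReservationConflict` and `Node` are hypothetical names, and "killing" the node is represented by clearing an `alive` flag.

```python
from typing import Optional


class ReservationConflict(Exception):
    """I/O refused because another node holds the reservation (toy model)."""


class ReservableDisk:
    """Toy shared disk with a single reserve/release lock."""

    def __init__(self) -> None:
        self.holder: Optional[str] = None
        self.log: list[str] = []

    def reserve(self, node: str) -> None:
        # The reserving node obtains the whole disk; other nodes' I/O now fails.
        self.holder = node

    def release(self, node: str) -> None:
        if self.holder == node:
            self.holder = None

    def write(self, node: str, data: str) -> None:
        if self.holder is not None and self.holder != node:
            raise ReservationConflict(f"disk is reserved by {self.holder}")
        self.log.append(data)


class Node:
    def __init__(self, name: str, disk: ReservableDisk) -> None:
        self.name = name
        self.disk = disk
        self.alive = True

    def do_io(self, data: str) -> None:
        try:
            self.disk.write(self.name, data)
        except ReservationConflict:
            # The failed I/O triggers the node's self-kill path.
            self.alive = False


disk = ReservableDisk()
a, b = Node("A", disk), Node("B", disk)
disk.reserve("A")          # A has decided B failed (B may only be hung)
a.do_io("journal update")  # A keeps working normally
b.do_io("late write")      # B resumes; its I/O fails, so it kills itself
print(a.alive, b.alive)    # True False
```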
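Persistent reservation is essentially a match on a key: a node that presents the right key can do I/O, and otherwise its I/O fails. It is therefore sufficient to change the key when a failure occurs to ensure correct behavior. However, it may not always be possible to change the key on the failed node.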
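The key-matching idea can be illustrated with a single shared key held by the storage. Real SCSI-3 persistent reservations are richer (each node registers its own key), so this is only a simplified sketch of the principle described above; `KeyedDisk`, `change_key` and the key values are hypothetical.

```python
class KeyedDisk:
    """Toy model of key-matched access: I/O succeeds only with the current key."""

    def __init__(self, key: int) -> None:
        self._key = key
        self.data: list[str] = []

    def change_key(self, old_key: int, new_key: int) -> None:
        # Only a node that presents the current key may replace it.
        if old_key != self._key:
            raise PermissionError("wrong key")
        self._key = new_key

    def write(self, key: int, data: str) -> bool:
        if key != self._key:
            return False  # key mismatch: the I/O fails
        self.data.append(data)
        return True


disk = KeyedDisk(key=0x1)
keys = {"A": 0x1, "B": 0x1}       # both nodes start out holding the right key
print(disk.write(keys["B"], "x"))  # True

# Node B is judged to have failed: A changes the key and keeps the new one.
disk.change_key(keys["A"], 0x2)
keys["A"] = 0x2
print(disk.write(keys["A"], "y"))  # True: A presents the current key
print(disk.write(keys["B"], "z"))  # False: B still holds the old key
```

In this model the key changes only on the storage side, so nothing has to be done on the failed node itself.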
STONITH is easier and simpler to implement across multiple clusters, whereas the various approaches to resource fencing each require an implementation specific to the cluster software in use.

See also

- Fault tolerance
- Failover

References

- Alan Robertson, "Resource fencing using STONITH" (PDF). IBM Linux Research Center. Archived from the original on 2021-01-05.
- Enrique Vargas, Joseph Bianco, David Deeths, Sun Cluster Environment: Sun Cluster 2.2, 2001, p. 58.
- "Small Computer Standards Interface", ANSI X3.131-1986.

External links

- Red Hat GFS 6.0: Administrator's Guide - Using the Fencing System
- OCFS2 FAQ - Quorum and fencing


Text is available under the Creative Commons Attribution-ShareAlike License. Additional terms may apply.
