Knowledge

Ward's method

Source đź“ť

1437: 1704: 75:
Ward's minimum variance criterion minimizes the total within-cluster variance. To implement this method, at each step find the pair of clusters that leads to minimum increase in total within-cluster variance after merging. This increase is a weighted squared distance between cluster centers. At the
226:
Ward's minimum variance method can be defined and implemented recursively by a Lance–Williams algorithm. The Lance–Williams algorithms are an infinite family of agglomerative hierarchical clustering algorithms which are represented by a recursive formula for updating cluster distances at each step
43:
procedure, where the criterion for choosing the pair of clusters to merge at each step is based on the optimal value of an objective function. This objective function could be "any function that reflects the investigator's purpose." Many of the standard clustering procedures are contained in this
1083: 1448: 822: 227:(each time a pair of clusters is merged). At each step, it is necessary to optimize the objective function (find the optimal pair of clusters to merge). The recursive formula simplifies finding the optimal pair. 213: 1432:{\displaystyle d(C_{i}\cup C_{j},C_{k})={\frac {n_{i}+n_{k}}{n_{i}+n_{j}+n_{k}}}\;d(C_{i},C_{k})+{\frac {n_{j}+n_{k}}{n_{i}+n_{j}+n_{k}}}\;d(C_{j},C_{k})-{\frac {n_{k}}{n_{i}+n_{j}+n_{k}}}\;d(C_{i},C_{j}).} 1699:{\displaystyle \alpha _{i}={\frac {n_{i}+n_{k}}{n_{i}+n_{j}+n_{k}}},\qquad \alpha _{j}={\frac {n_{j}+n_{k}}{n_{i}+n_{j}+n_{k}}},\qquad \beta ={\frac {-n_{k}}{n_{i}+n_{j}+n_{k}}},\qquad \gamma =0.} 284:
were next to be merged. At this point all of the current pairwise cluster distances are known. The recursive formula gives the updated cluster distances following the pending merge of clusters
874: 591: 218:
Note: In software that implements Ward's method, it is important to check whether the function arguments should specify Euclidean distances or squared Euclidean distances.
1048: 978: 669: 661: 551: 894: 1718:
introduces the use of cluster specific feature weights, following the intuitive idea that features could have different degrees of relevance at different clusters.
924: 429: 399: 369: 1736: 1075: 1005: 618: 510: 483: 456: 336: 309: 282: 255: 1860: 1751: 934:, and group average method have a recursive formula of the above type. A table of parameters for standard methods is given by several authors. 40: 1887: 91:
The initial cluster distances in Ward's minimum variance method are therefore defined to be the squared Euclidean distance between points:
97: 60: 1848: 931: 830: 45: 927: 63:
can be used to find the same clustering defined by Ward's method, in time proportional to the size of the input
28: 44:
very general class. To illustrate the procedure, Ward used the example where the objective function is the
937:
Ward's minimum variance method can be implemented by the Lance–Williams formula. For disjoint clusters
556: 896:
are parameters, which may depend on cluster sizes, that together with the cluster distance function
77: 1010: 940: 817:{\displaystyle d_{(ij)k}=\alpha _{i}d_{ik}+\alpha _{j}d_{jk}+\beta d_{ij}+\gamma |d_{ik}-d_{jk}|,} 1818: 85: 81: 36: 627: 517: 1844: 879: 1810: 899: 404: 374: 344: 1053: 983: 596: 488: 461: 434: 314: 287: 260: 233: 76:
initial step, all clusters are singletons (clusters containing a single point). To apply a
64: 1881: 1822: 1795: 1734:
Ward, J. H., Jr. (1963), "Hierarchical Grouping to Optimize an Objective Function",
84:, the initial distance between individual objects must be (proportional to) squared 1714:
The popularity of the Ward's method has led to variations of it. For instance, Ward
926:
determine the clustering algorithm. Several standard clustering algorithms such as
624:
An algorithm belongs to the Lance-Williams family if the updated cluster distance
1814: 20: 39:
approach originally presented by Joe H. Ward, Jr. Ward suggested a general
1780:
Milligan, G. W. (1979), "Ultrametric Hierarchical Clustering Algorithms",
1442:
Hence Ward's method can be implemented as a Lance–Williams algorithm with
1796:"Feature Relevance in Ward's Hierarchical Clustering Using the Lp Norm" 208:{\displaystyle d_{ij}=d(\{X_{i}\},\{X_{j}\})={\|X_{i}-X_{j}\|^{2}}.} 1872:
Finding Groups in Data: An Introduction to Cluster Analysis
1843:, Oxford University Press, Inc., New York; Arnold, London. 67:
and space linear in the number of points being clustered.
1749:
Cormack, R. M. (1971), "A Review of Classification",
1451: 1086: 1056: 1013: 986: 943: 902: 882: 833: 672: 630: 599: 559: 520: 491: 464: 437: 407: 377: 347: 317: 290: 263: 236: 100: 1698: 1431: 1069: 1042: 999: 972: 918: 888: 868: 816: 655: 612: 585: 545: 504: 477: 450: 423: 393: 363: 330: 303: 276: 249: 207: 16:Criterion applied in hierarchical cluster analysis 1839:Everitt, B. S., Landau, S. and Leese, M. (2001), 1737:Journal of the American Statistical Association 869:{\displaystyle \alpha _{i},\alpha _{j},\beta ,} 8: 192: 165: 155: 142: 136: 123: 431:be the pairwise distances between clusters 1393: 1305: 1202: 1870:Kaufman, L. and Rousseeuw, P. J. (1990), 1674: 1661: 1648: 1636: 1626: 1607: 1594: 1581: 1569: 1556: 1549: 1540: 1523: 1510: 1497: 1485: 1472: 1465: 1456: 1450: 1417: 1404: 1384: 1371: 1358: 1347: 1341: 1329: 1316: 1296: 1283: 1270: 1258: 1245: 1238: 1226: 1213: 1193: 1180: 1167: 1155: 1142: 1135: 1123: 1110: 1097: 1085: 1061: 1055: 1031: 1018: 1012: 991: 985: 961: 948: 942: 907: 901: 881: 851: 838: 832: 806: 797: 781: 772: 757: 738: 728: 712: 702: 677: 671: 635: 629: 604: 598: 577: 564: 558: 525: 519: 496: 490: 469: 463: 442: 436: 412: 406: 382: 376: 352: 346: 322: 316: 295: 289: 268: 262: 241: 235: 195: 185: 172: 164: 149: 130: 105: 99: 1752:Journal of the Royal Statistical Society 553:be the distance between the new cluster 1727: 41:agglomerative hierarchical clustering 7: 14: 61:nearest-neighbor chain algorithm 1771:, Chapman and Hall, Boca Raton. 1686: 1619: 1535: 663:can be computed recursively by 586:{\displaystyle C_{i}\cup C_{j}} 48:, and this example is known as 1865:Algorithms for Clustering Data 1423: 1397: 1335: 1309: 1232: 1206: 1129: 1090: 807: 773: 687: 678: 645: 636: 535: 526: 158: 120: 71:The minimum variance criterion 54:Ward's minimum variance method 33:Ward's minimum variance method 1: 1841:Cluster Analysis, 4th Edition 29:hierarchical cluster analysis 1867:, New Jersey: Prentice–Hall. 1043:{\displaystyle n_{i},n_{j},} 973:{\displaystyle C_{i},C_{j},} 1888:Cluster analysis algorithms 1769:Classification, 2nd Edition 1904: 27:is a criterion applied in 1863:and Dubes, R. C. (1988), 1815:10.1007/s00357-015-9167-1 1803:Journal of Classification 656:{\displaystyle d_{(ij)k}} 546:{\displaystyle d_{(ij)k}} 222:Lance–Williams algorithms 35:is a special case of the 1853:Hartigan, J. A. (1975), 1794:R.C. de Amorim (2015). 889:{\displaystyle \gamma } 1767:Gordon, A. D. (1999), 1700: 1433: 1071: 1044: 1001: 974: 920: 919:{\displaystyle d_{ij}} 890: 870: 818: 657: 614: 587: 547: 506: 479: 452: 425: 424:{\displaystyle d_{jk}} 395: 394:{\displaystyle d_{ik}} 365: 364:{\displaystyle d_{ij}} 332: 305: 278: 251: 230:Suppose that clusters 209: 1855:Clustering Algorithms 1701: 1434: 1072: 1070:{\displaystyle n_{k}} 1045: 1002: 1000:{\displaystyle C_{k}} 975: 921: 891: 871: 819: 658: 615: 613:{\displaystyle C_{k}} 588: 548: 507: 505:{\displaystyle C_{k}} 480: 478:{\displaystyle C_{j}} 453: 451:{\displaystyle C_{i}} 426: 396: 366: 333: 331:{\displaystyle C_{j}} 306: 304:{\displaystyle C_{i}} 279: 277:{\displaystyle C_{j}} 252: 250:{\displaystyle C_{i}} 210: 1449: 1084: 1054: 1011: 984: 941: 900: 880: 831: 670: 628: 597: 557: 518: 489: 462: 435: 405: 375: 345: 315: 288: 261: 234: 98: 46:error sum of squares 78:recursive algorithm 1874:, New York: Wiley. 1857:, New York: Wiley. 1757:, 134(3), 321-367. 1696: 1429: 1067: 1040: 997: 970: 916: 886: 866: 814: 653: 610: 583: 543: 502: 475: 448: 421: 391: 361: 328: 301: 274: 247: 205: 86:Euclidean distance 82:objective function 52:or more precisely 37:objective function 1784:, 44(3), 343–346. 1681: 1614: 1530: 1391: 1303: 1200: 1895: 1827: 1826: 1800: 1791: 1785: 1778: 1772: 1765: 1759: 1747: 1741: 1732: 1705: 1703: 1702: 1697: 1682: 1680: 1679: 1678: 1666: 1665: 1653: 1652: 1642: 1641: 1640: 1627: 1615: 1613: 1612: 1611: 1599: 1598: 1586: 1585: 1575: 1574: 1573: 1561: 1560: 1550: 1545: 1544: 1531: 1529: 1528: 1527: 1515: 1514: 1502: 1501: 1491: 1490: 1489: 1477: 1476: 1466: 1461: 1460: 1438: 1436: 1435: 1430: 1422: 1421: 1409: 1408: 1392: 1390: 1389: 1388: 1376: 1375: 1363: 1362: 1352: 1351: 1342: 1334: 1333: 1321: 1320: 1304: 1302: 1301: 1300: 1288: 1287: 1275: 1274: 1264: 1263: 1262: 1250: 1249: 1239: 1231: 1230: 1218: 1217: 1201: 1199: 1198: 1197: 1185: 1184: 1172: 1171: 1161: 1160: 1159: 1147: 1146: 1136: 1128: 1127: 1115: 1114: 1102: 1101: 1076: 1074: 1073: 1068: 1066: 1065: 1049: 1047: 1046: 1041: 1036: 1035: 1023: 1022: 1006: 1004: 1003: 998: 996: 995: 979: 977: 976: 971: 966: 965: 953: 952: 932:complete linkage 925: 923: 922: 917: 915: 914: 895: 893: 892: 887: 875: 873: 872: 867: 856: 855: 843: 842: 823: 821: 820: 815: 810: 805: 804: 789: 788: 776: 765: 764: 746: 745: 733: 732: 720: 719: 707: 706: 694: 693: 662: 660: 659: 654: 652: 651: 619: 617: 616: 611: 609: 608: 592: 590: 589: 584: 582: 581: 569: 568: 552: 550: 549: 544: 542: 541: 511: 509: 508: 503: 501: 500: 484: 482: 481: 476: 474: 473: 457: 455: 454: 449: 447: 446: 430: 428: 427: 422: 420: 419: 400: 398: 397: 392: 390: 389: 370: 368: 367: 362: 360: 359: 337: 335: 334: 329: 327: 326: 310: 308: 307: 302: 300: 299: 283: 281: 280: 275: 273: 272: 256: 254: 253: 248: 246: 245: 214: 212: 211: 206: 201: 200: 199: 190: 189: 177: 176: 154: 153: 135: 134: 113: 112: 1903: 1902: 1898: 1897: 1896: 1894: 1893: 1892: 1878: 1877: 1836: 1834:Further reading 1831: 1830: 1798: 1793: 1792: 1788: 1779: 1775: 1766: 1762: 1748: 1744: 1733: 1729: 1724: 1717: 1712: 1670: 1657: 1644: 1643: 1632: 1628: 1603: 1590: 1577: 1576: 1565: 1552: 1551: 1536: 1519: 1506: 1493: 1492: 1481: 1468: 1467: 1452: 1447: 1446: 1413: 1400: 1380: 1367: 1354: 1353: 1343: 1325: 1312: 1292: 1279: 1266: 1265: 1254: 1241: 1240: 1222: 1209: 1189: 1176: 1163: 1162: 1151: 1138: 1137: 1119: 1106: 1093: 1082: 1081: 1057: 1052: 1051: 1027: 1014: 1009: 1008: 987: 982: 981: 957: 944: 939: 938: 903: 898: 897: 878: 877: 847: 834: 829: 828: 793: 777: 753: 734: 724: 708: 698: 673: 668: 667: 631: 626: 625: 600: 595: 594: 573: 560: 555: 554: 521: 516: 515: 512:, respectively, 492: 487: 486: 465: 460: 459: 438: 433: 432: 408: 403: 402: 378: 373: 372: 348: 343: 342: 318: 313: 312: 291: 286: 285: 264: 259: 258: 237: 232: 231: 224: 191: 181: 168: 145: 126: 101: 96: 95: 73: 65:distance matrix 17: 12: 11: 5: 1901: 1899: 1891: 1890: 1880: 1879: 1876: 1875: 1868: 1858: 1851: 1835: 1832: 1829: 1828: 1786: 1773: 1760: 1742: 1740:, 58, 236–244. 1726: 1725: 1723: 1720: 1715: 1711: 1708: 1707: 1706: 1695: 1692: 1689: 1685: 1677: 1673: 1669: 1664: 1660: 1656: 1651: 1647: 1639: 1635: 1631: 1625: 1622: 1618: 1610: 1606: 1602: 1597: 1593: 1589: 1584: 1580: 1572: 1568: 1564: 1559: 1555: 1548: 1543: 1539: 1534: 1526: 1522: 1518: 1513: 1509: 1505: 1500: 1496: 1488: 1484: 1480: 1475: 1471: 1464: 1459: 1455: 1440: 1439: 1428: 1425: 1420: 1416: 1412: 1407: 1403: 1399: 1396: 1387: 1383: 1379: 1374: 1370: 1366: 1361: 1357: 1350: 1346: 1340: 1337: 1332: 1328: 1324: 1319: 1315: 1311: 1308: 1299: 1295: 1291: 1286: 1282: 1278: 1273: 1269: 1261: 1257: 1253: 1248: 1244: 1237: 1234: 1229: 1225: 1221: 1216: 1212: 1208: 1205: 1196: 1192: 1188: 1183: 1179: 1175: 1170: 1166: 1158: 1154: 1150: 1145: 1141: 1134: 1131: 1126: 1122: 1118: 1113: 1109: 1105: 1100: 1096: 1092: 1089: 1077:respectively: 1064: 1060: 1039: 1034: 1030: 1026: 1021: 1017: 994: 990: 969: 964: 960: 956: 951: 947: 928:single linkage 913: 910: 906: 885: 865: 862: 859: 854: 850: 846: 841: 837: 825: 824: 813: 809: 803: 800: 796: 792: 787: 784: 780: 775: 771: 768: 763: 760: 756: 752: 749: 744: 741: 737: 731: 727: 723: 718: 715: 711: 705: 701: 697: 692: 689: 686: 683: 680: 676: 650: 647: 644: 641: 638: 634: 622: 621: 607: 603: 580: 576: 572: 567: 563: 540: 537: 534: 531: 528: 524: 513: 499: 495: 472: 468: 445: 441: 418: 415: 411: 388: 385: 381: 358: 355: 351: 325: 321: 298: 294: 271: 267: 244: 240: 223: 220: 216: 215: 204: 198: 194: 188: 184: 180: 175: 171: 167: 163: 160: 157: 152: 148: 144: 141: 138: 133: 129: 125: 122: 119: 116: 111: 108: 104: 72: 69: 15: 13: 10: 9: 6: 4: 3: 2: 1900: 1889: 1886: 1885: 1883: 1873: 1869: 1866: 1862: 1859: 1856: 1852: 1850: 1846: 1842: 1838: 1837: 1833: 1824: 1820: 1816: 1812: 1808: 1804: 1797: 1790: 1787: 1783: 1782:Psychometrika 1777: 1774: 1770: 1764: 1761: 1758: 1754: 1753: 1746: 1743: 1739: 1738: 1731: 1728: 1721: 1719: 1709: 1693: 1690: 1687: 1683: 1675: 1671: 1667: 1662: 1658: 1654: 1649: 1645: 1637: 1633: 1629: 1623: 1620: 1616: 1608: 1604: 1600: 1595: 1591: 1587: 1582: 1578: 1570: 1566: 1562: 1557: 1553: 1546: 1541: 1537: 1532: 1524: 1520: 1516: 1511: 1507: 1503: 1498: 1494: 1486: 1482: 1478: 1473: 1469: 1462: 1457: 1453: 1445: 1444: 1443: 1426: 1418: 1414: 1410: 1405: 1401: 1394: 1385: 1381: 1377: 1372: 1368: 1364: 1359: 1355: 1348: 1344: 1338: 1330: 1326: 1322: 1317: 1313: 1306: 1297: 1293: 1289: 1284: 1280: 1276: 1271: 1267: 1259: 1255: 1251: 1246: 1242: 1235: 1227: 1223: 1219: 1214: 1210: 1203: 1194: 1190: 1186: 1181: 1177: 1173: 1168: 1164: 1156: 1152: 1148: 1143: 1139: 1132: 1124: 1120: 1116: 1111: 1107: 1103: 1098: 1094: 1087: 1080: 1079: 1078: 1062: 1058: 1037: 1032: 1028: 1024: 1019: 1015: 992: 988: 967: 962: 958: 954: 949: 945: 935: 933: 929: 911: 908: 904: 883: 863: 860: 857: 852: 848: 844: 839: 835: 811: 801: 798: 794: 790: 785: 782: 778: 769: 766: 761: 758: 754: 750: 747: 742: 739: 735: 729: 725: 721: 716: 713: 709: 703: 699: 695: 690: 684: 681: 674: 666: 665: 664: 648: 642: 639: 632: 605: 601: 578: 574: 570: 565: 561: 538: 532: 529: 522: 514: 497: 493: 470: 466: 443: 439: 416: 413: 409: 386: 383: 379: 356: 353: 349: 341: 340: 339: 323: 319: 296: 292: 269: 265: 242: 238: 228: 221: 219: 202: 196: 186: 182: 178: 173: 169: 161: 150: 146: 139: 131: 127: 117: 114: 109: 106: 102: 94: 93: 92: 89: 87: 83: 79: 70: 68: 66: 62: 57: 55: 51: 50:Ward's method 47: 42: 38: 34: 30: 26: 25:Ward's method 22: 1871: 1864: 1854: 1840: 1809:(1): 46–62. 1806: 1802: 1789: 1781: 1776: 1768: 1763: 1756: 1750: 1745: 1735: 1730: 1713: 1441: 936: 826: 623: 229: 225: 217: 90: 74: 58: 53: 49: 32: 24: 18: 1861:Jain, A. K. 1007:with sizes 80:under this 1849:0340761199 1755:, Series A 1722:References 1710:Variations 21:statistics 1688:γ 1630:− 1621:β 1538:α 1454:α 1339:− 1104:∪ 884:γ 861:β 849:α 836:α 791:− 770:γ 751:β 726:α 700:α 571:∪ 193:‖ 179:− 166:‖ 1882:Category 1823:18099326 1847:  1821:  827:where 485:, and 401:, and 338:. Let 1819:S2CID 1799:(PDF) 1845:ISBN 1050:and 980:and 876:and 593:and 311:and 257:and 59:The 1811:doi 88:. 19:In 1884:: 1817:. 1807:32 1805:. 1801:. 1694:0. 930:, 458:, 371:, 56:. 31:. 23:, 1825:. 1813:: 1716:p 1691:= 1684:, 1676:k 1672:n 1668:+ 1663:j 1659:n 1655:+ 1650:i 1646:n 1638:k 1634:n 1624:= 1617:, 1609:k 1605:n 1601:+ 1596:j 1592:n 1588:+ 1583:i 1579:n 1571:k 1567:n 1563:+ 1558:j 1554:n 1547:= 1542:j 1533:, 1525:k 1521:n 1517:+ 1512:j 1508:n 1504:+ 1499:i 1495:n 1487:k 1483:n 1479:+ 1474:i 1470:n 1463:= 1458:i 1427:. 1424:) 1419:j 1415:C 1411:, 1406:i 1402:C 1398:( 1395:d 1386:k 1382:n 1378:+ 1373:j 1369:n 1365:+ 1360:i 1356:n 1349:k 1345:n 1336:) 1331:k 1327:C 1323:, 1318:j 1314:C 1310:( 1307:d 1298:k 1294:n 1290:+ 1285:j 1281:n 1277:+ 1272:i 1268:n 1260:k 1256:n 1252:+ 1247:j 1243:n 1236:+ 1233:) 1228:k 1224:C 1220:, 1215:i 1211:C 1207:( 1204:d 1195:k 1191:n 1187:+ 1182:j 1178:n 1174:+ 1169:i 1165:n 1157:k 1153:n 1149:+ 1144:i 1140:n 1133:= 1130:) 1125:k 1121:C 1117:, 1112:j 1108:C 1099:i 1095:C 1091:( 1088:d 1063:k 1059:n 1038:, 1033:j 1029:n 1025:, 1020:i 1016:n 993:k 989:C 968:, 963:j 959:C 955:, 950:i 946:C 912:j 909:i 905:d 864:, 858:, 853:j 845:, 840:i 812:, 808:| 802:k 799:j 795:d 786:k 783:i 779:d 774:| 767:+ 762:j 759:i 755:d 748:+ 743:k 740:j 736:d 730:j 722:+ 717:k 714:i 710:d 704:i 696:= 691:k 688:) 685:j 682:i 679:( 675:d 649:k 646:) 643:j 640:i 637:( 633:d 620:. 606:k 602:C 579:j 575:C 566:i 562:C 539:k 536:) 533:j 530:i 527:( 523:d 498:k 494:C 471:j 467:C 444:i 440:C 417:k 414:j 410:d 387:k 384:i 380:d 357:j 354:i 350:d 324:j 320:C 297:i 293:C 270:j 266:C 243:i 239:C 203:. 197:2 187:j 183:X 174:i 170:X 162:= 159:) 156:} 151:j 147:X 143:{ 140:, 137:} 132:i 128:X 124:{ 121:( 118:d 115:= 110:j 107:i 103:d

Index

statistics
hierarchical cluster analysis
objective function
agglomerative hierarchical clustering
error sum of squares
nearest-neighbor chain algorithm
distance matrix
recursive algorithm
objective function
Euclidean distance
single linkage
complete linkage
Journal of the American Statistical Association
Journal of the Royal Statistical Society
"Feature Relevance in Ward's Hierarchical Clustering Using the Lp Norm"
doi
10.1007/s00357-015-9167-1
S2CID
18099326
ISBN
0340761199
Jain, A. K.
Category
Cluster analysis algorithms

Text is available under the Creative Commons Attribution-ShareAlike License. Additional terms may apply.

↑