
Regularization perspectives on support vector machines


Regularization perspectives on support-vector machines provide a way of interpreting support-vector machines (SVMs) in the context of other regularization-based machine-learning algorithms. SVM algorithms categorize binary data, with the goal of fitting the training-set data in a way that minimizes the average of the hinge-loss function and the L2 norm of the learned weights. This strategy avoids overfitting via Tikhonov regularization in the L2-norm sense, and it also corresponds to minimizing the bias and variance of the estimator of the weights. Estimators with lower mean squared error predict better, or generalize better, when given unseen data.

Specifically, Tikhonov regularization algorithms produce a decision boundary that minimizes the average training-set error and constrain the decision boundary not to be excessively complicated or overfit the training data via an L2 norm of the weights term. The training- and test-set errors can be measured without bias and in a fair way using accuracy, precision, AUC-ROC, precision-recall, and other metrics.
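As a minimal sketch of how two of these metrics could be computed for labels in {-1, +1} (the toy arrays below are illustrative only, not drawn from any real dataset):

```python
import numpy as np

# Toy labels and predictions in {-1, +1}; values are purely illustrative.
y_true = np.array([1, 1, -1, -1, 1, -1])
y_pred = np.array([1, -1, -1, 1, 1, -1])

accuracy = np.mean(y_true == y_pred)              # fraction of correct predictions
true_pos = np.sum((y_pred == 1) & (y_true == 1))  # predicted +1 and actually +1
precision = true_pos / np.sum(y_pred == 1)        # fraction of predicted +1 that are correct

print(f"accuracy = {accuracy:.2f}, precision = {precision:.2f}")
```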
Regularization perspectives on support-vector machines interpret SVM as a special case of Tikhonov regularization, specifically Tikhonov regularization with the hinge loss for a loss function. This provides a theoretical framework with which to analyze SVM algorithms and compare them to other algorithms with the same goals: to generalize without overfitting. SVM was first proposed in 1995 by Corinna Cortes and Vladimir Vapnik, and framed geometrically as a method for finding hyperplanes that can separate multidimensional data into two categories. This traditional geometric interpretation of SVMs provides useful intuition about how SVMs work, but is difficult to relate to other machine-learning techniques for avoiding overfitting, like regularization, early stopping, sparsity and Bayesian inference. However, once it was discovered that SVM is also a special case of Tikhonov regularization, regularization perspectives on SVM provided the theory necessary to fit SVM within a broader class of algorithms. This has enabled detailed comparisons between SVM and other forms of Tikhonov regularization, and theoretical grounding for why it is beneficial to use SVM's loss function, the hinge loss.

Theoretical background

In the statistical learning theory framework, an algorithm is a strategy for choosing a function f\colon \mathbf{X} \to \mathbf{Y} given a training set S = \{(x_1, y_1), \ldots, (x_n, y_n)\} of inputs x_i and their labels y_i (the labels are usually \pm 1). Regularization strategies avoid overfitting by choosing a function that fits the data but is not too complex. Specifically:

f = \underset{f \in \mathcal{H}}{\operatorname{argmin}} \left\{ \frac{1}{n} \sum_{i=1}^{n} V(y_i, f(x_i)) + \lambda \|f\|_{\mathcal{H}}^{2} \right\},

where \mathcal{H} is a hypothesis space of functions, V\colon \mathbf{Y} \times \mathbf{Y} \to \mathbb{R} is the loss function, \|\cdot\|_{\mathcal{H}} is a norm on the hypothesis space of functions, and \lambda \in \mathbb{R} is the regularization parameter.

A hypothesis space is the set of functions used to model the data in a machine-learning problem. Each function corresponds to a hypothesis about the structure of the data. Typically the functions in a hypothesis space form a Hilbert space of functions with norm formed from the loss function.

When \mathcal{H} is a reproducing kernel Hilbert space, there exists a kernel function K\colon \mathbf{X} \times \mathbf{X} \to \mathbb{R} that can be written as an n \times n symmetric positive-definite matrix \mathbf{K}. By the representer theorem,

f(x_i) = \sum_{j=1}^{n} c_j \mathbf{K}_{ij}, \quad \text{and} \quad \|f\|_{\mathcal{H}}^{2} = \langle f, f \rangle_{\mathcal{H}} = \sum_{i=1}^{n} \sum_{j=1}^{n} c_i c_j K(x_i, x_j) = c^{T} \mathbf{K} c.
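A minimal numerical sketch of these representer-theorem identities, assuming a Gaussian kernel and randomly generated toy data (both chosen purely for illustration): the kernel matrix K is built, f(x_i) is evaluated as (Kc)_i, and the double-sum form of the squared RKHS norm is checked against c^T K c.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 20, 3
X = rng.normal(size=(n, d))    # toy inputs, for illustration only
c = rng.normal(size=n)         # coefficients of the kernel expansion

def gaussian_kernel(X, gamma=0.5):
    """K[i, j] = exp(-gamma * ||x_i - x_j||^2), a symmetric positive-definite matrix."""
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-gamma * sq_dists)

K = gaussian_kernel(X)

f_at_train = K @ c             # f(x_i) = sum_j c_j K_ij, for all i at once
norm_sq_quadratic = c @ K @ c  # c^T K c
norm_sq_double_sum = sum(c[i] * c[j] * K[i, j]
                         for i in range(n) for j in range(n))

# The two expressions for ||f||_H^2 agree.
assert np.isclose(norm_sq_quadratic, norm_sq_double_sum)
```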
Special properties of the hinge loss

The simplest and most intuitive loss function for categorization is the misclassification loss, or 0–1 loss, which is 0 if f(x_i) = y_i and 1 if f(x_i) \neq y_i; in other words, it is the Heaviside step function on -y_i f(x_i). However, this loss function is not convex, which makes the regularization problem very difficult to minimize computationally. Therefore, we look for convex substitutes for the 0–1 loss. The hinge loss,

V(y_i, f(x_i)) = (1 - y f(x))_{+},

where (s)_{+} = \max(s, 0), provides such a convex relaxation. In fact, the hinge loss is the tightest convex upper bound to the 0–1 misclassification loss function, and with infinite data it returns the Bayes-optimal solution:

f_b(x) = \begin{cases} 1, & p(1 \mid x) > p(-1 \mid x), \\ -1, & p(1 \mid x) < p(-1 \mid x). \end{cases}
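The upper-bound claim can be checked pointwise: for any value of the margin y f(x), the hinge loss (1 - y f(x))_+ is at least as large as the misclassification loss. A small sketch over an arbitrary, purely illustrative grid of margin values:

```python
import numpy as np

margins = np.linspace(-2.0, 2.0, 401)          # candidate values of y * f(x), illustrative grid

zero_one_loss = (margins <= 0).astype(float)   # 1 on a misclassification, 0 otherwise
hinge_loss = np.maximum(0.0, 1.0 - margins)    # (1 - y f(x))_+

# The convex hinge loss upper-bounds the non-convex 0-1 loss at every margin value.
assert np.all(hinge_loss >= zero_one_loss)
```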
Derivation

The Tikhonov regularization problem can be shown to be equivalent to traditional formulations of SVM by expressing it in terms of the hinge loss. With the hinge loss

V(y_i, f(x_i)) = (1 - y f(x))_{+},

where (s)_{+} = \max(s, 0), the regularization problem becomes

f = \underset{f \in \mathcal{H}}{\operatorname{argmin}} \left\{ \frac{1}{n} \sum_{i=1}^{n} (1 - y f(x))_{+} + \lambda \|f\|_{\mathcal{H}}^{2} \right\}.

Multiplying by 1/(2\lambda) yields

f = \underset{f \in \mathcal{H}}{\operatorname{argmin}} \left\{ C \sum_{i=1}^{n} (1 - y f(x))_{+} + \frac{1}{2} \|f\|_{\mathcal{H}}^{2} \right\}

with C = 1/(2\lambda n), which is equivalent to the standard SVM minimization problem.
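This rescaling can be verified numerically: with C = 1/(2Ξ»n), the second objective is exactly 1/(2Ξ») times the first for every candidate function, so both problems share the same minimizers. A sketch using a random positive-definite matrix as a stand-in kernel matrix and random coefficient vectors as candidate functions (all values illustrative only):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 15
lam = 0.1                      # lambda, an illustrative value
C = 1.0 / (2.0 * lam * n)      # C = 1 / (2 * lambda * n)

A = rng.normal(size=(n, n))
K = A @ A.T + n * np.eye(n)    # random symmetric positive-definite "kernel matrix"
y = rng.choice([-1.0, 1.0], size=n)

def objectives(c):
    f = K @ c                                     # f(x_i) via the representer theorem
    hinge = np.maximum(0.0, 1.0 - y * f)          # (1 - y_i f(x_i))_+
    norm_sq = c @ K @ c                           # ||f||_H^2
    tikhonov_form = hinge.mean() + lam * norm_sq  # (1/n) sum hinge + lambda ||f||^2
    svm_form = C * hinge.sum() + 0.5 * norm_sq    # C sum hinge + (1/2) ||f||^2
    return tikhonov_form, svm_form

for _ in range(5):
    t, s = objectives(rng.normal(size=n))
    assert np.isclose(s, t / (2.0 * lam))         # identical up to the constant factor 1/(2*lambda)
```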
Notes and references

Cortes, Corinna; Vapnik, Vladimir (1995). "Support-Vector Networks". Machine Learning. 20 (3): 273–297. doi:10.1007/BF00994018.
Rosasco, Lorenzo. "Regularized Least-Squares and Support Vector Machines".
Rifkin, Ryan (2002). Everything Old is New Again: A Fresh Look at Historical Approaches in Machine Learning. MIT (PhD thesis).
Lee, Yoonkyung; Wahba, Grace (2012). "Multicategory Support Vector Machines". Journal of the American Statistical Association. 99 (465): 67–81. doi:10.1198/016214504000000098.
Rosasco, L.; De Vito, E.; Caponnetto, A.; Piana, M.; Verri, A. (May 2004). "Are Loss Functions All the Same". Neural Computation. 16 (5): 1063–1076. doi:10.1162/089976604773135104.
For insight on choosing the regularization parameter, see, e.g., Wahba, Grace; Wang, Yonghua (1990). "When is the optimal regularization parameter insensitive to the choice of the loss function". Communications in Statistics – Theory and Methods. 19 (5): 1685–1700. doi:10.1080/03610929008830285.
SchΓΆlkopf, Bernhard; Herbrich, Ralf; Smola, Alexander J. (2001). "A generalized representer theorem". In Helmbold, David P.; Williamson, Robert C. (eds.). Computational Learning Theory: 14th Annual Conference on Computational Learning Theory, COLT 2001 and 5th European Conference on Computational Learning Theory, EuroCOLT 2001, Amsterdam, The Netherlands, July 16–19, 2001, Proceedings. Lecture Notes in Computer Science. Vol. 2111. Springer. pp. 416–426. doi:10.1007/3-540-44581-1_27.
Lin, Yi (July 2002). "Support Vector Machines and the Bayes Rule in Classification". Data Mining and Knowledge Discovery. 6 (3): 259–275. doi:10.1023/A:1015469627679.
Evgeniou, Theodoros; Pontil, Massimiliano; Poggio, Tomaso (2000). "Regularization Networks and Support Vector Machines". Advances in Computational Mathematics. 13 (1): 1–50. doi:10.1023/A:1018946025316.
Joachims, Thorsten. "SVMlight". Archived from the original on 2015-04-19. Retrieved 2012-05-18.
Vapnik, Vladimir (1999). The Nature of Statistical Learning Theory. New York: Springer-Verlag. ISBN 978-0-387-98780-4.
