Regularization perspectives on support-vector machines

Within mathematical analysis, regularization perspectives on support-vector machines provide a way of interpreting support-vector machines (SVMs) in the context of other regularization-based machine-learning algorithms. SVM algorithms categorize binary data, with the goal of fitting the training-set data in a way that minimizes the average of the hinge-loss function and the L2 norm of the learned weights. This strategy avoids overfitting via Tikhonov regularization in the L2-norm sense, and also corresponds to minimizing the bias and variance of the estimator of the weights. Estimators with lower mean squared error predict better, or generalize better, when given unseen data.

Specifically, Tikhonov regularization algorithms produce a decision boundary that minimizes the average training-set error and constrain the decision boundary not to be excessively complicated or to overfit the training data via an L2-norm penalty on the weights. The training- and test-set errors can be measured without bias and in a fair way using accuracy, precision, AUC-ROC, precision-recall, and other metrics.

Regularization perspectives on support-vector machines interpret SVM as a special case of Tikhonov regularization, specifically Tikhonov regularization with the hinge loss as the loss function. This provides a theoretical framework with which to analyze SVM algorithms and compare them to other algorithms with the same goals: to generalize without overfitting. SVM was first proposed in 1995 by Corinna Cortes and Vladimir Vapnik, and was framed geometrically as a method for finding hyperplanes that can separate multidimensional data into two categories. This traditional geometric interpretation of SVMs provides useful intuition about how SVMs work, but is difficult to relate to other machine-learning techniques for avoiding overfitting, like regularization, early stopping, sparsity, and Bayesian inference. However, once it was discovered that SVM is also a special case of Tikhonov regularization, regularization perspectives on SVM provided the theory necessary to fit SVM within a broader class of algorithms. This has enabled detailed comparisons between SVM and other forms of Tikhonov regularization, as well as theoretical grounding for why it is beneficial to use SVM's loss function, the hinge loss.

Theoretical background

In the statistical learning theory framework, an algorithm is a strategy for choosing a function f : X → Y given a training set S = {(x_1, y_1), ..., (x_n, y_n)} of inputs x_i and their labels y_i (the labels are usually ±1). Regularization strategies avoid overfitting by choosing a function that fits the data, but is not too complex. Specifically:

{\displaystyle f={\underset {f\in {\mathcal {H}}}{\operatorname {argmin} }}\left\{{\frac {1}{n}}\sum _{i=1}^{n}V(y_{i},f(x_{i}))+\lambda \|f\|_{\mathcal {H}}^{2}\right\},}

where H is a hypothesis space of functions, V : Y × Y → R is the loss function, ‖·‖_H is a norm on the hypothesis space of functions, and λ ∈ R is the regularization parameter.
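To make the abstract scheme concrete, here is a minimal Python sketch (illustrative only, not from the source; the linear hypothesis space, the square loss, the synthetic data, and the step size are all invented for the example) that runs gradient descent on an objective of exactly this form, (1/n) Σ V(y_i, f(x_i)) + λ‖w‖²:

```python
# Minimal sketch of Tikhonov-regularized empirical risk minimization for a
# linear hypothesis f(x) = <w, x>. Illustrative only: the square loss, the
# synthetic data, and the step size are arbitrary choices, not from the text.
import numpy as np

def fit_regularized(X, y, loss_grad, lam=0.1, lr=0.01, steps=2000):
    """Gradient descent on (1/n) * sum_i V(y_i, <w, x_i>) + lam * ||w||^2."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(steps):
        margins = X @ w                         # f(x_i) for every training point
        grad = X.T @ loss_grad(y, margins) / n  # gradient of the average loss
        grad += 2 * lam * w                     # gradient of the penalty lam*||w||^2
        w -= lr * grad
    return w

# Square loss V(y, f) = (y - f)^2 has derivative dV/df = -2 * (y - f).
square_loss_grad = lambda y, f: -2.0 * (y - f)

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = np.sign(X @ rng.normal(size=5) + 0.1 * rng.normal(size=100))
w = fit_regularized(X, y, square_loss_grad, lam=0.1)
```

Swapping in a different convex loss gradient changes the algorithm (e.g., hinge loss gives an SVM-style objective) while the regularization structure stays the same.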
When the hypothesis space H is a reproducing kernel Hilbert space, there exists a kernel function K : X × X → R that can be written as an n × n symmetric positive-definite matrix K. By the representer theorem,

{\displaystyle f(x_{i})=\sum _{j=1}^{n}c_{j}\mathbf {K} _{ij},{\text{ and }}\|f\|_{\mathcal {H}}^{2}=\langle f,f\rangle _{\mathcal {H}}=\sum _{i=1}^{n}\sum _{j=1}^{n}c_{i}c_{j}K(x_{i},x_{j})=c^{T}\mathbf {K} c.}
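As a concrete illustration of these two identities (a sketch, not from the source; the Gaussian kernel, its bandwidth, and the coefficient vector c are arbitrary choices), the snippet below builds the Gram matrix K, evaluates f at the training points as Kc, and computes the squared RKHS norm cᵀKc:

```python
# Sketch: Gram matrix, representer-theorem evaluation, and RKHS norm.
# The Gaussian (RBF) kernel and the coefficients c are arbitrary here.
import numpy as np

def gaussian_gram(X, sigma=1.0):
    """n x n matrix with K_ij = exp(-||x_i - x_j||^2 / (2 * sigma^2))."""
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-d2 / (2.0 * sigma**2))

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
K = gaussian_gram(X)         # symmetric positive-definite for distinct x_i

c = rng.normal(size=50)      # representer coefficients c_j
f_at_train = K @ c           # f(x_i) = sum_j c_j K_ij
rkhs_norm_sq = c @ K @ c     # ||f||_H^2 = c^T K c
```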
Special properties of the hinge loss

The simplest and most intuitive loss function for categorization is the misclassification loss, or 0–1 loss, which is 0 if f(x_i) = y_i and 1 if f(x_i) ≠ y_i, i.e. the Heaviside step function on −y_i f(x_i). However, this loss function is not convex, which makes the regularization problem very difficult to minimize computationally. Therefore, we look for convex substitutes for the 0–1 loss. The hinge loss,

{\displaystyle V{\big (}y_{i},f(x_{i}){\big )}={\big (}1-yf(x){\big )}_{+}}

where (s)_+ = max(s, 0), provides such a convex relaxation. In fact, the hinge loss is the tightest convex upper bound to the 0–1 misclassification loss function, and with infinite data it returns the Bayes-optimal solution:

{\displaystyle f_{b}(x)={\begin{cases}1,&p(1\mid x)>p(-1\mid x),\\-1,&p(1\mid x)<p(-1\mid x).\end{cases}}}
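A quick numerical check of the upper-bound property (illustrative only; the margin values and the tie-breaking convention are arbitrary choices) compares both losses as functions of the margin m = y·f(x):

```python
# Sketch: the hinge loss (1 - m)_+ upper-bounds the 0-1 loss, both viewed
# as functions of the margin m = y * f(x). Margin grid is arbitrary.
import numpy as np

margins = np.linspace(-2, 2, 9)
zero_one = (margins <= 0).astype(float)  # 1 iff sign(f(x)) disagrees with y
hinge = np.maximum(1 - margins, 0.0)     # hinge loss (1 - m)_+

assert np.all(hinge >= zero_one)         # hinge is an upper bound everywhere
for m, z, h in zip(margins, zero_one, hinge):
    print(f"margin {m:+.1f}: 0-1 loss {z:.0f}, hinge loss {h:.1f}")
```

Note that the hinge loss also penalizes correctly classified points with margin below 1, which is what produces the margin-maximizing behavior of SVM.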
Derivation

The Tikhonov regularization problem can be shown to be equivalent to traditional formulations of SVM by expressing it in terms of the hinge loss. With the hinge loss

{\displaystyle V{\big (}y_{i},f(x_{i}){\big )}={\big (}1-yf(x){\big )}_{+},}

where (s)_+ = max(s, 0), the regularization problem becomes

{\displaystyle f={\underset {f\in {\mathcal {H}}}{\operatorname {argmin} }}\left\{{\frac {1}{n}}\sum _{i=1}^{n}{\big (}1-yf(x){\big )}_{+}+\lambda \|f\|_{\mathcal {H}}^{2}\right\}.}

Multiplying by 1/(2λ) yields

{\displaystyle f={\underset {f\in {\mathcal {H}}}{\operatorname {argmin} }}\left\{C\sum _{i=1}^{n}{\big (}1-yf(x){\big )}_{+}+{\frac {1}{2}}\|f\|_{\mathcal {H}}^{2}\right\}}

with C = 1/(2λn), which is equivalent to the standard SVM minimization problem. Since multiplying an objective by a positive constant does not change its minimizer, the two formulations have the same solution.
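A small numeric sanity check of this rescaling (a sketch; the function values, λ, and sample size are arbitrary) confirms that the two objectives differ only by the positive factor 1/(2λ):

```python
# Sketch: scaling (1/n) * sum hinge + lam * ||f||^2 by 1/(2*lam) gives
# C * sum hinge + (1/2) * ||f||^2 with C = 1 / (2 * lam * n), so the two
# objectives share the same argmin. All values below are arbitrary.
import numpy as np

rng = np.random.default_rng(0)
n, lam = 20, 0.3
y = rng.choice([-1.0, 1.0], size=n)
fx = rng.normal(size=n)                       # arbitrary values f(x_i)
norm_sq = 2.5                                 # arbitrary ||f||_H^2

hinge_sum = np.sum(np.maximum(1 - y * fx, 0))
tikhonov = hinge_sum / n + lam * norm_sq      # Tikhonov form of the objective
C = 1 / (2 * lam * n)
svm = C * hinge_sum + 0.5 * norm_sq           # standard SVM form

assert np.isclose(svm, tikhonov / (2 * lam))  # same objective up to 1/(2*lam)
```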
Notes and references

- A hypothesis space is the set of functions used to model the data in a machine-learning problem. Each function corresponds to a hypothesis about the structure of the data. Typically the functions in a hypothesis space form a Hilbert space of functions with norm formed from the loss function.
- For insight on choosing the regularization parameter, see, e.g., Wahba, Grace; Wang, Yonghua (1990). "When is the optimal regularization parameter insensitive to the choice of the loss function". Communications in Statistics – Theory and Methods. 19 (5): 1685–1700. doi:10.1080/03610929008830285.
- For a detailed derivation, see Rifkin, Ryan (2002). Everything Old is New Again: A Fresh Look at Historical Approaches in Machine Learning (PDF). MIT (PhD thesis).
- Cortes, Corinna; Vapnik, Vladimir (1995). "Support-Vector Networks". Machine Learning. 20 (3): 273–297. doi:10.1007/BF00994018.
- Rosasco, Lorenzo. "Regularized Least-Squares and Support Vector Machines" (PDF).
- Lee, Yoonkyung; Wahba, Grace (2012). "Multicategory Support Vector Machines". Journal of the American Statistical Association. 99 (465): 67–81. CiteSeerX 10.1.1.22.1879. doi:10.1198/016214504000000098. S2CID 261035640.
- Rosasco, L.; De Vito, E.; Caponnetto, A.; Piana, M.; Verri, A. (May 2004). "Are Loss Functions All the Same". Neural Computation. 16 (5): 1063–1076. CiteSeerX 10.1.1.109.6786. doi:10.1162/089976604773135104. PMID 15070510. S2CID 11845688.
- Lin, Yi (July 2002). "Support Vector Machines and the Bayes Rule in Classification" (PDF). Data Mining and Knowledge Discovery. 6 (3): 259–275. doi:10.1023/A:1015469627679. S2CID 24759201.
- Schölkopf, Bernhard; Herbrich, Ralf; Smola, Alexander J. (2001). "A generalized representer theorem". In Helmbold, David P.; Williamson, Robert C. (eds.). Computational Learning Theory, 14th Annual Conference on Computational Learning Theory, COLT 2001 and 5th European Conference on Computational Learning Theory, EuroCOLT 2001, Amsterdam, The Netherlands, July 16–19, 2001, Proceedings. Lecture Notes in Computer Science. Vol. 2111. Springer. pp. 416–426. doi:10.1007/3-540-44581-1_27.
- Evgeniou, Theodoros; Pontil, Massimiliano; Poggio, Tomaso (2000). "Regularization Networks and Support Vector Machines" (PDF). Advances in Computational Mathematics. 13 (1): 1–50. doi:10.1023/A:1018946025316. S2CID 70866.
- Joachims, Thorsten. "SVMlight". Archived from the original on 2015-04-19. Retrieved 2012-05-18.
- Vapnik, Vladimir (1999). The Nature of Statistical Learning Theory. New York: Springer-Verlag. ISBN 978-0-387-98780-4.

Categories: Support vector machines | Mathematical analysis