Stochastic gradient Langevin dynamics

Stochastic gradient Langevin dynamics (SGLD) is an optimization and sampling technique that combines characteristics of stochastic gradient descent (SGD), a Robbins–Monro optimization algorithm, and Langevin dynamics, a mathematical extension of molecular dynamics models. Like stochastic gradient descent, SGLD is an iterative algorithm that uses minibatching to build a stochastic gradient estimator, as SGD does when optimizing a differentiable objective function. Unlike traditional SGD, SGLD can be used for Bayesian learning as a sampling method. SGLD may be viewed as Langevin dynamics applied to posterior distributions, with the key difference that the likelihood gradient terms are minibatched, as in SGD. Like Langevin dynamics, SGLD produces samples from a posterior distribution of parameters based on the available data. First described by Welling and Teh in 2011, the method has applications in many contexts that require optimization, and is most notably applied in machine learning problems.

Figure: SGLD can be applied to the optimization of non-convex objective functions, shown here to be a sum of Gaussians.

Formal definition

Given some parameter vector $\theta$, its prior distribution $p(\theta)$, and a set of data points $X = \{x_i\}_{i=1}^{N}$, Langevin dynamics samples from the posterior distribution $p(\theta \mid X) \propto p(\theta) \prod_{i=1}^{N} p(x_i \mid \theta)$ by updating the chain:

$$\Delta\theta_t = \frac{\varepsilon_t}{2}\left(\nabla \log p(\theta_t) + \sum_{i=1}^{N} \nabla \log p(x_i \mid \theta_t)\right) + \eta_t$$

Stochastic gradient Langevin dynamics uses a modified update procedure with minibatched likelihood terms:

$$\Delta\theta_t = \frac{\varepsilon_t}{2}\left(\nabla \log p(\theta_t) + \frac{N}{n}\sum_{i=1}^{n} \nabla \log p(x_{t_i} \mid \theta_t)\right) + \eta_t$$

where $n < N$ is the minibatch size (a positive integer), $x_{t_1}, \dots, x_{t_n}$ are the data points in the minibatch drawn at iteration $t$, $\eta_t \sim \mathcal{N}(0, \varepsilon_t)$ is Gaussian noise with variance $\varepsilon_t$, $p(x \mid \theta)$ is the likelihood of the data given the parameter vector $\theta$, and the step sizes $\varepsilon_t$ satisfy the following conditions:

$$\sum_{t=1}^{\infty} \varepsilon_t = \infty, \qquad \sum_{t=1}^{\infty} \varepsilon_t^{2} < \infty$$

For early iterations of the algorithm, each parameter update mimics stochastic gradient descent. As the algorithm approaches a local minimum or maximum, however, the gradient shrinks to zero and the chain produces samples surrounding the maximum a posteriori mode, allowing for posterior inference. This process generates approximate samples from the posterior by balancing the variance of the injected Gaussian noise against the variance of the stochastic gradient computation.

Application

SGLD is applicable in any optimization context in which it is desirable to quickly obtain posterior samples instead of a maximum a posteriori mode. In doing so, the method maintains the computational efficiency of stochastic gradient descent when compared to traditional gradient descent, while providing additional information regarding the landscape around the critical point of the objective function. In practice, SGLD can be applied to the training of Bayesian neural networks in deep learning, a task in which the method provides a distribution over model parameters. By introducing information about the variance of these parameters, SGLD characterizes the generalizability of these models at certain points in training. Additionally, obtaining samples from a posterior distribution permits uncertainty quantification by means of credible intervals, which is not possible using traditional stochastic gradient descent.

Variants and associated algorithms

If gradient computations are exact, SGLD reduces to the Langevin Monte Carlo algorithm, first coined in the literature of lattice field theory. This algorithm is also a reduction of Hamiltonian Monte Carlo, consisting of a single leapfrog step proposal rather than a series of steps. Since SGLD can be formulated as a modification of both stochastic gradient descent and MCMC methods, the method lies at the intersection between optimization and sampling algorithms; it maintains SGD's ability to quickly converge to regions of low cost while providing samples to facilitate posterior inference.

If the constraints on the step sizes $\varepsilon_t$ are relaxed so that they do not approach zero asymptotically, SGLD fails to produce samples for which the Metropolis–Hastings (MH) rejection rate is zero, and an MH rejection step becomes necessary. The resulting algorithm, dubbed the Metropolis-adjusted Langevin algorithm (MALA), rejects a proposed state $\theta^{t+1}$ whenever:

$$\frac{p(\theta^{t} \mid \theta^{t+1})\, p^{*}(\theta^{t+1})}{p(\theta^{t+1} \mid \theta^{t})\, p^{*}(\theta^{t})} < u, \qquad u \sim \mathcal{U}[0,1]$$

where $p(\theta^{t+1} \mid \theta^{t})$ is the proposal density, a normal distribution centered one gradient step from $\theta^{t}$, and $p^{*}$ is the target distribution.

Mixing rates and algorithmic convergence

Recent contributions have proven upper bounds on mixing times for both the traditional Langevin algorithm and the Metropolis-adjusted Langevin algorithm. Published by Ma et al. (2018), these bounds describe the rate at which the algorithms converge to the true posterior distribution. The mixing time is defined formally as:

$$\tau(\varepsilon; p^{0}) = \min\left\{k \mid \left\|p^{k} - p^{*}\right\|_{\mathrm{TV}} \leq \varepsilon\right\}$$

where $\varepsilon \in (0,1)$ is an arbitrary error tolerance, $p^{0}$ is some initial distribution, $p^{k}$ is the distribution of the chain after $k$ iterations, $p^{*}$ is the posterior distribution, and $\|\cdot\|_{\mathrm{TV}}$ is the total variation norm. Under some regularity conditions on an $L$-Lipschitz smooth objective function $U(x)$ which is $m$-strongly convex outside of a region of radius $R$, with condition number $\kappa = \frac{L}{m}$, the mixing rates are bounded as follows:

$$\tau_{ULA}(\varepsilon, p^{0}) \leq \mathcal{O}\left(e^{32LR^{2}} \kappa^{2} \frac{d}{\varepsilon^{2}} \ln\left(\frac{d}{\varepsilon^{2}}\right)\right)$$

$$\tau_{MALA}(\varepsilon, p^{0}) \leq \mathcal{O}\left(e^{16LR^{2}} \kappa^{3/2} d^{1/2} \left(d \ln \kappa + \ln\left(\frac{1}{\varepsilon}\right)\right)^{3/2}\right)$$

where $\tau_{ULA}$ and $\tau_{MALA}$ refer to the mixing rates of the unadjusted Langevin algorithm and the Metropolis-adjusted Langevin algorithm respectively. These bounds are important because they show that the computational complexity is polynomial in the dimension $d$, conditional on $LR^{2}$ being $\mathcal{O}(\log d)$.
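The minibatched SGLD update from the formal definition can be sketched on a toy problem. The example below is only an illustration under stated assumptions, not the reference implementation: the model (unit-variance Gaussian data with a Gaussian prior on the mean), the function names `sgld_gaussian_mean` and `exact_posterior_mean`, and the polynomial step-size schedule with constants `a`, `b` and `gamma` are all choices made here for illustration. The schedule $\varepsilon_t = a(b+t)^{-\gamma}$ with $0.5 < \gamma \leq 1$ is one common way to satisfy the two step-size conditions.

```python
import numpy as np


def sgld_gaussian_mean(x, n=32, num_steps=10_000, a=1e-3, b=10.0, gamma=0.55,
                       prior_var=100.0, seed=0):
    """Run SGLD for theta, assuming x_i ~ N(theta, 1) and theta ~ N(0, prior_var).

    All names and default constants are illustrative assumptions.
    Returns the full chain of theta values, one per iteration.
    """
    rng = np.random.default_rng(seed)
    N = x.shape[0]
    theta = 0.0
    chain = np.empty(num_steps)
    for t in range(num_steps):
        # Decaying step size: sum(eps) diverges, sum(eps**2) converges for 0.5 < gamma <= 1.
        eps = a * (b + t) ** (-gamma)
        # Draw a minibatch of n points without replacement.
        batch = x[rng.choice(N, size=n, replace=False)]
        # Gradient of the log prior N(theta; 0, prior_var).
        grad_log_prior = -theta / prior_var
        # Minibatch likelihood gradient, rescaled by N / n to estimate the full sum.
        grad_log_lik = (N / n) * np.sum(batch - theta)
        # Injected Gaussian noise with variance eps (standard deviation sqrt(eps)).
        eta = rng.normal(0.0, np.sqrt(eps))
        theta += 0.5 * eps * (grad_log_prior + grad_log_lik) + eta
        chain[t] = theta
    return chain


def exact_posterior_mean(x, prior_var=100.0):
    """Closed-form posterior mean for the same conjugate Gaussian model."""
    return np.sum(x) / (x.shape[0] + 1.0 / prior_var)
```

Because this toy model is conjugate, the average of the later part of the chain can be compared against `exact_posterior_mean` as a sanity check. For non-conjugate models no such closed form is available, and the step-size constants would need tuning.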

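The Metropolis adjustment used by the Metropolis-adjusted Langevin algorithm can be illustrated in the same spirit. The sketch below assumes a standard normal target and a fixed step size `eps`. The helper names (`log_target`, `grad_log_target`, `log_proposal`, `mala`) are hypothetical and chosen only for this example. A proposal is drawn from a normal distribution centered one gradient step from the current state, and it is accepted when a uniform draw falls below the ratio of target density times reverse proposal density to target density times forward proposal density.

```python
import numpy as np


def log_target(theta):
    """Log density (up to a constant) of the assumed target, a standard normal."""
    return -0.5 * np.sum(theta ** 2)


def grad_log_target(theta):
    """Gradient of log_target."""
    return -theta


def log_proposal(to, frm, eps):
    """Log density (up to a constant) of proposing `to` from `frm`."""
    mean = frm + 0.5 * eps * grad_log_target(frm)
    return -np.sum((to - mean) ** 2) / (2.0 * eps)


def mala(theta0, eps=0.5, num_steps=20_000, seed=0):
    """Run MALA with a fixed step size; return the chain and the acceptance rate."""
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float)
    chain = np.empty((num_steps,) + theta.shape)
    accepted = 0
    for t in range(num_steps):
        # Langevin proposal: one gradient step plus Gaussian noise of variance eps.
        mean = theta + 0.5 * eps * grad_log_target(theta)
        proposal = mean + np.sqrt(eps) * rng.standard_normal(theta.shape)
        # Log of the Metropolis-Hastings ratio (target times reverse proposal
        # over target times forward proposal).
        log_ratio = (log_target(proposal) + log_proposal(theta, proposal, eps)
                     - log_target(theta) - log_proposal(proposal, theta, eps))
        # Accept when log(u) < log_ratio for u ~ U[0, 1]; otherwise keep theta.
        if np.log(rng.uniform()) < log_ratio:
            theta = proposal
            accepted += 1
        chain[t] = theta
    return chain, accepted / num_steps
```

Working with log densities avoids numerical underflow in the ratio. With the accept/reject step in place, the chain targets the assumed distribution exactly for any fixed `eps`, whereas the unadjusted update carries a discretization bias that grows with the step size.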
References

Welling, Max; Teh, Yee Whye (2011). "Bayesian Learning via Stochastic Gradient Langevin Dynamics" (PDF). Proceedings of the 28th International Conference on Machine Learning: 681–688.

Chaudhari, Pratik; Choromanska, Anna; Soatto, Stefano; LeCun, Yann; Baldassi, Carlo; Borgs, Christian; Chayes, Jennifer; Sagun, Levent; Zecchina, Riccardo (2017). "Entropy-sgd: Biasing gradient descent into wide valleys". arXiv:1611.01838.

Kennedy, A. D. (1990). "The theory of hybrid stochastic algorithms". Probabilistic Methods in Quantum Field Theory and Quantum Gravity. Plenum Press. pp. 209–223. ISBN 0-306-43602-7.

Neal, R. (2011). "MCMC Using Hamiltonian Dynamics". Handbook of Markov Chain Monte Carlo. CRC Press. ISBN 978-1-4200-7941-8.

Ma, Y. A.; Chen, Y.; Jin, C.; Flammarion, N.; Jordan, M. I. (2018). "Sampling can be faster than optimization". Proceedings of the National Academy of Sciences. 116 (42): 20881–20885. arXiv:1811.08413. doi:10.1073/pnas.1820003116. PMC 6800351. PMID 31570618.
