Knowledge (XXG)

Automatic basis function construction

Source 📝

22: 2770:
approaches to 1, Krylov and BEBFs converge slowly. This is because the error Krylov based methods are restricted by Chebyshev polynomial bound. To solve this problem, methods such as BARBs are proposed. BARBs is an incremental variant of Drazin bases, and converges faster than Krylov and BEBFs when
125:
approximators (LFAs) are widely adopted for their low theoretical complexity. Two sub-problems needs to be solved for better approximation: weight optimization and basis construction. To solve the second problem, one way is to design special basis functions. Those basis functions work well in
1444: 2025:
Loosely speaking, Bellman error points towards the optimal value function. The sequence of BEBF form a basis space which is orthogonal to the real value function space; thus with sufficient number of BEBFs, any value function can be represented exactly.
2020: 2655: 2245: 1223: 900:
This spectral framework can be used for value function approximation (VFA). Given the fixed policy, the edge weights are determined by corresponding states' transition probability. To get smooth value approximation,
602: 1260: 2794:
Another type is reward-insensitive proto value basis function derived from graph Lapalacian. This method uses graph information, but the construction of adjacency matrix makes this method hard to analyze.
520: 853: 1842: 2886:
M. Petrik. An analysis of Laplacian methods for value function approximation in MDPs. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), pages 2574–2579, 2007
2837:. (2006) Automatic Basis Function Construction for Approximate Dynamic Programming and Reinforcement Learning. Proceedings of the 23rd International Conference on Machine Learning, Pittsburgh, PA. 1889: 2858:
Mahadevan, Sridhar; Maggioni, Mauro. (2005) Value function approximation with diffusion wavelets and Laplacian eigenfuctions. Proceedings of Advances in Neural Information Processing Systems.
1083: 1512: 224: 178: 370: 302: 2750:
The first type of methods are reward-sensitive, like Krylov and BEBFs; they dilate the reward function geometrically through transition matrix. However, when discount factor
1697: 1015: 40: 2726: 2306: 1118: 965: 893:
could be constructed by simply checking whether two states are connected, and D could be calculated by summing up every row of W. In continuous state space, we could take
2518: 2137: 2067: 661: 1753: 1605: 1557: 2545: 2164: 1644: 721: 2789: 2768: 2408: 2358: 1933: 685: 2448: 2489: 2385: 2108: 744: 625: 1252: 891: 453: 433: 413: 393: 322: 264: 244: 126:
specific tasks but are significantly restricted to domains. Thus constructing basis construction functions automatically is preferred for broader applications.
2327:
Bellman Average Reward Bases (or BARBs) is similar to Krylov Bases, but the reward function is being dilated by the average adjusted transition matrix
2553: 2172: 1126: 913:
Krylov basis construction uses the actual transition matrix instead of random walk Laplacian. The assumption of this method is that transition model
528: 1439:{\displaystyle v=Br={\frac {1}{\alpha _{0}}}\sum _{i=0}^{m-1}\alpha _{i+1}(I-\gamma P)^{i}r=\sum _{i=0}^{m-1}\alpha _{i+1}\beta _{i}y_{i}.} 2895:
R. Parr, C. Painter-Wakefield, L.-H. Li, and M. Littman. Analyzing feature generation for value-function approximation. In ICML’07, 2007.
119:(MDP) problems have large or continuous state spaces, which typically require some sort of approximation to be represented efficiently. 84: 58: 458: 786: 100: 1762: 2846:
Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction.(1998) MIT Press, Cambridge, MA, chapter 8
2950: 1858: 1515: 863: 2955: 2945: 664: 777:
In this approach, Mahadevan analyzes the connectivity graph between states to determine a set of basis functions.
103:, which allows it to perform well where expert-constructed basis functions are difficult or impossible to create. 1023: 1458: 2904:
S. Mahadevan and B. Liu. Basis construction from power series expansions of value functions. In NIPS’10, 2010
116: 112: 2870:
and Carl D. Meyer. The idea behind Krylov methods. American Mathematical Monthly, 105(10):889–899, 1998.
183: 137: 333: 269: 1653: 970: 2913:
William J. Stewart. Numerical methods for computing stationary distributions of finite irreducible
2804: 2660: 2253: 1088: 927: 2496: 2115: 2039: 902: 630: 2015:{\displaystyle \varepsilon =r+\gamma P{\hat {v}}-{\hat {v}}=r+\gamma P\Phi \theta -\Phi \theta } 1714: 1566: 1523: 2523: 2142: 1617: 690: 2774: 2753: 2393: 2330: 670: 134:
A Markov decision process with finite state space and fixed policy is defined with a 5-tuple
2809: 2423: 859: 757: 96: 72: 2456: 2363: 2075: 729: 610: 2814: 1228: 867: 122: 1085:
is enough to represent any value function, and m is the degree of minimal polynomial of
876: 438: 418: 398: 378: 307: 249: 229: 88: 2939: 2914: 2834: 726:
Basis construction looks for ways to automatically construct better basis function
99:
accurately. Automatic basis construction is independent of prior knowledge of the
2650:{\displaystyle :\phi _{i+1}=r-P^{*}r+P\Phi _{i}\theta _{i}-\Phi _{i}\theta _{i}} 2240:{\displaystyle \varepsilon =r+\gamma P\Phi _{i}\theta _{i}-\Phi _{i}\theta _{i}} 1218:{\displaystyle p(A)={\frac {1}{\alpha _{0}}}\sum _{i=0}^{m-1}\alpha _{i+1}A^{i}} 894: 92: 2867: 2917:. In Advances in Computational Probability. Kluwer Academic Publishers, 1997. 455:
is commonly being approximated via a linear combination of basis function
597:{\displaystyle v\approx {\hat {v}}=\sum _{i=1}^{n}\theta _{n}\phi _{n}} 749:
A good construction method should have the following characteristics:
2930: 862:
which represents the states of fixed policy MDP which forms an
753:
Small error bounds between the estimate and real value function
95:
to a lower-dimensional embedding, while still representing the
15: 2747:
There are two principal types of basis construction methods.
515:{\displaystyle \Phi ={\phi _{1},\phi _{2},\ldots ,\phi _{n}}} 848:{\displaystyle L=I-D^{-{\frac {1}{2}}}WD^{-{\frac {1}{2}}}} 2733:
N represents the number of iterations till convergence.
2313:
N represents the number of iterations till convergence.
36: 1837:{\displaystyle z_{i}:=z_{i}-<z_{j},z_{i}>z_{j};} 2777: 2756: 2663: 2556: 2526: 2499: 2459: 2426: 2396: 2366: 2333: 2256: 2175: 2145: 2118: 2078: 2042: 1936: 1861: 1765: 1717: 1656: 1620: 1569: 1526: 1461: 1263: 1231: 1129: 1091: 1026: 973: 930: 879: 789: 732: 693: 673: 633: 613: 531: 461: 441: 421: 401: 381: 336: 310: 272: 252: 232: 186: 140: 31:
may be too technical for most readers to understand
2783: 2762: 2720: 2649: 2539: 2512: 2483: 2442: 2402: 2390:BARBs converges faster than BEBFs and Krylov when 2379: 2352: 2300: 2239: 2158: 2131: 2102: 2061: 2014: 1884:{\displaystyle \parallel z_{i}\parallel \approx 0} 1883: 1836: 1747: 1691: 1638: 1599: 1551: 1506: 1438: 1246: 1217: 1112: 1077: 1009: 959: 885: 847: 738: 715: 679: 655: 619: 596: 514: 447: 427: 407: 387: 364: 316: 296: 258: 238: 218: 172: 687:is a weight vector with n parameters and usually 435:grows too large for this kind of representation. 2250:add bellman error to form new basis function: 924:The vectors in Neumann series are denoted as 873:In discrete state space, the adjacency matrix 780:The normalized graph Laplacian is defined as: 2854: 2852: 746:which can represent the value function well. 415:is usually maintained as tabular form. While 8: 1078:{\displaystyle y_{0},y_{1},\ldots ,y_{m-1}} 2736:":" means juxtaposing matrices or vectors. 2316:":" means juxtaposing matrices or vectors. 763:Converge to stationary value function fast 2776: 2755: 2703: 2690: 2668: 2662: 2641: 2631: 2618: 2608: 2589: 2564: 2555: 2531: 2525: 2504: 2498: 2458: 2431: 2425: 2395: 2371: 2365: 2344: 2332: 2283: 2261: 2255: 2231: 2221: 2208: 2198: 2174: 2150: 2144: 2123: 2117: 2077: 2047: 2041: 1971: 1970: 1956: 1955: 1935: 1869: 1860: 1825: 1812: 1799: 1783: 1770: 1764: 1716: 1677: 1661: 1655: 1619: 1568: 1531: 1525: 1507:{\displaystyle z_{1},z_{2},\ldots ,z_{k}} 1498: 1479: 1466: 1460: 1427: 1417: 1401: 1385: 1374: 1358: 1327: 1311: 1300: 1288: 1279: 1262: 1230: 1209: 1193: 1177: 1166: 1154: 1145: 1128: 1090: 1063: 1044: 1031: 1025: 972: 948: 935: 929: 878: 833: 829: 810: 806: 788: 731: 708: 700: 692: 672: 642: 634: 632: 612: 588: 578: 568: 557: 539: 538: 530: 505: 486: 473: 468: 460: 440: 420: 400: 380: 361: 335: 309: 271: 251: 231: 193: 185: 141: 139: 87:of looking for a set of task-independent 59:Learn how and when to remove this message 43:, without removing the technical details. 2882: 2880: 2878: 2876: 1930:Bellman error (or BEBFs) is defined as: 1254:, the value function can be written as: 180:, which includes the finite state space 2826: 2387:can be calculated by many methods in. 1020:It shows that Krylov space spanned by 2657:, and add it to form new bases matrix 663:matrix in which every row contains a 77:automatic basis function construction 41:make it understandable to non-experts 7: 2520:according to current basis function 2139:according to current basis function 2687: 2665: 2628: 2605: 2528: 2280: 2258: 2218: 2195: 2147: 2006: 1997: 1916:k: number of eigenvectors in basis 1123:Suppose the minimal polynomial is 733: 614: 462: 14: 219:{\displaystyle S={1,2,\ldots ,s}} 173:{\displaystyle {s,a,p,\gamma ,r}} 365:{\displaystyle v=r+\gamma Pv.\,} 327:Bellman equation is defined as: 297:{\displaystyle \gamma \in [0,1)} 20: 2833:Keller, Philipp; Mannor, Shie; 1692:{\displaystyle z_{i}:=Pz_{i-1}} 375:When the number of elements in 2715: 2683: 2478: 2466: 2295: 2276: 2097: 2085: 1976: 1961: 1862: 1742: 1730: 1594: 1582: 1355: 1339: 1139: 1133: 1107: 1092: 1010:{\displaystyle i\in [0,infty)} 1004: 980: 709: 701: 643: 635: 544: 291: 279: 1: 2169:compute new bellman error by 2721:{\displaystyle \Phi _{i+1}=} 2323:Bellman average reward bases 2301:{\displaystyle \Phi _{i+1}=} 1113:{\displaystyle (I-\gamma P)} 960:{\displaystyle y_{i}=P^{i}r} 2513:{\displaystyle \theta _{i}} 2132:{\displaystyle \theta _{i}} 2062:{\displaystyle \phi _{1}=r} 870:related to nodes' degrees. 760:in the value function space 656:{\displaystyle |S|\times n} 304:, and the transition model 2972: 2493:compute the weight vector 2112:compute the weight vector 1919:l: total number of vectors 1748:{\displaystyle j:=1:(i-1)} 1600:{\displaystyle i:=1:(l+k)} 1552:{\displaystyle z_{k+1}:=r} 226:, the finite action space 2540:{\displaystyle \Phi _{i}} 2159:{\displaystyle \Phi _{i}} 1639:{\displaystyle i>k+1} 716:{\displaystyle n\ll |s|} 2784:{\displaystyle \gamma } 2763:{\displaystyle \gamma } 2743:Discussion and analysis 2403:{\displaystyle \gamma } 2353:{\displaystyle P-P^{*}} 1453:Augmented Krylov Method 680:{\displaystyle \theta } 667:for corresponding row, 117:Markov Decision Process 2785: 2764: 2722: 2651: 2541: 2514: 2485: 2444: 2443:{\displaystyle P^{*}r} 2404: 2381: 2354: 2302: 2241: 2160: 2133: 2104: 2063: 2016: 1885: 1838: 1749: 1693: 1640: 1601: 1553: 1508: 1440: 1396: 1322: 1248: 1219: 1188: 1114: 1079: 1011: 961: 887: 849: 740: 717: 681: 657: 621: 598: 573: 516: 449: 429: 409: 389: 366: 318: 298: 260: 246:, the reward function 240: 220: 174: 115:(RL), most real-world 113:reinforcement learning 2786: 2765: 2723: 2652: 2542: 2515: 2486: 2484:{\displaystyle i\in } 2445: 2405: 2382: 2380:{\displaystyle P^{*}} 2355: 2303: 2242: 2161: 2134: 2105: 2103:{\displaystyle i\in } 2064: 2017: 1886: 1839: 1750: 1694: 1641: 1602: 1554: 1509: 1441: 1370: 1296: 1249: 1220: 1162: 1115: 1080: 1012: 962: 888: 850: 741: 739:{\displaystyle \Phi } 718: 682: 658: 622: 620:{\displaystyle \Phi } 599: 553: 517: 450: 430: 410: 390: 367: 319: 299: 261: 241: 221: 175: 2775: 2754: 2661: 2554: 2524: 2497: 2457: 2424: 2394: 2364: 2331: 2254: 2173: 2143: 2116: 2076: 2040: 1934: 1859: 1763: 1715: 1654: 1618: 1567: 1524: 1459: 1261: 1247:{\displaystyle BA=I} 1229: 1127: 1089: 1024: 971: 928: 877: 787: 730: 691: 671: 631: 611: 529: 459: 439: 419: 399: 379: 334: 308: 270: 250: 230: 184: 138: 2951:Dynamic programming 2805:Dynamic programming 2550:compute new basis: 1926:Bellman error basis 522:, so that we have: 85:mathematical method 2956:Stochastic control 2781: 2760: 2718: 2647: 2537: 2510: 2481: 2440: 2400: 2377: 2350: 2298: 2237: 2156: 2129: 2100: 2059: 2012: 1881: 1876:∥ ≈ 1834: 1745: 1689: 1636: 1597: 1549: 1504: 1436: 1244: 1215: 1110: 1075: 1007: 957: 903:diffusion wavelets 883: 845: 736: 713: 677: 653: 617: 594: 512: 445: 425: 405: 385: 362: 314: 294: 266:, discount factor 256: 236: 216: 170: 130:Problem definition 2946:Optimal decisions 2420:stage stage i=1, 2036:stage stage i=1, 1979: 1964: 1294: 1160: 886:{\displaystyle W} 841: 818: 773:Proto-value basis 547: 448:{\displaystyle v} 428:{\displaystyle S} 408:{\displaystyle v} 388:{\displaystyle S} 317:{\displaystyle P} 259:{\displaystyle r} 239:{\displaystyle A} 69: 68: 61: 2963: 2918: 2911: 2905: 2902: 2896: 2893: 2887: 2884: 2871: 2868:Ilse C. F. Ipsen 2865: 2859: 2856: 2847: 2844: 2838: 2831: 2810:Bellman equation 2790: 2788: 2787: 2782: 2769: 2767: 2766: 2761: 2727: 2725: 2724: 2719: 2714: 2713: 2695: 2694: 2679: 2678: 2656: 2654: 2653: 2648: 2646: 2645: 2636: 2635: 2623: 2622: 2613: 2612: 2594: 2593: 2575: 2574: 2546: 2544: 2543: 2538: 2536: 2535: 2519: 2517: 2516: 2511: 2509: 2508: 2490: 2488: 2487: 2482: 2449: 2447: 2446: 2441: 2436: 2435: 2409: 2407: 2406: 2401: 2386: 2384: 2383: 2378: 2376: 2375: 2359: 2357: 2356: 2351: 2349: 2348: 2307: 2305: 2304: 2299: 2288: 2287: 2272: 2271: 2246: 2244: 2243: 2238: 2236: 2235: 2226: 2225: 2213: 2212: 2203: 2202: 2165: 2163: 2162: 2157: 2155: 2154: 2138: 2136: 2135: 2130: 2128: 2127: 2109: 2107: 2106: 2101: 2068: 2066: 2065: 2060: 2052: 2051: 2021: 2019: 2018: 2013: 1981: 1980: 1972: 1966: 1965: 1957: 1890: 1888: 1887: 1882: 1874: 1873: 1843: 1841: 1840: 1835: 1830: 1829: 1817: 1816: 1804: 1803: 1788: 1787: 1775: 1774: 1754: 1752: 1751: 1746: 1698: 1696: 1695: 1690: 1688: 1687: 1666: 1665: 1645: 1643: 1642: 1637: 1606: 1604: 1603: 1598: 1558: 1556: 1555: 1550: 1542: 1541: 1513: 1511: 1510: 1505: 1503: 1502: 1484: 1483: 1471: 1470: 1445: 1443: 1442: 1437: 1432: 1431: 1422: 1421: 1412: 1411: 1395: 1384: 1363: 1362: 1338: 1337: 1321: 1310: 1295: 1293: 1292: 1280: 1253: 1251: 1250: 1245: 1224: 1222: 1221: 1216: 1214: 1213: 1204: 1203: 1187: 1176: 1161: 1159: 1158: 1146: 1119: 1117: 1116: 1111: 1084: 1082: 1081: 1076: 1074: 1073: 1049: 1048: 1036: 1035: 1016: 1014: 1013: 1008: 966: 964: 963: 958: 953: 952: 940: 939: 897:Laplacian of W. 892: 890: 889: 884: 864:undirected graph 860:adjacency matrix 854: 852: 851: 846: 844: 843: 842: 834: 821: 820: 819: 811: 758:orthogonal basis 745: 743: 742: 737: 722: 720: 719: 714: 712: 704: 686: 684: 683: 678: 662: 660: 659: 654: 646: 638: 626: 624: 623: 618: 603: 601: 600: 595: 593: 592: 583: 582: 572: 567: 549: 548: 540: 521: 519: 518: 513: 511: 510: 509: 491: 490: 478: 477: 454: 452: 451: 446: 434: 432: 431: 426: 414: 412: 411: 406: 394: 392: 391: 386: 371: 369: 368: 363: 323: 321: 320: 315: 303: 301: 300: 295: 265: 263: 262: 257: 245: 243: 242: 237: 225: 223: 222: 217: 215: 179: 177: 176: 171: 169: 73:machine learning 64: 57: 53: 50: 44: 24: 23: 16: 2971: 2970: 2966: 2965: 2964: 2962: 2961: 2960: 2936: 2935: 2927: 2922: 2921: 2912: 2908: 2903: 2899: 2894: 2890: 2885: 2874: 2866: 2862: 2857: 2850: 2845: 2841: 2832: 2828: 2823: 2815:Optimal control 2801: 2791:becomes large. 2773: 2772: 2752: 2751: 2745: 2699: 2686: 2664: 2659: 2658: 2637: 2627: 2614: 2604: 2585: 2560: 2552: 2551: 2527: 2522: 2521: 2500: 2495: 2494: 2455: 2454: 2427: 2422: 2421: 2410:is close to 1. 2392: 2391: 2367: 2362: 2361: 2340: 2329: 2328: 2325: 2279: 2257: 2252: 2251: 2227: 2217: 2204: 2194: 2171: 2170: 2146: 2141: 2140: 2119: 2114: 2113: 2074: 2073: 2043: 2038: 2037: 1932: 1931: 1928: 1865: 1857: 1856: 1821: 1808: 1795: 1779: 1766: 1761: 1760: 1713: 1712: 1673: 1657: 1652: 1651: 1616: 1615: 1565: 1564: 1527: 1522: 1521: 1494: 1475: 1462: 1457: 1456: 1423: 1413: 1397: 1354: 1323: 1284: 1259: 1258: 1227: 1226: 1205: 1189: 1150: 1125: 1124: 1087: 1086: 1059: 1040: 1027: 1022: 1021: 969: 968: 944: 931: 926: 925: 921:are available. 911: 875: 874: 868:diagonal matrix 825: 802: 785: 784: 775: 770: 768:Popular methods 728: 727: 689: 688: 669: 668: 629: 628: 609: 608: 584: 574: 527: 526: 501: 482: 469: 457: 456: 437: 436: 417: 416: 397: 396: 377: 376: 332: 331: 306: 305: 268: 267: 248: 247: 228: 227: 182: 181: 136: 135: 132: 127: 123:Linear function 109: 89:basis functions 81:basis discovery 65: 54: 48: 45: 37:help improve it 34: 25: 21: 12: 11: 5: 2969: 2967: 2959: 2958: 2953: 2948: 2938: 2937: 2934: 2933: 2926: 2925:External links 2923: 2920: 2919: 2906: 2897: 2888: 2872: 2860: 2848: 2839: 2825: 2824: 2822: 2819: 2818: 2817: 2812: 2807: 2800: 2797: 2780: 2759: 2744: 2741: 2740: 2739: 2738: 2737: 2734: 2730: 2729: 2717: 2712: 2709: 2706: 2702: 2698: 2693: 2689: 2685: 2682: 2677: 2674: 2671: 2667: 2644: 2640: 2634: 2630: 2626: 2621: 2617: 2611: 2607: 2603: 2600: 2597: 2592: 2588: 2584: 2581: 2578: 2573: 2570: 2567: 2563: 2559: 2548: 2534: 2530: 2507: 2503: 2480: 2477: 2474: 2471: 2468: 2465: 2462: 2451: 2439: 2434: 2430: 2418: 2399: 2374: 2370: 2347: 2343: 2339: 2336: 2324: 2321: 2320: 2319: 2318: 2317: 2314: 2310: 2309: 2297: 2294: 2291: 2286: 2282: 2278: 2275: 2270: 2267: 2264: 2260: 2248: 2234: 2230: 2224: 2220: 2216: 2211: 2207: 2201: 2197: 2193: 2190: 2187: 2184: 2181: 2178: 2167: 2153: 2149: 2126: 2122: 2099: 2096: 2093: 2090: 2087: 2084: 2081: 2070: 2058: 2055: 2050: 2046: 2034: 2011: 2008: 2005: 2002: 1999: 1996: 1993: 1990: 1987: 1984: 1978: 1975: 1969: 1963: 1960: 1954: 1951: 1948: 1945: 1942: 1939: 1927: 1924: 1923: 1922: 1921: 1920: 1917: 1909: 1908: 1907: 1902: 1901: 1900: 1880: 1877: 1872: 1868: 1864: 1851: 1846: 1845: 1844: 1833: 1828: 1824: 1820: 1815: 1811: 1807: 1802: 1798: 1794: 1791: 1786: 1782: 1778: 1773: 1769: 1744: 1741: 1738: 1735: 1732: 1729: 1726: 1723: 1720: 1707: 1702: 1701: 1700: 1686: 1683: 1680: 1676: 1672: 1669: 1664: 1660: 1635: 1632: 1629: 1626: 1623: 1596: 1593: 1590: 1587: 1584: 1581: 1578: 1575: 1572: 1559: 1548: 1545: 1540: 1537: 1534: 1530: 1519: 1501: 1497: 1493: 1490: 1487: 1482: 1478: 1474: 1469: 1465: 1454: 1447: 1446: 1435: 1430: 1426: 1420: 1416: 1410: 1407: 1404: 1400: 1394: 1391: 1388: 1383: 1380: 1377: 1373: 1369: 1366: 1361: 1357: 1353: 1350: 1347: 1344: 1341: 1336: 1333: 1330: 1326: 1320: 1317: 1314: 1309: 1306: 1303: 1299: 1291: 1287: 1283: 1278: 1275: 1272: 1269: 1266: 1243: 1240: 1237: 1234: 1225:, and we have 1212: 1208: 1202: 1199: 1196: 1192: 1186: 1183: 1180: 1175: 1172: 1169: 1165: 1157: 1153: 1149: 1144: 1141: 1138: 1135: 1132: 1109: 1106: 1103: 1100: 1097: 1094: 1072: 1069: 1066: 1062: 1058: 1055: 1052: 1047: 1043: 1039: 1034: 1030: 1006: 1003: 1000: 997: 994: 991: 988: 985: 982: 979: 976: 956: 951: 947: 943: 938: 934: 910: 907: 882: 866:(N,E). D is a 856: 855: 840: 837: 832: 828: 824: 817: 814: 809: 805: 801: 798: 795: 792: 774: 771: 769: 766: 765: 764: 761: 754: 735: 711: 707: 703: 699: 696: 676: 665:feature vector 652: 649: 645: 641: 637: 616: 605: 604: 591: 587: 581: 577: 571: 566: 563: 560: 556: 552: 546: 543: 537: 534: 508: 504: 500: 497: 494: 489: 485: 481: 476: 472: 467: 464: 444: 424: 404: 384: 373: 372: 360: 357: 354: 351: 348: 345: 342: 339: 313: 293: 290: 287: 284: 281: 278: 275: 255: 235: 214: 211: 208: 205: 202: 199: 196: 192: 189: 168: 165: 162: 159: 156: 153: 150: 147: 144: 131: 128: 121: 108: 105: 97:value function 67: 66: 28: 26: 19: 13: 10: 9: 6: 4: 3: 2: 2968: 2957: 2954: 2952: 2949: 2947: 2944: 2943: 2941: 2932:UMASS ALL lab 2931: 2929: 2928: 2924: 2916: 2915:Markov chains 2910: 2907: 2901: 2898: 2892: 2889: 2883: 2881: 2879: 2877: 2873: 2869: 2864: 2861: 2855: 2853: 2849: 2843: 2840: 2836: 2835:Precup, Doina 2830: 2827: 2820: 2816: 2813: 2811: 2808: 2806: 2803: 2802: 2798: 2796: 2792: 2778: 2757: 2748: 2742: 2735: 2732: 2731: 2710: 2707: 2704: 2700: 2696: 2691: 2680: 2675: 2672: 2669: 2642: 2638: 2632: 2624: 2619: 2615: 2609: 2601: 2598: 2595: 2590: 2586: 2582: 2579: 2576: 2571: 2568: 2565: 2561: 2557: 2549: 2532: 2505: 2501: 2492: 2491: 2475: 2472: 2469: 2463: 2460: 2452: 2437: 2432: 2428: 2419: 2416: 2413: 2412: 2411: 2397: 2388: 2372: 2368: 2345: 2341: 2337: 2334: 2322: 2315: 2312: 2311: 2292: 2289: 2284: 2273: 2268: 2265: 2262: 2249: 2232: 2228: 2222: 2214: 2209: 2205: 2199: 2191: 2188: 2185: 2182: 2179: 2176: 2168: 2151: 2124: 2120: 2111: 2110: 2094: 2091: 2088: 2082: 2079: 2071: 2056: 2053: 2048: 2044: 2035: 2032: 2029: 2028: 2027: 2023: 2009: 2003: 2000: 1994: 1991: 1988: 1985: 1982: 1973: 1967: 1958: 1952: 1949: 1946: 1943: 1940: 1937: 1925: 1918: 1915: 1914: 1913: 1910: 1906: 1903: 1898: 1895: 1894: 1893: 1878: 1875: 1870: 1866: 1855: 1852: 1850: 1847: 1831: 1826: 1822: 1818: 1813: 1809: 1805: 1800: 1796: 1792: 1789: 1784: 1780: 1776: 1771: 1767: 1759: 1758: 1757: 1739: 1736: 1733: 1727: 1724: 1721: 1718: 1711: 1708: 1706: 1703: 1684: 1681: 1678: 1674: 1670: 1667: 1662: 1658: 1650: 1649: 1648: 1633: 1630: 1627: 1624: 1621: 1614: 1611: 1610: 1609: 1591: 1588: 1585: 1579: 1576: 1573: 1570: 1563: 1560: 1546: 1543: 1538: 1535: 1532: 1528: 1520: 1517: 1514:are top real 1499: 1495: 1491: 1488: 1485: 1480: 1476: 1472: 1467: 1463: 1455: 1452: 1449: 1448: 1433: 1428: 1424: 1418: 1414: 1408: 1405: 1402: 1398: 1392: 1389: 1386: 1381: 1378: 1375: 1371: 1367: 1364: 1359: 1351: 1348: 1345: 1342: 1334: 1331: 1328: 1324: 1318: 1315: 1312: 1307: 1304: 1301: 1297: 1289: 1285: 1281: 1276: 1273: 1270: 1267: 1264: 1257: 1256: 1255: 1241: 1238: 1235: 1232: 1210: 1206: 1200: 1197: 1194: 1190: 1184: 1181: 1178: 1173: 1170: 1167: 1163: 1155: 1151: 1147: 1142: 1136: 1130: 1121: 1104: 1101: 1098: 1095: 1070: 1067: 1064: 1060: 1056: 1053: 1050: 1045: 1041: 1037: 1032: 1028: 1018: 1001: 998: 995: 992: 989: 986: 983: 977: 974: 954: 949: 945: 941: 936: 932: 922: 920: 916: 908: 906: 904: 898: 896: 880: 871: 869: 865: 861: 858:Here W is an 838: 835: 830: 826: 822: 815: 812: 807: 803: 799: 796: 793: 790: 783: 782: 781: 778: 772: 767: 762: 759: 755: 752: 751: 750: 747: 724: 705: 697: 694: 674: 666: 650: 647: 639: 589: 585: 579: 575: 569: 564: 561: 558: 554: 550: 541: 535: 532: 525: 524: 523: 506: 502: 498: 495: 492: 487: 483: 479: 474: 470: 465: 442: 422: 402: 382: 358: 355: 352: 349: 346: 343: 340: 337: 330: 329: 328: 325: 311: 288: 285: 282: 276: 273: 253: 233: 212: 209: 206: 203: 200: 197: 194: 190: 187: 166: 163: 160: 157: 154: 151: 148: 145: 142: 129: 124: 120: 118: 114: 106: 104: 102: 98: 94: 91:that map the 90: 86: 82: 78: 74: 63: 60: 52: 42: 38: 32: 29:This article 27: 18: 17: 2909: 2900: 2891: 2863: 2842: 2829: 2793: 2749: 2746: 2414: 2389: 2326: 2030: 2024: 1929: 1911: 1904: 1896: 1891: 1853: 1848: 1755: 1709: 1704: 1646: 1612: 1607: 1561: 1516:eigenvectors 1450: 1122: 1019: 923: 918: 914: 912: 909:Krylov basis 899: 872: 857: 779: 776: 748: 725: 606: 374: 326: 133: 110: 80: 76: 70: 55: 46: 30: 917:and reward 895:random walk 93:state space 2940:Categories 2821:References 905:are used. 395:is small, 107:Motivation 49:March 2019 2779:γ 2758:γ 2701:ϕ 2688:Φ 2666:Φ 2639:θ 2629:Φ 2625:− 2616:θ 2606:Φ 2591:∗ 2583:− 2562:ϕ 2529:Φ 2502:θ 2464:∈ 2433:∗ 2415:Algorithm 2398:γ 2373:∗ 2346:∗ 2338:− 2293:ε 2281:Φ 2259:Φ 2229:θ 2219:Φ 2215:− 2206:θ 2196:Φ 2189:γ 2177:ε 2148:Φ 2121:θ 2083:∈ 2045:ϕ 2031:Algorithm 2010:θ 2007:Φ 2004:− 2001:θ 1998:Φ 1992:γ 1977:^ 1968:− 1962:^ 1950:γ 1938:ε 1863:∥ 1790:− 1737:− 1682:− 1489:… 1451:Algorithm 1415:β 1399:α 1390:− 1372:∑ 1349:γ 1346:− 1325:α 1316:− 1298:∑ 1286:α 1191:α 1182:− 1164:∑ 1152:α 1102:γ 1099:− 1068:− 1054:… 978:∈ 831:− 808:− 800:− 734:Φ 698:≪ 675:θ 648:× 615:Φ 586:ϕ 576:θ 555:∑ 545:^ 536:≈ 503:ϕ 496:… 484:ϕ 471:ϕ 463:Φ 350:γ 277:∈ 274:γ 207:… 161:γ 83:) is the 2799:See also 967:for all 2360:. Here 1912:end for 1849:end for 35:Please 2453:stage 2072:stage 1905:end if 1705:end if 101:domain 2417:BARBs 1897:break 756:Form 627:is a 607:Here 2033:BEBF 1892:then 1819:> 1793:< 1647:then 1625:> 1518:of P 79:(or 1710:for 1562:for 111:In 71:In 39:to 2942:: 2875:^ 2851:^ 2022:. 1854:if 1777::= 1756:do 1722::= 1668::= 1613:if 1608:do 1574::= 1544::= 1120:. 1017:. 723:. 324:. 75:, 2728:; 2716:] 2711:1 2708:+ 2705:i 2697:: 2692:i 2684:[ 2681:= 2676:1 2673:+ 2670:i 2643:i 2633:i 2620:i 2610:i 2602:P 2599:+ 2596:r 2587:P 2580:r 2577:= 2572:1 2569:+ 2566:i 2558:: 2547:; 2533:i 2506:i 2479:] 2476:N 2473:, 2470:2 2467:[ 2461:i 2450:; 2438:r 2429:P 2369:P 2342:P 2335:P 2308:; 2296:] 2290:: 2285:i 2277:[ 2274:= 2269:1 2266:+ 2263:i 2247:; 2233:i 2223:i 2210:i 2200:i 2192:P 2186:+ 2183:r 2180:= 2166:; 2152:i 2125:i 2098:] 2095:N 2092:, 2089:2 2086:[ 2080:i 2069:; 2057:r 2054:= 2049:1 1995:P 1989:+ 1986:r 1983:= 1974:v 1959:v 1953:P 1947:+ 1944:r 1941:= 1899:; 1879:0 1871:i 1867:z 1832:; 1827:j 1823:z 1814:i 1810:z 1806:, 1801:j 1797:z 1785:i 1781:z 1772:i 1768:z 1743:) 1740:1 1734:i 1731:( 1728:: 1725:1 1719:j 1699:; 1685:1 1679:i 1675:z 1671:P 1663:i 1659:z 1634:1 1631:+ 1628:k 1622:i 1595:) 1592:k 1589:+ 1586:l 1583:( 1580:: 1577:1 1571:i 1547:r 1539:1 1536:+ 1533:k 1529:z 1500:k 1496:z 1492:, 1486:, 1481:2 1477:z 1473:, 1468:1 1464:z 1434:. 1429:i 1425:y 1419:i 1409:1 1406:+ 1403:i 1393:1 1387:m 1382:0 1379:= 1376:i 1368:= 1365:r 1360:i 1356:) 1352:P 1343:I 1340:( 1335:1 1332:+ 1329:i 1319:1 1313:m 1308:0 1305:= 1302:i 1290:0 1282:1 1277:= 1274:r 1271:B 1268:= 1265:v 1242:I 1239:= 1236:A 1233:B 1211:i 1207:A 1201:1 1198:+ 1195:i 1185:1 1179:m 1174:0 1171:= 1168:i 1156:0 1148:1 1143:= 1140:) 1137:A 1134:( 1131:p 1108:) 1105:P 1096:I 1093:( 1071:1 1065:m 1061:y 1057:, 1051:, 1046:1 1042:y 1038:, 1033:0 1029:y 1005:) 1002:y 999:t 996:f 993:n 990:i 987:, 984:0 981:[ 975:i 955:r 950:i 946:P 942:= 937:i 933:y 919:r 915:P 881:W 839:2 836:1 827:D 823:W 816:2 813:1 804:D 797:I 794:= 791:L 710:| 706:s 702:| 695:n 651:n 644:| 640:S 636:| 590:n 580:n 570:n 565:1 562:= 559:i 551:= 542:v 533:v 507:n 499:, 493:, 488:2 480:, 475:1 466:= 443:v 423:S 403:v 383:S 359:. 356:v 353:P 347:+ 344:r 341:= 338:v 312:P 292:) 289:1 286:, 283:0 280:[ 254:r 234:A 213:s 210:, 204:, 201:2 198:, 195:1 191:= 188:S 167:r 164:, 158:, 155:p 152:, 149:a 146:, 143:s 62:) 56:( 51:) 47:( 33:.

Index

help improve it
make it understandable to non-experts
Learn how and when to remove this message
machine learning
mathematical method
basis functions
state space
value function
domain
reinforcement learning
Markov Decision Process
Linear function
feature vector
orthogonal basis
adjacency matrix
undirected graph
diagonal matrix
random walk
diffusion wavelets
eigenvectors
Dynamic programming
Bellman equation
Optimal control
Precup, Doina


Ilse C. F. Ipsen


Text is available under the Creative Commons Attribution-ShareAlike License. Additional terms may apply.