22:
2770:
approaches to 1, Krylov and BEBFs converge slowly. This is because the error Krylov based methods are restricted by
Chebyshev polynomial bound. To solve this problem, methods such as BARBs are proposed. BARBs is an incremental variant of Drazin bases, and converges faster than Krylov and BEBFs when
125:
approximators (LFAs) are widely adopted for their low theoretical complexity. Two sub-problems needs to be solved for better approximation: weight optimization and basis construction. To solve the second problem, one way is to design special basis functions. Those basis functions work well in
1444:
2025:
Loosely speaking, Bellman error points towards the optimal value function. The sequence of BEBF form a basis space which is orthogonal to the real value function space; thus with sufficient number of BEBFs, any value function can be represented exactly.
2020:
2655:
2245:
1223:
900:
This spectral framework can be used for value function approximation (VFA). Given the fixed policy, the edge weights are determined by corresponding states' transition probability. To get smooth value approximation,
602:
1260:
2794:
Another type is reward-insensitive proto value basis function derived from graph
Lapalacian. This method uses graph information, but the construction of adjacency matrix makes this method hard to analyze.
520:
853:
1842:
2886:
M. Petrik. An analysis of
Laplacian methods for value function approximation in MDPs. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), pages 2574–2579, 2007
2837:. (2006) Automatic Basis Function Construction for Approximate Dynamic Programming and Reinforcement Learning. Proceedings of the 23rd International Conference on Machine Learning, Pittsburgh, PA.
1889:
2858:
Mahadevan, Sridhar; Maggioni, Mauro. (2005) Value function approximation with diffusion wavelets and
Laplacian eigenfuctions. Proceedings of Advances in Neural Information Processing Systems.
1083:
1512:
224:
178:
370:
302:
2750:
The first type of methods are reward-sensitive, like Krylov and BEBFs; they dilate the reward function geometrically through transition matrix. However, when discount factor
1697:
1015:
40:
2726:
2306:
1118:
965:
893:
could be constructed by simply checking whether two states are connected, and D could be calculated by summing up every row of W. In continuous state space, we could take
2518:
2137:
2067:
661:
1753:
1605:
1557:
2545:
2164:
1644:
721:
2789:
2768:
2408:
2358:
1933:
685:
2448:
2489:
2385:
2108:
744:
625:
1252:
891:
453:
433:
413:
393:
322:
264:
244:
126:
specific tasks but are significantly restricted to domains. Thus constructing basis construction functions automatically is preferred for broader applications.
2327:
Bellman
Average Reward Bases (or BARBs) is similar to Krylov Bases, but the reward function is being dilated by the average adjusted transition matrix
2553:
2172:
1126:
913:
Krylov basis construction uses the actual transition matrix instead of random walk
Laplacian. The assumption of this method is that transition model
528:
1439:{\displaystyle v=Br={\frac {1}{\alpha _{0}}}\sum _{i=0}^{m-1}\alpha _{i+1}(I-\gamma P)^{i}r=\sum _{i=0}^{m-1}\alpha _{i+1}\beta _{i}y_{i}.}
2895:
R. Parr, C. Painter-Wakefield, L.-H. Li, and M. Littman. Analyzing feature generation for value-function approximation. In ICML’07, 2007.
119:(MDP) problems have large or continuous state spaces, which typically require some sort of approximation to be represented efficiently.
84:
58:
458:
786:
100:
1762:
2846:
Richard S. Sutton and Andrew G. Barto. Reinforcement
Learning: An Introduction.(1998) MIT Press, Cambridge, MA, chapter 8
2950:
1858:
1515:
863:
2955:
2945:
664:
777:
In this approach, Mahadevan analyzes the connectivity graph between states to determine a set of basis functions.
103:, which allows it to perform well where expert-constructed basis functions are difficult or impossible to create.
1023:
1458:
2904:
S. Mahadevan and B. Liu. Basis construction from power series expansions of value functions. In NIPS’10, 2010
116:
112:
2870:
and Carl D. Meyer. The idea behind Krylov methods. American
Mathematical Monthly, 105(10):889–899, 1998.
183:
137:
333:
269:
1653:
970:
2913:
William J. Stewart. Numerical methods for computing stationary distributions of finite irreducible
2804:
2660:
2253:
1088:
927:
2496:
2115:
2039:
902:
630:
2015:{\displaystyle \varepsilon =r+\gamma P{\hat {v}}-{\hat {v}}=r+\gamma P\Phi \theta -\Phi \theta }
1714:
1566:
1523:
2523:
2142:
1617:
690:
2774:
2753:
2393:
2330:
670:
134:
A Markov decision process with finite state space and fixed policy is defined with a 5-tuple
2809:
2423:
859:
757:
96:
72:
2456:
2363:
2075:
729:
610:
2814:
1228:
867:
122:
1085:
is enough to represent any value function, and m is the degree of minimal polynomial of
876:
438:
418:
398:
378:
307:
249:
229:
88:
2939:
2914:
2834:
726:
Basis construction looks for ways to automatically construct better basis function
99:
accurately. Automatic basis construction is independent of prior knowledge of the
2650:{\displaystyle :\phi _{i+1}=r-P^{*}r+P\Phi _{i}\theta _{i}-\Phi _{i}\theta _{i}}
2240:{\displaystyle \varepsilon =r+\gamma P\Phi _{i}\theta _{i}-\Phi _{i}\theta _{i}}
1218:{\displaystyle p(A)={\frac {1}{\alpha _{0}}}\sum _{i=0}^{m-1}\alpha _{i+1}A^{i}}
894:
92:
2867:
2917:. In Advances in Computational Probability. Kluwer Academic Publishers, 1997.
455:
is commonly being approximated via a linear combination of basis function
597:{\displaystyle v\approx {\hat {v}}=\sum _{i=1}^{n}\theta _{n}\phi _{n}}
749:
A good construction method should have the following characteristics:
2930:
862:
which represents the states of fixed policy MDP which forms an
753:
Small error bounds between the estimate and real value function
95:
to a lower-dimensional embedding, while still representing the
15:
2747:
There are two principal types of basis construction methods.
515:{\displaystyle \Phi ={\phi _{1},\phi _{2},\ldots ,\phi _{n}}}
848:{\displaystyle L=I-D^{-{\frac {1}{2}}}WD^{-{\frac {1}{2}}}}
2733:
N represents the number of iterations till convergence.
2313:
N represents the number of iterations till convergence.
36:
1837:{\displaystyle z_{i}:=z_{i}-<z_{j},z_{i}>z_{j};}
2777:
2756:
2663:
2556:
2526:
2499:
2459:
2426:
2396:
2366:
2333:
2256:
2175:
2145:
2118:
2078:
2042:
1936:
1861:
1765:
1717:
1656:
1620:
1569:
1526:
1461:
1263:
1231:
1129:
1091:
1026:
973:
930:
879:
789:
732:
693:
673:
633:
613:
531:
461:
441:
421:
401:
381:
336:
310:
272:
252:
232:
186:
140:
31:
may be too technical for most readers to understand
2783:
2762:
2720:
2649:
2539:
2512:
2483:
2442:
2402:
2390:BARBs converges faster than BEBFs and Krylov when
2379:
2352:
2300:
2239:
2158:
2131:
2102:
2061:
2014:
1884:{\displaystyle \parallel z_{i}\parallel \approx 0}
1883:
1836:
1747:
1691:
1638:
1599:
1551:
1506:
1438:
1246:
1217:
1112:
1077:
1009:
959:
885:
847:
738:
715:
679:
655:
619:
596:
514:
447:
427:
407:
387:
364:
316:
296:
258:
238:
218:
172:
687:is a weight vector with n parameters and usually
435:grows too large for this kind of representation.
2250:add bellman error to form new basis function:
924:The vectors in Neumann series are denoted as
873:In discrete state space, the adjacency matrix
780:The normalized graph Laplacian is defined as:
2854:
2852:
746:which can represent the value function well.
415:is usually maintained as tabular form. While
8:
1078:{\displaystyle y_{0},y_{1},\ldots ,y_{m-1}}
2736:":" means juxtaposing matrices or vectors.
2316:":" means juxtaposing matrices or vectors.
763:Converge to stationary value function fast
2776:
2755:
2703:
2690:
2668:
2662:
2641:
2631:
2618:
2608:
2589:
2564:
2555:
2531:
2525:
2504:
2498:
2458:
2431:
2425:
2395:
2371:
2365:
2344:
2332:
2283:
2261:
2255:
2231:
2221:
2208:
2198:
2174:
2150:
2144:
2123:
2117:
2077:
2047:
2041:
1971:
1970:
1956:
1955:
1935:
1869:
1860:
1825:
1812:
1799:
1783:
1770:
1764:
1716:
1677:
1661:
1655:
1619:
1568:
1531:
1525:
1507:{\displaystyle z_{1},z_{2},\ldots ,z_{k}}
1498:
1479:
1466:
1460:
1427:
1417:
1401:
1385:
1374:
1358:
1327:
1311:
1300:
1288:
1279:
1262:
1230:
1209:
1193:
1177:
1166:
1154:
1145:
1128:
1090:
1063:
1044:
1031:
1025:
972:
948:
935:
929:
878:
833:
829:
810:
806:
788:
731:
708:
700:
692:
672:
642:
634:
632:
612:
588:
578:
568:
557:
539:
538:
530:
505:
486:
473:
468:
460:
440:
420:
400:
380:
361:
335:
309:
271:
251:
231:
193:
185:
141:
139:
87:of looking for a set of task-independent
59:Learn how and when to remove this message
43:, without removing the technical details.
2882:
2880:
2878:
2876:
1930:Bellman error (or BEBFs) is defined as:
1254:, the value function can be written as:
180:, which includes the finite state space
2826:
2387:can be calculated by many methods in.
1020:It shows that Krylov space spanned by
2657:, and add it to form new bases matrix
663:matrix in which every row contains a
77:automatic basis function construction
41:make it understandable to non-experts
7:
2520:according to current basis function
2139:according to current basis function
2687:
2665:
2628:
2605:
2528:
2280:
2258:
2218:
2195:
2147:
2006:
1997:
1916:k: number of eigenvectors in basis
1123:Suppose the minimal polynomial is
733:
614:
462:
14:
219:{\displaystyle S={1,2,\ldots ,s}}
173:{\displaystyle {s,a,p,\gamma ,r}}
365:{\displaystyle v=r+\gamma Pv.\,}
327:Bellman equation is defined as:
297:{\displaystyle \gamma \in [0,1)}
20:
2833:Keller, Philipp; Mannor, Shie;
1692:{\displaystyle z_{i}:=Pz_{i-1}}
375:When the number of elements in
2715:
2683:
2478:
2466:
2295:
2276:
2097:
2085:
1976:
1961:
1862:
1742:
1730:
1594:
1582:
1355:
1339:
1139:
1133:
1107:
1092:
1010:{\displaystyle i\in [0,infty)}
1004:
980:
709:
701:
643:
635:
544:
291:
279:
1:
2169:compute new bellman error by
2721:{\displaystyle \Phi _{i+1}=}
2323:Bellman average reward bases
2301:{\displaystyle \Phi _{i+1}=}
1113:{\displaystyle (I-\gamma P)}
960:{\displaystyle y_{i}=P^{i}r}
2513:{\displaystyle \theta _{i}}
2132:{\displaystyle \theta _{i}}
2062:{\displaystyle \phi _{1}=r}
870:related to nodes' degrees.
760:in the value function space
656:{\displaystyle |S|\times n}
304:, and the transition model
2972:
2493:compute the weight vector
2112:compute the weight vector
1919:l: total number of vectors
1748:{\displaystyle j:=1:(i-1)}
1600:{\displaystyle i:=1:(l+k)}
1552:{\displaystyle z_{k+1}:=r}
226:, the finite action space
2540:{\displaystyle \Phi _{i}}
2159:{\displaystyle \Phi _{i}}
1639:{\displaystyle i>k+1}
716:{\displaystyle n\ll |s|}
2784:{\displaystyle \gamma }
2763:{\displaystyle \gamma }
2743:Discussion and analysis
2403:{\displaystyle \gamma }
2353:{\displaystyle P-P^{*}}
1453:Augmented Krylov Method
680:{\displaystyle \theta }
667:for corresponding row,
117:Markov Decision Process
2785:
2764:
2722:
2651:
2541:
2514:
2485:
2444:
2443:{\displaystyle P^{*}r}
2404:
2381:
2354:
2302:
2241:
2160:
2133:
2104:
2063:
2016:
1885:
1838:
1749:
1693:
1640:
1601:
1553:
1508:
1440:
1396:
1322:
1248:
1219:
1188:
1114:
1079:
1011:
961:
887:
849:
740:
717:
681:
657:
621:
598:
573:
516:
449:
429:
409:
389:
366:
318:
298:
260:
246:, the reward function
240:
220:
174:
115:(RL), most real-world
113:reinforcement learning
2786:
2765:
2723:
2652:
2542:
2515:
2486:
2484:{\displaystyle i\in }
2445:
2405:
2382:
2380:{\displaystyle P^{*}}
2355:
2303:
2242:
2161:
2134:
2105:
2103:{\displaystyle i\in }
2064:
2017:
1886:
1839:
1750:
1694:
1641:
1602:
1554:
1509:
1441:
1370:
1296:
1249:
1220:
1162:
1115:
1080:
1012:
962:
888:
850:
741:
739:{\displaystyle \Phi }
718:
682:
658:
622:
620:{\displaystyle \Phi }
599:
553:
517:
450:
430:
410:
390:
367:
319:
299:
261:
241:
221:
175:
2775:
2754:
2661:
2554:
2524:
2497:
2457:
2424:
2394:
2364:
2331:
2254:
2173:
2143:
2116:
2076:
2040:
1934:
1859:
1763:
1715:
1654:
1618:
1567:
1524:
1459:
1261:
1247:{\displaystyle BA=I}
1229:
1127:
1089:
1024:
971:
928:
877:
787:
730:
691:
671:
631:
611:
529:
459:
439:
419:
399:
379:
334:
308:
270:
250:
230:
184:
138:
2951:Dynamic programming
2805:Dynamic programming
2550:compute new basis:
1926:Bellman error basis
522:, so that we have:
85:mathematical method
2956:Stochastic control
2781:
2760:
2718:
2647:
2537:
2510:
2481:
2440:
2400:
2377:
2350:
2298:
2237:
2156:
2129:
2100:
2059:
2012:
1881:
1876:∥ ≈
1834:
1745:
1689:
1636:
1597:
1549:
1504:
1436:
1244:
1215:
1110:
1075:
1007:
957:
903:diffusion wavelets
883:
845:
736:
713:
677:
653:
617:
594:
512:
445:
425:
405:
385:
362:
314:
294:
266:, discount factor
256:
236:
216:
170:
130:Problem definition
2946:Optimal decisions
2420:stage stage i=1,
2036:stage stage i=1,
1979:
1964:
1294:
1160:
886:{\displaystyle W}
841:
818:
773:Proto-value basis
547:
448:{\displaystyle v}
428:{\displaystyle S}
408:{\displaystyle v}
388:{\displaystyle S}
317:{\displaystyle P}
259:{\displaystyle r}
239:{\displaystyle A}
69:
68:
61:
2963:
2918:
2911:
2905:
2902:
2896:
2893:
2887:
2884:
2871:
2868:Ilse C. F. Ipsen
2865:
2859:
2856:
2847:
2844:
2838:
2831:
2810:Bellman equation
2790:
2788:
2787:
2782:
2769:
2767:
2766:
2761:
2727:
2725:
2724:
2719:
2714:
2713:
2695:
2694:
2679:
2678:
2656:
2654:
2653:
2648:
2646:
2645:
2636:
2635:
2623:
2622:
2613:
2612:
2594:
2593:
2575:
2574:
2546:
2544:
2543:
2538:
2536:
2535:
2519:
2517:
2516:
2511:
2509:
2508:
2490:
2488:
2487:
2482:
2449:
2447:
2446:
2441:
2436:
2435:
2409:
2407:
2406:
2401:
2386:
2384:
2383:
2378:
2376:
2375:
2359:
2357:
2356:
2351:
2349:
2348:
2307:
2305:
2304:
2299:
2288:
2287:
2272:
2271:
2246:
2244:
2243:
2238:
2236:
2235:
2226:
2225:
2213:
2212:
2203:
2202:
2165:
2163:
2162:
2157:
2155:
2154:
2138:
2136:
2135:
2130:
2128:
2127:
2109:
2107:
2106:
2101:
2068:
2066:
2065:
2060:
2052:
2051:
2021:
2019:
2018:
2013:
1981:
1980:
1972:
1966:
1965:
1957:
1890:
1888:
1887:
1882:
1874:
1873:
1843:
1841:
1840:
1835:
1830:
1829:
1817:
1816:
1804:
1803:
1788:
1787:
1775:
1774:
1754:
1752:
1751:
1746:
1698:
1696:
1695:
1690:
1688:
1687:
1666:
1665:
1645:
1643:
1642:
1637:
1606:
1604:
1603:
1598:
1558:
1556:
1555:
1550:
1542:
1541:
1513:
1511:
1510:
1505:
1503:
1502:
1484:
1483:
1471:
1470:
1445:
1443:
1442:
1437:
1432:
1431:
1422:
1421:
1412:
1411:
1395:
1384:
1363:
1362:
1338:
1337:
1321:
1310:
1295:
1293:
1292:
1280:
1253:
1251:
1250:
1245:
1224:
1222:
1221:
1216:
1214:
1213:
1204:
1203:
1187:
1176:
1161:
1159:
1158:
1146:
1119:
1117:
1116:
1111:
1084:
1082:
1081:
1076:
1074:
1073:
1049:
1048:
1036:
1035:
1016:
1014:
1013:
1008:
966:
964:
963:
958:
953:
952:
940:
939:
897:Laplacian of W.
892:
890:
889:
884:
864:undirected graph
860:adjacency matrix
854:
852:
851:
846:
844:
843:
842:
834:
821:
820:
819:
811:
758:orthogonal basis
745:
743:
742:
737:
722:
720:
719:
714:
712:
704:
686:
684:
683:
678:
662:
660:
659:
654:
646:
638:
626:
624:
623:
618:
603:
601:
600:
595:
593:
592:
583:
582:
572:
567:
549:
548:
540:
521:
519:
518:
513:
511:
510:
509:
491:
490:
478:
477:
454:
452:
451:
446:
434:
432:
431:
426:
414:
412:
411:
406:
394:
392:
391:
386:
371:
369:
368:
363:
323:
321:
320:
315:
303:
301:
300:
295:
265:
263:
262:
257:
245:
243:
242:
237:
225:
223:
222:
217:
215:
179:
177:
176:
171:
169:
73:machine learning
64:
57:
53:
50:
44:
24:
23:
16:
2971:
2970:
2966:
2965:
2964:
2962:
2961:
2960:
2936:
2935:
2927:
2922:
2921:
2912:
2908:
2903:
2899:
2894:
2890:
2885:
2874:
2866:
2862:
2857:
2850:
2845:
2841:
2832:
2828:
2823:
2815:Optimal control
2801:
2791:becomes large.
2773:
2772:
2752:
2751:
2745:
2699:
2686:
2664:
2659:
2658:
2637:
2627:
2614:
2604:
2585:
2560:
2552:
2551:
2527:
2522:
2521:
2500:
2495:
2494:
2455:
2454:
2427:
2422:
2421:
2410:is close to 1.
2392:
2391:
2367:
2362:
2361:
2340:
2329:
2328:
2325:
2279:
2257:
2252:
2251:
2227:
2217:
2204:
2194:
2171:
2170:
2146:
2141:
2140:
2119:
2114:
2113:
2074:
2073:
2043:
2038:
2037:
1932:
1931:
1928:
1865:
1857:
1856:
1821:
1808:
1795:
1779:
1766:
1761:
1760:
1713:
1712:
1673:
1657:
1652:
1651:
1616:
1615:
1565:
1564:
1527:
1522:
1521:
1494:
1475:
1462:
1457:
1456:
1423:
1413:
1397:
1354:
1323:
1284:
1259:
1258:
1227:
1226:
1205:
1189:
1150:
1125:
1124:
1087:
1086:
1059:
1040:
1027:
1022:
1021:
969:
968:
944:
931:
926:
925:
921:are available.
911:
875:
874:
868:diagonal matrix
825:
802:
785:
784:
775:
770:
768:Popular methods
728:
727:
689:
688:
669:
668:
629:
628:
609:
608:
584:
574:
527:
526:
501:
482:
469:
457:
456:
437:
436:
417:
416:
397:
396:
377:
376:
332:
331:
306:
305:
268:
267:
248:
247:
228:
227:
182:
181:
136:
135:
132:
127:
123:Linear function
109:
89:basis functions
81:basis discovery
65:
54:
48:
45:
37:help improve it
34:
25:
21:
12:
11:
5:
2969:
2967:
2959:
2958:
2953:
2948:
2938:
2937:
2934:
2933:
2926:
2925:External links
2923:
2920:
2919:
2906:
2897:
2888:
2872:
2860:
2848:
2839:
2825:
2824:
2822:
2819:
2818:
2817:
2812:
2807:
2800:
2797:
2780:
2759:
2744:
2741:
2740:
2739:
2738:
2737:
2734:
2730:
2729:
2717:
2712:
2709:
2706:
2702:
2698:
2693:
2689:
2685:
2682:
2677:
2674:
2671:
2667:
2644:
2640:
2634:
2630:
2626:
2621:
2617:
2611:
2607:
2603:
2600:
2597:
2592:
2588:
2584:
2581:
2578:
2573:
2570:
2567:
2563:
2559:
2548:
2534:
2530:
2507:
2503:
2480:
2477:
2474:
2471:
2468:
2465:
2462:
2451:
2439:
2434:
2430:
2418:
2399:
2374:
2370:
2347:
2343:
2339:
2336:
2324:
2321:
2320:
2319:
2318:
2317:
2314:
2310:
2309:
2297:
2294:
2291:
2286:
2282:
2278:
2275:
2270:
2267:
2264:
2260:
2248:
2234:
2230:
2224:
2220:
2216:
2211:
2207:
2201:
2197:
2193:
2190:
2187:
2184:
2181:
2178:
2167:
2153:
2149:
2126:
2122:
2099:
2096:
2093:
2090:
2087:
2084:
2081:
2070:
2058:
2055:
2050:
2046:
2034:
2011:
2008:
2005:
2002:
1999:
1996:
1993:
1990:
1987:
1984:
1978:
1975:
1969:
1963:
1960:
1954:
1951:
1948:
1945:
1942:
1939:
1927:
1924:
1923:
1922:
1921:
1920:
1917:
1909:
1908:
1907:
1902:
1901:
1900:
1880:
1877:
1872:
1868:
1864:
1851:
1846:
1845:
1844:
1833:
1828:
1824:
1820:
1815:
1811:
1807:
1802:
1798:
1794:
1791:
1786:
1782:
1778:
1773:
1769:
1744:
1741:
1738:
1735:
1732:
1729:
1726:
1723:
1720:
1707:
1702:
1701:
1700:
1686:
1683:
1680:
1676:
1672:
1669:
1664:
1660:
1635:
1632:
1629:
1626:
1623:
1596:
1593:
1590:
1587:
1584:
1581:
1578:
1575:
1572:
1559:
1548:
1545:
1540:
1537:
1534:
1530:
1519:
1501:
1497:
1493:
1490:
1487:
1482:
1478:
1474:
1469:
1465:
1454:
1447:
1446:
1435:
1430:
1426:
1420:
1416:
1410:
1407:
1404:
1400:
1394:
1391:
1388:
1383:
1380:
1377:
1373:
1369:
1366:
1361:
1357:
1353:
1350:
1347:
1344:
1341:
1336:
1333:
1330:
1326:
1320:
1317:
1314:
1309:
1306:
1303:
1299:
1291:
1287:
1283:
1278:
1275:
1272:
1269:
1266:
1243:
1240:
1237:
1234:
1225:, and we have
1212:
1208:
1202:
1199:
1196:
1192:
1186:
1183:
1180:
1175:
1172:
1169:
1165:
1157:
1153:
1149:
1144:
1141:
1138:
1135:
1132:
1109:
1106:
1103:
1100:
1097:
1094:
1072:
1069:
1066:
1062:
1058:
1055:
1052:
1047:
1043:
1039:
1034:
1030:
1006:
1003:
1000:
997:
994:
991:
988:
985:
982:
979:
976:
956:
951:
947:
943:
938:
934:
910:
907:
882:
866:(N,E). D is a
856:
855:
840:
837:
832:
828:
824:
817:
814:
809:
805:
801:
798:
795:
792:
774:
771:
769:
766:
765:
764:
761:
754:
735:
711:
707:
703:
699:
696:
676:
665:feature vector
652:
649:
645:
641:
637:
616:
605:
604:
591:
587:
581:
577:
571:
566:
563:
560:
556:
552:
546:
543:
537:
534:
508:
504:
500:
497:
494:
489:
485:
481:
476:
472:
467:
464:
444:
424:
404:
384:
373:
372:
360:
357:
354:
351:
348:
345:
342:
339:
313:
293:
290:
287:
284:
281:
278:
275:
255:
235:
214:
211:
208:
205:
202:
199:
196:
192:
189:
168:
165:
162:
159:
156:
153:
150:
147:
144:
131:
128:
121:
108:
105:
97:value function
67:
66:
28:
26:
19:
13:
10:
9:
6:
4:
3:
2:
2968:
2957:
2954:
2952:
2949:
2947:
2944:
2943:
2941:
2932:UMASS ALL lab
2931:
2929:
2928:
2924:
2916:
2915:Markov chains
2910:
2907:
2901:
2898:
2892:
2889:
2883:
2881:
2879:
2877:
2873:
2869:
2864:
2861:
2855:
2853:
2849:
2843:
2840:
2836:
2835:Precup, Doina
2830:
2827:
2820:
2816:
2813:
2811:
2808:
2806:
2803:
2802:
2798:
2796:
2792:
2778:
2757:
2748:
2742:
2735:
2732:
2731:
2710:
2707:
2704:
2700:
2696:
2691:
2680:
2675:
2672:
2669:
2642:
2638:
2632:
2624:
2619:
2615:
2609:
2601:
2598:
2595:
2590:
2586:
2582:
2579:
2576:
2571:
2568:
2565:
2561:
2557:
2549:
2532:
2505:
2501:
2492:
2491:
2475:
2472:
2469:
2463:
2460:
2452:
2437:
2432:
2428:
2419:
2416:
2413:
2412:
2411:
2397:
2388:
2372:
2368:
2345:
2341:
2337:
2334:
2322:
2315:
2312:
2311:
2292:
2289:
2284:
2273:
2268:
2265:
2262:
2249:
2232:
2228:
2222:
2214:
2209:
2205:
2199:
2191:
2188:
2185:
2182:
2179:
2176:
2168:
2151:
2124:
2120:
2111:
2110:
2094:
2091:
2088:
2082:
2079:
2071:
2056:
2053:
2048:
2044:
2035:
2032:
2029:
2028:
2027:
2023:
2009:
2003:
2000:
1994:
1991:
1988:
1985:
1982:
1973:
1967:
1958:
1952:
1949:
1946:
1943:
1940:
1937:
1925:
1918:
1915:
1914:
1913:
1910:
1906:
1903:
1898:
1895:
1894:
1893:
1878:
1875:
1870:
1866:
1855:
1852:
1850:
1847:
1831:
1826:
1822:
1818:
1813:
1809:
1805:
1800:
1796:
1792:
1789:
1784:
1780:
1776:
1771:
1767:
1759:
1758:
1757:
1739:
1736:
1733:
1727:
1724:
1721:
1718:
1711:
1708:
1706:
1703:
1684:
1681:
1678:
1674:
1670:
1667:
1662:
1658:
1650:
1649:
1648:
1633:
1630:
1627:
1624:
1621:
1614:
1611:
1610:
1609:
1591:
1588:
1585:
1579:
1576:
1573:
1570:
1563:
1560:
1546:
1543:
1538:
1535:
1532:
1528:
1520:
1517:
1514:are top real
1499:
1495:
1491:
1488:
1485:
1480:
1476:
1472:
1467:
1463:
1455:
1452:
1449:
1448:
1433:
1428:
1424:
1418:
1414:
1408:
1405:
1402:
1398:
1392:
1389:
1386:
1381:
1378:
1375:
1371:
1367:
1364:
1359:
1351:
1348:
1345:
1342:
1334:
1331:
1328:
1324:
1318:
1315:
1312:
1307:
1304:
1301:
1297:
1289:
1285:
1281:
1276:
1273:
1270:
1267:
1264:
1257:
1256:
1255:
1241:
1238:
1235:
1232:
1210:
1206:
1200:
1197:
1194:
1190:
1184:
1181:
1178:
1173:
1170:
1167:
1163:
1155:
1151:
1147:
1142:
1136:
1130:
1121:
1104:
1101:
1098:
1095:
1070:
1067:
1064:
1060:
1056:
1053:
1050:
1045:
1041:
1037:
1032:
1028:
1018:
1001:
998:
995:
992:
989:
986:
983:
977:
974:
954:
949:
945:
941:
936:
932:
922:
920:
916:
908:
906:
904:
898:
896:
880:
871:
869:
865:
861:
858:Here W is an
838:
835:
830:
826:
822:
815:
812:
807:
803:
799:
796:
793:
790:
783:
782:
781:
778:
772:
767:
762:
759:
755:
752:
751:
750:
747:
724:
705:
697:
694:
674:
666:
650:
647:
639:
589:
585:
579:
575:
569:
564:
561:
558:
554:
550:
541:
535:
532:
525:
524:
523:
506:
502:
498:
495:
492:
487:
483:
479:
474:
470:
465:
442:
422:
402:
382:
358:
355:
352:
349:
346:
343:
340:
337:
330:
329:
328:
325:
311:
288:
285:
282:
276:
273:
253:
233:
212:
209:
206:
203:
200:
197:
194:
190:
187:
166:
163:
160:
157:
154:
151:
148:
145:
142:
129:
124:
120:
118:
114:
106:
104:
102:
98:
94:
91:that map the
90:
86:
82:
78:
74:
63:
60:
52:
42:
38:
32:
29:This article
27:
18:
17:
2909:
2900:
2891:
2863:
2842:
2829:
2793:
2749:
2746:
2414:
2389:
2326:
2030:
2024:
1929:
1911:
1904:
1896:
1891:
1853:
1848:
1755:
1709:
1704:
1646:
1612:
1607:
1561:
1516:eigenvectors
1450:
1122:
1019:
923:
918:
914:
912:
909:Krylov basis
899:
872:
857:
779:
776:
748:
725:
606:
374:
326:
133:
110:
80:
76:
70:
55:
46:
30:
917:and reward
895:random walk
93:state space
2940:Categories
2821:References
905:are used.
395:is small,
107:Motivation
49:March 2019
2779:γ
2758:γ
2701:ϕ
2688:Φ
2666:Φ
2639:θ
2629:Φ
2625:−
2616:θ
2606:Φ
2591:∗
2583:−
2562:ϕ
2529:Φ
2502:θ
2464:∈
2433:∗
2415:Algorithm
2398:γ
2373:∗
2346:∗
2338:−
2293:ε
2281:Φ
2259:Φ
2229:θ
2219:Φ
2215:−
2206:θ
2196:Φ
2189:γ
2177:ε
2148:Φ
2121:θ
2083:∈
2045:ϕ
2031:Algorithm
2010:θ
2007:Φ
2004:−
2001:θ
1998:Φ
1992:γ
1977:^
1968:−
1962:^
1950:γ
1938:ε
1863:∥
1790:−
1737:−
1682:−
1489:…
1451:Algorithm
1415:β
1399:α
1390:−
1372:∑
1349:γ
1346:−
1325:α
1316:−
1298:∑
1286:α
1191:α
1182:−
1164:∑
1152:α
1102:γ
1099:−
1068:−
1054:…
978:∈
831:−
808:−
800:−
734:Φ
698:≪
675:θ
648:×
615:Φ
586:ϕ
576:θ
555:∑
545:^
536:≈
503:ϕ
496:…
484:ϕ
471:ϕ
463:Φ
350:γ
277:∈
274:γ
207:…
161:γ
83:) is the
2799:See also
967:for all
2360:. Here
1912:end for
1849:end for
35:Please
2453:stage
2072:stage
1905:end if
1705:end if
101:domain
2417:BARBs
1897:break
756:Form
627:is a
607:Here
2033:BEBF
1892:then
1819:>
1793:<
1647:then
1625:>
1518:of P
79:(or
1710:for
1562:for
111:In
71:In
39:to
2942::
2875:^
2851:^
2022:.
1854:if
1777::=
1756:do
1722::=
1668::=
1613:if
1608:do
1574::=
1544::=
1120:.
1017:.
723:.
324:.
75:,
2728:;
2716:]
2711:1
2708:+
2705:i
2697::
2692:i
2684:[
2681:=
2676:1
2673:+
2670:i
2643:i
2633:i
2620:i
2610:i
2602:P
2599:+
2596:r
2587:P
2580:r
2577:=
2572:1
2569:+
2566:i
2558::
2547:;
2533:i
2506:i
2479:]
2476:N
2473:,
2470:2
2467:[
2461:i
2450:;
2438:r
2429:P
2369:P
2342:P
2335:P
2308:;
2296:]
2290::
2285:i
2277:[
2274:=
2269:1
2266:+
2263:i
2247:;
2233:i
2223:i
2210:i
2200:i
2192:P
2186:+
2183:r
2180:=
2166:;
2152:i
2125:i
2098:]
2095:N
2092:,
2089:2
2086:[
2080:i
2069:;
2057:r
2054:=
2049:1
1995:P
1989:+
1986:r
1983:=
1974:v
1959:v
1953:P
1947:+
1944:r
1941:=
1899:;
1879:0
1871:i
1867:z
1832:;
1827:j
1823:z
1814:i
1810:z
1806:,
1801:j
1797:z
1785:i
1781:z
1772:i
1768:z
1743:)
1740:1
1734:i
1731:(
1728::
1725:1
1719:j
1699:;
1685:1
1679:i
1675:z
1671:P
1663:i
1659:z
1634:1
1631:+
1628:k
1622:i
1595:)
1592:k
1589:+
1586:l
1583:(
1580::
1577:1
1571:i
1547:r
1539:1
1536:+
1533:k
1529:z
1500:k
1496:z
1492:,
1486:,
1481:2
1477:z
1473:,
1468:1
1464:z
1434:.
1429:i
1425:y
1419:i
1409:1
1406:+
1403:i
1393:1
1387:m
1382:0
1379:=
1376:i
1368:=
1365:r
1360:i
1356:)
1352:P
1343:I
1340:(
1335:1
1332:+
1329:i
1319:1
1313:m
1308:0
1305:=
1302:i
1290:0
1282:1
1277:=
1274:r
1271:B
1268:=
1265:v
1242:I
1239:=
1236:A
1233:B
1211:i
1207:A
1201:1
1198:+
1195:i
1185:1
1179:m
1174:0
1171:=
1168:i
1156:0
1148:1
1143:=
1140:)
1137:A
1134:(
1131:p
1108:)
1105:P
1096:I
1093:(
1071:1
1065:m
1061:y
1057:,
1051:,
1046:1
1042:y
1038:,
1033:0
1029:y
1005:)
1002:y
999:t
996:f
993:n
990:i
987:,
984:0
981:[
975:i
955:r
950:i
946:P
942:=
937:i
933:y
919:r
915:P
881:W
839:2
836:1
827:D
823:W
816:2
813:1
804:D
797:I
794:=
791:L
710:|
706:s
702:|
695:n
651:n
644:|
640:S
636:|
590:n
580:n
570:n
565:1
562:=
559:i
551:=
542:v
533:v
507:n
499:,
493:,
488:2
480:,
475:1
466:=
443:v
423:S
403:v
383:S
359:.
356:v
353:P
347:+
344:r
341:=
338:v
312:P
292:)
289:1
286:,
283:0
280:[
254:r
234:A
213:s
210:,
204:,
201:2
198:,
195:1
191:=
188:S
167:r
164:,
158:,
155:p
152:,
149:a
146:,
143:s
62:)
56:(
51:)
47:(
33:.
Text is available under the Creative Commons Attribution-ShareAlike License. Additional terms may apply.