Talk:Gradient descent

I myself used gradient descent only for infinite-dimensional optimization, when the variable of optimization is not a vector but a function. Then it is really a pain to find the optimal step. :) And you don't need to use an arbitrarily small step size as you say; you can make it adaptive, so you can try to jump a bit more at each step than at the previous one, and scale back if you jump too much. Of course, in this way you are not guaranteed to find the true minimum, but those infinite-dimensional problems oftentimes have an infinite number of local minima to start with, so it is not as if you lost something.
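For what it's worth, a minimal Python sketch of the adaptive-step idea described above; the quadratic test function, starting point, and grow/shrink factors are illustrative choices, not anything taken from the article:

import numpy as np

def adaptive_gradient_descent(f, grad, x0, step=0.1, grow=1.2, shrink=0.5, iters=200):
    """Try a slightly larger step each time a move succeeds; scale back after overshooting."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        trial = x - step * grad(x)
        if f(trial) < f(x):
            x = trial
            step *= grow      # jump a bit more next time
        else:
            step *= shrink    # jumped too much, scale the step back
    return x

# Illustrative use: minimize f(x, y) = x^2 + 10*y^2, minimum at the origin.
f = lambda x: x[0]**2 + 10.0 * x[1]**2
grad = lambda x: np.array([2.0 * x[0], 20.0 * x[1]])
print(adaptive_gradient_descent(f, grad, [3.0, 2.0]))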
I added an animation to try to aid in the understanding of this example. I know that there is great potential for something like this to work for explanation, but I don't have the proper software to edit the animation sufficiently. If anyone would like to clean it up, I'd appreciate it. I could give the Octave code to anyone who wants it.
Hi all, (I am a CS prof.) Personally, I do not think that it is clear what it means for the direction of steepest ascent to be the gradient. Even if one understands it, I don't think it is clear that it is true. I know that my undergrad and graduate students, even in machine learning, don't really get it. Today I wrote the following page: Draft:The_Gradient_is_the_Steepest_Direction. What are your thoughts on it? Thanks, Jeff
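A compact version of the standard argument (just the usual Cauchy-Schwarz reasoning, not anything taken from the draft page) may be worth having at hand: for a unit vector v, the directional derivative satisfies

D_v f(x) = \nabla f(x) \cdot v \le \|\nabla f(x)\| \, \|v\| = \|\nabla f(x)\|,

with equality exactly when v points along \nabla f(x). So among all unit directions, the gradient direction gives the largest instantaneous increase of f, and -\nabla f(x) gives the largest decrease.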
I guess you are saying that given a direction, one should find the optimal step in that direction. That is, I guess, also gradient descent, but from what I know, in practice it is very hard to do that. So one just takes some reasonable step in one direction, and then from there takes a new direction, as this article illustrates. Did I understand your question right?
Now this is fixed, I believe. We take the transpose of the Jacobian of G with respect to X, and multiply it by the G vector, which is equivalent to taking the gradient of our F function. Though I believe there is a 0.5 multiple which is not accounted for. This is a scalar, of course, and since the gamma "step" multiplies the same term, I suppose it doesn't matter. But it should probably be added for completeness.
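For readers following the computation, the identity in question, assuming the usual least-squares objective F(x) = \tfrac{1}{2}\|G(x)\|^2 = \tfrac{1}{2}\sum_i G_i(x)^2, is

\nabla F(x) = J_G(x)^{\mathsf{T}} G(x),

since \partial F/\partial x_j = \sum_i G_i(x) \, \partial G_i(x)/\partial x_j. If the objective is written without the 1/2, the gradient becomes 2 J_G(x)^{\mathsf{T}} G(x); as noted above, the extra factor only rescales the step \gamma.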
I'm not sure why we have previous_step_size = 1/precision. Given that precision is set to be a small double (0.00001), we need only set previous_step_size to be greater than precision, so why not just make it 1.0? For extremely high precision there is a problem of arithmetic overflow that can occur when computing 1/precision.
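For reference, a cut-down Python version of the kind of loop being discussed; the derivative below assumes the f(x) = x^4 - 3x^3 + 2 example the article has used, so substitute the appropriate df if the example has changed:

cur_x = 6.0                # starting guess (illustrative)
gamma = 0.01               # step size
precision = 0.00001
previous_step_size = 1.0   # anything larger than precision works; no need to compute 1/precision

df = lambda x: 4 * x**3 - 9 * x**2   # derivative of the assumed example f(x) = x**4 - 3*x**3 + 2

while previous_step_size > precision:
    prev_x = cur_x
    cur_x -= gamma * df(prev_x)
    previous_step_size = abs(cur_x - prev_x)

print("Local minimum occurs at", cur_x)   # about 2.25 for this example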
This "gradient flow" interpretation is neat indeed, but harder to grasp than the "discrete" formulation used in the article, and it is the discrete version of gradient descent which is most used in applications. Some material on this proposed continuous formulation would be OK, as long as it is further down the article.
I updated the page with a new subsection on different ways to choose the trial direction and step size. Since this is an important problem in modern machine learning, I thought it would be worth it having its own section. This was my first edit on Knowledge, so I would definitely appreciate any feedback, or let me know if I did something wrong. P.S. I am a PhD student who has been researching this topic.
The description of using gradient descent as a way to solve a non-linear equation is impossible to follow because (as the previous comment suggested), basically none of the symbols are defined before they are used. What are "f" and "z" and everything else in that argument?
I've usually seen gradient descent formulated more elegantly as an ordinary differential equation, x'(t) = -\nabla f(x(t)). Once discretized, it takes the form that is given in the article. I think it would be nice to add this; does anyone object?
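For concreteness, discretizing that equation with explicit Euler and step size \gamma > 0 gives

x'(t) = -\nabla f(x(t)) \quad\Rightarrow\quad x_{k+1} = x_k - \gamma \nabla f(x_k),

which is exactly the iteration in the article, so the discrete method can be read as a discretization of this "gradient flow".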
Looks okay to me, for the very basic algorithm. One problem is that you're unlikely to land on a point where the gradient is exactly zero. Terminate the loop when the gradient is "small enough" (choosing what this means is the hard part). -- Jitse Niesen
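A small Python sketch of that stopping rule; the tolerance, step size, and test problem are placeholder choices:

import numpy as np

def gradient_descent(grad, x0, gamma=0.1, tol=1e-6, max_iters=10_000):
    """Stop when the gradient norm is 'small enough' rather than exactly zero."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iters):
        g = grad(x)
        if np.linalg.norm(g) <= tol:   # the "small enough" test
            break
        x = x - gamma * g
    return x

# Example: minimize f(x, y) = (x - 1)^2 + (y + 2)^2, whose minimum is at (1, -2).
grad = lambda x: np.array([2.0 * (x[0] - 1.0), 2.0 * (x[1] + 2.0)])
print(gradient_descent(grad, [0.0, 0.0]))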
I believe it also made an error in arithmetic. The first entry in the G(X0) matrix is computed as -1.5. I believe it is -2.5, as the cosine of 0 is 1. It took me an hour and a half to figure out that was wrong. ;-) I'll attempt to fix it.
Usually one finds a trial direction, then uses a one-dimensional minimisation algorithm like Brent's method along the vector thus defined. This is generally a better idea than making arbitrarily small steps, as you only need to compute the gradient at the minimum in each direction (of course it still suffers from zigzagging). The conjugate gradient method extends this to ensure that zigzagging doesn't happen.
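A Python sketch of that scheme, using SciPy's scalar minimizer (Brent's method by default) for the one-dimensional search; the ill-conditioned quadratic is only an illustrative test problem, not the article's own example:

import numpy as np
from scipy.optimize import minimize_scalar

def steepest_descent_with_line_search(f, grad, x0, tol=1e-8, max_iters=200):
    """At each iteration, minimize f along the negative-gradient direction."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iters):
        d = -grad(x)
        if np.linalg.norm(d) <= tol:
            break
        phi = lambda t, x=x, d=d: f(x + t * d)   # one-dimensional slice of f
        t_best = minimize_scalar(phi).x          # Brent's method by default
        x = x + t_best * d
    return x

# Ill-conditioned quadratic: even exact line searches zigzag toward the minimum at the origin.
f = lambda x: 0.5 * (x[0]**2 + 10.0 * x[1]**2)
grad = lambda x: np.array([x[0], 10.0 * x[1]])
print(steepest_descent_with_line_search(f, grad, [10.0, 1.0]))

The conjugate gradient method mentioned above mixes the previous direction back in, d_{k+1} = -\nabla f(x_{k+1}) + \beta_k d_k (for example the Fletcher-Reeves choice \beta_k = \|\nabla f(x_{k+1})\|^2 / \|\nabla f(x_k)\|^2), which is what suppresses the zigzagging on problems like this one.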
The "talker" right above me is right. The author(s) skip(s) around a bit, changing style and notation. It's a nice attempt but has gone awry. Try Google for something better. -- W. Watson
As far as I can tell, the symbols z, z_0, and t are used without being introduced or defined earlier in the page. This makes the description very difficult to follow.
I was under the impression that gradient descent did a line search in the trial direction (direction of steepest gradient). Can anyone else confirm this?
I think it's better now. Many of the calculations needed to be redone in light of this, and the optimization is less exciting, but I think it's correct.
Doesn't sound logical to me: "algorithm that approaches a local minimum of a function" means that the algorithm itself is approaching something. What?
You are right. That poor wording has been there from the very beginning. I rewrote the article at some point but failed to notice that.
I have just modified one external link on Gradient descent. Please take a moment to review my edit. If you have any questions, or need the bot to ignore the links, or the page altogether, please visit this simple FaQ for additional information. I made the following changes:

Added archive https://web.archive.org/web/20140508200053/http://brahms.cpmc.columbia.edu/publications/momentum.pdf to http://brahms.cpmc.columbia.edu/publications/momentum.pdf

When you have finished reviewing my changes, you may follow the instructions on the template below to fix any issues with the URLs.

This message was posted before February 2018. After February 2018, "External links modified" talk page sections are no longer generated or monitored by InternetArchiveBot. No special action is required regarding these talk page notices, other than regular verification using the archive tool instructions below. Editors have permission to delete these "External links modified" talk page sections if they want to de-clutter talk pages, but see the RfC before doing mass systematic removals. This message is updated dynamically through the template {{source check}} (last update: 5 June 2024).

If you have discovered URLs which were erroneously considered dead by the bot, you can report them with this tool.

If you found an error with any archives or the URLs themselves, you can fix them with this tool.

Cheers.—InternetArchiveBot (Report bug)
We should remove the redirect from steepest descent since that is a different optimization algorithm.
Excuse me, maybe someone could tell me if I got it right; I tried to write it in procedural code here: http://howto.wikia.com/Gradient_descent
Thank you. Inyuki
The new graph uses colors that are very hard to distinguish with red-green color vision issues:
https://en.wikipedia.org/Gradient_descent#/media/File:Steepest_descent.png
If there is anybody who could fix that in the original graphing packages, that would improve the situation a lot. As it is, a lot of the graphs are just unusable.
It also uses the Jacobian matrix of the scalar function F, which is nonsense.
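(For anyone puzzled by the objection: for a scalar-valued F the Jacobian reduces to the 1 x n row vector \nabla F(x)^{\mathsf{T}}, so a "Jacobian matrix of F" adds nothing; the matrix that actually matters in this example is presumably the Jacobian of the vector-valued G.)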
In many applications, the minimum derivative is a user-input variable. -- Cowbert
I am interested in what I can do (or what I can find) with this algorithm.
Hello,
The images used in the Description section ("Examples showing the zig-zagging nature of the method") show gradient ascent being used instead of gradient descent. The images used are
https://commons.wikimedia.org/File:Gradient_ascent_(surface).png
and
https://commons.wikimedia.org/File:Gradient_ascent_(contour).png
and are labelled as such. If there is someone who could correct this with access to graphing software, it would help.
Thanks, Enchoc373
This algorithm is used to find local minima of functions.
Non-linear systems example introduces undefined symbols
Please correct this article or kill it. June 13, 2010
