212:, when the variable of optimization is not a vector, but a function. Then it is really a pain to find the optimal step. :) And you don't need to use an arbitrarily small step size as you say, you can make it adaptive, so you can try to jump a bit more at each step than at the previous one, and scale back if you jump too much. Of course, in this way you are not guaranteed to find the true minimum, but those infinite-dimensional problems often times have an infinite number of local minima to start with, so it is not as if you lost something.
84:
74:
53:
22:
539:
I added an animation to try to aid in the understanding of this example. I know that there is great potential for something like this to work for explanation, but I don't have the proper software to edit the animation sufficiently. If anyone would like to clean it up, I'd appreciate it. I could give
749:
Hi all, (I am a CS prof) Personally, I do not think that it is clear what it means that the direction of steepest ascent to ne the gradient. Even if one understands it, I dont think it is clear that it is true. I know that my undergrad and graduate students even in machine learning dont really get
175:
I guess you are saying that given a direction, one should find the optimal step in that direction. That is I guess also gradient descent, but from what I know, in practice it is very hard to do that. So one just takes some reasonable step in one direction, and then from there takes a new direction,
487:
Now this is fixed, I believe. We take the transpose of the
Jacobian of G with respect to X, and multiply it by the G vector. Which is equivalent to taking the gradient of our F function. Though I believe there is a 0.5 multiple which is not accounted for. This is a scalar, of course, and since the
337:
I'm not sure why we have previous_step_size = 1/precision? Given that precision is set to be a small double (0.00001) we need only set previous_step_size to be greater than 0.0001 - why not just make it 1.0? For extremely high precision there is a problem of arithmetic overflow that can occur when
384:
This "gradient flow" interpretation is neat indeed, but harder to grasp than the "discrete" formulation used in the article, and it is the discrete version of gradient descent which is most used in applications. Some materials on this proposed continuous formulation would be OK, as long as it is
231:
I updated the page with a new subsection on different ways to choose the trial direction and step size. Since this is an important problem in modern machine learning, I thought it would be worth it having its own section. This was my first edit on
Knowledge, so would definitely appreciate any
426:
The description of using gradient descent as a way to solve a non-linear equation is impossible to follow because (as the previous comment suggested), basically none of the symbols are defined before they are used. What are "f" and "z" and everything else in that argument?
357:
I've usually seen gradient descent formulated more elegantly as a partial differential equation x'(t)=-\nabla f(x(t)). Once discretized, it takes the form that is given in the article. I think it would be nice to add this, does anyone object?
310:
Looks okay to me, for the very basic algorithm. One problem is that you're unlikely to land on a point where the gradient is exactly zero. Terminate the loop when the gradient is "small enough" (choosing what this means is the hard part). --
503:
I believe it also made an error in arithmetic. The first entry in the G(X0) matrix is computed as -1.5. I believe it is -2.5, as the cosine of 0 is 1. It took me an hour and a half to figure out that was wrong. ;-) I'll attempt to fix
194:
along the vector thus defined. This is generally a better idea than making arbitrary small steps as you only need to compute the gradient at the minimum in each direction (of course it still suffers from zigzaging).
453:
The "talker" right above me is right. The author(s) skip(s) around a bit changing style and notation. It's a nice attempt but has gone awry. Try Google for something better. <! W. Watson !: -->
140:
405:
As far as I can tell, the symbols z, z_0, and t are used without being introduced or defined earlier in the page. This makes the description very difficult to follow.
611:
607:
593:
675:
If there is anybody who could fix that in the original graphing packages, that would improve the situation a lot. This way a lot of graphs are just unusable.
165:
I was under the impression that gradient descent did a line search in the trial direction (direction of steepest gradient). Can anyone else confirm this?
817:
519:
I think it's better now. Many of the calculations needed to be redone in light of this, and the optimization is less exciting, but I think it's correct.
130:
252:
Doesn't sound logical to me: "algorythm that approaches a local minimumof a function" mean that the algorythm itself is approaching something. What?
106:
812:
434:
412:
682:
579:
267:
You are right. That poor wording has been there from the very beginning. I rewrote the article at some point but failed to notice that.
589:
When you have finished reviewing my changes, you may follow the instructions on the template below to fix any issues with the URLs.
209:
97:
58:
751:
373:
654:
33:
488:
gamma "step" multiplies the same term, I suppose it doesn't matter. But it should probably be added for completeness.
774:
390:
343:
277:
232:
feedback, or let me know if I did something wrong. P.S. I am a PhD student who has been researching this topic. (
217:
196:
181:
610:
to delete these "External links modified" talk page sections if they want to de-clutter talk pages, but see the
237:
438:
416:
686:
645:
571:
233:
759:
339:
316:
735:
629:
If you have discovered URLs which were erroneously considered dead by the bot, you can report them with
617:
477:
369:
39:
755:
570:. If you have any questions, or need the bot to ignore the links, or the page altogether, please visit
83:
789:
We should remove the redirect from steepest descent since that is a different optimization algorithm.
292:
Excusme, maybe someone could tell me, if I got it right, I tried to write it in procedural code here:
770:
678:
430:
408:
386:
361:
273:
213:
177:
731:
580:
https://web.archive.org/web/20140508200053/http://brahms.cpmc.columbia.edu/publications/momentum.pdf
21:
473:
365:
541:
520:
505:
489:
105:
on
Knowledge. If you would like to participate, please visit the project page, where you can join
89:
669:
The new graph uses colors that are very hard to distinguish with red-green-color vision issues:
614:
before doing mass systematic removals. This message is updated dynamically through the template
73:
52:
630:
545:
524:
509:
493:
312:
190:
Usually one finds a trial direction, then uses a one dimensional minimisation algorithm like
727:
If there is someone who could correct this with access to graphing software, it would help.
583:
563:
200:
166:
637:
794:
191:
596:, "External links modified" talk page sections are no longer generated or monitored by
636:
If you found an error with any archives or the URLs themselves, you can fix them with
806:
462:
721:
717:
326:
603:
301:
258:
102:
790:
602:. No special action is required regarding these talk page notices, other than
472:
It also uses the jacobian matrix of the scalar function F, which is nonsense.
79:
325:
In a many applications, the minimum derivative is a user-input variable. --
255:
I am interested in what I can do (or what I can find) with this algorythm.
798:
778:
763:
739:
690:
671:
https://en.wikipedia.org/Gradient_descent#/media/File:Steepest_descent.png
659:
549:
528:
513:
497:
481:
466:
442:
420:
394:
377:
347:
329:
320:
304:
293:
281:
261:
241:
221:
203:
185:
169:
458:
705:
Examples showing the Zig-Zagging nature of the method show gradient
670:
176:
as this article illustrates. Did I understand your question right?
722:
https://commons.wikimedia.org/File:Gradient_ascent_(surface).png
718:
https://commons.wikimedia.org/File:Gradient_ascent_(contour).png
15:
574:
for additional information. I made the following changes:
272:
This algorithm is used to find local minima of functions.
584:
http://brahms.cpmc.columbia.edu/publications/momentum.pdf
567:
401:
Non-linear systems example introduces undefined symbols
199:
extends this to ensure that zigzaging doesn't happen.
449:
Please correct this article or kill it. June 13, 2010
101:, a collaborative effort to improve the coverage of
769:Page doesn't seem to exist/isn't publicly visible.
606:using the archive tool instructions below. Editors
535:Introduction of animated example. January 16, 2011
592:This message was posted before February 2018.
8:
752:Draft:The_Gradient_is_the_Steepest_Direction
754:What are your thoughts in it? Thanks Jeff
676:
562:I have just modified one external link on
47:
540:the Octave code to anyone who wants it.
208:I myself used gradient descent only for
294:http://howto.wikia.com/Gradient_descent
248:Someone needs to correct the definition
49:
19:
750:it. Today I wrote the following page
745:The_Gradient_is_the_Steepest_Direction
7:
704:The images used in Description : -->
95:This article is within the scope of
38:It is of interest to the following
14:
818:Mid-priority mathematics articles
566:. Please take a moment to review
210:infinite-dimensional optimization
115:Knowledge:WikiProject Mathematics
118:Template:WikiProject Mathematics
82:
72:
51:
20:
709:being used instead of gradient
135:This article has been rated as
1:
799:14:55, 27 December 2023 (UTC)
665:Barely distinguishable colors
421:11:12, 25 February 2010 (UTC)
330:05:04, 29 December 2006 (UTC)
222:15:54, 25 November 2005 (UTC)
204:08:37, 25 November 2005 (UTC)
186:05:59, 25 November 2005 (UTC)
170:04:45, 25 November 2005 (UTC)
109:and see a list of open tasks.
813:C-Class mathematics articles
785:Remove steepest descent link
779:05:41, 28 January 2024 (UTC)
550:19:22, 16 January 2011 (UTC)
529:05:32, 10 January 2011 (UTC)
514:03:16, 10 January 2011 (UTC)
498:05:32, 10 January 2011 (UTC)
482:09:50, 12 October 2010 (UTC)
321:01:10, 8 November 2006 (UTC)
305:23:33, 7 November 2006 (UTC)
282:04:40, 3 November 2006 (UTC)
262:21:46, 2 November 2006 (UTC)
242:00:50, 10 October 2020 (UTC)
288:Understanding the algorythm
834:
660:21:43, 23 March 2017 (UTC)
623:(last update: 5 June 2024)
559:Hello fellow Wikipedians,
395:02:27, 7 August 2008 (UTC)
385:further down the article.
378:22:09, 6 August 2008 (UTC)
691:17:35, 2 April 2019 (UTC)
467:01:03, 15 June 2010 (UTC)
457:Okay, I hope that helps.
353:More elegant formulation?
197:Conjugate gradient method
161:Search in trial direction
134:
67:
46:
764:15:30, 18 May 2020 (UTC)
443:18:21, 13 May 2010 (UTC)
348:23:34, 14 May 2018 (UTC)
338:computing 1/precision.
141:project's priority scale
740:20:18, 8 May 2019 (UTC)
555:External links modified
98:WikiProject Mathematics
724:are labelled as such.
28:This article is rated
604:regular verification
121:mathematics articles
594:After February 2018
648:InternetArchiveBot
599:InternetArchiveBot
90:Mathematics portal
34:content assessment
693:
681:comment added by
624:
433:comment added by
411:comment added by
380:
364:comment added by
155:
154:
151:
150:
147:
146:
825:
716:Images used are
697:Wrong Image Used
658:
649:
622:
621:
600:
564:Gradient descent
445:
423:
359:
234:Jeremy Bernstein
123:
122:
119:
116:
113:
92:
87:
86:
76:
69:
68:
63:
55:
48:
31:
25:
24:
16:
833:
832:
828:
827:
826:
824:
823:
822:
803:
802:
787:
771:Youarelovedfool
747:
699:
667:
652:
647:
615:
608:have permission
598:
572:this simple FaQ
557:
537:
451:
428:
406:
403:
387:Oleg Alexandrov
355:
340:BlackMetalStats
290:
274:Oleg Alexandrov
250:
214:Oleg Alexandrov
178:Oleg Alexandrov
163:
120:
117:
114:
111:
110:
88:
81:
61:
32:on Knowledge's
29:
12:
11:
5:
831:
829:
821:
820:
815:
805:
804:
786:
783:
782:
781:
746:
743:
698:
695:
666:
663:
642:
641:
634:
587:
586:
578:Added archive
556:
553:
536:
533:
532:
531:
501:
500:
470:
469:
450:
447:
435:69.196.185.126
413:216.244.31.162
402:
399:
398:
397:
354:
351:
335:
334:
333:
332:
289:
286:
285:
284:
269:
268:
249:
246:
229:
228:
227:
226:
225:
224:
192:brent's method
162:
159:
157:
153:
152:
149:
148:
145:
144:
133:
127:
126:
124:
107:the discussion
94:
93:
77:
65:
64:
56:
44:
43:
37:
26:
13:
10:
9:
6:
4:
3:
2:
830:
819:
816:
814:
811:
810:
808:
801:
800:
796:
792:
784:
780:
776:
772:
768:
767:
766:
765:
761:
757:
753:
744:
742:
741:
737:
733:
728:
725:
723:
719:
714:
712:
708:
702:
696:
694:
692:
688:
684:
683:88.219.19.174
680:
673:
672:
664:
662:
661:
656:
651:
650:
639:
635:
632:
628:
627:
626:
619:
613:
609:
605:
601:
595:
590:
585:
581:
577:
576:
575:
573:
569:
565:
560:
554:
552:
551:
547:
543:
534:
530:
526:
522:
518:
517:
516:
515:
511:
507:
499:
495:
491:
486:
485:
484:
483:
479:
475:
468:
464:
460:
456:
455:
454:
448:
446:
444:
440:
436:
432:
424:
422:
418:
414:
410:
400:
396:
392:
388:
383:
382:
381:
379:
375:
371:
367:
363:
352:
350:
349:
345:
341:
331:
328:
324:
323:
322:
318:
314:
309:
308:
307:
306:
303:
299:
296:
295:
287:
283:
279:
275:
271:
270:
266:
265:
264:
263:
260:
256:
253:
247:
245:
243:
239:
235:
223:
219:
215:
211:
207:
206:
205:
202:
198:
193:
189:
188:
187:
183:
179:
174:
173:
172:
171:
168:
160:
158:
142:
138:
132:
129:
128:
125:
108:
104:
100:
99:
91:
85:
80:
78:
75:
71:
70:
66:
60:
57:
54:
50:
45:
41:
35:
27:
23:
18:
17:
788:
756:JeffAEdmonds
748:
729:
726:
715:
710:
706:
703:
700:
677:— Preceding
674:
668:
646:
643:
618:source check
597:
591:
588:
561:
558:
538:
502:
471:
452:
425:
404:
356:
336:
313:Jitse Niesen
300:
297:
291:
257:
254:
251:
230:
164:
156:
137:Mid-priority
136:
96:
62:Mid‑priority
40:WikiProjects
429:—Preceding
407:—Preceding
360:—Preceding
298:Thank you.
112:Mathematics
103:mathematics
59:Mathematics
807:Categories
655:Report bug
732:Enchoc373
638:this tool
631:this tool
730:Thanks,
679:unsigned
644:Cheers.—
474:Lostella
431:unsigned
409:unsigned
374:contribs
366:Winblows
362:unsigned
711:descent
701:Hello,
568:my edit
542:Peryeat
521:Peryeat
506:Peryeat
490:Peryeat
327:Cowbert
139:on the
30:C-class
707:ascent
302:Inyuki
259:Inyuki
36:scale.
791:DMH43
795:talk
775:talk
760:talk
736:talk
720:and
687:talk
546:talk
525:talk
510:talk
494:talk
478:talk
463:talk
439:talk
417:talk
391:talk
370:talk
344:talk
317:talk
278:talk
238:talk
218:talk
182:talk
713:.
612:RfC
582:to
504:it.
244:).
201:njh
167:njh
131:Mid
809::
797:)
777:)
762:)
738:)
689:)
625:.
620:}}
616:{{
548:)
527:)
512:)
496:)
480:)
465:)
441:)
419:)
393:)
376:)
372:•
346:)
319:)
280:)
240:)
220:)
184:)
793:(
773:(
758:(
734:(
685:(
657:)
653:(
640:.
633:.
544:(
523:(
508:(
492:(
476:(
461:(
459:0
437:(
415:(
389:(
368:(
342:(
315:(
276:(
236:(
216:(
180:(
143:.
42::
Text is available under the Creative Commons Attribution-ShareAlike License. Additional terms may apply.