
Softplus


[Figure: plot of the softplus function and the ramp function.]

In mathematics and machine learning, the softplus function is

    f(x) = \ln(1 + e^{x}).

It is a smooth approximation (in fact, an analytic function) to the ramp function, which is known as the rectifier or ReLU (rectified linear unit) in machine learning. For large negative x it is \ln(1 + e^{x}) = \ln(1 + \epsilon) \gtrapprox \ln 1 = 0, so just above 0, while for large positive x it is \ln(1 + e^{x}) \gtrapprox \ln(e^{x}) = x, so just above x.

The names softplus and SmoothReLU are used in machine learning. The name "softplus" (2000), by analogy with the earlier softmax (1989), is presumably because it is a smooth (soft) approximation of the positive part of x, which is sometimes denoted with a superscript plus, x^{+} := \max(0, x).
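
Implemented literally, \ln(1 + e^{x}) overflows in floating point once e^{x} exceeds the representable range. The following sketch is our own illustration (the helper name softplus is not from this article): it uses the algebraically equivalent form \max(x, 0) + \ln(1 + e^{-|x|}), which is safe for arguments of either sign.

    import numpy as np

    def softplus(x):
        """Numerically stable softplus, ln(1 + e^x).

        Computing np.log(1 + np.exp(x)) directly overflows for large x;
        max(x, 0) + log1p(exp(-|x|)) is the same quantity in exact
        arithmetic but stays finite in floating point.
        """
        x = np.asarray(x, dtype=float)
        return np.maximum(x, 0.0) + np.log1p(np.exp(-np.abs(x)))

    print(softplus([-1000.0, -1.0, 0.0, 1.0, 1000.0]))
    # approximately [0, 0.3133, 0.6931, 1.3133, 1000]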
914: 902: 548: 37: 891:{\displaystyle \operatorname {LSE} (x_{1},\dots ,x_{n})=\ln(e^{x_{1}}+\cdots +e^{x_{n}}),} 1506:
Dugas, Charles; Bengio, Yoshua; Bélisle, François; Nadeau, Claude; Garcia, René (2000).
1297: 337: 241: 134: 365:
are used in machine learning. The name "softplus" (2000), by analogy with the earlier
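
This relationship is easy to verify numerically. The check below is our own sketch (reusing the stable softplus helper from the earlier example): a central finite difference of softplus is compared against the logistic function on a few points.

    import numpy as np

    def softplus(x):
        x = np.asarray(x, dtype=float)
        return np.maximum(x, 0.0) + np.log1p(np.exp(-np.abs(x)))

    def logistic(x):
        # 1 / (1 + e^{-x}); adequate for the moderate range used below.
        return 1.0 / (1.0 + np.exp(-np.asarray(x, dtype=float)))

    xs = np.linspace(-5.0, 5.0, 11)
    h = 1e-6
    finite_diff = (softplus(xs + h) - softplus(xs - h)) / (2.0 * h)
    print(np.max(np.abs(finite_diff - logistic(xs))))   # small, on the order of 1e-9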

LogSumExp

Main article: LogSumExp

The multivariable generalization of single-variable softplus is the LogSumExp with the first argument set to zero:

    \operatorname{LSE_0}^{+}(x_1, \dots, x_n) := \operatorname{LSE}(0, x_1, \dots, x_n) = \ln(1 + e^{x_1} + \cdots + e^{x_n}).

The LogSumExp function itself is

    \operatorname{LSE}(x_1, \dots, x_n) = \ln(e^{x_1} + \cdots + e^{x_n}),

and its gradient is the softmax; the softmax with the first argument set to zero is the multivariable generalization of the logistic function. Both LogSumExp and softmax are used in machine learning.
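
A minimal NumPy sketch (our own illustration; the function names are not from the article) shows the correspondence: LSE with a prepended zero argument reproduces softplus, and the gradient of LSE is the softmax.

    import numpy as np

    def logsumexp(xs):
        # Stable LSE(x_1, ..., x_n) = ln(e^{x_1} + ... + e^{x_n}):
        # shifting by the maximum keeps the exponentials from overflowing.
        xs = np.asarray(xs, dtype=float)
        m = np.max(xs)
        return m + np.log(np.sum(np.exp(xs - m)))

    def softmax(xs):
        # Gradient of LSE with respect to its arguments.
        xs = np.asarray(xs, dtype=float)
        e = np.exp(xs - np.max(xs))
        return e / e.sum()

    x = 2.5
    print(logsumexp([0.0, x]), np.log1p(np.exp(x)))        # both are ln(1 + e^x), about 2.579
    print(softmax([0.0, x])[1], 1.0 / (1.0 + np.exp(-x)))  # both are the logistic value, about 0.9241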

Convex conjugate

The convex conjugate (specifically, the Legendre transform) of the softplus function is the negative binary entropy (with base e). This is because (following the definition of the Legendre transform: the derivatives are inverse functions) the derivative of softplus is the logistic function, whose inverse function is the logit, which is the derivative of negative binary entropy.

Softplus can be interpreted as logistic loss (as a positive number), so by duality, minimizing logistic loss corresponds to maximizing entropy. This justifies the principle of maximum entropy as loss minimization.
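
The conjugate identity can be spot-checked numerically. The sketch below is our own illustration: it estimates f*(p) = sup_x (p x - f(x)) for the softplus f by a dense grid search and compares it with the negative binary entropy p \ln p + (1 - p) \ln(1 - p).

    import numpy as np

    def softplus(x):
        x = np.asarray(x, dtype=float)
        return np.maximum(x, 0.0) + np.log1p(np.exp(-np.abs(x)))

    def conjugate(p, grid=np.linspace(-30.0, 30.0, 60001)):
        # f*(p) = sup_x (p*x - softplus(x)), approximated on a fine grid.
        return float(np.max(p * grid - softplus(grid)))

    def neg_binary_entropy(p):
        return p * np.log(p) + (1.0 - p) * np.log(1.0 - p)

    for p in (0.1, 0.5, 0.9):
        print(p, conjugate(p), neg_binary_entropy(p))
    # The two computed columns agree to several decimal places.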

Alternative forms

This function can be approximated as:

    \ln(1 + e^{x}) \approx \begin{cases} \ln 2, & x = 0, \\ \dfrac{x}{1 - e^{-x/\ln 2}}, & x \neq 0. \end{cases}

By making the change of variables x = y \ln(2), this is equivalent to

    \log_2(1 + 2^{y}) \approx \begin{cases} 1, & y = 0, \\ \dfrac{y}{1 - e^{-y}}, & y \neq 0. \end{cases}

A sharpness parameter k may be included:

    f(x) = \frac{\ln(1 + e^{kx})}{k}, \qquad f'(x) = \frac{e^{kx}}{1 + e^{kx}} = \frac{1}{1 + e^{-kx}}.
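
A short numerical comparison (our own sketch; the helper names are not from the article) shows how close the x/(1 - e^{-x/\ln 2}) form stays to the exact softplus:

    import numpy as np

    def softplus(x):
        x = np.asarray(x, dtype=float)
        return np.maximum(x, 0.0) + np.log1p(np.exp(-np.abs(x)))

    def softplus_approx(x):
        # x / (1 - e^{-x / ln 2}), with the removable singularity at x = 0
        # filled in by its limiting value ln 2.
        x = np.asarray(x, dtype=float)
        safe = np.where(x == 0.0, 1.0, x)   # dummy value so the unused branch stays finite
        val = safe / (1.0 - np.exp(-safe / np.log(2.0)))
        return np.where(x == 0.0, np.log(2.0), val)

    xs = np.array([-4.0, -1.0, 0.0, 1.0, 4.0])
    print(softplus(xs))          # approximately [0.0181 0.3133 0.6931 1.3133 4.0181]
    print(softplus_approx(xs))   # approximately [0.0125 0.3093 0.6931 1.3093 4.0125]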

References

Dugas, Charles; Bengio, Yoshua; Bélisle, François; Nadeau, Claude; Garcia, René (2000). "Incorporating second-order functional knowledge for better option pricing". Proceedings of the 13th International Conference on Neural Information Processing Systems (NIPS'00). MIT Press: 451–457. "Since the sigmoid h has a positive first derivative, its primitive, which we call softplus, is convex."

Xavier Glorot; Antoine Bordes; Yoshua Bengio (2011). Deep sparse rectifier neural networks (PDF). AISTATS. "Rectifier and softplus activation functions. The second one is a smooth version of the first."

"Smooth Rectifier Linear Unit (SmoothReLU) Forward Layer". Developer Guide for Intel Data Analytics Acceleration Library. 2017. Retrieved 2018-12-04.