
Softplus


[Figure: plot of the softplus function and the ramp function.]

In mathematics and machine learning, the softplus function is

    f(x) = \ln(1 + e^{x}).

It is a smooth approximation (in fact, an analytic function) to the ramp function, which is known as the rectifier or ReLU (rectified linear unit) in machine learning. For large negative x it is \ln(1 + e^{x}) = \ln(1 + \epsilon) \gtrapprox \ln 1 = 0, so just above 0, while for large positive x it is \ln(1 + e^{x}) \gtrapprox \ln(e^{x}) = x, so just above x.

The names softplus and SmoothReLU are used in machine learning. The name "softplus" (2000), by analogy with the earlier softmax (1989), is presumably because it is a smooth (soft) approximation of the positive part of x, which is sometimes denoted with a superscript plus, x^{+} := \max(0, x).
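
Implemented literally, \ln(1 + e^{x}) overflows in floating point once e^{x} exceeds the representable range. The following sketch is our own illustration (the helper name softplus is not from this article): it uses the algebraically equivalent form \max(x, 0) + \ln(1 + e^{-|x|}), which is safe for arguments of either sign.

    import numpy as np

    def softplus(x):
        """Numerically stable softplus, ln(1 + e^x).

        Computing np.log(1 + np.exp(x)) directly overflows for large x;
        max(x, 0) + log1p(exp(-|x|)) is the same quantity in exact
        arithmetic but stays finite in floating point.
        """
        x = np.asarray(x, dtype=float)
        return np.maximum(x, 0.0) + np.log1p(np.exp(-np.abs(x)))

    print(softplus([-1000.0, -1.0, 0.0, 1.0, 1000.0]))
    # approximately [0, 0.3133, 0.6931, 1.3133, 1000]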
914: 902: 548: 37: 891:{\displaystyle \operatorname {LSE} (x_{1},\dots ,x_{n})=\ln(e^{x_{1}}+\cdots +e^{x_{n}}),} 1506:
Dugas, Charles; Bengio, Yoshua; Bélisle, François; Nadeau, Claude; Garcia, René (2000).
1297: 337: 241: 134: 365:
are used in machine learning. The name "softplus" (2000), by analogy with the earlier
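
This relationship is easy to verify numerically. The check below is our own sketch (reusing the stable softplus helper from the earlier example): a central finite difference of softplus is compared against the logistic function on a few points.

    import numpy as np

    def softplus(x):
        x = np.asarray(x, dtype=float)
        return np.maximum(x, 0.0) + np.log1p(np.exp(-np.abs(x)))

    def logistic(x):
        # 1 / (1 + e^{-x}); adequate for the moderate range used below.
        return 1.0 / (1.0 + np.exp(-np.asarray(x, dtype=float)))

    xs = np.linspace(-5.0, 5.0, 11)
    h = 1e-6
    finite_diff = (softplus(xs + h) - softplus(xs - h)) / (2.0 * h)
    print(np.max(np.abs(finite_diff - logistic(xs))))   # small, on the order of 1e-9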

LogSumExp

Main article: LogSumExp

The multivariable generalization of single-variable softplus is the LogSumExp with the first argument set to zero:

    \operatorname{LSE_0}^{+}(x_1, \dots, x_n) := \operatorname{LSE}(0, x_1, \dots, x_n) = \ln(1 + e^{x_1} + \cdots + e^{x_n}).

The LogSumExp function itself is

    \operatorname{LSE}(x_1, \dots, x_n) = \ln(e^{x_1} + \cdots + e^{x_n}),

and its gradient is the softmax; the softmax with the first argument set to zero is the multivariable generalization of the logistic function. Both LogSumExp and softmax are used in machine learning.
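
A minimal NumPy sketch (our own illustration; the function names are not from the article) shows the correspondence: LSE with a prepended zero argument reproduces softplus, and the gradient of LSE is the softmax.

    import numpy as np

    def logsumexp(xs):
        # Stable LSE(x_1, ..., x_n) = ln(e^{x_1} + ... + e^{x_n}):
        # shifting by the maximum keeps the exponentials from overflowing.
        xs = np.asarray(xs, dtype=float)
        m = np.max(xs)
        return m + np.log(np.sum(np.exp(xs - m)))

    def softmax(xs):
        # Gradient of LSE with respect to its arguments.
        xs = np.asarray(xs, dtype=float)
        e = np.exp(xs - np.max(xs))
        return e / e.sum()

    x = 2.5
    print(logsumexp([0.0, x]), np.log1p(np.exp(x)))        # both are ln(1 + e^x), about 2.579
    print(softmax([0.0, x])[1], 1.0 / (1.0 + np.exp(-x)))  # both are the logistic value, about 0.9241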

Convex conjugate

The convex conjugate (specifically, the Legendre transform) of the softplus function is the negative binary entropy (with base e). This is because (following the definition of the Legendre transform: the derivatives are inverse functions) the derivative of softplus is the logistic function, whose inverse function is the logit, which is the derivative of negative binary entropy.

Softplus can be interpreted as logistic loss (as a positive number), so by duality, minimizing logistic loss corresponds to maximizing entropy. This justifies the principle of maximum entropy as loss minimization.
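
The conjugate identity can be spot-checked numerically. The sketch below is our own illustration: it estimates f*(p) = sup_x (p x - f(x)) for the softplus f by a dense grid search and compares it with the negative binary entropy p \ln p + (1 - p) \ln(1 - p).

    import numpy as np

    def softplus(x):
        x = np.asarray(x, dtype=float)
        return np.maximum(x, 0.0) + np.log1p(np.exp(-np.abs(x)))

    def conjugate(p, grid=np.linspace(-30.0, 30.0, 60001)):
        # f*(p) = sup_x (p*x - softplus(x)), approximated on a fine grid.
        return float(np.max(p * grid - softplus(grid)))

    def neg_binary_entropy(p):
        return p * np.log(p) + (1.0 - p) * np.log(1.0 - p)

    for p in (0.1, 0.5, 0.9):
        print(p, conjugate(p), neg_binary_entropy(p))
    # The two computed columns agree to several decimal places.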

Alternative forms

This function can be approximated as:

    \ln(1 + e^{x}) \approx \begin{cases} \ln 2, & x = 0, \\ \dfrac{x}{1 - e^{-x/\ln 2}}, & x \neq 0. \end{cases}

By making the change of variables x = y \ln(2), this is equivalent to

    \log_2(1 + 2^{y}) \approx \begin{cases} 1, & y = 0, \\ \dfrac{y}{1 - e^{-y}}, & y \neq 0. \end{cases}

A sharpness parameter k may be included:

    f(x) = \frac{\ln(1 + e^{kx})}{k}, \qquad f'(x) = \frac{e^{kx}}{1 + e^{kx}} = \frac{1}{1 + e^{-kx}}.
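
A short numerical comparison (our own sketch; the helper names are not from the article) shows how close the x/(1 - e^{-x/\ln 2}) form stays to the exact softplus:

    import numpy as np

    def softplus(x):
        x = np.asarray(x, dtype=float)
        return np.maximum(x, 0.0) + np.log1p(np.exp(-np.abs(x)))

    def softplus_approx(x):
        # x / (1 - e^{-x / ln 2}), with the removable singularity at x = 0
        # filled in by its limiting value ln 2.
        x = np.asarray(x, dtype=float)
        safe = np.where(x == 0.0, 1.0, x)   # dummy value so the unused branch stays finite
        val = safe / (1.0 - np.exp(-safe / np.log(2.0)))
        return np.where(x == 0.0, np.log(2.0), val)

    xs = np.array([-4.0, -1.0, 0.0, 1.0, 4.0])
    print(softplus(xs))          # approximately [0.0181 0.3133 0.6931 1.3133 4.0181]
    print(softplus_approx(xs))   # approximately [0.0125 0.3093 0.6931 1.3093 4.0125]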

References

Dugas, Charles; Bengio, Yoshua; Bélisle, François; Nadeau, Claude; Garcia, René (2000). "Incorporating second-order functional knowledge for better option pricing". Proceedings of the 13th International Conference on Neural Information Processing Systems (NIPS'00). MIT Press: 451–457. "Since the sigmoid h has a positive first derivative, its primitive, which we call softplus, is convex."

Xavier Glorot; Antoine Bordes; Yoshua Bengio (2011). Deep sparse rectifier neural networks (PDF). AISTATS. "Rectifier and softplus activation functions. The second one is a smooth version of the first."

"Smooth Rectifier Linear Unit (SmoothReLU) Forward Layer". Developer Guide for Intel Data Analytics Acceleration Library. 2017. Retrieved 2018-12-04.