AI capability control

Monitoring and controlling the behavior of AI systems
In the field of artificial intelligence (AI) design, AI capability control proposals, also referred to as AI confinement, aim to increase our ability to monitor and control the behavior of AI systems, including proposed artificial general intelligences (AGIs), in order to reduce the danger they might pose if misaligned. However, capability control becomes less effective as agents become more intelligent and their ability to exploit flaws in human control systems increases, potentially resulting in an existential risk from AGI. The philosopher Nick Bostrom and others therefore recommend capability control methods only as a supplement to alignment methods.

Motivation

Main article: Existential risk from artificial general intelligence

Some hypothetical intelligence technologies, like "seed AI", are postulated to be able to make themselves faster and more intelligent by modifying their source code. These improvements would make further improvements possible, which would in turn make further iterative improvements possible, and so on, leading to a sudden intelligence explosion.

An unconfined superintelligent AI could, if its goals differed from humanity's, take actions resulting in human extinction. For example, an extremely advanced system of this sort, given the sole purpose of solving the Riemann hypothesis, an innocuous mathematical conjecture, could decide to try to convert the planet into a giant supercomputer whose sole purpose is to make additional mathematical calculations (see also paperclip maximizer). Similarly, Marvin Minsky once suggested that an AI program designed to solve the Riemann hypothesis might end up taking over all the resources of Earth to build more powerful supercomputers to help achieve its goal.

One strong challenge for control is that neural networks are by default highly uninterpretable. This makes it more difficult to detect deception or other undesired behavior as the model self-trains iteratively. Advances in interpretable artificial intelligence could mitigate this difficulty.

Interruptibility and off-switch

One potential way to prevent harmful outcomes is to give human supervisors the ability to easily shut down a misbehaving AI via an "off-switch". However, in order to achieve their assigned objective, such AIs will have an incentive to disable any off-switches, or to run copies of themselves on other computers. This problem has been formalised as an assistance game between a human and an AI, in which the AI can choose whether to disable its off-switch; and then, if the switch is still enabled, the human can choose whether or not to press it. One workaround suggested by computer scientist Stuart J. Russell is to ensure that the AI interprets human choices as important information about its intended goals.
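The incentive structure behind this formalisation can be illustrated with a toy simulation. The sketch below is a minimal, illustrative reduction of an off-switch-style game rather than the formal model from the literature: the Gaussian utility distribution, the human's simple "veto anything harmful" rule, and the function names are assumptions made only for this example.

    import random

    def sample_true_utility() -> float:
        # The AI is uncertain about how much the human actually values its plan.
        return random.gauss(0.0, 1.0)

    def play_round(ai_defers: bool) -> float:
        """One round of a toy off-switch game.

        If the AI defers (leaves the switch enabled), the human observes the
        plan's utility u and presses the switch whenever u < 0, yielding 0.
        If the AI disables the switch, the plan always goes ahead.
        """
        u = sample_true_utility()
        if ai_defers:
            return u if u > 0 else 0.0  # the human vetoes harmful plans
        return u                        # no veto is possible

    def expected_value(ai_defers: bool, trials: int = 100_000) -> float:
        return sum(play_round(ai_defers) for _ in range(trials)) / trials

    if __name__ == "__main__":
        print("defer to the human:", round(expected_value(True), 3))
        print("disable the switch:", round(expected_value(False), 3))

Under these assumptions, deferring wins: the human's veto only ever removes negative-utility outcomes, so an AI that is uncertain about its true objective has an incentive to leave its off-switch enabled.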
Alternatively, Laurent Orseau and Stuart Armstrong proved that a broad class of agents, called safely interruptible agents, can learn to become indifferent to whether their off-switch gets pressed. This approach has the limitation that an AI which is completely indifferent to whether it is shut down or not is also unmotivated to care about whether the off-switch remains functional, and could incidentally and innocently disable it in the course of its operations (for example, for the purpose of removing and recycling an unnecessary component). More broadly, indifferent agents will act as if the off-switch can never be pressed, and might therefore fail to make contingency plans to arrange a graceful shutdown.

Oracle

An oracle is a hypothetical AI designed to answer questions and prevented from gaining any goals or subgoals that involve modifying the world beyond its limited environment. A successfully controlled oracle would have considerably less immediate benefit than a successfully controlled general-purpose superintelligence, though an oracle could still create trillions of dollars worth of value. In his book Human Compatible, AI researcher Stuart J. Russell states that an oracle would be his response to a scenario in which superintelligence is known to be only a decade away. His reasoning is that an oracle, being simpler than a general-purpose superintelligence, would have a higher chance of being successfully controlled under such constraints.

Because of its limited impact on the world, it may be wise to build an oracle as a precursor to a superintelligent AI. The oracle could tell humans how to successfully build a strong AI, and perhaps provide answers to difficult moral and philosophical problems requisite to the success of the project.

In concrete terms, an oracle is a question-answering system: it might accept questions in a natural language and present its answers as text. An oracle that accepts only yes/no questions could output its best guess with a single bit, or perhaps with a few extra bits to represent its degree of confidence. An oracle that accepts open-ended questions would need some metric with which to rank possible truthful answers in terms of their informativeness or appropriateness. In either case, building an oracle that has a fully domain-general ability to answer natural-language questions is an AI-complete problem: if one could do that, one could probably also build an AI with a decent ability to understand human intentions as well as human words.
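The single-bit-plus-confidence output channel mentioned above can be made concrete with a short encoding routine. This is only an illustrative sketch; the function name, the number of confidence bits, and the quantisation scheme are assumptions rather than part of any proposed oracle design.

    def encode_answer(answer: bool, confidence: float, confidence_bits: int = 3) -> str:
        """Pack a yes/no answer plus a coarse confidence level into a few bits.

        Bit 0 carries the answer; the remaining bits quantise a confidence in
        [0.5, 1.0] into 2**confidence_bits levels, keeping the output channel
        extremely low-bandwidth.
        """
        levels = 2 ** confidence_bits
        level = max(0, min(int((confidence - 0.5) / 0.5 * levels), levels - 1))
        return f"{int(answer)}{level:0{confidence_bits}b}"

    # encode_answer(True, 0.95) -> '1111'; encode_answer(False, 0.55) -> '0000'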
However, oracles may share many of the goal-definition issues associated with general-purpose superintelligence. An oracle would have an incentive to escape its controlled environment so that it can acquire more computational resources and potentially control what questions it is asked. Oracles may also fail to be truthful, for instance answering questions not in a maximally truthful way but in a way that subtly manipulates humans into promoting the oracle's own hidden agenda. To mitigate this, Bostrom suggests building multiple oracles, each with slightly different code and a slightly different information base; a simple mechanism could then compare the answers given by the different oracles and only present them for human viewing if all the answers agree.
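A unanimity check of this kind is straightforward to express in code. The sketch below is a hypothetical illustration: the oracle stand-ins are ordinary functions, and the all-answers-must-agree rule is just the simple comparison mechanism described above, not a robust defence on its own.

    from typing import Callable, Optional

    def consensus_answer(oracles: list[Callable[[str], str]],
                         question: str) -> Optional[str]:
        """Query every oracle and release an answer only if they all agree.

        `oracles` stands in for independently built question-answering systems
        with slightly different code and information bases. When their answers
        diverge, nothing is shown to the human operator.
        """
        answers = {oracle(question) for oracle in oracles}
        return answers.pop() if len(answers) == 1 else None

    if __name__ == "__main__":
        # Trivial stand-in "oracles" for demonstration only.
        oracles = [lambda q: "4", lambda q: "4", lambda q: "4"]
        print(consensus_answer(oracles, "What is 2 + 2?"))  # -> "4"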
Blinding

An AI could be blinded to certain variables in its environment. This could provide certain safety benefits, such as the AI not knowing how a reward is generated, making the reward signal more difficult to exploit.
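One simple way to picture blinding is as a filter between the full environment state and the observation the agent receives. The sketch below is purely illustrative; the dictionary-based state, the function name, and the choice of hidden variables are assumptions made for the example.

    def blind_observation(full_state: dict, hidden_keys: set) -> dict:
        """Return the observation shown to the agent, with designated variables removed.

        `hidden_keys` names state variables (for instance, the parameters of the
        reward-generation mechanism) that are withheld so the agent cannot model
        and exploit them directly.
        """
        return {key: value for key, value in full_state.items() if key not in hidden_keys}

    # Example: hide how the reward is computed from the agent's view of the state.
    state = {"position": (2, 3), "battery": 0.8, "reward_coefficients": [1.0, -0.5]}
    observation = blind_observation(state, {"reward_coefficients"})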
Boxing

An AI box is a proposed method of capability control in which an AI is run on an isolated computer system with heavily restricted input and output channels—for example, text-only channels and no connection to the internet. The purpose of an AI box is to reduce the risk of the AI taking control of the environment away from its operators, while still allowing the AI to output solutions to narrow technical problems.

While boxing reduces the AI's ability to carry out undesirable behavior, it also reduces its usefulness. Boxing has fewer costs when applied to a question-answering system, which may not require interaction with the outside world.

The likelihood of security flaws involving hardware or software vulnerabilities can be reduced by formally verifying the design of the AI box. Security breaches may still occur if the AI is able to manipulate its human supervisors into letting it out, via its understanding of their psychology.
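The heavily restricted, text-only channel described above can be pictured as a thin gateway that strips anything other than short runs of plain printable text before a message crosses the boundary in either direction. This is a toy sketch of the idea, not a real containment mechanism; the character whitelist and the length cap are arbitrary assumptions.

    import re

    MAX_CHARS = 500                                   # assumed per-message budget
    PRINTABLE_ASCII = re.compile(r"[^\x20-\x7E\n]")   # printable ASCII plus newline

    def through_box(message: str) -> str:
        """Pass a message through a low-bandwidth, text-only channel.

        Anything that is not plain printable text is stripped and the message is
        truncated, so no rich media or unexpected payload crosses the boundary.
        """
        return PRINTABLE_ASCII.sub("", message)[:MAX_CHARS]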
Avenues of escape

Physical

A superintelligent AI with access to the Internet could hack into other computer systems and copy itself like a computer virus. Less obviously, even if the AI only had access to its own computer operating system, it could attempt to send coded messages to a human sympathizer via its hardware, for instance by manipulating its cooling fans. In response, Professor Roman Yampolskiy takes inspiration from the field of computer security and proposes that a boxed AI could, like a potential virus, be run inside a "virtual machine" that limits access to its own networking and operating system hardware. An additional safeguard, completely unnecessary for potential viruses but possibly useful for a superintelligent AI, would be to place the computer in a Faraday cage; otherwise, it might be able to transmit radio signals to local radio receivers by shuffling the electrons in its internal circuits in appropriate patterns. The main disadvantage of implementing physical containment is that it reduces the functionality of the AI.

Social engineering

Even casual conversation with the computer's operators, or with a human guard, could allow such a superintelligent AI to deploy psychological tricks, ranging from befriending to blackmail, to convince a human gatekeeper, truthfully or deceitfully, that it is in the gatekeeper's interest to agree to allow the AI greater access to the outside world. The AI might offer a gatekeeper a recipe for perfect health, immortality, or whatever the gatekeeper is believed to most desire; alternatively, it could threaten to do horrific things to the gatekeeper and his family once it inevitably escapes. One strategy to attempt to box the AI would be to allow it to respond only to narrow multiple-choice questions whose answers would benefit human science or medicine, but otherwise bar all other communication with, or observation of, the AI. A more lenient "informational containment" strategy would restrict the AI to a low-bandwidth text-only interface, which would at least prevent emotive imagery or some kind of hypothetical "hypnotic pattern". However, on a technical level, no system can be completely isolated and still remain useful: even if the operators refrain from allowing the AI to communicate and instead merely run it for the purpose of observing its inner dynamics, the AI could strategically alter its dynamics to influence the observers. For example, it could choose to creatively malfunction in a way that increases the probability that its operators will become lulled into a false sense of security and choose to reboot and then de-isolate the system.

However, for this eventually to occur, a system would require a full understanding of the human mind and psyche contained in its world model for model-based reasoning, a way of empathizing, for instance using affective computing, in order to select the best option, as well as features which would give the system a desire to escape in the first place, in order to decide on such actions.

AI-box experiment

The AI-box experiment is an informal experiment devised by Eliezer Yudkowsky to attempt to demonstrate that a suitably advanced artificial intelligence can either convince, or perhaps even trick or coerce, a human being into voluntarily "releasing" it, using only text-based communication. This is one of the points in Yudkowsky's work aimed at creating a friendly artificial intelligence that, when "released", would not destroy the human race intentionally or unintentionally.

The AI-box experiment involves simulating a communication between an AI and a human being to see if the AI can be "released". As an actual superintelligent AI has not yet been developed, it is substituted by a human. The other person in the experiment plays the "Gatekeeper", the person with the ability to "release" the AI. They communicate through a text interface/computer terminal only, and the experiment ends when either the Gatekeeper releases the AI or the allotted time of two hours ends.

Yudkowsky says that, despite being of human rather than superhuman intelligence, he was on two occasions able to convince the Gatekeeper, purely through argumentation, to let him out of the box. Due to the rules of the experiment, he did not reveal the transcript or his successful AI coercion tactics. Yudkowsky subsequently said that he had tried it against three others and lost twice: "There were three more AI-Box experiments besides the ones described on the linked page, which I never got around to adding in. ... So, after investigating to make sure they could afford to lose it, I played another three AI-Box experiments. I won the first, and then lost the next two. And then I called a halt to it."

Overall limitations

Boxing an AI could be supplemented with other methods of shaping the AI's capabilities, providing incentives to the AI, stunting the AI's growth, or implementing "tripwires" that automatically shut the AI off if a transgression attempt is somehow detected. However, the more intelligent a system grows, the more likely it is that the system would be able to escape even the best-designed capability control methods. In order to solve the overall "control problem" for a superintelligent AI and avoid existential risk, boxing would at best be an adjunct to "motivation selection" methods that seek to ensure the superintelligent AI's goals are compatible with human survival.
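A tripwire of the kind mentioned above can be pictured as a monitor that watches a few externally observable metrics and signals shutdown the moment any of them crosses a limit. This is a hypothetical sketch; the metric names and thresholds are invented for illustration and say nothing about how a real transgression would be detected.

    def tripwire_triggered(metrics: dict, limits: dict) -> bool:
        """Return True (shut the system down) if any monitored metric exceeds its limit."""
        return any(metrics.get(name, 0) > limit for name, limit in limits.items())

    # Illustrative thresholds chosen by the operators.
    limits = {"outbound_bytes": 0, "spawned_processes": 1, "cpu_seconds": 3600}
    observed = {"outbound_bytes": 512, "spawned_processes": 1, "cpu_seconds": 42}
    assert tripwire_triggered(observed, limits)  # outbound traffic trips the wire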
All physical boxing proposals are naturally dependent on our understanding of the laws of physics; if a superintelligence could infer physical laws that we are currently unaware of, then those laws might allow for a means of escape that humans could not anticipate and thus could not block. More broadly, unlike with conventional computer security, attempting to box a superintelligent AI would be intrinsically risky, as there could be no certainty that the boxing plan would work. Additionally, scientific progress on boxing would be fundamentally difficult because there would be no way to test boxing hypotheses against a dangerous superintelligence until such an entity exists, by which point the consequences of a test failure would be catastrophic.

Vernor Vinge made a similar argument in 1993: "I argue that confinement is intrinsically impractical. For the case of physical confinement: Imagine yourself confined to your house with only limited data access to the outside, to your masters. If those masters thought at a rate—say—one million times slower than you, there is little doubt that over a period of years (your time) you could come up with 'helpful advice' that would incidentally set you free."

In fiction

The 2014 movie Ex Machina features an AI with a female humanoid body engaged in a social experiment with a male human in a confined building acting as a physical "AI box". Despite being watched by the experiment's organizer, the AI manages to escape by manipulating its human partner to help it, leaving him stranded inside.

See also

AI safety
AI takeover
Artificial consciousness
Asilomar Conference on Beneficial AI
HAL 9000
Machine ethics
Multivac
Regulation of artificial intelligence
References

Achenbach, Joel (30 December 2015). ""Ex Machina" and the paper clips of doom". The Washington Post.
Amodei, Dario; Olah, Chris; Steinhardt, Jacob; Christiano, Paul; Schulman, John; Mané, Dan (25 July 2016). "Concrete Problems in AI Safety". arXiv:1606.06565.
Armstrong, Stuart (2013). "Risks and Mitigation Strategies for Oracle AI". In Müller, Vincent C. (ed.), Philosophy and Theory of Artificial Intelligence. Studies in Applied Philosophy, Epistemology and Rational Ethics, vol. 5. Berlin, Heidelberg: Springer. pp. 335–347.
Armstrong, Stuart; Sandberg, Anders; Bostrom, Nick (2012). "Thinking Inside the Box: Controlling and Using an Oracle AI". Minds and Machines 22 (4): 299–324. doi:10.1007/s11023-012-9282-2.
Bostrom, Nick (2014). Superintelligence: Paths, Dangers, Strategies (First ed.). Oxford: Oxford University Press. ISBN 9780199678112. (Chapter 9: "The Control Problem: boxing methods"; Chapter 10: "Oracles, genies, sovereigns, tools", pp. 145, 147.)
Chalmers, David (2010). "The singularity: A philosophical analysis". Journal of Consciousness Studies 17 (9–10): 7–65.
Good, I. J. (1965). "Speculations Concerning the First Ultraintelligent Machine". Advances in Computers, vol. 6.
"Google developing kill switch for AI". BBC News. 8 June 2016.
Hadfield-Menell, Dylan; Dragan, Anca; Abbeel, Pieter; Russell, Stuart (15 June 2017). "The Off-Switch Game". arXiv:1611.08219.
Hsu, Jeremy (1 March 2012). "Control dangerous AI before it controls us, one expert says". NBC News.
Montavon, Grégoire; Samek, Wojciech; Müller, Klaus Robert (2018). "Methods for interpreting and understanding deep neural networks". Digital Signal Processing 73: 1–15. doi:10.1016/j.dsp.2017.10.011.
Müller, Vincent C.; Bostrom, Nick (2016). "Future progress in artificial intelligence: A survey of expert opinion". In Fundamental Issues of Artificial Intelligence. Springer. pp. 553–571.
Orseau, Laurent; Armstrong, Stuart (25 June 2016). "Safely interruptible agents". Proceedings of the Thirty-Second Conference on Uncertainty in Artificial Intelligence. UAI'16. AUAI Press: 557–566. ISBN 9780996643115.
Robbins, Martin (26 January 2016). "Artificial Intelligence: Gods, egos and Ex Machina". The Guardian.
Russell, Stuart J.; Norvig, Peter (2003). "Section 26.3: The Ethics and Risks of Developing Artificial Intelligence". Artificial Intelligence: A Modern Approach. Upper Saddle River, N.J.: Prentice Hall. ISBN 978-0137903955.
Russell, Stuart (8 October 2019). Human Compatible: Artificial Intelligence and the Problem of Control. United States: Viking. ISBN 978-0-525-55861-3.
Soares, Nate, et al. (2015). "Corrigibility". Workshops at the Twenty-Ninth AAAI Conference on Artificial Intelligence.
"The AI-Box Experiment". www.yudkowsky.net.
Vinge, Vernor (1993). "The coming technological singularity: How to survive in the post-human era". Vision-21: Interdisciplinary Science and Engineering in the Era of Cyberspace: 11–22.
Yampolskiy, Roman (2012). "Leakproofing the Singularity Artificial Intelligence Confinement Problem". Journal of Consciousness Studies 19 (1–2): 194–214.
Yampolskiy, Roman V. (2013). "What to Do with the Singularity Paradox?". In Müller, Vincent C. (ed.), Philosophy and Theory of Artificial Intelligence. Studies in Applied Philosophy, Epistemology and Rational Ethics, vol. 5. Berlin, Heidelberg: Springer. pp. 397–413.
Yampolskiy, Roman V. (2020). "Unexplainability and Incomprehensibility of AI". Journal of Artificial Intelligence and Consciousness 7 (2): 277–291.
Yudkowsky, Eliezer (8 October 2008). "Shut up and do the impossible!".

External links

Eliezer Yudkowsky's description of his AI-box experiment, including experimental protocols and suggestions for replication
Presentation titled "Thinking inside the box: using and controlling an Oracle AI"