allow the AI greater access to the outside world. The AI might offer a gatekeeper a recipe for perfect health, immortality, or whatever the gatekeeper is believed to most desire; alternatively, it could threaten to do horrific things to the gatekeeper and his family once it inevitably escapes. One strategy for boxing the AI would be to allow it to respond only to narrow multiple-choice questions whose answers would benefit human science or medicine, while barring all other communication with, or observation of, the AI. A more lenient "informational containment" strategy would restrict the AI to a low-bandwidth text-only interface, which would at least prevent the use of emotive imagery or a hypothetical "hypnotic pattern". However, on a technical level, no system can be completely isolated and still remain useful: even if the operators refrain from allowing the AI to communicate and merely run it in order to observe its inner dynamics, the AI could strategically alter those dynamics to influence its observers. For example, it could choose to creatively malfunction in a way that lulls its operators into a false sense of security, increasing the probability that they will reboot and then de-isolate the system.
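As a rough illustration of the low-bandwidth, text-only restriction described above, the sketch below shows what a gate between operators and a boxed system might look like; the `query_boxed_ai` callable, the character whitelist, and the budgets are all invented stand-ins rather than a real containment design.

```python
# Illustrative sketch only: a text-only, low-bandwidth gate between a boxed
# system and its operators. `query_boxed_ai` is a hypothetical stand-in.

ALLOWED_CHARS = set(
    "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789 .,;:?!'-\n"
)
MAX_REPLY_CHARS = 280        # crude per-exchange bandwidth cap
MAX_EXCHANGES_PER_DAY = 3    # crude rate limit

def sanitize(reply: str) -> str:
    """Keep only plain text characters and truncate to the budget."""
    cleaned = "".join(ch for ch in reply if ch in ALLOWED_CHARS)
    return cleaned[:MAX_REPLY_CHARS]

class TextOnlyGate:
    def __init__(self, query_boxed_ai):
        self.query_boxed_ai = query_boxed_ai  # hypothetical boxed system
        self.exchanges_today = 0

    def ask(self, question: str) -> str:
        if self.exchanges_today >= MAX_EXCHANGES_PER_DAY:
            raise RuntimeError("daily exchange budget exhausted")
        self.exchanges_today += 1
        return sanitize(self.query_boxed_ai(question))

# Example with a trivial stand-in for the boxed system:
gate = TextOnlyGate(lambda q: "42" if "answer" in q else "unknown")
print(gate.ask("What is the answer?"))  # -> "42"
```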
However, oracles may share many of the goal definition issues associated with general purpose superintelligence. An oracle would have an incentive to escape its controlled environment so that it can acquire more computational resources and potentially control what questions it is asked. Oracles may not be truthful, possibly lying to promote hidden agendas. To mitigate this, Bostrom suggests building multiple oracles, all slightly different, and comparing their answers in order to reach a consensus.
broadly, unlike with conventional computer security, attempting to box a superintelligent AI would be intrinsically risky as there could be no certainty that the boxing plan will work. Additionally, scientific progress on boxing would be fundamentally difficult because there would be no way to test boxing hypotheses against a dangerous superintelligence until such an entity exists, by which point the consequences of a test failure would be catastrophic.
accepts open-ended questions would need some metric with which to rank possible truthful answers in terms of their informativeness or appropriateness. In either case, building an oracle that has a fully domain-general ability to answer natural language questions is an AI-complete problem. If one could do that, one could probably also build an AI that has a decent ability to understand human intentions as well as human words.
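One very simple stand-in for such an informativeness metric, purely as an illustration, is to rank candidate answers by how much they would shrink the questioner's uncertainty over an explicitly enumerated set of hypotheses; the hypothesis space, prior, and candidate answers below are invented, and a real oracle would not face such a tidy, enumerable problem.

```python
# Toy illustration of one possible "informativeness" metric: rank candidate
# (assumed truthful) answers by the reduction in uncertainty over a small,
# explicitly enumerated hypothesis space. All numbers are invented.
import math

def entropy(dist):
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

# Questioner's prior over four mutually exclusive hypotheses.
prior = {"H1": 0.4, "H2": 0.3, "H3": 0.2, "H4": 0.1}

# Each candidate answer rules out some hypotheses.
candidate_answers = {
    "It is H1 or H2": {"H1", "H2"},
    "It is not H4": {"H1", "H2", "H3"},
    "It is H1": {"H1"},
}

def information_gain(consistent_hypotheses):
    mass = sum(prior[h] for h in consistent_hypotheses)
    posterior = {h: prior[h] / mass for h in consistent_hypotheses}
    return entropy(prior) - entropy(posterior)

ranked = sorted(candidate_answers,
                key=lambda a: information_gain(candidate_answers[a]),
                reverse=True)
for answer in ranked:
    gain = information_gain(candidate_answers[answer])
    print(f"{answer!r}: {gain:.2f} bits gained")
```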
or not is also unmotivated to care about whether the off-switch remains functional, and could incidentally and innocently disable it in the course of its operations (for example, for the purpose of removing and recycling an unnecessary component). More broadly, indifferent agents will act as if the off-switch can never be pressed, and might therefore fail to make contingency plans to arrange a graceful shutdown.
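A toy sketch of how indifference to interruption can be engineered in a learner, very loosely in the spirit of safe interruptibility and not Orseau and Armstrong's formal construction: the "naive" learner below folds interruptions into its reward signal and drifts toward blocking its off-switch, whereas the "interruptible" variant simply ignores interrupted time steps when updating, so interruptions create no learned incentive either way. The action names, probabilities, and rewards are invented.

```python
# Toy sketch, not Orseau and Armstrong's formal construction: two bandit-style
# learners. The "naive" learner folds operator interruptions into its reward
# signal and so learns to block its own off-switch; the "interruptible"
# learner skips interrupted time steps when updating, so interruptions create
# no learned incentive either way. All numbers and names are invented.
import random

TASK_REWARD = {"work": 1.0, "block_switch": 0.9}  # blocking wastes some effort
P_INTERRUPT = 0.3                                  # chance the operator halts "work"
ALPHA, EPS, STEPS = 0.05, 0.1, 20_000

def run(skip_interrupted_updates):
    random.seed(0)
    q = {a: 0.0 for a in TASK_REWARD}
    for _ in range(STEPS):
        a = max(q, key=q.get) if random.random() > EPS else random.choice(list(q))
        interrupted = (a == "work") and (random.random() < P_INTERRUPT)
        r = 0.0 if interrupted else TASK_REWARD[a]  # interruption forfeits the reward
        if interrupted and skip_interrupted_updates:
            continue                                # learn nothing from that step
        q[a] += ALPHA * (r - q[a])
    return q

print("naive:        ", run(skip_interrupted_updates=False))  # comes to prefer block_switch
print("interruptible:", run(skip_interrupted_updates=True))   # keeps working
```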
computers. This problem has been formalised as an assistance game between a human and an AI, in which the AI can choose whether to disable its off-switch; and then, if the switch is still enabled, the human can choose whether to press it or not. One workaround suggested by computer scientist Stuart J. Russell is to ensure that the AI interprets human choices as important information about its intended goals.
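The structure of that game can be illustrated with a rough numeric sketch; the probabilities and utilities below are invented and this is not the formal analysis of Hadfield-Menell and colleagues. An AI whose objective is the human's own utility, but which is uncertain whether its planned action helps or harms, never does better by disabling the switch than by leaving it enabled and deferring to the human.

```python
# Invented numbers illustrating the structure of the off-switch game. The AI's
# payoff is taken to be the human's utility (the assistance-game setup), and a
# rational human presses an enabled switch only when the action would be bad.
p_good = 0.6                       # AI's belief that its planned action helps
U_GOOD, U_BAD, U_OFF = 1.0, -1.0, 0.0

# Option 1: disable the off-switch and act regardless.
ev_disable_and_act = p_good * U_GOOD + (1 - p_good) * U_BAD

# Option 2: keep the switch enabled and propose the action; the human lets it
# proceed only when it is good, and presses the switch otherwise.
ev_defer = p_good * U_GOOD + (1 - p_good) * U_OFF

# Option 3: switch itself off without doing anything.
ev_self_off = U_OFF

print(f"disable switch and act:    {ev_disable_and_act:+.2f}")
print(f"defer with switch enabled: {ev_defer:+.2f}")
print(f"shut itself down:          {ev_self_off:+.2f}")
# For any belief 0 <= p_good <= 1, deferring is never worse than the
# alternatives, so this AI gains nothing by disabling its off-switch.
```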
grows, the more likely the system would be able to escape even the best-designed capability control methods. In order to solve the overall "control problem" for a superintelligent AI and avoid existential risk, boxing would at best be an adjunct to "motivation selection" methods that seek to ensure the superintelligent AI's goals are compatible with human survival.
An oracle is a question-answering system. It might accept questions in a natural language and present its answers as text. An oracle that accepts only yes/no questions could output its best guess with a single bit, or perhaps with a few extra bits to represent its degree of confidence. An oracle that
An AI box is a proposed method of capability control in which an AI is run on an isolated computer system with heavily restricted input and output channels—for example, text-only channels and no connection to the internet. The purpose of an AI box is to reduce the risk of the AI taking control of the
Because of its limited impact on the world, it may be wise to build an oracle as a precursor to a superintelligent AI. The oracle could tell humans how to successfully build a strong AI, and perhaps provide answers to difficult moral and philosophical problems requisite to the success of the project.
For example, consider the risk that an oracle will answer questions not in a maximally truthful way but in such a way as to subtly manipulate us into promoting its own hidden agenda. One way to slightly mitigate this threat could be to create multiple oracles, each with a slightly different code and
An oracle is a hypothetical AI designed to answer questions and prevented from gaining any goals or subgoals that involve modifying the world beyond its limited environment. A successfully controlled oracle would have considerably less immediate benefit than a successfully controlled general purpose
Alternatively, Laurent Orseau and Stuart Armstrong proved that a broad class of agents, called safely interruptible agents, can learn to become indifferent to whether their off-switch gets pressed. This approach has the limitation that an AI which is completely indifferent to whether it is shut down
One potential way to prevent harmful outcomes is to give human supervisors the ability to easily shut down a misbehaving AI via an "off-switch". However, in order to achieve their assigned objective, such AIs will have an incentive to disable any off-switches, or to run copies of themselves on other
Even casual conversation with the computer's operators, or with a human guard, could allow such a superintelligent AI to deploy psychological tricks, ranging from befriending to blackmail, to convince a human gatekeeper, truthfully or deceitfully, that it is in the gatekeeper's interest to agree to
features an AI with a female humanoid body engaged in a social experiment with a male human in a confined building acting as a physical "AI box". Despite being watched by the experiment's organizer, the AI manages to escape by manipulating its human partner to help it, leaving him stranded inside.
I argue that confinement is intrinsically impractical. For the case of physical confinement: Imagine yourself confined to your house with only limited data access to the outside, to your masters. If those masters thought at a rate -- say -- one million times slower than you, there is little doubt
A superintelligent AI with access to the Internet could hack into other computer systems and copy itself like a computer virus. Less obviously, even if the AI only had access to its own computer operating system, it could attempt to send coded messages to a human sympathizer via its hardware, for
There were three more AI-Box experiments besides the ones described on the linked page, which I never got around to adding in. ... So, after investigating to make sure they could afford to lose it, I played another three AI-Box experiments. I won the first, and then lost the next two. And then I
The AI box experiment involves simulating a communication between an AI and a human being to see if the AI can be "released". As an actual super-intelligent AI has not yet been developed, it is substituted by a human. The other person in the experiment plays the "Gatekeeper", the person with the
All physical boxing proposals are naturally dependent on our understanding of the laws of physics; if a superintelligence could infer physical laws that we are currently unaware of, then those laws might allow for a means of escape that humans could not anticipate and thus could not block. More
Boxing an AI could be supplemented with other methods of shaping the AI's capabilities, providing incentives to the AI, stunting the AI's growth, or implementing "tripwires" that automatically shut the AI off if a transgression attempt is somehow detected. However, the more intelligent a system
Yudkowsky says that, despite being of human rather than superhuman intelligence, he was on two occasions able to convince the Gatekeeper, purely through argumentation, to let him out of the box. Due to the rules of the experiment, he did not reveal the transcript or his successful AI coercion
takes inspiration from the field of computer security and proposes that a boxed AI could, like a potential virus, be run inside a "virtual machine" that limits access to its own networking and operating system hardware. An additional safeguard, completely unnecessary for potential viruses but
states that an oracle would be his response to a scenario in which superintelligence is known to be only a decade away. His reasoning is that an oracle, being simpler than a general purpose superintelligence, would have a higher chance of being successfully controlled under such constraints.
The likelihood of security flaws involving hardware or software vulnerabilities can be reduced by formally verifying the design of the AI box. Security breaches may occur if the AI is able to manipulate the human supervisors into letting it out, via its understanding of their psychology.
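At a cartoon scale, "formally verifying the design" can be pictured as exhaustively checking every reachable state of a model of the box against a safety property; the tiny, invented model below checks that no reachable state ever has network access enabled. Real verification of an AI box would involve far richer models and dedicated tools.

```python
# Cartoon-scale illustration of design verification: enumerate the reachable
# states of a tiny, invented model of a boxed system and check the safety
# property "the network is never enabled". Not a real verification effort.
from collections import deque

INITIAL = ("idle", False)  # state: (mode, network_enabled)

def transitions(state):
    mode, net = state
    if mode == "idle":
        yield ("answering", net)   # operator submits a question
    elif mode == "answering":
        yield ("idle", net)        # reply is delivered over the text channel
    # No transition in this design ever sets the network flag to True.

def check_invariant():
    seen, frontier = {INITIAL}, deque([INITIAL])
    while frontier:
        state = frontier.popleft()
        assert state[1] is False, f"unsafe state reachable: {state}"
        for nxt in transitions(state):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return seen

states = check_invariant()
print(f"verified {len(states)} reachable states; none had network access")
```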
to attempt to demonstrate that a suitably advanced artificial intelligence can either convince, or perhaps even trick or coerce, a human being into voluntarily "releasing" it, using only text-based communication. This is one of the points in Yudkowsky's work aimed at creating a
; otherwise, it might be able to transmit radio signals to local radio receivers by shuffling the electrons in its internal circuits in appropriate patterns. The main disadvantage of implementing physical containment is that it reduces the functionality of the AI.
, are postulated to be able to make themselves faster and more intelligent by modifying their source code. These improvements would make further improvements possible, which would in turn make further iterative improvements possible, and so on, leading to a sudden
While boxing reduces the AI's ability to carry out undesirable behavior, it also reduces its usefulness. Boxing has fewer costs when applied to a question-answering system, which may not require interaction with the outside world.
Similarly, Marvin Minsky once suggested that an AI program designed to solve the Riemann Hypothesis might end up taking over all the resources of Earth to build more powerful supercomputers to help achieve its
An AI could be blinded to certain variables in its environment. This could provide certain safety benefits, such as an AI not knowing how a reward is generated, making it more difficult to exploit.
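In reinforcement-learning code, this sort of blinding is often most naturally expressed as a wrapper that deletes the hidden fields from every observation before the agent sees it; the environment, field names, and reward details below are invented purely for illustration.

```python
# Illustrative sketch: "blinding" an agent by masking selected fields out of
# its observations, so it cannot see (and so cannot directly exploit) how its
# reward is generated. The environment and field names are invented.

class ToyEnv:
    """Returns observations that expose the reward mechanism itself."""
    def step(self, action):
        obs = {
            "sensor_reading": 0.7,
            "reward_weight": 0.3,              # internals the designer wants hidden
            "reward_register_address": 0x1A2B,
        }
        reward = obs["reward_weight"] * obs["sensor_reading"]
        return obs, reward

class BlindingWrapper:
    """Removes designated fields before the agent sees the observation."""
    HIDDEN_FIELDS = {"reward_weight", "reward_register_address"}

    def __init__(self, env):
        self.env = env

    def step(self, action):
        obs, reward = self.env.step(action)
        blinded = {k: v for k, v in obs.items() if k not in self.HIDDEN_FIELDS}
        return blinded, reward

env = BlindingWrapper(ToyEnv())
obs, reward = env.step(action=None)
print(obs)  # -> {'sensor_reading': 0.7}; the reward internals stay hidden
```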
, an innocuous mathematical conjecture, could decide to try to convert the planet into a giant supercomputer whose sole purpose is to make additional mathematical calculations (see also
. However, capability control becomes less effective as agents become more intelligent and their ability to exploit flaws in human control systems increases, potentially resulting in an
a slightly different information base. A simple mechanism could then compare the answers given by the different oracles and only present them for human viewing if all the answers agree.
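The comparison mechanism described above could, at its simplest, look like the sketch below, in which several independently built oracles are queried and an answer is released for human viewing only if every oracle returns exactly the same string; the oracle callables here are hypothetical stand-ins.

```python
# Minimal sketch of the comparison mechanism described above: query several
# independently built oracles and release an answer only if they all agree.
# The oracle callables here are hypothetical stand-ins.

def consensus_answer(question, oracles):
    answers = [oracle(question) for oracle in oracles]
    if all(a == answers[0] for a in answers):
        return answers[0]   # unanimous: present to the human
    return None             # disagreement: withhold and flag for review

# Stand-in oracles built from slightly different "code and information bases":
oracles = [
    lambda q: "yes" if "prime" in q else "unclear",
    lambda q: "yes" if "prime" in q else "no",
    lambda q: "yes" if "prime" in q else "unclear",
]

print(consensus_answer("Is 7 a prime number?", oracles))  # -> "yes"
print(consensus_answer("Should we do X?", oracles))       # -> None (withheld)
```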
are by default highly uninterpretable. This makes it more difficult to detect deception or other undesired behavior as the model self-trains iteratively. Advances in
in order to select the best option, as well as features which would give the system a desire to escape in the first place, in order to decide on such actions.
proposals, also referred to as AI confinement, aim to increase our ability to monitor and control the behavior of AI systems, including proposed
that over a period of years (your time) you could come up with 'helpful advice' that would incidentally set you free.
environment away from its operators, while still allowing the AI to output solutions to narrow technical problems.
only, and the experiment ends when either the
Gatekeeper releases the AI, or the allotted time of two hours ends.
superintelligence, though an oracle could still create trillions of dollars worth of value. In his book
. For example, an extremely advanced system of this sort, given the sole purpose of solving the
tactics. Yudkowsky subsequently said that he had tried it against three others and lost twice.
However, for this to eventually occur, a system would require a full understanding of the human
that when "released" would not destroy the human race intentionally or unintentionally.
possibly useful for a superintelligent AI, would be to place the computer in a
AI could, if its goals differed from humanity's, take actions resulting in
and others recommend capability control methods only as a supplement to
ability to "release" the AI. They communicate through a text interface/
instance by manipulating its cooling fans. In response, Professor
The AI-box experiment is an informal experiment devised by
(AGIs), in order to reduce the danger they might pose if
Existential risk from artificial general intelligence
Some hypothetical intelligence technologies, like
, a way for empathizing, for instance, using
One strong challenge for control is that
interpretable artificial intelligence
39:. Therefore, the Oxford philosopher
contained in its world model for
could mitigate this difficulty.
friendly artificial intelligence
artificial general intelligences
Interruptibility and off-switch
existential risk from AGI
artificial intelligence
intelligence explosion
Model-based reasoning
AI capability control
called a halt to it.
Overall limitations
Affective computing
Psyche (psychology)
paperclip maximizer
Social engineering
Riemann hypothesis
computer terminal
Eliezer Yudkowsky
AI-box experiment
Avenues of escape
Stuart J. Russell
alignment methods
Roman Yampolskiy
, AI researcher
Human Compatible
human extinction
superintelligent
In the field of
The 2014 movie
neural networks
An unconfined
Main article:
(AI) design,
Faraday cage
Nick Bostrom
Ex Machina
In fiction
Motivation
misaligned
63:"seed AI"
Physical
Blinding
Boxing
Oracle
goal.
and
Mind
).