Letter from Utopia: Talking to Nick Bostrom

This conversation, transcribed by Phoebe Kaufmann, focuses on Nick Bostrom’s book Superintelligence: Paths, Dangers, Strategies. It was reading Superintelligence’s meticulous, cosmos-encompassing thought experiments, with Bostrom’s lucid prose calmly outlining unprecedented urgencies posed by existential-risk scenarios, that made me want to explore literary aspects of public-intellectual practice in the first place. Bostrom is Professor at Oxford University, where he is founding Director of the Future of Humanity Institute, and directs the Strategic Artificial Intelligence Research Center. His 200-plus publications include the books Anthropic Bias (Routledge, 2002), Global Catastrophic Risks (Oxford University Press, 2008), and Human Enhancement (Oxford University Press, 2009). Bostrom has an intellectual background in physics, computational neuroscience, mathematical logic, and philosophy. He has been listed on Foreign Policy’s Top 100 Global Thinkers list, and on Prospect magazine’s World Thinkers list. Here Bostrom and I discuss applications of his book across any number of fields, from history to philosophy to public policy to practices of everyday life (both now and in millennia to come).

ANDY FITCH: If we start from a working definition of superintelligence as “any intellect that greatly exceeds the cognitive performance of humans in virtually all domains of interest,” and if we posit this superintelligence’s capacities to include an ability to learn, to deal with uncertainty and to calculate complex probabilistic information (perhaps also to assimilate sensory data and to pursue intuitive reasoning), and if we conceive of ourselves as fitfully approaching an intelligence explosion, one in which intelligent machines will design with increasing skill and speed ever more intelligent machines, and if we hypothesize the potential for this self-furthering machine intelligence to attain strategic dominance over us (the way that we possess such dominance over other species), and we recognize the potential existential risk within such a takeoff scenario (a situation which we must manage with great deftness on our first and only try), we can begin to trace the complicated rhetorical vector of this book — which apparently seeks both to foreground a great urgency, and to counsel a sober, cautious, carefully coordinated plan of long-term deliberative action. So, as we begin to outline Superintelligence’s broader arguments, could you also discuss its dexterous efforts at combining a call to public alarm and a proactive, context-shaping, transdisciplinary (philosophical, scientific, policy-oriented) blueprint for calm, clear, perspicacious decision-making at the highest levels? What types of anticipated and/or desired responses, from which types of readers, shaped your rhetorical calculus for this book?

NICK BOSTROM: I guess the answer is somewhat complex. There was a several-fold objective. One objective was to bring more attention to bear on the idea that if AI research were to succeed in its original ambition, this would be arguably the most important event in all of human history, and could be associated with an existential risk that should be taken seriously.

Another goal was to try to make some progress on this problem, such that after this progress had been made, people could see more easily specific research projects to pursue. It’s one thing to think, “If machines become superintelligent, they could be very powerful, they could be risky.” But where do you go from there? How do you actually start to make progress on the control problem? How could you produce academic research on this topic? So to begin to break down this big problem into smaller problems, to develop the concepts that you need in order to start thinking about this, to do some of that intellectual groundwork was the second objective.

The third objective was just to fill in the picture in general for people who want to have more realistic views about what the future of humanity might look like, so that we can, perhaps, prioritize more wisely the scarce amount of research and attention that focuses on securing our long-term global future.

Today, I would think of the first of these objectives as having been achieved. There is now much more attention focused on this problem, and many people (by no means all people) now take it seriously, including some technical people, some funders, and some other influential people in the AI world. Today, it’s not so much that the area needs more attention, that there needs to be a higher level of concern. The challenge is more to channel this in a constructive direction. Over the last couple of years the technical research agenda has emerged, so on this alignment problem the goal now is to ramp that up, to recruit some really bright researchers to start working on that, and to make sure it proceeds in the right direction. In parallel now, we need to start thinking about the policies and political critiques that arise or that will arise as we move closer towards this destination.

Basically, the approach was to try to lay out the issues as clearly as I could in the way I saw them. I didn’t really have a target audience in mind when I wrote the book. I was kind of thinking of the target audience as an earlier version of myself: asking what I would have found useful, and then whether that would help other people. But as the conversation proceeds, I think that there is a balance that needs to be struck. It’s key for the AI-development community to be on the same side as the AI-safety community. The ideal is that these will fuse into just one community. That requires avoiding this obvious failure scenario, which, fortunately, has not yet materialized. But you could imagine, in another parallel universe, the AI-development community feeling threatened that they are being painted as villains, as doing something dangerous. Then they might begin to close ranks and to deny that there could be any risk, so as not to give ammunition to the fear-mongers. That scenario would have made a dialogue impossible. That has not happened, but I think the possibility that the conversation could run off the tracks amid some adversarial dynamic remains a concern, so preventing that from happening remains a priority.

To give one concrete example addressing human-generated existential risk, could we take the more recent advent of nuclear technology as a case study for how humans have successfully and/or unsuccessfully addressed globe-altering technological breakthroughs? I guess that my bigger, slightly more desperate question might ask for historical examples (if any exist) of humans developing a coherent, comprehensive safety regime prior to the deployment of a potentially destructive technology. Given the central importance that you place upon cultivating an appropriate sequence of social, ethical, theoretical and scientific growth related to the emergence of AI, I would love to hear that, at least once in the past, humans have prioritized such a cautious implementation, and have gotten it right. And nuclear technology’s disruptive emergence, amid the extremely volatile political context of World War II, at least provides some equivalent tensions. In terms of an adversarial dynamic, for instance: at one point nuclear technology might have seemed to promise an epochal eclipse equivalent to an AI explosion, might have prompted its opponents to envision impending martial armageddon, and its enthusiastic proponents to herald controlled nuclear reactors as civilization’s last requisite technical achievement — unleashing a previously unfathomable overabundance of energy, and rendering humanity’s competition for limited resources obsolete. Instead, 70 years later, amid enduring (yet so far survivable) global hostilities and inequities, amid the ongoing presence of conflicted social attitudes towards nuclear technology, we find that neither the most negative nor the most positively inclined predictions have held up. What insight can this legacy offer to considerations of AI, and what would make AI’s emergence qualitatively harder for humans to muddle through? Why, amid ongoing technological revolutions, would capacities for unintended destruction so outpace capacities for control?

It’s certainly true that the human species has not gone extinct…yet. The question then is: what can we conclude from that? I don’t think we can conclude that the possibility of the human species going extinct will forever remain negligible. I would say in particular that if there are existential risks, they are likely to come from new technological capabilities, precisely because we don’t have any track record of surviving those, whereas we have a long track record of surviving threats from nature: volcanic eruptions, asteroid impacts, fires, storms. Now, with regard to nuclear technology, there are some similarities and some dissimilarities. There were attempts at the very early stages…the physicist Leo Szilard was the first to realize the possibility of a fission chain-reaction. He eventually realized that it would be possible to create a nuclear explosion, and he then started thinking about what he should do with this insight. He began going around to some of his physicist colleagues, trying to persuade them not to publish more on nuclear fission, because he could see that if it became possible to release enormous amounts of energy from the atom, then that could make it possible to create very powerful weapons.

And then at some point he roped in his friend Enrico Fermi, and they met with Albert Einstein, who then wrote a letter to Franklin Roosevelt, urging Roosevelt to look into the possibility of designing nuclear weaponry. This was during the Second World War, so they were afraid that maybe Hitler would drop nuclear weapons first, which might then have given him the ability to win the war, with far-reaching consequences. So from that arose the Manhattan Project, the nuclear bomb, and, subsequently, the enormous arms build-up during the Cold War, with tens of thousands of nuclear warheads on hair-trigger alert. At some points, there even seemed to be cause for these weapons to be fired off. At the Future of Humanity Institute, we’ve hung pictures in several of our rooms of individuals whose actions seem to have helped avert nuclear war. Stanislav Petrov, for example, was a Russian nuclear officer at one of those early-warning facilities, who made a judgement call at one point which might have helped to prevent nuclear armageddon. There were a couple of those incidents.

So we were maybe somewhat lucky to get through this period unscathed — and this is with technology that is only 70 years old. We are not in the clear yet. There could be future crises. But the danger with nuclear weapons is primarily deliberate use for destructive purposes. It’s not accident risks (even though there are accident risks once you have, for military purposes, built up these huge arsenals). But the technology itself is relatively simple to control. In that respect, it’s different from machine superintelligence, where perhaps the biggest cause of concern is just that it might turn out to be very hard to get this kind of advanced AI to do what we plan for it to do. We certainly have never developed any similar technology that we could then derive reassurance from. Perhaps the most closely comparable instance would be the rise of Homo sapiens 100,000 years ago, which we know did not turn out too well for other hominoid species. The Neanderthals and Homo floresiensis were wiped out by our species.

So there’s some prima facie reason for concern if we are going to develop general intelligence that greatly outstrips our own, in the same way that ours greatly outstripped that of the great apes — enough of a prima facie concern to make it worthwhile to look in detail at what precisely could go wrong, and what we might do to prevent that. To make that stronger case, I think one has to look at the specifics that it takes a full book to describe. Most generally, the concern is that the transition goes wrong, and that instead of having some super interesting, super happy, super long-lived post-human civilizations spreading across the universe, you get, say, just endless paperclips, or some machine that optimizes the universe for some goal that we’d regard as completely worthless.

Well, to start addressing questions related to AI-inputs and to subsequent outcomes, when you offer Bertrand Russell’s point that “everything is vague to a degree you do not realize till you have tried to make it precise,” I hear echoes of Ralph Waldo Emerson describing language as fossil poetry (always the relic of some specific individual’s creative act), or echoes of Friedrich Nietzsche declaring every truth claim a lie in the extra-moral sense. I wonder if/when a competently designed superintelligence would need to pass beyond the clumsy creative calculus of symbolic language, and I also note, in your description of computer code’s progressive shorthand from one generation of engineers to the next, yet another form of fossil poetry perhaps supplanting verbal language. Superintelligence does do an excellent job prompting many initial procedural questions concerning the cultivation of AI values (if we wish to protect human values in an AI-inflected future, how do we not only agree upon but define those values, both now and for subsequent, ideally more thoughtful, generations? How do we clearly communicate these values both in the abstract and in an infinitude of individual cases? How do we know which values we actually live by, rather than superficially espousing? How do we determine which integral values, however ethically appealing, actually impede species survival?). And I greatly appreciate your suggestion that we should ask AI itself to determine what we really want from it (this abdication of human agency also does discomfit me, for reasons I’ll raise later). 
But first, to catch up on essential components of a functional AI design, and given the possibility for an advanced computing machine to face combinatorial explosions whenever it attempts to calculate the infinite possibilities at play even in a quite simple act: does it make sense to posit the necessity of AI participating in something like Nietzschean perspectivism, corralling endless possibilities into actionable probability, positing values in this way, constructing its own creative research questions, even telling itself fictions? Some of the (undoubtedly prescient and thought-provoking) doomsday scenarios sketched in your book make me wonder what kind of agent would cause them (a “genuine” AI, or only some sub-optimal, failed attempt). I’ll want to ask more about how a truly generalized superintelligence could prove inept at acquiring expansive capacities of embodied affectivity, emotive empathy and the like. But first, would an operable, value-positing superintelligence possess the potential to engage in an objective, omniscient knowing, or must it, in order to function in any way, bind itself to a much more narrow, perhaps all-too-human type of perspective?

I’m not sure I would describe it as a fiction. I think all knowledge any mortal can have is partial. We know only some things. The superintelligence could know many more things, particularly things related to how to make powerful technologies of various kinds, and how to use them to achieve its goals, whatever those happened to be. I don’t think omniscience comes in. That seems, if one takes it literally, far beyond the reach of any system that we can build, and really seems to be a concept that belongs more to theology.

Sure, that makes sense. But could you describe a bit more the types of knowing that you do envision for this superintelligence, and how these depart from human forms of knowing? If one loose definition of superintelligence suggests that it outperforms us in everything we do, is that itself a partial (but in no way comprehensive) explanation of what it does, and how it thinks?

Yeah, probably. That’s a necessary but not sufficient description. If AI is able to know things that we had no inkling of, or to know in different ways, that would be extra. But just the capability of doing everything that we can do, but doing it, say, much faster, would already be sufficient to cause a massive reshaping of the world.

In general, I think the tendency to anthropomorphize these entities is a big barrier to understanding. You mentioned the word “empathy.” In general, I have a problem with that word because it means two very different things that get smuggled in together. One meaning of “empathy” is the ability to understand what other people are thinking and feeling. You can read their minds. The other sense of empathy is caring what other people are thinking and feeling. Sympathy or compassion get rolled into this concept sometimes.

I think they’re very different capacities and that they can come apart. Even in humans, they can come apart. Psychopaths, for instance, are often good at reading other people’s minds, but they are not moved by other people’s pains. Their understanding just gets used to manipulate people more effectively. So I think there’s one sense of empathy that you would expect in a sufficiently advanced AI, and that’s this ability to read minds. It really doesn’t follow that the AI would be motivated to act so as to do what these other minds want it to do. That would require a special design of its motivation system, one which need not be there, and presumably will not be there unless we succeed at designing it.

So from the working definition of superintelligence as “any intellect that greatly exceeds the cognitive performance of humans in virtually all domains of interest,” could we define more precisely those “domains of interest”? Does “cognitive” here perhaps narrow the range of capacities considered?

Right, “cognitive performance” would mean understanding what other people are thinking and what they are trying to do. AI would have the maximum amount of this type of understanding, as much as any human has, and then maybe even more. But I’m saying that the tendency to care about what people want is not there by default. It can come separately from this cognitive ability to understand what people are thinking. I’m saying that even in the human realm, which is much, much smaller than this space of all possible minds, we see that you could have the cognitive ability to understand people without any motivation to help them, as we see in some psychopaths.

Along similar lines, and here betraying my limited grasp of whole-brain emulation (an alternate form of heightened intelligence, not requiring full-scale AI), do I understand correctly that “I” as this present embodied being would not carry on through whole-brain emulation, but that an approximate replication would take my place, that “I” would not get any closer to immortality here than a photographic portrait might take me? Does my indifference to present-day social-media circulation (a seeming precursor to emulated existence) foreshadow my future indifference to whole-brain emulation? Or more broadly, in Robin Hanson’s new book, economic incentives drive the emergence of whole-brain emulation, whereas, in Superintelligence, whole-brain emulation also gets presented as a philosophical/existential pursuit. Does that distinction seem fair, and can you clarify some of these qualitative questions concerning the identity, value, ultimate purpose of emulations?

The main difference is that I place less probability on whole-brain emulation being that through which we attain general machine intelligence. I place some probability on it, which is why I have a chapter on it, but I think it’s more likely that we will get there through synthetic means. So the broader book discusses synthetic means, whereas Robin zooms in on this particular scenario where the easiest way to get machines to be generally intelligent is by emulating individual human brains. Maybe (down the road, in Robin’s scenario) we will get to synthetic machine intelligence, but that is not necessarily the way that artificial intelligence must happen first. I don’t think this has to do with any differences in philosophical outlook or metaphysical outlook. I think it’s just a guess about which technological pathways are most likely to succeed.

Then returning to our limited embodied capacities to imagine the extreme (to us) prospects of an AI future, I want to address the importance in your thinking of something like exponentialized leaps or cognitive scale shifts. You persuasively demonstrate that many skeptical responses to your future-tending claims derive from an anthropic refusal to scale one’s speculations beyond the range of plausible-seeming human experience. Your positive-feedback-loop driven scenarios provide much of Superintelligence’s most compelling prose (along with its most striking descriptions of AI gone haywire: of, say, emotively empty smiley-face stickers “xeroxed trillions upon trillions of times and plastered across the galaxies,” of cosmos-crushing infrastructure expansion producing nothing more than a “Disneyland without children”). Both such manifestations of perverse instantiation (the meaningless yet destructive outcome, you posit, of many, perhaps all innocuous-seeming tasks coded into unbounded superintelligent machines) made me curious about how you, strategically, as a writer, seek to overcome our present cognitive clumsiness anticipating scale shifts. And amid my many Nietzsche-inflected proddings, I also should ask about Pascal’s or Schopenhauer’s place in shaping your reflective/rhetorical sense of scale.

I wonder what I can say in general about that, other than that one tries occasionally perhaps to get people to lift their gaze up from the plot of soil right in front of them, to look at the horizon, or the bigger starry sky above. I guess that may be where these attempts at poetic wording come from. I think I was influenced by Schopenhauer. I’m not sure whether that manifests itself specifically in this respect.

Have you always had the capacity for such expansive speculations, and have you found it frustrating trying to communicate such visions to others?

Well, one word I tend not to use very much is “exponential.” That’s a very specific kind of growth pattern. Nowadays, the word “exponential” seems to be used as a synonym for “fast,” but that seems to muddle the thinking. In general, I’m trying to move towards greater precision, so that these topics that traditionally have been the exclusive domain of theology, or philosophy, or science fiction, or idle pub-talk over a beer late at night (like the kind of domain where people feel free to just make things up because they sound good, or resonate, or are used as metaphors to say something very different)…those approaches aren’t really about the future at all, but are an implicit critique of the present, or are at least ways to illustrate some feature of human nature. I’m interested in the future in its own right, so if we actually want to try to develop slightly less inaccurate maps about the future and how our present actions might affect long-term outcomes, then trying to become a little more precise with some of these concepts seems like a useful first step.

But one area where I do sometimes feel like I’m living in a room with a low ceiling is when I try to convey this sense of what could be achievable if we play our cards right, and if we have some luck. What could the world be like, and what could the human condition be like, if things go right? This vastness of the post-human space of possibilities, and the wonderfulness of some of those, is literally beyond our ability to imagine. Perhaps we can sort of point towards it, or reach towards it, or get some faint premonition of it. If more people could get some shared sense of that, this might help to motivate the hard work that will be required to get there, and the patience and concern amid all the possible ways to fail to get there. My own work is largely driven by the sense that there’s this huge prize that is ours to lose. Because the future could be so very good, it is then worth taking every precaution to make sure that we don’t fail to reach it. Sometimes that seems harder to convey. I’ve tried in my “Letter from Utopia,” for example, to do that. There are little passages in the book on our cosmic endowment and what is at stake.

To me, actually, our cosmic endowment came across as a crucial concept to this book, as one of the main premises that may be new to readers.

I think it’s also pretty foundational to my thinking more broadly.

And do you sense that most humans find it harder to conceive of these positive scenarios, compared to the negative ones? Do you think of humans as somehow wired to anticipate risk and to worry about that, more than to envision greatly improved circumstances that we could move deliberately towards and attain?

Well, I think some of the negative scenarios are quite easy to envisage. We see this even, say, aside from the context of AI, just in attempts to create either utopias or dystopias. There are many more persuasive possible dystopias in the literature (1984, Brave New World, for example) than there are really attractive utopias — places where you would actually want to live for the rest of time. Those are harder to describe in a way that is both plausible and attractive.

To continue on this topic of pervasive dystopic scenarios, and more broadly to place your book alongside sci-fi texts purporting to address similar futuristic concerns: in the past you have critiqued sci-fi for its good-story bias, its inclination, let’s say, to foreground the anthropomorphized Terminator-esque robot (wanting to destroy us because it’s just mean) over abstracted threats like perverse instantiation, a process that plays itself out without anything like human intentions or human drama driving it. In these critiques of good-story bias, you seemed to suggest that the types of negative scenarios that most trouble you are actually hard for us to imagine, or that at least sci-fi literature doesn’t tend to represent them.

I think there is a difference between easy-to-conceive and makes-for-a-good-story. So the world just blows up suddenly: that scenario is quite easy to conceive, but doesn’t necessarily make for a good story. So a good-story bias is not the bias towards a good outcome — it just means that the kinds of stories that we hear, see, and read are stories that make for an interesting narrative, typically with human-like protagonists who face a sequence of escalating challenges. They are people we can relate to in some way, and they confront humanly understandable problems that maybe get resolved, but not too quickly. Individuals usually play central parts in the plot and have a big influence.

A number of these constraints filter the kinds of stories we see, say, in Hollywood productions, or in novels. So if we base our probability estimates, as to what the future can bring, on our exposure to these familiar scenarios, then that will include this good-story bias. We will tend to place too much probability on scenarios that make for a good story, and too little probability on boring scenarios, scenarios that you couldn’t really write a book or do a movie plot about.

Yeah, that’s why I described your most haunting, asymptotic scenarios as actually hard to envision. You’ll offer, let’s say, the scenario of a machine somehow eclipsing life (all of its manifestations on Earth) in an instant, because that seems like the right thing to do according to cold computer calculations. Actually, I don’t think most humans would have an easy time conceiving of that possibility.

No?

It’s almost a non-possibility, in the imaginative sense. And I think that, if most readers could conceive of and absorb that particular story, then they would get their narrative fix by just reading that two-sentence story, and skip more elaborate plots and move on with their lives. Similarly, if more readers could picture Superintelligence’s bleakest scenarios, then the book might have caused much more panic.

Well, you’re right that I’m not trying to create that kind of panic. Panic is not what we need.

I’d say that the most menacing scenarios you lay out offer a kind of all-eclipsing, nothing-one-can-do-but-sit-there-and-be-immobilized-in-fear situation, which seems quite distinct from a conventional good story into which humans can project themselves, can feel some sense of agency, can maybe even rehearse outcomes which later they can apply to real-life scenarios. And then more specifically regarding your book’s relation to science fiction: I do appreciate your critique of canned sci-fi scenarios designed to reinforce our good-story bias, our hunger for reassuring human triumphs — but when past readers have compared your work to sci-fi, I don’t think that they have had in mind such redemptive conclusions and escapist fantasies, but instead your doomsday premises delivered with brutalist realist clarity. So I sense more to explore in how sci-fi has shaped your avowedly more serious studies of future human prospects. If we conceive of a sci-fi plot, however contrived, as itself a technology, a winnowing down from the combinatorial explosion of narrative possibilities to a workable, probabilistic, plausible range of exploratory conjectures and tentatively drawn conclusions, does that description at all fit your sense of how textual modeling, sci-fi or otherwise, might serve as constructive precursor to Superintelligence’s own speculative work? I ask about this relationship because your book does move deep into the speculative, in a way that much academic writing would not attempt, and which might risk making the book seem less rigorous (not to general readers, perhaps, but to discipline-bound academics). But could you frame this speculative method more positively, and describe a generative, fruitful affinity between your work and sci-fi, or some other forms of art and literature?

What drives me to engage with a subject this way is that there are certain questions on which we need to try to gain better insight. Actions will depend, implicitly or explicitly, on our guesses about the answers. So then the problem is: even though we can’t obtain any sort of answer to such questions (it’s just not possible at the present time to know with certainty, and in detail, what the relevant future variables will be), how can we form the best possible opinion about them (one that is thoughtful, reflective, and takes into account all the relevant considerations)? I think there is better and worse there. You look around for any clues, useful techniques, epistemic tools that can add some incremental insight into this. That varies depending precisely on what the question is. In some cases there is hard science that can be brought to bear from computer science or cosmology. Sometimes all that we have to fall back on are plausibility judgments, ideally formed by historical experience, or conversations with other thoughtful people — whatever closer or remoter parallels we can find. If there were a more rigorous and perhaps even mathematical way to go about it, I would do that instead. It’s just that some of these questions are unavoidable. At this moment, we don’t know how to address them using any more rigorous method, and so we use the most rigorous methods available, and sometimes those aren’t very rigorous.

Returning then to the topic of compelling metaphors from Superintelligence, the vectored barge (unable to pursue, unidirectionally, any one particular tack) by which you concretize our potential to refine AI’s motivation through a process of indirect normativity seems central. By this point in the book, I long have struggled with the apparent bind that, given our flawed human state, AI either must actualize outcomes antithetical to our intentions, or must acquiesce to carrying out our imperfect, characteristically destructive desires on a cosmos-devouring scale. By here I have recognized why AI should, for our own sakes, make “mincemeat of our ignorance and folly,” yet I also have sensed the acute need for a “largely undisturbed anthropocentric frame of reference to provide humanly meaningful evaluative generalizations.” And so your concerted attention to indirect normativity, in which AI expands its prowess through the very act of discovering/communicating what we really want from it, in which coding for functional domesticity prescribes that AI minimize its impact upon the world, in which AI even might determine that humans ultimately want to remain independent of its authority, and thus might shut itself down, provides a gracefully climactic pause in Superintelligence (not unlike, if I may, the moment when we picture Camus’s Sisyphus happy, or picture Nietzsche’s stoic philosopher wrapped in overcoat, walking off “under the rain with slow strides”). That pause, in turn, does fulfill my own fretful conjectures of at least one possibly endurable solution. All of which I mention in order to suggest that I cannot soberly outline or assess your barge metaphor and your sealed-envelope example. Could you please outline the benefits and also the risks to refining AI through processes of indirect normativity?

The reason for thinking that some form of indirect normativity might be useful is that it seems completely impossible to write down a list of everything we care about, and all the possible trade-offs and precise definitions of each thing on that list, and to get that right on the first attempt, such that we would be happy with some powerful optimization process transforming the world and maximizing this vision. It is just something that we would invariably fail at doing, and so rather than trying to do all of that work ourselves, before we create superintelligence, we might prepare to ask the AI what we would have wanted it to do if we had the opportunity to think 40,000 years about that question, if we had known more about ourselves, and if we ourselves had been smarter.

You could if you wanted add idealizations of our current selves, and then use this idealized vantage point as the target for the AI, and so we indirectly specify what we will want the AI to do. We specify a process whereby the AI is supposed to go about approximating the objective, rather than trying to specify that objective directly. That seems to be a very attractive way of approaching this problem, in that we can use what the AI is especially good at, intellectual work, to simplify our job, thereby increasing the chances that we can complete this job in time.

And of course even the most optimistic reader quickly can return to a more sober/somber tone by considering the nearly insurmountable control problem we face as we try to predict machine intelligence’s potential range of devious machinations. Superintelligence spells out these concerns so well that I leave it to you to determine the extent to which you want to repeat them here. Instead, I’ll raise the tangential concern of how humans might effectively plan, communicate, coordinate any AI control strategy without leaving a digital trail for AI to track. Even Albert (the fictitious friendly dog-emulation that you write about elsewhere) reads the newspapers, and senses humans’ panicked, reactionary, threatening responses to his enhanced mental powers. And one easily can imagine, say, an accelerating, adolescent AI, in a sudden rush of human-level narcissism, launching a Google search on the term “AI.” But more broadly, given our ever expanding dependence upon digitized planning, communication, bureaucratic technologies, how can we implement any defensive strategy that doesn’t immediately expose itself to infiltration? Do we have to stuff precautionary paper scraps inside a desk drawer, and hope the houseflies don’t host nanotech cameras? Or your conception of multiple oracle machines, all with non-overlapping portfolios, makes sense, yet faces the ongoing challenges posed by a hyper-cunning agent potentially well-equipped to break down such barriers with ease. And your conjecture that a superintelligent machine might simply deduce, without need for any external confirmation, our plans for its containment makes the situation sound even more bleak (or, perhaps one should say: provides yet another compelling case for prioritizing processes of indirect normativity — perhaps with control strategies as provisional, unreliable, precautionary side efforts).

Yeah, I don’t think that we should rely on being able to use capability-control in that sense. These methods would require us to curtail the AI — whether by keeping it in a box, or by keeping certain information secret from it. Instead, we should focus primarily on motivation-selection methods: making sure that the AI is on our side, so that even if it escapes the box, or figures out all our secrets, it is still safe, because it is fundamentally an extension of the human will. Any capability-control methods would just be auxiliary ones. These could be some extra precautions we slap on, but not ones that we rely on.

Also I should add that the relief provided by imagining a process of indirect normativity, a process that both harnesses AI to humanly constructive ends, and that clarifies our most fundamental philosophical propositions, does contain its own seed of discouraging finality. You yourself postulate that, ultimately, humans might prefer their inherently flawed, deluded, risky, self-overcoming ways to any sanitized scenario that a benevolent AI could provide. And when I think of us offloading to AI the responsibility for answering basic existential questions, when I sense us abdicating our most distinctive-seeming human endeavor of creative valuation, I wonder what purpose remains in being human anyway. Of course one might argue that a narcissistic metaphysics prioritizing human perception, choice, expression soon will seem as outmoded as religious cosmologies situating our species at the physical center of the universe, and that new thought paradigms necessarily will take its place. And Superintelligence does raise the possibility of holding onto AI largely as a safety net, not eclipsing human agency so much as preserving our survival in moments of acute (say astronomical, supervolcanic) crisis. But when you yourself ask “Where is the limit to our possible laziness,” do I sense your own listless apprehension of some purportedly utopian scenarios potentially to come?

Well, we wouldn’t necessarily have to do anything for instrumental reasons. We would be more in a position perhaps of the rentier, or the aristocrat, or the child who doesn’t have to work for a living. This child will do things because he/she enjoys doing them, or finds them intrinsically worthwhile. I think a lot of people in the arts world would recognize something similar. Maybe you have to get some food on the table, but that’s not the ultimate reason why you became a novelist, publish books, write poetry, or make paintings. You feel that engaging in this activity has some inherent worth. Either you get enjoyment out of it, or it makes you able to perceive beauty, or makes it possible for you to engage in a larger creative interaction with other minds. Whatever it is, there are many reasons for doing things other than that you have to do them or you will starve or not have shelter. Those alternative types of reasons are the ones that will remain. You might say that they are reasons for playing. We would then have to redirect our energies to discover what the most awesome ways are to stake out individual lives (and societies and civilizations) in this enlarged space of possibilities, where physical necessities do not apply to humans.

Though here I do sense that, for the past few decades, progressive-minded individuals (artistic and beyond) have, in many ways, taught themselves to think “small”: to prioritize minimal consumption over maximalist growth; to view the promulgation of any one particular set of cultural values as a form of imperialistic homogenization; to reframe the early-modern expansion of European states not as some triumphant or righteous or guilt-free conquest but as a destructive, profit-driven exploitation of countless equally valid cultures, ecosystems, species. So I wonder if you also could introduce the materialistic terms of the cosmic endowment (those terms seem different perhaps from the secularist-spiritual terms of your “Letter from Utopia”) specifically to an audience skeptical towards arguments prioritizing materialist gain. What advice might you have for why artists and intellectuals today should think “bigger,” should desire, anticipate, celebrate humanity’s cosmic expansion? And/or, have you in fact deliberately constructed two different types of arguments, for two different types of audiences, with the cosmic endowment appealing, say, to scientists and entrepreneurial classes (and perhaps to everybody’s pragmatic side), and the post-humanistic “Letter from Utopia” appealing to artists and intellectuals (and perhaps to everybody’s principled side)?

Well, I could say something slightly different, which is that I think we should think about big things more the way we think about small things. That is, with attention to detail, carefully, in a kind of tactical manner. Whereas too often, when people engage with the big questions, it’s like a license to just cut the tether on your hot air balloon, and let it float with the symbolic winds, wherever they bring it.

And then more specifically: “Letter from Utopia,” with its man-from-the-future oracular style, its second-person mode of galvanizing address, its playful puns and allegorical cajolings, reminds me so much of Nietzsche’s Zarathustra that I would love to hear more about who your own literary models were for this text. I also would love to hear more about possibilities for bridging “scientific,” “entrepreneurial,” “artistic,” “philosophical” concerns through the construction of disarming literary forms — from Superintelligence’s fabular opening scene to countless other short (sometimes quite funny) pieces and performances (if we can count YouTube readings) of yours. Can you articulate your own poetics (basically, the place that literary form has in shaping the complex intellectual inquiry, the timely policy prescriptions, the deft argumentative pivots you seek to deploy in recent writings)?

Just as a poet or fiction author might find that their poetry or fiction reflects their life experiences or their worldview, so my fables or poetry (although they arrive intuitively) reflect my world model and way of looking at the world.

It’s just that my world model is derived from more analytic pursuits — my life experience is kind of weird. I live in this little atlas of theoretical conceptions, but that comes out in the same basic activity that in others might result in something about sunsets or flowers. So it’s hard to differentiate between what you describe as “literary forms,” and “intellectual inquiry” and “policy prescriptions” in this case, in that the underlying thing that they are (a self-expression) is itself more based on a strategic assessment of what is needed.