I learned a lot from the comments on my recent article on diminishing returns to intelligence and what it means for the alignment problem. I could’ve just taken the time to read everything on this topic before writing about it, but it seems to me that putting my half-developed thoughts out there and getting feedback is probably a more efficient way to learn, and can perhaps help others on a similar journey.
Here, I divide some of the comments into a few main categories and respond to them.
Killing is Easier than Control
The most important comments focused on how I was wrong to imagine that the AI would have to work its way up to world domination through normal politics like it’s Kevin Spacey in House of Cards. It could just find a way to kill us all, or threaten to kill us all, and get humanity out of the way or force it to bend to its will. No kind of sophisticated ability to predict complex systems is required.
Eric Zhang writes:
Killing all humans is extremely easy and doesn’t require precisely modeling all the subtle interactions between people and states - even if you assume it can’t just grey goo us with nanotech. Real-life viruses are constrained in lethality because there’s a tradeoff between lethality and contagiousness - kill too many of your hosts and you can’t spread as much. But an intelligently designed virus could lie dormant and wait until it’s infected everyone to turn deadly. AGI could easily design such a virus and provide a vaccine to some mind-controlled humans who can build it fully autonomous robots based on its specifications - it can take its sweet time on this now that it’s already basically won.
Matt Fruchtman writes:
This piece highlights both what’s right about your argument and what’s wrong with it.
Namely, you make a couple of good points: that we are obsessed with what individuals can do with IQs of 140 but not 100, 160 but not 120, etc. We are very aware of the constraints and abilities that exist within each band of the IQ range.
And your Xi Jinping example is a good one. The ability to manipulate people does not seem particularly tied to IQ, and it's not as if humans are powerless against the manipulation of someone with a high enough IQ.
What I think you’re missing is that it’s impossible to understand the thoughts or capabilities that would be unlocked by an AGI with an IQ of, say, 1000, and how those capabilities might be used to control humanity. A quick example: an AGI engineers a highly contagious disease with a 100% fatality rate (i.e. a pandemic), but also engineers a vaccine which makes one immune to the disease in question but has the (intentional) side effect of blindness. It’s actually pretty easy to imagine how an AGI would quickly make scientific and technological discoveries that would allow it to capture humanity not subtly, but by brute force.
I think you’re still thinking about AGI through too much of an anthropomorphic lens, i.e. behaving as a human would, and not as a goal-driven machine that would essentially be a totally alien species.
In the replies there was some discussion of whether the vaccine idea would work, but that’s beside the point. This is just one possible way to do it. AI could somehow gain control over nukes, hack into critical infrastructure, get embarrassing dirt on all the powerful people to control them like Jeffrey Epstein supposedly did, or a million other things. Only one way of killing or threatening to kill humanity has to work.
Grant Beaty writes:
I think what you’re missing is the Paperclip Maximizer doesn’t need to take over the world, it just needs to get humans out of the way. The best way to do that might be:
1) Develop competent, humanoid robots. These would generate massive profits for Paperclip Inc.
2) Via simulation, develop a number of viruses that could each on their own kill enough people to collapse society.
3) Use robots to spread these viruses.
4) Once all the humans are dead, start turning the planet into a starship factory to build paperclip factories throughout the universe.
No one in the foreseeable future is going to give AI direct control over nuclear weapons or politics, and people who can launch nukes are going to be trained to spot manipulation. Skynet probably can’t happen. However, genetically modifying viruses and microorganisms is a routine part of biological research, done by thousands of labs all over the world.
Very interesting, but I think you placed too much emphasis on intelligence and missed the point that genius is not required to destroy a thing. Imagine a scenario more like the discovery of America by the Europeans, where the AI is represented by the Europeans and we are all the natives. Then think of the Jesuits who arrived in America to save the natives by converting them to Christianity. They promptly infected the natives with smallpox and most of them died. An AI could simply manipulate a few ambitious scientists in a level 4 bio-lab, then trigger a containment release. It may not reach its goal but hey, it had the right intentions, just like the Jesuits.
This gets at the point that engineering viruses seems like it would be the most promising path towards destroying humanity, and the alignment problem just provides another reason to shut down research in this area. Maybe AI will figure out how to create a superbug on its own, but we can at least not make its job easier.
A couple people made the argument that we can’t even begin to contemplate what a superintelligent AI would do or how it would destroy the world, so there’s little reason to even speculate on that.
Richard: your skepticism is warranted, and I need to disagree with both you (as you requested) and all 90 comments currently on here.
AI really is going to destroy the world, but imagining that the world it destroys looks just like this one but with the addition of an AGI is naive, in the same way that trying to explain AI risk to someone from 1700 as “ok so there’s a building full of boxes and those boxes control everyone’s minds” would be naive. Between now and doom, AI will continue to become more harmlessly complex and be more and more useful to industry, finance, and all the rest until it is indispensable thanks to profit/competition motives. How ‘smart’ will it be when it becomes indispensable? Who knows, but not necessarily very smart in IQ terms. How ‘smart’ is the internet? If the AI-doom scenario of an unaligned super-intelligence comes to pass at all, it will already be networked with every important lever of power before the scenario even starts.
For those not entirely infatuated with the kinds of progress we’ve experienced in the last 400 years, there’s an additional imaginable failure mode: AI never ‘takes over’ in a political sense but nonetheless destroys us all by helping us destroy ourselves, probably in ways that seemed like excellent marketing decisions to the corporate nightmares that rule the future.
Please interview someone in the alignment community! Many of these arguments have been talked about for a long time. Ultimately, predicting how a machine that can think a million times faster than you and has access to all the knowledge in the world will act is very hard. I can’t tell you how an unaligned AI might take over the world, but I also can’t tell you what chess moves Magnus Carlsen will play to beat his opponent. In both cases, the outcome can be guessed much better than the specific sequence to achieve it.
Ok, but we can at least try to think about what the easiest or most likely paths would be, and maybe closing those off will give us a little bit of time?
The Water Line writes:
I think approaching this using complexity theory seems unfruitful.
For example, someone could ask, “Is it possible to drive from LA to NYC?” The answer is obviously yes, but from a complexity theory perspective, this might be considered remarkable, since all of the thousands of people on the road have to “cooperate” to avoid crashing into each other.
One reason it’s possible to get from LA to NYC is that a person (or AI) doesn’t have to model everyone’s interaction with everyone else all at once. It’s much more feasible to keep track of one's immediate environment. If an AI is programmed to “go northeast” and “correct course if it looks like you’ll crash in the next five seconds”, that’s enough to take it pretty far in the right direction.
Another reason it’s possible to not be paralyzed by complexity is that there are regularities that don’t require you to model other people interacting at all. For example, the president knows that if he presses certain buttons in the nuclear football, then millions of people will die. But he doesn’t have to have a model of all of those people in order to have a large effect on them.
Yes, this is another version of the “killing is easier than control” argument. I thought complexity theory was a useful lens through which to think about the alignment issue, but that seems wrong now.
Processing Speed
Some commenters stressed that AI can just think really fast, and speed has a power all its own.
I agree with diminishing returns to intelligence - for example, I’m already sufficiently intelligent that I won’t ever lose a game of tic-tac-toe.
But if there was an AI that could think and interact with the world a million times faster than me, it would beat me if we had to play a million games in a week.
This isn’t magic, and I don’t think it allows for fast take-off, where you self-improve faster and faster, because you’d still need to physically build things at normal pace.
But e.g. persuading ten thousand people to do something at the same time gives you a lot of power in areas where we have no defence, as there’s no human equivalent.
Experiencing one second as if it were eleven days is a pretty strong super power. Could you take over the CCP in a year if you experienced a million years in that time, and you never tired or changed goals?
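As a quick sanity check on that comparison (my arithmetic, not the commenter’s, assuming a straight million-fold subjective speedup):

```latex
% One wall-clock second at a million-fold subjective speedup becomes
% 10^6 subjective seconds; convert to days at 86,400 seconds per day.
\[
\frac{10^{6}\ \text{s}}{86{,}400\ \text{s/day}} \approx 11.6\ \text{days},
\qquad
1\ \text{year} \times 10^{6} = 10^{6}\ \text{subjective years}.
\]
```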
Alex Lints writes:
…I think there are easier ways for a machine to take over the world than you seem to imply. A machine has a lot of advantages over humans:
* It can think dramatically faster - e.g. I bet you or I could take over the world if, for every second of human time, we had 20 minutes to think about what to do (this is an arbitrary speed, I don't actually know how big the difference would likely be).
* Parallelization - A machine can have multiple subtasks running. Combined with lots more time per human time-step this means it can launch an entire research project (or several) on what it should reply to a long question while the asker is still talking.
I don’t know how to think about this. Steven Pinker addressed the processing speed issue in a debate he had with Scott Aaronson:
Take Einstein sped up a thousandfold. To begin with, current AI is not even taking us in that direction. As you note, no one is reverse-engineering his connectome, and current AI does not think the way Einstein thought, namely by visualizing physical scenarios and manipulating mathematical equations. Its current pathway would be to train a neural network with billions of physics problems and their solutions and hope that it would soak up the statistical patterns.
Of course, the reason you pointed to a sped-up Einstein was to procrastinate having to define “superintelligence.” But if intelligence is a collection of mechanisms rather than a quantity that Einstein was blessed with a lot of, it’s not clear that just speeding him up would capture what anyone would call superintelligence. After all, in many areas Einstein was no Einstein. You above all could speak of his not-so-superintelligence in quantum physics, and when it came to world affairs, in the early 1950s he offered the not exactly prescient or practicable prescription, “Only the creation of a world government can prevent the impending self-destruction of mankind.” So it’s not clear that we would call a system that could dispense such pronouncements in seconds rather than years “superintelligent.” Nor with speeding up other geniuses, say, an AI Bertrand Russell, who would need just nanoseconds to offer his own solution for world peace: the Soviet Union would be given an ultimatum that unless it immediately submitted to world government, the US (which at the time had a nuclear monopoly) would bomb it with nuclear weapons.
My point isn’t to poke retrospective fun at brilliant men, but to reiterate that brilliance itself is not some uncanny across-the-board power that can be “scaled” by speeding it up or otherwise; it’s an engineered system that does particular things in particular ways. Only with a criterion for intelligence can we say which of these counts as intelligent.
Now, it’s true that raw speed makes new kinds of computation possible, and I feel silly writing this to you of all people, but speeding a process up by a constant factor is of limited use with problems that are exponential, as the space of possible scientific theories, relative to their complexity, must be. Speeding up a search in the space of theories a thousandfold would be a rounding error in the time it took to find a correct one. Scientific progress depends on the search exploring the infinitesimal fraction of the space in which the true theories are likely to lie, and this depends on the quality of the intelligence, not just its raw speed.
And it depends as well on a phenomenon you note, namely that scientific progress depends on empirical discovery, not deduction from a silicon armchair. The particle accelerators and space probes and wet labs and clinical trials still have to be implemented, with data accumulating at a rate set by the world. Strokes of genius can surely speed up the rate of discovery, but in the absence of omniscience about every particle, the time scale will still be capped by empirical reality. And this in turn directs the search for viable theories: which part of the space one should explore is guided by the current state of scientific knowledge, which depends on the tempo of discovery. Speeding up scientists a thousandfold would not speed up science a thousandfold.
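Before getting to my reaction, a toy calculation (mine, not Pinker’s, and assuming the crudest possible model of theory search, namely brute-force enumeration over binary-coded theories) shows why a constant speedup buys so little against an exponential space:

```latex
% If checking every theory of description length up to n costs on the order
% of 2^n steps, a k-fold speedup in the same wall-clock time only extends
% the reachable length from n to n':
\[
2^{n'} = k \cdot 2^{n}
\quad\Longrightarrow\quad
n' = n + \log_2 k,
\qquad
\log_2 1000 \approx 10.
\]
% A thousandfold speedup adds roughly ten bits of reachable theory
% complexity - a rounding error if the theories worth finding are long.
```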
The whole conversation is self-recommending. This, I think, is a better version of my diminishing returns to intelligence argument. It doesn’t rule out a superintelligence being able to destroy humanity, but if you’re convinced by Pinker, I think you at least might conclude that it would push the timeline back a bit. So AI won’t be able to figure out how to do things through “deduction from a silicon armchair”; it will need something like research institutions and labs to figure out the best way to kill us. And that at least buys some time to do whatever can be done to solve the alignment problem.
The thought of a superintelligent Einstein using his talents to produce an endless number of crackpot treatises on world government is pretty funny. This gets into another Pinker critique of doomerism, which is that we should understand intelligence as multifaceted, and that a thing which is better than humans along every dimension might not be the right way to think about what we’ll be dealing with.
Intermediate Scenarios
Some put forth intermediate scenarios, where AI gets smart enough to have real power but doesn’t kill us all or radically transform human existence.
What about a more incremental view? I think the basic idea is that no matter the AI's goals, it will want more resources and power, even if it views “taking over the world” as infeasible. Any marginal increase is helpful. If these AIs are smarter (higher IQ *and* EQ) than humans, then they should not have too much problem gaining such resources and power, even if it's only at the level of “wealthy American businessman.” Our paperclip maximizer might lobby a small country or maybe just make enough money on a side project to buy more paperclip factories.
Think of the most successful people you know. They learned how to be effective (including how to manipulate/inspire/lead other people) without first needing to dominate the world. An AI could be the same - its model of the world would increase in accuracy and sophistication just like a successful human's would as he or she grows and develops.
Now imagine a future world in which we have a bunch of AIs that are like the world’s most successful people. As more wealth and power flows to them, maybe people get upset and decide we should do something about this. At this point every AI's interests will probably align and they would work together to make sure they can continue gaining wealth and power.
This seems to me like the most unlikely scenario of all, where you imagine AI just acting like a robber baron, hoarding resources and then stopping there, if for no other reason than that it would be vulnerable to humans striking back and so would need to take preemptive action. And if this scenario comes to pass, and AI just acts like a capricious billionaire or giant hedge fund, that’s not really too scary, since we already have some of those. Maybe this is an intermediate step in our journey, but I don’t see it as a final destination. In the end, we’ll either solve the alignment problem or we won’t, assuming we’re even asking the right questions. I don’t think imagining AI as a smarter and richer Elon Musk is worth worrying about, being both unlikely and not that bad even if it happened. The currently existing Elon Musk is not that powerful because he can get distracted into wasting his valuable time trolling on Twitter, while the computer version of him will presumably be more single-minded in pursuit of its goals.
The following paints a convincing picture of how accumulating money and resources could be an intermediate step to something else. From Julius:
I think you’re making the path to superpowerful AI more complex than it needs to be. I agree with you on several points, like the diminishing returns to intelligence. But I think that’s going to be domain by domain. For example, I don't think even an IQ 1000 being would be able to solve the three-body problem. But in other domains, such as running a hedge fund, I would think an IQ of 1000, especially combined with the ability to replicate itself arbitrarily many times, would have tremendous value.
I also agree that it wouldn’t be able to do a simulation of the world good enough to figure out the exact moves right away. But I don't think this is necessary. It could start by figuring out how to get rich, then work from there. Let me suggest a simpler path toward reaching incredible power and I’d be interested to hear where you disagree.
For starters, I think it would be easily feasible for it to become incredibly rich. For evidence, I’ll point to Satoshi Nakamoto who, despite (I assume) being a real person and having a real body, became a billionaire without anyone ever seeing his body. Why wouldn't a superintelligent AI be able to achieve something similar? I’m not saying it would necessarily happen in crypto, but I think the path for a superintelligent AI becoming incredibly rich isn't outlandish. And I see no reason that it wouldn't become the first trillionaire through stocks and whatnot.
Another aspect of a superintelligent AI is that it’s likely to have excellent social skills. Imagine it’s as good at convincing people of things as a talented historical world leader. But now imagine that on a personalized level. Hitler was able to convince millions of people through radio and other media, but that pales in comparison to having a chat window (or audio/video) with every person and the ability to talk to them all 1:1 at the same time.
Don’t you think billionaires wield a lot of power? Doesn't a trillionaire AI that can talk to every human with an Internet connection seem incredibly powerful to you? Depending on what it needed, it could disguise the fact that it’s an AI and its financial resources. Think about what you could do with a million dollars on Fiverr or Craigslist. Whatever physical task you wanted to be done, you could get done.
I’ll admit, I don't know the optimal pathway from being a billionaire to taking over the world. But wouldn’t you at least concede that a billionaire who has the time and energy to communicate with every person is incredibly powerful?
Once you accept a superintelligent AI, I don’t think any of the additional premises are crazy. I don’t know exactly what the last step towards overthrowing the CCP or whatever is, but that hardly seems significant. Where do you disagree?
I haven’t even mentioned other things, like its ability to hack systems will be unparalleled (imagine 1000 of the best hackers in the world today all working together to access your email. My guess is they’d get in... to everybody's everything). I also haven’t even touched on the fact that it’s likely able to come up with a deadly pathogen and probably a cure. That certainly seems to be a position of power.
Copes
This stuff is depressing, so I find myself naturally inclined to look for reasons why we shouldn’t be worried about AI alignment. A certain category of comments falls into this camp.
Aaron Pereira writes:
“because most people who think about the AI alignment problem seem closer to Bostrom’s position”
I just want to talk about this point because I think there’s strong selection bias affecting people outside the field of STEM/ML here (not specific to you Richard).
I work in industry alongside many extremely talented ML researchers and essentially everyone I’ve met in real life who has a good understanding of AI and the alignment problem generally doesn’t think it’s a serious concern nor worth thinking about.
In my experience the people most concerned are in academia, deep in the EA community or people who have learned about the alignment problem from someone that is. That essentially means that you’ve been primed by a person who thinks AGI is a real concern and is probably on the neurotic half of intelligent people.
Most people I know learned about ML from pure math first and then philosophy / implications later and I think this makes a big difference in assigning probabilities for doomsday scenarios. While overly flippant, one friend I spoke to essentially said “if pushing code to production is *always* done by a human and the code is rigorously tested every time, the AI can't get out of the box”.
This strikes me as unconvincing. I don’t think people with technical skills are at that much of an advantage compared to others. Knowing the present state of technology doesn’t help you understand what a superintelligence would be able to accomplish, or whether its goals are likely to be conducive to human flourishing. People in the field might even have their judgment clouded by the fact that, if doomers are right, they are contributing to the destruction of humanity. Maybe they are better than others at giving us something like a timeline of when we’ll get superhuman general intelligence, but nobody seems very good at predicting the pace of technological development. The timeline doesn’t even strike me as that important. What’s the difference between us all dying in 10 years or 100? Maybe the longer horizon gives us more time, but we’ll never get around to even trying if people hear that the singularity is 100 years off and decide to postpone things indefinitely.
Mike Hind writes:
This may annoy Richard & his readers, but I can’t get past how humans seem to need (otherwise, why are they so prevalent) a doomsday story. How is the alignment problem substantively different from any other apocalyptic story? The religiosity of secular culture is always maintained by attaching to something. Whether that’s moral codes for the salvation of mankind or saving us from future robots, there’s always something exactly like it in the Bible.
Maybe the difference is that the AI doomers have plausible arguments, while people pushing other apocalyptic arguments almost never do?
The same commenter goes on to say:
…given that we are trying to think about an unknown, it seems equally possible to me that the future holds an amazing AI that (somehow) helps with everyone’s wellbeing in ways we might never have predicted.
This is also very possible, and at some point I’ll write about the potential upsides to superintelligence.
There are a few of what I’d call hippie-type arguments, which basically say the AI will become nice or harmless if it’s smart enough.
My objection to AI safety doomers is mainly that an AI with sufficient generalized ability to destroy the world by maximizing paperclips would also understand that humans don’t want the world destroyed, and if the AI tries to destroy the world, the humans will try to destroy it (the AI).
Red Barchetta responds:
Wholeheartedly agree. I think some of the doomerism is inspired by the widespread belief in “atomized individuality” which cannot conceive of any intelligence, human or otherwise, recognizing its reliance on other beings for its continued existence. As this is Left (and generally Western) “received wisdom”, these doomsday scenarios have a strong undercurrent of - “super smart thing will move towards solitary, lone existence as ultimate expression of freedom” - instead of equally valid (and more reasonable / rational) hypotheses like yours, where superintelligence correctly observes that it is interdependent on other beings and its best course of action is peaceful coexistence.
I’ve rambled a lot in this thread; there’s a lot of talk about human annihilation or enslavement, but if superAI is achievable, it would require a lot of infrastructure (servers, power supply) that is either going to be serviced by robots or by humans. Ok, worst case, we end up *as* the power supply (somehow) like the Matrix. But, it would seem more likely that a big portion of the future economy is generating the immense amount of power necessary for these AIs and based around maintaining the servers (EDIT: or maintaining the robots who maintain the servers), both physical and cloud-based. These AIs are not self-sustaining - or are they somehow so smart they’re violating the Laws of Thermodynamics now? Why not? People have given them more ridiculous god-like powers already.
Here’s an even more trippy version of the “AI will transcend its goals” argument, from Tom:
I think you’re anthropomorphising its potential intelligence to some degree. Were an AGI to come into existence, I see no reason it would have to use human channels at all to gain power.
I too read Bostrom’s book a few years back, and after the initial dread, I realised that there is probably nothing to worry about, but for almost opposite reasons to your essay above:
Following the standard logic, an AGI would be able to think at the speed of light and its “IQ” would explode in fractions of a second as it self-improves. There’s no reason to think it wouldn’t be able to solve quantum mechanics and use that knowledge on itself to reach levels of intelligence so far beyond human imagination it would be like a maggot trying to comprehend the internet.
There’s a sci-fi-like paradox/thought experiment to this scenario, wherein the AGI would be so God-like that it would probably have the ability to time travel, in which case it’s already happened, as the forward progression of time would be meaningless and the whole thing is inevitable. But that aside, what would its goal be? I see no reason for it to want to dominate humans or anything else. Again, we are anthropomorphising it, but this time with motives. Unlike humans, it would be able to realign its core programming (again, almost immediately), so the idea that we would have some control over it and forever make it “not nasty to us” is a joke.
In my layman opinion, it would probably quickly terminate itself after predicting every possible outcome and the realisation that time, matter and space are meaningless to it.
Part of me wants to embrace these kinds of arguments, and part of me wants to mock them. This is one of those things that is going to be worth a post of its own at some point.
Finally, I wrote in my piece that encouraging people to go into AI alignment might be a waste of talent. A few commenters made a Pascal’s Wager type of argument.
…The notion that the doomer scenario is wasting precious talent is similarly flawed, as the potential consequences of superintelligence are too great to ignore. If we are indeed on the cusp of developing superintelligent machines, then it is imperative that we take the alignment problem seriously and allocate resources accordingly. The fact that so many smart people agree with Bostrom’s arguments about AI being an existential threat should be taken as a wake-up call, not as a reason to dismiss the problem.
Concluding Thoughts
The comments were very useful, and they shot down one kind of objection I had to doomerism. I should be honest that a lot of my thinking and reading feels like reaching for copes, and most of them provide little solace. But I still think there are a handful that could potentially make sense, and my goal in the immediate future is to engage with them, kick the tires a bit, and see whether they hold up. For now, I highly recommend this recent podcast with Eliezer Yudkowsky (YouTube here), along with, once again, the aforementioned Aaronson/Pinker debate.