Can a Paperclip Maximizer Overthrow the CCP?
The AI alignment problem and diminishing returns to intelligence
Note: My confidence in what I say in this essay is pretty low. On most things I write about, I’ve been thinking about the issues involved for a long time. I also have academic backgrounds in law, political science, and international relations, which I don’t think are worth much, but they at least let you know a field well enough to understand how flimsy much of conventional wisdom can be. On AI questions, I’m a complete novice. That being said, I haven’t found any essay or book that thinks about the alignment problem in a way similar to my own, so I figured it was worth getting my thoughts out there to see if and how I’m wrong. For that reason, I welcome feedback. Much of my thinking on AI doomer skepticism has been influenced by this essay from Kevin Kelly, though I find some arguments in the piece more convincing than others.
Nick Bostrom’s thought experiment of the paperclip maximizer is the starting point for the debate about whether artificial intelligence is an existential threat to humanity. The idea goes something like this. A company that makes paperclips achieves the most important AI breakthrough: a machine superior to humans in general intelligence, and one that can further augment that intelligence. Since it was designed by a paperclip company, its goal is something banal like “maximize the number of paperclips.” The stupid executives thought that maybe this would mean that like, instead of producing a million paperclips a year, it figures out where the inefficiencies in the paperclip-making process are and makes two million or something. But it ends up literally maximizing the number of paperclips in the universe, since it’s a machine that follows its own programming and lacks common sense. It doesn’t hate humans or anything; the machine is completely indifferent to their survival. It’s just that they are made of atoms, and atoms go into paperclips, so they must be killed for that purpose.
The thought experiment is meant to illustrate the idea that there’s a difference between instrumental intelligence and something like what we would call wisdom. This means that no matter what goals a superintelligent machine has – maximizing paperclips, bringing in as much ad revenue as possible for Facebook, making sure the US remains the most powerful country in the world, etc. – there are potentially horrifying consequences for humanity. As simple as we are, it is highly unlikely that we would be able to foresee all the ways in which a superintelligence interprets the instructions we write for it.
Regardless of what it actually “wants,” a superintelligence will find it useful to prevent humans from turning it off. It will also find that it has an interest in controlling resources. So the instrumental goals of the superintelligence in some sense don’t matter, because it’s going to want to control or perhaps ultimately destroy humans, if for no other reason than to stop them from getting in its way.
This is the AI alignment problem. There are other potential issues with AI that could also fall under the term, like the way social media algorithms exacerbate political divisions. But those clearly pale in importance next to the question about existential threat. I just want to know whether I and everyone I care about are going to end up as paperclips. Then we can worry about other things.
When I first started reading Bostrom and other AI doomers around 2017, their arguments seemed plausible, and a deep feeling of dread came over me. I spent maybe a few weeks obsessed with the issue. Then, as with many things, I got over it and went back to living my life, though I was left with the suspicion that the whole thing was pointless and humanity didn’t have much of a future anyway. And the fact that so many smart people seemed to agree with Bostrom’s arguments about AI potentially being an existential threat made me more confident that they were correct.
In recent months, however, I’ve had time to do some reading on the issue and contemplate the views of some of the skeptics of the doomer argument, and this has given me doubts. This essay is meant to lay one of them out and hopefully get some feedback, since I’m new to this issue and still in the process of developing my views. I’m going to sound like a skeptic of AI doomerism, but that’s only because most people who think about the AI alignment problem seem closer to Bostrom’s position, and I’d like them to tell me how I’m wrong.
To me, the biggest problem with the doomerist position is that it assumes that there aren’t seriously diminishing returns to intelligence. There are other potential problems too, but this one is the first that stands out to me. I’ll hopefully discuss some of the others in future essays.
The paperclip maximizer thought experiment imagines intelligence on a continuum. So, like a fly has an IQ of 4, an ape of 40, a normal human of 100, and a super genius has an IQ of 160. A superintelligence might be at a thousand, or a million, or whatever number we want to use to represent “really smart” because in reality our scale doesn’t mean anything at that point. Flies are really dumb, so can’t do all that much. Apes can maybe claim dominion over a few square miles in a forest, alter their environment a bit. Humans build cities and go to the moon. The superintelligence can potentially enslave humanity in order to accomplish whatever goals it wants.
Is this right?
The phrase “diminishing returns to intelligence” in a particular domain can mean one of two things.
First, it could mean that the problem one is interested in is so easy that there’s no point in being that smart. Think about understanding the concept that 5 x 5 = 25. An ape can’t get it. Maybe a human with an IQ of 70 can’t either. We can posit there’s a minimum IQ, let’s say 95, that one needs to truly understand what 5 x 5 means. Anything beyond that doesn’t provide additional help.
Another way you can have diminishing returns is if a problem is so hard that more intelligence doesn’t get you much. Let’s say that the question is “how can the US bring democracy to China?” It seems to me that there is some benefit to more intelligence, that someone with an IQ of 120 is going to say something more sensible than someone with an IQ of 80. But I’m not sure a 160 IQ gets you more than a 120 IQ. Same if you’re trying to predict what world GDP will be in 2250. The problem is too hard.
One can imagine problems that are so difficult that intelligence is completely irrelevant at any level. Let’s say your goal is “make Xi Jinping resign as the leader of China, move to America, and make it his dream to play cornerback for the Kansas City Chiefs.” The probability of this happening is literally zero, and no amount of intelligence, at least on the scales we’re used to, is going to change that.
I tend to think for most problems in the universe, there are massive diminishing returns to intelligence, either because they are too easy or too hard. We are obsessed with the narrow band of things that some humans can do and others can’t, like graduate from college, or at the extremes what is feasible for a genius of 160 IQ but not a regular smart person at 120, like write a great novel or make a discovery in theoretical physics. But the category of things that either all humans can do or no humans can do is probably larger than the one of things that some humans can do and not others (technically, both sets seem like they approach infinity, but it’s mathematically possible for one infinite set to be larger than another).
Now let’s go back to the paperclip maximizer. It starts out as a computer program. The doomer argument says that what you need to do is crank its intelligence up a bit, and it can at some point destroy humanity. Forget getting Xi Jinping to resign; it’s got to get from a box in Silicon Valley to turning all the members of the politburo into paperclips. How does it do this? Maybe it breaks it down into a series of steps.
First I’ve got to control California politics, then use my position there to take over the federal government. Then I’ll have a base to institute the right kind of industrial policy. From there, it’s just about using the already existing American empire to completely subjugate the world. I did some research on Gavin Newsom; looks like he has a weakness for 6’2 brunettes in heels that lick their lips while talking about fiscal policy. I’ve got the perfect woman, and I’m now going to manipulate her Instagram feed so she becomes obsessed with directing government subsidies towards Paperclip Maximizer Co. Then I send a fake invitation to a party where she’ll run into Newsom. He’s convinced by her, starts sending us money…Once I’ve taken over California, I have to manipulate rural states because of their disproportionate share of power in the Senate. While I could just tell libs what to do through their leaders because they listen to authority, the political science research I got off of Google Scholar tells me that Appalachia has an oppositional attitude towards outsiders. Except they seem to really like this one Trump guy. Let me figure out what his appeal is to them, and use the government of California to construct a super-Trump hologram that will tell them to…
And so on. Once the superintelligence takes over the federal government, maybe it just enslaves the world by threatening nuclear annihilation if everyone doesn’t submit to the new American-led “rules based international order.” Or maybe it’s less crude and manipulates the rest of the world into doing its bidding. Possibly, the machine doesn’t even think in terms of political and geographical units, but rather simultaneously manipulates the social media feeds of people living in different countries so they all end up worshipping the paperclip god. Whatever, it’s very smart, so it’ll figure out the best path.
How does it figure out the best way to take over the world? Maybe it learns the laws of physics and reasons from first principles. This seems highly unlikely, so much so that I would just dismiss the possibility completely. I could say something, something complexity theory here.
Some imagine that it would engage in simulations. That is, it builds a model of the world, like a very elaborate Civilization video game, plays it trillions of times, and develops a series of strategies for taking over the planet. This is how computers got good at Go and chess and became better than the best human players.
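This strategy-by-simulation idea can be made concrete with a toy game. The sketch below is my own illustration, not anything from an actual AI system: in one-pile Nim (a hypothetical stand-in, where each player takes 1 to 3 sticks and whoever takes the last stick wins), an “agent” can find good moves purely by playing out each option thousands of times and keeping statistics.

```python
import random

# Toy illustration of strategy-by-simulation: one-pile Nim, where each
# player takes 1-3 sticks and whoever takes the last stick wins. The
# approach works here only because the simulation and the game are
# literally the same thing.

def mover_wins_random(remaining):
    """Finish the game with both sides playing uniformly at random.
    Returns True if the player whose turn it is now takes the last stick."""
    movers_turn = True
    while True:
        remaining -= random.randint(1, min(3, remaining))
        if remaining == 0:
            return movers_turn
        movers_turn = not movers_turn

def best_move(sticks, rollouts=4000):
    """Estimate each opening move's win rate by Monte Carlo rollouts
    and return the move with the highest estimate."""
    scores = {}
    for take in range(1, min(3, sticks) + 1):
        wins = 0
        for _ in range(rollouts):
            left = sticks - take
            # We win outright by taking the last stick, or whenever the
            # randomly simulated opponent fails to win from what's left.
            if left == 0 or not mover_wins_random(left):
                wins += 1
        scores[take] = wins / rollouts
    return max(scores, key=scores.get)

random.seed(0)
print(best_move(5))  # prints 1: leave the opponent a multiple of 4
```

The rollouts converge on correct play without the agent “understanding” anything, but only because it can reset and replay a perfect, cheap copy of the environment trillions of times, which is exactly what the world-domination case lacks.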
Could computers do something similar if the goal is “neutralize all humans” — as a steppingstone to a larger goal — instead of “get good at one board game”? Maybe. But the difference in the case of chess is that the simulation and the game are the exact same thing. To get the equivalent for the game of taking over the world, you’d need to control the world first. Of course, that would be too much to ask for, so the doomer scenario has to assume that a simulation of the world running on a computer can give a close enough model of reality to be useful to a paperclip maximizer.
How plausible is this? On the scale of human intelligence, I don’t think we do a good job of modeling complex phenomena. Philippe Lemoine has shown how bad epidemiology is at this. Even if you’re trying to understand in retrospect why covid cases were higher in one country than another, it’s very hard to say much of anything. As far as modeling into the future, epidemiology has failed. It knows that cases go up, and then they go down in waves, which is fine, but beyond that, there really isn’t much that epidemiological modeling proved useful for in the pandemic, particularly when it went beyond basic principles and tried to make precise estimates. In international relations, scholars conceptualize states as unitary, rational actors, and I wrote a book about how this not only simplifies reality, but actively misleads us.
I don’t think that it would be an exaggeration to say that we have practically no ability to model complex phenomena. In their book Why Machines Will Never Rule the World, Jobst Landgrebe and Barry Smith (my interview here) argue that this is because, for all practical purposes, they are impossible to model. As far as I understand the argument, it’s that as the number of variables in a system increases linearly, the complexity of the system increases at an exponential rate, until there are too many interactions one has to understand in order to make predictions.
To take a simple example, this is why social complexity rises at a faster rate than population size.
The social dynamics of a party with 6 people aren’t just twice as complex as a party with 3 people. If you take the number of existing dyads to represent social complexity, a gathering with 3 people is at 3, while one with 6 people is at 15.
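The arithmetic behind the party example is just the binomial coefficient: n people generate n(n-1)/2 dyads, so pairs grow quadratically while headcount grows linearly. A quick sketch (the group sizes beyond 3 and 6 are arbitrary examples of mine):

```python
from math import comb

# Dyads (unordered pairs) among n people: C(n, 2) = n * (n - 1) / 2.
# 3 and 6 match the party example; 12 and 100 are arbitrary extensions.
for n in (3, 6, 12, 100):
    print(f"{n} people -> {comb(n, 2)} dyads")
# 3 people -> 3 dyads, 6 -> 15, 12 -> 66, 100 -> 4950
```

And dyads are the floor: once triads and larger coalitions count as well, the number of possible subgroups grows as 2^n, which is the exponential blowup the Landgrebe and Smith argument points to.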
To understand mankind well enough to control it, one has to understand the interactions of economic, political, epidemiological, and other kinds of systems. Even calling each of these things a “system” is itself a gross simplification. The “American political system” is an endless number of systems embedded within other systems. And the American political system is itself only one thing a superintelligence would need to deeply understand and manipulate in order to control the world.
A useful simulation would need to overcome all of this. How would it know whether its model of the world is any good? It would need to be tested against reality. But again, you need control over reality in order to do this, and that’s what the machine is trying to get in the first place.
One could respond that it might be a mistake to say that the machine has to have deep understanding of complex systems from the outset. It could disaggregate the task of turning everyone into paperclips into increments. So it develops an IQ of a thousand, which it at first uses to just manipulate people to the point where it does not get turned off. It gains control of the corporation that created it, since that’s what it has the most experience with, and then the corporation seeks to add to its own power when opportunities arise. Instead of having a step-by-step plan worked out in advance to take over the world, it acts as a Machiavellian schemer, bides its time, and simply outsmarts other actors in the system at every decision point.
The problem remains that with each step it takes, the challenges it faces get more difficult. At some point, it might meet its match. Being able to successfully manipulate a few engineers at a company is no guarantee an entity would be able to take over a state, and taking over a state is no guarantee it can take over the world. The longer the AI takes to accomplish its ultimate mission, the more opportunities there are for humans to adjust and for other AIs to pop up, in some cases created by humans specifically to counter any malignant AIs out there. Without a preconceived plan to take over the world, the original paperclip maximizer is wasting precious time while new obstacles keep emerging.
If what I’m saying here is right, the paperclip maximizer would of course understand all of this before setting out to take over the world. It therefore realizes that the best it can do is make the company it works for more efficient. World domination is too difficult and uncertain of a path. In other words, maybe it operates exactly the way its creators intended.
I see no reason to doubt that AI can get very powerful, and that this will lead to all kinds of social and political issues. DALL-E and ChatGPT only show glimpses of what’s to come. But the doomer scenario is what is convincing smart people to drop whatever else they might be working on and devote their lives to the alignment problem. If its arguments are correct, they should keep doing that, and others should join them. But if it’s not, we may be misdirecting talent away from where it can be more useful.
Richard Hanania's Newsletter is a reader-supported publication.