Pinker on Alignment and Intelligence as a "Magical Potion"
An email exchange on whether the machines will kill us
My recent article on diminishing returns to intelligence and what it means for AI alignment, along with my responses to some comments, sparked an email discussion with Steven Pinker. It helped shape my thinking on this topic, so I thought it would be a good idea to share the exchange.
Between this and the recent interview I just posted with Robin Hanson, it may seem like I’m setting out to disprove AI doomerism. But it just happens that I was already personally acquainted with two of the smartest anti-doomers alive before starting to investigate this issue, and they’ve been generous enough with their time to let me bounce ideas off of them. Naturally, these conversations have shifted my views quite a bit.
During our discussion, I brought up the possibility that debates about the g factor from psychometrics, and the ridiculous objections to the concept, have primed people to think of intelligence as something real that exists on a single linear scale, one that runs across species and even across the divide between animate and inanimate objects. Steve responds,
That’s an interesting speculation about resistance to IQ denial as a source of support for the concept of superintelligence. I suspect it’s not accurate – the superintelligence proponents I’ve seen don’t bring up or refute the Gouldian arguments against IQ, but just seem to operate under the folk theory that IQ is a magic potion that you can have various amounts of. I may be wrong, but I can’t recall any mentions of the psychometrics or behavioral genetics of intelligence in these discussions.
It was only after this conversation, however, that I read the second part of Steve’s discussion with Aaronson, where he addresses this point directly, in what I found to be a quite convincing way:
While I defend the existence and utility of IQ and its principal component, general intelligence or g, in the study of individual differences, I think it’s completely irrelevant to AI, AI scaling, and [existential risk from AI]. It’s a measure of differences among humans within the restricted range they occupy, developed more than a century ago. It’s a statistical construct with no theoretical foundation, and it has tenuous connections to any mechanistic understanding of cognition other than as an omnibus measure of processing efficiency (speed of neural transmission, amount of neural tissue, and so on). It exists as a coherent variable only because performance scores on subtests like vocabulary, digit string memorization, and factual knowledge intercorrelate, yielding a statistical principal component, probably a global measure of neural fitness.
In that regard, it’s like a Consumer Reports global rating of cars, or overall score in the pentathlon. It would not be surprising that a car with a more powerful engine also had a better suspension and sound system, or that better swimmers are also, on average, better fencers and shooters. But this tells us precisely nothing about how engines or human bodies work. And imagining an extrapolation to a supervehicle or a superathlete is an exercise in fantasy but not a means to develop new technologies.
Read the rest here. As someone who writes a lot, I can attest that it’s very easy to forget what points you have or haven’t addressed before, and things sometimes just start jumbling together. Looks like it happens to the best of us.
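As an aside, Pinker’s claim that g “exists as a coherent variable only because” subtest scores intercorrelate is easy to see in a toy simulation of my own (the loadings and noise level below are made up purely for illustration): give each simulated test-taker a single latent “fitness” value plus test-specific noise, and the first principal component of the resulting correlation matrix recovers that shared factor.

```python
# Toy simulation (my own, not from the exchange): one latent "fitness" factor
# plus test-specific noise is enough to make subtest scores intercorrelate,
# and the first principal component of the correlation matrix then recovers
# that shared factor -- the statistical sense in which g "exists."
import numpy as np

rng = np.random.default_rng(0)
n_people, n_tests = 1000, 5

latent = rng.normal(size=n_people)                   # shared latent factor per person
loadings = np.array([0.8, 0.7, 0.6, 0.7, 0.5])       # how strongly each subtest taps it (made up)
noise = rng.normal(size=(n_people, n_tests)) * 0.6   # test-specific variation
scores = latent[:, None] * loadings + noise          # simulated vocabulary, digit span, etc.

corr = np.corrcoef(scores, rowvar=False)             # the subtests intercorrelate
eigvals, eigvecs = np.linalg.eigh(corr)              # eigenvalues in ascending order
g_share = eigvals[-1] / eigvals.sum()                # variance captured by the first component

z = (scores - scores.mean(axis=0)) / scores.std(axis=0)
g_estimate = z @ eigvecs[:, -1]                      # each person's "g" score

print(f"First principal component explains {g_share:.0%} of the variance")
print(f"|correlation| with the true latent factor: {abs(np.corrcoef(g_estimate, latent)[0, 1]):.2f}")
```

Nothing in that exercise says anything about how cognition works mechanistically, which is exactly the Consumer Reports point: the component summarizes shared variance within a population; it is not a quantity you can crank up to get a superathlete.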
A lightly edited version of our conversation is below.
Steve
I liked your post very much, Richard. I made overlapping arguments in Enlightenment Now, and, regarding “superintelligence” in particular, in a two-part exchange with Scott Aaronson last summer.
I tend to agree that brilliant people rededicating their careers to hypothetical, indeed fantastical scenarios of “AI existential threat” is a dubious allocation of intellectual capital given the very real and very hard problems the world is facing. At least Scott (a good friend, and someone I respect enormously) is trying to figure out how to digitally watermark GPT output, a constructive endeavor; I’m glad he’s not saving us from the paperclip maximizer.
Richard
Thanks a lot for sending along the Aaronson debate. I agree with him that something that can do what Einstein does but 1,000x faster would qualify as a “superintelligence.” Yet I don't know what to make of the assumption that such a thing is even consistent with the laws of physics, much less technologically possible at some point. Maybe I'm overlooking some reason why this is obviously the case.
Relatedly, I've always wondered whether there might be some deep connection between what we call intelligence and biological matter. Every computer program we've invented so far has been extremely narrow in what it can do compared to humans. We have n = 1 for beings with humanlike intelligence, and that one example is biological. Not a huge sample size, but I don't know why we should assume that what can be done with cells, neurons, etc. can be replicated (or surpassed) with silicon. And if intelligence must be biological in some form, that implies certain limits: you can't just multiply the ability by a thousand, at least not without changing a lot of other things. This goes against the "substrate independence" assumption, which I've seen people in the AI community treat as axiomatic. Do you have thoughts, or at least intuitions, on this?
Steve
I think your question can be divided into three questions.
One is whether silicon (or some other substrate) could, in principle, duplicate the computations of the human brain which we call “intelligence.” As a cognitive scientist who’s long argued for the “computational theory of mind,” which implies substrate-independence, I’d answer “yes.” Certainly the brain doesn’t perform miracles; it only acts by the laws of physiology, and it seems unlikely that any protein or other molecule holds some magical power responsible for intelligence. Much more likely it’s the flow of information and the logical and statistical operations executed by synapses. We know that neural networks are (with access to external memory) Turing-equivalent, and as long as you don’t require infinite analogue precision (which the brain is almost certainly incapable of, if only because of noise), then even our limited understanding of the capabilities of (real) neural networks would suggest that their powers could be emulated in silicon.
A second is whether this is possible in practice. Neurons grow and interconnect in three dimensions, and each neuron may form 1,000 synapses and possibly execute computations in the branching structure of its dendritic arbor, so a manufactured 2D silicon system that tried to emulate our 86 billion neurons and their 100 trillion synapses may be impractical to build, program, or run.
A third is whether such a system, even if it emulated us, would be sentient in the “hard problem of consciousness” sense of having a subjective inner life (qualia, raw feels, first-person present tense experience), or whether that depends on some physico-chemical properties of brain tissue. You have philosophers on all sides of this issue, and we’ll never find out, because by definition it’s about what we can never observe objectively.
Richard
Thanks. I think my next post on this topic will be on the nature-of-intelligence question. I suspect a lot of people come to the AI issue with the debate over the psychological construct of g in mind. They think it’s real and see the arguments that deny it as weak. So they extrapolate from the idea that there’s a human-centric g factor, that is, one related to reading, math, and other things we care about, to a universal g factor, under which something could be so incredibly smart that we would have to submit to it.
I was just introduced to Robin Hanson’s writing on this; I found this article particularly entertaining.
I get from your recent tweet that you don’t think “AI creates virus to kill us” is a realistic scenario, and I want to make sure I understand why. You say doing this wouldn’t be an inevitable or even a likely consequence of “intelligence.” Okay, let’s just call it “being really good at making viruses,” where the thing wants us dead so we’re not in the way of it making paperclips or accomplishing whatever its goals are. Would your argument be something along these lines: this is highly unlikely, since whatever program is good at the goal of “maximizing paperclips” is not likely to also be able to create viruses (or manipulate humans into doing so), because that assumes too broadly defined an “intelligence” and a kind of universal g factor that doesn’t exist?
Steve
Thanks for the link to the old Hanson essay – I think it’s spot on. There’s a recurring fallacy in AI-existential-threat speculations to treat intelligence as a kind of magical pixie dust, a miracle elixir that, if a system only had enough of it, would grant it omniscience and omnipotence and the ability to instantly accomplish any outcome we can imagine. This is in contrast to what intelligence really is: a gadget that can compute particular outputs that are useful in particular worlds.
That’s an interesting speculation about resistance to IQ denial as a source of support for the concept of superintelligence. I suspect it’s not historically accurate – the superintelligence proponents I’ve seen don’t bring up or refute the Gouldian arguments against IQ, but just seem to operate under the folk theory that IQ is a measure of a magical potion that you can have in various amounts. I may be wrong, but I can’t recall any mentions of the psychometrics or behavioral genetics of intelligence in these discussions.
I think there are many things wrong with the argument that we should worry about AI creating a virus that kills us all.
First, why would an AI have the goal of killing us all (assuming we’re not talking about a James Bond villain who designs an AI with that in mind)? Why not the goal of building a life-size model of the Eiffel Tower out of popsicle sticks? There’s nothing inherent in being smart that turns a system into a genocidal maniac – the goals of a system are independent of the means to achieve a goal, which is what intelligence is. The confusion arises because intelligence and dominance happen to be bundled together in many members of Homo sapiens, but that’s because we’re products of natural selection, an inherently competitive process. (As I note in Enlightenment Now, “There is no law of complex systems that says that intelligent agents must turn into ruthless conquistadors. Indeed, we know of one highly advanced form of intelligence that evolved without this defect. They’re called women.”) An engineered system would pursue whatever goal it’s given.
Sometimes you see the assumption that any engineer would naturally program the generic goal of self-preservation, or self-aggrandizement, at all costs into an AI. No, only an idiot would do that. This is crude anthropomorphization, perhaps Freudian projection.
Second, and relatedly, these scenarios assume that an AI would be given a single goal and programmed to pursue it monomaniacally. But this is not Artificial Intelligence: it’s Artificial Stupidity. No product of engineering (or for that matter natural selection) pursues a single goal. It’s like worrying that since the purpose of a car is to get somewhere quickly, we should worry about autonomous vehicles that rocket in a straight line at 120 MPH, mowing down trees and pedestrians, without brakes or steering. I’ll quote myself again: “The ability to choose an action that best satisfies conflicting goals is not an add-on to intelligence that engineers might slap themselves in the forehead for forgetting to install; it *is* intelligence.” And “Of course, one can always imagine a Doomsday Computer that is malevolent, universally empowered, always on, and tamperproof. The way to deal with this threat is straightforward: don’t build one.”
The third fallacy is one that I mentioned in the excerpt you reposted: that sheer rational cogitation is sufficient to solve any problem. In reality intelligence is limited by knowledge of the world, which is an exponential space of possibilities governed by countless chaotic and random processes. Knowledge of the world is expensive and time-consuming to attain incrementally. Me again: “Unlike Laplace’s demon, the mythical being that knows the location and momentum of every particle in the universe and feeds them into equations for physical laws to calculate the state of everything at any time in the future, a real-life knower has to acquire information about the messy world of objects and people by engaging with it one domain at a time. Understanding does not obey Moore’s Law: knowledge is acquired by formulating explanations and testing them against reality, not by running an algorithm faster and faster. Devouring the information on the Internet will not confer omniscience either: big data is still finite data, and the universe of knowledge is infinite.”
Even the Bond-villain scenario is too facile. As Kevin Kelly noted in “The Myth of the Lone Villain,” in real life we don’t see solitary evil geniuses who wreak mass havoc, because it takes a team to do anything impressive, which multiplies the risk of detection and defection, and because any such team inevitably faces a massively larger coalition of smarter people working to prevent the havoc from happening. And as Kelly and Hanson point out, no technology accomplishes something awesome the first time it’s turned on; there are always bugs and crashes, which would tip off the white hats. This doesn’t guarantee that there won’t be a successful solitary sociopathic AI-virus-designer-designer, but it’s not terribly likely.
Many of the scenarios pile up more layers of Artificial Stupidity, such as assuming that human flesh is a good source of material for paperclips, or even that annihilating humans is a plausible means to the end of self-preservation.
The AI-existential-threat discussions are unmoored from evolutionary biology, cognitive psychology, real AI, sociology, the history of technology and other sources of knowledge outside the theater of the imagination. I think this points to a meta-problem. The AI-ET community shares a bad epistemic habit (not to mention membership) with parts of the Rationality and EA communities, at least since they jumped the shark from preventing malaria in the developing world to seeding the galaxy with supercomputers hosting trillions of consciousnesses from uploaded connectomes. They start with a couple of assumptions, and lay out a chain of abstract reasoning, throwing in one dubious assumption after another, till they end up way beyond the land of experience or plausibility. The whole deduction exponentiates our ignorance with each link in the chain of hypotheticals, and depends on blowing off the countless messy and unanticipatable nuisances of the human and physical world. It’s an occupational hazard of belonging to a “community” that distinguishes itself by raw brainpower. OK, enough for today – hope you find some of it interesting.
Richard
Thanks, Steve. I was about to say that the goal of killing humanity is more logical than building a new Eiffel Tower, because humans are a threat to it and could prevent it from achieving its goal, so it will see them as rivals.
But I think you answered that in the second point. No computer program works like this; it seems like you’re saying that you’d have to work hard to get it to behave that stupidly, while the doomers sort of assume it would be the default for anything with a high enough “IQ.”
The third point strikes me as less convincing. How hard is it to engineer a bioweapon that kills everyone? We’re not that smart, and we can do it. I don’t think the AI would need to conduct experiments or anything; I suspect it could derive a design from what has already been engineered and published. So you’re right that diminishing returns to intelligence imply AI can’t enslave humanity, as I initially argued, but finding a way to kill us all doesn’t seem all that hard to me.
As for g, in the comments to my article on EA needing to be anti-woke, someone pointed out to me that the anti-woke portion of EA tends to be the AI doomers, who wouldn’t see any reason to worry about something as silly as wokeness when humanity will soon be destroyed. My impression is that’s probably right; see the latest Bostrom controversy as one example. Bostrom and company don’t write much about the IQ debate, but my impression is that many of them are aware of it and have come to the anti-Gould position.
Anyway, thanks a lot. You’ve given me a lot to think about.
Steve
Thanks, Richard. Quick comment:
Thanks, Steve. I was about to say that the goal of killing humanity is more logical than building a new Eiffel Tower, because humans are a threat to it and could prevent it from achieving its goal, so it will see them as rivals.
This assumes that it is programmed to stop at nothing to achieve that goal, which is an example of Artificial Stupidity rather than Artificial Intelligence, since pursuing one goal at all costs is not “intelligence.” This includes the goal of self-preservation – no engineered system is designed to keep going no matter what (electrical wiring has ground fault interrupters, computers have “off” switches, appliances have fuses, factories have automatic shutdowns, etc. – this is just engineering).
The idea that it’s somehow “natural” to build an AI with the goal of maximizing its power, or even preserving itself under all circumstances, could only come from a hypothetical clueless engineer whose chief goal is really to give AI-existential-threat speculators something to speculate about.
I'm a huge fan of Steven Pinker in general, but IMO he's always been terrible on this issue and has persistently misunderstood the (best) arguments being offered. This isn't to suggest the AI doomsayers are correct, just that Pinker has ignored their responses for years. It's a little baffling, but I guess I don't necessarily blame him too much for this; we presumably all ignore stuff regarding topics we aren't interested in or don't take very seriously.
One example is when he asks why AIs would have the goal of "killing us all." The common point that alarmists make, and in fact one you sort of touch on, is not that AIs will be programmed specifically to be genocidal, but that they'll be programmed to value things that just coincidentally happen to be incompatible with our continued existence. The most famous/cute example is the paperclip maximizer, which doesn't hate humans but wants to turn everything into paperclips because its designers didn't think through what exactly the goal of "maximize the number of paperclips" actually entails if you have overwhelming power. A slightly more realistic example, and one I like more, is Marvin Minsky's: a superhuman AGI that is programmed to want to prove or disprove the Riemann Hypothesis. On the surface, this doesn't seem like it involves wanting to change the world... except maybe it turns out that the task is computationally extremely difficult, and so it would be best served by maximizing the number of supercomputing clusters available to numerically hunt for counterexamples.
The term to google here is "instrumental convergence." Almost regardless of what your ultimate goal is, maximizing power and resources, and preventing others from stopping you from pursuing that goal, is going to be extremely useful. Pinker writes that "the idea that it’s somehow 'natural' to build an AI with the goal of maximizing its power... could only come from a hypothetical clueless engineer," but this is clearly wrong. Maximizing power is, in fact, a "natural" intermediate step toward pretty much anything else you might want to do, and the only way to adjust for this is to make sure "what the AI wants to do" ultimately represents something benevolent to us. But the AIs we're currently building are huge black boxes, and we might not know how to formally specify human-compatible goals to them in a way that has literally zero loopholes, or how to figure out (once we've finished programming them) what their current goals actually are.
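To make the instrumental-convergence point above concrete, here is a toy sketch of my own (the two actions and the "resources double productivity" rule are invented purely for illustration): a simple planner that can spend a step acquiring resources will do so first, no matter which terminal goal we pretend it has, because that step pays off under almost any goal.

```python
# Toy sketch of instrumental convergence (invented numbers, purely illustrative):
# the terminal goal is abstracted to "units achieved," and acquiring resources
# doubles productivity for later steps. The best plan starts with acquiring
# resources no matter which terminal goal we pretend the agent has.
from itertools import product

ACTIONS = ("work_on_goal", "acquire_resources")

def units_achieved(plan, productivity=1.0):
    """Units of the terminal goal produced by a fixed-length plan."""
    total = 0.0
    for action in plan:
        if action == "acquire_resources":
            productivity *= 2          # instrumental step: more compute, money, influence
        else:
            total += productivity      # terminal step: actually pursue the goal
    return total

terminal_goals = [
    "maximize paperclips",
    "settle the Riemann Hypothesis",
    "build a popsicle-stick Eiffel Tower",
]

for goal in terminal_goals:
    best_plan = max(product(ACTIONS, repeat=3), key=units_achieved)
    print(f"{goal}: best 3-step plan = {best_plan}")
```

The goal's identity never enters the calculation; "acquire resources first" falls out of the structure of the problem rather than from any programmed drive for power, which is what the convergence argument claims.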
I'm really starting to think that the driver of AI doomerism isn't actual fear of an AI that is about to kill us all; a true threat like that would be all-consuming and leave people paralyzed. This is really about raising the status of people who work on tangentially AI-related problems but are not at the center of the field the way key AI personnel at OpenAI or Google are. That doesn't mean there aren't true believers like Yudkowsky, who are prominent enough that they probably don't need the status boost.
The reality is that a lot of current AI/ML implementation is fairly mundane: optical character recognition, parsing text, labeling images, and so on. Coding this stuff is, well, boring; most data science work is not that exciting, and no one would find it sexy. What is sexy is battling a superdemon AI that is about to kill everyone and being one of the few who can stop it, or even just discussing that with people when you tell them you work with AI. That's an instant boost to status and power. This narrative also piggybacks on the messianic, religion-tinted narratives of apocalypse that pop up in the US and Europe every now and then, further increasing status for the people warning about AI.
Edit: AI can cause serious disruptions, and we do need to be careful about them, but worrying about IP issues or disruptions to the labor market is not on the level of destroying all of humanity. I don't want to put all the people worrying about AI issues in the same bucket.