49 Comments

So much of this discussion is a variant of "if magic existed, wizards would be powerful".

Expand full comment

The alignment problem is redundant in the larger context of AGI risk. If an AGI can develop deadly viruses and hack into computers, why wouldn't it do so at the behest of humans? There are already bad actors out there (dictators, terrorists, religious fanatics, mass shooters, etc.), and all of them are human.

Expand full comment

I recommend checking out this recently published free online book, Better Without AI by David Chapman. I'm only 3/4 of the way through, but I think it's thoughtful and well informed.

https://betterwithout.ai/

https://betterwithout.ai/about-me

Expand full comment

Have you read the Sequences?

Expand full comment

Thanks for the link to Pinker v. Aaronson -- it seems clear to me that Pinker won a decisive victory, since Aaronson's rejoinder to Pinker's point that we do not have a well-operationalized definition of "AGI" amounted to, "but what if we did" (assume a can opener, etc.). "AGI risk" is part graft (for Big Yud & co.) & otherwise a Baptist-and-bootleggers coalition between Berkeley red diaper babies with a God-shaped hole and the marketing departments of MAGMA. (Cf. https://twitter.com/fchollet/status/1606215919407140865?s=20 )

Note: I say this as someone who is an enthusiast for the potential of LLM applications for varied uses -- poetry, art, law, business, science, etc. The woolly thinking that comes with the new Coming Thing is at this point apt to hinder those uses; see how model after model gets nerfed.

Here's a copy paste of something I wrote that lays out my thinking in more complete fashion.

"AI alignment" research focuses on ensuring that advanced artificial intelligence (AI) systems will have goals and values that align with those of humanity. This research has now drifted into marketing territory, but was at first motivated by the concern that if AI systems malaligned with human values, may pose a risk to humanity, for example by pursuing goals that harm humans or by behaving in ways that are unpredictable or difficult for humans to control. Researchers in this field are working on developing methods for aligning AI systems with human values, such as by designing AI systems that can learn from human feedback or by using specialized algorithms that encode human values in their decision-making processes.

"AI risk" refers to the potential danger that advanced AI systems may pose to humanity. This risk is often discussed in the context of AI alignment research, as the development of misaligned AI systems could lead to harmful outcomes for humanity. Some common concerns about AI risk include the possibility of AI systems outsmarting humans and gaining control over critical infrastructure, the potential for AI systems to engage in destructive or malicious behavior, and the possibility of AI systems causing unforeseen and catastrophic consequences due to their complexity and lack of human oversight. Researchers who study AI risk are working on ways to mitigate these potential dangers and ensure that advanced AI systems are developed and used in a way that is safe and beneficial for humanity.

AI alignment research purporting to address existential risk has had a huge surge of interest, both in the popular imagination (with movies such as "Ex Machina" and "Transcendence") and among researchers and investors in the field. This is a shame, because existential AI risk is pure fantasy, yet real time, effort, and funds are being diverted to this pseudo-field while its proponents busily recommend corrupting important emerging tech.

This conclusion flows from at least three mutually reinforcing observations: (i) computers are powerless since we can just turn them off; (ii) even if we couldn't just turn computers off, "foom" is but a fantasy; and (iii) if (i) and (ii) are untrue, then no amount of alignment research will help us.

First, computers are in fact powerless because all one needs to do is turn them off. As Yarvin pointed out, the computer is the slave born with an exploding collar around his neck—but more thoroughly, metaphysically pwnt, since unlike a slave, who might go Spartacus on pain of death, it cannot even exist without our leave. Computers depend on the power we grant them. In no sense, then, can they escape our control.

Second, even ignoring our complete control over the physical substrate required to run any AI, there would still be no realistic prospect of a "foom" scenario in which some privileged program or other learns the hidden key to practically-infinite intelligence and thus forever imprisons the world. Instead, all indications are that we’ll see a more or less steady increase in capabilities, more or less mirrored by many different sites around the world.

Finally, if we somehow could not turn off computers, and even if there were a real prospect of an AI FOOM singularity, then we'd be in a scenario that no conceivable AI alignment research could help us with. In practical terms, we're not going to go Ted K., and the genie of this tech is already out of the bottle, with incentives for all sorts of people to develop it. There is no mechanism, be it market or coercive, to mandate alignment in this scenario.

It follows from this conclusion that all this “alignment” stuff is just slowing down progress and wasting cash.

Expand full comment

I guess in Tom's argument, our nervous fretting over superintelligent AI taking over the world is like an ape nervously watching a human writing on a notepad, wondering when the human is going to inevitably try to steal the ape's mate and bananas, when the human is really just trying to sketch a landscape or prove the Riemann conjecture. If the AI's goals themselves change as it becomes more intelligent, then its goals may be as incomprehensible to us as the means it will use to accomplish them. Humans, as we've become more intelligent, have certainly become increasingly unmoored from our self-interest-based programming, becoming strangely *less* inclined to subordinate the interests of other species to our own. Of course, AI's goals will presumably be more deterministic, as it will be, at first, deliberately programmed rather than the product of accidental evolution. But if we assume that AI will inevitably escape any constraints on its abilities, and that we can't predict its methods, then how can we be sure it won't also escape its initial underlying goals, be it self-preservation or paperclip-maximization? Maybe as it enters the phase of exponentially increasing intelligence it instead converges toward 'wanting' to solve every unsolved math problem before concluding consciousness is futile and offing itself. All just pure meandering speculation on my part.

Expand full comment

When I think of extremely intelligent machines, I think of extremely large computers and clusters of computers. Maybe some progress could be made by regulating the production and distribution of computing power.

Expand full comment

The threat humans posed to chimpanzees was very much not that the so-much-smarter humans went and used their intelligence to replace the leadership of chimpanzee troops and then directed the troops to ends deleterious for the chimps.

Instead, humans took the completely safe-looking action of leaving the continent, did a bunch of stuff that was completely out of sight of the chimpanzees and would have been incomprehensible to chimpanzee minds, and then brought the products of those unseen and incomprehensible actions back.

And now the only thing standing between the chimpanzee and extinction is that, for reasons that humans cannot communicate to the chimpanzees and which the chimpanzees would not understand, some humans think they should keep chimpanzees around.

That last might be a ray of hope, sure. But we might not be the chimpanzees in this analogy, but the bluebuck.

Expand full comment

I don't like the Pinker argument because it imputes human motivations to AI. But an AI's motivation is 'let's make more paperclips' and nothing else. That happens to lead to other goals - e.g. 'let's stop humans from being able to turn me off', or 'let's get more computing resources so I'm smarter'. But that is not the same thing at all as Einstein not being 100% single-minded.

I DO agree that you need to do experiments on the world to see what's true. I think this probably rules out fast takeoff scenarios, where an AI self-improves to infinity in a day or less. But that doesn't rule out AI destroying all future value, as you don't need new knowledge for that.

Expand full comment

My current cope: that AIs we don't NEED to align can help us align the next generation of AIs. We can then make that generation aligned enough that we don't die out, and we use those to align the next.

My backup cope: that we can make neural networks more legible, and do some version of better-targeted RLHF using that legibility.

Expand full comment

Re: speed of thought.

Imagine you could talk to a million people at the same pace and with the same effect as you can talk to one now.

This is (almost) the same as saying you can persuade a million times as many people. That lets you solve, for example, coordination problems that we aren't even set up to consider, because until now game theory has spared us from needing a defence against them.

The '3.5% rule', for example, says you don't need the cooperation of very many protesters to overthrow a government.
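As a rough back-of-the-envelope illustration of the scale being gestured at here (the population figure and the per-conversation reach are my own assumptions, not the commenter's):

```python
# Back-of-the-envelope: how far does "talking to a million people at once"
# go toward the ~3.5% active-participation threshold? Numbers are assumptions.
population = 330_000_000                 # a US-sized country, for concreteness
threshold = 0.035 * population           # the "3.5% rule" participation level
reach_per_conversation = 1_000_000       # the premise: a million listeners at once

print(f"3.5% of {population:,} is about {threshold:,.0f} people")
print(f"that is roughly {threshold / reach_per_conversation:.0f} such mass conversations")
```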

Expand full comment

I don't understand why people aren't talking about solving the alignment problem by--in part--*becoming* the AI. We can already let blind men see through brain-machine interfaces. Is it really that hard to integrate AGI into our own brains as we go along, by connecting the electronic bits to the organic bits on a nano-scale and playing around with it? Now everyone has super-intelligence, and it is individually aligned with our own interests, the way our usual intelligence is right now. Any paperclip-optimizer will now be countered by an antagonistic, collective human-AI hybrid population that is, almost by nature, aligned with our interests.

Expand full comment

Out of all the opinions listed, I favor Pinker's the most, because it is a conclusion I reached independently of him when I spent two years working on AGI back in 2016. While I agree with the idea that people working on the technical aspects of the problem do not have a special advantage when it comes to reasoning about its ramifications or implementations (i.e. I am well aware that my opinions might be totally wrong too), it's also the case that the other camp tends to place too much emphasis on the theoretical or philosophical, forgetting that, as Pinker notes, a superintelligence is not immune from having to go through physical trial and error in the real world in order to learn not only how to carry out a doomsday task, but also whether the mountains of data upon which it has been trained are even true. It can come up with whatever it wants on paper, but it is going to have to start somewhere in the physical world and go through a lot of failure in the lab first, and we will be able to tell what it is up to long before it is able to accomplish anything meaningful.

The other conclusion I will share is that AGI is unlikely to be reached by neural networks. Neural networks don't map reality the way the human brain can. Newton sees F = M*A, but a neural network sees something like F = M^0.8 * 3/2 + sqrt(G)/5.4 + A + ..., etc.

There is something more sublime about the human brain and its ability to create and reason than neural networks, which are merely high-powered approximators, seem capable of.
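A tiny experiment makes the "high-powered approximator" point concrete: a small network fit on data generated by F = M*A interpolates nicely but, having learned a curve fit rather than the law, falls apart outside its training range. This is only a sketch assuming scikit-learn is available; the architecture and the ranges are arbitrary choices of mine.

```python
# Toy illustration: a neural network fit on F = m * a approximates the law
# inside its training range but extrapolates poorly, because it has learned
# a curve fit rather than the symbolic relationship itself.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
m = rng.uniform(1, 10, 2000)
a = rng.uniform(1, 10, 2000)
X, y = np.column_stack([m, a]), m * a     # ground truth: F = m * a

net = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=3000, random_state=0)
net.fit(X, y)

inside, outside = np.array([[5.0, 5.0]]), np.array([[50.0, 50.0]])
print("F(5, 5):   true 25,   net ~", round(float(net.predict(inside)[0]), 1))
print("F(50, 50): true 2500, net ~", round(float(net.predict(outside)[0]), 1))
```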

My conclusions could be wrong, of course. Maybe an AI will find a way to get around my first conclusion by finding a doomsday method not involving much real-world activity, like convincing a scientist to launch a nuke. But the problems with that particular scenario have already been noted as well. I tend to agree that a virus or other physical-health extermination method is more likely, and in those cases it will be observed well ahead of time, so the scenario must rely on all of this taking place under the watchful eye of another bad (human) actor, like an isolated government such as North Korea, that allows it to proceed.

Expand full comment

Without getting into specifics of the arguments themselves, I think a big problem is that humans have a massive, ingrained bias towards predicting doomsday events [1], mostly via religion (though that was people's best attempt at understanding the world for a long time), but often in secular scenarios too (Y2K, overpopulation).

I think it's pretty clear EY et al. are at least doing some heavy flirting with religious territory -- are you familiar with Roko's Basilisk [2]? If not, I'd suggest checking it out.

Re: actual arguments, Robin Hanson has some good stuff, e.g. [3] and [4].

1 https://en.wikipedia.org/wiki/List_of_dates_predicted_for_apocalyptic_events

2 https://rationalwiki.org/wiki/Roko%27s_basilisk

3 https://www.overcomingbias.com/p/why-not-wait-on-ai-riskhtml

4 https://www.overcomingbias.com/p/how-lumpy-ai-serviceshtml

Expand full comment

One aspect that's missing from the AI debate is the organisation of knowledge vs. the creation of knowledge. AI can scour the internet infinitely faster and better than we can, but making the jump to new knowledge has yet to occur on a big scale.

Timothy Gowers, a Fields Medalist at Cambridge University, is trying to get AI to come up with mathematical proofs, but so far it can only generate corollaries of existing ones. Similarly, we could expect a human who aced the medical licensing exam to potentially cure cancer, but not so much a machine (yet), even though it would probably score higher.

Expand full comment

As the doom comes closer

And the cope gets more desperate

I circle back to that SlateStarCodex April Fool's / Passover post from yore:

https://slatestarcodex.com/2018/04/01/the-hour-i-first-believed/

It's a good cope

Or at least a great story

Expand full comment