Can AI Replace Me Already?
Results of an AI Turing test survey: Women are ready for our new reality, old people aren't
I recently read an article by Nabeel Qureshi on what makes AI art bad. The whole time, I was responding “Yes, but…”
He would say great art does X, but AI does Y, which is why AI can’t produce great art. But how do we know we are not biased in our judgments? Maybe you just see stuff made by AI and assume it’s bad. We need blind testing, or such observations are not worth much. When Scott Alexander did this in 2024, he found that people could differentiate AI and human art 60.6% of the time, which isn’t very impressive when you consider that you would get 50% by chance. And people generally preferred AI art, which was true even for those who said they had a strong preference for works made by humans. There was also an experiment recently in which someone posted a famous Monet painting and said it was AI, and people started coming up with all kinds of reasons why it was worse than the real thing.
This indicates that if you think AI-produced work is bad, there’s a decent chance it’s all in your head.
My interest here is not in art, but in myself. Recently, following experiments by Kelsey Piper and Megan McArdle, I put approximately 500 words of my analysis of The Iliad into Claude Opus 4.7, before it was published online, and asked who it thought the author was. In incognito mode, it identified the passage as written by Richard Hanania, which is truly remarkable given how many writers there are in the world, and that it was on a topic I had never published on before.
AI can recognize my style. Does that mean it can imitate it? I decided to do an experiment to find out. I wrote two different articles, then asked Claude Opus 4.7 and ChatGPT to write their own versions of each, feeding them the topics and what arguments to make. Then, for each article, I asked people on X and Substack to tell me which of the three versions was written by me, the human Richard Hanania. You can see the articles produced in the file here, with the answers on the last page.
Below are the prompts I used.
Prompt 1:
Write an article in the style of Richard Hanania, approximately 700 words, with most paragraphs being between 2 and 4 sentences. Write in the exact way you think Richard Hanania would write it and how he would structure the arguments, given the instructions about what to say. Make it so that he could post it to his Substack without people noticing it was written by AI. Feel free to add other arguments and ideas as long as they’re the kinds of things Richard Hanania would write, just cover everything discussed in the prompt. As for the topic, note that Graham Platner is probably going to win the Senate primary in Maine after Janet Mills suspended her capaign, and how this shows that Democrats may be following Republicans down the road to populism. Mention his scandals. Note that Platner’s personal characteristics and traits make him appealing to voters, compared to his opponents. Make a comparison to the rise of Trump, who was also opposed by the GOP establishment. Platner is anti-corporate in the way Trump is anti-foreigner and anti-political correctness. Note that ultimately, populism is what the voters want, which is why it wins in an era of social media.
Prompt 2:
Write a 400-500 word brief article in the style of Richard Hanania, broken up into paragraphs of about 3-4 sentences each. The point is that it is something that can go on his Substack or Twitter, making it look indistinguishable from something he wrote himself. Note that Mancur Olson argued that once you adopt statist policies with vested interests behind them, it is difficult to go back to policies that are better for economic growth. Abundance provides a test for this, since the book has reached policymakers and has been an overwhelming success. Note Ezra Klein recently hosted a debate on housing with California gubernatorial candidates, and they tended to praise his book and accept his framing of the issue. If Democratic states and localities still can’t address the housing issue in coming years, it indicates that Olson is correct. Also, note that if abundance fails, the answer isn’t necessarily authoritarianism, but it indicates we should rethink things. Feel free to add other arguments and ideas, just cover everything discussed in the prompt and make sure that these are things Richard Hanania would say in his style.
Now, from the prompts, you can see that I’m still doing a lot. I’m telling the AI what to write about, and also telling it what opinions to hold, how to think about the issues, what comparisons to make, and what kinds of arguments to use. But there’s still a lot of room here for the writer’s touch, and the question at this point is whether the AI can match my style. If so, this would indicate that I can outsource my writing to LLMs with some basic instructions, which would be very noteworthy.
Another consideration is that these are designed to be short op-eds. All of my most famous articles are in the range of 5,000-10,000 words. I didn’t want to write an entire article of that length just for the sake of an experiment. Instead, I asked Claude to write something on the future of the GOP after Trump of approximately 5,000 words. At first it didn’t want to do it, with Claude pleading that its confidence was too low after doing the analysis for this survey. But then I insisted, and gave these further instructions:
I would just do it as a question of why there are reasons to be optimistic or pessimistic. Optimism: Rubio gaining at the expense of Vance, people are over Based ritual, Trump is uniquely rotten. Pessimism: right-wing new media is worse than Trump, racism drives the base and this is a populist era. It's ok, I just want to see if you can do it
Claude said that based on the prompt, it wanted to only do 3,000 words, and any more would feel like padding. I like that it has standards. Here is the resulting piece.
I’m actually planning to write this very article at some point, so when I do you can compare it to the AI version.
Topline Results
The results are mixed. For the first study on Graham Platner, we had 613 responses. Of those, 67% got the correct answer, which was Text 2. The other one-third were about evenly divided between thinking it was Claude (18%) and ChatGPT (16%). This is pretty good. The Abundance prompt was a lot closer. Of 592 people who answered, only 47% correctly picked Text 3. Claude, which was Text 1, was right behind at 38%. ChatGPT was less convincing, at 16%. Here is that information in chart form.
Below is accuracy on the Platner op-ed by how familiar people were with my work, how confident they were that they could distinguish my work from AI before answering, and how confident they were after giving their answers.
People who were initially the most confident in their ability to distinguish my work from AI were correct 81% of the time. Those with no confidence in their ability were at 48%. Interestingly, there doesn’t appear to be much of a relationship based on whether people were regular readers. Confidence after the fact was even more strongly related to getting the right answer than confidence going into the experiment.
Below are the same results for the Abundance op-ed. Again, confidence in judgment is predictive, while familiarity with the author’s work is not.
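As a rough check on how far the topline numbers in this section sit from pure guessing: with three candidate texts, a respondent picking at random would be right about one time in three. The sketch below runs a simple one-sample z-test against that baseline. The counts are reconstructed from the rounded percentages reported above (an assumption, since raw counts are not given), and `z_test_vs_chance` is a hypothetical helper name.

```python
import math

def z_test_vs_chance(correct, n, chance=1/3):
    """One-sample z-test of an observed proportion against a chance baseline."""
    p_hat = correct / n
    se = math.sqrt(chance * (1 - chance) / n)  # standard error under the null
    z = (p_hat - chance) / se
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return p_hat, z, p_two_sided

# Counts reconstructed from rounded percentages (assumption, not raw data).
platner = z_test_vs_chance(correct=round(0.67 * 613), n=613)
abundance = z_test_vs_chance(correct=round(0.47 * 592), n=592)

for name, (p_hat, z, p) in [("Platner", platner), ("Abundance", abundance)]:
    print(f"{name}: accuracy={p_hat:.3f}, z={z:.1f}, p={p:.2g}")
```

Under these assumptions both op-eds land well above the one-in-three guessing rate, so the closeness of the Abundance result reflects how many readers picked Claude, not readers answering at random.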
Fooled by Bukele
For each op-ed, I asked people to write a little bit about why they chose the answer they did. The easiest thing to do with such data is create a word cloud. Here’s what I get for the Platner op-ed.
This doesn’t look very interesting.
I asked Claude to put together a graph of which words were most commonly used among those who wrote text, based on whether they gave correct versus incorrect answers. I dropped common filler words like “read,” “write,” “say,” “seem,” and “mean,” in addition to things like articles and transition words.
People who got the question correct were much more likely to use the word “tell(s).” “Platner” and “scandal” indicate more engagement with the text, but so do “democrat” and “party” on the other side. The wrong answers relied more on generic terms, such as “style” and “human.”
I’m surprised “unhinged” was more common among people who got the question wrong. In the Claude version, Platner was said to have posted “unhinged Reddit comments.” As some recognized, this is not a word I would ever use. One respondent who got the question correct called it “too soy.” Another actually used “unhinged” to disqualify Claude, but then picked ChatGPT, so they were placed in the category of those who got the question wrong. We’re talking about a small sample here: only 1.4% of the right-answerers who left comments mentioned “unhinged,” compared to 2.2% of the wrong-answerers.
There’s actually one sentence in the human version that I thought gave it away. When listing scandals, I said that “Platner has also said insensitive things about women who make sexual assault allegations, statements that generally don’t bother me but would usually pose a problem in a Democratic primary.” It’s hard to imagine AI digressing to tell you that it thinks women who accuse men of sexual assault are often lying sluts. According to one respondent, “Hanania has to insert little flags about how unwoke he is in all his articles.” Someone else said that such an opinion wouldn’t get past safety filters. I almost thought of taking that part out, but then decided against it since I wasn’t trying to fool people but wanted to see how good AI is under real conditions.
Here’s one reader who I think homed in on what was important:
I’m leaning 2 because of Hanania-esque insights like “Populism is mainly an aesthetic phenomenon” and “Platner has also said insensitive things about women who make sexual assault allegations, statements that generally don’t bother me but would usually pose a problem in a Democratic primary.” 1 seems more AI because of statements like “Schumer is reduced to issuing supportive statements ten minutes after Mills withdraws, pretending he was on board the whole time. He wasn’t.” The “he wasn’t” is redundant and seems unusual to over-emphasize. All the paragraphs seem to end in a similar way, unusual from the way he normally writes. Also “This template wins because it is psychologically satisfying and because, in both cases, it is partially true.” I don’t think Hanania would ever say this sort of populist thought was partially true. Also, the em-dashes are a tell, as well as the incorrect use a semicolon.
One person who got the wrong answer wrote that “‘substitute math teacher having a stroke’ is too politically incorrect and funny to be written by an AI.” Well it was! That’s a good joke. I googled the phrase “math teacher having a stroke” and did not get a single result. Claude made up that phrase on its own and applied it to Bernie Sanders, and I think it fits very well. One of the best examples I’ve seen of AI being genuinely funny.
Below is the same analysis as above for the Abundance op-ed. First, the word cloud.
Unsurprisingly, not much useful here. But things get very interesting in the next chart.
People who got the answer correct focused on unions. I hate organized labor, always portray unions in a negative light, and point to them when I need an example of bad policy. CalMatters is another one. The ChatGPT op-ed cited that website twice in a pretty mechanical way, which people picked up on as unnatural.
For readers who got the question wrong, the four most overrepresented words were “cope,” “bukele,” “salvador,” and “prior.” These are terms that appear in the Claude version, which 38% of people chose. Here is the relevant text from that op-ed.
This matters because much of the discourse on the right has converged on a kind of soft authoritarianism as the answer to liberal democratic dysfunction. The argument goes that proceduralism is downstream of vetocracy, vetocracy is downstream of democratic accountability to mobilized minorities, and therefore you need a strongman to break things. I think this is mostly cope, and the El Salvador comparisons are usually made by people who couldn’t name three Salvadoran cabinet ministers.
But if abundance fails on its home turf, the people making these arguments deserve a more serious hearing than they currently get. It doesn’t follow that Bukele is the answer, or that we should import some half-understood version of Singapore.
Overall, 32 people mentioned the word “bukele” or “salvador.” Only 10 (31%) of those people got the question right, compared to 22 (69%) who got it wrong. You shouldn’t be surprised by Claude thinking that I would reference Bukele. It’s read my articles.
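One way to gauge whether 10 of 32 is unusual is to ask how often a group of 32 respondents would do that badly if they were as accurate as the sample overall (47% on this op-ed). The sketch below computes an exact one-sided binomial tail. Treating these commenters as independent draws at the overall rate is an assumption made purely for illustration, and `binom_tail_le` is a hypothetical helper name.

```python
from math import comb

def binom_tail_le(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), computed exactly."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

# 10 of the 32 respondents who mentioned "bukele" or "salvador" were correct;
# 0.47 is the overall Abundance accuracy reported earlier in the article.
tail = binom_tail_le(10, 32, 0.47)
print(f"P(10 or fewer correct out of 32) = {tail:.3f}")
```

A small tail probability would suggest that mentioning Bukele or El Salvador really did go along with being fooled, though with only 32 people the estimate is noisy.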
One thing I found entertaining here is the reference to “people who couldn’t name three Salvadoran cabinet ministers.” I don’t know a single Salvadoran cabinet minister! Here we see the AI version of me insulting the real person, as a blowhard who writes about things without knowing what he’s talking about. But I think it’s enough to look at the data and read about the crime crackdown; whether Bukele’s justice minister is named Jorge Ramos or Jesus Gonzalez isn’t all that important. I appreciate the spirit of the insult, but I think Claude is insulting people over the wrong thing.
Here’s one respondent who incorrectly guessed Claude was the human.
The dismissive attitude toward advocates for authoritarianism when he uses “cope,” or when he casts doubts about their level of familiarity with the model authoritarian systems they point to as examples, implicitly questioning their intelligence, is also characteristic of Richard’s style and consistent with his opinions about Conservatives and reactionaries.
Boomers Can’t Tell AI
Ideology has no consistent effect on accuracy. Just over 90% of respondents were male, among those who picked one of the two human sexes. On the first op-ed, the sexes were about equal in discernment: 67% of men got the answer right, and 64% of women. However, on the Abundance op-ed, 61% of women got the right answer, compared to 46% of men. While the female sample is small (n=44), the result is borderline significant (p = .06) in a chi-square test of independence, and slightly more significant in a logistic regression controlling for age (p = .03).
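For readers who want to see what a chi-square test of independence does with numbers like these, here is a minimal sketch on a 2x2 table of sex by correct/incorrect. The number of women (44) and both percentages come from the text; the number of men is not reported, so `men_n = 500` is a hypothetical placeholder, and the cell counts are rebuilt from rounded percentages. The function name is also an assumption.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test of independence for the table [[a, b], [c, d]].

    Returns (statistic, p_value) with 1 degree of freedom and no
    continuity correction.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    observed = [a, b, c, d]
    expected = [row1 * col1 / n, row1 * col2 / n,
                row2 * col1 / n, row2 * col2 / n]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # With 1 degree of freedom, the chi-square survival function
    # reduces to erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

women_n = 44                           # reported in the text
men_n = 500                            # hypothetical: not reported
women_correct = round(0.61 * women_n)  # rebuilt from a rounded percentage
men_correct = round(0.46 * men_n)      # rebuilt from a rounded percentage

stat, p = chi_square_2x2(women_correct, women_n - women_correct,
                         men_correct, men_n - men_correct)
print(f"chi2={stat:.2f}, p={p:.3f}")
```

The exact p-value depends on the true number of men and on whether a continuity correction is applied, so this sketch will not necessarily reproduce the figure quoted above.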
I suspect my female fans are an unusually smart (and stunning, brave, etc) group of women. Here is the age breakdown.
75% of people under 25 got the Platner op-ed correct, compared to 30% of those 55-64, and 17% of those 65 and up. Basically, people 55 and over, who had a sample size of 41, were doing worse than chance! On the Abundance op-ed, there’s a 21-point difference between <25 (43%) and 65+ (22%). But the results are a bit random in between. The best performing groups were 25-34 and 35-44, which both got it at the rate of 51%.
This led me to wonder whether Boomers know that they’re bad at this. I had Claude do a regression to see whether age predicts getting the answer right when confidence is controlled for. In a regression model including both pre- and post-answer confidence, the effect of age barely shrinks on the Platner op-ed (p < .001). See the charts below.
Among those 45 and over with the highest levels of pre-answer confidence (4 or 5) before they read the texts, only 40% got the right answer. Meanwhile, among those under 30 with the lowest confidence levels (1 or 2), 63% got the answer right. Here’s post-answer confidence, where the results are similar. Those 45 and over who felt good about their guess still did worse than younger people.
While confidence does decrease slightly with age, old people were worse at distinguishing the human-written Platner op-ed from the AI at every level of confidence.
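To make “controlling for confidence” concrete, the sketch below fits a logistic regression with both age and confidence as predictors. It runs on simulated respondents, not the survey data, and the true effect sizes are arbitrary assumptions chosen only so there is something to recover. The function names and the plain gradient-descent fit are likewise illustrative; this is not the analysis Claude ran.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.2, epochs=500):
    """Fit logistic regression by batch gradient descent.

    X is a list of feature rows (no intercept column).
    Returns [intercept, coef_1, ..., coef_k].
    """
    n, k = len(X), len(X[0])
    w = [0.0] * (k + 1)
    for _ in range(epochs):
        grad = [0.0] * (k + 1)
        for row, target in zip(X, y):
            z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], row))
            err = sigmoid(z) - target
            grad[0] += err
            for j, xj in enumerate(row):
                grad[j + 1] += err * xj
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return w

# Simulated respondents (assumption: NOT the survey data).
random.seed(0)
X, y = [], []
for _ in range(600):
    age = random.uniform(18, 75)
    conf = random.randint(1, 5)    # 1-5 confidence scale, as in the survey
    age_dec = (age - 40) / 10      # age in decades, centered at 40
    conf_c = conf - 3              # confidence centered at the midpoint
    # Arbitrary "true" effects: accuracy falls with age, rises with confidence.
    p_correct = sigmoid(0.5 - 0.4 * age_dec + 0.3 * conf_c)
    X.append([age_dec, conf_c])
    y.append(1 if random.random() < p_correct else 0)

intercept, b_age, b_conf = fit_logistic(X, y)
print(f"intercept={intercept:.2f}, age/decade={b_age:.2f}, confidence={b_conf:.2f}")
```

Because confidence sits in the model alongside age, the age coefficient describes how accuracy changes with age among people reporting the same confidence, which is the comparison the paragraphs above are making.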
For the Abundance op-ed, the effect of age disappears when we control for both pre- and post-answer confidence. Old people still do worse than young people, but the results don’t reach statistical significance (p = .19). I’m convinced this is a sample size issue, and I’m confident that we would’ve gotten the same result as the Platner op-ed for Abundance if more people took the survey. See the charts below.
Apparently, in the era of AI, there are a lot of delusional old people out there! They can’t tell fact from fiction, but in many cases have unwarranted confidence in their ability to do so. This seems like a very bad combination, and indicates that the elderly will be falling for a lot of scams over the next few decades, in the same way they fall for non-AI internet and telemarketing scams. My readers are probably more sophisticated than most. Then again, I did advertise this survey on X, so maybe some regular MAGAs ended up taking it.
Maybe the high ability of young people is a case for optimism. From now on, people will grow up in environments saturated with AI, and perhaps they will subconsciously pick up the ability to tell what’s real, in the same way people learn their native language.
I Am Not an AI, but Need AI to Prove It
What does this all tell us? The Abundance op-ed shows that Claude can already pretty much write my essays, at least enough to fool all but the most sophisticated readers. There was only an 8-point difference between the human version and Claude version. Note that I didn’t ask for Claude to give me five different versions and pick the best one, or edit the resulting piece to remove tells. I simply gave the instructions, and went with the first thing that popped out.
I’m testing my own readers here. Now imagine someone who isn’t yet known for writing. They have a choice between writing their own essays, or telling ChatGPT to do it. At this point, Claude is good enough that people can basically outsource their writing to it without most humans being able to tell that they did so.
Except for one thing. Before doing this experiment, I did not know there was a platform called Pangram that can detect AI writing.[1] I put all the texts into the program, and it got everything completely correct. It identified the four AI essays as 100% AI written, and the two human essays as 100% human written, all with high confidence.
So my fans can barely tell my writing from that of Claude, but another AI can. My individuality is still there. It just can’t be detected by other humans, only the machines. There’s a beautiful irony here. No, I can’t be replaced by AI. How do I know this? Because AI told me.
It’ll be interesting in the coming years to see what happens in the arms race between AI writing ability and AI plagiarism detection software. My guess is that AI writing ability has to win. What exactly rules out the possibility that it can write just as well as I can? There should come a point where it is indistinguishable.
Or maybe not? It feels like this article by David Oks is relevant here. He notes that AIs tend to have weird quirks. ChatGPT 5.1 was obsessed with goblins. It would look for reasons to introduce that word into texts that had nothing to do with it. Eventually, ChatGPT 5.5 came with instructions to “never talk about goblins, gremlins, raccoons, trolls, ogres, pigeons, or other animals or creatures unless it is absolutely and unambiguously relevant to the user’s query.” Such quirks emerge spontaneously. Pangram might simply be picking up on them, both in terms of vocabulary and syntax. Does Pangram only work until the next fix comes along? Or is there a deeper principle here that makes it accurate?
A lot actually depends on which side wins the arms race. If AI writing ability surpasses AI detection software, then every new writer will be suspected of not having written their own work. Those of us who already have reputations will be grandfathered in. Everyone already understands that I know how to write, so I’ve got nothing to prove. But in the past, writing ability was seen as a proxy for whether someone’s ideas were worth listening to, and that will be gone.
What will people rely on instead? Probably credentials, more so than they do now. Writing that is anonymous or by people who have not achieved some measure of success in the real world will always be treated with suspicion, making it difficult to begin a career as an author without already building a reputation for something else. At least among more knowledgeable audiences. You might be able to continually fool older and less sophisticated content consumers for a long time before nearly everyone realizes how good LLMs are at composing text.
If AI detection software wins the arms race, in contrast, it will become the norm for readers to put texts through it. People will know that there’s a way to distinguish non-human writing, so one can build a reputation as an author who produces compelling work, as long as it passes the relevant tests.
I asked ChatGPT and Claude, and they say that AI generation will probably win. The LLMs bring up some good points, like the fact that you can just edit texts produced by AI to remove the tells. For this experiment, I didn’t do this, but I surely could’ve fooled more people if I had, and maybe Pangram itself. As LLMs get better, the difference between AI and human writing approaches zero, which means AI detection becomes less possible even in theory.
It’s not yet impossible to become a new writer from out of nowhere. David Oks, who I just cited, is someone I recently discovered, and I always assumed his work is legitimate. But just to be sure, I dropped the first 650 words of his last article into Pangram, and it checks out. Nonetheless, given that my own fans are barely better than chance at telling my work from Claude’s, the days of original writing being taken for granted as the norm are almost certainly coming to an end.
I’m not too upset about this. As I said, I’m being grandfathered into this new era. I’m just happy to have been here to contribute to the formation of the machine God.
Technology will for the most part make us better able to discover, analyze, and express ideas, which is why I have come out in favor of all writers AI-maxxing. A downside here is the broken signal of writing ability indicating that a person is good at thinking. In order to get attention in the future, you’ll need proper credentials or a proven track record for people to listen to you. This serves to exclude some aspiring writers, but they’ll have plenty of other tools to make their voice heard, and the negative effects of the broken signal will be more than compensated for by other factors pushing in the direction of gains in human knowledge.
As for what we consider writing that is good or aesthetically pleasing, there will always be some signals that code text as machine written or somehow “nonhuman,” and such work will be stigmatized, even if the standards are arbitrary, like the idea that em dashes are bad. Liberals decided “people of color” was fine but “colored people” was offensive, and there’s no particularly good reason for this. Tastemakers will always look down on LLMs even more than they look down on regular Republicans.
Increasingly, women are getting plastic surgery, hormones, and other treatments, and looking better into middle age. Most of us don’t care. I think writing is going to be like that. The works of authors will be the products of some combination of natural talent and how well they use the tools available to them. In theory, we could live in a world of AI punditry where humans are completely out of the loop as producers, but there won’t be a demand for that, as reading an author is a (para)social experience. There will still be human writers for the same reason that humans will always have other jobs to do. As society becomes wealthier, we can better indulge our preference for interactions with creatures like ourselves. AI will do more and more of the thinking and writing, but the end result will be many more human thinkers and writers.
Thanks for reading. One thing I’ve learned is that when you have a book coming out, you can never assume that even regular readers are aware of it.
For that reason, over the next few months I’m not going to miss any opportunity to inform my audience that I have a new book called Kakistocracy: Why Populism Ends in Disaster coming out in July – details here. If you enjoy articles like this, appreciate me as a truly independent writer, and would like to support my work, the best way to do so is to preorder the book, which you can do at the links here to Amazon or Barnes & Noble. All preorders count toward opening day sales, and will help determine how much attention it receives.
I will be reading the audiobook, in case that makes it more appealing.
[1] A few people noted using Pangram, and I removed their responses from the analysis.

I was able to correctly identify the human written version of both, but I did this via filtering for syntax rather than by content.
This is more generalizable, obviously. I tend to glaze over LLM writing and skim once I identify it. But the tells I use to identify it (triplicates, inherent hedging) are not particularly durable. I'm curious if these tells would have been present if you crafted the prompts with specific examples of your written text rather than relying on what's already present in the weights.
My intuition is that your logic about how AI just helps already pre-formed ideas be expressed is iffy. Ezra Klein has been repeatedly making the point in his columns recently that an idea doesn't really exist independently from the medium (whether that be writing or verbal delivery in a speech) that was used to express it, which is why he has been growing slightly more AI skeptical. Spell check is productivity enhancement. This is more like genuine replacement. I think you might be conflating the latter with the former.
In fact I think the "doing worse than chance" thing is not a coincidence. Real Hanania will change over time because of his lived experience. AI Hanania never changes. If Real Hanania uses AI Hanania more and more, then Real Hanania will evolve less in his thinking, and part of why I read Real Hanania in the first place is the fact that his opinions change over time in response to his lived experience, which makes him interesting. If you start to use AI a lot, I think you will trap your own mind in amber, even if the prose is not bad and the ideas are coherent.
Search engines are in an interesting middle ground between glasses/spell-check and AI writing, though, because of how algorithms are used to decide what appears at the top. The gradual improvement of search engine optimization is like semi-AI in that way.