28 Comments
Emiliano

I was able to correctly identify the human-written version of both, but I did this by filtering for syntax rather than content.

This is more generalizable, obviously. I tend to glaze over LLM writing and skim once I identify it. But the tells I use to identify it - triplicates, inherent hedging - aren't particularly durable. I'm curious whether these tells would have been present if you had crafted the prompts with specific examples of your own writing rather than relying on what's already present in the weights.

Dylan Black

Exactly what I did as well. I think the survey results undersell the conclusions; this test was easy. If you strip out the tropes and de-sanitize the writing, I could not have differentiated them.

B.P. Majors

I've had "AI detection" programs identify my original writing as written by AI.

Should I ask my parents if I am really theirs?

Michael

My intuition is that your logic about AI merely helping to express already-formed ideas is iffy. Ezra Klein has repeatedly made the point in his recent columns that an idea doesn't really exist independently of the medium used to express it (whether writing or verbal delivery in a speech), which is why he has been growing slightly more AI-skeptical. Spell check is productivity enhancement. This is more like genuine replacement. I think you might be conflating the latter with the former.

In fact, I think the "doing worse than chance" thing is not a coincidence. Real Hanania will change over time because of his lived experience. AI Hanania never changes. If Real Hanania uses AI Hanania more and more, then Real Hanania will evolve less in his thinking, and part of why I read Real Hanania in the first place is the fact that his opinions change over time in response to his lived experience, which makes him interesting. If you start to use AI a lot, I think you will trap your own mind in amber, even if the prose is not bad and the ideas are coherent.

Search engines are in an interesting middle ground between glasses/spell-check and AI writing, though, because of how algorithms are used to decide what appears at the top. The gradual improvement of search engine optimization is like semi-AI in that way.

Arif · 17h · Edited

“I suspect my female fans are an unusually smart (and stunning, brave etc.) group of women” got a good chuckle out of me.

Andy Iverson

In contrast to some of the other comments here by people who are feeling superior, allow me to admit that I got both wrong, and I was especially wrong on the first one, which most people got right. This makes me wonder if I am actually kind of bad at reading.

fox · 10h · Edited

I got both right on the survey and expressed confusion about what the prompt was for the abundance essay, because the arguments felt weird and disjointed. Now that I've seen the prompts it makes much more sense, because you told the AIs to add in additional filler theories. The AI responses definitely lost some coherence beyond the bullet points in the prompt, and that threw me a little when I was trying to identify the human version, because I assumed they were all supposed to cover the same points.

Also, I'm not sure how much I would read into this. If your typical output were the quality of any of the abundance essays, I don't think you would have become a popular writer in the first place.

Scott Sumner

It is important not to read too much into the sort of Turing tests on painting done by Scott Alexander. It has long been known that humans could imitate the work of great artists in ways that were difficult for experts to distinguish. I have no artistic talent at all, and yet I might be able to produce a Mondrian painting that would fool the experts. But that fact would not make me anywhere near as talented an artist as Mondrian, just as the fact that I might be able to build a light bulb in my basement would not make me as talented an inventor as Edison.

It seems very possible that AIs might eventually be able to produce great art. But we are not there yet. If and when they do produce great art, I believe they will have consciousness and be deserving of "human rights".

Eric R. Ward

It would be cool to try the next level up: tee up a situation without providing such a specific prompt, ask Claude to perform a "Hanania-like analysis" of it, and have it write a short resulting essay.

Reader

I guessed both right but had no confidence I was right haha

Shockwell

I think humans will naturally get better at detecting AI writing as the tells become more widely understood. I went in with only a vague idea of what to look for ("it's not X, it's Y", repetitive structure, etc.). But afterward I did a little research, and I think I would now be able to identify the AI writing more quickly and with higher confidence. Even the best of the AI articles here (Abundance Text 1) is IMO pretty obvious if you know which specific elements to look for. I find the most consistent giveaway to be the repetitive paragraph construction that reeks of English 1. Current AI models are good at following rules, but human writers quickly develop a personal style that treats generic guidelines as a foundation rather than dogma. AI isn't so good at that yet.

Unless and until we hit AGI I'm not sure the tells will change all that much, though they might become more subtle. It's probably going to be pretty hard to make that leap from competent to professional since it involves breaking rules rather than following them, and I don't know if it's going to be a priority moving forward. AI prose is already plenty good enough for most people's needs and I'm not sure there's enough demand for a Pynchon-bot or whatever.

Chastity

> It’ll be interesting in the coming years to see what happens in the arms race between AI writing ability and AI plagiarism detection software. My guess is that AI writing ability has to win. What exactly rules out the possibility that it can write just as well as I can? There should come a point where it is indistinguishable.

I did listen to the Odd Lots episode where they talk to a Pangram person, and it's not about the quality of the writing.

Rather, imagine you want to say something like, "I think Russia will invade Ukraine, with a high degree of confidence, because countries don't mass troops on other countries' borders for fun." There are lots of words you could use to express this thought, and lots of ways to order the sentence. You could break it into two sentences, or maybe three. You could use words like "generally"; you could get poetic. Even that phrasing is particular to me: "for fun" could be replaced by "without the intent to use them". And so on, and so forth. Almost every word is picked out of a possibility space of thousands. Each AI tends to follow a particular path when putting together its sentences, and it does that for every single sentence. Any one sentence might be mistaken for human writing, but over a whole piece, the odds that a human "just so happens" to be writing like Gemini or Claude or ChatGPT become vanishingly small. Pangram was also trained in part on synthetic data: find a human five-star review of Denny's that's 78 words long, ask the AI to write a 78-word five-star review of Denny's in the style of the first one, then use those pairs to train Pangram to recognize the A/B differences.

This also suggests it will be hard to get an AI model that can get around it. The fact that the LLM *isn't* just pulling words at random out of a hat is why its quality is high relative to a Markov chain. Based on how Pangram works, it seems like only a private model (e.g. a personal fine-tuned model with a large corpus) would be able to get around it. Or, I suppose, the big model creators could decide to introduce features entirely intended to get around Pangram and the like (for example, randomly using lower-probability tokens), but literally the only reason to do that is to help people cheat on tests.

luke

"Another actually used unhinged to disqualify Claude, but then picked ChatGPT"

Fuck, this was me. I specifically wrote about this in my answer but still got it wrong anyway.

Evan

I'm shocked scores were so low, I thought the test was very easy.

JP · 11h · Edited

I was one of the ones who pointed out Pangram (I got the Platner one correct before checking with Pangram after submitting, and submitted a second version where I mentioned Pangram). One dynamic I wonder about is how people’s individual styles might change as a result of ubiquitous AI writing. My guess: many will converge, while some will intentionally diverge.

That is, most will not think about it, and unconsciously begin to incorporate various quirks and stylistic choices into their own writing in the same way we often model our friends’ quirks. This may already be starting to happen at a conceptual level, as has been pointed out: People around the globe may be beginning to think more similarly to Americans because of chatbots, which are overwhelmingly trained on American data.

On the other hand, some will intentionally adopt a very different style to stand apart, as Picasso and the cubists did in response to photography. Obviously, this is complicated by LLMs' ability to adopt any specified style, but that simply opens the door to increasingly creative solutions.

In any case, it’ll be an interesting dynamic to follow.

David Spies

Ooh! Ooh! Do I get to say "I told you so"?

In this comment I said "By this time next year" and that was only seven months ago:

https://substack.com/@dspyz/note/c-165629769?r=r8oz

stealthbomber10

> In order to get attention in the future, you’ll need proper credentials or a proven track record for people to listen to you. This serves to exclude some aspiring writers, but they’ll have plenty of other tools to make their voice heard

Other tools like what?

Daniel Greco

Good point. Maybe I should put it like this. I wouldn't be surprised if you could fine-tune a context so that even the things you pointed to here (the stuff about unions, or sexual assault, or "unhinged") wouldn't be as telling. But that would involve more back and forth. For example, imagine you uploaded this post to a new instance of Claude, emphasizing the tells you mentioned here, and then asked it for a new essay with a new prompt. Don't you think it might do even better than an instance with less context?

Richard Hanania

I do! I think I say that in the article: "There was only an 8-point difference between the human version and Claude version. Note that I didn’t ask for Claude to give me five different versions and pick the best one, or edit the resulting piece to remove tells. I simply gave the instructions, and went with the first thing that popped out." That's why it's over for our ability to distinguish human and AI work.