Can GPT-4 Replace Me Yet?

a hands-on inquiry

Mar 17, 2023

Like a lot of people, I’ve been pleasantly surprised by the power of ChatGPT with GPT-4. So far, I’ve used it as a research assistant—it’s been helpful in surveying scientific literature, planning a home gym, and picking a movie to watch. But I wondered: could it replace me? Can I finally stop writing, and take up duck hunting instead?

Given that I was widely published before its training data cutoff, I’d imagine that it has a pretty good sense of what my writing is like. So I thought I’d compare some of my work to its imitation of me.

I wrote a short review of the perfume Korrigan:

A lot of the most wonderful experiences hover right on the edge of too much—like eating strawberries until you’re almost sick on the grass, or feeling the first tingle of a sunburn, or staring into someone’s eyes until your soul redistributes itself. Korrigan also hovers on this edge. It’s composed of a plush pairing of nutty liqueur and incense that could become overbearing, but doesn’t, thanks to a paper-light structure that makes the whole thing feel like a plausibly deniable moment of innuendo. Totally fantastic, and the only true gourmand scent I’d wear on a hot summer day.

And then, I asked GPT-4 to write something similar, with some input about why I thought it was a good perfume. The result was:

Korrigan teeters daringly on the edge of over-indulgence, yet masterfully pirouettes back into sublime sophistication. Imagine indulging in a rich dessert, only to find a light, airy center that leaves you craving more. This fragrance's genius lies in the interplay of gossamer woody and earthy notes, lifting and balancing the creamy liqueur and caramel facets. Korrigan marries opulence with restraint, crafting a mesmerizing scent with a delicate, ethereal structure that entrances and lingers in the memory.

So this is scarily impressive, on one level: it definitely picks up on some tics I have, like my taste for multi-verb sandwiches (“lifting and balancing”) and alliteration (“sublime sophistication”). It also easily captures the properties I’d asked it to describe, and, since Korrigan isn’t a recent release, its features are part of the dataset, and it reports them with the correct level of detail.

On the other hand, it sounds awful.

There’s a part of prosecraft that I’d call something like “socializing your usage”: selecting your language such that you differentiate yourself from other voices in the surrounding cultural noise. GPT-4 is not good at this, out of the box. “Gossamer” is a corny word that stinks of Etsy. “Leaves you craving more” sounds like mediocre copy from a Toblerone ad or something. Overall, it sounds like a strikingly average brand of fancy language, which makes sense from a token predictor trained on the whole internet.

I thought I’d give it another few tries.

Here’s me, writing about my social skills:

I have social skills, at times, but not an infinite surplus. My nightly budget varies depending on emotional, atmospheric, and digestive factors. On some evenings, I have zero issues; at a party, I compliment the hosts, offer opinions that harmlessly advance the evening, and produce hand gestures that don’t seem frenzied or dangerous. But sometimes, I spontaneously forget everything I know about other human beings, and I offend people without any motive, remove the elegance from ongoing conversations, or just stare, like a dirty stuffed animal dropped into adult life.

And GPT-4, as me:

Some nights, I'm a social virtuoso, effortlessly navigating conversations with the grace of a well-rehearsed pianist. My confidence swells, and I feel unstoppable. But on other, less auspicious evenings, I'm a puppet of my own awkwardness, stumbling over words, and painfully aware of my fumbling ineptitude. It's in these moments that I confront the confounding duality of my social existence, forever uncertain of which persona will take the spotlight.

Same issue: “my confidence swells, and I feel unstoppable” sounds like it’s from a sports memoir. And “I’m a social virtuoso” is too straightforwardly arrogant. But it’s not that bad, and “puppet of my own awkwardness” isn’t too far off “dirty stuffed animal.” The cadence is close to mine, too.

Another thing I notice, though: it’s pretty one-note. When I ask it to write like me, it focuses exclusively on a couple of features of my writing voice—highfalutin vocab, overuse of metaphor. But it doesn’t mix up the register of speech, which is something I do a lot, intentionally. If you wanted to imitate my writing voice, I think the simplest heuristic would be “alternate fancy sentences with straightforward talky sentences,” and this is not something GPT-4 picks up on.

This same tendency became more noticeable when I asked it to write a longer passage:

Some evenings I am Prometheus, deftly stealing the fire of charisma from the gods themselves, and igniting the social sphere with a blaze of eloquence. My tongue dances to a rhythm that feels both divine and familiar, allowing me to effortlessly charm those around me. In these moments, I revel in the warm embrace of human connection, my confidence soaring as I weave a tapestry of shared experiences and laughter.
Yet, there are times when I find myself shipwrecked on the jagged shores of social ineptitude, flailing helplessly in a sea of awkwardness. It's as though I've metamorphosed into an alien lifeform, unversed in the language and customs of this world, struggling to decipher even the most basic conversational cues. I stutter, I mumble, and I stumble over my words like a fledgling attempting its first flight.

It’s cogent, but one-dimensional. This one-dimensional quality can also be found in the literary parodies linked here:

Brooks Otterlake @i_zzzzzz

I am not really comfortable with having future shock every two or three months

They’re all incredibly impressive, but also, all sort of cartoonish—it’s like a McSweeney’s literary satire. Plath is a great writer because her characteristic melancholy is sweetened by other ingredients, not because she’s all goth all the time.

I asked it to make the above passage a bit simpler, and alternate fancy language with conversational language, and I got something arguably worse.

Some nights, I'm a social superstar, confidently navigating conversations and feeling like the life of the party. My words flow with ease, and I connect with others effortlessly. In these moments, I feel like I can take on the world, my confidence soaring on the wings of witty banter and shared laughter.

Ugh. More wrestling with it, along the lines of, “don’t be 100% like Sasha, mix it up stylistically,” didn’t help. It also apparently didn’t understand what “avoid cliches” meant.

So. I will continue to use it in place of search, and I can’t overstate how impressed I am with its ability in that capacity. It’s also spectacular as an improved dictionary and thesaurus. But as a writing collaborator, it isn’t quite there yet.

I wonder how this can be fixed. I don’t have anything against using an AI collaborator to speed up my work; as long as people like what I produce, and I’m having fun making it, some degree of automation is totally okay with me, and it would help in certain moments—I’m usually too lazy to write long book reviews, but I’d produce more of them if I could outsource paragraphs of summary material, such that I could focus on the more interesting parts, like irrelevant disclosures of personal information.

Maybe I just need to train a model on my work in particular. But I wonder if more than that is needed.

I don’t know how, exactly, one could efficiently capture tacit data about how people react to language—how different words feel coming from different sources, how the phrase “in this difficult time” lands differently when it’s uttered by a Brooklyn e-girl as opposed to Kaiser Permanente. This is the sort of world-modeling that could make LLMs into scarily good writers in a large variety of applications.

Also, to nail long-form completely, LLMs will have to learn something very tricky: how to use tonal variation as an element of structure. A recent essay by Joe Carlsmith is a lovely example of this level of craft. Carlsmith is often extremely technical and precise, but, in the closing paragraphs, he creates a feeling of intimacy by suddenly becoming almost surreally simple and allusive, like a tenderer Hemingway: “in so many ways, we are, indeed, as children. Barely not rocks. We barely have eyes, I suspect, relative to what it is possible; we never see near to whole; and what light can we see is almost too much, too bright.” This kind of multi-mannered fluency is real sophistication, and I haven’t seen anything like it from an LLM—yet. It will probably happen. In how long, I have no idea, your guess is as good as mine.

Given that GPT-4 knows my style, I wondered if it had any criticisms of my work that could help me improve. I was really impressed with what it came up with.

Sasha Chapin, a talented writer with a flair for storytelling, has much to offer in terms of engaging content. Nonetheless, there are areas in which his work could potentially benefit from improvement. The following are three specific criticisms that pertain to Chapin's writing:
1. Overuse of colloquial language: Although Chapin's conversational tone can be both entertaining and relatable, it occasionally strays into an overuse of colloquial language. This can detract from the clarity and impact of his writing. By striking a balance between a conversational tone and more formal language, Chapin could ensure that his writing retains its relatability while also maintaining a polished, professional appearance.
2. Lack of structure and organization: In some of Chapin's work, the narrative can become meandering or disjointed, which may lead to reader disengagement. By focusing on the logical flow of ideas and implementing a clear structure in his pieces, Chapin could improve the overall coherence and effectiveness of his writing. Utilizing techniques such as clear topic sentences, smooth transitions, and concise paragraphs could help guide readers through the narrative more easily.
3. Depth of analysis: While Chapin's writing often presents intriguing ideas and anecdotes, there are instances where the analysis of these concepts could be explored more thoroughly. By delving deeper into the topics he covers and providing more context or supporting evidence, Chapin could create a more robust and thought-provoking reading experience. Additionally, considering alternative viewpoints and addressing potential counterarguments can help to present a more comprehensive and balanced perspective on the subject matter.

These are really solid suggestions. I do think that, at times, I go for this flippant man-on-a-barstool thing that doesn’t totally match the subject matter, and occasionally, I use likable, plucky digressions as a way to cover up sogginess of structure.

And it’s certainly true that the analysis around here is not super deep. I’m more of a rhetorician and an entertainer than an intellectual. That is a strength and a drawback at the same time. If Gwern were writing this, he might follow the preceding paragraphs with 5,000 words of masterful technical analysis, which a select few readers would follow with rapt interest. I, though, will instead end with a sentence that implies—perhaps honestly, and perhaps not—that I have more to say, but wouldn’t want to bore you with it at the moment.

10 Comments

Spencer Orenstein Lequerica

The Brown Barge

Mar 17, 2023 · Liked by Sasha Chapin

I too have found GPT-4 very helpful and often proficient in surprising ways, but then been utterly disappointed at how rote it is with any creative task. My experience seems to be backed up by the fact that it aced virtually every standardized test, except AP English where it performed abysmally. https://twitter.com/rickyflows/status/1635693600272162817/photo/1

Nat Eliason had an interesting article on good writing having a hallucinatory aspect--where the reader gets into a flow state that seems almost as if they are writing the words as they read them. https://blog.nateliason.com/p/great-writing-invisible. GPT-4 seems awful at this. It is just so obvious in every moment that this is an LLM doing a very good job of rapidly responding to queries.

Also, please don't take GPT-4's second criticism of your writing. Honestly if you had "clear topic sentences, smooth transitions, and concise paragraphs" I very much doubt I would be interested in your writing. I, and I suspect many others, enjoy the fact that there is an ineffable nature to how you write. I don't expect or want you to efficiently take me through a very polished argument; I want to be entertained, sometimes confused, and ultimately inspired that another human I have never met has found a way to dump some of their mindjuice into my head.


coscorrodrift

Mar 25, 2023 · Liked by Sasha Chapin

haha i miss u on twitter btw

good post, i like the approach of taste

i do wonder on what do different-tasted people (or tasteless people) think, if they'd be able to, A/B tested, notice the difference between sasha and GPT. I think I could, but I'm also not 100% confident, the bot could probably fool me sometimes


Sasha's 'Newsletter'
