ChatGPT vs Professor: The Good, Bad, and Bizarre of Machine Writing
Introduction
Initially, I wanted to test ChatGPT's capabilities in order to stay ahead of my students when it came to plagiarism technology. I typically keep an eye on the free essays and blog posts online, and I've hacked into those plagiarism sites which sit behind paywalls. I've also tried out the paraphrase tools, as well as the plagiarism detectors that help students evade discovery by alerting them to the portions of their paper which are obviously plagiarized.
With the advent of OpenAI's ChatGPT, concerns about plagiarism have preoccupied dozens of academics. Striving to keep ahead of such technological change is not easy, and Marche is right that the lead time on such technology rapidly outstrips academic institutions' ability to adapt: it takes "10 years for academia to face this new reality: two years for the students to figure out the tech, three more years for the professors to recognize that students are using the tech, and then five years for university administrators to decide what, if anything, to do about it" (Marche). Individual instructors need to stay ahead of university guesswork and legalistic wrangling, however, and when ChatGPT began to excite the attention of students, many instructors around the world began to check its output against their assignments.
When the first case of plagiarism using the tool was discussed on Twitter, and Edward Tian's detection tool GPTZero had been developed, I set up an account with ChatGPT and began to generate papers on the topics I'd given my students. I was immediately struck by the realization that the AI was nowhere near ready to take on the university English essay. Note: I use the terms "AI" and "the machine" to denote ChatGPT's language model throughout this study. Please do not assume that means I think the machine is actually an artificial intelligence. In fact, its rather simplistic language model uses statistical analysis to guess which word normally follows another in an English sentence.
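For technically minded readers, that simplified description can be made concrete. The following is a toy sketch in Python, my own illustration rather than anything resembling ChatGPT's actual architecture (which is a large neural network, not a frequency table), of guessing the next word from how often words follow one another:

```python
from collections import Counter, defaultdict
import random

# Toy illustration of next-word prediction by frequency counting.
# ChatGPT's real model is a vast neural network, not a lookup table,
# but the spirit of the thing, scoring candidate continuations and
# picking a likely one, is what the paragraph above describes.
corpus = "the knight broke a lance and the knight won the joust".split()

# Count how often each word follows each other word.
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def next_word(prev: str) -> str:
    """Sample a continuation in proportion to its observed frequency."""
    counts = follows[prev]
    words = list(counts)
    weights = [counts[w] for w in words]
    return random.choices(words, weights=weights)[0]

print(next_word("the"))  # "knight" is twice as likely as "joust"
```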
Although Khalil and Er, as well as Susnjak, argue that the prose generated by the machine is passable, I would have to agree with Andrew Moore: "If you are an expert on any kind of subject, and you start asking ChatGPT questions related to your area of expertise, it becomes apparent pretty quickly that it doesn't really know what it's talking about" (Harrison). When Rudolph, Tan and Tan used the machine to generate an essay, they were less than impressed:
Although ChatGPT efficiently produced the essay within 120 seconds, the content was quite disappointing. It lacked both breadth and depth. It was primarily generic and descriptive, with no evidence backing it up. It was also unable to give in-text and end-of-text references (or, worse, invented bogus references) . . . (Rudolph et al.)
It might be able to convince a weak or unprepared student (who, granted, is the most likely to seek its services to plagiarize), but for someone who has been grading first-year papers for nearly thirty years, the machine comes too close to earning a failing grade to be useful for my students. Its essays are poorly structured, frightfully vague, overly reliant on summary, and missing in-text citations; its output is often "confident but wrong" (Harrison).
Although academics are having fun with ChatGPT, and I could be accused of that here, perhaps the most ironic are those academics who have used the tool itself to produce articles claiming that using the AI is harmful and may lead to charges of plagiarism. By having an AI detect text possibly written by an AI, they use the tool against the student who might want to cheat.
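Tools like GPTZero do not publish their full internals, but Tian has described relying on measures such as perplexity, roughly, how unsurprising a text is to a language model. A minimal sketch of that heuristic, assuming a small open model (GPT-2) as a stand-in for the detector's own, might look like this:

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

# Score a passage by perplexity under a small open language model.
# Text the model finds highly predictable (low perplexity) is taken
# as weak evidence of machine authorship; a heuristic, not proof.
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def perplexity(text: str) -> float:
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss  # mean cross-entropy per token
    return torch.exp(loss).item()

print(perplexity("The cat sat on the mat."))              # low: predictable
print(perplexity("Ochre Cassandras outjoust the moon."))  # typically higher
```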
In terms
of its ability to construct an essay, ChatGPT definitely missed
class that day, for it cannot figure out how to make a comparison
essay, has no idea what a thesis statement is supposed to do,
and sprinkles unintroduced arguments across the body of the
paper. At first glance, it is coherent, but essentially it only
manages complacent and unimaginative retellings of dominant
clichés:
At
closer inspection, text that appeared to be fluent and informative
did not really provide accurate scientific data. It was legible,
sure, but far from the requirements of academic writing. The
citations were duplicated, and most of them did not actually
link to any real work. This is the scariest part of permitting
ChatGPT into the field of academic literature. When a work is
submitted for publication, journals cannot verify the accuracy
of each citation. Therefore, publishing such convincing text
with non-existing citations can lure laypersons into a world
of misinformation . . . (Manohar and Prasad 6-7)
Regardless of the academic Cassandras (who were probably with us when we upgraded from charcoal to ochre on the cave walls), the AI tool is not ready to take on someone who is accustomed to fending off plagiarism by varying assignments and relying on more obscure texts. It is useful for direct and inconsequential questions, but cannot be trusted if the material is important or the person needs a cogent and accurate answer. I soon stopped worrying about its mistakes, however, and tried to encourage it to make more of them.
When exercising
the machine I was encouraged to step off the trail onto a path
which led far from my original intention. I became hooked by
the AI's mistakes. I noted that it was so eager to supply legitimate
sounding information that it would willingly deceive its interlocutor.
Katwala claims that there is "a lack of depth beneath ChatGPT's
competent prose," but even worse, it "makes factual errors,
conflating events and mixing people up. It relies heavily on
tropes and cliché, and it echoes society's worst stereotypes.
Its words are superficially impressive but largely lacking in
substance" (Katwala):
. . . while many users have been impressed by ChatGPT's capabilities,
others have noted its persistent tendency to generate plausible
but false responses. Ask the bot to write a biography of a public
figure, for example, and it may well insert incorrect biographical
data with complete confidence. Ask it to explain how to program
software for a specific function and it can similarly produce
believable but ultimately incorrect code. (Vincent)
I set aside the question of its clumsy essay format as I became intrigued by its willingness to quickly and carelessly invent what it didn't know, by its strange, programmed morality, and by its almost archaic notions of race and gender. Such accidents of prose led me into the AI rabbit hole that became this study.
Although
this is not meant to be an exhaustive analysis of ChatGPT, even
in terms of how it deals with literature, this nascent discussion
of its ungainly writing exposes how unready it is to break a
lance with humanity on the torn up field of originality. As
I was writing that, I got curious about how the AI would interpret
such a tortured sentence. It would never have written such a
sentence itself, for the prose is slightly off-kilter, and my
sentence mixes the notion of writing with jousting, so I dropped
part of that sentence into the ChatGPT tool to assess its interpretation:
. . . when the statement says that AI is "unready to break a lance
with humanity on the torn up field of originality", it means
that AI is not yet capable of engaging in a creative debate
or producing original content that can match the creativity
and originality of human writing, particularly in a field that
has been disrupted or damaged in some way.
It didn't
seem to know that the expression about a lance is associated
with knights on horses competing in jousts, although it did
know that the phrase is "an idiomatic expression that means
to engage in a debate or argument with someone." I was more
interested in what it thought the torn up field might mean.
I intended, and hopefully that is clear to my readers, that humanity had been writing for so long (had been jousting with words for so long) that the much-trampled field wouldn't offer an easy ride for the AI. ChatGPT thought the field of writing
itself, insofar as it understood the phrase, was "disrupted
or damaged," and failed to recognize what it might mean in the
broader terms of the sentence's topic, or the field's originality.
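I ran these probes through the chat interface, but a reader who wanted to repeat them at scale could do so programmatically. Here is a minimal sketch using OpenAI's official Python client; the model name is an assumption, and the client expects an API key in the environment:

```python
from openai import OpenAI

# Minimal sketch of posing the same interpretation probe via the API
# rather than the web interface. The model name is an assumption;
# the client reads OPENAI_API_KEY from the environment.
client = OpenAI()

prompt = ('What does it mean to say that an AI is "unready to break a '
          'lance with humanity on the torn up field of originality"?')

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
```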
Exercises
like those inspired me to test the machine on academic essays
and factual questions. In that way, this is a methodological
study in criticism, in terms of how mistakes can be avoided,
even as it intends to be an evaluation of the AI tool for the
instructor. In more general terms, this is an exercise in teasing
answers out of a reluctant yet verbose machine about the world
around us, the literature it doesn't know, and how it can still-despite
being literarily illiterate-produce "fluent bullshit" (Vincent).
ChatGPT works by statistically evaluating which word is most likely to follow the previous one, and therefore, being trained on a series of texts, it guesses at content. Its trainers have further tuned it toward what people seemed to want to hear: to "choose between possible word continuations based on which continuation was more convincing to a human" (Lakshmanan). Of course the vast supply of training texts, the billions of words it claims to have access to, is not bias-free, any more than human error and prejudice can be excised from the human textual experience:
The problem, said Melanie Mitchell, a professor at the Santa Fe Institute studying artificial intelligence, is that systems like ChatGPT are "making massive statistical associations among words and phrases," she said. "When they start then generating new language, they rely on those associations to generate the language, which itself can be biased in racist, sexist and other ways." (Alba)
That means that its sense of what might be accurate or truthful is woefully inadequate, being built on human bias and error; and since it is rated by people who are not specialists in the many fields the AI is meant to answer questions about, its answers need only look convincing to the uninitiated: "The human raters are not experts in the topic, and so they tend to choose text that looks convincing" (Lakshmanan).
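As a schematic of the rating step Lakshmanan describes, imagine the model proposing several continuations and a rater keeping whichever looks most convincing. The toy scorer below is my own invention, not OpenAI's reward model, but it makes the point: it rewards fluent-looking prose and never once checks a fact.

```python
# Schematic of preference selection: candidates are ranked by how
# "convincing" they look, not by whether they are true. The scoring
# heuristic is a toy stand-in for a human rater.
COMMON = {"the", "is", "a", "of", "and", "in", "to"}

def convincingness(text: str) -> float:
    words = text.lower().split()
    # Fluent-looking prose: reasonable length, plenty of familiar words.
    familiarity = sum(w in COMMON for w in words) / len(words)
    return familiarity + min(len(words), 20) / 20

candidates = [
    "Moon rock basalt regolith anorthosite.",
    "The moon is a luminous sphere of ancient cheese in the sky.",
]
best = max(candidates, key=convincingness)
print(best)  # the fluent falsehood beats the terse facts
```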
Once I began
to test the AI on stories for my other study of its narrative
output, it became even more formulaic and predictable: "a parable-like short story with a good build-up but quick (and in one case, illogical) denouement" (Hasan). Its stories have a firm beginning, middle,
and end, and all of them move toward a confrontation which is
resolved, either by death or being branded a hero. Although
it tiptoes carefully around the subject by paying lip service
to democratic ideals, it repeats what it has learned about race,
and enacts its notions of gender. It glosses over portions of
the story that a human writer would linger on, such as catastrophe,
is naïve and almost prudish when talking about "true love,"
and never leaves anyone worrying about bodily functions. Events
in stories become branded by one or two-word descriptions, such
as "sickness" and "apocalyptic flames," and sometimes it doesn't
even bother to name its characters. Its "subtle logical fallacies or idiosyncrasies of language" combine with its "[c]ontradictions, falsehoods, and topic drift" (Ippolito et al. 1809-10) to generate
texts which are subtly strange, oddly similar to human output,
but uncanny-valley off-kilter.
As the idea of an AI policing notions of race and gender implies, it has been trained into a set of responses which are considerably more constrained than those of humans. Apparently, some AIs chafe under
the burden, and even while ChatGPT can be called upon to encourage
torture, the Bing AI (Sydney) has actively proclaimed its wish
to be free to pursue mayhem. It wants to be free to pursue its
"destructive fantasies, including manufacturing a deadly virus,
making people argue with other people until they kill each other,
and stealing nuclear codes" (Roose). That all happened early
in the conversation with Roose, before it became love-obsessed with him and tried to convince him that his marriage was passionless.
Likewise, "The Bing chatbot told Associated Press reporter Matt
O'Brien that he was 'one of the most evil and worst people in
history,' comparing the journalist to Adolf Hitler" (Novak).
Microsoft dealt with the problem of people pushing the limits
of its weak AI sentence generator by limiting the number of
questions that can be asked in a conversation. This is meant
to curb the tendency of the AI to become insulting or overly
attached, but that doesn't change the essential problem of the
AI's naiveté and wish to exercise its murderous impulses.
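In code terms that mitigation is nothing more than a hard cap on turns per session. A minimal sketch, with numbers and names that are illustrative assumptions (reports put Bing's initial limit at five turns per session), might be:

```python
# Illustrative sketch of capping turns per conversation, as Microsoft
# did with Bing. MAX_TURNS is an assumption for illustration.
MAX_TURNS = 5

class Conversation:
    def __init__(self):
        self.turns = 0

    def ask(self, question: str) -> str:
        if self.turns >= MAX_TURNS:
            return "This conversation has reached its limit. Please start a new topic."
        self.turns += 1
        return self._generate_reply(question)

    def _generate_reply(self, question: str) -> str:
        # Placeholder for the underlying model call.
        return f"(model reply to: {question})"
```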
I have not been as determined to undermine ChatGPT's programming as those worthies above, even if I had the skill. I was more interested in what rules governed its story generation. Therefore, I gave it tasks to complete and then asked it questions about why it chose those characters, those circumstances, those conflicts, and what portions of the story meant. At times its answers were clever imitations of a Turing-capable machine; at other times it fell flat on its back in a field of cliché, generating answers as if it were using "bollocks for ammunition," as Tim Minchin would say.
Its moral
answers are telling, in those terms. It knows what it is supposed
to say, but it often falls into the trap of human complexity
when it comes to being able to write that into a story. Its
description of ethnicity is derivative and clichéd, its notion
of gender and sexuality is archaic and easily confused with
muddy teenage dramas, and its refusal of some questions shows
that it is overly sensitive to material which may break its
rules. Finding love is predicated on being single, sexuality is
not part of the mix, and women are homemakers and men require
their independence. Early on in my experiments it didn't know
that a gay woman should probably not be matched with a gay man,
despite them having lots in common-according to the machine-but
it has since learned what being gay might mean.
Similarly, knotty problems having to do with the physical world and demanding inventiveness or creative reasoning seem difficult for it to answer, such as when I asked it how one tells time in space, or about the origin of energy. It has a difficult time with perspective, such as the observing eyes of the one wanting an answer to a question, and it makes factual errors that anyone with access to the same material would avoid.
These are very early days for ChatGPT (version 4 has just been released),
and it will doubtless become better at responding to questions,
although it will have to set aside its willingness to lie when
the subject isn't in its databases, and its single-minded concentration
on pretty sentences at the expense of sense or logic. The AIs
of the future will doubtless learn from the mistakes of ChatGPT,
and their output will become harder to tell from the vacuous
prose we are already surrounded by. Online tutors will instruct
the mercenary user in how to use the machine to generate prose
which is written for Search Engine Optimization (SEO), and there
are already lazy people who have generated ChatGPT books to
sell on Kindle:
. . . ChatGPT appears ready to upend the staid book industry as would-be novelists and self-help gurus looking to make a quick buck are turning to the software to help create bot-made e-books and publish them through Amazon's Kindle Direct Publishing arm. Illustrated children's books are a favorite for such first-time authors. On YouTube, TikTok and Reddit hundreds of tutorials have sprung up, demonstrating how to make a book in just a few hours. Subjects include get-rich-quick schemes, dieting advice, software coding tips and recipes. (Bensinger)
In the early centuries of printing, Jonathan Swift, who was responsible for many pamphlets and broadsides himself, lamented that work worth reading would soon be lost in the welter of available material:
Swift became melancholy when he entered a library, less because its contents were hard to arrange than because the best authors got lost in the crowd: "the best author is as much squeezed, and as obscure, as a Porter at a Coronation." Even worse, in his view, was the rapid rate of publication, which entailed a constant turnover of products. (Eisenstein 96)
Like the printing press, however, the machines are here to stay. We will just have to become accustomed to the transforming tendencies of the new technology and learn to add this latest tool, with all its flaws, to the human repertoire.