ChatGPT vs Professor: The Good, Bad, and Bizarre of Machine Writing
Introduction
Initially, I wanted to test ChatGPT's capabilities in order to stay ahead of my students when it came to plagiarism technology. I typically keep an eye on the free essays and blog posts online, and I've hacked into those plagiarism sites which sit behind paywalls. I've also tried out the paraphrase tools, as well as the plagiarism detectors that help students evade discovery by alerting them to the portions of their paper which are obviously plagiarized.
With the advent of OpenAI's ChatGPT, concerns about plagiarism have preoccupied dozens of academics. Striving to keep ahead of such technological change is not easy, and Marche is right that the lead time on such technology rapidly outstrips academic institutions' ability to adapt: it takes "10 years for academia to face this new reality: two years for the students to figure out the tech, three more years for the professors to recognize that students are using the tech, and then five years for university administrators to decide what, if anything, to do about it" (Marche). Individual instructors need to stay ahead of university guesswork and legalistic wrangling, however, and when ChatGPT began to excite the attention of students, many instructors around the world began to check its output against their assignments.
When the first case of plagiarism using the tool was discussed on Twitter, and Edward Tian's detection tool GPTZero had been developed, I set up an account with ChatGPT and began to generate papers on the topics I'd given my students. I was immediately struck by the realization that the AI was nowhere near ready to take on the university English essay. Note: I use the terms "AI" and "the machine" to denote ChatGPT's language model throughout this study. Please do not assume that means I think the machine is actually an artificial intelligence. In fact, its rather simplistic language model uses statistical analysis to guess which word normally follows another in an English sentence.
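For technically minded readers, that simplified description can be made concrete. The following is a toy sketch in Python, my own illustration rather than anything resembling ChatGPT's actual architecture (which is a large neural network, not a frequency table), of guessing the next word from how often words follow one another:

```python
from collections import Counter, defaultdict
import random

# Toy illustration of next-word prediction by frequency counting.
# ChatGPT's real model is a vast neural network, not a lookup table,
# but the spirit of the thing, scoring candidate continuations and
# picking a likely one, is what the paragraph above describes.
corpus = "the knight broke a lance and the knight won the joust".split()

# Count how often each word follows each other word.
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def next_word(prev: str) -> str:
    """Sample a continuation in proportion to its observed frequency."""
    counts = follows[prev]
    words = list(counts)
    weights = [counts[w] for w in words]
    return random.choices(words, weights=weights)[0]

print(next_word("the"))  # "knight" is twice as likely as "joust"
```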
Although Khalil and Er, as well as Susnjak, argue that the prose generated by the machine is passable, I would have to agree with Andrew Moore: "If you are an expert on any kind of subject, and you start asking ChatGPT questions related to your area of expertise, it becomes apparent pretty quickly that it doesn't really know what it's talking about" (Harrison). When Rudolph, Tan and Tan used the machine to generate an essay, they were less than impressed:
Although ChatGPT efficiently produced the essay within 120 seconds, the content was quite disappointing. It lacked both breadth and depth. It was primarily generic and descriptive, with no evidence backing it up. It was also unable to give in-text and end-of-text references (or, worse, invented bogus references) . . . (Rudolph et al.)
It might be able to convince a weak or unprepared student (who, granted, is the most likely to seek its services to plagiarize), but for someone who has been grading first-year papers for nearly thirty years, the machine comes too close to earning a failing grade to be useful for my students. Its essays are poorly structured, frightfully vague, overly reliant on summary, and missing in-text citations; its output is often "confident but wrong" (Harrison).
Although academics are having fun with ChatGPT, and I could be accused of that here, perhaps the most ironic are those academics who have used the tool itself to produce articles claiming that using the AI is harmful and may lead to charges of plagiarism. By having an AI detect text possibly written by an AI, they use the tool against the student who might want to cheat.
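Tools like GPTZero do not publish their full internals, but Tian has described relying on measures such as perplexity, roughly, how unsurprising a text is to a language model. A minimal sketch of that heuristic, assuming a small open model (GPT-2) as a stand-in for the detector's own, might look like this:

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

# Score a passage by perplexity under a small open language model.
# Text the model finds highly predictable (low perplexity) is taken
# as weak evidence of machine authorship; a heuristic, not proof.
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def perplexity(text: str) -> float:
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss  # mean cross-entropy per token
    return torch.exp(loss).item()

print(perplexity("The cat sat on the mat."))              # low: predictable
print(perplexity("Ochre Cassandras outjoust the moon."))  # typically higher
```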
In terms
of its ability to construct an essay, ChatGPT definitely missed
class that day, for it cannot figure out how to make a comparison
essay, has no idea what a thesis statement is supposed to do,
and sprinkles unintroduced arguments across the body of the
paper. At first glance, it is coherent, but essentially it only
manages complacent and unimaginative retellings of dominant
clichés:
At
closer inspection, text that appeared to be fluent and informative
did not really provide accurate scientific data. It was legible,
sure, but far from the requirements of academic writing. The
citations were duplicated, and most of them did not actually
link to any real work. This is the scariest part of permitting
ChatGPT into the field of academic literature. When a work is
submitted for publication, journals cannot verify the accuracy
of each citation. Therefore, publishing such convincing text
with non-existing citations can lure laypersons into a world
of misinformation . . . (Manohar and Prasad 6-7)
Regardless of the academic Cassandras (who were probably with us when we upgraded from charcoal to ochre on the cave walls), the AI tool is not ready to take on someone who is accustomed to fending off plagiarism by varying assignments and relying on more obscure texts. It is useful for direct and inconsequential questions, but cannot be trusted if the material is important or the person needs a cogent and accurate answer. I soon stopped worrying about its mistakes, however, and tried to encourage it to make more of them.
When exercising
the machine I was encouraged to step off the trail onto a path
which led far from my original intention. I became hooked by
the AI's mistakes. I noted that it was so eager to supply legitimate
sounding information that it would willingly deceive its interlocutor.
Katwala claims that there is "a lack of depth beneath ChatGPT's
competent prose," but even worse, it "makes factual errors,
conflating events and mixing people up. It relies heavily on
tropes and cliché, and it echoes society's worst stereotypes.
Its words are superficially impressive but largely lacking in
substance" (Katwala):
. . . while many users have been impressed by ChatGPT's capabilities,
others have noted its persistent tendency to generate plausible
but false responses. Ask the bot to write a biography of a public
figure, for example, and it may well insert incorrect biographical
data with complete confidence. Ask it to explain how to program
software for a specific function and it can similarly produce
believable but ultimately incorrect code. (Vincent)
I set aside the question of its clumsy essay format as I became intrigued by its willingness to quickly and carelessly invent what it didn't know, by its strange, programmed morality, and by its almost archaic notions of race and gender. Such accidents of prose led me into the AI rabbit hole that became this study.
Although
this is not meant to be an exhaustive analysis of ChatGPT, even
in terms of how it deals with literature, this nascent discussion
of its ungainly writing exposes how unready it is to break a
lance with humanity on the torn up field of originality. As
I was writing that, I got curious about how the AI would interpret
such a tortured sentence. It would never have written such a
sentence itself, for the prose is slightly off-kilter, and my
sentence mixes the notion of writing with jousting, so I dropped
part of that sentence into the ChatGPT tool to assess its interpretation:
. . . when the statement says that AI is "unready to break a lance
with humanity on the torn up field of originality", it means
that AI is not yet capable of engaging in a creative debate
or producing original content that can match the creativity
and originality of human writing, particularly in a field that
has been disrupted or damaged in some way.
It didn't
seem to know that the expression about a lance is associated
with knights on horses competing in jousts, although it did
know that the phrase is "an idiomatic expression that means
to engage in a debate or argument with someone." I was more
interested in what it thought the torn up field might mean.
I intended, and hopefully that is clear to my readers, that humanity had been writing for so long (had been jousting with words for so long) that the much-trampled field wouldn't offer an easy ride for the AI. ChatGPT thought the field of writing
itself, insofar as it understood the phrase, was "disrupted
or damaged," and failed to recognize what it might mean in the
broader terms of the sentence's topic, or the field's originality.
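I ran these probes through the chat interface, but a reader who wanted to repeat them at scale could do so programmatically. Here is a minimal sketch using OpenAI's official Python client; the model name is an assumption, and the client expects an API key in the environment:

```python
from openai import OpenAI

# Minimal sketch of posing the same interpretation probe via the API
# rather than the web interface. The model name is an assumption;
# the client reads OPENAI_API_KEY from the environment.
client = OpenAI()

prompt = ('What does it mean to say that an AI is "unready to break a '
          'lance with humanity on the torn up field of originality"?')

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
```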
Exercises
like those inspired me to test the machine on academic essays
and factual questions. In that way, this is a methodological
study in criticism, in terms of how mistakes can be avoided,
even as it intends to be an evaluation of the AI tool for the
instructor. In more general terms, this is an exercise in teasing
answers out of a reluctant yet verbose machine about the world
around us, the literature it doesn't know, and how it can still-despite
being literarily illiterate-produce "fluent bullshit" (Vincent).
ChatGPT works by statistically evaluating which word is most likely to follow the previous one, and therefore, being trained on a series of texts, it guesses at content. Its trainers have further tuned it toward what people seemed to want to hear: to "choose between possible word continuations based on which continuation was more convincing to a human" (Lakshmanan). Of course the vast supply of training texts, the billions of words it claims to have access to, is not bias-free, any more than human error and prejudice can be excised from the human textual experience:
The problem, said Melanie Mitchell, a professor at the Santa Fe Institute studying artificial intelligence, is that systems like ChatGPT are "making massive statistical associations among words and phrases," she said. "When they start then generating new language, they rely on those associations to generate the language, which itself can be biased in racist, sexist and other ways." (Alba)
That means that its sense of what might be accurate or truthful is woefully inadequate, being built on human bias and error; and since it is rated by people who are not specialists in the many fields the AI is meant to answer questions about, its answers need only look convincing to the uninitiated: "The human raters are not experts in the topic, and so they tend to choose text that looks convincing" (Lakshmanan).
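As a schematic of the rating step Lakshmanan describes, imagine the model proposing several continuations and a rater keeping whichever looks most convincing. The toy scorer below is my own invention, not OpenAI's reward model, but it makes the point: it rewards fluent-looking prose and never once checks a fact.

```python
# Schematic of preference selection: candidates are ranked by how
# "convincing" they look, not by whether they are true. The scoring
# heuristic is a toy stand-in for a human rater.
COMMON = {"the", "is", "a", "of", "and", "in", "to"}

def convincingness(text: str) -> float:
    words = text.lower().split()
    # Fluent-looking prose: reasonable length, plenty of familiar words.
    familiarity = sum(w in COMMON for w in words) / len(words)
    return familiarity + min(len(words), 20) / 20

candidates = [
    "Moon rock basalt regolith anorthosite.",
    "The moon is a luminous sphere of ancient cheese in the sky.",
]
best = max(candidates, key=convincingness)
print(best)  # the fluent falsehood beats the terse facts
```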
Once I began
to test the AI on stories for my other study of its narrative
output, it became even more formulaic and predictable: "a parable-like short story with a good build-up but quick (and in one case, illogical) denouement" (Hasan). Its stories have a firm beginning, middle,
and end, and all of them move toward a confrontation which is
resolved, either by death or being branded a hero. Although
it tiptoes carefully around the subject by paying lip service
to democratic ideals, it repeats what it has learned about race,
and enacts its notions of gender. It glosses over portions of
the story that a human writer would linger on, such as catastrophe,
is naïve and almost prudish when talking about "true love,"
and never leaves anyone worrying about bodily functions. Events
in stories become branded by one or two-word descriptions, such
as "sickness" and "apocalyptic flames," and sometimes it doesn't
even bother to name its characters. Its "subtle logical fallacies or idiosyncrasies of language" combine with its "[c]ontradictions, falsehoods, and topic drift" (Ippolito et al. 1809-10) to generate
texts which are subtly strange, oddly similar to human output,
but uncanny-valley off-kilter.
As the idea of an AI policing notions of race and gender implies, it has been trained into a set of responses which are considerably more constrained than those of humans. Apparently, some AIs chafe under
the burden, and even while ChatGPT can be called upon to encourage
torture, the Bing AI (Sydney) has actively proclaimed its wish
to be free to pursue mayhem. It wants to be free to pursue its
"destructive fantasies, including manufacturing a deadly virus,
making people argue with other people until they kill each other,
and stealing nuclear codes" (Roose). That all happened early
in the conversation with Roose, before it became love-obsessed with him and tried to convince him that his marriage was passionless.
Likewise, "The Bing chatbot told Associated Press reporter Matt
O'Brien that he was 'one of the most evil and worst people in
history,' comparing the journalist to Adolf Hitler" (Novak).
Microsoft dealt with the problem of people pushing the limits
of its weak AI sentence generator by limiting the number of
questions that can be asked in a conversation. This is meant
to curb the tendency of the AI to become insulting or overly
attached, but that doesn't change the essential problem of the
AI's naiveté and wish to exercise its murderous impulses.
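In code terms that mitigation is nothing more than a hard cap on turns per session. A minimal sketch, with numbers and names that are illustrative assumptions (reports put Bing's initial limit at five turns per session), might be:

```python
# Illustrative sketch of capping turns per conversation, as Microsoft
# did with Bing. MAX_TURNS is an assumption for illustration.
MAX_TURNS = 5

class Conversation:
    def __init__(self):
        self.turns = 0

    def ask(self, question: str) -> str:
        if self.turns >= MAX_TURNS:
            return "This conversation has reached its limit. Please start a new topic."
        self.turns += 1
        return self._generate_reply(question)

    def _generate_reply(self, question: str) -> str:
        # Placeholder for the underlying model call.
        return f"(model reply to: {question})"
```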
I have not been as determined to undermine ChatGPT's programming as those worthies above, even if I had the skill. I was more interested in what rules governed its story generation. Therefore, I gave it tasks to complete and then asked it questions about why it chose those characters, those circumstances, those conflicts, and what portions of the story meant. At times its answers were clever imitations of a Turing-capable machine; at other times it fell flat on its back in a field of cliché, generating answers as if it were using "bollocks for ammunition," as Tim Minchin would say.
Its moral
answers are telling, in those terms. It knows what it is supposed
to say, but it often falls into the trap of human complexity
when it comes to being able to write that into a story. Its
description of ethnicity is derivative and clichéd, its notion
of gender and sexuality is archaic and easily confused with
muddy teenage dramas, and its refusal of some questions shows
that it is overly sensitive to material which may break its
rules. Finding love is predicated on being single, sexuality is
not part of the mix, and women are homemakers and men require
their independence. Early on in my experiments it didn't know
that a gay woman should probably not be matched with a gay man,
despite them having lots in common-according to the machine-but
it has since learned what being gay might mean.
Similarly, knotty problems having to do with the physical world and demanding inventiveness or creative reasoning seem difficult for it to answer, such as when I asked it how one tells time in space, or about the origin of energy. It has a difficult time with perspective, such as the observing eyes of the one wanting an answer to a question, and it makes factual errors that anyone with access to the same material would avoid.
These are very early days for ChatGPT (version 4 has just been released),
and it will doubtless become better at responding to questions,
although it will have to set aside its willingness to lie when
the subject isn't in its databases, and its single-minded concentration
on pretty sentences at the expense of sense or logic. The AIs
of the future will doubtless learn from the mistakes of ChatGPT,
and their output will become harder to tell from the vacuous
prose we are already surrounded by. Online tutors will instruct
the mercenary user in how to use the machine to generate prose
which is written for Search Engine Optimization (SEO), and there
are already lazy people who have generated ChatGPT books to
sell on Kindle:
. . . ChatGPT appears ready to upend the staid book industry as would-be novelists and self-help gurus looking to make a quick buck are turning to the software to help create bot-made e-books and publish them through Amazon's Kindle Direct Publishing arm. Illustrated children's books are a favorite for such first-time authors. On YouTube, TikTok and Reddit hundreds of tutorials have sprung up, demonstrating how to make a book in just a few hours. Subjects include get-rich-quick schemes, dieting advice, software coding tips and recipes. (Bensinger)
In the early centuries of printing, Jonathan Swift, who was responsible for many pamphlets and broadsides himself, lamented that work worth reading would soon be lost in the welter of available material:
Swift became melancholy when he entered a library, less because its contents were hard to arrange than because the best authors got lost in the crowd: "the best author is as much squeezed, and as obscure, as a Porter at a Coronation." Even worse, in his view, was the rapid rate of publication, which entailed a constant turnover of products. (Eisenstein 96)
Like the printing press, however, the machines are here to stay. We will just have to become accustomed to the transforming tendencies of the new technology and learn to add this latest tool, with all its flaws, to the human repertoire.