Just under two months ago, the US artificial intelligence company OpenAI introduced a program called ChatGPT. Essentially an advanced chatbot, it has been the subject of much debate.
Some commentators have described its answers as very impressive, while others have drawn attention to factual errors in its output. Nevertheless, the product has been hailed as a potentially disruptive innovation for many different industries.
A significant amount of what has been written about ChatGPT so far has focused on its implications for education. The program’s much-vaunted capacity to provide detailed answers to queries on a vast range of topics has raised concerns that it could harm learning and enable students to “cheat” on exams and homework.
ChatGPT has already “taken” a number of tests, including the US Bar exam and actuarial and medical examinations. In all cases it performed at or near a pass level “out of the box”. While some work on ChatGPT looks at downsides and concerns, our more optimistic perspective is that ChatGPT could well take the form of a low-cost, or even free, electronic research assistant.
Our study, published in Finance Research Letters, aimed to see whether ChatGPT could be used to write a finance paper that would be accepted by an academic journal. The program passed the test, but performed better in some areas than in others. Furthermore, adding in our own expertise helped overcome the program’s limitations in the eyes of journal reviewers. The findings suggest that ChatGPT could be an important aid for research and not necessarily a threat.
From good to great
Our thinking was: if it’s easy to get good outcomes from ChatGPT by simply using it, maybe there’s something extra we can do to turn these good results into great ones.
We first asked ChatGPT to generate the standard four parts of a research study: research idea, literature review (an evaluation of previous academic research on the same topic), dataset, and suggestions for testing and examination. We specified only the broad subject and that the output should be capable of being published in “a good finance journal”.
This was version one of how we chose to use ChatGPT. For version two, we pasted into the ChatGPT window just under 200 abstracts (summaries) of relevant, existing research studies.
We then asked that the program take these into account when creating the four research stages. Finally, for version three, we added “domain expertise” — input from academic researchers. We read the answers produced by the computer program and made suggestions for improvements. In doing so, we integrated our expertise with that of ChatGPT.
We then asked a panel of 32 reviewers to each review one version of how ChatGPT can be used to generate an academic study. Reviewers were asked to rate whether the output was sufficiently comprehensive and correct, and whether it made a contribution sufficiently novel for it to be published in a “good” academic finance journal.
The big take-home lesson was that all these studies were generally considered acceptable by the expert reviewers. This is rather astounding: a chatbot was deemed capable of generating quality academic research ideas. This raises fundamental questions around the meaning of creativity and ownership of creative ideas — questions to which nobody yet has solid answers.
Strengths and weaknesses
The results also highlight some potential strengths and weaknesses of ChatGPT. We found that different research sections were rated differently. The research idea and the dataset tended to be rated highly. There was a lower, but still acceptable, rating for the literature reviews and testing suggestions.
Our suspicion here is that ChatGPT is particularly strong at taking a set of external texts and connecting them (the essence of a research idea), or taking easily identifiable sections from one document and adjusting them (an example is the data summary — an easily identifiable “text chunk” in most research studies).
A relative weakness of the platform became apparent when the task was more complex, with too many stages in the conceptual process. Literature reviews and testing tend to fall into this category. ChatGPT tended to be good at some of these steps but not all of them. This seems to have been picked up by the reviewers.
We were, however, able to overcome these limitations in our most advanced version (version three), where we worked with ChatGPT to come up with acceptable outcomes. All sections of the advanced research study were then rated highly by reviewers, which suggests the role of academic researchers is not dead yet.
ChatGPT is a tool. In our study, we showed that, with some care, it can be used to generate an acceptable finance research study. Even without care, it generates plausible work.
This has some clear ethical implications. Research integrity is already a pressing problem in academia, and websites such as Retraction Watch report a steady stream of fake, plagiarised and just plain wrong research studies. Might ChatGPT make this problem even worse?
It might, is the short answer. But there’s no putting the genie back in the bottle. The technology will also only get better (and quickly). How exactly we might acknowledge and police the role of ChatGPT in research is a bigger question for another day. But our findings are also useful in this regard: because the version of the ChatGPT study that included researcher expertise was superior, they show that the input of human researchers is still vital to producing acceptable research.
For now, we suggest that researchers see ChatGPT as an aid, not a threat. It may be a particular help to groups of researchers who tend to lack the financial resources for traditional (human) research assistance: emerging economy researchers, graduate students, and early career researchers.
It’s probably the most optimistic possible conclusion, but it’s just possible that ChatGPT (and similar programs) could help democratise the research process.