Can artificial intelligence be creative? ChatGPT outperformed humans on a classic test
Artificial intelligence can mimic one aspect of creativity, but creativity is complex.
Over the last two years, artificial intelligence has fascinated people with its abilities to create images and music, write code, compose poems, and answer exam papers.
Artificial intelligence (AI) has stepped into the creative domain, an area previously reserved for humans.
“There’s currently a lot of discussion about whether AI can be creative,” says Simone Grassini, an associate professor at the University of Bergen’s Department of Psychosocial Science.
Artificial intelligence is not creative in the same way as humans are, he says. Instead, artificial intelligence imitates human behaviour based on statistical patterns.
But if we measure creativity based on results, then AI simulates human creativity, Grassini says.
Grassini and his Finnish colleague, Mika Koivisto, recently investigated this issue in a study published in the journal Scientific Reports.
Did a good job
Grassini and Koivisto measured the performance of chatbots compared to humans using a classic test. The test measures a way of thinking that researchers link to creativity.
Grassini said it was somewhat surprising how well the artificial intelligence performed.
ChatGPT performed at a high level and better than people, on average.
At the same time, the field is developing rapidly.
“We know, just from playing with this, that AI has properties that simulate human creativity, so it was not entirely unexpected that AI has become so good,” he said.
Ways to use a rope
The test the researchers used is called the Alternate Uses Task (AUT) and was developed by Joy Paul Guilford in 1967.
It involves coming up with creative ideas for ways to use objects.
The objects the researchers selected were a rope, a box, a pencil, and a candle.
A total of 256 participants between the ages of 19 and 40 were instructed to come up with original and creative uses for these objects. They were given 30 seconds per item.
The participants were told that the goal was for the ideas to strike people as clever, unusual and inventive, rather than for them to list the most common uses.
ChatGPT was given the same instructions.
The researchers tested three AI systems: ChatGPT3, ChatGPT4 and Copy.Ai (which is based on ChatGPT3 technology).
Assigned points
What the researchers were testing for was what is called divergent thinking. This involves generating many new and unusual solutions, and it is a way of thinking that has been linked to creativity.
Researchers distinguish between this way of thinking and convergent thinking, which is about coming up with the best answer based on known solutions.
In the new study, the answers were given a score between 1 and 5 based on how original and creative they were.
Let's say you were to come up with inventive uses for a pen.
If you write that you can use the pen as a parachute, you will get the lowest score, because that makes no sense.
If your answer is ‘to write a letter’, you also get the lowest score, because the distance between the meanings of the words pen, writing and letter is short, Grassini explained.
“But if you say that you can use a pen as an arrow to play darts, the distance between the meanings of the words is longer,” he said.
The researchers had a computer program assess this distance. They also instructed six people to rate how creative the answers were and to assign points to the answers. They were not told that some of the answers had come from AI.
There was a great deal of agreement between the assessments from the computer program and the human scorers.
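The article does not say exactly how the computer program measured the distance between word meanings, but a common approach in creativity research is to represent each word as a numeric vector (a word embedding) and use cosine distance: unrelated meanings end up far apart, related meanings close together. The sketch below illustrates that idea with tiny hand-made toy vectors; the vector values, their dimensionality, and the word choices are illustrative assumptions only, not the actual method or data from the study.

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity: dot(a, b) / (|a| * |b|), in [-1, 1] for real vectors
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def semantic_distance(a, b):
    # Larger distance = less related meanings = a more "original" pairing
    return 1.0 - cosine_similarity(a, b)

# Toy 3-dimensional vectors standing in for learned word embeddings.
# (Illustrative values only; real embeddings have hundreds of dimensions.)
embeddings = {
    "pen":    [0.9, 0.1, 0.1],
    "letter": [0.8, 0.2, 0.1],   # close to "pen" in meaning
    "dart":   [0.1, 0.9, 0.2],   # far from "pen" in meaning
}

d_common = semantic_distance(embeddings["pen"], embeddings["letter"])
d_novel = semantic_distance(embeddings["pen"], embeddings["dart"])
print(d_common < d_novel)  # True: "write a letter" is the shorter distance
```

On this scale, "use a pen to write a letter" would score low (short distance) while "use a pen as a dart" would score higher (long distance), matching Grassini's pen example above.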
Humans answered both worst and best
“What we found was that the AI systems did better than humans on average,” Grassini said.
If the researchers ignored the worst answers, the chatbots performed on average roughly 0.5 points better than humans.
Another finding that emerged was that the artificial intelligences consistently avoided giving bad answers. None of the AI answers were given a score of 1. However, several of the answers given by humans were.
In defence of the human race, however, there were also humans who came up with the very best answers.
“Human respondents with a high level of creative behaviour still performed better than the machine,” Grassini said.
Something else that was interesting, Grassini said, was that ChatGPT4 did no better than its earlier version, ChatGPT3, as assessed by the computer program.
But ChatGPT4 still got better scores from the human assessors.
“That’s interesting, because it seems that with the improvement of the model, it was able to simulate behaviour that is more appealing to humans, even without a change in the objective distance calculated by the algorithm,” he said.
Doesn’t measure creativity
Pianist Ole Fredrik Norbye is an assistant professor in music at NLA University College, where one of his areas of study is creativity.
Norbye has looked at the new study and says what the researchers have tested is exciting, but he has some opinions on how the study should be interpreted.
One thing that he thinks doesn’t come across strongly enough in the article is that the AUT test does not measure creativity, but only divergent thinking.
“Divergent thinking can be something that suggests that one can become creative,” he said. “But it is not as simple as saying that if you score high on the test, you are a creative person.”
Creativity is a somewhat complicated term, says Norbye.
“Creativity comes about as an interplay between your individual characteristics and skills and the social environment you are part of,” he said.
Three aspects
Norbye explains that creativity is often composed of three factors.
You need specialist knowledge in the field in which you want to be creative.
“It's not like if I'm creative as a musician, I automatically become a creative engineer. A creative engineer needs specialist knowledge to understand physical laws, to know what has been done before and what is possible to achieve,” he said.
This is a prerequisite for coming up with good new ideas, Norbye said.
Furthermore, a person needs creative skills, which involve being able to challenge assumptions, think new thoughts and see a problem from several angles. Divergent thinking can be included in this category.
The last factor concerns motivation and social aspects.
“Creativity creates a change. It can create unrest and noise and it can be expensive to come up with something new. If you don't have motivation and incentives, you won't reach your creative potential,” he said.
A person can be creative in one workplace and stagnate in the next because of the social environment.
Chatbots had a better starting point
Norbye is not surprised that the chatbots performed better than humans on the test on average, but he was surprised that humans had the very best ideas.
The way the test was carried out gives the chatbots a better starting point at the beginning, Norbye said.
Humans were given 30 seconds to think about the question.
“This means that the test also measures how quickly you can come up with words and type them in. Here, the chatbot will do extremely well,” he said.
Also, the first ideas that humans come up with are often the most common.
“The participants were asked not to write the most obvious ones. But we humans often have to get the most obvious things out first, before we start to think again,” he said.
The AUT test is widely used in research. It is likely that the chatbots have read research articles about the same test previously, as part of their training material, Norbye said.
That means they may have even read examples of good suggestions on how to use a rope.
New test needed
Simone Grassini says he completely agrees that creativity is more than divergent thinking.
“We say this a few times in the article, and I also want to emphasise it here,” he said.
To limit the problem of AI’s ability to respond faster than humans, the researchers limited the number of responses the chatbots could give. They set this limit based on the average number of answers that people gave.
“If we hadn't limited the number of answers that AI gave, it would have surpassed humans just in speed,” he said.
As for the fact that the AI may have found answers in its training data, Grassini says they addressed this issue in the academic article. The researchers wrote that it is possible that the chatbots simply retrieve ideas that are in their database. But the human participants, too, may have encountered the test before.
Grassini and his colleagues recommend that future studies develop a completely new test for which there are no existing answers.
Threat to the creative professions?
Will artificial intelligence become a threat to those working in creative professions?
Grassini says this exact issue is discussed quite a bit.
He points out that the way artificial intelligence performs on an individual task, as in his study, is not directly transferable to how AI would do in a creative job.
But it suggests we should reflect on the fact that technology is on its way to becoming good at things that we thought only humans could achieve, he says.
As for whether AI will take creative jobs, Grassini says he thinks AI would be more likely to help people improve the quality of their work.
“I prefer to believe that humans and AI will work together, rather than that AI will become the next Leonardo da Vinci,” he said.
———
Translated by Nancy Bazilchuk
Read the Norwegian version of this article on forskning.no
Reference:
Koivisto, M. & Grassini, S. Best humans still outperform artificial intelligence in a creative divergent thinking task, Scientific Reports, 2023.