Software engineers’ perceptions of human vs AI code reviews

The challenge
A key stage in any software development project is code review, where colleagues check a programmer's work and provide constructive feedback. With the emergence of powerful large language models (LLMs), AI systems can now provide code reviews too. But how do software engineers respond to AI-generated reviews compared with those written by their human peers?
Researchers at the University of Southern Denmark and the University of Victoria set out to answer this question. To complete their study, they needed to:
- Recruit professional software engineers
- Ask them to participate in a code review
- Get them to take part in Zoom interviews to discuss their responses to the code reviews
Analyzing software engineers’ responses to human and AI code review
In many professions, feedback on an individual's output is a normal part of working life. It's not always pleasant to hear, but it's also an opportunity to learn and improve.
One of the impressive features of LLMs like ChatGPT is that they can review software code and provide written feedback, much as an engineer’s colleagues or line manager would typically do. Just like traditional code reviews, the LLM can identify coding errors or provide suggestions on how to improve the code.
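To make this concrete, here's a minimal sketch of what requesting such a review programmatically can look like, using the OpenAI Python SDK. This is an illustration only: the study itself used the ChatGPT interface, and the model name, prompt wording, and sample code below are assumptions, not the researchers' setup.

```python
# Minimal sketch: asking an LLM to review a code snippet via the OpenAI Python SDK.
# Illustration only -- the study used the ChatGPT interface; the model name and
# prompt wording here are assumptions, not the researchers' setup.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

code_sample = '''
def average(numbers):
    return sum(numbers) / len(numbers)  # fails on an empty list
'''

response = client.chat.completions.create(
    model="gpt-4o",  # assumed model name, for illustration
    messages=[
        {
            "role": "system",
            "content": "You are a code reviewer. Point out bugs and suggest concrete improvements.",
        },
        {"role": "user", "content": f"Please review this code:\n{code_sample}"},
    ],
)

# The model's written review, e.g. flagging the unhandled empty-list case.
print(response.choices[0].message.content)
```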
For this study, the research team wanted to see how individual software engineers would respond to a code review provided by an LLM and compare this with how they responded to feedback written by humans. Would they prefer their human peers' suggestions, or find the LLM's comments more useful? And how might software engineers respond emotionally to negative feedback when it's written by a machine, compared to critiques from a human?
The study design was as follows:
- 20 software engineers were invited to submit an example of a piece of code they’d written
- This code would be anonymously peer reviewed by at least two of the other participants (assigned randomly)
- The code would also be reviewed by ChatGPT
- Each participant was then sent feedback from the human and AI reviews
- The researchers conducted interviews with each participant to gather their perceptions of the different kinds of feedback their work had received
Recruiting Domain Experts
The researchers needed to find software engineers to take part in the research. They turned to Prolific for support with recruiting qualified programmers.
Prolific's prescreening filters made it easy for the researchers to zero in on the right people, surfacing 353 matching participants. The researchers then invited these individuals to take part in a further prescreening survey to narrow down the pool. Potential participants needed to confirm that they would be willing to submit a coding sample, provide feedback on other participants' work, and take part in an interview lasting up to 60 minutes.
At the time of the study, the researchers conducted this additional prescreening themselves to confirm that respondents were indeed software engineers. Prolific now offers verified Domain Experts, including software engineers, within the app. [Find out more].
As well as helping the researchers find qualified software engineers, Prolific also made it easy to compensate participants who completed the prescreening surveys and interviews. And, unlike competing platforms, Prolific allowed the researchers to take participants off our platform to complete the survey and join the Zoom interviews.
The results
This qualitative study produced fascinating insights into how software engineers respond to and engage with feedback on their work.
In some cases, the feedback from ChatGPT was actually preferred to that of human peers. As anyone who's used an LLM knows, it is relentlessly upbeat in tone. This meant that even negative feedback was delivered in a positive manner, and was therefore received better by participants. By contrast, human peers' reviews were sometimes perceived as "harsh" or "picky". The style in which feedback is delivered matters: software engineers are human, and feedback that feels unpleasant can dent self-esteem and motivation.
In other ways, human feedback was still seen as superior. Peers' comments tended to be less verbose than ChatGPT's, more specific, and more genuinely constructive. Participants found there was less "noise" in their peers' feedback and more they could learn from.
The study provides valuable insights into how feedback is given to and received by developers. It could also help improve how LLMs deliver code reviews.
Prolific played an important part in the study, helping the team quickly find a pool of verified software engineers, enabling payments and supporting the logistical side of the project.
Interested in learning more about Prolific's participant recruitment capabilities? Access our pool of 200k+ verified participants, including Domain Experts, and find the right expertise for your AI and research projects.
Citation: Alami, A., & Ernst, N. (2025). Human and Machine: How Software Engineers Perceive and Engage with AI-Assisted Code Reviews Compared to Their Peers. https://arxiv.org/html/2501.02092v1
Research institutions: Mærsk Mc-Kinney Møller Institute, University of Southern Denmark; Department of Computer Science, University of Victoria