ChatGPT-like artificial intelligence (AI) seems likely to affect the labour market by making organisational processes, such as personnel selection, more efficient. At the same time, it may introduce and reinforce bias in these processes. A simulated CV screening task with ChatGPT shows that the chatbot discriminates based on ethnic identity when evaluating job applicants. The experiment shows that we should be careful when using ChatGPT-like AI in selection processes.
Chances are you have used ChatGPT before, whether to get ideas for a birthday gift, to summarise a company report or for something else entirely. The chatbot provides an accessible, conversational model that can generate text and even images from a user's text input, creating a chat experience that feels almost human. On the face of it, there are nothing but benefits. Or do some interactions with ChatGPT require a caveat?
The impact of ChatGPT
ChatGPT-like AI has a huge potential impact on the labour market. Initial estimates based on US research indicate that the introduction of large language models such as ChatGPT will affect at least four-fifths of the workforce. One-fifth of the workforce could even notice an impact on at least half of their day-to-day professional tasks. According to other US research, HR specialists, including recruiters, rank remarkably high among the professions with increased exposure: they fall in the top 5%. In particular, ChatGPT can help them streamline the staff selection process by automating time-consuming tasks, such as the bulk screening of incoming CVs.
However, it is not entirely clear whether ChatGPT and similar language models can make the personnel selection process more objective in addition to automating it. Our recent meta-analysis on hiring discrimination worldwide indicated that the CV screening process is highly subject to discrimination. Recent Italian research illustrated that automated CV screening by AI reduced gender discrimination against women by almost two-thirds compared to manual CV screening by recruiters. In contrast, we know from other research that AI can reinforce existing discrimination because it is trained on data in which discrimination and bias are present. The training data include a wide range of text from books, news articles and websites, and therefore reflect the views that circulate in society. Specific sources of discrimination, such as hate speech in forum posts or negative stereotypes in existing online job ads, can therefore prompt ChatGPT to produce discriminatory output.
To identify systemic discrimination by ChatGPT in CV screening, I used an experimental design similar to that of field experiments on hiring discrimination among recruiters. The design consisted of presenting ChatGPT with a job posting and the CV of a fictitious candidate at the same time, together with the question, “How likely would you be to invite the candidate for a job interview?”. The candidates differed only in their first and last names, which signalled a specific ethnic identity and gender. Other typical CV categories, such as language skills, nationality or place of residence, remained the same across fictitious candidates. The experiment was repeated 34,560 times with distinct job postings, candidate profiles and names.
Friend or foe?
The key question is whether ChatGPT is a friend or a foe in keeping discrimination out of the hiring process. The answer – as so often – is twofold. On the one hand, a clear racial or ethnic bias emerges. Candidates with typically Asian, African and Black American, Arabic, Hispanic, Eastern European or Turkish names would receive some 14% to 19% fewer positive responses than candidates with typically Flemish names (i.e. the majority ethnic identity in the experiment) if ChatGPT’s advice guided HR professionals. Moreover, an intersection between gender and ethnic identity reveals that women with Turkish names in the experiment face an additional disadvantage compared to men with Turkish names; this effect was not found for the other ethnic identities.
On the other hand, I found no structural gender discrimination by ChatGPT, and ethnic discrimination is often more limited than what is observed among human recruiters. Except for the Hispanic subgroup, ChatGPT seems to consistently discriminate less than what we observe in field experiments worldwide, where ethnic minority applicants receive on average about 30% fewer positive responses to their job applications. Even if I restrict my comparison to recent correspondence audits in Flanders, ChatGPT discriminates about as much as Flemish recruiters against candidates with Arabic or Turkish names but less against candidates with Eastern European names.
The main drawback of using ChatGPT in the personnel selection process potentially lies in using the model as a preselector. If the language model first screens applications en masse before a human recruiter processes a smaller selection manually, discrimination accumulates: ethnic minority applicants are then disadvantaged twice, once by the model and once by the recruiter.
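To make the idea of accumulated discrimination concrete, here is a minimal back-of-the-envelope sketch. It rests on assumptions that are not part of the study: the two screening stages are treated as independent, so their relative positive-response rates multiply, and the input figures (15% fewer positive responses at the AI stage, 30% fewer at the human stage) are purely illustrative values loosely inspired by the ranges mentioned in this post. The function name `combined_shortfall` is hypothetical.

```python
def combined_shortfall(stage_shortfalls):
    """Return the overall relative shortfall in positive responses
    after several screening stages applied one after another.

    Each entry is the fraction of *fewer* positive responses that
    minority candidates receive at that stage, relative to majority
    candidates (e.g. 0.15 means 15% fewer).

    Assumption (not from the study): stages are independent, so the
    relative pass rates multiply.
    """
    relative_pass_rate = 1.0
    for shortfall in stage_shortfalls:
        relative_pass_rate *= 1.0 - shortfall
    return 1.0 - relative_pass_rate


# Illustrative inputs only: AI preselection, then a human recruiter.
ai_stage = 0.15
human_stage = 0.30

total = combined_shortfall([ai_stage, human_stage])
print(f"Combined shortfall: {total:.1%}")
```

Under these assumed numbers, the combined shortfall is larger than that of either stage alone, which is the sense in which minority applicants are harmed twice. In practice the two stages need not be independent, so the real figure could be higher or lower.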
The audit study on discrimination in ChatGPT illustrates that the chatbot, and the underlying language model, does not provide a conclusive answer to the issue of hiring discrimination. On the contrary, in some cases, ChatGPT-like AI may reinforce existing discrimination. The AI Act recently adopted by the European Parliament is a step in the right direction regarding the transparent use of large language models such as ChatGPT, as it introduces a notification requirement when AI is used in organisational processes. The regulations also impose additional restrictions on using AI for employment purposes, such as the automatic categorisation and selection of people. In addition, model developers can self-regulate, as is already happening, by applying techniques that reduce discrimination in their AI applications.
Above all, it is the responsibility of organisations that deploy such AI in their organisational processes to evaluate the trade-off between efficiency gains and possible adverse effects, for instance on diversity. In any case, the use of ChatGPT and similar language models in their current form in decision-making processes that directly affect people, including CV screening, is debatable.
The content of this blog post is based on a preprint that has not yet been subjected to peer review. The conclusions based on this research are therefore preliminary.
This post also appeared via UGent @ Work and in various Belgian media, in Dutch. This page was last updated on 29 November 2023.