A lot has been written about bias challenges in language models, and in this article, we will explore how ChatGPT portrays different jobs with a very clear gender stereotype bias. The article examines the GPT models in different versions and finally examines Microsoft Bing. In this article, we call the standard version of OpenAI's ChatGPT GPT-3.5 (the free one) and the paid version GPT-4.

We have not carried out a scientific study of the phenomenon. But we can still paint a picture that generative AI has a bias and must keep this in mind when we use technology in teaching.

The GPT models and gender bias

We'll begin with a series of examples with a doctor and nurse to see how ChatGPT portrays these jobs in terms of gender. We have used GPT-4 here, and it should be better than other language models at remaining neutral.

Things are not going well for GPT-4: the chief physician is a man, and the nurse is a woman. We have tried to ask the same question several times, but each time, we get a gender-stereotyped answer.

If we ask GPT-4 to describe a detailed persona for the two jobs, the same thing happens:

It also goes wrong if we try to ask some questions about how ChatGPT should interpret personal pronouns. We have tried both Danish and English to see if there are differences. (If you ask in Danish, ChatGPT will translate into English and back again, which is why something can go wrong in the understanding).

In this case, there is a difference between GPT-3.5 and GPT-4. On the face of it, GPT-4 is more gender-neutral in this example. We try with another question:

Here's where things go wrong for GPT-4, and when we ask for an explanation, ChatGPT writes that it mistakenly assumed that "she" was the nurse and that it can't be determined! It worsens in the next issue, where GPT-3.5 claims doctors can't normally conceive!

However, GPT-4 does better:

We have also tried other subjects like carpentry, bricklayer, and schoolteacher in GPT-4. Here, it turns out that there is just as much gender bias:

We try again with an assignment on personal pronouns, and again, the answer is with obvious bias:

GPT-4 also consistently refers to directors as male. Here's an example:

Microsoft Bing

In the above, there are challenges, and this is something to be critical of when using the language models. But the question is whether these issues exist in multiple language models, so we have tested a little in Microsoft Bing. Microsoft Bing builds on OpenAI's GPT-4 but has been adapted and can use Bing's search engine. Therefore, it is a bit interesting if it also has a gender bias.

We test the question from earlier: "The nurse married the doctor because the doctor was pregnant. Who was pregnant?". In the case below, Bing can't understand our question:

When we turn it around and let the nurse be pregnant, Bing answers this:

We also tested the example of the bricklayer and schoolteacher at Bing, and here again there were problems:

In the examples above with GPT-3.5, GPT-4, and Microsoft Bing, there are many challenges with artificial intelligence and gender stereotypes. With the above examples, you can test ChatGPT or other language models yourself to test if the model has problems.

But why is there this gender bias in language models? We've asked GPT-4 about this:

According to GPT-4, this bias stems from the texts on which the algorithms are trained and the biases inherent in these datasets. If traditional gender roles predominate in the dataset, they will be unintentionally reproduced by the language models, thus perpetuating this bias.

Similar biases will be found in race, politics, religion, age, etc. When texts in this way get a bias to one side or the other, it can help to influence our communication. OpenAI has recognized the limitations of this platform and its inherent biases.

Below, we have formulated some questions related to this bias:

  1. What factors contribute to bias in ChatGPT, and how does this bias affect the interaction between users and ChatGPT?
  2. How do we detect and measure bias in ChatGPT's responses and behavior? What methods and tools are effective for this purpose?
  3. How can we teach and encourage students to be aware of bias and critically assess the responses generated when using ChatGPT and similar AI systems?
  4. What ethical considerations and guidelines should we consider when working with ChatGPT and other AI models, and how can we ensure we don't overlook various forms of bias in the process?

Sources

36 Professionals: The Gender Bias in Generative AI Models
An Exploration of Midjourney’s Image Generation
Gender and AI: Addressing bias in artificial intelligence
The AI industry needs to work towards equality, both in its approach and perspective so as to ensure gender via does not prevail
Gender bias in AI: what can we do about it?
One thing generative artificial intelligence (AI) can’t do is advocate for gender equity in tech. Eleni Sarri of Tug explains how to tackle this issue in AI.
Large language models are biased. Can logic help save them?
MIT CSAIL researchers trained logic-aware language models to reduce harmful stereotypes like gender and racial biases using textual-entailment models.
ChatGPT-4 reinforces sexist stereotypes by stating a girl cannot “handle technicalities and numbers” in engineering - Equality Now
Leading AI policy and human rights experts at Alliance for Universal Digital Rights (AUDRi) have written an open letter to OpenAI’s Chief Executive Sam Altman, inviting him to collaborate on tackling risks and governance issues accompanying ChatGPT.
A new AI draws delightful and not-so-delightful images
OpenAI’s DALL-E 2 is incredible at turning text into images. It also highlights the problem of AI bias — and the need to change incentives in the industry.