Generative AI makes images: How do Dall-E 2 and Midjourney work?

Generative artificial intelligence (AI) is a type of AI that can create new, original artifacts, using algorithms to generate content such as text, images, video, speech, and music from a prompt. The best-known generative AI applications are ChatGPT, Bing Chat, and Google Bard (text-generating) and Dall-E 2, Stable Diffusion, and Midjourney (image-generating).
This article briefly explains how AI can generate images from a descriptive text. We also compare five of the most used systems, plus Microsoft Bing, which can generate images in its latest edition.

In a series of upcoming articles, we'll dive into different ways to use image-generating artificial intelligence.

How does image-generating AI work?

Image-generating AI systems are based on a technique called diffusion. In short, they work by destroying and rebuilding images step by step and learning from the process.

The first step, the forward process, is to gradually destroy an image by adding larger and larger amounts of Gaussian noise until the image is completely gone.

https://developer.nvidia.com/blog/improving-diffusion-models-as-an-alternative-to-gans-part-1/
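The forward process can be sketched in a few lines of Python. This is a minimal illustration, not the code of any of the systems discussed here: it assumes a linear noise schedule and the common closed-form way of jumping straight to a given noise level, and the names `make_alpha_bars`, `add_noise`, and the four-pixel toy image are made up for the example.

```python
import math
import random


def make_alpha_bars(num_steps, beta_start=1e-4, beta_end=0.02):
    """Share of the original image left after each step (assumed linear noise schedule)."""
    alpha_bars = []
    running = 1.0
    for step in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * step / (num_steps - 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def add_noise(pixels, alpha_bar, rng):
    """Blend the clean pixels with Gaussian noise; a lower alpha_bar means more noise."""
    signal = math.sqrt(alpha_bar)
    noise = math.sqrt(1.0 - alpha_bar)
    return [signal * p + noise * rng.gauss(0.0, 1.0) for p in pixels]


rng = random.Random(0)
image = [0.9, -0.5, 0.1, 0.7]  # toy "image": four pixel values scaled to [-1, 1]
alpha_bars = make_alpha_bars(num_steps=1000)

# Early steps barely change the image; by the last step almost only noise is left.
for step in (0, 499, 999):
    noisy = add_noise(image, alpha_bars[step], rng)
    print(step, round(alpha_bars[step], 4), [round(v, 2) for v in noisy])
```

The printed share of the original image shrinks toward zero as the step number grows, which is what "until the image is completely gone" means in practice.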

Then the process is reversed, and the noise is gradually removed over many steps to recreate the image. This happens in a neural network that takes the noisy image and a text describing the image as input and, at each step, outputs an image with less noise. By learning to convert noisy images back to their originals, the model learns how a text prompt matches an image. Once the model has been trained on several million images with associated descriptions, it can generate new images: it starts from random noise and a text prompt describing what you want an image of, and ends with an image of what you described.
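The reverse loop can be sketched as follows. This is a toy under loud assumptions: the trained, text-conditioned neural network is replaced by a hypothetical `oracle_noise_predictor` that is simply handed the target image, the schedule is invented for the example, and the update is a deterministic (DDIM-style) step rather than the exact sampler of any particular product.

```python
import math
import random


def oracle_noise_predictor(noisy, alpha_bar, clean):
    """Stand-in for the trained network: it can name the noise only because it sees the clean image."""
    signal, noise = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [(x - signal * c) / noise for x, c in zip(noisy, clean)]


def denoise_step(noisy, predicted_noise, alpha_bar, alpha_bar_prev):
    """One deterministic step from a noisier image to a slightly cleaner one."""
    signal, noise = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    estimate = [(x - noise * e) / signal for x, e in zip(noisy, predicted_noise)]
    prev_signal = math.sqrt(alpha_bar_prev)
    prev_noise = math.sqrt(1.0 - alpha_bar_prev)
    return [prev_signal * c + prev_noise * e for c, e in zip(estimate, predicted_noise)]


# Assumed schedule: index 0 is the clean image, the last index is almost pure noise.
num_steps = 50
alpha_bars = [1.0 - 0.999 * t / num_steps for t in range(num_steps + 1)]

rng = random.Random(0)
target = [0.9, -0.5, 0.1, 0.7]                   # image the prompt is assumed to describe
current = [rng.gauss(0.0, 1.0) for _ in target]  # start from pure random noise

for t in range(num_steps, 0, -1):
    predicted = oracle_noise_predictor(current, alpha_bars[t], target)
    current = denoise_step(current, predicted, alpha_bars[t], alpha_bars[t - 1])

print([round(v, 3) for v in current])
```

In a real system the predictor has never seen the target; it has only learned from training which noise patterns to remove for a given text, which is why the same prompt can produce different images from different starting noise.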

Finally, the image is upscaled to the desired size using an algorithm. This algorithm works like the forward-reverse process by predicting a high-resolution image that could be downscaled to look like the original, low-resolution image. After training on many images, it can make high-resolution images from low-resolution images.

Source: https://artificialintelligence.oodles.io/blogs/upscaling-images-with-machine-learning/
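The idea that an upscaled image should shrink back to the low-resolution original can be shown with a small sketch. The functions `downscale` and `naive_upscale` are illustrative names, and pixel repetition is only a baseline: a trained upscaler would instead predict plausible new detail that passes the same check.

```python
def downscale(image, factor=2):
    """Shrink a grayscale image by averaging each factor x factor block of pixels."""
    rows, cols = len(image) // factor, len(image[0]) // factor
    return [
        [
            sum(image[r * factor + dr][c * factor + dc]
                for dr in range(factor) for dc in range(factor)) / factor ** 2
            for c in range(cols)
        ]
        for r in range(rows)
    ]


def naive_upscale(image, factor=2):
    """Baseline upscaler: repeat every pixel factor times in both directions."""
    return [
        [pixel for pixel in row for _ in range(factor)]
        for row in image
        for _ in range(factor)
    ]


high_res = [
    [10, 20, 30, 40],
    [20, 30, 40, 50],
    [90, 80, 70, 60],
    [80, 70, 60, 50],
]
low_res = downscale(high_res)       # the kind of input a training pair might use
candidate = naive_upscale(low_res)  # one possible high-resolution answer

print(low_res)                          # [[20.0, 40.0], [80.0, 60.0]]
print(downscale(candidate) == low_res)  # True
```

Both `high_res` and `candidate` shrink to the same low-resolution image, which illustrates why the model has to learn from many examples which of the possible high-resolution answers looks most natural.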

If you want to know more about the process behind image generation and the math behind it, see the link collection at the bottom of the page. The theory is complex and involves advanced mathematics, including Markov chains.

We take a closer look at some systems:

So what can these image-generating AI systems do, and how do they differ? We have taken a closer look at the most used systems: Midjourney, Dall-E 2, Stable Diffusion, Adobe Firefly, and BlueWillow.

Tool	Price	Copyright	Editing	Difficulty	Quality
Dall-E 2	Paid	No	Yes	Easy	Medium
Midjourney	Paid	No	No	Difficult	High
Stable Diffusion	Paid/Free	No	No	Difficult	Low
Adobe Firefly	Free	Yes	Yes	Easy	High
BlueWillow	Free	No	No	Difficult	High
Microsoft Bing	Free	No	No	Easy	Medium

⚠️

Please note that we have not examined GDPR compliance or whether the tools below can be used in education. Care must therefore be taken before letting students work with the tools.

Be very careful about using Discord as part of teaching. Read "The French SA fines DISCORD EUR 800,000".

Dall-E 2 (paid)

The simplest to get started with is OpenAI's Dall-E 2. If you have already created a ChatGPT account with OpenAI, you can log in with it; otherwise, start by creating an account.

Once done, you can access Dall-E 2 at https://labs.openai.com/ and start exploring text-to-image. You get 15 free credits every month (one credit is one request to Dall-E 2), after which you have to pay for more credits ($0.020 per image at 1024x1024, which is the maximum resolution).

Dall-E 2 is trained on a mix of publicly available images and images for which OpenAI has purchased a license. The model has some limitations that ensure it cannot create images that violate OpenAI's guidelines, such as violent, sexual, illegal, or political images. The guidelines can be viewed here. You risk being banned from using Dall-E 2 if you try to violate them. According to OpenAI, you own the images you create and may use them commercially.

💡

The main features of Dall-E 2
- Makes four pictures at a time
- Input can be very detailed
- Fast and accurate
- No copyright on the images
- Editing and customization options (Outpainting)

Midjourney (paid)

Midjourney is a bit more complicated to get started with since it runs as a Discord bot. Once you have mastered Discord and have installed Midjourney, it is very easy to use. There are a lot of instructions for installation on the web.

https://support.discord.com/hc/en-us/articles/360045138571-Beginner-s-Guide-to-Discord

Midjourney is a small independent research company from San Francisco that works with an artificial intelligence that can create images from text. You can try Midjourney for free (free trials are currently paused), and then have to buy one of their three plans for $10, $30, or $60 respectively. The more expensive the plan, the faster the images are generated and the more simultaneous runs you can set up. You buy an amount of GPU time per month.

Midjourney is trained on images downloaded from the Internet, uncritically and without securing rights from the owners of the images. Like Dall-E 2, Midjourney is restricted by guidelines that exclude the creation of images that are disrespectful, illegal, aggressive, or sexual. Midjourney also excludes users who attempt to create images that violate these guidelines. You can use Midjourney images commercially if you have a paid account. According to Midjourney, you own the images you create.

💡

Key Features of Midjourney
- Very high-quality photos
- Advanced algorithm
- No copyright on the images
- No editing or customization options

Stable Diffusion (paid/free)

Stable Diffusion is made by the company Stability AI, together with researchers from Ludwig Maximilian University in Munich. Stable Diffusion is released as open source, so you can theoretically run it on your own computer (it requires a newer Nvidia GPU with at least 4 GB of VRAM). You can also access Stable Diffusion via Dream Studio, where you get 25 free credits (equivalent to approximately 125 images). Additional credits can be purchased for $10 per approximately 5000 images.
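As a rough worked example, the prices quoted in this article can be compared per image. The numbers below are only the approximate figures mentioned in the text (including the Dall-E 2 price of $0.020 per image) and may have changed since; the variable names are chosen for the example.

```python
# Figures as quoted in this article; all of them are approximate.
DALLE2_PRICE_PER_IMAGE = 0.020   # USD per 1024x1024 image
FREE_CREDITS = 25                # Dream Studio free credits
FREE_IMAGES = 125                # images those credits are said to cover
PACK_PRICE = 10.0                # USD for additional credits
PACK_IMAGES = 5000               # images that purchase is said to cover

images_per_credit = FREE_IMAGES / FREE_CREDITS
dream_studio_price_per_image = PACK_PRICE / PACK_IMAGES
price_ratio = DALLE2_PRICE_PER_IMAGE / dream_studio_price_per_image

print(f"Images per Dream Studio credit: about {images_per_credit:.0f}")
print(f"Dream Studio price per image: about ${dream_studio_price_per_image:.3f}")
print(f"Dall-E 2 costs roughly {price_ratio:.0f} times as much per image")
```

On these figures a Dream Studio image costs about a fifth of a cent, roughly a tenth of the quoted Dall-E 2 price, although the comparison ignores differences in resolution and settings.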

Stable Diffusion is also trained on several billion images downloaded from the Internet. Stability AI believes training on copyrighted images falls under the "fair use" doctrine. Like the other models, Stable Diffusion has a set of guidelines and limitations. However, it is possible to bypass them if you install the model on your own computer. Stability AI writes, albeit somewhat cautiously, that you own the images and may use them commercially.

💡

The main features of Stable Diffusion
- High-quality photos
- Can be run on your computer or Google Colab
- No copyright on the images
- Hard to get started with

Line drawing turned into a photo in Stable Diffusion (Bùi Xuân Phái)

Adobe Firefly (beta)

Adobe has very recently (March 21, 2023) launched its generative image tool, which it calls Firefly. Initially, it is only in a beta version, which you can sign up to try. Viden.ai has gained access to Firefly and can describe it here.

Adobe Firefly is an image generation tool that runs through a website but will later be integrated into Adobe's Creative Cloud applications, such as Photoshop and Illustrator. Firefly will soon come in a version that can also make vector graphics. The advantage of Firefly is that Adobe has trained the model on professional stock images that they already have the rights to. In addition to fully securing the rights, this should produce better output images. Adobe is also working to allow graphic designers to train the model on their own work so that it can generate images in their style.

💡

Key Features of Adobe Firefly
- High-quality photos
- Good editing and customization options
- More creative tools for working with the images
- Images may not be used commercially while Firefly is in beta
- The images are labeled so you can see that Adobe Firefly made them

BlueWillow (free)

BlueWillow is also on our list of tools for making images. It works the same way as Midjourney, but the big difference is that it is free. In the free version, you can make all the images you want, edit them afterwards, and use them commercially. However, it is important to note that BlueWillow may use your prompts and the images you generate for its own purposes.

It should become possible to pay for the service, and then you own all rights to images you mark as private or make in closed spaces on Discord. These images can still be edited and used commercially. We have not found any price information, but we expect it to come.

Like Midjourney, BlueWillow uses Discord, and you can use the same commands. To get started with the tool, we recommend reading their documentation.

The interesting thing about BlueWillow is that it draws on several other tools depending on the prompt you type. Therefore, you do not know if it is Stable Diffusion or another AI that generates the images.

BlueWillow is not up to par with the other tools in our tests, but it is a good alternative to the paid editions.

💡

Key Features of BlueWillow
- Good quality photos
- Same features as Midjourney
- Free
- No copyright on the images
- No editing or customization options

Microsoft Bing

Microsoft announced on Tuesday, March 21, 2023, that Bing can generate images from a text prompt. Their tool, Bing Image Creator, is already available to anyone with preview access to Bing Chat with built-in artificial intelligence.

Bing Image Creator uses an advanced version of OpenAI's Dall-E 2 image generation algorithm. In the preview version, which is available now, you can only prompt in English, and you get 25 Boost images (images that are generated quickly), after which generation slows down.

Microsoft has implemented its "responsible AI principles" to ensure that Image Creator cannot be used for illegal or harmful content. You can see the guidelines here.
You can generate images directly in Bing Chat by setting the conversation style to creative mode.

You can work further with the generated images, but handling text in images is not yet seamless. Bing faces the same challenges with text as other image-generating AIs.

💡

The key features of Bing Chat include:
- Free to use
- No copyright on the images generated
- Limited editing or customization options
- Easy to get started with

Sources
