Generative AI makes images: How do Dall-E 2 and Midjourney work?

Generative artificial intelligence (AI) is a type of AI that can create new, original content such as text, images, video, speech, and music from a prompt. The most well-known generative AI applications are ChatGPT, Bing Chat, and Google Bard (text-generating), and Dall-E 2, Stable Diffusion, and Midjourney (image-generating).
This article briefly explains how artificial intelligence can generate images from a descriptive text. We also compare five of the most used systems, plus Microsoft Bing, which can generate images in its latest edition.

In a series of upcoming articles, we'll dive into different ways to use image-generating artificial intelligence.

How does image-generating AI work?

Image-generating AI is based on a technology called diffusion. In short, these models work by destroying images and rebuilding them step by step, learning from the process.

The first step in the process (forward) is to gradually destroy an image by adding larger and larger amounts of Gaussian noise until the image is completely gone.
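The forward step can be illustrated with a few lines of Python. This is a minimal sketch, not the code used by any of the systems in this article: the function name `forward_diffusion`, the linear noise schedule, and the values `beta_start` and `beta_end` are assumptions chosen for illustration, and the "image" is simply a list of pixel values.

```python
import math
import random


def forward_diffusion(pixels, num_steps=1000, beta_start=1e-4, beta_end=0.02, seed=0):
    """Gradually add Gaussian noise to a list of pixel values.

    Returns a list of snapshots; index 0 is the clean image and the
    last entry is (almost) pure noise. The schedule is an assumption.
    """
    rng = random.Random(seed)
    x = list(pixels)
    snapshots = [list(x)]
    for t in range(num_steps):
        # The noise level grows linearly from beta_start to beta_end.
        beta = beta_start + (beta_end - beta_start) * t / max(num_steps - 1, 1)
        # Each step depends only on the previous one (a Markov chain):
        # shrink the signal slightly and mix in fresh Gaussian noise.
        x = [math.sqrt(1.0 - beta) * v + math.sqrt(beta) * rng.gauss(0.0, 1.0) for v in x]
        snapshots.append(x)
    return snapshots


# Usage: a flat gray "image" of 256 pixels is destroyed in 1000 steps.
snapshots = forward_diffusion([0.5] * 256)
```

Early snapshots still resemble the original, while the last one is statistically indistinguishable from random noise.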

Source: https://developer.nvidia.com/blog/improving-diffusion-models-as-an-alternative-to-gans-part-1/

Then the process is reversed, and the noise is gradually removed over many steps to create an image again. This happens in a neural network that takes the noisy image and a text describing the image as input and, in each step, outputs an image with slightly less noise. By learning to convert the noisy image back to its original form, the model learns how to match a text prompt to an image.

Once the model has been trained on several million images with associated descriptions, it can generate new images: it starts with random noise and a text prompt describing what you want an image of, and removes the noise step by step until it ends with an image matching the prompt.
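The reverse loop can be sketched in the same way. Everything here is a simplified assumption: `make_toy_noise_predictor` is a hypothetical stand-in for the trained, text-conditioned neural network (it "cheats" by already knowing the target image), and the update rule is a deterministic DDIM-style step, which is one common choice and not necessarily what Dall-E 2 or Midjourney use.

```python
import math
import random


def alpha_bars(num_steps, beta_start=1e-4, beta_end=0.02):
    """Fraction of the original signal variance left after each forward step."""
    bars, product = [1.0], 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / max(num_steps - 1, 1)
        product *= 1.0 - beta
        bars.append(product)
    return bars  # bars[0] == 1.0 means "no noise added yet"


def make_toy_noise_predictor(target):
    """Hypothetical stand-in for the trained network.

    A real model predicts the noise from the noisy image and the text
    prompt; this toy version knows the target image in advance.
    """
    def predict_noise(x_t, abar_t):
        return [(x - math.sqrt(abar_t) * x0) / math.sqrt(1.0 - abar_t)
                for x, x0 in zip(x_t, target)]
    return predict_noise


def generate(predict_noise, num_pixels, num_steps=1000, seed=0):
    """Start from pure noise and remove it step by step."""
    rng = random.Random(seed)
    bars = alpha_bars(num_steps)
    x = [rng.gauss(0.0, 1.0) for _ in range(num_pixels)]
    for t in range(num_steps, 0, -1):
        eps = predict_noise(x, bars[t])
        # Estimate the clean image implied by the predicted noise.
        x0_hat = [(xi - math.sqrt(1.0 - bars[t]) * e) / math.sqrt(bars[t])
                  for xi, e in zip(x, eps)]
        # Move to the slightly less noisy level t - 1.
        x = [math.sqrt(bars[t - 1]) * x0 + math.sqrt(1.0 - bars[t - 1]) * e
             for x0, e in zip(x0_hat, eps)]
    return x


# Usage: the toy "prompt" is just a fixed 4-pixel target image.
target = [0.5, -0.25, 1.0, 0.0]
image = generate(make_toy_noise_predictor(target), num_pixels=len(target))
```

Because the toy predictor is perfect, the loop recovers the target image. A real network only approximates the noise, which is why generated images vary from run to run.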

Finally, the image is upscaled to the desired size using an algorithm. This algorithm works like the forward-reverse process by predicting a high-resolution image that could be downscaled to look like the original, low-resolution image. After training on many images, it can make high-resolution images from low-resolution images.
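The idea that a good high-resolution image must downscale back to the low-resolution original can be expressed as a simple check. The sketch below is an illustration under our own assumptions: images are nested lists of gray values, `upscale_nearest` is a naive baseline rather than a learned model, and all function names are hypothetical.

```python
def downscale(image, factor):
    """Average each factor x factor block into a single pixel."""
    height, width = len(image), len(image[0])
    return [
        [
            sum(image[r * factor + i][c * factor + j]
                for i in range(factor) for j in range(factor)) / factor ** 2
            for c in range(width // factor)
        ]
        for r in range(height // factor)
    ]


def upscale_nearest(image, factor):
    """Naive baseline: repeat every pixel factor x factor times."""
    return [
        [image[r // factor][c // factor] for c in range(len(image[0]) * factor)]
        for r in range(len(image) * factor)
    ]


def consistency_error(high_res, low_res, factor):
    """Mean squared difference between low_res and the downscaled high_res."""
    down = downscale(high_res, factor)
    diffs = [(a - b) ** 2
             for row_a, row_b in zip(down, low_res)
             for a, b in zip(row_a, row_b)]
    return sum(diffs) / len(diffs)


# Usage: a 2x2 image upscaled to 4x4 still downscales to the original.
low_res = [[0.0, 1.0], [0.5, 0.25]]
high_res = upscale_nearest(low_res, 2)
error = consistency_error(high_res, low_res, 2)
```

A learned upscaler aims to satisfy the same check while also filling in plausible detail, which the naive baseline cannot do.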

Source: https://artificialintelligence.oodles.io/blogs/upscaling-images-with-machine-learning/

If you want to know more about the process behind image generation and the math behind it, you can read about it in the link collection at the bottom of the page. The theory is very complex and involves advanced mathematics, which includes Markov chains.

We take a closer look at some systems:

So what can these image-generating AI systems do, and how do they differ? We have taken a closer look at the most used systems: Midjourney, Dall-E 2, Stable Diffusion, Adobe Firefly, and BlueWillow.

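Tool             | Price     | Copyright | Editing | Difficulty | Quality
-----------------|-----------|-----------|---------|------------|--------
Dall-E 2         | Paid      | No        | Yes     | Easy       | Medium
Midjourney       | Paid      | No        | No      | Difficult  | High
Stable Diffusion | Paid/Free | No        | No      | Difficult  | Low
Adobe Firefly    | Free      | Yes       | Yes     | Easy       | High
BlueWillow       | Free      | No        | No      | Difficult  | High
Microsoft Bing   | Free      | No        | No      | Easy       | Medium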
⚠️
Please note that we have not examined the tools below with regard to GDPR or whether they can be used in education. Care must therefore be taken before letting students work with the tools.

Be especially careful about using Discord as part of teaching. See "The French SA fines DISCORD EUR 800,000".

Dall-E 2 (paid)

The easiest tool to get started with is OpenAI's Dall-E 2. If you have already created a ChatGPT account with OpenAI, you can log in with it; otherwise, start by creating an account.

Once done, you can access Dall-E 2 at https://labs.openai.com/ and start exploring text-to-image generation. You get 15 free credits every month (one credit is one request to Dall-E 2), after which you have to pay for more credits ($0.020 per image at 1024x1024, which is the maximum resolution).

Dall-E 2 is trained on a mix of publicly available images and images for which OpenAI has purchased a license. The model has some limitations that ensure it cannot create images that violate OpenAI's guidelines, such as violent, sexual, illegal, or political images. These can be viewed here. You risk being banned from using Dall-E 2 if you try to violate them. According to OpenAI, you own the images you create and may use them commercially.

💡
The main features of Dall-E 2
- Makes four pictures at a time
- Input can be very detailed
- Fast and accurate
- No copyright on the images
- Editing and customization options (Outpainting)

Midjourney (paid)

Midjourney is a bit more complicated to get started with since it runs as a Discord bot. Once you have mastered Discord and have installed Midjourney, it is very easy to use. There are a lot of instructions for installation on the web.

Midjourney Quick Start Guide
Learn how to use the text-to-image service, Midjourney on Discord or the web to create custom images from simple text prompts.

https://support.discord.com/hc/en-us/articles/360045138571-Beginner-s-Guide-to-Discord

Midjourney is a small independent research company from San Francisco that works with an artificial intelligence that can create images from text. Midjourney normally offers a free trial (free trials are currently paused), after which you have to buy one of their three plans for $10, $30, or $60. The more expensive the plan, the faster the images are generated and the more simultaneous jobs you can run. You buy an amount of GPU time per month.

Midjourney is trained on images downloaded from the Internet, uncritically and without securing rights from the owners of the images. Similar to Dall-E 2, Midjourney is restricted by guidelines that exclude the creation of images that are disrespectful, illegal, aggressive, or sexual. Midjourney also bans users who attempt to create images that violate these guidelines. You can use Midjourney images commercially if you have a paid account. According to Midjourney, you own the images you create.

💡
Key Features of Midjourney
- Very high-quality photos
- Advanced algorithm
- No copyright on the images
- No editing or customization options

Stable Diffusion (paid/free)

Stable Diffusion is made by the company Stability AI, together with researchers from Ludwig Maximilian University in Munich. Stable Diffusion is released as open source, so you can in principle run it on your own computer (it requires a newer Nvidia GPU with at least 4GB of VRAM). You can also access Stable Diffusion via DreamStudio, where you get 25 free credits (equivalent to approximately 125 images). Additional credits can be purchased for $10 per approximately 5000 images.
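To put the prices into perspective, the per-image cost can be worked out from the figures quoted in this article. This is only back-of-the-envelope arithmetic under the assumption that the quoted prices apply as stated; the helper name `cost_per_image` is our own, and actual prices may have changed.

```python
def cost_per_image(price_usd, num_images):
    """Average price of one image when num_images cost price_usd in total."""
    return price_usd / num_images


# Figures quoted in this article (assumed to apply as stated).
dalle2_per_image = 0.020                                   # $ per 1024x1024 image
stable_diffusion_per_image = cost_per_image(10.0, 5000)    # $10 per approx. 5000 images
images_per_free_credit = 125 / 25                          # approx. 125 images for 25 credits

# How many times more expensive one Dall-E 2 image is at these prices.
price_ratio = dalle2_per_image / stable_diffusion_per_image

print(f"Stable Diffusion: ${stable_diffusion_per_image:.3f} per image")
print(f"Dall-E 2:         ${dalle2_per_image:.3f} per image")
print(f"Dall-E 2 costs about {price_ratio:.0f} times as much per image")
```

At these list prices, a paid Stable Diffusion image costs roughly a tenth of a paid Dall-E 2 image.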

Stable Diffusion is also trained on several billion images downloaded from the Internet. Stability AI believes training on copyrighted images falls under the "fair use" doctrine. Like the other models, Stable Diffusion has a set of guidelines and limitations. However, it is possible to bypass them if you install the model on your own computer. Stability AI writes, albeit somewhat cautiously, that you own the images and may use them commercially.

GitHub - TheLastBen/fast-stable-diffusion: fast-stable-diffusion + DreamBooth
fast-stable-diffusion + DreamBooth. Contribute to TheLastBen/fast-stable-diffusion development by creating an account on GitHub.

💡
The main features of Stable Diffusion
- High-quality photos
- Can be run on your computer or Google Colab
- No copyright on the images
- Hard to get started with

Adobe Firefly (beta)

Adobe has very recently (March 21, 2023) launched its generative image tool, which it calls Firefly. Initially, it is only available as a beta version, which you can sign up to try. Viden.ai has gained access to Firefly and describes it here.

Adobe Firefly is an image generation tool that currently runs through a website but will later be integrated into Adobe's Creative Cloud applications such as Photoshop and Illustrator. Firefly will soon come in a version that can also make vector graphics. The advantage of Firefly is that Adobe has trained the model on professional stock images that it already has the rights to. In addition to fully securing the rights, this should provide better images as output. Adobe is also working to allow graphic designers to train the model on their own work so that it can generate images in their style.

Adobe Unveils Firefly, a Family of new Creative Generative AI
First Firefly model will empower customers of all experience levels to generate high quality images and stunning text effects Adobe launches beta of first Firefly model focused on commercial use Firefly will be integrated directly into Creative Cloud, Document Cloud, Experience Cloud and Adobe Expre…
💡
Key Features of Adobe Firefly
- High-quality photos
- Good editing and customization options
- More creative tools for working with the images
- Images may not be used commercially while Firefly is in beta
- The images are labeled so you can see that Adobe Firefly made them

BlueWillow (free)

We have included BlueWillow in the list of tools for making images. It works the same way as Midjourney, but the big difference is that it is free. In the free version, you can make all the images you want, process them afterwards, and use them commercially. However, it is important to note that BlueWillow may use your prompts and the images you generate for its own purposes.

It should become possible to pay for the service, in which case you own all rights to images you mark as private or create in closed spaces on Discord, and can still process them and use them commercially. We have not found any price information, but we expect it to follow.

Like Midjourney, BlueWillow uses Discord, and you can use the same commands. To get started with the tool, we recommend reading their documentation.

Getting Started - BlueWillow Documentation
Learn how to use the BlueWillow Bot on discord with simple text prompts.

The interesting thing about BlueWillow is that it draws on several other tools depending on the prompt you type. Therefore, you do not know whether Stable Diffusion or another AI generates the images.

BlueWillow is not up to par with the other tools in our tests, but it is a good alternative to the paid editions.

BlueWillow | Free AI Art Generator
BlueWillow is a free AI art generator that creates stunning AI-generated images. Beautiful, unique and inspiring AI creations are at your fingertips.
💡
Key Features of BlueWillow
- Good quality photos
- Same features as Midjourney
- Free
- No copyright on the images
- No editing or customization options

Microsoft Bing

Microsoft announced on Tuesday, March 21, 2023, that Bing can generate images from a text prompt. Their tool, Bing Image Creator, is already available to anyone with preview access to Bing Chat with built-in artificial intelligence.

Bing Image Creator uses an advanced version of OpenAI's Dall-E 2 image generation algorithm, and the preview version available now only accepts prompts in English. In the preview version, you get 25 Boost images (images that are generated quickly), after which generation slows down.

Microsoft has implemented its "responsible AI principles" to ensure that Image Creator cannot be used for illegal or harmful content. You can see the guidelines here.
You can generate images directly in Bing Chat by setting the conversation style to creative mode.

You can continue working with the generated images, but text inside images is not handled well yet. Bing faces the same challenges with text as other image-generating AIs.

💡
The key features of Bing Chat include:
- Free to use
- No copyright on the images generated
- Limited editing or customization options
- Easy to get started with

Sources

Create images with your words - Bing Image Creator comes to the new Bing - The Official Microsoft Blog
Last month we introduced the new AI-powered Bing and Microsoft Edge, your copilot for the web – delivering better search, complete answers, a new chat experience and the ability to create content. Already, we have seen that chat is reinventing how people search with more than 100 million chats to da…
Microsoft’s Bing chatbot now lets you create images via OpenAI’s DALL-E
Bing’s chatbot gets even more AI-powered smarts.
Microsoft brings OpenAI’s DALL-E image creator to the new Bing
Microsoft has now integrated OpenAI’s DALL-E generative AI image creator into its new Bing Chat feature.

Danskere bliver snydt af falske billeder - kan du selv se forskellen? (Danes are being fooled by fake images - can you spot the difference yourself?)
Diffusion Model Clearly Explained!
How does AI artwork work? Understanding the tech behind the rise of AI-generated art.
Introduction to Diffusion Models for Machine Learning
The meteoric rise of Diffusion Models is one of the biggest developments in Machine Learning in the past several years. Learn everything you need to know about Diffusion Models in this easy-to-follow guide.
How do DALL-E, Midjourney, Stable Diffusion, and other forms of generative AI work?
Generative AI assembles meaningful pictures from meaningless noise. Is this a form of true intelligence or just a mimic?
How diffusion models work: the math from scratch | AI Summer
A deep dive into the mathematics and the intuition of diffusion models. Learn how the diffusion process is formulated, how we can guide the diffusion, the main principle behind stable diffusion, and their connections to score-based models.
Stable Diffusion Clearly Explained!
How does Stable Diffusion paint an AI artwork? Understanding the tech behind the rise of AI-generated art.
From DALL·E to Stable Diffusion: How Do Text-to-Image Generation Models Work?
This article was originally published at Tryolabs’ website. It is reprinted here with the permission of Tryolabs. The machine learning community lost its
Text-to-Image Models: Image Generation Explained
Artificial Intelligence (AI) research has grown exponentially over the past 20 years, with developments in areas like deep learning…
What is Image Upscale, and Why is it Important?
An image upscale app like upscale.media can help boost the image resolution up to 4X to take care of your personal & business needs
Upscaling Images With Machine Learning for Optimum Resolution
Not just video streaming but upscaling images with machine learning can greatly enhance visual content for healthcare, eCommerce, and surveillance services.
Legal & ethical aspects of using DALL-E, Midjourney, & Stable Diffusion
Today, I want to compare DALL-E, Midjourney, and Stable Diffusion — with regard to their legal, ethical, and financial standpoints.
More Examples of AI-Generated DALL-E 2 Images
Some examples of images created using DALL-E 2, an AI program that can create almost any image from a text prompt.