
Image Analysis with Artificial Intelligence: MiniGPT-4

On March 14, 2023, OpenAI demonstrated GPT-4, which can "view," describe, and analyze images. (See this article for more information: https://viden.ai/gpt-4-flytter-graenserne-for-kunstig-intelligens-i-undervisningen/). However, we have yet to see and try this in practice, since so far only a few partners have access via the API (e.g., Be My Eyes: https://openai.com/customer-stories/be-my-eyes).

A research group from Saudi Arabia has launched the open-source MiniGPT-4, which they describe as "Enhancing Vision-language Understanding with Advanced Large Language Models". According to the group, MiniGPT-4 can, among other things, describe and analyze images, create HTML websites from hand-drawn sketches, find recipes from pictures of dishes, and explain why memes are funny, which are the same capabilities OpenAI has shown for GPT-4. We have tested MiniGPT-4 and are quite impressed with the model's capabilities.

Testing MiniGPT-4

In the first test, we used a picture we had taken while crossing the Storebælt Bridge. We asked MiniGPT-4 to describe it, and it did impressively well! It describes the image as taken through the windscreen of a car driving on a motorway, over a suspension bridge made of concrete and steel and held up by cables between tall towers. It describes many cars going in both directions and a divider in the middle of the road. It also describes the image as a panoramic, flat landscape with hills in the distance. Here it is wrong, and it also sees trees and buildings on the horizon that are not in the picture. It gets the clouds and the weather pretty much right. All in all, that's quite an impressive result.

In the next example, we again used a picture we took ourselves. This time, we asked MiniGPT-4 to describe and analyze the image and then write a short story based on it.

MiniGPT-4 analyzes the picture and writes the following story:

Once upon a time, there was a beautiful white plate with a delicate design on it. The plate was placed on a white tablecloth with a green napkin on top. On the plate were two silver forks, one of which had a piece of food on it.

As the waiter walked by, he noticed the plate and thought to himself, "That's a shame, someone left their food on the plate and didn't finish it." He decided to take the plate and forks away and bring a new, clean plate for the next customer.

But as he was walking towards the kitchen, he stumbled and dropped the plate. The plate shattered into pieces and the food fell onto the ground. The waiter was devastated, he had never made such a mistake before.

Just then, a kind customer saw what had happened and offered to help. Together, they cleaned up the mess and the customer even offered to pay for a new plate. The waiter was grateful and thanked the customer for their kindness.

From that day on, the waiter made sure to be more careful when carrying plates and to always finish his food. He never forgot the kindness of the customer who had helped him that day.

In the last test, we took a picture of a dog reading the newspaper and asked what was funny about the picture and why.

MiniGPT-4 gives a nice explanation: it is funny that the dog wears glasses while reading the newspaper, because it is unusual for dogs to wear glasses. It does not mention that the dog reading the newspaper, and the newspaper's telling title, are also part of the joke. But the answer is not completely off.

We also tested whether MiniGPT-4 can analyze a graph, but that did not go well. The graph shows a daylight spectrum but is interpreted as the distribution of a disease in a group of people. It is particularly interesting that the model does not use the text in the graph to get an indication of its content.

We also tried to get MiniGPT-4 to create HTML code from a sketch (like OpenAI's demo), but we failed to get anything meaningful out of it in the tests we did.

In the above examples, we have chosen to use images that do not contain information in the image name or other metadata that explains something about the image. This ensures that this data has not been used for the analysis.
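
For readers who want to repeat the tests with their own photos, the following is a minimal sketch of how metadata could be removed from a JPEG file before uploading it. It is not the procedure used for the tests above. The function name strip_jpeg_metadata and the choice of which segments count as metadata are assumptions made for illustration, and the sketch handles only plain JPEG files without padding bytes between segments.

```python
import struct

# Segment markers treated as metadata in this sketch (an assumption):
# APP1 (Exif/XMP), APP13 (Photoshop/IPTC) and COM (free-text comment).
METADATA_MARKERS = {0xE1, 0xED, 0xFE}


def strip_jpeg_metadata(data: bytes) -> bytes:
    """Return a copy of a JPEG byte string without its metadata segments."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG file (missing SOI marker)")
    out = bytearray(data[:2])
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a segment marker at byte {pos}")
        marker = data[pos + 1]
        if marker == 0xDA:
            # Start of scan: everything from here on is compressed image data.
            out += data[pos:]
            return bytes(out)
        # The 2-byte big-endian length counts itself but not the marker.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        if marker not in METADATA_MARKERS:
            out += data[pos:pos + 2 + length]
        pos += 2 + length
    raise ValueError("no image data found (missing SOS marker)")
```

To use it, read the photo in binary mode, pass the bytes to strip_jpeg_metadata, and write the result to a new file with a neutral name such as image.jpg, so that neither the file name nor the embedded metadata hints at the content.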

But it won't be long before the technology is good enough to decode graphs and create HTML code. And these methods, like so many others, will become available to everyone.
