Both Google's Bard and Microsoft's Bing Chat have built-in visual search. You can upload an image and use it as input for a search or, as we have tested here, have the image described in detail. The results differed considerably in our tests, where Google Bard appears to have the edge, whereas another test (see the links at the bottom of the article) gives Microsoft Bing a slight advantage.

It has not been disclosed exactly how Microsoft and Google have implemented the technology. However, both state that they use computer vision algorithms combined with visual search. Google probably uses its Google Lens technology to find similar images on the web, and Microsoft has developed its own Visual Search technology for the same purpose. Both models must also include some form of OCR (Optical Character Recognition), since they can "read" text in images and translate it. In this respect, Google's Bard performed best.

We ran a small test in which we asked Microsoft's and Google's chatbots to describe an image in detail. We took the picture through the windscreen of a car while crossing the Storebælt Bridge, on a day with queues in the opposite direction.

Google Bard describes the picture well and correctly identifies the Storebælt Bridge. It gives us some details about the bridge and even compliments us a little on the image's composition. However, it also describes the water under the bridge, even though the water is not visible in the photo.

Google Bard image analysis

We give Microsoft Bing Chat the same image and the same task. Again, we get a credible description of the image. However, the program adds far more detail than the image actually shows (e.g., which types of cars are visible, and that the cars have license plates!).

Microsoft Bing AI image analysis

When we ask Bing which bridge it is, things go badly wrong: Bing identifies it as the Luzhijang Bridge in China. In this case, then, Bing's image recognition falls short of Google Bard's.

Microsoft Bing AI

We also uploaded a graph showing the light spectrum of the sky to both chatbots. Google Bard starts by telling us that the image does not show a graph of the average height of a person over time but rather a graph of the relative intensity of light at different wavelengths. It also reads the axis labels correctly, identifies them as Danish, and translates them into English. Quite impressive. We also get a good explanation of what the light spectrum shows and some background on daylight in general. However, it is less precise when we ask for the wavelength with the highest light intensity: it answers 450 nm, whereas the correct answer is close to 500 nm.

Google Bard - graph analysis

Microsoft Bing Chat is given the same task and starts by blurring any faces in the image (both chatbots blur people in pictures before performing an image search). It then searches for "Light spectrum from the sky", a not-quite-perfect reading of the text. Bing can explain the image but puts the maximum at about 600 nm. Its internet search on "Light spectrum from the sky" also turns up something about the Northern Lights and the starry sky in July 2023, as well as information about sound therapy and healing! As always, Bing cites the sources of the information it has found.

Microsoft Bing AI - graph analysis

In our two small tests, Google's Bard was better at image analysis and description. Many other comparisons have been made, and some of them rate Microsoft's Bing as better (for example, in tasks that require counting the number of people in a photo). In others, such as rating images, Google Bard wins. However, one thing is certain: what the two chatbots can do with images is very impressive, even if they do not solve the tasks flawlessly.

Sources

What’s ahead for Bard: More global, more visual, more integrated
We’re ending the waitlist for Bard, adding support for more regions, introducing images and connecting with partner apps.
Microsoft’s Bing Chat A.I. bot now lets you search using images
Users can now take or upload a photo to Bing Chat and ask for more information on it via desktop or the Bing app.
11 Practical Uses for Bing’s Image Recognition
My experiments with Bing Chat’s newfound ability to see images and its practical applications.
How Good Is Bing (GPT-4) Multimodality?
In this blog post, we qualitatively analyze how well Bing’s combination of text and image input ability performs at object detection tasks.
Prompting Google Bard with Images & How it Compares to Bing
In this article, we will examine how Bard’s image input performs, how it stacks up against Microsoft Bing, and how we believe it works.