AI - Machine Learning Blog

Introducing Multimodal Embed 3: Powering Enterprise Search Across Images and Text

Sharmichock
Microsoft
Oct 23, 2024

We are excited to announce that Embed 3, Cohere's industry-leading AI search model, is now available in the Azure AI Model Catalog—and it's multimodal! With the ability to generate embeddings from both text and images, Embed 3 unlocks significant value for enterprises by allowing them to search and analyze their vast amounts of data, no matter the format. This upgrade positions Embed 3 as the most powerful and capable multimodal embedding model on the market, transforming how businesses search through complex assets like reports, product catalogs, and design files. 


Transform Your Enterprise Search 

In the world of enterprise AI, embedding models serve as the engine behind intelligent search applications. These models help employees and customers find specific information within vast libraries of data, enabling faster insights and more efficient decision-making. 

 

How Embed 3 Works 

Embed 3 translates input data—whether text or images—into long strings of numbers (embeddings) that represent the meaning of the data. These numerical representations are compared within a high-dimensional vector space to determine similarities and differences. Importantly, Embed 3 integrates both text and image embeddings into the same space, creating a seamless search experience. This advanced capability makes Embed 3 a cornerstone of enterprise search systems, while also playing a crucial role in retrieval-augmented generation (RAG) systems, where it ensures that generative models like Command R have the most relevant context to produce accurate and informed responses. 
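To make this concrete, here is a minimal sketch of the idea using mock NumPy vectors in place of real Embed 3 output: a text query embedding is ranked against a mixed set of text and image document embeddings, which only works because all of the embeddings share one vector space.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Mock embeddings for illustration only. In practice, each vector would come
# from the Embed 3 API (one embedding per text passage or image).
rng = np.random.default_rng(0)
query_embedding = rng.normal(size=1024)          # embedding of a text query
document_embeddings = {                          # embeddings of mixed-modality documents
    "q3_revenue_chart.png": rng.normal(size=1024),
    "annual_report.txt": rng.normal(size=1024),
    "product_photo.jpg": rng.normal(size=1024),
}

# Rank documents (text and images alike) by similarity to the query.
ranked = sorted(
    document_embeddings.items(),
    key=lambda item: cosine_similarity(query_embedding, item[1]),
    reverse=True,
)
for name, emb in ranked:
    print(name, round(cosine_similarity(query_embedding, emb), 3))
```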

 

Real-World Use Cases for Multimodal Search 

Every business, regardless of size or industry, can benefit from multimodal AI search. Embed 3 enables enterprises to search not just through text but also images, unlocking new possibilities for insight retrieval. Here are some key use cases: 

 

  1. Graphs & Charts 

Visual data is essential for understanding complex information. With Embed 3, users can now search for specific graphs and charts based on a text query, empowering faster, more informed decision-making. This feature is particularly valuable for teams that rely on data-driven insights.

 

  2. E-commerce Product Catalogs 

Traditional search methods often restrict users to text-based queries. With Embed 3, retailers can enhance their product search experiences by enabling customers to find products that match their visual preferences. This transforms the shopping experience, increasing engagement and conversion rates. 

 

  3. Design Files & Templates 

Designers typically manage large libraries of assets, making it challenging to find specific files. Embed 3 simplifies this process by allowing designers to search for UI mockups, visual templates, and presentation slides using descriptive text. This accelerates the creative process and streamlines workflows. 

 

Industry-Leading Accuracy and Performance 

According to Cohere, Embed 3 sets the standard for multimodal embedding models, offering state-of-the-art accuracy across a variety of retrieval tasks. Whether it’s text-to-text or text-to-image search, Embed 3 consistently outperforms other models on leading benchmarks, including BEIR for text retrieval and Flickr/COCO for image retrieval tasks. 

One of the key innovations of Embed 3 is the unified latent space for both text and image encoders. This simplifies the search process by allowing users to include text and image data in a single database without the need to re-index existing text corpora. Furthermore, the model is designed to compress embeddings to minimize database storage costs, ensuring efficiency at scale. It’s also fully multilingual, supporting over 100 languages and maintaining strong performance on noisy, real-world data. 
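To illustrate the kind of storage savings that compressed embeddings enable, the sketch below quantizes a float32 vector to int8. This is a generic illustration of the idea, not Cohere's specific compression scheme; the numbers and the quantization function are purely illustrative.

```python
import numpy as np

def quantize_int8(embedding: np.ndarray) -> tuple[np.ndarray, float]:
    """Quantize a float32 embedding to int8, returning the scale factor
    needed to approximately reconstruct it later. Shows how compressed
    embeddings cut vector-database storage roughly 4x versus float32."""
    scale = np.max(np.abs(embedding)) / 127.0
    return np.round(embedding / scale).astype(np.int8), scale

embedding = np.random.default_rng(1).normal(size=1024).astype(np.float32)
compressed, scale = quantize_int8(embedding)

print("float32 bytes:", embedding.nbytes)    # 4096
print("int8 bytes:   ", compressed.nbytes)   # 1024

# Approximate reconstruction for similarity scoring.
restored = compressed.astype(np.float32) * scale
```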

 

Key Benefits: 

    • Mixed Modality Search: Uniquely proficient at searching across text and images in a unified space. 
    • High Accuracy: State-of-the-art results on industry-standard benchmarks. 
    • Multilingual Support: Compatible with over 100 languages, making it ideal for global businesses. 

How to Use Embed 3 on Azure 

Here’s how you can effectively utilize the newly introduced Cohere Embed 3 models in the Azure AI Model Catalog: 

 

Prerequisites: 

  1. If you don’t have an Azure subscription, get one here: https://azure.microsoft.com/en-us/pricing/purchase-options/pay-as-you-go 
  2. Familiarize yourself with the Azure AI Model Catalog. 
  3. Create an Azure AI Studio hub and project. Make sure you pick East US, West US 3, South Central US, West US, North Central US, East US 2, or Sweden Central as the Azure region for the hub. 

 

Create a deployment to obtain the inference API and key: 

  1. Open the model card in the model catalog on Azure AI Studio. 
  2. Click on Deploy and select the Pay-as-you-go option. 
  3. Subscribe to the Marketplace offer and deploy. You can also review the API pricing at this step. 
  4. Within a minute or so, you should land on the deployment page, which shows the API endpoint and key. You can try out your prompts in the playground. 

The prerequisites and deployment steps are explained in the product documentation. You can use the API and key with various clients. Check out the samples to get started. 
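As a starting point, here is a minimal sketch of calling a serverless deployment from Python with the requests library. The endpoint URL, route, and payload fields below are illustrative placeholders modeled on the Cohere-style embed API; confirm the exact request schema against the product documentation and samples for your deployment.

```python
import requests

# Replace with the endpoint URL and key shown on your deployment page.
# The route and payload shape below are assumptions based on the Cohere-style
# embed API; check the linked samples for the exact schema of your deployment.
ENDPOINT = "https://<your-deployment>.<region>.models.ai.azure.com/v1/embed"
API_KEY = "<your-api-key>"

payload = {
    "texts": ["quarterly revenue chart for the retail segment"],
    "input_type": "search_query",  # use "search_document" when indexing content
}

response = requests.post(
    ENDPOINT,
    headers={"Authorization": f"Bearer {API_KEY}", "Content-Type": "application/json"},
    json=payload,
    timeout=30,
)
response.raise_for_status()

# Response field names may differ slightly; adjust to match your deployment.
embeddings = response.json()["embeddings"]
print(f"Received {len(embeddings)} embedding(s) of dimension {len(embeddings[0])}")
```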

 

Conclusion 

Embed 3 with enhanced image search capabilities is available today in the Azure AI Model Catalog and Azure AI Studio. You can begin integrating this cutting-edge multimodal model into your enterprise search applications immediately. 

 

Our team is excited to support your journey into multimodal AI search. If you’d like to learn more about Embed 3, we encourage you to sign up for the Cohere + Microsoft webinar on November 12 for a deep dive into its capabilities and how to leverage it for your business. Developers can also access detailed technical information through our API documentation. 

Updated Oct 22, 2024
Version 1.0