Azure OpenAI endpoints are now available in LlamaParse. This new integration allows you to extract unstructured data using Azure OpenAI’s GPT-4o family of models for document transformations.
Here’s what you get:
- Direct connectivity to Azure OpenAI models such as GPT-4o and GPT-4o mini
- Multimodal document parsing in LlamaParse, through Azure OpenAI’s multimodal support
- LLM-optimized outputs for enhanced retrieval and semantic search
- Seamless ingestion into Azure AI Search’s vector store via LlamaIndex
- Enterprise-grade security and compliance for sensitive workloads
Azure AI Search is currently one of the most downloaded vector stores among LlamaIndex users. The LlamaIndex and Azure AI Search teams maintain an active, ongoing collaboration to foster community growth and deliver great products across the RAG pipeline.
Earlier this year, we co-authored a blog post on advanced RAG techniques, covering how to improve retrieval quality with query transformations and other advanced retrieval methods.
Azure AI Search now offers native query rewriting and a new reranking model, both tuned for multilingual, multi-industry RAG use cases. These capabilities were built and trained on extensive datasets that mirror production RAG scenarios.
Build a RAG app with Azure AI Search and LlamaParse
Here’s how you can build a cohesive RAG workflow using LlamaCloud, Azure AI Search and Azure OpenAI.
- Parse and enrich: Use LlamaParse Premium with Azure OpenAI for advanced document extraction, and generate LLM-optimized output formats like Markdown, LaTeX, and Mermaid diagrams.
- Chunk and embed: Ingest, chunk, embed and index your parsed content in one integrated flow using Azure AI Search as a vector store and embedding models from the Azure AI model catalog.
- Search and generate: Use query rewriting and semantic reranking in Azure AI Search to boost retrieval quality. Build GenAI applications by synthesizing retrieved information with Azure AI Search and Azure OpenAI, orchestrated through LlamaIndex.
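For the chunk-and-embed step, LlamaIndex needs an embedding model and LLM pointed at your Azure OpenAI deployments. Here is a minimal setup sketch; the endpoint, key, and deployment names below are placeholders, not values from the original sample:

```python
from llama_index.core import Settings
from llama_index.embeddings.azure_openai import AzureOpenAIEmbedding
from llama_index.llms.azure_openai import AzureOpenAI

# Placeholders -- substitute your own endpoint, key, and deployment names
AZURE_OPENAI_ENDPOINT = "https://<your-resource>.openai.azure.com/"
AZURE_OPENAI_API_KEY = "<your-api-key>"

# Embedding model producing 1536-dimensional vectors
Settings.embed_model = AzureOpenAIEmbedding(
    model="text-embedding-ada-002",
    deployment_name="<your-embedding-deployment>",
    api_key=AZURE_OPENAI_API_KEY,
    azure_endpoint=AZURE_OPENAI_ENDPOINT,
    api_version="2024-10-01-preview",
)

# Chat model used for answer synthesis
Settings.llm = AzureOpenAI(
    model="gpt-4o",
    deployment_name="<your-gpt-4o-deployment>",
    api_key=AZURE_OPENAI_API_KEY,
    azure_endpoint=AZURE_OPENAI_ENDPOINT,
    api_version="2024-10-01-preview",
)
```

Setting these on `Settings` makes them the defaults for subsequent indexing and querying calls, so you don't have to pass them explicitly at each step.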
Example code
Use LlamaParse with Azure OpenAI GPT-4o family endpoints.
```python
parser = LlamaParse(
    result_type="markdown",
    use_vendor_multimodal_model=True,
    azure_openai_endpoint=f"{AZURE_OPENAI_ENDPOINT}openai/deployments/{AZURE_OPENAI_CHAT_COMPLETION_DEPLOYED_MODEL_NAME}/chat/completions?api-version=2024-10-01-preview",
    azure_openai_api_version="2024-10-01-preview",
    azure_openai_key=AZURE_OPENAI_API_KEY,
)
```
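With the parser configured, extracting documents is a single call. A quick usage sketch, where the input file path is a hypothetical example:

```python
# Hypothetical input file -- replace with your own document
documents = parser.load_data("./data/annual_report.pdf")

# Each returned Document carries the LLM-optimized markdown text
print(documents[0].text[:500])
```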
Create or update an Azure AI Search vector store with parsed_text_markdown from LlamaParse and vector embeddings from Azure OpenAI.
```python
metadata_fields = {
    "page_num": ("page_num", MetadataIndexFieldType.INT64),
    "image_path": ("image_path", MetadataIndexFieldType.STRING),
    "parsed_text_markdown": ("parsed_text_markdown", MetadataIndexFieldType.STRING),
}

vector_store = AzureAISearchVectorStore(
    search_or_index_client=index_client,
    filterable_metadata_field_keys=metadata_fields,
    index_name="YOUR-INDEX-NAME",
    index_management=IndexManagement.CREATE_IF_NOT_EXISTS,
    id_field_key="id",
    chunk_field_key="parsed_text_markdown",
    embedding_field_key="embedding",
    embedding_dimensionality=1536,
    metadata_string_field_key="metadata",
    doc_id_field_key="doc_id",
    language_analyzer="en.lucene",
    vector_algorithm_type="hnsw",
)
```
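Given the vector store and the parsed documents, one way to run the chunk, embed, and index step in a single flow (assuming `documents` from the LlamaParse step above and an embedding model already configured on `Settings`):

```python
from llama_index.core import StorageContext, VectorStoreIndex

# Attach the Azure AI Search vector store to a storage context, then
# chunk, embed, and index the parsed documents in one integrated flow
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)
```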
Create the query engine that uses Azure AI Search as the retriever and Azure OpenAI GPT-4o Multimodal LLM as the generator (tip: use the new semantic ranker by setting VectorStoreQueryMode.SEMANTIC_HYBRID).
```python
query_engine = MultimodalQueryEngine(
    retriever=index.as_retriever(
        # Default is pure vector search; try HYBRID or SEMANTIC_HYBRID
        vector_store_query_mode=VectorStoreQueryMode.DEFAULT,
        similarity_top_k=3,
    ),
    multi_modal_llm=azure_openai_mm_llm,
)
```
Activate your RAG pipeline and enable multimodal question answering.
```python
# Example query focused on visual and textual content
query = "what was OpenAI's GPT-4 model major dimensions of transparency for capabilities %?"

# Execute the query
response = query_engine.query(query)

# Display the query and multimodal response
display_query_and_multimodal_response(query, response, 8, 20)
```
Get Started
We hope you find the latest integrations and sample code useful. Be sure to check out our example notebooks and best practices to build apps with advanced RAG techniques.
- Repo: Build a multimodal RAG app with Azure AI Search, Azure OpenAI and LlamaIndex
- LlamaParse docs
- Sample code: https://aka.ms/AISearch-LlamaParse