
Azure AI Foundry Blog

Azure AI Search: Microsoft OneLake integration plus more features now generally available

gia_mondragon
Microsoft
Sep 30, 2025

From ingestion to retrieval, Azure AI Search delivers enterprise-grade GA features: new connectors, enrichment skills, vector and semantic capabilities, and wizard improvements—enabling smarter agentic systems and scalable RAG experiences.

Introduction

Azure AI Search has officially launched its latest generally available (GA) release, REST API version 2025-09-01, delivering enterprise-grade enhancements across the stack—from expanded data source support and enriched indexing pipelines to advanced vector search, semantic scoring, and an improved developer experience. Whether you're building agentic systems, RAG applications, or classic keyword search, this release brings new capabilities to help you scale securely, search smarter, and deliver faster. Below, we break down the key features and how they support evolving customer needs.

Indexing & Ingestion Pipeline

Microsoft OneLake Indexer (Files)

Across industries—from manufacturing to healthcare—organizations rely on lakehouse architectures to manage large volumes of data. With native support for Microsoft OneLake, teams can now index files directly from lakehouse storage, including shortcuts to ADLS Gen2, Amazon S3, and Google Cloud Storage. The OneLake files indexer offers secure connectivity, incremental refresh, and built-in AI enrichment, with basic parsing support for diverse formats like PDF, Office, and XML files—making enterprise content searchable for a wide variety of applications. For more information, see the Azure AI Search integration with Microsoft OneLake blog post.
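As a rough sketch, a OneLake files data source definition might look like the following (the resource IDs, names, and folder path are placeholders, and the exact schema should be confirmed against the OneLake indexer documentation):

```json
{
  "name": "onelake-files-ds",
  "type": "onelake",
  "credentials": {
    "connectionString": "ResourceId=<your-workspace-GUID>"
  },
  "container": {
    "name": "<your-lakehouse-GUID>",
    "query": "optional/folder/path"
  }
}
```

An indexer pointing at this data source can then run on a schedule, with incremental refresh picking up only new or changed files.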

 

Figure 1. Microsoft OneLake indexer data flow

 

Document Intelligence Layout Skill

Many enterprise documents—like invoices, forms, and scanned PDFs—contain rich information embedded in visual structure. The layout skill extracts headers, images, and sections from these formats, making it easier to surface meaningful content during search. Whether parsing insurance forms or onboarding packets, this skill helps transform visual data into searchable content that is highly valuable in multiple scenarios, especially multimodal search. Regional availability has been expanded as part of this release to support multiple geographies across the globe. Review the layout skill documentation for details.
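As an illustration, a skillset entry for the layout skill could look roughly like this (the parameter values and output names are examples, not a complete skillset; check the layout skill documentation for the full schema):

```json
{
  "@odata.type": "#Microsoft.Skills.Util.DocumentIntelligenceLayoutSkill",
  "context": "/document",
  "outputMode": "oneToMany",
  "markdownHeaderDepth": "h3",
  "inputs": [
    { "name": "file_data", "source": "/document/file_data" }
  ],
  "outputs": [
    { "name": "markdown_document", "targetName": "markdownDocument" }
  ]
}
```

The extracted, structure-aware output can then be chunked and embedded downstream for multimodal or RAG scenarios.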

Retrieval: Vector, Keyword, Hybrid and Semantic Ranking

Binary Quantization Rescoring

Enterprises deploying vector search often face trade-offs between performance and precision. Binary quantization compresses embeddings for faster search and now supports rescoring even when original vectors are discarded.
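A vector search compression configuration with rescoring enabled might be sketched as follows (property names reflect recent API versions and should be verified against the 2025-09-01 reference; the name and oversampling value are illustrative):

```json
"vectorSearch": {
  "compressions": [
    {
      "name": "bq-compression",
      "kind": "binaryQuantization",
      "rescoringOptions": {
        "enableRescoring": true,
        "defaultOversampling": 10,
        "rescoreStorageMethod": "discardOriginals"
      }
    }
  ]
}
```

With `discardOriginals`, full-precision vectors are not retained in storage, yet candidates can still be rescored—trading a small amount of precision for a significant storage reduction.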

 

Truncate Dimensions (MRL)

Text embeddings can be large and costly to store, especially at scale. With support for Matryoshka Representation Learning (MRL), teams can truncate dimensions while preserving semantic richness. This is ideal for applications like search over internal documentation, where speed and cost-efficiency are critical without sacrificing relevance.
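For example, dimension truncation can be combined with quantization in a single compression entry (assuming an MRL-trained embedding model such as text-embedding-3-large; the name and dimension count are illustrative):

```json
{
  "name": "sq-mrl-compression",
  "kind": "scalarQuantization",
  "truncationDimension": 1024
}
```

Truncating a 3,072-dimension embedding to 1,024 dimensions cuts vector storage roughly to a third before quantization is even applied.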

 

Scoring Profiles in Semantic Reranking

Different industries prioritize different signals—recency, popularity, semantic closeness. Scoring profiles can now be applied during semantic reranking, allowing custom business logic—like freshness or popularity—to influence the final result order. This dual-pass scoring ensures that semantic relevance doesn’t override critical ranking signals. A boosted score field combines semantic ranker output with scoring profile functions, enabling precise control over the final score for hybrid, vector, and keyword queries. This helps teams surface the most contextually relevant and strategically prioritized results across use cases.
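A sketch of how the pieces might fit together in an index definition—a freshness-based scoring profile alongside a semantic configuration that folds the profile's boost into the reranked score (property names such as `rankingOrder` should be checked against the GA reference; field names are hypothetical):

```json
"scoringProfiles": [
  {
    "name": "freshness-boost",
    "functions": [
      {
        "type": "freshness",
        "fieldName": "lastUpdated",
        "boost": 2,
        "interpolation": "linear",
        "freshness": { "boostingDuration": "P30D" }
      }
    ]
  }
],
"semantic": {
  "configurations": [
    {
      "name": "sem-config",
      "rankingOrder": "boostedReRankerScore",
      "prioritizedFields": {
        "titleField": { "fieldName": "title" },
        "prioritizedContentFields": [ { "fieldName": "content" } ]
      }
    }
  ]
}
```

Here, documents updated within the last 30 days receive a boost that is combined with the semantic reranker's output rather than being discarded by it.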

 

Hybrid Search Sub-scores & Debug Mode

Hybrid search—combining keyword and vector signals—is powerful, but achieving the desired relevance for some scenarios can be complex. With sub-scores and debug mode, teams gain visibility into how results are ranked across the BM25, vector, and semantic layers. This transparency makes it easier to iterate, validate, and adjust search behavior for scenarios where the highest precision is critical.
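As a hedged example, a hybrid semantic query with debug output enabled might look like this (the query text, field names, and semantic configuration name are placeholders; the exact debug values are documented in the REST reference):

```json
{
  "search": "renewable energy incentives",
  "vectorQueries": [
    { "kind": "text", "text": "renewable energy incentives", "fields": "contentVector" }
  ],
  "queryType": "semantic",
  "semanticConfiguration": "sem-config",
  "debug": "all"
}
```

The response then carries per-document debug information—for example, the keyword and vector sub-scores—showing how each ranking layer contributed to a result's final position.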

 

Index Normalizers

Across industries, data inconsistencies—like casing—can degrade keyword search quality. Normalizers standardize text for filtering, faceting, and sorting, improving match accuracy for names, codes, and terminology. This is especially useful in multilingual environments or when indexing data sources with varied formatting.
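For instance, a filterable field can attach a predefined normalizer so that filters match regardless of casing (the field name is illustrative):

```json
{
  "name": "productCode",
  "type": "Edm.String",
  "filterable": true,
  "normalizer": "lowercase"
}
```

With this in place, a filter such as `productCode eq 'abc-123'` can match a document whose stored value is `ABC-123`.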

 

Index “Description” Field

When building agentic systems and retrieval-augmented generation (RAG), metadata plays a critical role in grounding responses. The index “description” field allows developers to enrich the index with contextual summaries or semantic hints, helping agents choose the right source index for a given scenario.
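In practice this is a single property on the index definition (the index name, description, and fields below are hypothetical):

```json
{
  "name": "hr-policies-index",
  "description": "Employee handbook and HR policy documents; best suited for benefits, leave, and onboarding questions.",
  "fields": [
    { "name": "id", "type": "Edm.String", "key": true },
    { "name": "content", "type": "Edm.String", "searchable": true }
  ]
}
```

An agent with access to several indexes can read these descriptions to pick the most relevant source before issuing a query.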

 

HNSW Storage Optimization

High-performance vector search often requires large memory footprints. With storage optimizations for HNSW graphs, teams can reduce resource usage while maintaining retrieval quality.

 

Portal Enhancements for an Improved Developer Experience

Portal UX Improvements

The updated Azure AI Search portal experience with the “Import data (new)” wizard improves data prep for keyword search, making it easier for developers and analysts to configure indexes and test relevance—all within an improved guided UI that standardizes the experience across our wizards. This lowers the barrier to entry for teams building internal tools or customer-facing search.

 

Figure 2. Import data (new) wizard

Bonus Preview Feature: Azure AI Search in Azure AI Foundry

One-click Knowledge Grounding (Preview)

Connecting your enterprise data to Azure AI Search-powered agents is now easier with the new native experience in Azure AI Foundry. During the “Add knowledge” step in agent creation, you can now ingest content directly from Azure Blob Storage, ADLS Gen2, or Microsoft OneLake (Fabric), and Foundry will automatically create and configure an Azure AI Search index—no prior setup required.

This streamlined flow means you simply select your data source, pick an embedding model, and Foundry handles the rest: ingestion, chunking, embedding, and hybrid index creation. The process respects Azure RBAC and network isolation, so your data remains secure throughout. The index is ready for both vector and keyword retrieval, enabling advanced retrieval-augmented generation (RAG) scenarios with minimal friction. Learn more here: Ground your agents faster with native Azure AI Search indexing in Foundry

 

What’s next?

What’s new in Azure AI Search

What are knowledge agents in Azure AI Search?

Happy building—let us know what you ship with #AzureAIFoundry over in Discord or GitHub Discussions!

 

 
