A few years ago, it became clear to our team that AI could bring value to our customers, from improvements in ingestion to data exploration. We knew that many valuable AI assets already existed across Microsoft, so our team set out on a mission to bring as much “intelligence” as we could to the product then known as Azure Search.
In the first phase of this mission, we took on "unsearchable" content; about 80% of business-relevant data is in unstructured formats such as PDFs, PowerPoints, Word documents, JPEGs, CSVs, etc. We added AI-powered enrichments to our ingestion process, making it possible to extract structure and insights from your data and to transform the information it contains. These capabilities were well received by our customers, culminating in a product rebrand as "Azure Cognitive Search".
I am happy to announce that, in continuing this journey, we are bringing state-of-the-art AI capabilities to the “head” of our product, the core search sub-system. In partnership with the Bing team, we have integrated their semantic search investments (hundreds of development years and millions of dollars in compute time) into our query infrastructure, effectively enabling any developer to apply them to searchable content that they own and manage. We believe semantic search on Azure Cognitive Search offers the best combination of search relevance, developer experience, and cloud service capabilities available on the market.
This post explains which new capabilities are available to you and how you can get started today. I would also encourage you to read the post “Bing’s AI behind semantic search”, which goes deeper into the Bing technology that made semantic search possible.
Today, we are launching several exciting semantic search features in public preview.
Customers have grown accustomed to using natural language queries in web search engines, but these queries usually do not perform as well with a traditional keyword-based retrieval approach that ranks results based only on term frequencies. To demonstrate this, consider what happens when a customer types a query like “how to add a user in Office” into the Microsoft documentation search. We loaded the entire Microsoft documentation dataset into Azure Cognitive Search so that we could compare the results of the default lexical ranking algorithm with those of the semantic ranking algorithm.
Traditional retrieval and ranking approach
The default ranker (BM25) treats words as discrete units and predicts relevance from how often the query terms occur in each document and across the corpus. BM25 works well for keyword searches, but it struggles to find the most relevant documents for a natural language query.
Note that the results do meet the lexical frequency requirements. For instance, inspecting the top document “Troubleshoot user errors with Office Add-ins” shows that there are a lot of mentions of terms like “office”, “user”, “add” and “how to” in the document, but unfortunately the article does not provide the information we were looking for.
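To make this lexical behavior concrete, here is a minimal sketch of textbook BM25 scoring in Python. It is not the service's implementation: the `k1` and `b` values are common defaults, the tokenizer is deliberately naive, and the three toy documents and their ids are invented for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on anything that is not a letter or digit."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score every document in `docs` (id -> text) against `query`."""
    tokenized = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized.values()) / n_docs

    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for tokens in tokenized.values():
        doc_freq.update(set(tokens))

    scores = {}
    for doc_id, tokens in tokenized.items():
        term_freq = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if term not in term_freq:
                continue
            # Rare terms weigh more than common ones.
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            # Term frequency saturates and is normalized by document length.
            norm = term_freq[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * term_freq[term] * (k1 + 1) / norm
        scores[doc_id] = score
    return scores


# Invented toy corpus; the ids and texts are assumptions for this sketch.
DOCS = {
    "troubleshoot-addins": "Troubleshoot user errors with Office Add-ins. "
                           "How to add Office Add-ins when a user reports Office errors.",
    "add-users-licenses": "Add users and assign licenses at the same time in the admin center.",
    "install-apps": "Download and install apps on a PC or Mac.",
}

scores = bm25_scores("how to add a user in Office", DOCS)
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```

In this toy corpus the troubleshooting document ranks first simply because it repeats the query terms most often, which mirrors the behavior described above.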
With the release of semantic search, now we can enable a ranking algorithm that will use deep neural networks to rank the articles based on how “meaningful” they are relative to the query. Internally, this is a ranker that is applied on top of the results returned by the BM25-based ranker. Using semantic search capabilities, these are the top results for our query:
I read the content of the top document, “Add users and assign licenses at the same time”, and it is clear that this is exactly the document I need! Semantic search made this connection even though the title and content are not syntactically close to my query.
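The two-stage idea (a second ranker applied on top of the lexical results) can be sketched in a few lines of Python. This is only an illustration of the pattern: the first-stage scores are invented, `top_k` is an arbitrary cut-off, and `stand_in_model` is a hard-coded lookup standing in for the deep neural network, whose real interface is not described in this post.

```python
from typing import Callable, List, Tuple


def rerank(
    query: str,
    first_stage: List[Tuple[str, float]],
    second_stage: Callable[[str, str], float],
    top_k: int,
) -> List[Tuple[str, float]]:
    """Keep the top_k first-stage hits, then reorder them by the second-stage score."""
    candidates = sorted(first_stage, key=lambda hit: hit[1], reverse=True)[:top_k]
    rescored = [(doc_id, second_stage(query, doc_id)) for doc_id, _ in candidates]
    return sorted(rescored, key=lambda hit: hit[1], reverse=True)


# Invented first-stage (lexical) hits as (document id, score) pairs.
LEXICAL_HITS = [
    ("troubleshoot-addins", 5.4),
    ("add-users-licenses", 1.5),
    ("install-apps", 0.5),
]

# Hypothetical relevance values; a real system would call a model here.
STAND_IN_SCORES = {
    "add-users-licenses": 0.95,
    "troubleshoot-addins": 0.30,
    "install-apps": 0.10,
}


def stand_in_model(query: str, doc_id: str) -> float:
    """Stand-in for a neural relevance model (assumption for this sketch)."""
    return STAND_IN_SCORES[doc_id]


reranked = rerank("how to add a user in Office", LEXICAL_HITS, stand_in_model, top_k=2)
print(reranked)
```

Because the second stage only sees what the first stage returned, a document that the lexical ranker never retrieves cannot be promoted by re-ranking.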
In the previous example, the title of the document by itself did not make it easy for me to tell whether the document was relevant. I still had to read it to find the snippet that told me how to add a user to Office.
The good news is that now you can also get semantic answers! It is one of my favorite features; it uses an AI model that extracts relevant passages from the top documents, and then ranks them on their likelihood of being an answer to the query. If we find a passage that has a high likelihood of answering the question, we will promote it as a semantic answer.
This is what it looks like, in this case. Note that we even leveraged a model from Bing to provide highlights for the most relevant section in the semantic answer.
Similarly, we can extract the most relevant section of each document returned so you can quickly skim through the results and see if they have the content that you care about. This makes it easier to triage the results quickly and go deeper into the ones that you think are relevant given your context.
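The answer-selection flow described above (extract passages from the top documents, rank them by how likely they are to answer the query, and promote the best one only if it clears a bar) can be sketched as follows. This is an illustration only: the sentence-based splitter, the `toy_likelihood` function, the sample documents, and the `threshold` values are all invented stand-ins for the AI models the service uses.

```python
from typing import Callable, List, Optional, Tuple


def split_passages(text: str) -> List[str]:
    """Naive splitter: treat every sentence as one candidate passage."""
    return [s.strip() + "." for s in text.split(".") if s.strip()]


def pick_answer(
    query: str,
    top_docs: List[str],
    likelihood: Callable[[str, str], float],
    threshold: float,
) -> Optional[Tuple[str, float]]:
    """Return the best (passage, score) pair, or None if nothing clears the threshold."""
    scored = [
        (passage, likelihood(query, passage))
        for doc in top_docs
        for passage in split_passages(doc)
    ]
    if not scored:
        return None
    best = max(scored, key=lambda item: item[1])
    return best if best[1] >= threshold else None


# Invented sample documents for this sketch.
TOP_DOCS = [
    "Alan Turing was a mathematician. He was born in Maida Vale, London.",
    "The Turing test was proposed in 1950.",
]


def toy_likelihood(query: str, passage: str) -> float:
    """Hypothetical scorer; a real system would use a trained model instead."""
    return 0.9 if "born in" in passage.lower() else 0.1


answer = pick_answer("Where was Alan Turing born?", TOP_DOCS, toy_likelihood, threshold=0.5)
no_answer = pick_answer("Where was Alan Turing born?", TOP_DOCS, toy_likelihood, threshold=0.95)
print(answer)
print(no_answer)
```

Returning `None` when no passage is convincing enough matters as much as the ranking itself: a wrong promoted answer is more misleading than no answer at all.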
Get started today!
Using semantic search is easy. After you sign up for the preview at http://aka.ms/semanticpreview, all you need to do is change your query parameters as part of the request as shown below. Note that there is no need to re-index any of your content!
queryType: Set to “semantic” to indicate that you would like semantic ranking and answers. Other supported values: “simple” and “full”.
searchFields: Ordered list of fields that semantic ranking should be applied to. If you have a title or a short field that describes your document, we recommend making it the first field. Follow that with the URL (if any), then the body of the document, and then any other relevant fields.
queryLanguage: “en-us” is the only supported value today. We will be adding more languages soon. Stay tuned.
speller: Set to “lexicon” if you would like spell correction to occur on the query terms. Otherwise set to “none”.
answers: Set to “extractive” if you would like to get extractive answers. Otherwise set to “none”.
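As a rough sketch, the settings above could be assembled into a request like this in Python. The JSON property names (`queryType`, `searchFields`, `queryLanguage`, `speller`, `answers`) follow the 2020-06-30-preview REST API and should be verified against the current reference before you rely on them. The service name, index name, and field names are placeholders, and the helper function itself is hypothetical. Nothing is sent over the network here; an actual call would also need your `api-key` header.

```python
import json
from typing import Dict, List, Tuple

API_VERSION = "2020-06-30-preview"


def build_semantic_query(
    service: str,
    index: str,
    query: str,
    search_fields: List[str],
    want_answers: bool = True,
    spell_check: bool = True,
) -> Tuple[str, Dict[str, str]]:
    """Return the POST URL and JSON body for a semantic query (hypothetical helper)."""
    url = (
        f"https://{service}.search.windows.net"
        f"/indexes/{index}/docs/search?api-version={API_VERSION}"
    )
    body = {
        "search": query,
        "queryType": "semantic",
        # Order matters: title-like field first, then URL, then body.
        "searchFields": ",".join(search_fields),
        "queryLanguage": "en-us",
        "speller": "lexicon" if spell_check else "none",
        "answers": "extractive" if want_answers else "none",
    }
    return url, body


# Placeholder service, index, and field names (assumptions for this sketch).
url, body = build_semantic_query(
    "my-service", "my-index", "Where was Alan Turing born?", ["title", "url", "body"]
)
print("POST", url)
print(json.dumps(body, indent=2))
```

Keeping the body construction in one small function makes it easy to switch an existing query between lexical and semantic ranking without touching the index.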
POST https://[service name].search.windows.net/indexes/[index name]/docs/search?api-version=2020-06-30-preview
"search": "Where was Alan Turing born?",
"text": "Turing was born in Maida Vale, London, while his father, Julius…",
"highlights": "Turing was born in <strong>Maida Vale, London</strong>, while …",
"text": "Alan Mathison Turing, (born June 23, 1912, London, England—died June 7, 1954…",
"highlights": "Alan Mathison Turing, (born June 23, 1912, <strong>London, England</strong>—died June…",
I am personally super excited about these new capabilities, the efficiencies that they will bring to you, and the progression of our vision to bring the best AI capabilities at Microsoft to Azure developers!
Luis Cabrera – on behalf of the Azure Cognitive Search team
Customers & Partners
Case Study: PPL Electric Utilities Corporation, a utilities company, is working with Neudesic to create a web application with Azure Cognitive Search to empower its field workers to find the most relevant information wherever they are.
Case Study: Howden teamed up with OrangeNXT to further improve Smart Records using key elements of their digitalNXT Search, a fully managed cloud solution powered by Azure Cognitive Search.