Microsoft Foundry Blog


Semantic Search in Action

Former Employee

Aug 21, 2021

Azure Cognitive Search has introduced a new feature called semantic search, and customers are already putting it into action. In early May 2021, Ogilvy, a subsidiary of WPP, incorporated semantic search into its enterprise knowledge management system, Starfish. The project is built around a content discovery portal that is meant to be users' first point of contact and a key component of Ogilvy's rich ecosystem. It uses Cognitive Search to provide intelligent document insights and recommendations on RFIs, RFPs, and case studies, leading to faster and more efficient responses to new business requests.

A client typically asks a series of questions, starting with Ogilvy as a company, then its capabilities and accomplishments, similar work for a peer company, and fees. On the Starfish portal, users would ask questions such as the following.

When Ogilvy receives an RFI, it will include some basic questions about Ogilvy:
- Where are Ogilvy's headquarters?
- What are Ogilvy's core competencies?
- Who are Ogilvy's biggest customers?
RFIs will also include deeper questions about Ogilvy's experience and how it thinks and works:
- Give an example of Ogilvy's work (returns a good answer)
Ogilvy may also want to reference past customer scenarios to show how it solved problems in the past:
- What was Ogilvy's campaign for Fanta?
- When was Fanta discovered?

Without semantic search, query terms are analyzed by similarity algorithms that rely on term frequency, counting the number of times a term appears in a document or across a document corpus. A probability is then applied to estimate whether a document is relevant. What most search experiences lack is any understanding of the user's intent.
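BM25, the default similarity algorithm named later in this post, is one concrete example of this term-frequency approach. The following is a minimal Python sketch of Okapi BM25 over a made-up toy corpus. It assumes naive whitespace tokenization and the commonly used defaults k1 = 1.2 and b = 0.75, and it is not Azure Cognitive Search's actual implementation, which applies its own language analyzers.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Naive lowercase whitespace tokenizer; real engines use language analyzers.
    return text.lower().split()


def bm25_scores(query: str, docs: list[str], k1: float = 1.2, b: float = 0.75) -> list[float]:
    """Score every document against the query with Okapi BM25."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: how many documents contain each term.
    df = Counter(term for toks in tokenized for term in set(toks))

    scores = []
    for toks in tokenized:
        tf = Counter(toks)  # term frequency within this document
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            # Inverse document frequency: rarer terms contribute more.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Saturating term frequency, normalized by document length.
            norm = tf[term] * (k1 + 1) / (tf[term] + k1 * (1 - b + b * len(toks) / avg_len))
            score += idf * norm
        scores.append(score)
    return scores


docs = [
    "fanta was first launched in germany",
    "ogilvy case study for a beverage client",
    "when was the campaign launched",
]
scores = bm25_scores("when was fanta launched", docs)
```

With this toy corpus, the third document shares only the question words "when was ... launched", yet it scores slightly higher than the document that is actually about Fanta. The scorer counts matching words and has no notion of what the question is asking.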

Overall, semantic search has significantly advanced the quality of search results:

Technology benefits:

Intelligent ranking - uses a semantic ranking model, so search is based on context and intent, elevating matches that make more sense given the relevance of the content in the results.
Better query understanding - it is based on meaning and not just the syntax of the words, unlike other technologies that rely on term frequency. For example, "WHO sent a message" (the World Health Organization) versus "Who is the father…?"
Semantic answers - semantic search improves the quality of search results in two ways. First, documents that are semantically closer to the intent of the original query are ranked higher, which is a significant benefit. Second, results are more immediately consumable when captions, and potentially answers, are present on the page. At all times, the engine works with existing content: the language models used in semantic search are designed to extract an intact string that looks like an answer, but they won't try to compose a new string as an answer to a query or as a caption for a matching document.

We use deep neural networks in Bing that understand the nuances of language and are trained to model how words are related across various contexts and dimensions.

Figure 1.

JSON query:

{
  "search": "When was Fanta Orange discovered",
  "queryType": "semantic",
  "queryLanguage": "en-us",
  "speller": "lexicon",
  "answers": "extractive|count-3",
  "searchFields": "content,metadata_storage_name",
  "count": true
}

Response (note the extracted answer in @search.answers):

{
  "@odata.context": "https://ci-acs.search.windows.net/indexes('ogilvy-poc-index')/$metadata#docs(*)",
  "@odata.count": 2115,
  "@search.answers": [
    {
      "key": "79b0fe8e-0648-4cc5-bd5c-eaf0e2027855",
      "text": "First launched Fanta began Fanta U.S. in U.S. phasing out in U.S. relaunch 1940 1941 1959 1987 2002 2005 First launched Minute Maid in Germany launched in U.S. As beverage choice has exploded in recent years, carbonated soft drinks (CSDs) have faced stiff competition.",
      "highlights": null,
      "score": 0.8339705
    }
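The semantic query and response above can be reproduced from code. The following is a minimal Python sketch that builds the same request body for the Search Documents REST endpoint and reads the extractive answers out of a parsed response. The service and index names are taken from the @odata.context URL above. The api-version value and the key are assumptions (placeholders), so check which preview version your service supports for semantic queries. The request is built but not sent.

```python
import json
import urllib.request

# Assumed placeholders: service and index come from the response above;
# the api-version and key are hypothetical and must match your service.
SERVICE = "ci-acs"
INDEX = "ogilvy-poc-index"
API_VERSION = "2021-04-30-Preview"
API_KEY = "<query-key>"


def build_semantic_request(question: str) -> urllib.request.Request:
    """Build (but do not send) a POST request to the docs/search endpoint."""
    body = {
        "search": question,
        "queryType": "semantic",
        "queryLanguage": "en-us",
        "speller": "lexicon",
        "answers": "extractive|count-3",
        "searchFields": "content,metadata_storage_name",
        "count": True,
    }
    url = (f"https://{SERVICE}.search.windows.net/indexes/{INDEX}"
           f"/docs/search?api-version={API_VERSION}")
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "api-key": API_KEY},
        method="POST",
    )


def top_answers(response: dict) -> list[tuple[float, str]]:
    """Return (score, text) pairs from @search.answers, best first."""
    answers = response.get("@search.answers") or []
    return sorted(((a["score"], a["text"]) for a in answers), reverse=True)


# To send it (requires a real service and key):
#     with urllib.request.urlopen(build_semantic_request("When was Fanta Orange discovered")) as resp:
#         print(top_answers(json.load(resp)))
```

Keeping request construction separate from sending makes the body easy to inspect and test without network access.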

Versus the same query without semantic search:

{
  "search": "When was Fanta discovered",
  "queryType": "full",
  "queryLanguage": "en-us",
  "speller": "lexicon",
  "count": true
}

Response (several hits, but none close to an answer):

{
  "@odata.context": "https://ci-acs.search.windows.net/indexes('ogilvy-poc-index')/$metadata#docs(*)",
  "@odata.count": 3253,
  "@search.nextPageParameters": {
    "search": "When was Fanta discovered",
    "queryType": "full",
    "queryLanguage": "en-us",
    "speller": "lexicon",
    "count": true,
    "skip": 50
  },
  "value": [
    {
      "@search.score": 42.056797,
      "content": "\n_rels/.rels\n\n\ndocProps/core.xml\n\n\ndocProps/app.xml\n\n\nppt/presentation.xml\n\n\nppt/_rels/presentation.xml.rels\n\n\nppt/presProps.xml\n\n\nppt/viewProps.xml\n\n\nppt/commentAuthors.xml\n\n\nppt/slideMasters/slideMaster1.xml\nTitle TextBody Level OneBody Level TwoBody Level ThreeBody Level FourBody Level Five\n\n\nppt/slideMasters/_rels/slideMaster1.xml.rels\n\n\nppt/theme/theme1.xml\n\n\nppt/slideLayouts/slideLayout1.xml\nTitle TextBody Level OneBody Level TwoBody Level ThreeBody Level FourBody Level Five\n\n\nppt/slideLayouts/_rels/slideLayout1.xml.rels\n\n\nppt/slideLayouts/slideLayout2.xml\nTitle TextBody Level OneBody Level TwoBody Level

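The keyword response above also carries @search.nextPageParameters, a continuation payload that the service includes when it does not return all matching results in one response. In the response above, it repeats the original query and adds "skip": 50. The following is a minimal Python sketch of paging with it, assuming responses have already been parsed into dicts. The post function is a hypothetical helper, not part of any SDK.

```python
from typing import Callable, Iterator, Optional


def next_page_body(response: dict) -> Optional[dict]:
    """Return the request body for the next page, or None if there is none.

    As in the response above, @search.nextPageParameters echoes the original
    query with an updated "skip", so it can be POSTed back unchanged.
    """
    params = response.get("@search.nextPageParameters")
    return dict(params) if params else None


def iter_documents(first_body: dict, post: Callable[[dict], dict]) -> Iterator[dict]:
    """Yield documents from every page of results.

    `post` is a hypothetical helper that sends a request body to the
    docs/search endpoint and returns the parsed JSON response.
    """
    body: Optional[dict] = first_body
    while body is not None:
        response = post(body)
        yield from response.get("value", [])
        body = next_page_body(response)
```

Paging through thousands of loosely matching keyword hits like this shows why a better-ranked first page matters. Semantic ranking addresses that instead of relying on users to dig further.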
Technology Background:

Semantic search adds two things: first, a semantic ranking model; and second, captions and answers returned in the response.

Semantic ranking looks for context and relatedness among terms, elevating matches that make more sense given the query. Language understanding finds summarizations or captions and answers within your content and includes them in the response, which can then be rendered on a search results page for a more productive search experience.

State-of-the-art pretrained models are used for summarization and ranking. To maintain the fast performance that users expect from search, semantic summarization and ranking are applied only to the top 50 results, as scored by the default similarity scoring algorithm (BM25). Using those results as the document corpus, semantic ranking re-scores them based on the semantic strength of the match. Scores are calculated based on the degree of linguistic similarity between query terms and matching terms in the index.
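The two-stage flow described above works in two steps. Keyword scoring runs first, and then only the top 50 hits are re-scored semantically. The following Python sketch illustrates that flow. The semantic scorer is a hypothetical callable that stands in for the pretrained model, which is not publicly available. The sketch returns only the re-ranked window and does not model how the service treats lower-ranked hits.

```python
from typing import Callable, Sequence

RERANK_WINDOW = 50  # only the top 50 keyword (BM25) hits are re-scored


def semantic_rerank(
    query: str,
    bm25_ranked: Sequence[str],
    semantic_score: Callable[[str, str], float],
    window: int = RERANK_WINDOW,
) -> list[str]:
    """Re-order the top `window` keyword hits by semantic relevance.

    `bm25_ranked` must already be sorted by keyword score, best first.
    `semantic_score` is a hypothetical stand-in for the pretrained ranking
    model: any callable that returns a higher value for a better match.
    """
    candidates = list(bm25_ranked[:window])
    # Python's sort is stable, so ties keep their original keyword order.
    candidates.sort(key=lambda doc: semantic_score(query, doc), reverse=True)
    return candidates
```

Capping the expensive model at a fixed window keeps latency predictable. The trade-off is that a relevant document outside the top 50 keyword hits is never seen by the semantic ranker.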

The underlying technology comes from Bing and Microsoft Research and is integrated into the Cognitive Search infrastructure as an add-on feature.

In the preparation step, the document corpus returned from the initial result set is analyzed at the sentence and paragraph level to find passages that summarize each document. In contrast with keyword search, this step uses machine reading and comprehension to evaluate the content. Through this stage of content processing, a semantic query returns captions and answers. To formulate them, semantic search uses language representation to extract and highlight key passages that best summarize a result. If the search query is a question - and answers are requested - the response will also include a text passage that best answers the question, as expressed by the search query.

For both captions and answers, existing text is used in the formulation. The semantic models do not compose new sentences or phrases from the available content, nor do they apply logic to arrive at new conclusions. In short, the system will never return content that doesn't already exist.
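The extract-only behavior described above can be illustrated with a deliberately simple stand-in for the machine reading model. The sketch splits a document into sentences, scores each sentence by word overlap with the query, and returns the best one verbatim. Word overlap is a swapped-in heuristic for illustration only, not how the Bing and Microsoft Research models score passages. The point is that the output is always an unchanged slice of the original text.

```python
import re


def extract_caption(query: str, document: str) -> str:
    """Return the existing sentence that best matches the query, unchanged."""
    query_terms = set(re.findall(r"\w+", query.lower()))
    # Split on sentence-ending punctuation followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", document.strip()) if s]
    if not sentences:
        return ""

    def overlap(sentence: str) -> int:
        # Heuristic stand-in for a learned relevance score.
        return len(query_terms & set(re.findall(r"\w+", sentence.lower())))

    # max() keeps the first sentence on ties; nothing is rewritten or merged.
    return max(sentences, key=overlap)


doc = "Ogilvy works with many brands. Fanta was first launched in Germany in 1940. Sales grew later."
caption = extract_caption("When was Fanta launched?", doc)
```

Because the function can only choose among sentences that already exist, any answer it returns is guaranteed to appear in the source document.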

Results are then re-scored based on the conceptual similarity of query terms.

Key Success Measurements for Ogilvy

40% improvement in RFP/RFI response time.
Content growth per month
RFP Generator clicks
Content downloads
User Adoption and Collaboration
Quality Content Searches

Business Outcomes:

The biggest business impact is a significant increase in the win rate for RFIs, which leads to higher revenue. This was achieved through the portal's ability to identify the best answers and layouts for an RFI without having to perform multiple searches, saving time and resources. Being able to use routine methods, filters, and cognitive functions to refine the search results would eliminate redundancy by almost 40%, reduce the cost of the process, and enhance customer experience and satisfaction.


azure ai services

knowledge mining

Sonia Ang

Former Employee

Joined September 24, 2018

