
AI - Azure AI services Blog

Semantic Search in Action

Sonia Ang
Microsoft
Aug 22, 2021

 

Azure Cognitive Search has added a new feature called semantic search, and customers have already put it into action. In early May 2021, Ogilvy, a subsidiary of WPP, incorporated semantic search into its enterprise knowledge management system, Starfish. The project is built around a content discovery portal that is meant to be the first point of contact for users and a key component in Ogilvy’s rich ecosystem. It uses Cognitive Search to provide intelligent document insights and recommendations on RFIs, RFPs, and case studies, leading to faster and more efficient responses to new business requests.

A client typically asks a series of questions, starting with Ogilvy as a company, its capabilities and accomplishments, similar work for a peer company, and fees. On the Starfish portal, users would ask questions such as the following:

  • When Ogilvy receives an RFI, it will include some basic questions about Ogilvy
    • Where are Ogilvy's headquarters?
    • What are Ogilvy's core competencies?
    • Who are Ogilvy's biggest customers?
  • RFIs will also include deeper questions about Ogilvy’s experience and how it thinks and works
    • Give an example of Ogilvy’s work <- good answer
  • Ogilvy may also want to reference past customer scenarios to show how it solved problems in the past
    • What was Ogilvy's campaign for Fanta?
    • When was Fanta discovered?

 

Without semantic search, query terms are analyzed with similarity algorithms that rely on term frequency, counting the number of times a term appears in a document or across the document corpus. A probability is then applied to estimate whether the document is relevant. What this approach lacks, as in most web search experiences, is an understanding of the user's intent.
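To make the keyword-based approach concrete, here is a minimal sketch of term-frequency scoring using a basic BM25 formula. It is an illustration only, not the scoring code used by Cognitive Search: the tokenizer, the `k1` and `b` values, and the toy documents are all assumptions.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive analyzer)."""
    return text.lower().split()


def bm25_scores(query, documents, k1=1.2, b=0.75):
    """Score each document against the query with a basic BM25 formula.

    k1 and b are commonly used values, assumed here for illustration.
    """
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    avg_len = sum(len(doc) for doc in tokenized) / n_docs
    # Document frequency: how many documents contain each term.
    doc_freq = Counter(term for doc in tokenized for term in set(doc))

    scores = []
    for doc in tokenized:
        term_freq = Counter(doc)
        score = 0.0
        for term in set(tokenize(query)):
            tf = term_freq.get(term, 0)
            if tf == 0:
                continue  # terms absent from the document contribute nothing
            df = doc_freq[term]
            idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
            length_norm = k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf * (k1 + 1) / (tf + length_norm)
        scores.append(score)
    return scores


# Toy documents (hypothetical): the first repeats the keyword, the second
# is about the same idea but shares no words with the query.
docs = [
    "fanta fanta fanta logo fanta banner",
    "the year the drink first went on sale",
    "quarterly fee schedule",
]
scores = bm25_scores("when was fanta discovered", docs)
```

In this toy corpus the first document gets a positive score purely because it repeats "fanta", while the second scores zero even though it is the one about a launch date. That is the gap in intent described above.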

Overall, semantic search has significantly advanced the quality of search results:

Technology benefits:  

  1. Intelligent ranking - uses a semantic ranking model, so search is based on context and intent. It elevates the matches that make more sense given the relevance of the content in the results.
  2. Better query understanding - it is based on meaning and not just the syntax of a word, unlike other technologies that rely on term frequency. For example, it can tell "WHO sent a message" (the World Health Organization) apart from "Who is the father…?"
  3. Semantic answers - semantic search improves the quality of search results in two ways. First, documents that are semantically closer to the intent of the original query are ranked higher, which is a significant benefit. Second, results are more immediately consumable when captions, and potentially answers, are present on the page. At all times, the engine is working with existing content. Language models used in semantic search are designed to extract an intact string that looks like an answer, but they won't try to compose a new string as an answer to a query or as a caption for a matching document.

We use deep neural networks from Bing that understand the nuances of language and are trained on different models of the language: how words are related in various contexts and dimensions.

 

Figure 1.

JSON query

{
    "search": "When was Fanta Orange discovered",
    "queryType": "semantic",
    "queryLanguage": "en-us",
    "speller": "lexicon",
    "answers": "extractive|count-3",
    "searchFields": "content,metadata_storage_name",
    "count": true
}
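As a rough sketch of how a query body like the one above could be sent from code, the Python below builds the POST request with the standard library only. The helper name `build_semantic_request`, the `api-version` value, and the placeholder key are assumptions; the service and index names are taken from the `@odata.context` value in the responses shown in this post. Check the REST API reference for the version your service supports.

```python
import json
import urllib.request


def build_semantic_request(service, index, api_key, search_text,
                           api_version="2020-06-30-Preview"):
    """Build (but do not send) a semantic query request.

    The api_version default is an assumption for illustration.
    """
    url = (
        f"https://{service}.search.windows.net/indexes/{index}"
        f"/docs/search?api-version={api_version}"
    )
    # Same fields as the JSON query in Figure 1.
    body = {
        "search": search_text,
        "queryType": "semantic",
        "queryLanguage": "en-us",
        "speller": "lexicon",
        "answers": "extractive|count-3",
        "searchFields": "content,metadata_storage_name",
        "count": True,
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "api-key": api_key},
        method="POST",
    )


request = build_semantic_request(
    service="ci-acs",
    index="ogilvy-poc-index",
    api_key="<your-query-key>",  # placeholder, not a real key
    search_text="When was Fanta Orange discovered",
)
```

To run the query, pass `request` to `urllib.request.urlopen` and parse the response body with `json.load`. Keeping the request construction separate from the network call makes the body easy to inspect and test.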

 

 

Response (note the caption in the answer):

{
    "@odata.context": "https://ci-acs.search.windows.net/indexes('ogilvy-poc-index')/$metadata#docs(*)",
    "@odata.count": 2115,
    "@search.answers": [
        {
            "key": "79b0fe8e-0648-4cc5-bd5c-eaf0e2027855",
            "text": "First launched Fanta began Fanta U.S. in U.S. phasing out in U.S. relaunch 1940 1941 1959 1987 2002 2005 First launched Minute Maid in Germany launched in U.S. As beverage choice has exploded in recent years, carbonated soft drinks (CSDs) have faced stiff competition.",
            "highlights": null,
            "score": 0.8339705
        }

Compare this with the same query without semantic search:

{
    "search": "When was Fanta discovered",
    "queryType": "full",
    "queryLanguage": "en-us",
    "speller": "lexicon",
    "count": true
}

{
    "@odata.context": "https://ci-acs.search.windows.net/indexes('ogilvy-poc-index')/$metadata#docs(*)",
    "@odata.count": 3253,
    "@search.nextPageParameters": {
        "search": "When was Fanta discovered",
        "queryType": "full",
        "queryLanguage": "en-us",
        "speller": "lexicon",
        "count": true,
        "skip": 50
    },

 

The response has several hits, but none of them are close:

 

"value": [

        {

            "@search.score": 42.056797,

            "content": "\n_rels/.rels\n\n\ndocProps/core.xml\n\n\ndocProps/app.xml\n\n\nppt/presentation.xml\n\n\nppt/_rels/presentation.xml.rels\n\n\nppt/presProps.xml\n\n\nppt/viewProps.xml\n\n\nppt/commentAuthors.xml\n\n\nppt/slideMasters/slideMaster1.xml\nTitle TextBody Level OneBody Level TwoBody Level ThreeBody Level FourBody Level Five\n\n\nppt/slideMasters/_rels/slideMaster1.xml.rels\n\n\nppt/theme/theme1.xml\n\n\nppt/slideLayouts/slideLayout1.xml\nTitle TextBody Level OneBody Level TwoBody Level ThreeBody Level FourBody Level Five\n\n\nppt/slideLayouts/_rels/slideLayout1.xml.rels\n\n\nppt/slideLayouts/slideLayout2.xml\nTitle TextBody Level OneBody Level TwoBody Level 

 

Technology Background:

Semantic search improves results in two ways: first, it adds a semantic ranking model; second, it returns captions and answers in the response.

Semantic ranking looks for context and relatedness among terms, elevating matches that make more sense given the query. Language understanding finds summarizations or captions and answers within your content and includes them in the response, which can then be rendered on a search results page for a more productive search experience.

State-of-the-art pretrained models are used for summarization and ranking. To maintain the fast performance that users expect from search, semantic summarization and ranking are applied to just the top 50 results, as scored by the default similarity scoring algorithm (BM25). Using those results as the document corpus, semantic ranking re-scores them based on the semantic strength of the match. Scores are calculated based on the degree of linguistic similarity between query terms and matching terms in the index.
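The two-stage flow described above can be sketched as follows. This is an illustration under stated assumptions, not the service's implementation: `semantic_score` is a hypothetical stand-in for the pretrained ranking model, and leaving results beyond the top 50 in keyword order is a choice made for this sketch only.

```python
def semantic_rerank(results, semantic_score, top_n=50):
    """Re-rank the top_n keyword results with a semantic scorer.

    results: list of (doc_id, keyword_score) pairs, e.g. BM25 scores.
    semantic_score: callable doc_id -> float (hypothetical model stand-in).
    """
    # Stage 1: order by the keyword (BM25-style) score.
    by_keyword = sorted(results, key=lambda item: item[1], reverse=True)
    head, tail = by_keyword[:top_n], by_keyword[top_n:]
    # Stage 2: only the head is passed to the more expensive semantic scorer.
    reranked = sorted(head, key=lambda item: semantic_score(item[0]), reverse=True)
    # Assumption: documents outside the top_n keep their keyword order.
    return reranked + tail


# Toy data: three documents, re-ranking only the top two.
keyword_results = [("a", 3.0), ("b", 2.0), ("c", 1.0)]
toy_semantic = {"a": 0.1, "b": 0.9, "c": 1.0}
ranked = semantic_rerank(keyword_results, toy_semantic.get, top_n=2)
```

Document "c" has the highest toy semantic score but stays last because it never made the keyword cut. Limiting the semantic pass this way keeps latency bounded, and it also means the initial keyword recall still matters.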

The underlying technology comes from Bing and Microsoft Research, and is integrated into the Cognitive Search infrastructure as an add-on feature.

In the preparation step, the document corpus returned from the initial result set is analyzed at the sentence and paragraph level to find passages that summarize each document. In contrast with keyword search, this step uses machine reading and comprehension to evaluate the content. Through this stage of content processing, a semantic query returns captions and answers. To formulate them, semantic search uses language representation to extract and highlight key passages that best summarize a result. If the search query is a question - and answers are requested - the response will also include a text passage that best answers the question, as expressed by the search query.

For both captions and answers, existing text is used in the formulation. The semantic models do not compose new sentences or phrases from the available content, nor do they apply logic to arrive at new conclusions. In short, the system will never return content that doesn't already exist.
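The extractive behavior can be illustrated with a small sketch. The word-overlap scoring below is a crude stand-in for the machine reading models, and the function name and sample text are hypothetical. The property to notice is that the answer is always a passage copied verbatim from the document.

```python
import re


def extract_answer(query, document):
    """Return the existing sentence that best matches the query, or None.

    Word overlap is a stand-in for the real language models (assumption).
    """
    sentences = [
        s.strip() for s in re.split(r"(?<=[.!?])\s+", document) if s.strip()
    ]
    query_terms = set(re.findall(r"\w+", query.lower()))

    def overlap(sentence):
        return len(query_terms & set(re.findall(r"\w+", sentence.lower())))

    best = max(sentences, key=overlap, default=None)
    if best is None or overlap(best) == 0:
        return None  # nothing relevant: do not invent an answer
    return best  # verbatim text from the document, never generated


# Hypothetical sample content.
sample_document = (
    "Our team has offices in many cities. "
    "The widget was first sold in 1990. "
    "Fees are listed separately."
)
answer = extract_answer("When was the widget first sold?", sample_document)
```

When no sentence shares any term with the query, the function returns `None` rather than composing something new, mirroring the guarantee that only existing content is returned.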

Results are then re-scored based on the conceptual similarity of query terms.

 

 

Key Success Measurements for Ogilvy

  • 40% improvement in RFP/RFI response time.
  • Content growth per month
  • RFP Generator clicks
  • Content downloads
  • User Adoption and Collaboration
  • Quality Content Searches

 

Business Outcomes:

The biggest business impact will be a significant increase in the win rate for RFIs, which leads to higher revenue. This is achieved through the portal's ability to identify the best answers to an RFI, along with layouts, without having to perform multiple searches, saving time and resources. Being able to use routine methods, filters, and cognitive functions to refine the search results would eliminate redundancy by almost 40%, reducing the cost of the process and enhancing customer experience and satisfaction.

Version 1.0
  • abhi97
    Copper Contributor

    How to get semantic search results from SharePoint docs?