Knowledge Mining with Azure Cognitive Search

Published Nov 20 2019 08:39 AM

The evolution of enterprise search technology

For decades, the term “search” has been synonymous with exploring, finding, and making sense of the information at your disposal. It started with keyword search, which let users find specific information that matched a certain keyword or phrase. Then it evolved into more effective solutions powered by advances in natural language processing and other machine learning technologies. Features such as fuzzy search, spell correction, and autosuggest significantly improved the quality and relevance of search results.


Now, there is cognitive search, which broadens the range of content users can explore effectively (PDFs, spreadsheets, images, audio files, etc.) and moves from “finding” to “understanding.” This is made possible by a combination of natural language processing, computer vision, and new advances in machine learning technology.


We formally started on this journey with the “cognitive search” feature in Azure Search, as a way to talk about the new Cognitive Services-based capabilities in the enrichment pipeline. But AI is deeply embedded into the entire product, from ingestion to exploration.




Azure Search will continue to have significant elements of machine learning powering content understanding, ranking, and more, and we needed a name to reflect that: Azure Cognitive Search. This single name captures both our core value prop (“Search”) and our approach to making it better and more broadly applicable with AI (“Cognitive”).


What’s possible with Azure Cognitive Search

We’re empowering developers to create cognitive search solutions by simplifying the process into three main steps:

  • Ingest: scale to ingest a multitude of data types. Search is no longer just about text contained in documents and web pages. Cognitive search solutions can also handle nontraditional enterprise data like images, video, and audio across multiple data stores.
  • Enrich: apply artificial intelligence to transform, structure, and enhance your data. To unlock the latent value in general documents, Cognitive Search helps you extract structured data from unstructured content. You can apply built-in cognitive skills such as printed text recognition or entity extraction, or you can enrich your content with your own custom skills, with support for Azure Machine Learning and Azure Cognitive Services.
  • Explore: have the flexibility to build a range of application experiences. Search is not just about a text box in an enterprise portal. Enterprises build a range of applications powered by search, such as customer 360 applications, digital asset management solutions, and healthcare research tools. Businesses also want to create new interfaces to explore their data with bots, as well as AI-driven data visualizations in applications like Power BI. Azure Cognitive Search creates a search index to power your search experiences, but also allows you to store the tree of information found in your data in a knowledge store, which you can use to power any custom experience that you want to create.
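
To make the three steps concrete, here is a minimal conceptual sketch in plain Python. It is not the Azure Cognitive Search API: the sample documents, the KNOWN_ORGS list, and every function name are assumptions invented for illustration. It only models the flow of ingesting raw records, enriching them with a structured field, and building an index to explore.

```python
from collections import defaultdict

# Hypothetical raw records standing in for ingested content.
RAW_DOCS = [
    {"id": "1", "content": "Contoso opened a new office in Seattle. "},
    {"id": "2", "content": "Fabrikam ships turbine parts from Seattle."},
]

# Illustrative lookup list; a real enrichment step would use an AI skill.
KNOWN_ORGS = {"Contoso", "Fabrikam"}


def ingest(raw_docs):
    """Ingest: normalize each source record into a working document."""
    return [{"id": d["id"], "content": d["content"].strip()} for d in raw_docs]


def enrich(doc):
    """Enrich: derive a structured field from unstructured text."""
    words = [w.strip(".,") for w in doc["content"].split()]
    enriched = dict(doc)
    enriched["organizations"] = sorted(w for w in words if w in KNOWN_ORGS)
    return enriched


def build_index(docs):
    """Explore: build an inverted index from lowercase term to document ids."""
    index = defaultdict(set)
    for doc in docs:
        for word in doc["content"].split():
            index[word.strip(".,").lower()].add(doc["id"])
    return index


def search(index, term):
    """Return the sorted ids of documents containing the term."""
    return sorted(index.get(term.lower(), set()))


enriched_docs = [enrich(d) for d in ingest(RAW_DOCS)]
index = build_index(enriched_docs)

print(search(index, "Seattle"))           # ['1', '2']
print(enriched_docs[0]["organizations"])  # ['Contoso']
```

In the actual service, the enrich step is a skillset of AI skills and the explore step is served by a managed search index or a knowledge store; the sketch only mirrors the shape of the pipeline.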


What’s new?

Ingest more file formats from more data sources: As announced at our recent Ignite conference, we are expanding the scope of data sources supported by our pull indexers. We have added built-in support for ADLS Gen 2, Cosmos DB Gremlin API, and Cosmos DB Cassandra API. We have also introduced the ability to extract content and metadata from a document as a skill in case you need to apply any transformations to your content before that stage. Also, we've made it possible for you to create custom document cracking skills.


New skills

  • Custom Entity Lookup: This built-in cognitive skill finds user-defined entities in a given text. This is a common scenario to tailor search and exploration to your industry or line of business. There may be certain terms that are especially interesting for you to identify through the enrichment pipeline, such as manufacturing parts or your retail inventory. Our entity lookup skill allows you to provide a list of entities that you care about, and have Cognitive Search find and tag them across all your content.
  • Power skills: In addition to our built-in cognitive skills, power skills are a collection of useful functions that can be deployed with a click of a button into your Azure subscription, and then used as cognitive skills for Azure Cognitive Search. We also invite you to contribute your own work by submitting a pull request.
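
The idea behind custom entity lookup can be illustrated with a short plain-Python sketch. This is not the built-in skill or its configuration format: the function find_entities, the sample part names, and the matching rules (case-insensitive, whole-word) are all assumptions chosen for the example.

```python
import re


def find_entities(text, entities):
    """Return (entity, start, end) for each case-insensitive whole-word match."""
    matches = []
    for entity in entities:
        # re.escape keeps any punctuation in the entity name literal.
        pattern = r"\b" + re.escape(entity) + r"\b"
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            matches.append((entity, m.start(), m.end()))
    # Order tags by where they appear in the text.
    return sorted(matches, key=lambda t: t[1])


# Hypothetical list of manufacturing parts the business cares about.
parts = ["rotor blade", "gearbox"]
text = "Inspect the gearbox before replacing the Rotor Blade."

tags = find_entities(text, parts)
for entity, start, end in tags:
    print(entity, "->", text[start:end])
```

Each tag carries the canonical entity name plus the offsets of the matched text, so variations in casing still map back to the same user-defined entity.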


AI management: As discussed earlier, the biggest pain points in the AI space come with the management and orchestration of several machine-powered services together. Our enrichment pipeline streamlines this process, and we’ve made it even simpler in the latest updates.

  • Code-free skillset management: We added the ability to edit the skillset straight from the Azure portal, and improved the error and warning experience so you can more easily debug skillsets and indexers.
  • Re-indexing: One pain point we heard from developers is that changing a skillset definition required them to re-index all of their content from scratch, and therefore to run every skill all over again. No more! We now support incremental indexing, which has smart reprocessing capabilities: as long as you tell the service to cache intermediate results, it uses that cache to run only the skills that are absolutely necessary, and does not re-run skills that were neither modified nor impacted by a modification. Think of this as an "incremental build" for your knowledge store.
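
The caching idea behind incremental indexing can be sketched in a few lines of Python. This is a simplified model, not the service's implementation: it assumes a linear chain of skills, and the CachedPipeline class, the version numbers, and the sample skills are hypothetical. Each skill's output is cached under a key derived from the skill's name, its definition version, and its input, so a skill runs again only when its definition or its input has changed.

```python
import hashlib
import json


class CachedPipeline:
    """Runs a chain of skills, reusing cached outputs where possible."""

    def __init__(self):
        self.cache = {}     # cache key -> skill output
        self.executed = []  # names of skills actually run on the last call

    def _key(self, name, version, value):
        payload = json.dumps([name, version, value], sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def run(self, skills, document):
        self.executed = []
        value = document
        for name, version, fn in skills:
            key = self._key(name, version, value)
            if key not in self.cache:
                # Cache miss: the skill definition or its input changed.
                self.cache[key] = fn(value)
                self.executed.append(name)
            value = self.cache[key]
        return value


def lowercase(text):
    return text.lower()


def tokenize(text):
    return text.split()


def drop_short(tokens):
    return [t for t in tokens if len(t) > 3]


def drop_very_short(tokens):
    return [t for t in tokens if len(t) > 2]


pipeline = CachedPipeline()
doc = "Incremental indexing reuses cached skill outputs"

# First run: nothing is cached, so every skill executes.
v1 = [("lowercase", 1, lowercase), ("tokenize", 1, tokenize), ("filter", 1, drop_short)]
pipeline.run(v1, doc)
first_run = list(pipeline.executed)

# Second run: only the last skill's definition changed.
v2 = v1[:2] + [("filter", 2, drop_very_short)]
pipeline.run(v2, doc)
second_run = list(pipeline.executed)

print(first_run)   # ['lowercase', 'tokenize', 'filter']
print(second_run)  # ['filter']
```

Because the first two skills and their inputs are unchanged, their cached outputs are reused and only the modified skill runs, which is the "incremental build" behavior described above.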

Explore: At the end of the day, the point of creating a sophisticated Cognitive Search solution is often to be able to apply it to an application. With that in mind, we’ve improved the process of moving from raw content to exploration with the following:

  • Web app builder: We’ve added a custom web app builder so that, within the Azure portal, you can create your own search index and end-user interface seamlessly.



  • Power BI solution templates over Knowledge Store: We've released built-in templates for easily creating a Power BI experience on top of your Knowledge Store. This allows you to rapidly create powerful analytics experiences on top of your enriched documents.
  • Project multimedia into the knowledge store: Up until now, the Cognitive Search knowledge store was limited to text artifacts. Today we’re announcing support for projecting images and other multimedia data types into your knowledge store, which allows you to fully represent knowledge across any data type, not just text.


Getting Started


Version history
Last update: Nov 20 2019 08:38 AM