Cognitive Search is an AI-first approach to content understanding, powered by Azure Search and Cognitive Services. A custom skill is a Cognitive Search capability that extends your knowledge mining solution through REST API calls. Azure Functions is a serverless hosting platform whose Python support reached general availability (GA) on August 19th, 2019.
In this article you will learn how to combine these services to expand your Knowledge Mining solution by adding a Python Custom skill, hosted on Azure Functions, that filters unwanted terms created by Predefined skills such as key phrase or entity extraction. This custom skill prevents those terms from being loaded into your Azure Search index.
Predefined and Custom skills are used within the Azure Cognitive Search Enrichment Pipeline, where Microsoft AI is used to create metadata about your data. To learn more about this process, click here to see the product documentation or here to read our blog posts about it.
Figure 1 - The simplified view of the solution architecture diagram
Why a Python Custom Skill
Python has become the lingua franca of data science, a common tongue that bridges language gaps within and between organizations. Thanks to containerization, community contributions and knowledge, and open source libraries and frameworks, Python code for a custom skill is easy to deploy, migrate, and maintain.
Another key factor in the decision to use Python for a custom skill is that Azure Functions support for it is Generally Available. This allows you to create serverless compute with minimal effort, on-demand performance, and predictable costs. Azure Functions is the default Azure compute option for Cognitive Search custom skills, and you are no longer limited to C# to create one.
Why Azure Functions
Python code can run within containers and has many deployment options, such as Azure Kubernetes Service (AKS) and Azure Container Instances (ACI). Both are flexible and abstract away the infrastructure required to run your containers. Both can also be provisioned and deployed with the Azure Machine Learning (AML) SDK for Python, allowing you to prepare and deploy your code as a web service (REST API) right from Jupyter Notebook, Visual Studio Code, or any other IDE you prefer.
But Azure Functions has key advantages, starting with pricing. The Consumption plan charges you based on per-second resource consumption and number of executions, and it includes a monthly free grant of 1 million requests and 400,000 GB-s of resource consumption. In most cases, this free grant will be enough to run the whole workload. AKS and ACI deployments, by contrast, charge you around the clock, whether they are being used or not, creating unnecessary costs for the scenario in this post: a big initial load followed by small daily usage.
The last decisive Azure Functions feature is automatic protocol management. Azure Search requires HTTPS for custom skills, which adds extra work for AKS or ACI deployments: certificate acquisition and management. Azure Search tests whether the certificate is valid, so there is no way around using a valid certificate. Azure Functions, on the other hand, handles that for you automatically; its default configuration, shown in the image below, is compliant with Azure Search security requirements. It also scales out automatically, an important feature for unpredictable workloads.
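As a rough illustration of the free grant figures above, the sketch below estimates whether a month of custom skill calls stays inside both limits. The workload numbers (document counts, duration, memory) are hypothetical, and the arithmetic is simplified: actual billing rounds memory and duration up, so treat the result as an approximation, not a quote.

```python
# Free grant figures quoted in the text above (Consumption plan, per month).
FREE_EXECUTIONS = 1_000_000
FREE_GB_SECONDS = 400_000


def monthly_usage(executions, avg_seconds, memory_gb):
    """Return (executions, GB-seconds) for a month of calls.

    Simplification: real billing rounds memory and duration up in steps,
    so this is a rough lower bound rather than an exact meter reading.
    """
    return executions, executions * avg_seconds * memory_gb


def fits_free_grant(executions, avg_seconds, memory_gb):
    """True when both meters stay inside the monthly free grant."""
    count, gb_seconds = monthly_usage(executions, avg_seconds, memory_gb)
    return count <= FREE_EXECUTIONS and gb_seconds <= FREE_GB_SECONDS


# Hypothetical workload: 200,000 documents in the initial load plus
# 1,000 new documents a day, 0.5 s per call, 0.25 GB of memory.
calls = 200_000 + 30 * 1_000
print(monthly_usage(calls, 0.5, 0.25))
print(fits_free_grant(calls, 0.5, 0.25))
```

Under these assumed numbers, even the month that contains the initial load uses well under a tenth of the GB-s grant.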
Figure 2 - HTTPS settings
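To make the HTTPS requirement concrete, here is a minimal sketch of how a custom Web API skill entry for a skillset could be assembled in Python, refusing any endpoint that is not HTTPS. The field names follow the documented custom Web API skill shape; the helper name `build_custom_skill`, the input and output names (`terms`, `filteredTerms`), the source path, and the example URL are all assumptions made for illustration.

```python
from urllib.parse import urlparse


def build_custom_skill(uri, input_source="/document/keyPhrases"):
    """Build a custom Web API skill definition for a skillset.

    The input/output names and the default source path are hypothetical;
    adjust them to match your own skillset.
    """
    if urlparse(uri).scheme != "https":
        raise ValueError("custom skill endpoints must use https")
    return {
        "@odata.type": "#Microsoft.Skills.Custom.WebApiSkill",
        "description": "Filter unwanted terms",
        "uri": uri,
        "httpMethod": "POST",
        "timeout": "PT30S",
        "batchSize": 10,
        "context": "/document",
        "inputs": [{"name": "terms", "source": input_source}],
        "outputs": [{"name": "filteredTerms", "targetName": "filteredTerms"}],
    }


# Placeholder URL, not a real endpoint.
skill = build_custom_skill("https://example.azurewebsites.net/api/filter")
print(skill["uri"])
```

The resulting dictionary would be placed in the `skills` array of a skillset definition. An Azure Functions URL already uses HTTPS, so the check passes without any certificate work on your side.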
Code and Deployment
The code of this solution is available in this GitHub repo and here are some important guidelines:
Figure 3 - Details of the Python code
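For readers who want a feel for the logic before opening the repo, the following is a minimal sketch of an unwanted-terms filter that follows the custom skill request and response contract (a `values` list whose records carry a `recordId` and a `data` object). It is not the code from the repository: the stop list, the input name `terms`, and the output name `filteredTerms` are assumptions chosen for illustration.

```python
import json

# Hypothetical stop list; a real skill might load it from a file or app setting.
UNWANTED_TERMS = {"microsoft", "page", "figure"}


def filter_terms(terms, unwanted=UNWANTED_TERMS):
    """Drop unwanted terms, comparing case-insensitively and keeping order."""
    return [t for t in terms if t.strip().lower() not in unwanted]


def run_skill(body):
    """Apply the filter to every record of a custom skill request.

    Each response record echoes the recordId it received, so the
    enrichment pipeline can match results back to documents.
    """
    results = []
    for record in body.get("values", []):
        output = {"recordId": record.get("recordId"), "data": {},
                  "errors": [], "warnings": []}
        try:
            terms = record["data"]["terms"]  # "terms" is an assumed input name
            output["data"]["filteredTerms"] = filter_terms(terms)
        except (KeyError, TypeError, AttributeError) as exc:
            # Report the problem on this record only, so the batch still succeeds.
            output["errors"].append({"message": f"Invalid input: {exc!r}"})
        results.append(output)
    return {"values": results}


request = {"values": [{"recordId": "1",
                       "data": {"terms": ["Azure Search", "Page", "Python"]}}]}
print(json.dumps(run_skill(request), indent=2))
```

Inside an Azure Function, an HTTP-triggered handler could pass the parsed request JSON to `run_skill` and return the serialized result with an `application/json` content type. Keeping the filter in plain functions like these also makes it easy to unit test without the Functions runtime.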
Key Lessons Learned
Here is a list of good practices from our experience when creating this solution for a client:
PowerSkills – Official Custom Skills
Breaking news! For 1-click-to-deploy C# Custom skills, created by the Azure Search Team, click here. This initiative is called PowerSkills and doesn't require previous C# knowledge or software installations. There are Custom Skills for Lookup, Distinct (duplicates removal), and more.
Related Links
Useful links for those who want to know more about Knowledge Mining:
Conclusion
This solution addresses the requirement to remove unwanted terms, cleaning the general entities or key phrases created by Cognitive Search. It can also be easily adapted to keep only wanted terms (lookup) or to remove duplicates (deduplication). Our Knowledge Mining Accelerator (KMA) implements a C# version of the unwanted terms custom skill; you can clone the repo and use that open source code. Stay tuned: we will also publish posts here on how to do filtering and deduplication using Python custom skills!