Azure Cognitive Services provides a suite of AI services and APIs that lets developers work with AI technologies without needing deep expertise in machine learning. This post covers how two of these services, Custom Vision and the Bing Image Search APIs, can be used together with a .NET Core console application to rapidly prototype Custom Vision models.
Custom Vision is a service that lets users build and deploy customised computer vision models using their own image datasets. Training a custom model is simplified because Azure manages the machine learning under the hood; the user only needs to supply the image data. The service also provides a separate user interface that is very simple to understand and use.
The Bing Image Search APIs execute a search query and return a set of matching images, working much like an image search on the web version of Bing. Query filters can also be applied to refine the results, e.g. filtering for specific colours or selecting an image type (photograph, clipart, GIF). The image below shows the Bing Image Search APIs through a visual interface where users can try their own search terms and apply query filters such as image type and content freshness.
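To make the query filters concrete, here is a minimal Python sketch that builds a Bing Image Search v7 request URL and headers without sending them. It is not the solution's .NET code. It assumes the `https://api.bing.microsoft.com/v7.0/images/search` endpoint used by Bing Search resources (older Cognitive Services resources used a different host) and the documented `q`, `count`, `offset`, `imageType`, `color` and `freshness` query parameters. The helper name `build_image_search_request` is hypothetical.

```python
from urllib.parse import urlencode

# Assumed endpoint for Bing Search v7 resources; check the endpoint shown for your resource.
BING_IMAGE_SEARCH_URL = "https://api.bing.microsoft.com/v7.0/images/search"


def build_image_search_request(subscription_key, query, count=35, offset=0,
                               image_type=None, color=None, freshness=None):
    """Return (url, headers) for a Bing Image Search request (hypothetical helper).

    Filters are only added when set, e.g. image_type="Photo", color="Red",
    freshness="Month".
    """
    params = {"q": query, "count": count, "offset": offset}
    if image_type is not None:
        params["imageType"] = image_type
    if color is not None:
        params["color"] = color
    if freshness is not None:
        params["freshness"] = freshness
    url = f"{BING_IMAGE_SEARCH_URL}?{urlencode(params)}"
    # Bing authenticates with the resource key in this header.
    headers = {"Ocp-Apim-Subscription-Key": subscription_key}
    return url, headers


url, headers = build_image_search_request("<bing-key>", "Red Apple", count=50,
                                          image_type="Photo", color="Red")
print(url)
# https://api.bing.microsoft.com/v7.0/images/search?q=Red+Apple&count=50&offset=0&imageType=Photo&color=Red
```

Filters are optional, so the same helper serves both a plain search and a narrowly filtered one.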
To create a Custom Vision model, it is recommended to have at least 50 images for each label before training. Gathering these can be time-consuming, especially when you have no pre-existing dataset and are looking to prototype multiple models. By combining the Bing Image Search APIs with the Custom Vision REST APIs, populating a Custom Vision project with tagged images can be accelerated. Once all the images are in the project and tagged, a model can be trained immediately. This flow is captured in a .NET Core console application that can easily be altered to try different Bing Image Search terms and query filters, to see what results are returned and how to further improve the model. The diagram below shows the flow between the components of this application.
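The core hand-off in this flow is from search results to tagged uploads. The Python sketch below shows one way to do it, as an illustration rather than the solution's actual code. It pulls the `contentUrl` of each image from a Bing response's `value` array and groups the URLs into request bodies for Custom Vision's "create images from URLs" call, tagging every entry with one tag ID. The 64-images-per-call limit is an assumption based on the Training API reference and should be checked against current documentation. The function names are hypothetical.

```python
# Assumed per-call limit for Custom Vision's "create images from URLs" operation.
MAX_IMAGES_PER_BATCH = 64


def extract_content_urls(search_response):
    """Collect full-size image URLs from a Bing Image Search JSON response."""
    return [image["contentUrl"]
            for image in search_response.get("value", [])
            if "contentUrl" in image]


def build_url_batches(image_urls, tag_id, batch_size=MAX_IMAGES_PER_BATCH):
    """Split URLs into request bodies of at most batch_size entries, each tagged with tag_id."""
    batches = []
    for start in range(0, len(image_urls), batch_size):
        chunk = image_urls[start:start + batch_size]
        batches.append({"images": [{"url": url, "tagIds": [tag_id]} for url in chunk]})
    return batches


# Fake response standing in for a real Bing result with 150 images.
fake_response = {"value": [{"contentUrl": f"https://example.com/{i}.jpg"} for i in range(150)]}
urls = extract_content_urls(fake_response)
batches = build_url_batches(urls, "apple-tag-id")
print([len(b["images"]) for b in batches])  # [64, 64, 22]
```

Tagging each entry individually keeps a batch self-describing. It also allows mixed-tag batches later, although this sketch only uses one tag per batch.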
After the necessary resources have been created on Azure, open the solution's console application to specify the tag name and the search term to query in Bing Image Search. In this example, two subjects are set: the first with a tag name of "Apple" and a search term of "Red Apple", and the second with a tag name of "Pear" and a search term of "Green Pear". Next, run the console application and enter the required values, such as the resource keys. The application then carries out the search queries and populates the specified Custom Vision project with the results. Once the application has finished running, the Custom Vision project should contain tagged images of red apples and green pears. To train the model, the user can choose between two options: quick training and advanced training.
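The per-subject loop described above (search, create the tag, upload the tagged results) can be sketched in Python with the HTTP calls injected as plain functions, so the flow runs without Azure keys. This is a hypothetical outline, not the console application's code. `Subject`, `populate_project` and the three callables are illustrative names, and the fakes below stand in for real Bing and Custom Vision requests.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Subject:
    tag_name: str     # tag created in the Custom Vision project, e.g. "Apple"
    search_term: str  # Bing Image Search query, e.g. "Red Apple"


def populate_project(subjects: List[Subject],
                     search_images: Callable[[str], List[str]],
                     create_tag: Callable[[str], str],
                     upload_images: Callable[[List[str], str], None]) -> Dict[str, int]:
    """For each subject: search for images, create its tag, upload the URLs with that tag.

    Returns the number of image URLs uploaded per tag name.
    """
    uploaded: Dict[str, int] = {}
    for subject in subjects:
        urls = search_images(subject.search_term)
        tag_id = create_tag(subject.tag_name)
        upload_images(urls, tag_id)
        uploaded[subject.tag_name] = len(urls)
    return uploaded


subjects = [Subject("Apple", "Red Apple"), Subject("Pear", "Green Pear")]
fake_results = {"Red Apple": ["a1.jpg", "a2.jpg"], "Green Pear": ["p1.jpg"]}
calls = []  # records (tag_id, urls) for each simulated upload
counts = populate_project(
    subjects,
    search_images=lambda term: fake_results[term],
    create_tag=lambda name: f"id-{name}",
    upload_images=lambda urls, tag_id: calls.append((tag_id, urls)),
)
print(counts)  # {'Apple': 2, 'Pear': 1}
```

Adding another fruit only means appending another `Subject` to the list, which mirrors the small code change the post describes.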
Quick training trains the model in a few minutes, which suits quick testing of simpler models.
Advanced training lets you allocate virtual machines for a selected amount of time to train a more in-depth model.
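Both options can also be started from code. In the Custom Vision Training REST API, the train call takes a `trainingType` of `Regular` or `Advanced`, with `reservedBudgetInHours` setting the advanced budget. The Python sketch below only builds that URL. The `v3.3` API version and the helper name `build_train_url` are assumptions; check the Training API reference for the version your resource supports.

```python
from urllib.parse import urlencode


def build_train_url(endpoint, project_id, advanced=False, budget_hours=None,
                    api_version="v3.3"):
    """Build the Custom Vision train URL (hypothetical helper; api_version is assumed).

    Quick training maps to trainingType=Regular. Advanced training maps to
    trainingType=Advanced plus a reserved budget in hours.
    """
    params = {"trainingType": "Advanced" if advanced else "Regular"}
    if advanced:
        if budget_hours is None or budget_hours <= 0:
            raise ValueError("advanced training needs a positive budget in hours")
        params["reservedBudgetInHours"] = budget_hours
    base = endpoint.rstrip("/")
    return (f"{base}/customvision/{api_version}/training/projects/"
            f"{project_id}/train?{urlencode(params)}")


endpoint = "https://myresource.cognitiveservices.azure.com/"
quick = build_train_url(endpoint, "my-project-id")
advanced = build_train_url(endpoint, "my-project-id", advanced=True, budget_hours=1)
print(quick)
# https://myresource.cognitiveservices.azure.com/customvision/v3.3/training/projects/my-project-id/train?trainingType=Regular
```

The request would be sent as a POST with the training key in a `Training-Key` header.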
In this example, a model for distinguishing between apples and pears was trained within 2 minutes of selecting quick training. To test the model, I used a photo of an apple at home, which was correctly identified as an apple. If the user wants to extend the model to include more fruit, this only requires very minor changes to the code. Alternatively, changing the count and offset values when running the console application pulls more images of apples and pears into the project, which can then be used to retrain an updated model.
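Bing Image Search v7 caps `count` at 150 results per request, so pulling in more images means stepping the `offset`. The Python sketch below (hypothetical helper `page_plan`) splits a target number of extra images into `(offset, count)` pages starting from a given offset. In practice Bing may return fewer results than requested, and its response includes a `nextOffset` field that Bing's paging guidance recommends using instead of a precomputed offset.

```python
MAX_COUNT_PER_REQUEST = 150  # Bing Image Search v7 upper limit for `count`


def page_plan(total_images, start_offset=0, page_size=MAX_COUNT_PER_REQUEST):
    """Split a request for total_images results into (offset, count) pages."""
    if not 1 <= page_size <= MAX_COUNT_PER_REQUEST:
        raise ValueError("page_size must be between 1 and 150")
    pages = []
    offset, remaining = start_offset, total_images
    while remaining > 0:
        count = min(page_size, remaining)
        pages.append((offset, count))
        offset += count
        remaining -= count
    return pages


# Fetch 50 more images after the first 50, using Bing's default page size of 35.
print(page_plan(50, start_offset=50, page_size=35))  # [(50, 35), (85, 15)]
```

Starting from the previous run's end offset avoids re-requesting the same result positions, although duplicate images can still appear across pages.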
More detailed steps for running this solution are available in the README of its GitHub repository. The solution is not limited to apples and pears; it can be used for any other classes you have in mind. For example, I have also used it to create a Custom Vision model that classifies 5+ different car models. At the time of writing, this solution can be run on the free tiers of both Custom Vision and the Bing Image Search APIs, so please feel free to try it in your own environment.