This post was co-authored by @JS Tan, @Patrick Buehler, @Anupam Sharma and @Jun Ki Min
In recent years, we've seen extraordinary growth in Computer Vision, with applications in image understanding, search, mapping, semi-autonomous or autonomous vehicles and many more.
The ability for models to understand actions in a video, a task that was unthinkable just a few years ago, is now something that we can achieve with relatively high accuracy and in near real-time.
(Figure: Action Recognition)
However, the field is not particularly welcoming to newcomers. Without prior experience or guidance, building an accurate classifier can easily take weeks. Unless you're ready to spend a long time learning computer vision, it's extremely hard to master the basics, let alone explore some of the cutting-edge technologies in the field. Even for computer vision experts, building a quick Proof of Concept (POC) can be nontrivial and could easily take many days.
At Microsoft, we have been working for many years on diverse Computer Vision solutions for our customers, and we have collected our learnings in our new public Microsoft repository: https://github.com/microsoft/ComputerVision-recipes.
The goal of this repository is to provide examples and best-practice guidelines for building computer vision systems on Azure, and to share them with the open-source community. More specifically, we wanted a repository that helps us rapidly provide solutions to the community and to the customers we work with, and that helps onboard new team members who have data science expertise but no specific computer vision experience. From mastering some of the most common scenarios in the field, like image classification, object detection, and image similarity, to exploring cutting-edge scenarios like action recognition and crowd counting, this repo will guide you through building models, fine-tuning them, and using them in real-world scenarios.
We're kicking off our repo with 5 scenarios:
| Scenario | Support | Description |
| --- | --- | --- |
| Image Classification | Base | Image Classification is a way to learn and predict the category of a given image. (Ex: Is the picture of a ‘dog’ or a ‘cat’?) |
| Image Similarity | Base | Image Similarity is a way to compute a similarity score given a pair of images. Given an image, it allows you to identify the most similar images in a dataset. (Ex: This picture of a dog is the most like which of the following images of animals?) |
| Object Detection | Base | Object Detection is a supervised machine learning technique that allows you to detect where in a given image an object of interest is. (Ex: Where in the image are there animals?) |
| Action Recognition | Contrib | Action Recognition identifies which actions are performed in video footage and their respective start and end times. (Ex: When is there someone drinking in the video?) |
| Crowd Counting | Contrib | Crowd Counting uses supervised machine learning techniques to count the number of people in an image. It applies to both low-density crowds (e.g. fewer than 50 people) and high-density crowds (e.g. thousands of people). (Ex: How many pedestrians are in this image of a street?) |
Rather than creating implementations from scratch, we draw from popular state-of-the-art libraries (e.g. fast.ai and torchvision), and we build additional utilities for loading image data, optimizing models, and evaluating models. In addition, we aim to answer frequently asked questions, explain the intuitions behind deep learning, and highlight common pitfalls.
Whether you are an expert in computer vision or just getting your feet wet, we believe this repository offers something for you. For beginners, this repo will guide you through building a state-of-the-art model and help you develop an intuition for the craft. For experts, this repository can quickly get you to a strong baseline model that is easy to extend with custom Python/PyTorch code. The repository also aims to provide support for 1) the full data science process, and 2) the tooling to succeed on Azure.
We hope that these examples and utilities will make it easier and faster for developers to create custom vision applications.
The Computer Vision Recipes GitHub repository shows you how to approach the five key steps of the data science process and provides utilities to enrich each of the steps.
Inside the computer vision recipes repo, we have added many utilities to support common tasks such as loading datasets in the format expected by different algorithms, splitting data into training and test sets, and evaluating model outputs.
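Splitting data is one of the simpler utilities to picture. The sketch below is a hypothetical, standard-library-only random train/test split with a fixed seed so the split is reproducible; it is not the repository's implementation, which may differ (for example, by stratifying per class).

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def train_test_split(
    items: Sequence[T], test_fraction: float = 0.2, seed: int = 42
) -> Tuple[List[T], List[T]]:
    """Shuffle a copy of `items` and cut it into (train, test) lists."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    shuffled = list(items)
    # A local RNG keeps the split reproducible without touching global random state.
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


image_paths = [f"images/{i:03d}.jpg" for i in range(10)]
train, test = train_test_split(image_paths, test_fraction=0.3)
print(len(train), len(test))  # 7 3
```

Fixing the seed matters in practice: if the split changes between runs, metrics from different experiments are no longer computed on the same test images.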
This computer vision repository also has deep integration with the Azure Machine Learning service to complement your work locally. We provide code examples on how you can optionally and easily scale your training into the cloud, and how you can deploy your models for production workloads.
Azure Cognitive Services
Note that for certain computer vision problems, you may not need to build your own models. Instead, pre-built or easily customizable solutions exist that do not require any custom coding or machine learning expertise.
Before using the Computer Vision repository, we strongly recommend evaluating whether these services can sufficiently solve your problem.
To give you a sense of how you can use our repo to build a state-of-the-art (SOTA) model, here is a preview of how simple it is to create an Object Detection model. Of course, you can go much deeper and add custom PyTorch code, but getting started is as simple as this:
1. Load your data
The first step is to load your data – we help you do this with a simple object that automatically parses your data and the annotations:
```python
from utils_cv.detection.data import DetectionLoader
# Parse the images and their annotations from a local folder
data = DetectionLoader("path/to/data")
```
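To illustrate what "parsing the annotations" involves, here is a minimal sketch that reads bounding boxes from one annotation in the Pascal VOC XML layout, a common format for detection datasets. Whether `DetectionLoader` expects exactly this layout is an assumption; the `parse_voc_annotation` helper, the `Box` class, and the sample XML are hypothetical.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Box:
    label: str
    xmin: int
    ymin: int
    xmax: int
    ymax: int


def parse_voc_annotation(xml_text: str) -> Tuple[str, List[Box]]:
    """Return the image filename and every labelled box in one VOC-style XML document."""
    root = ET.fromstring(xml_text.strip())
    filename = root.findtext("filename", default="")
    boxes = []
    for obj in root.iter("object"):
        bndbox = obj.find("bndbox")  # assumes every <object> has a <bndbox>
        # VOC coordinates are sometimes written as floats, so parse via float()
        coords = [int(float(bndbox.findtext(tag))) for tag in ("xmin", "ymin", "xmax", "ymax")]
        boxes.append(Box(obj.findtext("name", default=""), *coords))
    return filename, boxes


sample = """
<annotation>
  <filename>1.jpg</filename>
  <object>
    <name>dog</name>
    <bndbox><xmin>100</xmin><ymin>173</ymin><xmax>233</xmax><ymax>521</ymax></bndbox>
  </object>
</annotation>
"""
filename, boxes = parse_voc_annotation(sample)
print(filename, boxes)
```

A loader would typically run this over every annotation file and pair each result with its image, so that the model receives images and target boxes together.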
2. Train/fine-tune your model
Then we create a 'learner' object that helps you manage and train your model. By default, it uses torchvision's Faster R-CNN model, but you can easily swap it out.
```python
from utils_cv.detection.model import DetectionLearner
# Wrap the data in a learner (torchvision's Faster R-CNN by default) and train it
detector = DetectionLearner(data)
detector.fit()
```
3. Evaluate
Finally, let's evaluate our model using the built-in helper functions. We can look at the precision-recall curves to get a sense of how our model is performing.
```python
from utils_cv.detection.plot import plot_pr_curves
# Evaluate the trained detector (avoid naming the result `eval`, a Python built-in)
evaluation = detector.evaluate()
plot_pr_curves(evaluation)
```
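A precision-recall curve is straightforward to compute once each detection has been marked as a true or false positive, typically by requiring its intersection-over-union (IoU) with a not-yet-matched ground-truth box to reach a threshold such as 0.5. The sketch below shows that computation in plain Python for a single class and a single image. It is not the code behind `detector.evaluate()`, and its helper names are hypothetical; its average precision is the simple, non-interpolated area under the curve, whereas benchmark protocols such as Pascal VOC and COCO use interpolated variants and slightly different matching rules.

```python
from typing import List, Sequence, Tuple

BoxT = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def iou(a: BoxT, b: BoxT) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def pr_curve(
    detections: Sequence[Tuple[float, BoxT]],
    ground_truth: Sequence[BoxT],
    iou_threshold: float = 0.5,
) -> List[Tuple[float, float]]:
    """Return (recall, precision) after each detection, visiting highest scores first.

    Each detection is matched to the best-overlapping ground-truth box that is still
    unmatched; a detection with no qualifying match counts as a false positive.
    """
    matched = [False] * len(ground_truth)
    tp = fp = 0
    curve = []
    for _, box in sorted(detections, key=lambda d: d[0], reverse=True):
        best_iou, best_idx = 0.0, -1
        for i, gt in enumerate(ground_truth):
            overlap = iou(box, gt)
            if not matched[i] and overlap > best_iou:
                best_iou, best_idx = overlap, i
        if best_idx >= 0 and best_iou >= iou_threshold:
            matched[best_idx] = True
            tp += 1
        else:
            fp += 1
        recall = tp / len(ground_truth) if ground_truth else 0.0
        curve.append((recall, tp / (tp + fp)))
    return curve


def average_precision(curve: List[Tuple[float, float]]) -> float:
    """Non-interpolated area under the curve: sum of precision times each recall step."""
    ap, prev_recall = 0.0, 0.0
    for recall, precision in curve:
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


ground_truth = [(0, 0, 10, 10), (20, 20, 30, 30)]
detections = [(0.9, (0, 0, 10, 10)), (0.8, (50, 50, 60, 60)), (0.7, (21, 21, 30, 30))]
curve = pr_curve(detections, ground_truth)
print(curve, average_precision(curve))
```

In this toy example, the high-scoring false positive at 0.8 drags precision down before the third detection recovers full recall, which is exactly the kind of behavior a plotted PR curve makes visible.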
As we continue to build out our repository, we will be looking for new computer vision scenarios to unlock. Feel free to reach out to cvbp@microsoft.com or post an issue if you would like us to cover a scenario.