In the ever-evolving world of artificial intelligence and machine learning, the ability to process images swiftly and accurately stands at the forefront of technological advancement. Azure Machine Learning (AzureML) and Azure AI Studio, Microsoft's cutting-edge machine learning platforms, have consistently been at the helm of such innovations. In our latest update, we've introduced an exciting advancement: the availability of SAM (Segment Anything Model) models in the Azure AI model catalog. Users now have the flexibility to create their own SAM endpoints, tailored to their specific project needs. This model is a game-changer in the realm of image processing, particularly for generating segmentation masks in scenarios where extensive datasets are unavailable.
Traditionally, generating accurate segmentation masks has required large, comprehensive datasets. However, many real-world applications lack this luxury, especially when dealing with niche or rapidly changing subjects. This is where SAM steps in, offering a novel solution by leveraging bounding box data from object detection (OD) models.
In this blog, we delve into the applications of SAM in scenarios where sufficient data for training a traditional segmentation model is not available.
Throughout this article, we will provide a comprehensive step-by-step guide, from training OD models with constrained datasets to deploying these models effectively. Our aim is to illustrate how SAM can be harnessed to achieve precise image segmentation, especially in contexts where data limitations have traditionally hindered the development of effective segmentation models. Whether you're well-versed in AI or just beginning your journey, this guide will offer a clear understanding of SAM's capabilities on the Azure Machine Learning platform and its transformative potential for your next image segmentation project.
SAM, or the Segment Anything Model, introduced by Meta, stands at the forefront of a new era in computer vision. The Segment Anything task involves identifying and isolating specific objects within an image, regardless of the object type or the image domain.
What sets SAM apart is its ability to understand and respond to a variety of input prompts. These prompts can be as simple as foreground/background points, rough boxes, or masks around the object of interest. Additionally, SAM can interpret freeform text and interactive clicks, allowing users to specify or refine the object to be segmented with ease. This adaptability makes SAM a highly versatile tool, capable of catering to a wide array of segmentation needs.
For more detailed information about this model, visit Segment-Anything.com.
The overall workflow involves the following steps:
1. Fine-tune and deploy an OD (Object Detection) model on Azure Machine Learning/Azure AI Studio.
2. Run inference against the deployed OD model endpoint to obtain bounding box prompts.
3. Deploy the SAM model.
4. Run inference against the SAM model with the bounding box prompts generated by the OD model.
For a practical walkthrough, refer to our detailed Jupyter notebook, which complements this blog. It's a great resource for those who want to dive into the code and see these steps in action. Access it here: Jupyter Notebook for SAM Segmentation Masks.
We will start by fine-tuning an Object Detection (OD) model using the odFridgeObjects dataset, a collection of 128 images featuring four types of beverage containers (can, carton, milk bottle, water bottle) photographed against different backgrounds.
For our current task, we have selected YoloF (identified as 'mmd-3x-yolof_r50_c5_8x8_1x_coco') from the model catalog. However, users have the freedom to choose any Object Detection model from the Model Catalog. In the Azure AI model catalog, we have curated a selection of OD models that are optimized with a rich set of defaults, offering good out-of-the-box performance for a diverse range of datasets. In addition to the curated models, you can use any model from OpenMMLab's MMDetection Model Zoo. This flexibility and ease of use opens up a wealth of possibilities for users to tailor their projects to their specific requirements.
Please check out this blog for a comprehensive overview of the vision models available in the Azure AI model catalog.
To fine-tune our model, we'll follow the comprehensive guide outlined in a dedicated Jupyter notebook, which details how to fine-tune, deploy, and run inference on models from OpenMMLab's MMDetection model zoo within Azure Machine Learning.
Next, we'll run inference on the deployed Object Detection (OD) model to generate bounding box prompts. These prompts effectively highlight the areas of interest within the images, marking out the specific regions for subsequent segmentation.
Here's how the online endpoint for the fine-tuned YoloF model looks after the model has been successfully fine-tuned and deployed:
To ensure successful inference with the YoloF model, it's crucial that both the input and output formats align with what is expected by the deployed YoloF endpoint. For guidance on the correct formats, you can refer to the sample inputs and outputs provided in the YoloF model card. Additionally, the notebook mentioned earlier offers detailed instructions and examples to help you achieve successful inference with YoloF.
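As a rough sketch of what this inference call might look like with the azure-ai-ml Python SDK: the endpoint name, deployment name, image path, and request schema below are placeholders for illustration, so the sample input in the YoloF model card remains the source of truth.

import base64
import json
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

# Connect to the workspace; subscription, resource group, and workspace names are placeholders
ml_client = MLClient(
    credential=DefaultAzureCredential(),
    subscription_id="<SUBSCRIPTION_ID>",
    resource_group_name="<RESOURCE_GROUP>",
    workspace_name="<WORKSPACE_NAME>",
)

# Base64-encode a test image and write a request file; the payload schema below is
# an assumption, verify it against the sample input in the YoloF model card
with open("sample_image.jpg", "rb") as f:
    image_b64 = base64.encodebytes(f.read()).decode("utf-8")

request = {"input_data": {"columns": ["image"], "index": [0], "data": [image_b64]}}
with open("od_request.json", "w") as f:
    json.dump(request, f)

# Invoke the deployed OD endpoint; the response contains bounding boxes, labels, and scores
response = ml_client.online_endpoints.invoke(
    endpoint_name="yolof-od-endpoint",    # placeholder endpoint name
    deployment_name="yolof-deployment",   # placeholder deployment name
    request_file="od_request.json",
)
detections = json.loads(response)
print(detections)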
Now, let's deploy SAM as an online endpoint in our AzureML/Azure AI Studio workspace so it can process inputs and create segmentation masks.
You can deploy SAM in your project effortlessly: by coding with the SDK, guided by this reference notebook; by using the CLI, guided by this example; or by using the Azure Machine Learning/Azure AI Studio UI for a seamless no-code experience. We recommend checking the SAM model card for the required input and output formats before starting.
The Azure AI model catalog currently offers three versions of the SAM model: 'facebook-sam-vit-base', 'facebook-sam-vit-large', and 'facebook-sam-vit-huge'. For our experiment, we have chosen 'facebook-sam-vit-huge' to balance accuracy, compute requirements, and inference latency. However, you have the flexibility to select the model version that best aligns with your project's needs, whether that means prioritizing higher accuracy, available computational resources, or faster inference times.
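If you prefer the SDK route, here is a minimal sketch of deploying 'facebook-sam-vit-huge' from the model catalog to a managed online endpoint. The registry path, model version, endpoint name, and instance type shown are assumptions for illustration; the reference notebook and the SAM model card remain the authoritative guides.

from azure.ai.ml import MLClient
from azure.ai.ml.entities import ManagedOnlineEndpoint, ManagedOnlineDeployment
from azure.identity import DefaultAzureCredential

ml_client = MLClient(
    credential=DefaultAzureCredential(),
    subscription_id="<SUBSCRIPTION_ID>",
    resource_group_name="<RESOURCE_GROUP>",
    workspace_name="<WORKSPACE_NAME>",
)

# Reference the SAM model from the model catalog; the registry name and version are assumptions,
# copy the exact asset ID from the facebook-sam-vit-huge model card
model_id = "azureml://registries/azureml/models/facebook-sam-vit-huge/versions/1"

# Create the online endpoint
endpoint = ManagedOnlineEndpoint(name="sam-vit-huge-endpoint", auth_mode="key")
ml_client.online_endpoints.begin_create_or_update(endpoint).result()

# Create a deployment behind the endpoint; pick a GPU SKU based on your quota and latency needs
deployment = ManagedOnlineDeployment(
    name="sam-deployment",
    endpoint_name="sam-vit-huge-endpoint",
    model=model_id,
    instance_type="Standard_NC6s_v3",  # placeholder SKU
    instance_count=1,
)
ml_client.online_deployments.begin_create_or_update(deployment).result()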
The final step in our image segmentation process is to use SAM for inference. We start by converting the bounding box prompts from our Object Detection model into a format compatible with SAM, as detailed in the SAM model card. Typically, OD model bounding boxes are normalized ('topX', 'topY', 'bottomX', 'bottomY'), but SAM requires absolute coordinate values. This conversion is essential for SAM to generate precise segmentation masks.
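As a small illustration, a helper like the one below converts normalized boxes to absolute pixel coordinates. It assumes the OD endpoint returns boxes normalized to [0, 1] under 'topX', 'topY', 'bottomX', 'bottomY' keys, as described above; the surrounding dictionary layout is a placeholder, so adapt it to your endpoint's actual response.

from PIL import Image

def to_absolute_boxes(detections, image_path):
    """Convert normalized OD bounding boxes to absolute [x1, y1, x2, y2] pixel coordinates."""
    width, height = Image.open(image_path).size
    absolute_boxes = []
    for detection in detections:
        box = detection["box"]  # assumed key layout; check your OD endpoint's response
        absolute_boxes.append([
            box["topX"] * width,
            box["topY"] * height,
            box["bottomX"] * width,
            box["bottomY"] * height,
        ])
    return absolute_boxes

# Usage example with a hypothetical detection
detections = [{"box": {"topX": 0.25, "topY": 0.10, "bottomX": 0.60, "bottomY": 0.85}, "label": "milk_bottle", "score": 0.97}]
print(to_absolute_boxes(detections, "sample_image.jpg"))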
When inputting data into SAM, ensure it matches the format specified in the SAM model card. Depending on your needs, set the 'multimask_output' variable accordingly. Setting it to True provides multiple masks for each bounding box, allowing for varied segmentation options. If set to False, SAM generates a single mask per prompt.
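To make this concrete, here is a sketch of what a SAM request body could look like once the boxes are in absolute coordinates. The column names ('image', 'input_boxes', 'multimask_output') and the overall nesting are assumptions for illustration; copy the exact schema from the sample input in the SAM model card.

import base64
import json

# Base64-encode the same image that produced the bounding boxes
with open("sample_image.jpg", "rb") as f:
    image_b64 = base64.encodebytes(f.read()).decode("utf-8")

# Absolute-coordinate boxes produced by the conversion helper above
absolute_boxes = [[120.0, 48.0, 288.0, 408.0]]

# Assumed payload schema; verify the column names against the SAM model card
sam_request = {
    "input_data": {
        "columns": ["image", "input_boxes", "multimask_output"],
        "index": [0],
        "data": [[image_b64, json.dumps(absolute_boxes), False]],  # False -> one mask per box
    }
}
with open("sam_request.json", "w") as f:
    json.dump(sam_request, f)
# Invoke the SAM endpoint with this file, e.g. ml_client.online_endpoints.invoke(..., request_file="sam_request.json")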
The output is structured as a JSON response, which includes the encoded binary mask and the Intersection over Union (IoU) score for each mask. This format provides a clear and comprehensive view of the segmentation results, allowing for easy interpretation and application in subsequent processes.
After obtaining the output from the SAM model, the subsequent action involves decoding the encoded binary mask. Utilize the following code snippet to efficiently convert and store the generated binary mask:
import base64
import io
from PIL import Image

def save_image_from_base64(base64_string, save_path):
    """Decode a base64-encoded mask and save it as a PNG image."""
    # Decode the base64 string into raw binary data
    image_data = base64.b64decode(base64_string)
    # Wrap the binary data in a file-like object
    image_file = io.BytesIO(image_data)
    # Open the image file using PIL
    image = Image.open(image_file)
    # Save the image to disk
    image.save(save_path, format='PNG')  # You can change the format if necessary

# Usage example
base64_string = 'your_base64_string_here'  # Replace with your base64 string
save_path = 'path_to_save_image/image_name.png'  # Replace with your desired save path and file name
save_image_from_base64(base64_string, save_path)
Finally, we can store the segmentation masks corresponding to each bounding box prediction made by the OD model for various objects in our images.
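Tying these pieces together, a short loop like the one below could walk over the SAM response and save each decoded mask using the helper above. The response keys shown ('predictions', 'encoded_binary_mask', 'iou_score') are illustrative placeholders, so inspect your endpoint's actual output before relying on them.

import json
import os

# 'response' is the raw JSON string returned by the SAM endpoint
sam_output = json.loads(response)
os.makedirs("masks", exist_ok=True)

# Iterate over the masks for each bounding box prompt and save them to disk;
# the key names below are placeholders, check your endpoint's actual response
for i, prediction in enumerate(sam_output):
    for j, mask_info in enumerate(prediction["predictions"]):
        save_image_from_base64(mask_info["encoded_binary_mask"], f"masks/prediction_{i}_mask_{j}.png")
        print(f"mask {i}-{j}: IoU score {mask_info['iou_score']}")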
In our analysis, we utilized an 80:20 split for training and testing data with the odFridgeObjects dataset. After training, the bounding boxes generated by the 'YoloF' Object Detection model were fed into the SAM model to create segmentation masks. We then conducted a comprehensive evaluation of these SAM-generated masks against the ground truth using the test split of the dataset. To assess the quality of the masks, we employed metrics such as Intersection over Union (IoU) and Accuracy. Below are the results and insights derived from this evaluation process.
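For reference, here is one possible way to compute per-mask IoU and pixel accuracy with NumPy, assuming the predicted and ground-truth masks are boolean arrays of the same shape.

import numpy as np

def mask_iou(pred_mask, gt_mask):
    """Intersection over Union between two boolean masks."""
    intersection = np.logical_and(pred_mask, gt_mask).sum()
    union = np.logical_or(pred_mask, gt_mask).sum()
    return intersection / union if union > 0 else 0.0

def mask_accuracy(pred_mask, gt_mask):
    """Fraction of pixels where the predicted mask matches the ground truth."""
    return (pred_mask == gt_mask).mean()

# Usage example with small toy masks
pred = np.zeros((4, 4), dtype=bool); pred[1:3, 1:3] = True
gt = np.zeros((4, 4), dtype=bool); gt[1:3, 1:4] = True
print(mask_iou(pred, gt), mask_accuracy(pred, gt))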
The table shows the different evaluation metrics and their values:
Metric           | Value
-----------------|--------------------
Average IoU      | 0.951422254326566
Average Accuracy | 0.9280402048325956
As we wrap up our exploration of using the SAM model for advanced image segmentation, it's exciting to note the high evaluation metrics we achieved without training a traditional segmentation model. By harnessing the combination of an OD model and SAM, we've navigated scenarios with limited data and still attained impressive results. This method not only showcases significant progress in computer vision but also demonstrates the practicality and effectiveness of the OD+SAM approach. We hope this technique proves to be an asset to your next project.
Remember, the journey does not end here. Azure Machine Learning and Azure AI Studio offer a rich catalog of vision models, each with unique capabilities and applications. We encourage you to explore these models and discover how they can further enhance your machine learning endeavors. Happy experimenting, and we look forward to seeing the innovative ways you apply these technologies in your work!
Learn more: