It’s the holiday season, and you may want to send greetings and beautiful photos to your loved ones, friends, and co-workers. What if there were a tool that let you edit an existing image exactly the way you want? You describe the change in natural language, and the new image is generated automatically. Isn’t that cool!
This blog talks about how to edit an image using Azure OpenAI and Azure Machine Learning (AzureML) models.
Goal: edit a specified portion of an existing image while leaving the rest of the image untouched.
Section 1. Overall steps:
- Generate a mask on the existing image. I will introduce five ways to generate masks in the sections below. One of them uses a prompt to describe the area you want to change; this approach needs a genAI model, and I use GPT-4o in my example. The mask is then generated from the model's response.
Note: For using SAM (Segment Anything Model), please reference the github repo here: https://github.com/Azure/gen-cv/blob/main/dalle2-api/DALLE2-Segment-Anything-edits.ipynb
- Edit the image in the masked area. I use the runwayml-stable-diffusion-inpainting model in my example. You enter a prompt describing how the image should be changed, and the new image is generated.
The code example is here: https://github.com/Azure/gen-cv/blob/main/deploy-stable-diffusion-on-azure-ml/image_editing.py
To run the code: python image_editing.py orig_image.png mask_image.png final_image.png
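Under the hood, the script sends the image, the mask, and your prompt to the AzureML inpainting endpoint. As a rough sketch of that request, here is one way to build the JSON payload; the field layout below is an assumption for illustration, so check the scoring contract of your own deployment before using it:

```python
import base64
import json

def build_inpaint_request(image_bytes, mask_bytes, prompt):
    """Build a JSON payload for an AzureML inpainting endpoint.

    The exact schema depends on your deployment; this "input_data"
    layout is an assumption -- verify it against your endpoint's
    scoring contract.
    """
    return json.dumps({
        "input_data": {
            "columns": ["image", "mask_image", "prompt"],
            "data": [[
                base64.b64encode(image_bytes).decode("utf-8"),
                base64.b64encode(mask_bytes).decode("utf-8"),
                prompt,
            ]],
        }
    })

# Usage sketch: read the files passed on the command line, then POST
# the payload to the endpoint URL with the api_key as a bearer token.
# payload = build_inpaint_request(open("orig_image.png", "rb").read(),
#                                 open("mask_image.png", "rb").read(),
#                                 "replace with kids' toys")
```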
Now, let me go over the process step by step in detail.
Section 2. Preparation.
In Azure OpenAI Service, deploy a GPT-4o model, then get its endpoint and api_key.
In AzureML studio, deploy the runwayml-stable-diffusion-inpainting model, then get its endpoint URL and api_key.
Visit Azure AI Vision Studio. I will explain later how to integrate these services into image editing.
To achieve good results, prepare the image as a square.
This is the image I’m going to use as an example; it is 720×720 pixels.
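If your photo is not already square, a simple center-crop gets it into shape. This is a minimal sketch using numpy (the author's script may prepare the image differently):

```python
import numpy as np

def center_square(img: np.ndarray) -> np.ndarray:
    """Center-crop an H x W (x C) image array to a square.

    The inpainting model works best on square inputs, so crop to the
    shorter side before resizing to the target resolution (e.g. 720x720).
    """
    h, w = img.shape[:2]
    side = min(h, w)
    top = (h - side) // 2
    left = (w - side) // 2
    return img[top:top + side, left:left + side]
```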
Section 3. Mask generation.
I will introduce five methods for generating a mask.
3.1. Method 1: Use prompt to generate mask.
To generate a mask that accurately covers the desired area, it helps to know the size of the existing image in pixels. Here is my example prompt:
image size 720*720-pixel, y axis is top down, please generate a polygon covering the right side TV and underneath area including decors and cabinets, and the plant at the right corner.
The GPT-4o model will find the points of the polygon, but its response includes some explanatory text alongside the polygon coordinates.
You then enter another prompt to get the numpy array only. Here is the second prompt:
Please provide the polygon in numpy array format in a single row without comments
Response:
Once you verify that the numpy format is correct and enter 'yes', the mask is generated.
Next, you can enter a prompt for editing the image.
Prompt: replace with kids' toys
Below are the original image, mask image and edited image:
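Before accepting the model's reply, it is worth validating that the pasted text really is an N×2 array of points. This is a small sketch of such a check (the helper name is mine, not from the linked script):

```python
import ast
import numpy as np

def parse_polygon(text: str) -> np.ndarray:
    """Safely parse a polygon pasted as a Python-style literal, e.g.
    '[[400, 100], [700, 100], [700, 600], [400, 600]]'.

    ast.literal_eval avoids executing arbitrary code from the
    model's response; we then check the shape is N x 2.
    """
    pts = np.array(ast.literal_eval(text))
    if pts.ndim != 2 or pts.shape[1] != 2:
        raise ValueError("expected an N x 2 array of (x, y) points")
    return pts
```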
3.2. Method 2: Use mouse click to generate mask.
In this method, the image opens in a window. You use the mouse to click the points you want to include in the polygon, then press the Enter key to close the polygon, and the mask is generated.
Prompt for editing the image: a big basket with colorful flowers
Here are the original image, mask image and edited image:
3.3. Method 3: Reverse existing foreground matting.
You can use Azure AI Vision Studio to get a foreground matting.
Then save the foreground matting image and provide its path at the prompt; the mask is generated by reversing the black and white areas.
Prompt for editing the image: big movie posters
Below are foreground matting, mask and edited image.
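The reversal itself is a simple pixel swap: the white foreground of the matting becomes the black (preserved) region of the mask, and the background becomes the white (editable) region. A minimal sketch with numpy (function name and threshold are my own choices):

```python
import numpy as np

def invert_matting(matting: np.ndarray, threshold: int = 128) -> np.ndarray:
    """Turn a foreground matting (white = foreground, black = background)
    into an inpainting mask (white = area to repaint, black = keep).

    A grayscale threshold binarizes the matting first, since mattings
    can contain soft gray edges.
    """
    gray = matting if matting.ndim == 2 else matting.mean(axis=2)
    return np.where(gray >= threshold, 0, 255).astype(np.uint8)
```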
3.4. Method 4: Bring your own mask.
If you have an existing mask image file, just use it.
Prompt for editing the image: sound insulated ceiling
Below are the original image, mask image and edited image.
3.5. Method 5: Create mask with points coordinates.
If you already have a mask polygon in numpy array format, you can enter it at the prompt.
Prompt for editing the image: Steinway baby grand piano
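Rasterizing such a polygon into a black-and-white mask image is straightforward with Pillow. This is a sketch under the convention that white marks the area to repaint (the example coordinates are hypothetical):

```python
import numpy as np
from PIL import Image, ImageDraw

def polygon_to_mask(polygon, size=(720, 720)):
    """Rasterize a polygon (an N x 2 sequence of (x, y) points) into a
    grayscale mask: white (255) = area to repaint, black (0) = keep."""
    mask = Image.new("L", size, 0)
    points = [tuple(map(int, p)) for p in np.asarray(polygon)]
    ImageDraw.Draw(mask).polygon(points, fill=255)
    return mask

# Hypothetical polygon covering the right part of a 720x720 image:
mask = polygon_to_mask([[400, 100], [700, 100], [700, 600], [400, 600]])
# mask.save("mask_image.png")
```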
3.6. Other options.
Another option worth mentioning: you can use the Azure AI Vision object detection service to get object bounding boxes, as shown in the screenshot below.
You then convert the box coordinates into numpy array format and use Method 5 above to generate the mask.
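The conversion is just expanding a box into its four corners. Assuming the box comes back as x, y, width, height (check the response schema of your Vision API version), a sketch looks like this:

```python
import numpy as np

def bbox_to_polygon(x, y, w, h):
    """Convert a bounding box given as (x, y, width, height) into the
    four-corner polygon format used by the mask generator, ordered
    clockwise from the top-left corner."""
    return np.array([
        [x, y],
        [x + w, y],
        [x + w, y + h],
        [x, y + h],
    ])
```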
Acknowledgements:
Thanks to Takuto Higuchi, Anton Slutsky, and Vincent Houdebine for reviewing this blog.
Updated Nov 27, 2024
Version 1.0
Helen_Zeng, Microsoft
AI - Machine Learning Blog