Vision Fine tuning for advanced use cases
We are excited to introduce a groundbreaking feature in the Azure OpenAI Service that allows you to fine-tune models with images in your JSONL files. This enhancement opens new possibilities for creating more dynamic and interactive AI applications.
Fine-Tuning with Images
You can now include images in your training data, just as you can send image inputs to chat completions. Images can be provided either as publicly accessible URLs or data URIs containing base64 encoded images. This feature allows you to create more comprehensive training datasets that include visual elements, enhancing the model's ability to understand and generate content based on images.
Use cases
- In the retail and e-commerce sector, vision fine-tuning can significantly enhance product recommendations by analyzing images of products that customers have viewed or purchased. This leads to higher conversion rates and increased customer loyalty by creating personalized shopping experiences. Additionally, automating the tagging and categorization of product images simplifies inventory management, especially for large inventories.
- In agriculture, fine-tuning models with images of crops can help identify diseases, pests, and nutrient deficiencies early, saving crops and reducing losses. This is particularly effective when combined with drones and satellite imagery for large-scale monitoring. For example, a model fine-tuned with images of different stages of crop growth can provide insights into the health and development of the crops.
- In the manufacturing industry, vision fine-tuning is invaluable for quality control and defect detection. By training models with images of products at various stages of production, manufacturers can identify specific defects such as cracks, misalignments, or surface imperfections early in the process. This ensures that only high-quality products reach the market, reducing waste and improving efficiency.
- For security and surveillance, fine-tuning models with images from security cameras enhances the ability to detect and recognize suspicious activities or objects. This is particularly useful in monitoring public spaces, airports, and critical infrastructure. Integrating these models with other security systems, such as alarms or access control, provides a more comprehensive security solution.
- In healthcare, beyond diagnosing diseases from medical images, vision fine-tuning can be used to monitor patient progress over time. For instance, models can be trained with images of wounds or skin conditions to track healing and provide recommendations for treatment. This continuous monitoring helps healthcare providers offer personalized care and improve patient outcomes. Additionally, the potential for remote consultations and telemedicine can be highlighted, making healthcare more accessible.
These use cases demonstrate the versatility and potential of vision fine-tuning across various industries.
Image Dataset Requirements
To ensure the best performance and compliance, there are specific requirements for your image datasets:
- Size: Your training file can contain up to 50,000 examples with images, with each example having a maximum of 64 images. Each image can be up to 10 MB.
- Format: Images must be in JPEG, PNG, or WEBP format and in RGB or RGBA mode. Images cannot be included as output from messages with the assistant role.
- Content Moderation: Images are scanned before training to ensure compliance with our usage policy. Images containing people, faces, or CAPTCHAs will be excluded from the dataset.
Handling Skipped Images
If your images are skipped during the training process, it could be due to several reasons such as containing CAPTCHAs, people, faces, inaccessible URLs, large file sizes, invalid mode or invalid formats. Ensure your images meet the specified requirements to avoid these issues.
Uploading Large Files
For large training files, you can upload files up to 8 GB in multiple parts using the Uploads API. This is particularly useful for extensive datasets that exceed the 512 MB limit of the Files API.
Reducing Training Costs
To optimize training costs, you can set the detail parameter for an image to low, which resizes the image to 512 by 512 pixels and represents it by 85 tokens regardless of its size. This reduces the cost of training while maintaining the quality of the model.
Additional Considerations
To control the fidelity of image understanding, you can set the detail parameter of image_url to low, high, or auto for each image. This affects the number of tokens per image that the model sees during training and impacts the cost of training.
We are thrilled to see how you will leverage these new capabilities to create innovative and engaging AI applications. For more detailed information, please refer to our documentation on Azure OpenAI Service.
Stay tuned for more updates and happy fine-tuning!
Ready to get started?
- Learn more about Azure OpenAI Service
- Watch this Ignite session about new fine-tuning capabilities in Azure OpenAI Service
- Check out our How-To Guide for Fine Tuning with Azure OpenAI
- Try it out with Azure AI Foundry