New Hugging Face Models on Azure AI: Phi-3 Variants from the Community

Microsoft

Oct 16, 2024

(New) Learn why the Future of AI is: Model Choice.

Building generative AI applications starts with picking the right model for your application's needs. The Azure AI Model Catalog offers over 1.78K models, including foundation models from core partners and nearly 1.6K open-source models from the Hugging Face community. This post is part of a monthly series to raise awareness of new models added to the Hugging Face collection on Azure. Check out the previous Hugging Face Models roundup post here.

The Hugging Face model hub has over 1M models. We select ~20 models each month to add to Azure based on feedback from our customers and developer community. Want to request a specific Hugging Face model be added? Make the request in just 3 steps:

1. Search the Hugging Face Hub for the desired model, then click it to view its Model Card.
2. Click the “Deploy” dropdown on that page and select the Azure ML option.
3. Look for a “Request to add” button in the pop-up dialog and complete the flow.

If the model already exists in the Azure AI model catalog, you will see a “Go to model in Azure ML” button that should direct you to the model card on Azure AI Studio.

18 New Hugging Face Models Added in September

September saw the addition of 18 new models to the Hugging Face Collection on Azure AI. They include community-created variants of popular base models from Meta’s Llama family (LLM), Microsoft’s Phi-3 family (SLM) and more. A number of these fine-tuned models were also “made with Unsloth” - we’ll talk about this in a bit. First, let’s review the models added and highlight their notable features.

Note: These community-created models are suitable for research & prototyping but will require additional assessment for use in production. Read the model card (linked to each) for usage guidance and limitations and conduct your own quality and safety evaluations to assess them for your specific application scenario.

#	Model Name · Inference Task	Notable Features
01	Groq/Llama-3-Groq-70B-Tool-Use · Text Generation	Fine-tuned for tool use · 90.76% accuracy on BFCL (best in 70B class) · Meta-Llama-3-70B based
02	Groq/Llama-3-Groq-8B-Tool-Use · Text Generation	Fine-tuned for tool use · 89.06% accuracy on BFCL (best in 8B class) · Meta-Llama-3-8B based
03	LenguajeNaturalAI/leniachat-qwen2-1.5B-v0 · Text Generation · Spanish	Fine-tuned for Spanish users · Trained exclusively in Spanish for high-quality chat, instructions · Qwen/Qwen2-1.5B based
04	gokaygokay/Flux-Prompt-Enhance · Text-to-Text Generation	Create enhanced prompts for image creation with Flux models · Creator Blog · google-t5/t5-base based
05	Ba2han/Llama-Phi-3_DoRA · Text Generation · made with unsloth	Trained on filtered versions of tagged datasets and llama-3-70B generated examples · Highest on MMLU, Winogrande for its class · microsoft/phi-3-mini
06	unsloth/Phi-3.5-mini-instruct · Text Generation · made with unsloth	2X faster fine-tuning · 50% less memory use · Notebook (add data) · microsoft/phi-3.5-mini
07	cognitivecomputations/dolphin-2.9.2-Phi-3-Medium-abliterated · Text Generation · made with unsloth	Filtered dataset to remove alignment/bias · Uncensored model (see: research) · Needs dev effort for alignment, responsible AI · unsloth/Phi-3-mini-4k-instruct
08	third-intellect/Phi-3-mini-4k-instruct-orca-math-word-problems-200k-model-16bit · Text Generation · made with unsloth	Conversational math problems · Orca-Math problems dataset · unsloth/Phi-3-mini-4k-instruct-bnb-4bit based
09	unsloth/Phi-3-medium-4k-instruct · Text Generation · made with unsloth	2X faster fine-tuning · Notebook (add data) · Quantized bnb-4bit model · microsoft/phi-3-medium-4k-instruct
10	vonjack/Phi-3-mini-4k-instruct-LLaMAfied · Text Generation	Chat completion · Recalibrated to fit Llama/Llama-2 model structure · microsoft/phi-3-mini-4k-instruct based
11	Sreenington/Phi-3-mini-4k-instruct-AWQ · Text Generation	Uses AutoAWQ for 4-bit quantization · Higher throughput with smaller GPUs · microsoft/phi-3-mini-4k-instruct based
12	Skywork/Skywork-Reward-Gemma-2-27B · Text Generation	High-perf reward model · Skywork Reward Dataset · gemma-2-27b-it · Top 3 in RewardBench leaderboard
13	TinyLlama/TinyLlama-1.1B-Chat-v1.0 · Text Generation	Pre-train 1.1B-Llama model on 3T tokens · Use UltraChat data set variant · Edge devices · Real-time dialog
14	lemon07r/Gemma-2-Ataraxy-9B · Text Generation · Creative Writing	Creative Writing · Top-ranked in Eq-Bench leaderboard · Model merge using SLERP mergekit · google/gemma2-9b fine-tune with merges
15	MLP-KTLim/llama-3-Korean-Bllossom-8B · Text Generation · Korean	Korean-English bilingual model · Vocabulary expansion · Human Feedback (DPO) · Bllossom ELO model has SOTA score on LogicKor for <10B models · meta/llama-3 based
16	weblab-GENIAC/Tanuki-8B-dpo-v1.0 · Text Generation · Japanese	Japanese dialogue · Pre-trained · Supervised fine-tuning · Direct Preference Optimization (DPO) · meta/llama arch
17	aisingapore/llama3-8b-cpt-sea-lionv2.1-instruct · Text Generation · Multi-lingual	Fine-tuned with 100K English, 50K ASEAN language pairs · Commercially permissive · High-quality datasets · Ranks top on SEA HELM · llama3-8b-cpt (continued pre-trained)
18	aisingapore/llama3-8b-cpt-sea-lionv2-base · Text Generation · Multi-lingual	Pre-trained, instruction-tuned for Southeast Asia (SEA) · Evaluated well on BHASA benchmarks · Not aligned for safety · meta/llama-3-8b-instruct based
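
Row 14 mentions a model merge using SLERP (spherical linear interpolation). For readers unfamiliar with the term, SLERP blends two weight vectors along the arc between them rather than along a straight line. The sketch below illustrates only the formula on plain Python lists; it is not mergekit's implementation, and the function name `slerp` and its parameters are assumptions made for this example.

```python
import math

def slerp(t, v0, v1, eps=1e-8):
    """Spherically interpolate between vectors v0 and v1 for t in [0, 1]."""
    norm0 = math.sqrt(sum(x * x for x in v0))
    norm1 = math.sqrt(sum(x * x for x in v1))
    if norm0 < eps or norm1 < eps:
        # A zero vector has no direction, so fall back to linear interpolation.
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]

    # Angle between the two directions, clamped for numerical safety.
    cos_theta = sum(a * b for a, b in zip(v0, v1)) / (norm0 * norm1)
    cos_theta = max(-1.0, min(1.0, cos_theta))
    theta = math.acos(cos_theta)
    sin_theta = math.sin(theta)

    if abs(sin_theta) < eps:
        # Nearly parallel vectors: linear interpolation is a safe fallback.
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]

    w0 = math.sin((1 - t) * theta) / sin_theta
    w1 = math.sin(t * theta) / sin_theta
    return [w0 * a + w1 * b for a, b in zip(v0, v1)]

# Halfway between two orthogonal unit vectors stays on the unit circle.
midpoint = slerp(0.5, [1.0, 0.0], [0.0, 1.0])
```

In a real merge, an interpolation like this would be applied tensor by tensor across two checkpoints, with `t` controlling how much each parent model contributes.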

Observed Themes

The 18 models added also help us identify useful themes or trends in community-created variants in terms of use cases, tools and processes. This is what we observed:

Multi-lingual Models Continue to Shine - We added models tailored for Spanish, Japanese, Korean and Southeast Asian languages, with many models scoring well on their respective evaluation leaderboards. This underscores growing demand for conversational tasks that can reflect regional vocabularies and culture effectively.
Phi-3 Variants Continue to Grow - Microsoft’s Phi-3 family of “small language models” (SLM) outperforms other comparable models in equivalent or adjacent size classes. We are now seeing more variants developed, potentially for scenarios on mobile and edge devices. More on this below.
Fine-Tuning Tools Have Value - Community-authored variants focus on fine-tuning popular base models, but this is time-intensive and costly. We are now seeing more models “made with Unsloth”, reflecting interest in tools and processes that speed up fine-tuning with less memory - without sacrificing accuracy. More on this below.
Meta/Llama Remains Popular - The Meta/Llama family of models continues to influence community creators in two ways: first, as a base model for fine-tuning (e.g., multi-lingual with SEA-lion, tool usage with Groq), and second, as a target for optimization (e.g., TinyLlama family for mobile and edge devices). We also see adaptation of other base models (e.g., Llamafied version of Phi-3) to make those models fit a familiar structure for usage.

Model Spotlight: Phi-3 Community Variants

The Hugging Face model hub shows that new Phi-3 variants are created on a daily basis. In general, the Phi-3 family of models outperforms others in its size class (and adjacent classes), making it ideal for use cases targeting edge and mobile devices. We added 7 of these models to Azure this month, all built on Phi-3 and Phi-3.5 base models but with different objectives. Let’s review these briefly.

1. vonjack/Phi-3-mini-4k-instruct-LLaMAfied

2. Sreenington/Phi-3-mini-4k-instruct-AWQ

3. Ba2han/Llama-Phi-3_DoRA · made with unsloth

4. unsloth/Phi-3.5-mini-instruct · made with unsloth

5. cognitivecomputations/dolphin-2.9.2-Phi-3-Medium-abliterated · made with unsloth

6. third-intellect/Phi-3-mini-4k-instruct-orca-math-word-problems-200k-model-16bit · made with unsloth

7. unsloth/Phi-3-medium-4k-instruct · made with unsloth

The first variant recalibrates the model to fit the Llama/Llama-2 model structure for developer familiarity. The second uses AutoAWQ for 4-bit quantization to get a model that can work on smaller GPUs. The third uses Llama-3 generated examples for fine-tuning, scoring well on two popular benchmarks (MMLU and Winogrande). Variants 4 and 7 are from Unsloth and showcase their fine-tuning techniques as explained in the next section. Variant 5 shows an example of fine-tuning Phi-3 to “abliterate” it, producing an uncensored model. Variant 6 showcases Phi-3’s math capabilities, fine-tuned with the popular Orca Math Word problems dataset.
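
To see why 4-bit quantization helps a model fit on smaller GPUs, here is a back-of-the-envelope sketch. It assumes a parameter count of about 3.8B for Phi-3-mini and counts weight storage only; activations, the KV cache, and the per-group scales that quantized formats store alongside the weights are ignored, so real memory use will be somewhat higher. The helper name `weight_memory_gib` is made up for this example.

```python
def weight_memory_gib(num_params: float, bits_per_weight: int) -> float:
    """Approximate memory needed for the model weights alone, in GiB."""
    return num_params * bits_per_weight / 8 / 2**30

# Assumption: Phi-3-mini has roughly 3.8 billion parameters.
PHI3_MINI_PARAMS = 3.8e9

fp16_gib = weight_memory_gib(PHI3_MINI_PARAMS, 16)
int4_gib = weight_memory_gib(PHI3_MINI_PARAMS, 4)

print(f"16-bit weights: {fp16_gib:.1f} GiB")
print(f"4-bit weights:  {int4_gib:.1f} GiB")
```

Under these assumptions the weights shrink from about 7.1 GiB to about 1.8 GiB, a 4x reduction.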

Want to explore Phi-3 capabilities but don’t know where to start? Bookmark the Phi-3 Cookbook from Microsoft and start with the Welcome to the Phi-3 Family page. Next, explore the Table Of Contents for links to quick-starts, tutorials and E2E samples. Finally, try using one of the fine-tuned variants above to see the difference.

Community Call for Action

1. Help Us Spotlight Your Work!

Have you used Hugging Face models on Azure to build interesting AI applications? Have you published your own fine-tuned variants of popular foundation models? We want to know more. Leave a comment on this blog with links to articles or repositories you’ve authored - we’d love to learn more and amplify our model creator community.

2. Get Started with Hugging Face Models on Azure

New to the Azure AI Model catalog and want to get started using Hugging Face models on Azure? Here are three resources to kickstart your learning journey:

Azure AI Model Catalog: Explore the Hugging Face Collection
Azure ML Documentation: Model Catalog and Collections
Azure ML Sample Notebooks: Explore inference tasks with code
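
For a sense of what working with the catalog in code can look like, here is a minimal sketch using the `azure-ai-ml` Python SDK to deploy a model from the `HuggingFace` registry to a managed online endpoint. Treat it as an illustration rather than a reference: the function names, the deployment name `demo`, and every argument value you pass in (workspace identifiers, the model name as shown in the catalog, endpoint name, instance type) are placeholders you would need to supply, and the sample notebooks linked above remain the authoritative source.

```python
def registry_model_uri(catalog_model_name: str,
                       registry: str = "HuggingFace",
                       label: str = "latest") -> str:
    """Build the asset URI for a model stored in an Azure ML registry."""
    return (f"azureml://registries/{registry}"
            f"/models/{catalog_model_name}/labels/{label}")


def deploy_model(subscription_id: str, resource_group: str, workspace: str,
                 catalog_model_name: str, endpoint_name: str,
                 instance_type: str) -> str:
    """Create a managed online endpoint and deploy the catalog model to it."""
    # Imported lazily so the helper above works without the Azure SDK installed.
    from azure.ai.ml import MLClient
    from azure.ai.ml.entities import (ManagedOnlineDeployment,
                                      ManagedOnlineEndpoint)
    from azure.identity import DefaultAzureCredential

    ml_client = MLClient(DefaultAzureCredential(),
                         subscription_id, resource_group, workspace)

    # Create the endpoint first, then attach a deployment that serves the model.
    endpoint = ManagedOnlineEndpoint(name=endpoint_name, auth_mode="key")
    ml_client.online_endpoints.begin_create_or_update(endpoint).result()

    deployment = ManagedOnlineDeployment(
        name="demo",  # hypothetical deployment name
        endpoint_name=endpoint_name,
        model=registry_model_uri(catalog_model_name),
        instance_type=instance_type,  # pick a SKU listed on the model card
        instance_count=1,
    )
    ml_client.online_deployments.begin_create_or_update(deployment).result()
    return endpoint_name
```

Calling `deploy_model` requires an Azure subscription with quota for the chosen instance type; `registry_model_uri` on its own only builds a string and can be tried anywhere.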

Reminder: We add ~20 models to the Hugging Face Collection on Azure each month and we need your feedback to help make these decisions! Request a model using the 3-step process outlined earlier in the article and tell us more about how you’re using it today!

Updated Nov 12, 2024

Version 5.0

Microsoft Foundry Blog