azure
Microsoft Finland - Software Developing Companies monthly community series.
Welcome back to Microsoft's webinar series for technology companies! The Software Development monthly Community series, organised by Microsoft Finland, is a webinar series that gives software companies timely information, concrete examples, and strategic insight into how working with Microsoft can accelerate growth and open up new business opportunities. The series is aimed at technology companies of every size and stage of development, from startups to global players. Each episode takes a practical look at how software companies can use Microsoft's ecosystem, technologies, and partner programs in their own business.

Note: The Microsoft Software Developing Companies monthly community webinars series is hosted on the Cloud Champion site, where the webinars are available as recordings a couple of hours after the live broadcast. Remember to register on the Cloud Champion platform the first time you visit; after that you will always have access to the content and the recordings. You can register via "Register now". Fill in your details and, in the Distributor field, select Other if you do not know your Microsoft distributor.

Webinars:

27.2.2026 at 09:00-09:30 - M-Files: the path to success together with Microsoft
What has building a global partnership between M-Files and Microsoft required, and what benefits has it produced? In this webinar you will hear insider insights directly from Kimmo Järvensivu, Strategic Alliances Director at M-Files: how the partnership with Microsoft was built, what was learned along the way, and how the collaboration has accelerated growth. M-Files is an intelligent information management platform that helps organisations manage documents and information using metadata, regardless of where the content is stored. It makes information easier to find, improves compliance, and supports modern work in the Microsoft ecosystem.
Come and hear what a successful partnership really takes, and how to turn it into a strategic competitive advantage.
Watch the recording: Microsoft Finland – Software Developing Companies Monthly Community Series – M-Files: the path to success together with Microsoft – Finland Cloud Champion
Experts:
- Kimmo Järvensivu, Strategic Alliances Director, M-Files
- Mikko Marttinen, Sr Partner Development Manager, Microsoft
- Eetu Roponen, Sr Partner Development Manager, Microsoft

30.1.2026 at 09:00-09:30 - Model Context Protocol (MCP): the open standard that is transforming AI integrations
In this webinar we cover what the Model Context Protocol (MCP) is, how it enables secure and scalable connections between AI models and external systems without custom code, how Microsoft approaches the use of MCP, and how software companies can take advantage of the business opportunities the MCP standard offers.
We will cover:
- What MCP is and why it matters in modern AI processes
- How MCP reduces integration complexity and speeds up development
- Practical examples
The main content of the webinar is presented in English.
Watch the recording: 30.1.2026 at 09:00-09:30 – Model Context Protocol (MCP): the open standard that is transforming AI integrations – Finland Cloud Champion
Experts:
- Massimo Caterino, Partner Technology Strategist, Microsoft Europe North
- Mikko Marttinen, Sr Partner Development Manager, Microsoft
- Eetu Roponen, Sr Partner Development Manager, Microsoft

12.12. at 09:00-09:30 - What does the Finnish Azure region mean for software companies?
Microsoft's new datacenter region in Finland brings cloud services closer to Finnish software companies, whether they are startups, scaleups, or global players. In this webinar we look at the opportunities the new Azure region opens up in terms of data residency, performance, regulation, and customer requirements.
Among other things, we discuss:
- How does local data residency support customer requirements and regulation?
- What do software companies gain from lower latency and better performance?
- How does the Azure region support co-selling and scaling in Finland?
- How should you prepare, technically and commercially, for the opening of the new region?
Speakers:
- Fama Doumbouya, Sales Director, Cloud Infra and Security, Microsoft
- Mikko Marttinen, Sr Partner Development Manager, Microsoft
- Eetu Roponen, Sr Partner Development Manager, Microsoft
Watch the recording: Microsoft Finland – Software Developing Companies Monthly Community Series – What does the Finnish Azure region mean for software companies? – Finland Cloud Champion

28.11. at 09:00-09:30 - Cloud services on your own terms: what does Microsoft's Sovereign Cloud mean for software companies?
More and more software companies face requirements around data residency, regulatory compliance, and operational control, especially in the public sector and in regulated industries. In this webinar we look at how Microsoft's new Sovereign Cloud offering addresses these needs and what opportunities it opens up for Finnish software companies.
Among other things, we discuss:
- How do Sovereign Public Cloud and Sovereign Private Cloud differ, and what do they enable?
- How are data control, encryption, and operational sovereignty realised in a European context?
- What does this mean for software companies that build solutions for the public sector or regulated industries?
Speakers:
- Juha Karppinen, National Security Officer, Microsoft
- Mikko Marttinen, Sr Partner Development Manager, Microsoft
- Eetu Roponen, Sr Partner Development Manager, Microsoft
Watch the recording: Microsoft Finland – Software Developing Companies Monthly Community Series – Cloud services on your own terms: what does Microsoft's Sovereign Cloud mean for software companies? – Finland Cloud Champion

31.10.
at 09:00-09:30 - Growth and visibility for software companies: make use of the ISV Success and Azure Marketplace Rewards programs
In this webinar we look at Microsoft's key accelerator programs for software companies, which support growth, scalability, and international visibility. We go through how the ISV Success program offers technical and commercial support to software companies at different stages of development, and how Azure Marketplace works as an effective sales channel for reaching new customers. We also present the Marketplace Rewards benefits, which support marketing, co-selling, and customer acquisition in the Microsoft ecosystem.
The webinar offers:
- Concrete examples of the benefits of the programs
- Practical tips for joining and using the programs
- Insights into how software companies can align their strategy with the opportunities Microsoft offers
Speakers:
- Mikko Marttinen, Sr Partner Development Manager, Microsoft
- Eetu Roponen, Sr Partner Development Manager, Microsoft
Recording: Microsoft Finland – Software Developing Companies Monthly Community Series – Growth and visibility for software companies: make use of the ISV Success and Azure Marketplace Rewards programs – Finland Cloud Champion

3.10. at 09:00-09:30 - Autonomous solutions for software companies: Azure AI Foundry and the new opportunities of agent technologies
Agent technologies are transforming the way software companies can build intelligent, scalable solutions. In this webinar we explore how Azure AI Foundry gives developers and product owners the tools to build autonomous agents, making it possible to automate complex processes and deliver new kinds of customer value.
You will hear, among other things:
- How agent technologies are changing software development and business.
- How Azure AI Foundry supports the design, development, and deployment of agents.
- How software companies can use agents as a competitive advantage.
Speakers:
- Juha Karvonen, Sr Partner Tech Strategist
- Mikko Marttinen, Sr Partner Development Manager, Microsoft
- Eetu Roponen, Sr Partner Development Manager, Microsoft
Watch the recording here: Microsoft Finland – Software Developing Companies Monthly Community Series – Autonomous solutions for software companies: Azure AI Foundry and the new opportunities of agent technologies – Finland Cloud Champion

5.9.2025 at 09:00-09:30 - Priorities of technology companies and Microsoft for autumn 2025
Welcome back to Microsoft's webinar series for technology companies! Each month the series continues to explore how working with Microsoft can accelerate growth and open up new opportunities for software companies at different stages, whether the company is a start-up, a scale-up, or operating globally. In every episode we share concrete examples, insights, and strategies that support business development and innovation in technology companies.
In the late-August episode we focus on the priorities and new opportunities for autumn 2025 that help software companies plan, develop, and accelerate their own operations. We go through Microsoft's strategic focus areas for the coming fiscal year and, above all, how software companies can use them in their own business. The goal is to give listeners a clear understanding of how their own product, service, or market strategy can be aligned with the development of the ecosystem, and how Microsoft can support that journey in concrete ways.
Speakers:
- Mikko Marttinen, Sr Partner Development Manager, Microsoft
- Eetu Roponen, Sr Partner Development Manager, Microsoft
Watch the recording here: Priorities of technology companies and Microsoft for autumn 2025 – Finland Cloud Champion

How to Re-Register MFA
Working closely with nonprofits every day, I often come across a common challenge faced by MFA users. Recently, I worked with a nonprofit leader who ran into an issue after getting a new phone. She was unable to authenticate into her Microsoft 365 environment because her MFA setup was tied to her old device. This experience highlighted how important it is to have a process in place for MFA re-registration. Without it, even routine changes like upgrading a phone can disrupt access to your everyday tools and technologies, delaying important work such as submitting a grant proposal.

Why MFA is Essential for Nonprofits
Before we discuss how to reset MFA, let's take a step back and discuss why MFA is as much a necessity for nonprofits as it is for any other organization. In the nonprofit world, protecting sensitive or confidential data (donor information, financial records, and program details) is a top priority. One of the best ways to step up your security game is by using Multi-Factor Authentication (MFA). MFA adds an extra layer of protection on top of passwords by requiring something you have (like a mobile app or text message) or something you are (like a fingerprint). This makes it a lot harder for cybercriminals to get unauthorized access.
If your nonprofit uses Microsoft Entra ID (formerly Azure Active Directory, or AAD) with Microsoft 365, MFA can make a big difference in keeping your work safe. Since Microsoft Entra is built to work together with other Microsoft tools, it's easy to set up and enforce secure sign-in methods across your whole organization. To make sure this added protection stays effective, it's a good idea to occasionally ask users to update how they verify their identity.

What Does MFA Re-Registration Mean for Nonprofits?
MFA re-registration is just a fancy way of saying users need to update or reset how they authenticate, or verify, themselves.
This might mean setting up MFA on a new phone (like the woman in the scenario above), adding an extra security option (like a hardware token), or simply confirming their existing setup. It's all about making sure the methods and devices your users rely on for MFA are secure and under their control.

When and Why Should Nonprofits Require MFA Re-Registration?
Getting a new phone is not the only reason to re-register MFA. A few other scenarios include:
- Lost or Stolen Devices: Similar to the scenario above, if someone loses their phone or it gets stolen, they will have to register the new device.
- Role Changes: If someone's responsibilities change, their MFA setup can be adjusted to match their new access needs.
- Security Enhancements: Organizations may require users to re-register for MFA to adopt more secure authentication methods, such as moving from SMS-based MFA to an app-based MFA like Microsoft Authenticator.
- Policy Updates: When an organization updates its security policies, it might require all users to re-register for MFA to comply with new standards.
- Account Compromise: If there is a suspicion that an account has been compromised, re-registering for MFA can help secure the account by ensuring that only the legitimate user has access.
With Microsoft Entra, managing MFA re-registration is straightforward and can be done by an administrator of the organization's tenant.

How to require re-registration of MFA
To reset or require re-registration of MFA in Microsoft Entra, follow the steps below.
1. Navigate to portal.azure.com with your nonprofit admin account.
2. Select Microsoft Entra ID.
3. Select the drop-down for Manage.
4. In the left-hand menu bar, select Users.
5. Select the name of the user who needs to re-register for MFA (not shown).
6. Once in their profile, select Manage MFA authentication methods.
7. Select Require re-register multifactor authentication.
Congratulations!
The user will now be required to re-register the account in the Microsoft Authenticator app.

Running Text to Image and Text to Video with ComfyUI and Nvidia H100 GPU
This guide shows how to set up and run Text to Image and Text to Video generation using ComfyUI with an Nvidia H100 GPU on Azure VMs. ComfyUI is a node-based user interface for Stable Diffusion and other AI models. It lets you build complex image and video generation workflows in a visual interface. A GPU significantly speeds up the generation of high-quality images and videos.

Steps to create the infrastructure

Option 1. Using Terraform (Recommended)
The Terraform template for this guide, available here: ai-course/550_comfyui_on_vm at main · HoussemDellai/ai-course, does the following:
- Creates the infrastructure for an Ubuntu VM with an Nvidia H100 GPU
- Installs CUDA drivers on the VM
- Installs ComfyUI on the VM
- Downloads the models for Text to Image (Z-Image-Turbo) and Text to Video generation (Wan 2.2 and LTX-2)

Deploy the Terraform template using the following commands:

    # Initialize Terraform
    terraform init
    # Review the Terraform plan and save it to a file named tfplan
    terraform plan -out tfplan
    # Apply the saved plan to create resources
    terraform apply tfplan

Creating all the resources with the configuration defined in the Terraform files should take about 15 minutes.
After the deployment is complete, you can access the ComfyUI portal using the link shown in the Terraform output. It should look like this: http://<VM_IP_ADDRESS>:8188. That is the end of the setup. You can then proceed to use ComfyUI for Text to Image and Text to Video generation as described in the later sections.

Option 2. Manual Setup

0. Create a Virtual Machine with Nvidia H100 GPU
Create an Azure virtual machine with Nvidia H100 GPUs, for example the Standard_NC40ads_H100_v5 SKU. Choose a Linux distribution such as Ubuntu Pro 24.04 LTS.

1.
Install Nvidia GPU and CUDA Drivers
SSH into the Ubuntu VM and install the CUDA drivers by following the official Microsoft documentation: Install CUDA drivers on N-series VMs.

    # 1. Install ubuntu-drivers utility:
    sudo apt-get update
    sudo apt-get install ubuntu-drivers-common -y
    # 2. Install the latest NVIDIA drivers:
    sudo ubuntu-drivers install
    # 3. Download and install the CUDA toolkit from NVIDIA:
    wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2404/x86_64/cuda-keyring_1.1-1_all.deb
    sudo dpkg -i cuda-keyring_1.1-1_all.deb
    sudo apt-get update
    sudo apt-get -y install cuda-toolkit-13-1
    # 4. Reboot the system to apply changes
    sudo reboot

The machine will now reboot. After rebooting, you can verify the installation of the NVIDIA drivers and CUDA toolkit.

    # 5. Verify that the GPU is correctly recognized (after reboot):
    nvidia-smi
    # 6. We recommend that you periodically update NVIDIA drivers after deployment.
    sudo apt-get update
    sudo apt-get full-upgrade -y

2. Install ComfyUI on Ubuntu
Follow the instructions from the ComfyUI Wiki to install ComfyUI on your Ubuntu VM using Comfy CLI: Install ComfyUI using Comfy CLI.

    # Step 1: System Environment Preparation
    # ComfyUI requires Python 3.12 or higher (Python 3.13 is recommended). Check your Python version:
    python3 --version
    # If Python is not installed or the version is too low, install it following these steps:
    sudo apt-get update
    sudo apt-get install python3 python3-pip python3-venv -y
    # Create Virtual Environment
    # Using a virtual environment can avoid package conflict issues
    python3 -m venv comfy-env
    # Activate the virtual environment
    source comfy-env/bin/activate
    # Note: You need to activate the virtual environment each time before using ComfyUI.
    # To exit the virtual environment, use the deactivate command.
    # Step 2: Install Comfy CLI
    # Install comfy-cli in the activated virtual environment:
    pip install comfy-cli
    # Step 3: Install ComfyUI using Comfy CLI with NVIDIA GPU Support
    # use 'yes' to accept all prompts
    yes | comfy install --nvidia
    # Step 4: Install GPU Support for PyTorch
    pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu130
    # Note: Please choose the corresponding PyTorch version based on your CUDA version.
    # Visit the PyTorch website for the latest installation commands.
    # Step 5. Launch ComfyUI
    # By default, ComfyUI will run on http://localhost:8188.
    # and don't forget the double --
    comfy launch --background -- --listen 0.0.0.0 --port 8188

Note that you can run ComfyUI in different modes based on your hardware capabilities:
- --cpu: Use CPU mode, if you don't have a compatible GPU
- --lowvram: Low VRAM mode
- --novram: Ultra-low VRAM mode

3. Using ComfyUI for Text to Image
Once ComfyUI is running, you can access the web interface in your browser at http://<VM_IP_ADDRESS>:8188 (replace <VM_IP_ADDRESS> with the actual IP address of your VM). Make sure that the VM's network security group (NSG) allows inbound traffic on port 8188.
You can create Text to Image generation workflows using the templates available in ComfyUI. Go to Workflows and select a Text to Image template to get started. Choose Z-Image-Turbo Text to Image as an example. ComfyUI will then detect that some required models are missing. You will need to download each model into its corresponding folder. For example, a Stable Diffusion checkpoint goes in the models/checkpoints folder. The download links for the models and their corresponding folders are shown in the ComfyUI interface.
Let's download the required models for Z-Image-Turbo.
    cd comfy/ComfyUI/
    wget -P models/text_encoders/ https://huggingface.co/Comfy-Org/z_image_turbo/resolve/main/split_files/text_encoders/qwen_3_4b.safetensors
    wget -P models/vae/ https://huggingface.co/Comfy-Org/z_image_turbo/resolve/main/split_files/vae/ae.safetensors
    wget -P models/diffusion_models/ https://huggingface.co/Comfy-Org/z_image_turbo/resolve/main/split_files/diffusion_models/z_image_turbo_bf16.safetensors
    wget -P models/loras/ https://huggingface.co/tarn59/pixel_art_style_lora_z_image_turbo/resolve/main/pixel_art_style_z_image_turbo.safetensors

Note that you can use either the comfy model download command or wget to download the models into their corresponding folders.
Once the models are downloaded, you can run the Text to Image workflow in ComfyUI. You can also change parameters such as the prompt as needed. When ready, click the blue Run button at the top right to start generating the image. Generation time depends on the size of the image and the complexity of the prompt. The generated image then appears in the output node.

4. Using ComfyUI for Text to Video
To use ComfyUI for Text to Video generation, select a Text to Video template from the Workflows section. Choose Wan 2.2 Text to Video as an example. Then install the required models.
    wget -P models/text_encoders/ https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/resolve/main/split_files/text_encoders/umt5_xxl_fp8_e4m3fn_scaled.safetensors
    wget -P models/vae/ https://huggingface.co/Comfy-Org/Wan_2.2_ComfyUI_Repackaged/resolve/main/split_files/vae/wan_2.1_vae.safetensors
    wget -P models/diffusion_models/ https://huggingface.co/Comfy-Org/Wan_2.2_ComfyUI_Repackaged/resolve/main/split_files/diffusion_models/wan2.2_t2v_low_noise_14B_fp8_scaled.safetensors
    wget -P models/diffusion_models/ https://huggingface.co/Comfy-Org/Wan_2.2_ComfyUI_Repackaged/resolve/main/split_files/diffusion_models/wan2.2_t2v_high_noise_14B_fp8_scaled.safetensors
    wget -P models/loras/ https://huggingface.co/Comfy-Org/Wan_2.2_ComfyUI_Repackaged/resolve/main/split_files/loras/wan2.2_t2v_lightx2v_4steps_lora_v1.1_high_noise.safetensors
    wget -P models/loras/ https://huggingface.co/Comfy-Org/Wan_2.2_ComfyUI_Repackaged/resolve/main/split_files/loras/wan2.2_t2v_lightx2v_4steps_lora_v1.1_low_noise.safetensors

Models for LTX-2 Text to Video can be downloaded similarly.

    wget -P models/checkpoints/ https://huggingface.co/Lightricks/LTX-2/resolve/main/ltx-2-19b-dev-fp8.safetensors
    wget -P models/text_encoders/ https://huggingface.co/Comfy-Org/ltx-2/resolve/main/split_files/text_encoders/gemma_3_12B_it_fp4_mixed.safetensors
    wget -P models/latent_upscale_models/ https://huggingface.co/Lightricks/LTX-2/resolve/main/ltx-2-spatial-upscaler-x2-1.0.safetensors
    wget -P models/loras/ https://huggingface.co/Lightricks/LTX-2/resolve/main/ltx-2-19b-distilled-lora-384.safetensors
    wget -P models/loras/ https://huggingface.co/Lightricks/LTX-2-19b-LoRA-Camera-Control-Dolly-Left/resolve/main/ltx-2-19b-lora-camera-control-dolly-left.safetensors

Models for Qwen Image 2512 Text to Image can be downloaded similarly.
    wget -P models/text_encoders/ https://huggingface.co/Comfy-Org/Qwen-Image_ComfyUI/resolve/main/split_files/text_encoders/qwen_2.5_vl_7b_fp8_scaled.safetensors
    wget -P models/vae/ https://huggingface.co/Comfy-Org/Qwen-Image_ComfyUI/resolve/main/split_files/vae/qwen_image_vae.safetensors
    wget -P models/diffusion_models/ https://huggingface.co/Comfy-Org/Qwen-Image_ComfyUI/resolve/main/split_files/diffusion_models/qwen_image_2512_fp8_e4m3fn.safetensors
    wget -P models/loras/ https://huggingface.co/lightx2v/Qwen-Image-Lightning/resolve/main/Qwen-Image-Lightning-4steps-V1.0.safetensors

Models for Flux2 Klein Text to Image 9B can be downloaded similarly.

    wget -P models/text_encoders/ https://huggingface.co/Comfy-Org/flux2-klein-9B/resolve/main/split_files/text_encoders/qwen_3_8b_fp8mixed.safetensors
    wget -P models/vae/ https://huggingface.co/Comfy-Org/flux2-dev/resolve/main/split_files/vae/flux2-vae.safetensors
    wget -P models/diffusion_models/ https://huggingface.co/black-forest-labs/FLUX.2-klein-base-9b-fp8/resolve/main/flux-2-klein-base-9b-fp8.safetensors
    wget -P models/diffusion_models/ https://huggingface.co/black-forest-labs/FLUX.2-klein-9b-fp8/resolve/main/flux-2-klein-9b-fp8.safetensors

Important notes
Secure Boot is not supported using Windows or Linux extensions. For more information on manually installing GPU drivers with Secure Boot enabled, see Azure N-series GPU driver setup for Linux.
Src: https://learn.microsoft.com/en-us/azure/virtual-machines/extensions/hpccompute-gpu-linux

Sources
- Install CUDA drivers on N-series VMs: https://learn.microsoft.com/en-us/azure/virtual-machines/linux/n-series-driver-setup#install-cuda-drivers-on-n-series-vms
- Install ComfyUI using Comfy CLI: https://comfyui-wiki.com/en/install/install-comfyui/install-comfyui-on-linux

Disclaimer
The sample scripts are not supported under any Microsoft standard support program or service. The sample scripts are provided AS IS without warranty of any kind.
Microsoft further disclaims all implied warranties including, without limitation, any implied warranties of merchantability or of fitness for a particular purpose. The entire risk arising out of the use or performance of the sample scripts and documentation remains with you. In no event shall Microsoft, its authors, or anyone else involved in the creation, production, or delivery of the scripts be liable for any damages whatsoever (including, without limitation, damages for loss of business profits, business interruption, loss of business information, or other pecuniary loss) arising out of the use of or inability to use the sample scripts or documentation, even if Microsoft has been advised of the possibility of such damages.

Per certification designed badges
Hi

First, Microsoft opted out of the awesome Credly (awesome because learners collected "all" their personal certifications in one place, no matter the vendor, which made it easy to share the Credly profile link for various reasons).

And now you have stopped creating per-certification branded badges and only provide standard "Associate" and "Expert" badges with a "Learn diploma" showing the name of the certification "in text" (the new Fabric exam is an example).

For those of us globally in roles like "Alliance Manager" or "Partner Manager", who drive and summarize partners' excellence in the Microsoft area and push marketing for us and Microsoft, this is bad!

Example of how we have been using the per-certification badges until now

Is it just by mistake that you have taken this path? Or is it just me and my learners who have missed where to download per-exam branded badges for newer certifications?

Regards
Gabriel

Azure Migrate: Now Supporting Premium SSD V2, Ultra and ZRS Disks as Targets
We are excited to announce that we have added assessment and migration support for Premium SSD v2, Ultra Disk and ZRS Disks as storage options in Azure Migrate, with Premium SSD v2 and ZRS Disks now Generally Available and Ultra Disk in Public Preview. This further enhances the assessment and migration experience Azure Migrate offers and allows you to bring your mission-critical workloads to these key Azure Storage offerings seamlessly.

What's New

Additional Assessment targets: Premium SSD v2 and Ultra Disks
As part of the migration journey to the cloud, Azure Migrate recommends which cloud resources to move your workloads to. After successful discovery of on-premises workloads, Azure Migrate uses multiple parameters such as size, IOPS, and throughput to make target recommendations in Azure. Instead of just static sizing, assessments can map actual performance demand to Azure VM and disk SKUs, optimizing performance, resiliency, and total cost of ownership to give you a tailored recommendation that fits your cloud migration journey. With today's announcement, we are adding more supported disks to Azure Migrate, providing you with improved guidance to ensure that you land on the resources in Azure that align with your goals.
If you are looking to migrate your demanding on-premises applications and workloads to Azure, you will benefit from these advanced disk options, which come with greater flexibility and enhanced performance. For example, Premium SSD v2 disks decouple capacity from performance, allowing you to dial IOPS and throughput precisely to your workload's needs. For high-end scenarios, Ultra Disks offer the highest performance among Azure managed disks, while ZRS disks provide zonally redundant storage to further protect your data. With these included in Azure Migrate's assessment engine, you end up with a right-sized, data-driven target configuration that aligns Azure storage choices with how workloads actually run.
Below is a snippet of how the assessment recommendations appear in Azure Migrate for Premium SSD v2 disks. Customers can get details on the disk type, provisioned IOPS, throughput, and cost, and can use the assessment to migrate to the recommended target.

Migrating to Premium SSD v2 and Ultra Disks in Azure Migrate
When Premium SSD v2 or Ultra disks are identified as the optimal targets based on workload characteristics during the assessment phase, they can be auto-populated into the migration process. This workflow accelerates the lift-and-shift of on-premises disks to Azure's high-performance managed disks. Below is a snippet from the replication step during migration:

Assessing and Migrating to ZRS Disks in Azure Migrate
Azure Migrate also enhances resiliency by supporting migration to ZRS Disks. Zone-Redundant Storage (ZRS) for Azure Disks synchronously replicates data across three physically separate availability zones within a region, each with independent power, cooling, and networking, which improves disk availability and resiliency.
While creating assessments in Azure Migrate, you can configure a range of target preferences, including the newly introduced option to enable zone-redundant storage (ZRS). You can opt in to ZRS Disk recommendations by editing the Server (Machine) default settings in the Advanced settings blade.
Since the preview announcement for these capabilities, recommendations for Ultra, Premium SSD v2 and ZRS Disks have led to petabytes of data being successfully migrated into Azure. Below is a quote from a Premium SSD v2 (Pv2) customer that was provided during the preview:
"Through this preview, we have Pv2 disks recommendations in place of Pv1, which is beneficial for our estate during migration in terms of both cost and performance.
We are now awaiting General Availability." – Yogesh Patil, Cloud Enterprise Architect, Tata Consultancy Services (TCS)

With these added capabilities, Azure Migrate and Azure disk storage are more ready than ever for migrating your most demanding and mission-critical workloads. Learn more about Azure Migrate and, for expert migration help, please try Azure Accelerate. You can also contact your preferred partner or Microsoft field for next steps. Get started in Azure today!

Optimising AI Costs with Microsoft Foundry Model Router
Microsoft Foundry Model Router analyses each prompt in real time and forwards it to the most appropriate LLM from a pool of underlying models. Simple requests go to fast, cheap models; complex requests go to premium ones, all automatically. I built an interactive demo app so you can see the routing decisions, measure latencies, and compare costs yourself. This post walks through how it works, what we measured, and when it makes sense to use.

The Problem: One Model for Everything Is Wasteful
Traditional deployments force a single choice:

Strategy | Upside | Downside
Use a small model | Fast, cheap | Struggles with complex tasks
Use a large model | Handles everything | Overpay for simple tasks
Build your own router | Full control | Maintenance burden; hard to optimise

Most production workloads are mixed-complexity. Classification, FAQ look-ups, and data extraction sit alongside code analysis, multi-constraint planning, and long-document summarisation. Paying premium-model prices for the simple 40% is money left on the table.

The Solution: Model Router
Model Router is a trained language model deployed as a single Azure endpoint. For each incoming request it:
1. Analyses the prompt: complexity, task type, context length
2. Selects an underlying model from the routing pool
3. Forwards the request and returns the response
4. Exposes the choice via the response.model field
You interact with one deployment. No if/else routing logic in your code.

Routing Modes

Mode | Goal | Trade-off
Balanced (default) | Best cost-quality ratio | General-purpose
Cost | Minimise spend | May use smaller models more aggressively
Quality | Maximise accuracy | Higher cost for complex tasks

Modes are configured in the Foundry Portal; no code change is needed to switch.

Building the Demo
To make routing decisions tangible, we built a React + TypeScript app that sends the same prompt through both Model Router and a fixed standard deployment (e.g.
GPT-5-nano), then compares:

- Which model the router selected
- Latency (ms)
- Token usage (prompt + completion)
- Estimated cost (based on per-model pricing)

Figure: Select a prompt, choose a routing mode, and hit Run Both to compare side-by-side

What You Can Do

- 10 pre-built prompts: spanning simple classification to complex multi-constraint planning
- Custom prompt input: enter any text and benchmarks run automatically
- Three routing modes: switch and re-run to see how distribution changes
- Batch mode: run all 10 prompts in one click to gather aggregate stats

API Integration

The integration is a standard Azure OpenAI chat completion call. The only difference is the deployment name (model-router instead of a specific model):

    const response = await fetch(
      `${endpoint}/openai/deployments/model-router/chat/completions?api-version=2024-10-21`,
      {
        method: 'POST',
        headers: {
          'Content-Type': 'application/json',
          'api-key': apiKey,
        },
        body: JSON.stringify({
          messages: [{ role: 'user', content: prompt }],
          max_completion_tokens: 1024,
        }),
      }
    );
    const data = await response.json();

    // The key insight: response.model reveals the underlying model
    const selectedModel = data.model; // e.g. "gpt-5-nano-2025-08-07"

That data.model field is what makes cost tracking and distribution analysis possible.

Results: What the Data Shows

We ran all 10 prompts through both Model Router (Balanced mode) and a fixed standard deployment.

Note: Results vary by run, region, model versions, and Azure load. These numbers are from a representative sample run.
Figure: Side-by-side comparison across all 10 prompts in Balanced mode

Summary

Metric | Router (Balanced) | Standard (GPT-5-nano)
Avg Latency | ~7,800 ms | ~7,700 ms
Total Cost (10 prompts) | ~$0.029 | ~$0.030
Cost Savings | ~4.5% | n/a
Models Used | 4 | 1

Model Distribution

The router used 4 different models across 10 prompts:

Model | Requests | Share | Typical Use
gpt-5-nano | 5 | 50% | Classification, summarisation, planning
gpt-5-mini | 2 | 20% | FAQ answers, data extraction
gpt-oss-120b | 2 | 20% | Long-context analysis, creative tasks
gpt-4.1-mini | 1 | 10% | Complex debugging & reasoning

Figure: Routing distribution chart. The router favours efficient models for simple prompts.

Across All Three Modes

Metric | Balanced | Cost-Optimised | Quality-Optimised
Cost Savings | ~4.5% | ~4.7% | ~14.2%
Avg Latency (Router) | ~7,800 ms | ~7,800 ms | ~6,800 ms
Avg Latency (Standard) | ~7,700 ms | ~7,300 ms | ~8,300 ms
Primary Goal | Balance cost + quality | Minimise spend | Maximise accuracy
Model Selection | Mixed (4 models) | Prefers cheaper | Prefers premium

Figure: Cost-optimised mode routes more aggressively to nano/mini models
Figure: Quality-optimised mode routes to larger models for complex tasks

Analysis

What Worked Well

Intelligent distribution
The router didn't just default to one model. It used 4 different models and mapped prompt complexity to model capability: simple classification → nano, FAQ answers → mini, long-context documents → oss-120b, complex debugging → 4.1-mini.

Measurable cost savings across all modes
4.5% in Balanced, 4.7% in Cost, and 14.2% in Quality mode. Quality mode was the surprise winner: by choosing faster, cheaper models for simple prompts, it actually saved the most while still routing complex requests to capable models.

Zero routing logic in application code
One endpoint, one deployment name. The complexity lives in Azure's infrastructure, not yours.

Operational flexibility
Switch between Balanced, Cost, and Quality modes in the Foundry Portal without redeploying your app. Need to cut costs for a high-traffic period? Switch to Cost mode.
Need accuracy for a compliance run? Switch to Quality.

Future-proofing
As Azure adds new models to the routing pool, your deployment benefits automatically. No code changes needed.

Trade-offs to Consider

Latency is comparable, not always faster
In Balanced mode, the Router averaged ~7,800 ms vs Standard's ~7,700 ms, which is nearly identical. In Quality mode, the Router was actually faster (~6,800 ms vs ~8,300 ms) because it chose more efficient models for simple prompts. The delta depends on which models the router selects.

Savings scale with workload diversity
Our 10-prompt test set showed 4.5–14.2% savings. Production workloads with a wider spread of simple vs complex prompts should see larger savings, since the router has more opportunity to route simple requests to cheaper models.

Opaque routing decisions
You can see which model was picked via response.model, but you can't see why. For most applications this is fine; for debugging edge cases you may want to test specific prompts in the demo first.

Custom Prompt Testing

One of the most practical features of the demo is testing your own prompts before committing to Model Router in production.

Figure: Enter any prompt (the quantum computing example is a medium-complexity educational prompt)
Figure: Benchmarks execute automatically, showing the selected model, latency, tokens, and cost

Workflow:

1. Click ✏️ Custom in the prompt selector
2. Enter your production-representative prompt
3. Click ✓ Use This Prompt (Router and Standard run automatically)
4. Compare results, then repeat with different routing modes
5. Use the data to inform your deployment strategy

This lets you predict costs and validate routing behaviour with your actual workload before going to production.
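The cost comparison described above can be approximated offline from the model name and token usage that each response reports. The sketch below is not taken from the demo repository: the helper names (`findPrice`, `estimateCostUsd`, `savingsPercent`) and every price in `EXAMPLE_PRICES` are illustrative assumptions, so substitute the current published pricing for your region before relying on the numbers.

```typescript
// Token counts as reported in the `usage` object of a chat completion response.
interface Usage {
  prompt_tokens: number;
  completion_tokens: number;
}

// USD per one million tokens. All values below are PLACEHOLDERS, not real prices.
interface Price {
  inputPerM: number;
  outputPerM: number;
}

const EXAMPLE_PRICES: Record<string, Price> = {
  'gpt-5-nano': { inputPerM: 0.1, outputPerM: 0.5 }, // placeholder
  'gpt-5-mini': { inputPerM: 0.3, outputPerM: 2.0 }, // placeholder
};

// Response model names carry a version suffix (e.g. "gpt-5-nano-2025-08-07"),
// so match on the longest configured prefix.
function findPrice(model: string, prices: Record<string, Price>): Price | undefined {
  const key = Object.keys(prices)
    .filter((k) => model.startsWith(k))
    .sort((a, b) => b.length - a.length)[0];
  return key === undefined ? undefined : prices[key];
}

// Returns undefined when no price is configured for the model.
function estimateCostUsd(
  model: string,
  usage: Usage,
  prices: Record<string, Price>
): number | undefined {
  const price = findPrice(model, prices);
  if (!price) return undefined;
  return (
    (usage.prompt_tokens * price.inputPerM +
      usage.completion_tokens * price.outputPerM) /
    1_000_000
  );
}

// Percentage saved by the router relative to a fixed baseline deployment.
function savingsPercent(routerCost: number, baselineCost: number): number {
  return baselineCost === 0 ? 0 : ((baselineCost - routerCost) / baselineCost) * 100;
}
```

Summing `estimateCostUsd` over a batch of router responses and over the matching baseline responses, then passing both totals to `savingsPercent`, gives a figure comparable to the "Cost Savings" row in the results tables.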
When to Use Model Router

Great Fit

- Mixed-complexity workloads: chatbots, customer service, content pipelines
- Cost-sensitive deployments: where even single-digit percentage savings matter at scale
- Teams wanting simplicity: one endpoint beats managing multi-model routing logic
- Rapid experimentation: try new models without changing application code

Consider Carefully

- Ultra-low-latency requirements: if you need sub-second responses, the routing overhead matters
- Single-task, single-model workloads: if one model is clearly optimal for 100% of your traffic, a router adds complexity without benefit
- Full control over model selection: if you need deterministic model choice per request

Mode Selection Guide

Is accuracy critical (compliance, legal, medical)?
└─ YES → Quality-Optimised
└─ NO → Strict budget constraints?
    └─ YES → Cost-Optimised
    └─ NO → Balanced (recommended)

Best Practices

- Start with Balanced mode: measure actual results, then optimise
- Test with your real prompts: use the Custom Prompt feature to validate routing before production
- Monitor model distribution: track which models handle your traffic over time
- Compare against a baseline: always keep a standard deployment to measure savings
- Review regularly: as new models enter the routing pool, distributions shift

Technical Stack

Technology | Purpose
React 19 + TypeScript 5.9 | UI and type safety
Vite 7 | Dev server and build tool
Tailwind CSS 4 | Styling
Recharts 3 | Distribution and comparison charts
Azure OpenAI API (2024-10-21) | Model Router and standard completions

Security measures include an ErrorBoundary for crash resilience, sanitised API error messages, AbortController request timeouts, input length validation, and restrictive security headers. API keys are loaded from environment variables and gitignored.
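The "monitor model distribution" practice above comes down to tallying the `model` value returned with each response. A minimal sketch of that tally follows; the `modelDistribution` function and `ModelShare` type are hypothetical names chosen for illustration, and the demo app may compute its chart data differently.

```typescript
// One row of a distribution report, similar to the Model Distribution table.
interface ModelShare {
  model: string;
  requests: number;
  sharePercent: number;
}

// Counts how often each model name appears and returns rows sorted by volume.
// `models` is the list of `data.model` values collected from router responses.
function modelDistribution(models: string[]): ModelShare[] {
  const counts = new Map<string, number>();
  for (const m of models) {
    counts.set(m, (counts.get(m) ?? 0) + 1);
  }

  const rows: ModelShare[] = [];
  counts.forEach((requests, model) => {
    rows.push({ model, requests, sharePercent: (requests / models.length) * 100 });
  });

  // Most-used model first.
  return rows.sort((a, b) => b.requests - a.requests);
}
```

Persisting these rows per day or per deployment would make it visible when the distribution shifts as the routing pool changes.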
Source: leestott/router-demo-app: An interactive web application demonstrating the power of Microsoft Foundry Model Router - an intelligent routing system that automatically selects the optimal language model for each request based on complexity, reasoning requirements, and task type. ⚠️ This demo calls Azure OpenAI directly from the browser. This is fine for local development. For production, proxy through a backend and use Managed Identity. Try It Yourself Quick Start git clone https://github.com/leestott/router-demo-app/ cd router-demo-app # Option A: Use the setup script (recommended) # Windows: .\setup.ps1 -StartDev # macOS/Linux: chmod +x setup.sh && ./setup.sh --start-dev # Option B: Manual npm install cp .env.example .env.local # Edit .env.local with your Azure credentials npm run dev Open http://localhost:5173 , select a prompt, and click ⚡ Run Both. Get Your Credentials Go to ai.azure.com → open your project Copy the Project connection string (endpoint URL) Navigate to Deployments → confirm model-router is deployed Get your API key from Project Settings → Keys Configuration Edit .env.local : VITE_ROUTER_ENDPOINT=https://your-resource.cognitiveservices.azure.com VITE_ROUTER_API_KEY=your-api-key VITE_ROUTER_DEPLOYMENT=model-router VITE_STANDARD_ENDPOINT=https://your-resource.cognitiveservices.azure.com VITE_STANDARD_API_KEY=your-api-key VITE_STANDARD_DEPLOYMENT=gpt-5-nano Ideas for Enhancement Historical analysis — persist results to track routing trends over time Cost projections — estimate monthly spend based on prompt patterns and volume A/B testing framework — compare modes with statistical significance Streaming support — show model selection for streaming responses Export reports — download benchmark data as CSV/JSON for further analysis Conclusion Model Router addresses a real problem: most AI workloads have mixed complexity, but most deployments use a single model. 
By routing each request to the right model automatically, you get:

- Cost savings (~4.5–14.2% measured across modes; absolute savings grow with volume)
- Intelligent distribution (4 models used, zero routing code)
- Operational simplicity (one endpoint, mode changes via portal)
- Future-proofing (new models added to the pool automatically)

The latency trade-off is minimal: in Quality mode, the Router was actually faster than the standard deployment. The real value is flexibility: tune for cost, quality, or balance without touching your code.

Ready to try it? Clone the demo repository, plug in your Azure credentials, and test with your own prompts.

Resources

Resource | Description
Model Router Benchmark Sample | Sample App
Model Router Concepts | Official documentation
Model Router How-To | Deployment guide
Microsoft Foundry Portal | Deploy and manage
Model Router in the Catalog | Model listing
Azure OpenAI Managed Identity | Production auth

Built to explore Model Router and share findings with the developer community. Feedback and contributions are welcome: open an issue or PR on GitHub.

Building a Privacy-First Hybrid AI Briefing Tool with Foundry Local and Azure OpenAI
Introduction Management consultants face a critical challenge: they need instant AI-powered insights from sensitive client documents, but traditional cloud-only AI solutions create unacceptable data privacy risks. Every document uploaded to a cloud API potentially exposes confidential client information, violates data residency requirements, and creates compliance headaches. The solution lies in a hybrid architecture that combines the speed and privacy of on-device AI with the sophistication of cloud models—but only when explicitly requested. This article walks through building a production-ready briefing assistant that runs AI inference locally first, then optionally refines outputs using Azure OpenAI for executive-quality presentations. We'll explore a sample implementation using FL-Client-Briefing-Assistant, built with Next.js 14, TypeScript, and Microsoft Foundry Local. You'll learn how to architect privacy-first AI applications, implement sub-second local inference, and design transparent hybrid workflows that give users complete control over their data. Why Hybrid AI Architecture Matters for Enterprise Applications Before diving into implementation details, let's understand why a hybrid approach is essential for enterprise AI applications, particularly in consulting and professional services. Cloud-only AI services like OpenAI's GPT-4 offer remarkable capabilities, but they introduce several critical challenges. First, every API call sends your data to external servers, creating audit trails and potential exposure points. For consultants handling merger documents, financial reports, or strategic plans, this is often a non-starter. Second, cloud APIs introduce latency, typically 2-5 seconds per request due to network round-trips and queue times. Third, costs scale linearly with usage, making high-volume document analysis expensive at scale. Local-only AI solves privacy and latency concerns but sacrifices quality. 
Small language models (SLMs) running on laptops produce quick summaries, but they lack the nuanced reasoning and polish needed for C-suite presentations. You get fast, private results that may require significant manual refinement. The hybrid approach gives you the best of both worlds: instant, private local processing as the default, with optional cloud refinement only when quality matters most. This architecture respects data privacy by default while maintaining the flexibility to produce executive-grade outputs when needed. Architecture Overview: Three-Layer Design for Privacy and Performance The FL-Client-Briefing-Assistant implements a clean three-layer architecture that separates concerns and ensures privacy at every level. At the frontend, a Next.js 14 application provides the user interface with strong TypeScript typing throughout. Users interact with four quick-action templates: document summarization, talking points generation, risk analysis, and executive summaries. The UI clearly indicates which model (local or cloud) processed each request, ensuring transparency. The middle tier consists of Next.js API routes that act as orchestration endpoints. These routes validate requests using Zod schemas, route to appropriate inference services, and enforce privacy settings. Critically, the API layer never persists user content unless explicitly opted in via privacy settings. The inference layer contains two distinct services. The local service uses Foundry Local SDK to communicate with a locally running Phi-4 model (or similar SLM). This provides sub-second inference, typical 500ms-1s response times, completely offline. The cloud service connects to Azure OpenAI using the official JavaScript SDK, accessed via Managed Identity or API keys, with proper timeout and retry logic. Setting Up Foundry Local for On-Device Inference Foundry Local is Microsoft's runtime for running AI models entirely on your device—no internet required, no data leaving your machine. 
Here's how to get it running for this application. First, install Foundry Local on Windows using Windows Package Manager: winget install Microsoft.FoundryLocal After installation, verify the service is ready: foundry service start foundry service status The status command will show you the service endpoint, typically running on a dynamic port like http://127.0.0.1:5272 . This port changes between restarts, so your application must query it programmatically. Next, load an appropriate model. For briefing tasks, Phi-4 Mini provides an excellent balance of quality and speed: foundry model load phi-4 The model downloads (approximately 3.6GB) and loads into memory. This takes 2-5 minutes on first run but persists between sessions. Once loaded, inference is nearly instant, most requests complete in under 1 second. In your application, configure the connection in .env.local : the port for foundry local is dynamic so please ensure you add the correct port. FOUNDRY_LOCAL_ENDPOINT=http://127.0.0.1:**** The application uses the Foundry Local SDK to query the running service: import { FoundryLocalClient } from 'foundry-local-sdk'; const client = new FoundryLocalClient({ endpoint: process.env.FOUNDRY_LOCAL_ENDPOINT }); const response = await client.chat.completions.create({ model: 'phi-4', messages: [ { role: 'system', content: 'You are a professional consultant assistant.' }, { role: 'user', content: 'Summarize this document: ...' } ], max_tokens: 500, temperature: 0.3 }); This code demonstrates several best practices: Explicit model specification: Always name the model to ensure consistency across environments System message framing: Set the appropriate professional context for consulting use cases Conservative temperature: Use 0.3 for factual summarization tasks to reduce hallucination Token limits: Cap outputs to prevent excessive generation times and costs Implementing Privacy-First API Routes The Next.js API routes form the security boundary of the application. 
Every request must be validated, sanitized, and routed according to privacy settings before reaching inference services. Here's the core local inference route ( app/api/briefing/local/route.ts ): import { NextRequest, NextResponse } from 'next/server'; import { z } from 'zod'; import { FoundryLocalClient } from 'foundry-local-sdk'; const RequestSchema = z.object({ prompt: z.string().min(10).max(5000), template: z.enum(['summary', 'talking-points', 'risk-analysis', 'executive']), context: z.string().optional() }); export async function POST(request: NextRequest) { try { // Validate and parse request body const body = await request.json(); const validated = RequestSchema.parse(body); // Initialize Foundry Local client const client = new FoundryLocalClient({ endpoint: process.env.FOUNDRY_LOCAL_ENDPOINT! }); // Build system prompt based on template const systemPrompts = { 'summary': 'You are a consultant creating concise document summaries.', 'talking-points': 'You are preparing structured talking points for meetings.', 'risk-analysis': 'You are analyzing risks and opportunities systematically.', 'executive': 'You are crafting executive-level briefing notes.' 
}; // Execute local inference const startTime = Date.now(); const completion = await client.chat.completions.create({ model: 'phi-4', messages: [ { role: 'system', content: systemPrompts[validated.template] }, { role: 'user', content: validated.prompt } ], temperature: 0.3, max_tokens: 500 }); const latency = Date.now() - startTime; // Return structured response with metadata return NextResponse.json({ content: completion.choices[0].message.content, model: 'phi-4 (local)', latency_ms: latency, tokens: completion.usage?.total_tokens, timestamp: new Date().toISOString() }); } catch (error) { if (error instanceof z.ZodError) { return NextResponse.json( { error: 'Invalid request format', details: error.issues }, { status: 400 } ); } console.error('Local inference error:', error); // `error` is `unknown` in a TypeScript catch clause, so narrow it before reading `.message` const message = error instanceof Error ? error.message : String(error); return NextResponse.json( { error: 'Inference failed', message }, { status: 500 } ); } }

This implementation demonstrates several critical security and quality patterns:

- Request validation with Zod: Every field is type-checked and bounded before processing, so malformed or oversized inputs are rejected before they reach the model
- Template-based system prompts: Different use cases get optimized prompts, improving output quality and consistency
- Comprehensive error handling: Validation errors, inference failures, and network issues are caught and reported with appropriate HTTP status codes
- Performance tracking: Latency measurement enables monitoring and helps users understand response times
- Metadata enrichment: Responses include model attribution, token usage, and timestamps for auditing

The cloud refinement route follows a similar pattern but adds privacy checks: export async function POST(request: NextRequest) { try { const body = await request.json(); const validated = RequestSchema.parse(body); // Check privacy settings from cookie/header const confidentialMode = request.cookies.get('confidential-mode')?.value === 'true'; if (confidentialMode) { return NextResponse.json( { error: 'Cloud refinement
disabled in confidential mode' }, { status: 403 } ); } // Proceed with Azure OpenAI call only if privacy allows const client = new OpenAI({ apiKey: process.env.AZURE_OPENAI_KEY, baseURL: process.env.AZURE_OPENAI_ENDPOINT, defaultHeaders: { 'api-key': process.env.AZURE_OPENAI_KEY } }); const completion = await client.chat.completions.create({ model: process.env.AZURE_OPENAI_DEPLOYMENT!, messages: [/* ... */], temperature: 0.5, // Slightly higher for creative refinement max_tokens: 800 }); return NextResponse.json({ content: completion.choices[0].message.content, model: `${process.env.AZURE_OPENAI_DEPLOYMENT} (cloud)`, privacy_notice: 'Content processed by Azure OpenAI', // ... metadata }); } catch (error) { // Error handling } } The confidential mode check is crucial—it ensures that even if a user accidentally clicks the refinement button, no data leaves the device when privacy mode is enabled. This fail-safe design prevents data leakage through UI mistakes or automated workflows. Building the Frontend: Transparent Privacy Controls The user interface must make privacy decisions explicit and visible. Users need to understand which AI service processed their content and make informed choices about cloud refinement. 
The main briefing interface (app/page.tsx) implements this transparency through clear visual indicators: 'use client'; import { useState, useEffect } from 'react'; import { PrivacySettings } from '@/components/PrivacySettings'; // Fields rendered in the result card: API response data plus client-side metadata interface BriefingResult { content: string; model: string; latency_ms: number; template: string; processedBy: 'local' | 'cloud'; } export default function BriefingAssistant() { const [confidentialMode, setConfidentialMode] = useState(true); // Privacy by default const [content, setContent] = useState(''); const [result, setResult] = useState<BriefingResult | null>(null); const [loading, setLoading] = useState(false); // Load privacy preference from localStorage useEffect(() => { const saved = localStorage.getItem('confidential-mode'); if (saved !== null) { setConfidentialMode(saved === 'true'); } }, []); async function generateBriefing(template: string, useCloud: boolean = false) { if (useCloud && confidentialMode) { alert('Cloud refinement is disabled in confidential mode. Adjust settings to enable.'); return; } setLoading(true); const endpoint = useCloud ? '/api/briefing/cloud' : '/api/briefing/local'; try { const response = await fetch(endpoint, { method: 'POST', headers: { 'Content-Type': 'application/json' }, body: JSON.stringify({ prompt: content, template }) }); const data = await response.json(); // Keep the template so the refine button can re-run the same briefing type setResult({ ...data, template, processedBy: useCloud ? 'cloud' : 'local' }); } catch (error) { console.error('Briefing generation failed:', error); } finally { setLoading(false); } } return ( <div className="briefing-assistant"> <header> <h1>Client Briefing Assistant</h1> <div className="status-bar"> <span className={confidentialMode ? 'confidential' : 'standard'}> {confidentialMode ? 
'🔒 Confidential Mode' : '🌐 Standard Mode'} </span> <PrivacySettings confidentialMode={confidentialMode} onChange={setConfidentialMode} /> </div> </header> <div className="quick-actions"> <button onClick={() => generateBriefing('summary')}> 📄 Summarize Document </button> <button onClick={() => generateBriefing('talking-points')}> 💬 Generate Talking Points </button> <button onClick={() => generateBriefing('risk-analysis')}> 🎯 Risk Analysis </button> <button onClick={() => generateBriefing('executive')}> 📊 Executive Summary </button> </div> <textarea value={content} onChange={(e) => setContent(e.target.value)} placeholder="Paste client document or meeting notes here..." /> {result && ( <div className="result-card"> <div className="result-header"> <span className="model-badge">{result.model}</span> <span className="latency">{result.latency_ms}ms</span> </div> <div className="result-content">{result.content}</div> {result.processedBy === 'local' && !confidentialMode && ( <button onClick={() => generateBriefing(result.template, true)} className="refine-btn" > ✨ Refine for Executive Presentation </button> )} </div> )} </div> ); } This interface design embodies several principles of responsible AI UX: Privacy by default: Confidential mode is enabled unless explicitly changed, ensuring accidental cloud usage requires multiple intentional actions Clear attribution: Every result shows which model generated it and how long it took, building user trust through transparency Conditional refinement: The cloud refinement button only appears when privacy allows and local inference has completed, preventing premature cloud requests Persistent settings: Privacy preferences save to localStorage, respecting user choices across sessions Visual status indicators: The header always shows current privacy mode with recognizable icons (🔒 for confidential, 🌐 for standard) Testing Privacy and Performance Requirements A privacy-first application demands rigorous testing to ensure data never 
leaks unintentionally. The project includes comprehensive test suites using Vitest for unit tests and Playwright for end-to-end scenarios. Here's a critical privacy test ( tests/privacy.test.ts ): import { describe, it, expect, beforeEach } from 'vitest'; import { TestUtils } from './utils/test-helpers'; describe('Privacy Controls', () => { let testUtils: TestUtils; beforeEach(() => { testUtils = new TestUtils(); testUtils.enableConfidentialMode(); }); it('should prevent cloud API calls when confidential mode is enabled', async () => { const response = await testUtils.requestBriefing({ template: 'summary', prompt: 'Confidential merger document...', cloud: true }); expect(response.status).toBe(403); expect(response.error).toContain('disabled in confidential mode'); }); it('should allow local inference in confidential mode', async () => { const response = await testUtils.requestBriefing({ template: 'summary', prompt: 'Confidential merger document...', cloud: false }); expect(response.status).toBe(200); expect(response.model).toContain('local'); expect(response.content).toBeTruthy(); }); it('should not persist sensitive content without opt-in', async () => { await testUtils.requestBriefing({ template: 'executive', prompt: 'Strategic acquisition plan...', cloud: false }); const history = await testUtils.getConversationHistory(); expect(history).toHaveLength(0); // No storage by default }); it('should support opt-in history with explicit consent', async () => { testUtils.enableHistorySaving(); await testUtils.requestBriefing({ template: 'executive', prompt: 'Strategic acquisition plan...', cloud: false }); const history = await testUtils.getConversationHistory(); expect(history).toHaveLength(1); expect(history[0].prompt).toContain('acquisition'); }); }); Performance testing ensures local inference meets the sub-second requirement: describe('Performance SLA', () => { it('should complete local inference in under 1 second', async () => { const samples = []; for (let i = 0; 
i < 10; i++) { const start = Date.now(); await testUtils.requestBriefing({ template: 'summary', prompt: 'Standard 500-word document...', cloud: false }); samples.push(Date.now() - start); } const p95 = calculatePercentile(samples, 95); expect(p95).toBeLessThan(1000); // 95th percentile under 1s }); it('should handle 5 concurrent requests without degradation', async () => { const requests = Array(5).fill(null).map(() => testUtils.requestBriefing({ template: 'talking-points', prompt: 'Meeting agenda...', cloud: false }) ); const results = await Promise.all(requests); expect(results.every(r => r.status === 200)).toBe(true); expect(results.every(r => r.latency_ms < 2000)).toBe(true); }); }); These tests validate the core promise: local inference is fast, private, and reliable under realistic loads. Deployment Considerations and Production Readiness Moving from development to production requires addressing several operational concerns: model distribution, environment configuration, monitoring, and incident response. For Foundry Local deployment, ensure IT teams pre-install the runtime and required models on consultant laptops. Use MDM (Mobile Device Management) systems or Group Policy to automate model downloads during onboarding. Models can be cached in shared network locations to avoid redundant downloads across teams. Environment configuration should separate local and cloud credentials cleanly: # .env.local (local development) FOUNDRY_LOCAL_ENDPOINT=http://127.0.0.1:5272 AZURE_OPENAI_ENDPOINT=https://your-org.openai.azure.com AZURE_OPENAI_DEPLOYMENT=gpt-4o-mini AZURE_OPENAI_KEY=your-key-here # For production, use Azure Managed Identity instead of API keys USE_MANAGED_IDENTITY=true Managed Identity eliminates API key management—the application authenticates using Azure AD, with permissions controlled via IAM policies. This prevents key leakage and simplifies rotation. Monitoring should track both local and cloud usage patterns. 
Implement structured logging with clear privacy labels: logger.info('Briefing generated', { model: 'local', template: 'summary', latency_ms: 847, tokens: 312, privacy_mode: 'confidential', user_id: hash(userId), // Never log raw user IDs timestamp: new Date().toISOString() }); This approach enables operational insights (average latency, most-used templates, error rates) without exposing sensitive content or user identities. For incident response, establish clear escalation paths. If Foundry Local fails, the application should gracefully degrade—inform users that local inference is unavailable and offer cloud-only mode (with explicit consent). If cloud services fail, local inference continues uninterrupted, ensuring the application remains useful even during Azure outages. Key Takeaways and Next Steps Building a privacy-first hybrid AI application requires careful architectural decisions that prioritize user data protection while maintaining high-quality outputs. The FL-Client-Briefing-Assistant demonstrates that you can achieve sub-second local inference, transparent privacy controls, and optional cloud refinement in a production-ready package. 
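The logging example earlier in this section calls `hash(userId)` without defining it. One way to fill that gap, sketched here as an assumption rather than the repository's actual helper, is a keyed hash from Node's built-in `crypto` module. The function name `hashUserId` and the idea of passing the secret as a parameter are illustrative choices.

```typescript
import { createHmac } from 'node:crypto';

// Hypothetical stand-in for the `hash(userId)` call in the logging example.
// A keyed hash (HMAC-SHA256) is used rather than a bare SHA-256 digest, so that
// someone holding only the logs cannot recover IDs by hashing a list of known
// user IDs; they would also need the secret.
function hashUserId(userId: string, secret: string): string {
  return createHmac('sha256', secret).update(userId).digest('hex');
}
```

The same user always maps to the same digest for a given secret, so per-user metrics still aggregate correctly. The secret would typically come from configuration outside source control; rotating it breaks continuity with older log entries, which may or may not be desirable.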
Key lessons from this implementation:

- Privacy must be the default, not an opt-in feature: confidential mode should require explicit action to disable
- Transparency builds trust: always show users which model processed their data and how long it took
- Fallback strategies ensure reliability: graceful degradation when services fail keeps the application useful
- Testing validates promises: comprehensive tests for privacy, performance, and functionality are non-negotiable
- Operational visibility without privacy leaks: structured logging enables monitoring without exposing sensitive content

To extend this application, consider adding:

- Document parsing: Integrate PDF, DOCX, and PPTX extractors to analyze file uploads directly
- Multi-document synthesis: Combine insights from multiple client documents into unified briefings
- Custom templates: Allow consultants to define their own briefing formats and save them for reuse
- Offline mode indicators: Detect network connectivity and disable cloud features automatically
- Audit logging: For regulated industries, implement immutable audit trails showing when cloud refinement was used

The full implementation, including all code, tests, and deployment guides, is available at github.com/leestott/FL-Client-Briefing-Assistant. Clone the repository, follow the setup guide, and experience privacy-first AI in action.

Resources and Further Reading

- FL-Client-Briefing-Assistant Repository - Complete source code and documentation
- Microsoft Foundry Local Documentation - Official runtime documentation and API reference
- Azure OpenAI Service - Cloud refinement integration guide
- Project Specification - Detailed requirements and acceptance criteria
- Implementation Guide - Architecture decisions and design patterns
- Testing Guide - How to run and interpret comprehensive test suites

Announcing the General Availability (GA) of the Premium v2 tier of Azure API Management
Superior capacity, the highest entity limits, unlimited included calls, and the most comprehensive feature set distinguish the Premium v2 tier from other API Management tiers. Customers rely on the Premium v2 tier for running enterprise-wide API programs at scale, with high availability and performance.

The Premium v2 tier has a new architecture that eliminates management traffic from the customer VNet, making private networking much more secure and easier to set up. During the creation of a Premium v2 instance, you can choose between VNet injection and VNet integration (introduced in the Standard v2 tier).

In addition, today we are adding three new features to Premium v2:

- Inbound Private Link: You can now enable private endpoint connectivity to restrict inbound access to your Premium v2 instance. It can be enabled along with VNet injection or VNet integration, or without a VNet.
- Availability zone support: Premium v2 now supports availability zones (zone redundancy) to enhance the reliability and resilience of your API gateway.
- Custom CA certificates: The Azure API Management v2 gateway can now validate TLS connections with the backend service using custom CA certificates.

New and improved VNet injection

Using VNet injection in Premium v2 no longer requires configuring routes or service endpoints. Customers can secure their API workloads without impacting API Management dependencies, while Microsoft can secure the infrastructure without interfering with customer API workloads. In short, the new VNet injection implementation enables both parties to manage network security and configuration settings independently and without affecting each other.

You can now configure your APIs with complete networking flexibility: force tunnel all outbound traffic to on-premises, send all outbound traffic through an NVA, or add a WAF device to monitor all inbound traffic to your API Management Premium v2 instance, all without constraints.
Inbound Private Link

Customers can now configure an inbound private endpoint for their API Management Premium v2 instance so that API consumers can securely access the API Management gateway over Azure Private Link. The private endpoint uses an IP address from the Azure virtual network in which it's hosted. Network traffic between a client on your private network and API Management traverses the virtual network and a Private Link on the Microsoft backbone network, eliminating exposure to the public internet. Further, you can configure custom DNS settings or an Azure DNS private zone to map the API Management hostname to the endpoint's private IP address.

With a private endpoint and Private Link, you can:

- Create multiple Private Link connections to an API Management instance.
- Use the private endpoint to send inbound traffic on a secure connection.
- Apply different API Management policies based on whether traffic comes from the private endpoint.
- Limit incoming traffic only to private endpoints, preventing data exfiltration.
- Combine with inbound virtual network injection or outbound virtual network integration to provide end-to-end network isolation of your API Management clients and backend services.

More details can be found here.

Today, only the API Management instance's Gateway endpoint supports inbound Private Link connections. Each API Management instance can support at most 100 Private Link connections.

Availability zones

Azure API Management Premium v2 now supports Availability Zones (AZ) redundancy to enhance the reliability and resilience of your API gateway. When deploying an API Management instance in an AZ-enabled region, users can choose to enable zone redundancy. This distributes the service's units, including the Gateway, management plane, and developer portal, across multiple physically separate AZs within that region. Learn how to enable AZs here.
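For the Private Link DNS mapping described earlier in this section, one quick sanity check is to confirm, from a client inside the virtual network, that the gateway hostname resolves only to private addresses. The sketch below is an illustration using the Python standard library, not an official validation tool; the helper names are made up for this example.

```python
import ipaddress
import socket
from typing import Iterable, Set


def all_private(addresses: Iterable[str]) -> bool:
    """Return True only if at least one address is given and every one is private."""
    parsed = [ipaddress.ip_address(a) for a in addresses]
    return bool(parsed) and all(a.is_private for a in parsed)


def resolve(hostname: str) -> Set[str]:
    """Resolve a hostname to its set of IP addresses using the system resolver."""
    infos = socket.getaddrinfo(hostname, 443, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return {info[4][0] for info in infos}
```

Calling `all_private(resolve("contoso-apim.azure-api.net"))` (a hypothetical hostname) from a machine in the virtual network should return `True` when the private DNS zone or custom DNS record is in effect, and `False` if the name still resolves to a public address.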
Custom CA certificates

If the API Management Gateway needs to connect to backends secured with TLS certificates issued by private certificate authorities (CAs), you need to configure custom CA certificates in the API Management instance. Custom CA certificates can be added and managed as Authorization Credentials in the Backend entities. The Backend entity has been extended with new properties that let customers specify a list of certificate thumbprints, or subject name + issuer thumbprint pairs, that the Gateway should trust when establishing a TLS connection with the associated backend endpoint. More details can be found here.

Region availability

The Premium v2 tier is now generally available in six public regions (Australia East, East US 2, Germany West Central, Korea Central, Norway East and UK South), with additional regions coming soon. For pricing information and regional availability, please visit the API Management pricing page.

Learn more

- API Management v2 tiers FAQ
- API Management v2 tiers documentation
- API Management overview documentation

Don’t Get Locked Out: Why Every Organization Needs Emergency Access Accounts
When systems fail, or when administrators suddenly lose access, the ability to regain control quickly can determine whether your nonprofit continues delivering essential services or faces major disruption. Emergency Access Accounts (also known as break-glass accounts) give you a crucial safety net, ensuring your team can restore services, manage users, and adjust security settings even when normal admin access is unavailable.

This updated guide explains why these accounts are vital, how to configure them correctly, and how nonprofits can secure them within Microsoft Entra ID.

Why Emergency Access Accounts Matter

In our previous discussion, we highlighted that resilience starts with preparation. If your primary admin accounts become locked out due to MFA issues, Conditional Access misconfigurations, outages, or human error, break-glass accounts are your only guaranteed path to recovery.

To function safely and effectively, these accounts must be:

- Highly secure
- Isolated from daily operations
- Able to bypass standard access controls
- Protected with passwordless authentication (Passkeys/FIDO2, certificates, Windows Hello)

And every organization, nonprofit or otherwise, should maintain at least two for redundancy and continuity.

Best Practices for Nonprofits Creating Emergency Access Accounts

Before setting up a break-glass account, review these nonprofit-aligned security practices:

1. Use Non-Obvious Naming
Avoid predictable names like "breakglass" or "emergencyadmin." Use neutral, coded names known only to trusted administrators.

2. Create Cloud-Only Accounts
Do not sync these accounts from on-premises directories. Cloud-only accounts remain available even if local infrastructure goes down.

3. Don’t Assign Licenses
Licenses add unnecessary exposure. Break-glass accounts should not use email, Teams, or any cloud workloads.

4. Don’t Link the Account to a Real Person
These accounts belong to the organization, not an individual.
Avoid personal MFA methods like individual phones or emails.

5. Enforce Strong Password Standards
- 32-character complex password (minimum)
- Rotate securely twice per year
- Do not reuse passwords
- Store them under a tightly governed, documented process

6. Disable Password Expiration
If passwords auto-expire, the account can break at the worst time. Rotate manually under a secure, audited process.

7. Exclude From Conditional Access Policies
Break-glass accounts must still work even when Conditional Access doesn’t. Exclude them from any policy that might block sign-in.

8. Assign Permanent Global Administrator Role
Emergency accounts need always-on permissions. Do not use PIM-eligible roles or time-restricted activation.

How to Create an Emergency Access Account in Microsoft Entra ID

Step 1: Create the Account
1. Open Microsoft Entra Admin Center.
2. Navigate to Entra ID → Users → All users.
3. Select + New user → Create new user.
4. Use the .onmicrosoft.com domain.
5. Ensure Account enabled is selected.
6. Set the Usage location.
7. Assign the Global Administrator role.
8. Review and create.

Repeat the steps to establish a second emergency account as needed.

Step 2: Enable Passwordless Authentication
Break-glass accounts should always be secured using passwordless methods:
- Passkeys (FIDO2)
- Certificate-based authentication (CBA)

How to Enable FIDO2 Passkeys
1. Go to: Entra ID → Security → Authentication methods → Policies → FIDO2 Security Key
2. Enable FIDO2 if not already enabled and click Save.

How to Enable Certificate-Based Authentication (CBA)

Step 1: Upload Your Certificate Authority
Go to: Entra Admin Center → Entra ID → Certificate authorities
- Upload your Root CA
- Mark as Root CA (if applicable)
- Add any intermediate CAs
- Provide the CRL (Certificate Revocation List) URL for revocation checks. This is required so Entra can check for revoked certificates.

Step 2: Turn on Certificate-Based Authentication
1. Go to: Entra ID → Authentication methods → Policies
2. Choose Certificate-based authentication
3. Switch Enable → On
4. Under Include, target only your break-glass accounts

Conclusion

Emergency access accounts aren’t just a security measure; they’re an operational safeguard that protects your mission. When the unexpected happens, these accounts ensure your organization can recover quickly and continue serving your community.
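As a companion to best practice 5 (a complex password of at least 32 characters), here is a minimal sketch of generating one with Python's `secrets` module. The function name and the specific complexity rule (at least one lowercase letter, uppercase letter, digit, and symbol) are assumptions made for illustration; follow your organization's documented password process for the real thing.

```python
import secrets
import string


def generate_break_glass_password(length: int = 32) -> str:
    """Generate a random password containing all four character classes."""
    if length < 32:
        raise ValueError("break-glass passwords should be at least 32 characters")

    alphabet = string.ascii_letters + string.digits + string.punctuation
    while True:
        # secrets.choice draws from a cryptographically secure random source.
        candidate = "".join(secrets.choice(alphabet) for _ in range(length))
        if (
            any(c.islower() for c in candidate)
            and any(c.isupper() for c in candidate)
            and any(c.isdigit() for c in candidate)
            and any(c in string.punctuation for c in candidate)
        ):
            return candidate
```

Generate the password on a trusted machine, store it straight into the governed storage process described above, and avoid printing or logging it.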