ai

10 Topics

Unified AI Weather Forecasting Pipeline through Aurora, Foundry, and Microsoft Planetary Computer Pro
Weather shapes some of the most critical decisions we make, from protecting critical infrastructure and global supply chains to keeping communities safe during extreme events. As climate variability grows, an organization’s ability to predict, assess, and plan its response to extreme weather is becoming a defining capability for modern infrastructure owners and operators. This is especially true for the energy and utility sector, where even small delays in preparation and response can cascade into major operational risk and financial impact, including widespread outages and millions in recovery costs. Operators of critical power infrastructure are increasingly turning to AI-powered solutions to reduce their operational and service delivery risk.

“As the physical risks to our grid systems grow, so too does our technological capacity to anticipate them. Artificial intelligence has quietly reached a maturity point in utility operations, not just as a tool for optimization, but as a strategic foresight engine. The opportunity is clear: with the right data, infrastructure, and operational alignment, AI outage prediction utility grid strategies can now forecast vulnerabilities with precision and help utilities transition from reactive to preventive risk models.” – Article by Think Power Solutions

Giving providers direct control of their data and AI analytics lets them generate better, more actionable insights for their operations. Today, we’ll demonstrate how organizations can use the state-of-the-art Aurora weather model in Microsoft Foundry with weather data provided by Microsoft Planetary Computer (MPC), an Azure-based geospatial data management platform, to develop a utility industry-specific impact prediction capability.
Taking Control of your Weather Prediction

In June 2024, Microsoft Research first announced Aurora, a cutting-edge AI foundation model that enables locally executed, on-demand global weather forecasting and storm-trajectory prediction from publicly available weather data. Two months later, Aurora became available on Microsoft Foundry, elevating on-demand weather forecasting from a self-hosted experience to managed deployments and readying Aurora for broader enterprise and public adoption. Aurora’s scientific foundations and forecasting performance were peer-reviewed and published in Nature, providing independent validation across global benchmarks. Its evolution continues with a strong commitment to openness and interoperability: in November 2025, Microsoft announced plans to open-source Aurora to accelerate innovation across the global research and developer community.

Building on the continued development of Aurora, today we are showcasing how organizations can operationalize this state-of-the-art capability with Microsoft Planetary Computer and Microsoft Planetary Computer Pro. By bringing together the vast public geospatial data stores in Planetary Computer with the private data managed by Planetary Computer Pro, organizations can unify their weather prediction and geospatial data in a single platform, simplifying data processing pipelines and data management. This allows enterprise customers to take control of their own weather forecasting on their own timeline.

A Unified Weather Prediction Data Pipeline

A key pain point for energy and utility companies is the inability to reliably ingest, store, and operationalize high-volume weather data. Model inputs and outputs often sit scattered across fragmented pipelines and platforms, making decisions difficult to trace, reproduce, and reference over time.
For example, as industry articles have noted, many utility companies have to pull public data from one silo, maintain GIS layers in another, and run operational planning in a separate environment. Teams are forced to manually stitch together forecasts, assets, and risk assessments, which introduces delays exactly when rapid decisions matter most. With the MPC Pro + Microsoft Foundry pipeline, utility companies move from fragmented, manual workflows to a single operating platform, where the value lies in a seamless end-to-end data-to-model pipeline. Users can combine Aurora on Microsoft Foundry with Microsoft Planetary Computer Pro’s geospatial data platform to unlock the following unified workflow:
- Source near-real-time weather data from Planetary Computer
- Run Aurora in Microsoft Foundry
- Fuse weather prediction results with geospatial data in Planetary Computer Pro for rapid assessment and post-processing

A Ready-to-use reference architecture

This reference architecture provides a reusable pattern for operationalizing frontier weather models with Microsoft Planetary Computer Pro and Microsoft Foundry. It feeds updated global weather data, hosted by Microsoft Planetary Computer, to the model hosted in Microsoft Foundry, then fuses the prediction results with enterprise geospatial context for analysis, decision-making, and action. Each component plays a distinct role in ensuring forecasts are timely, scalable, and directly usable within operational workflows.

Near Real-Time Weather Data

Microsoft Planetary Computer automatically ingests, indexes, and distributes up-to-date global weather data from the European Centre for Medium-Range Weather Forecasts (ECMWF) four times per day. This fully managed data pipeline ensures that the latest atmospheric datasets are continuously refreshed, standardized, and readily accessible, eliminating the need for manual data acquisition or preprocessing.
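As a small illustration of the four-times-daily refresh described above, the sketch below picks the most recent forecast cycle that should be available at a given time. It assumes cycles initialized at 00, 06, 12, and 18 UTC and a fixed publication delay; the cycle hours, the `availability_delay` default, and the function name `latest_cycle` are illustrative assumptions, not part of any Planetary Computer API.

```python
from datetime import datetime, timedelta, timezone

# Assumed cycle initialization hours (UTC); confirm against the dataset docs.
CYCLE_HOURS = (0, 6, 12, 18)


def latest_cycle(now: datetime,
                 availability_delay: timedelta = timedelta(hours=8)) -> datetime:
    """Return the init time of the newest cycle assumed to be published by `now`.

    `availability_delay` is a hypothetical lag between a cycle's init time
    and the moment its data can be read downstream.
    """
    if now.tzinfo is None:
        raise ValueError("now must be timezone-aware")
    cutoff = now.astimezone(timezone.utc) - availability_delay
    # Snap the cutoff down to the nearest cycle hour at or before it.
    hour = max(h for h in CYCLE_HOURS if h <= cutoff.hour)
    return cutoff.replace(hour=hour, minute=0, second=0, microsecond=0)


if __name__ == "__main__":
    now = datetime(2026, 3, 5, 15, 30, tzinfo=timezone.utc)
    print("Newest usable cycle:", latest_cycle(now).isoformat())
```

A scheduler could call a helper like this before each Aurora run so that every forecast is tagged with the input cycle it was built from, which supports the traceability goal mentioned earlier.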
Storing and Centralizing Public and Private Geospatial Data on Microsoft Planetary Computer Pro

Microsoft Planetary Computer Pro enables utility operators to store, manage, and access both public and private geospatial datasets within a single Azure platform. With a Microsoft Planetary Computer Pro GeoCatalog, organizations can centralize ECMWF weather data alongside infrastructure and location data to support downstream analyses.

Microsoft Foundry Hosts and Runs Weather Prediction Model on Demand

Microsoft Foundry provides model access and the infrastructure required to run Aurora and other weather forecasting models. Users can provision Aurora inference endpoints on their own dedicated compute. Once an endpoint is provisioned, the user can open the Python notebook and run the model to generate weather forecasts on demand.

Weather Forecast Outputs are Fused with Existing Data Sources on Microsoft Planetary Computer Pro

Aurora’s weather prediction outputs are integrated back into Microsoft Planetary Computer Pro, where they are fused with existing public or private geospatial datasets. This makes forecast results immediately accessible for visualization, post-processing, and analysis, such as identifying assets at risk, estimating localized impact, informing operational response plans, or pre-positioning assets for quick recovery. By combining AI-driven forecasts with geospatial context, organizations can move from raw predictions to actionable insights in a single workflow. This solution also gives organizations a centralized platform to store and catalog geospatial data for future traceability.

Unified Weather Prediction Demonstration

This demonstration visualizes the forecast storm track (Figure 2), along with projected damage impact along the storm path and associated coastal surge areas (Figures 3 and 4).
This enables users to assess asset exposure, anticipate wind damage, pre-position crews, and proactively protect critical infrastructure, helping reduce outage duration, lower operational costs, and improve grid resilience.

Getting Started

The Python notebook supports tracking historical storm events, forecasting real-time storm trajectories, and overlaying critical power infrastructure data from OpenStreetMap to visualize overlap. To get started, deploy this solution in your Azure environment to begin generating weather forecasts and storm-track predictions. The code and documentation for running this notebook are available in the linked GitHub repo. Sample outputs for you to explore are linked within this HTML.

For additional resources, visit the following MS Learn pages:
- Microsoft Planetary Computer Pro
- Microsoft Foundry

The interoperability between GeoAI models and a data platform extends far beyond weather prediction. It empowers organizations to take control of their geospatial data, generate actionable insights on their own timeline, and meet their own specific needs. With Microsoft Planetary Computer and Microsoft Foundry together, organizations can unify their enterprise geospatial data and unlock its value with powerful, state-of-the-art AI solutions.
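The overlay step described above (matching a forecast storm track against power infrastructure) can be sketched in a few lines. This is a minimal illustration, not the notebook's implementation: the `Point` type, the `assets_at_risk` helper, the coordinates, and the 50 km radius are all hypothetical, and distance is measured to track points only rather than to the segments between them.

```python
import math
from typing import List, NamedTuple, Tuple


class Point(NamedTuple):
    name: str
    lat: float  # degrees
    lon: float  # degrees


EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two lat/lon points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def assets_at_risk(track: List[Point], assets: List[Point],
                   radius_km: float) -> List[Tuple[str, float]]:
    """Return (asset name, distance) for assets within radius_km of any track point.

    Results are sorted by distance, closest first.
    """
    flagged = []
    for asset in assets:
        nearest = min(haversine_km(asset.lat, asset.lon, p.lat, p.lon) for p in track)
        if nearest <= radius_km:
            flagged.append((asset.name, nearest))
    return sorted(flagged, key=lambda item: item[1])


if __name__ == "__main__":
    # Hypothetical forecast track positions and infrastructure locations.
    track = [Point("t+0h", 25.0, -80.0), Point("t+6h", 26.0, -80.5)]
    assets = [Point("substation_a", 26.1, -80.4), Point("plant_b", 40.0, -100.0)]
    for name, dist in assets_at_risk(track, assets, radius_km=50.0):
        print(f"{name}: {dist:.1f} km from forecast track")
```

In practice the track would come from Aurora's output and the assets from a GeoCatalog or OpenStreetMap query; a production version would also account for wind-field extent rather than a fixed radius.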
Yves-Pitsch
Mar 05, 2026 Place Microsoft Foundry Blog
451Views
2likes
0Comments
Introducing Phi-4-Reasoning-Vision to Microsoft Foundry
Vision reasoning models unlock a critical capability for developers: the ability to move beyond passive perception toward systems that can understand, reason over, and act on visual information. Instead of treating images, diagrams, documents, or UI screens as unstructured inputs, vision reasoning models enable developers to build applications that can interpret visual structure, connect it with textual context, and perform multi-step reasoning to reach actionable conclusions. Today, we are excited to announce that Phi-4-Reasoning-Vision-15B is available in Microsoft Foundry and on Hugging Face. This model brings high-fidelity vision to the reasoning-focused Phi-4 family, extending small language models (SLMs) beyond perception into structured, multi-step visual reasoning for agents, analytical tools, and scientific workflows.

What’s new?

The Phi model family has advanced toward combining efficient visual understanding with strong reasoning in small language models. Earlier Phi-4 models demonstrated reliable perception and grounding across images and text, while later iterations introduced structured reasoning to improve performance on complex tasks. Phi-4-Reasoning-Vision-15B brings these threads together, pairing high-resolution visual perception with selective, task-aware reasoning. As a result, the model can reason deeply when needed while remaining fast and efficient for perception-focused scenarios, making it well suited for interactive, real-world applications.

Key capabilities

- Reasoning behavior is controlled via prompting: developers can explicitly enable or disable reasoning to balance latency and accuracy at runtime.
- Optimized for vision reasoning, the model can be used for:
  - diagram-based math
  - document, chart, and table understanding
  - GUI interpretation and grounding, so that agents can interpret screens and actions
  - computer-use agent scenarios
  - general image chat and question answering

Benchmarks

The following results summarize Phi-4-Reasoning-Vision-15B performance across a set of established multimodal reasoning, mathematics, and computer-use benchmarks. These results come from internal evaluations.

Benchmark | Phi-4-reasoning-vision-15B | Phi-4-reasoning-vision-15B (force no think) | Phi-4-mm-instruct | Kimi-VL-A3B-Instruct | gemma-3-12b-it | Qwen3-VL-8B-Instruct-4K | Qwen3-VL-8B-Instruct-32K | Qwen3-VL-32B-Instruct-4K | Qwen3-VL-32B-Instruct-32K
AI2D_TEST | 84.8 | 84.7 | 68.6 | 84.6 | 80.4 | 82.7 | 83 | 84.8 | 85
ChartQA_TEST | 83.3 | 76.5 | 23.5 | 87 | 39 | 83.1 | 83.2 | 84.3 | 84
HallusionBench | 64.4 | 63.1 | 56 | 65.2 | 65.3 | 73.5 | 74.1 | 74.4 | 74.9
MathVerse_MINI | 44.9 | 43.8 | 32.4 | 41.7 | 29.8 | 54.5 | 57.4 | 64.2 | 64.2
MathVision_MINI | 36.2 | 34.2 | 20 | 28.3 | 31.9 | 45.7 | 50 | 54.3 | 60.5
MathVista_MINI | 75.2 | 68.7 | 50.5 | 67.1 | 57.4 | 77.1 | 76.4 | 82.5 | 81.8
MMMU_VAL | 54.3 | 52 | 42.3 | 52 | 50 | 60.7 | 64.6 | 68.6 | 70.6
MMStar | 64.5 | 63.3 | 45.9 | 60 | 59.4 | 68.9 | 69.9 | 73.7 | 74.3
OCRBench | 76 | 75.6 | 62.6 | 86.5 | 75.3 | 89.2 | 90 | 88.5 | 88.5
ScreenSpot_v2 | 88.2 | 88.3 | 28.5 | 89.8 | 3.5 | 91.5 | 91.5 | 93.7 | 93.9

Table 1: Accuracy comparisons relative to popular open-weight, non-thinking models

Benchmark | Phi-4-reasoning-vision-15B | Phi-4-reasoning-vision-15B (force thinking) | Kimi-VL-A3B-Thinking | gemma-3-12b-it | Qwen3-VL-8B-Thinking-4K | Qwen3-VL-8B-Thinking-40K | Qwen3-VL-32B-Thinking-4K | Qwen3-VL-32B-Thinking-40K
AI2D_TEST | 84.8 | 79.7 | 81.2 | 80.4 | 83.5 | 83.9 | 86.9 | 87.2
ChartQA_TEST | 83.3 | 82.9 | 73.3 | 39 | 78 | 78.6 | 78.5 | 79.1
HallusionBench | 64.4 | 63.9 | 70.6 | 65.3 | 71.6 | 73 | 76.4 | 76.6
MathVerse_MINI | 44.9 | 53.1 | 61 | 29.8 | 67.3 | 73.3 | 78.3 | 78.2
MathVision_MINI | 36.2 | 36.2 | 50.3 | 31.9 | 43.1 | 50.7 | 60.9 | 58.6
MathVista_MINI | 75.2 | 74.1 | 78.6 | 57.4 | 77.7 | 79.5 | 83.9 | 83.8
MMMU_VAL | 54.3 | 55 | 60.2 | 50 | 59.3 | 65.3 | 72 | 72.2
MMStar | 64.5 | 63.9 | 69.6 | 59.4 | 69.3 | 72.3 | 75.5 | 75.7
OCRBench | 76 | 73.7 | 79.9 | 75.3 | 81.2 | 82 | 83.7 | 85
ScreenSpot_v2 | 88.2 | 88.1 | 81.8 | 3.5 | 93.3 | 92.7 | 83.1 | 83.1

Table 2: Accuracy comparisons relative to popular open-weight, thinking models

All results were obtained using a consistent evaluation setup and prompts across models; numbers are provided for comparison and analysis rather than as leaderboard claims. For more information regarding benchmarks and evaluations, please read the technical paper on the Microsoft Research hub.

Suggested use cases and applications

Phi-4-Reasoning-Vision-15B supports applications that require both high-fidelity visual perception and structured inference. Two representative scenarios are scientific and mathematical reasoning over visual inputs, and computer-using agents (CUAs) that operate directly on graphical user interfaces. In both cases, the model provides grounded visual understanding paired with controllable, low-latency reasoning suitable for interactive systems.

Computer use agents in retail scenarios

For computer use agents, Phi-4-Reasoning-Vision-15B provides the perception and grounding layer required to understand and act within live ecommerce interfaces. For example, in an online shopping experience, the model interprets screen content (products, prices, filters, promotions, buttons, and cart state) and produces grounded observations that agentic models like Fara-7B can use to select actions. Its compact size and low-latency inference make it well suited for CUA workflows and agentic applications.

Visual reasoning for education

Another practical use of visual reasoning models is education. A developer could build a K-12 tutoring app with Phi-4-Reasoning-Vision-15B where students upload photos of worksheets, charts, or diagrams to get guided help rather than answers. The model can understand the visual content, identify where the student went wrong, and explain the correct steps clearly.
Over time, the app can adapt by serving new examples matched to the student’s learning level, turning visual problem-solving into a personalized learning experience.

Microsoft Responsible AI principles

At Microsoft, our mission to empower people and organizations remains constant, especially in the age of AI, where the potential for human achievement is greater than ever. We recognize that trust is foundational to AI adoption, and earning that trust requires a commitment to transparency, safety, and accountability. As with other Phi models, Phi-4-Reasoning-Vision-15B was developed with safety as a core consideration throughout training and evaluation. The model was trained on a mixture of public safety datasets and internally generated examples designed to elicit behaviors the model should appropriately refuse, in alignment with Microsoft’s Responsible AI Principles. These safety-focused training signals help the model recognize and decline requests that fall outside intended or acceptable use. Additional details on the model’s safety considerations, evaluation approach, and known limitations are provided in the accompanying technical blog and model card.

Getting started

Start using Phi-4-Reasoning-Vision-15B in Microsoft Foundry today. Microsoft Foundry provides a unified environment for model discovery, evaluation, and deployment, making it straightforward to move from initial experimentation to production use while applying appropriate safety and governance practices.
- Deploy the new model on Microsoft Foundry
- Learn more about the Phi family on Foundry Labs and in the Phi Cookbook
- Connect to the Microsoft Developer Community on Discord
- Read the technical paper on Microsoft Research
- Read more use cases on the Educators Developer blog
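As a rough reading of Table 1 in this post, the sketch below compares the model's default mode with its "force no think" mode, which gives a feel for the accuracy side of the latency/accuracy trade-off mentioned under Key capabilities. The scores are copied from the first two columns of Table 1; the dictionary layout and the `reasoning_gain` helper are illustrative, and no latency figures are included because the post does not report any.

```python
from statistics import mean

# (default mode, force no think) accuracy, copied from Table 1 of this post.
TABLE_1 = {
    "AI2D_TEST": (84.8, 84.7),
    "ChartQA_TEST": (83.3, 76.5),
    "HallusionBench": (64.4, 63.1),
    "MathVerse_MINI": (44.9, 43.8),
    "MathVision_MINI": (36.2, 34.2),
    "MathVista_MINI": (75.2, 68.7),
    "MMMU_VAL": (54.3, 52.0),
    "MMStar": (64.5, 63.3),
    "OCRBench": (76.0, 75.6),
    "ScreenSpot_v2": (88.2, 88.3),
}


def reasoning_gain(scores: dict) -> dict:
    """Per-benchmark accuracy difference: default mode minus forced no-think."""
    return {name: round(default - no_think, 1)
            for name, (default, no_think) in scores.items()}


if __name__ == "__main__":
    gains = reasoning_gain(TABLE_1)
    for name, gain in sorted(gains.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{name:16s} {gain:+.1f}")
    print(f"mean gain: {mean(gains.values()):+.2f} points")
```

On these numbers the largest gaps are on ChartQA (6.8 points) and MathVista (6.5 points), while perception-heavy benchmarks such as AI2D and ScreenSpot are essentially unchanged, which is consistent with the post's framing of reasoning as something to enable selectively.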
yashlara
Mar 04, 2026 Place Microsoft Foundry Blog
623Views
0likes
0Comments
Unlocking High-Performance Inference for DeepSeek with NVFP4 on NVIDIA Blackwell
Summary

We partnered closely with NVIDIA to unlock high-performance single-node inference for DeepSeek-V3.2 on NVIDIA Blackwell. By leveraging NVIDIA’s new NVFP4 checkpoint for DeepSeek-V3.2 combined with NVIDIA TensorRT LLM on NVIDIA Blackwell, we achieved breakthrough inference performance. These experiments were performed on a single node (2 Grace Blackwell superchips) of the NVIDIA GB200 NVL72 platform (hereafter referred to as an NVIDIA GB200 node), similar to the Standard_ND128isr_NDR_GB200_v6 VM available on Azure.

Using an aligned apples-to-apples benchmark methodology, single-node inference using NVIDIA GB200 nodes with NVFP4 and TensorRT LLM delivers up to 2.5x lower per-user latency than similar inference configurations with NVIDIA H200 GPUs. Beyond increased performance, using NVIDIA GB200 nodes with NVFP4 dramatically increases the number of users that can be served from the same GPU footprint as an H200 deployment. While maintaining a consistent latency target across both GB200 and H200 deployments, our experiments demonstrated that single-node deployments of DeepSeek-V3.2 can serve up to 16 times as many users per GPU when using NVIDIA GB200 nodes versus NVIDIA H200 nodes.

The results came from end-to-end co-optimization across three layers:
- Hardware: Our experiments are performed on individual nodes of the NVIDIA GB200 NVL72, with each node consisting of 2 Grace CPUs and 4 Blackwell GPUs.
- Model weights: NVFP4-quantized weights for DeepSeek-V3.2 are optimized to deliver high inference efficiency while preserving model quality.
- Inference runtime: TensorRT LLM is used as the production-grade serving and execution engine, configured to optimize DeepSeek on Blackwell GPUs.

This post details how we achieved these results, including our serving and benchmarking setup. This configuration is now used to serve DeepSeek-V3.2 on Microsoft Foundry. See here for information about DeepSeek-V3.2 on Microsoft Foundry.
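The two headline ratios in this summary follow from simple arithmetic on figures reported in the key findings later in this post: 8 concurrent users on a 4-GPU GB200 node versus 1 user on an 8-GPU H200 node at the same latency target, and median latencies of 5,801 ms versus 14,716 ms at concurrency 1. The helper name `users_per_gpu` is illustrative.

```python
def users_per_gpu(users_per_node: int, gpus_per_node: int) -> float:
    """Concurrent users served per GPU at a fixed latency target."""
    if gpus_per_node <= 0:
        raise ValueError("gpus_per_node must be positive")
    return users_per_node / gpus_per_node


if __name__ == "__main__":
    gb200 = users_per_gpu(users_per_node=8, gpus_per_node=4)
    h200 = users_per_gpu(users_per_node=1, gpus_per_node=8)
    print(f"GB200: {gb200} users/GPU, H200: {h200} users/GPU, ratio: {gb200 / h200:.0f}x")

    # Per-user latency ratio at concurrency 1 (median end-to-end, in ms).
    print(f"Latency ratio: {14716 / 5801:.2f}x")
```

Note that the 16x figure combines two effects: 8x more users per node and half as many GPUs per node.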
Unlocking Blackwell Performance

DeepSeek-V3.2 offers strong reasoning performance and broad task coverage, making it well suited for real-world production workloads. However, the sheer scale of this 690-billion-parameter Mixture-of-Experts (MoE) model presents an inherent challenge. Achieving efficient, cost-effective inference at this magnitude demands careful, end-to-end optimization across the entire stack, from model representation to runtime and system configuration, all while preserving output quality and predictable latency. The NVIDIA GB200 NVL72 platform, combined with NVFP4 quantization and the TensorRT LLM inference engine, offers a powerful solution for delivering high-performance, cost-effective inference for DeepSeek-V3.2.

NVIDIA GB200 NVL72

The NVIDIA GB200 NVL72 is a rack-scale solution that combines 72 NVIDIA Blackwell GPUs with 36 NVIDIA Grace CPUs to deliver high-performance inference. Each of the 72 Blackwell GPUs contains 186 GB of high-bandwidth HBM3e memory, a 32% increase in per-GPU memory compared to NVIDIA H200 GPUs. NVIDIA Blackwell’s second-generation transformer engines provide 10 PFLOPS of dense NVFP4 performance per GPU, a 5x increase over the 2 PFLOPS of dense FP8 on H200. For MoE models like DeepSeek-V3.2, Blackwell’s larger memory capacity and NVFP4 compute throughput enable higher inference performance.

NVFP4 Floating Point Precision

NVFP4 is a 4-bit floating point format introduced with the NVIDIA Blackwell architecture. By encoding quantized blocks with non-power-of-two scaling factors, NVFP4 simultaneously enables higher performance, a reduced memory footprint, and preserved model accuracy when compared with FP8 and FP16 floating point formats. NVFP4 is optimized to take advantage of NVIDIA Blackwell’s native Tensor Core support. For DeepSeek-V3.2, NVIDIA’s NVFP4 quantization reduced the memory footprint of the model by 1.7x compared to the model’s original FP8 format (415 GB vs. 690 GB), leading to significant boosts in throughput and cost savings.

NVIDIA has published comprehensive quality benchmarking results for the DeepSeek-V3.2 NVFP4 model, showing that the quantized weights maintain accuracy closely aligned with the original FP8 model across a broad set of industry-standard benchmarks.

Precision | MMLU Pro | GPQA Diamond | LiveCodeBench V6 | SciCode | AIME 2025
FP8 | 0.802 | 0.849 | 0.756 | 0.391 | 0.934
NVFP4 | 0.799 | 0.835 | 0.756 | 0.401 | 0.923

Table 1: DeepSeek-V3.2 Accuracy Benchmark Results Across FP8 and NVFP4 Checkpoints

Across reasoning, coding, and scientific benchmarks, NVFP4 delivers near-parity results relative to FP8, validating its suitability for production inference where memory efficiency and throughput are critical. For detailed model quality metrics and benchmark comparisons, see the NVIDIA DeepSeek-V3.2-NVFP4 model card.

TensorRT LLM

TensorRT LLM is an open-source library for optimizing LLM inference. It provides high-performance optimizations for NVIDIA GPUs such as low-precision serving, in-flight batching, and custom attention kernels. TensorRT LLM’s optimized support for sparse attention and large context windows enables DeepSeek-V3.2 to achieve breakthrough performance.

Benchmark Methodology

We used a fair and practical benchmark methodology that reflects production-style inference patterns. Although multi-node inference is anticipated to deliver higher per-GPU performance by leveraging features like disaggregated serving and Wide EP, we restricted our experiments to single-node performance to ensure a clear comparison between the two platforms.
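To make the NVFP4 description earlier in this post more concrete, here is a simplified simulation of block-scaled 4-bit quantization. It assumes 16-element blocks and the eight magnitudes representable by a 4-bit E2M1 value; it keeps the per-block scale as an ordinary Python float, whereas the real format stores that scale in FP8 and adds a per-tensor scale, so treat this as a sketch of the idea rather than a bit-accurate implementation. The function names are illustrative.

```python
import math
from typing import List, Tuple

# Magnitudes representable by a 4-bit E2M1 value (sign handled separately).
E2M1_VALUES = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)
BLOCK_SIZE = 16  # assumed number of elements sharing one scale factor


def quantize_block(block: List[float]) -> Tuple[float, List[float]]:
    """Map a block to (scale, codes) so that scale * code approximates each value.

    The scale maps the block's largest magnitude onto 6.0, the largest E2M1
    value, so it is generally not a power of two. Ties round toward the smaller
    magnitude here; hardware rounding rules may differ.
    """
    amax = max(abs(x) for x in block)
    scale = amax / 6.0 if amax > 0 else 1.0
    codes = []
    for x in block:
        magnitude = min(E2M1_VALUES, key=lambda v: abs(abs(x) / scale - v))
        codes.append(math.copysign(magnitude, x))
    return scale, codes


def dequantize_block(scale: float, codes: List[float]) -> List[float]:
    """Reconstruct approximate values from a scale and its 4-bit codes."""
    return [scale * c for c in codes]


if __name__ == "__main__":
    weights = [0.013 * (i - 7) for i in range(BLOCK_SIZE)]
    scale, codes = quantize_block(weights)
    restored = dequantize_block(scale, codes)
    worst = max(abs(w - r) for w, r in zip(weights, restored))
    print(f"scale={scale:.5f}, worst-case error={worst:.5f}")
    # Footprint ratio reported in this post for the full checkpoint.
    print(f"FP8 / NVFP4 footprint: {690 / 415:.2f}x")
```

Because each small block gets its own scale, an outlier only degrades the precision of its 15 neighbours rather than the whole tensor, which is one intuition for why accuracy stays close to FP8 in the table above.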
Parameter | Value
Input length | 2,400 tokens
Output length | 1,600 tokens
Concurrent Requests | 1, 2, 4, 8, 16, 32, 64
Dataset | ShareGPT_V3_unfiltered_cleaned_split
Target Metrics | Output Throughput (tokens/sec), End-to-End Latency (ms)

Table 2: Load Profile for Performance Benchmarks

As real-world inference performance varies with the number of active requests, our experiments measured system performance under different loads, as shown in the "Concurrent Requests" parameter in Table 2. This is sometimes referred to as “concurrency”.

Our two target metrics are output throughput and end-to-end latency. Output throughput measures the cumulative number of tokens produced by the model per second across all concurrent requests. End-to-end latency is the total time taken by the model to complete a request, measured from the time the request is sent to the model to the time the response’s final token is generated.

The following script was used to benchmark performance at multiple request concurrencies using SGLang’s sglang.bench_serving tool. The script sends prompt requests to a provided inference endpoint, then gathers performance data such as throughput, time-to-first-token, and end-to-end latency. Read more about the sglang.bench_serving tool here.
    #!/usr/bin/env bash
    set -euo pipefail

    CONCURRENCY_LIST=(1 2 4 8 16 32 64)

    for max_concurrency in "${CONCURRENCY_LIST[@]}"; do
      echo "==> Running benchmark with --max-concurrency ${max_concurrency}"
      python3 -m sglang.bench_serving \
        --backend sglang-oai \
        --model ./DeepSeek-V3.2-NVFP4/ \
        --num-prompts 500 \
        --max-concurrency "${max_concurrency}" \
        --tokenizer ./DeepSeek-V3.2-NVFP4/ \
        --dataset-path ./ShareGPT_V3_unfiltered_cleaned_split.json \
        --sharegpt-output-len 1600 \
        --sharegpt-context-len 2400
    done

Results: NVIDIA GB200 Node with NVFP4 Achieves Best Performance

Figure 1: Single-Node Inference: Throughput (output tokens per second) vs Median End-to-End Latency (ms)

Figure 1 plots throughput (output tokens per second) against median end-to-end latency (ms). The annotations on the graph, written “cX:Y”, give the request concurrency at which each data point was gathered, where X is the request concurrency and Y is the throughput observed at that concurrency, measured in tokens per second. Points that are higher and further left indicate better efficiency. Across all concurrencies, the configuration using GB200, NVFP4, and TensorRT LLM consistently achieves the best efficiency.

Configuration | Concurrency | Throughput (tokens/s) | Median E2E Latency (ms) | Throughput/Latency
GB200 with NVFP4 | 1 | 272 | 5,801 | 0.047
GB200 with FP8 | 1 | 228 | 7,015 | 0.033
H200 with FP8 | 1 | 109 | 14,716 | 0.007

Table 3: Single-concurrency requests

Key findings:
- Up to 2.5x lower end-to-end latency: A GB200 node with NVFP4 delivers 5,801 ms median latency vs. 14,716 ms median latency on an H200 node at concurrency 1, a 2.5x improvement.
- Best efficiency curve: Figure 1 shows that at each concurrency, a GB200 node with NVFP4 has the highest throughput and lowest latency compared to both a GB200 node with FP8 and an H200 node with FP8.
- Serve up to 16x more users per GPU: Given an end-to-end latency target of 15,000 milliseconds, single-node inference for DeepSeek-V3.2 with NVFP4 on an NVIDIA GB200 node yields 8x the throughput and can serve up to 8 concurrent users, while an NVIDIA H200 node can serve only 1 user. Since our GB200 NVL72 nodes contain 4 GPUs and our H200 nodes contain 8 GPUs, this translates to 16x higher performance per GPU.

Serving Configuration Reference

This section shows how we served DeepSeek-V3.2 in our experiments. Depending on the hardware and software configurations used, replicated results may vary.

Parameter | Blackwell NVFP4 | Blackwell FP8 | Hopper FP8
GPUs | 4x GB200 | 4x GB200 | 8x H200
Nodes | 1 | 1 | 1
Tensor Parallelism | 4 | 4 | 8
Max Batch Size | 64 | 64 | 64
MTP Enabled | Yes | Yes | Yes
Inference Engine | TensorRT LLM v1.2.0rc8 | TensorRT LLM v1.2.0rc8 | TensorRT LLM v1.2.0rc8
Model Checkpoint | nvidia/DeepSeek-V3.2-NVFP4 | deepseek-ai/DeepSeek-V3.2 | deepseek-ai/DeepSeek-V3.2

Table 4: Serving Configuration Parameters

Config File

    cuda_graph_config:
      enable_padding: true
      batch_sizes: [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,32,64]
    kv_cache_config:
      free_gpu_memory_fraction: 0.8
      dtype: fp8
    moe_config:
      backend: TRTLLM
    speculative_config:
      decoding_type: MTP
      num_nextn_predict_layers: 3

Note: While model weights use NVFP4, the KV cache remains FP8 to balance memory efficiency with numerical stability during decode.

Serving the Model

The following command launches the TensorRT LLM inference server for the DeepSeek-V3.2 model, using the preceding configuration file and exposing the service on port 30000 for high-throughput serving.

    trtllm-serve serve ./DeepSeek-V3.2-NVFP4/ \
      --tp_size 4 \
      --max_batch_size 64 \
      --trust_remote_code \
      --extra_llm_api_options ./config.yaml \
      --host 0.0.0.0 \
      --port 30000

What’s Next

Rack-Scale Inference

This blog post focuses on single-node inference (2 Grace CPUs and 4 Blackwell GPUs for GB200; 8 GPUs for H200).
We anticipate even greater performance improvements with multi-node serving configurations, including those leveraging disaggregated serving, TensorRT LLM’s Wide EP capabilities, and all 72 GPUs on the NVIDIA GB200 NVL72 rack system.

Apply Approach to New Models

We plan to apply the same approach of combining Blackwell, NVFP4, TensorRT LLM, and kernel tuning to additional model families.

Acknowledgements

This work was enabled by close collaboration between engineering teams from Microsoft and NVIDIA. Key contributors include Xiaoran Li, Tao Wang, and Vivek Ramaswamy from Microsoft; and Stephen McCullough, Anurag Mukkara, and Nikhar Maheshwari from NVIDIA.
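For readers who want to reproduce the analysis, the two target metrics defined in the Benchmark Methodology section can be computed from per-request timing records roughly as follows. This is a sketch of the definitions given in this post, not the internals of sglang.bench_serving; the `RequestRecord` type and function names are hypothetical.

```python
from dataclasses import dataclass
from statistics import median
from typing import List


@dataclass
class RequestRecord:
    start_s: float       # time the request was sent
    end_s: float         # time the final output token was generated
    output_tokens: int   # tokens generated for this request


def output_throughput(records: List[RequestRecord]) -> float:
    """Total output tokens per second across all requests in the run."""
    if not records:
        raise ValueError("no records")
    wall_s = max(r.end_s for r in records) - min(r.start_s for r in records)
    if wall_s <= 0:
        raise ValueError("run duration must be positive")
    return sum(r.output_tokens for r in records) / wall_s


def median_e2e_latency_ms(records: List[RequestRecord]) -> float:
    """Median time from request sent to final token, in milliseconds."""
    if not records:
        raise ValueError("no records")
    return median((r.end_s - r.start_s) * 1000.0 for r in records)


if __name__ == "__main__":
    # Synthetic example: two overlapping requests of 1,600 output tokens each.
    run = [RequestRecord(0.0, 10.0, 1600), RequestRecord(0.0, 20.0, 1600)]
    print(f"throughput: {output_throughput(run):.1f} tokens/s")
    print(f"median E2E latency: {median_e2e_latency_ms(run):.0f} ms")
```

Dividing the first metric by the second gives the Throughput/Latency column in Table 3 (for example, 272 / 5,801 is about 0.047).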
xiaoranli
Feb 27, 2026 Place Microsoft Foundry Blog
886Views
0likes
0Comments
Open-Source SDK for Evaluating AI Model Outputs (Sharing Resource)
Hi everyone,

I wanted to share a helpful open-source resource for developers working with LLMs, AI agents, or prompt-based applications. One common challenge in AI development is evaluating model outputs in a consistent and structured way. Manual evaluation can be subjective and time-consuming. The project below provides a framework to help with that:

AI-Evaluation SDK
https://github.com/future-agi/ai-evaluation

Key Features:
- Ready-to-use evaluation metrics
- Supports text, image, and audio evaluation
- Pre-defined prompt templates
- Quickstart examples available in Python and TypeScript
- Can integrate with workflows using toolkits like LangChain

Use Case: If you are comparing different models or experimenting with prompt variations, this SDK helps standardize the evaluation process and reduces manual scoring effort.

If anyone has experience with other evaluation tools or best practices, I’d be interested to hear what approaches you use.
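To show what "structured" evaluation can look like in its simplest form, here is a tiny standalone scorer using exact match and token-level F1. This does not use or represent the AI-Evaluation SDK's API; the function names and sample data are purely illustrative.

```python
from collections import Counter
from typing import Dict, List, Tuple


def normalize(text: str) -> List[str]:
    """Lowercase and split on whitespace."""
    return text.lower().split()


def exact_match(prediction: str, reference: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(reference))


def token_f1(prediction: str, reference: str) -> float:
    """Harmonic mean of token precision and recall against the reference."""
    pred, ref = normalize(prediction), normalize(reference)
    if not pred or not ref:
        return float(pred == ref)
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(pred), overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def evaluate(pairs: List[Tuple[str, str]]) -> Dict[str, float]:
    """Average both metrics over (prediction, reference) pairs."""
    n = len(pairs)
    return {
        "exact_match": sum(exact_match(p, r) for p, r in pairs) / n,
        "token_f1": sum(token_f1(p, r) for p, r in pairs) / n,
    }


if __name__ == "__main__":
    samples = [("Paris", "paris"), ("the cat sat", "the cat")]
    print(evaluate(samples))
```

Running the same scorer over the outputs of two models or two prompt variants gives comparable numbers, which is the kind of consistency the post is after.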
vihargadhesariya
Jan 01, 2026 Place Microsoft Foundry Discussions
125Views
0likes
1Comment
Predictions for Artificial Intelligence in next 2-3 years!!!!
2025: start of agentic AI
- Oct 2025: ChatGPT 5 gets released (proven to be 10,000x more powerful than ChatGPT 4, and able to run tasks automatically)
2026: AI benchmarks match humans; beginning of artificial general intelligence
2027: A new website called letsbuiltai is open source and encourages everyone to train AI. Instead of you training your own AI, or an AI company training its own AI, this would involve everyone training a particular AI simultaneously, paving the way for faster AI growth.
HaroldL105
Sep 01, 2025 Place Microsoft Foundry Discussions
268Views
0likes
2Comments
Introducing Azure AI Models: The Practical, Hands-On Course for Real Azure AI Skills
Hello everyone,

Today, I’m excited to share something close to my heart. After watching so many developers, including myself, get lost in a maze of scattered docs and endless tutorials, I knew there had to be a better way to learn Azure AI. So I decided to build a guide from scratch, with the goal of breaking things down step by step and making it easy for beginners to get started with Azure. My aim was to remove the guesswork and create a resource where anyone could jump in, follow along, and actually see results without feeling overwhelmed.

Introducing the Azure AI Models Guide. This is a brand new, solo-built, open-source repo aimed at making Azure AI accessible for everyone, whether you’re just getting started or want to build real, production-ready apps using Microsoft’s latest AI tools. The idea is simple: bring all the essentials into one place. You’ll find clear lessons, hands-on projects, and sample code in Python, JavaScript, C#, and REST, all structured so you can learn step by step, at your own pace. I wanted this to be the resource I wish I’d had when I started: straightforward, practical, and friendly to beginners and pros alike.

It’s early days for the project, but I’m excited to see it grow. If you’re curious, check out the repo at https://github.com/DrHazemAli/Azure-AI-Models. Your feedback, and maybe even your contributions, will help shape where it goes next!
Solved
hazem
Jul 21, 2025 Place Microsoft Foundry Discussions
1KViews
1like
5Comments
Introducing AzureSoraSDK: A Community C# SDK for Azure OpenAI Sora Video Generation
Hello everyone! I’m excited to share the first community release of AzureSoraSDK, a fully-featured .NET 6+ class library that makes it incredibly easy to generate AI-driven videos using Azure’s OpenAI Sora model and even improve your prompts on the fly. 🔗 Repository: https://github.com/DrHazemAli/AzureSoraSDK
hazem
Jun 11, 2025 Place Microsoft Foundry Discussions
419Views
0likes
2Comments
PacketMind: My Take on Building a Smarter DPI Tool with Azure AI
Just wanted to share a small but meaningful project I recently put together: PacketMind. It’s a lightweight Deep Packet Inspection (DPI) tool designed to help detect suspicious network traffic using Azure’s AI capabilities. Honestly, this project is a personal experiment that stemmed from one simple thought: why does DPI always have to be bulky, expensive, and stuck in legacy systems? I mean, think about it. Most of the time, we have to jump through hoops just to get basic packet inspection features, let alone advanced AI-powered traffic analysis. So I figured, let’s see how far we can go by combining Azure’s language models with some good old packet sniffing on Linux.

What’s Next?

Let’s be honest: PacketMind is an early prototype. There’s a lot I’d love to add:
- GUI Interface for easier use
- Custom Model Integration (right now it’s tied to a specific Azure model)
- More Protocol Support: think beyond HTTP/S
- Alerting Features: maybe even Slack/Discord hooks

But for now, I’m keeping it simple and focusing on making the core functionality solid.

Why Share This?

You know, I could’ve just kept this as a side project on my machine, but sharing is part of the fun. If even one person finds PacketMind useful or gets inspired to build something similar, I’ll consider it a win. So, if you’re into networking, AI, or just like to mess with packet data for fun, check it out. Fork it, test it, break it, and let me know how you’d make it better.

Here’s the repo: https://github.com/DrHazemAli/packetmind

Would love to hear your thoughts, suggestions, or just a thumbs up if you think it’s cool. Cheers!
hazem
Jun 02, 2025 Place Microsoft Foundry Discussions
112Views
1like
0Comments
Using Neural Network to Learn Profitable Trading in the FOREX Markets
I am training Neural Networks (NNs) to recognize profitable trading opportunities in the Foreign Exchange (FOREX) markets, using 10 currencies simultaneously. I use cubic (3rd-order) splines as input to give the NNs a sense of how the critical variables change over time, and I train the NNs on free historical FOREX trading data so they can learn to trade profitably in the future. I don't just feed the trading levels of the FOREX currency pairs to the NNs as input. Instead, I use a variation of the computed DXY Index for all 10 currencies in order to isolate the change in value of each individual currency, with cubic splines describing how those values change over various time periods. The end result is a set of Neural Networks that recognize which currencies to buy and which to sell at the most profitable times. If anyone is interested, please reach out and I will provide more details.
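Since the post does not spell out the index construction, here is one possible reading of "a variation of the DXY Index for each currency": an equal-weighted geometric mean of a currency's value against every other currency in the basket. The equal weighting, the function name, and the sample quotes are all assumptions made for illustration; the actual DXY uses fixed, unequal weights, and the author's variation may differ.

```python
import math
from typing import Dict


def strength_index(usd_value: Dict[str, float]) -> Dict[str, float]:
    """Per-currency strength index from each currency's price in USD.

    For currency c, the index is the geometric mean of (value of c / value of j)
    over all other currencies j, so a move in one currency is not attributed
    to whichever pair it happens to be quoted in.
    """
    if len(usd_value) < 2:
        raise ValueError("need at least two currencies")
    logs = {c: math.log(v) for c, v in usd_value.items()}
    total = sum(logs.values())
    n = len(logs)
    # mean over j != c of log(v_c / v_j) = log(v_c) - (total - log(v_c)) / (n - 1)
    return {c: math.exp(lv - (total - lv) / (n - 1)) for c, lv in logs.items()}


if __name__ == "__main__":
    # Hypothetical quotes: price of one unit of each currency in USD.
    before = {"USD": 1.0, "EUR": 1.08, "GBP": 1.27, "JPY": 0.0067}
    after = {"USD": 1.0, "EUR": 1.10, "GBP": 1.27, "JPY": 0.0067}
    idx_before, idx_after = strength_index(before), strength_index(after)
    for ccy in before:
        change = math.log(idx_after[ccy] / idx_before[ccy])
        print(f"{ccy}: log change {change:+.4f}")
```

A time series of these per-currency index values is the kind of input one could then fit cubic splines to before feeding the network.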
dbaechtel
Apr 20, 2025 Place Microsoft Foundry Discussions
419Views
1like
3Comments
Cognitive Services adds Brazilian Portuguese to Neural Text to Speech
We are expanding our available neural TTS voices with Francisca, a new Brazilian Portuguese (pt-BR) voice. With natural-sounding speech, neural TTS significantly reduces listening fatigue when users are interacting with AI systems.
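To try the new voice, a request typically selects it by name in SSML. The sketch below only builds and validates the SSML document; it assumes the voice identifier `pt-BR-FranciscaNeural` (the announcement gives only the name Francisca) and leaves the actual synthesis call to whichever Speech SDK or REST client you use. The helper name `build_ssml` is illustrative.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

SSML_NS = "http://www.w3.org/2001/10/synthesis"


def build_ssml(text: str,
               voice: str = "pt-BR-FranciscaNeural",  # assumed voice identifier
               lang: str = "pt-BR") -> str:
    """Return an SSML document that asks for `text` to be spoken by `voice`."""
    return (
        f'<speak version="1.0" xmlns="{SSML_NS}" xml:lang={quoteattr(lang)}>'
        f"<voice name={quoteattr(voice)}>{escape(text)}</voice>"
        "</speak>"
    )


if __name__ == "__main__":
    ssml = build_ssml("Olá! Bem-vindo ao nosso serviço.")
    ET.fromstring(ssml)  # raises ParseError if the document is not well-formed
    print(ssml)
```

Escaping the text matters because user-provided strings containing characters like `&` or `<` would otherwise produce invalid SSML.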
QinyingLiao
Apr 02, 2020 Place Microsoft Foundry Blog
7.3KViews
0likes
0Comments