Static Web Apps
Building Static Web Apps with database connections: Best Practices
With the announcement of Static Web Apps' database connections feature, when should you use database connections versus building your own backend APIs? What is Data API builder, and how does it relate to Static Web Apps' database connections feature? We cover these topics and more in this blog post.
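As a taste of what the feature enables, here is a minimal sketch of querying a database connection from frontend code. It assumes the default /data-api base path and a hypothetical Person entity configured in staticwebapp.database.config.json; your entity names will differ.

```typescript
// Hypothetical entity shape; field names depend on your database schema.
interface Person {
  Id: number;
  Name: string;
}

// Query the REST endpoint that Data API builder generates for the entity.
async function listPeople(): Promise<Person[]> {
  const response = await fetch("/data-api/rest/Person");
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }
  // Data API builder wraps result sets in a "value" array.
  const payload = await response.json();
  return payload.value as Person[];
}
```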
Cloud-native architectures on Azure: Microsoft Ignite update

Learn about our latest innovations in containers and serverless that empower every developer to take advantage of flexible choices for building microservices applications. Join us at Microsoft Ignite to dive deeper with demos of cloud-native technologies.

From Timeouts to Triumph: Optimizing GPT-4o-mini for Speed, Efficiency, and Reliability
The Challenge

Large-scale generative AI deployments can stretch system boundaries, especially when thousands of concurrent requests require both high throughput and low latency. In one such production environment, GPT-4o-mini deployments running under Provisioned Throughput Units (PTUs) began showing sporadic 408 (timeout) and 429 (throttling) errors. Requests that normally completed in seconds were occasionally hitting the 60-second timeout window, causing degraded experiences and unnecessary retries. Initial suspicion pointed toward PTU capacity limitations, but deeper telemetry revealed a different cause.

What the Data Revealed

Using Azure Data Explorer (Kusto), API Management (APIM) logs, and OpenAI billing telemetry, a detailed investigation uncovered several insights:

- Latency was not correlated with PTU utilization: PTU resources were healthy and performing within SLA even during spikes.
- Time-between-tokens (TBT) stayed consistently low (~8–10 ms): the model was generating tokens steadily.
- Excessive token output was the real bottleneck: requests generating 6K–8K tokens simply required more time than the 60-second completion window allows.

In short, the model wasn't slow; the workload was oversized.

The Optimization Opportunity

The analysis opened a broader optimization opportunity:

- Balance token length with throughput targets.
- Introduce architectural patterns to prevent timeout and throttling cascades under load.
- Enforce automatic token governance instead of relying on client-side discipline.

The Solution

Three engineering measures delivered immediate impact: token optimization, spillover routing, and policy enforcement.

Right-size the Token Budget

- Empirical throughput for GPT-4o-mini is ~33 tokens/sec, which yields roughly 2K tokens within the 60-second window.
- Enforced max_tokens = 2000 for synchronous requests.
- Enabled streaming responses for longer outputs, allowing incremental delivery without hitting timeout limits.

Enable Spillover for Continuity

- Implemented multi-region spillover using Azure Front Door and APIM Premium gateways.
- When PTU queues reached capacity or 429s appeared, requests were routed to Standard deployments in secondary regions.
- The result: graceful degradation and an uninterrupted user experience.

Govern with APIM Policies

- Added inbound policies to inspect and adjust max_tokens dynamically.
- On 408/429 responses, APIM retried and rerouted traffic based on the spillover logic.
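To make the token-budget arithmetic and streaming fallback concrete, here is a minimal sketch, assuming the openai npm package's AzureOpenAI client; the endpoint, deployment name, and environment variable are placeholders rather than the actual production setup.

```typescript
import { AzureOpenAI } from "openai";

// Budget derived from the observed throughput above: ~33 tokens/sec
// means roughly 2K tokens fit inside the 60-second timeout window.
const TOKENS_PER_SECOND = 33;
const TIMEOUT_SECONDS = 60;
const MAX_SYNC_TOKENS = TOKENS_PER_SECOND * TIMEOUT_SECONDS; // ~2000

// Placeholder endpoint, key, and deployment; not the production values.
const client = new AzureOpenAI({
  endpoint: "https://example.openai.azure.com",
  apiKey: process.env.AZURE_OPENAI_KEY ?? "",
  apiVersion: "2024-06-01",
  deployment: "gpt-4o-mini",
});

async function complete(prompt: string, requestedTokens: number): Promise<string> {
  if (requestedTokens <= MAX_SYNC_TOKENS) {
    // Short outputs fit the window: synchronous call with max_tokens
    // clamped to the budget.
    const result = await client.chat.completions.create({
      model: "gpt-4o-mini",
      messages: [{ role: "user", content: prompt }],
      max_tokens: Math.min(requestedTokens, MAX_SYNC_TOKENS),
    });
    return result.choices[0]?.message?.content ?? "";
  }
  // Longer outputs stream incrementally, so no single response has to
  // finish within the 60-second window.
  const stream = await client.chat.completions.create({
    model: "gpt-4o-mini",
    messages: [{ role: "user", content: prompt }],
    max_tokens: requestedTokens,
    stream: true,
  });
  let text = "";
  for await (const chunk of stream) {
    text += chunk.choices[0]?.delta?.content ?? "";
  }
  return text;
}
```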
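The spillover path can be sketched in the same spirit. In the setup described above this logic lived in APIM gateway policies rather than application code, so the sketch below is only an illustration of the reroute-on-408/429 behavior, not the actual policy; both URLs are placeholders.

```typescript
// Placeholder endpoints: a PTU-backed primary and a Standard-tier
// secondary deployment in another region.
const PRIMARY = "https://ptu-primary.example.com/chat/completions";
const SPILLOVER = "https://standard-secondary.example.com/chat/completions";

async function withSpillover(body: unknown): Promise<Response> {
  const post = (url: string): Promise<Response> =>
    fetch(url, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(body),
    });

  const primary = await post(PRIMARY);
  // On timeout (408) or throttling (429), reroute to the Standard
  // deployment instead of failing the request.
  if (primary.status === 408 || primary.status === 429) {
    return post(SPILLOVER);
  }
  return primary;
}
```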
The Results

After optimization, improvements were immediate and measurable:

- Latency reduction: significant improvement in end-to-end response times across high-volume workloads.
- Reliability gains: 408/429 errors fell from >1% to near zero.
- Cost efficiency: average token generation decreased by ~60%, reducing per-request costs.
- Scalability: spillover routing ensured consistent performance during regional and capacity surges.
- Governance: APIM policies established a reusable token-control framework for future AI workloads.

Lessons Learned

- Latency isn't always about capacity: investigate workload patterns before scaling hardware.
- Token budgets define the user experience: over-generation can quietly break SLA compliance.
- Design for elasticity: spillover and multi-region routing maintain continuity during spikes.
- Measure everything: combine KQL telemetry with latency and token tracking for faster diagnostics.

The Outcome

By applying data-driven analysis, architectural tuning, and automated governance, the team turned an operational bottleneck into a model of consistent, scalable performance. The result: faster responses, lower costs, and higher trust. A blueprint for building resilient, high-throughput AI systems on Azure.

Announcing Distributed Functions (Preview) for Azure Static Web Apps
Today, we're announcing distributed functions for Azure Static Web Apps managed functions. With distributed functions, your static web app's managed functions are distributed to regions with high request loads, improving app performance, decreasing network latency, and making requests more responsive.

Improved Next.js support (Preview) for Azure Static Web Apps
Next.js is a popular framework for building modern web applications with React, making it a common workload to deploy to Azure Static Web Apps' optimized hosting for frontend web applications. We are excited to announce improved support for Next.js on Azure Static Web Apps, increasing compatibility with recent Next.js features and supporting the newest Next.js versions, so you can deploy and run your Next.js applications with full feature access on Azure.

Access network isolated APIs and databases from Azure Static Web Apps
Azure Static Web Apps are often complemented by APIs, microservices, and databases. At times, though, these downstream services are isolated within a virtual network, preventing access from Static Web Apps' managed Azure Functions. This article explains how to configure a static web app to support such access by leveraging Static Web Apps' linked backends feature.
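As a minimal illustration of the pattern: once a backend is linked, the static web app proxies /api/* requests to it, so frontend code can reach the network-isolated service from the same origin. The /api/orders route below is a hypothetical endpoint on the linked backend.

```typescript
// Fetch from the linked backend through the static web app's /api proxy.
// The backend itself can sit inside the virtual network.
async function getOrders(): Promise<unknown[]> {
  const response = await fetch("/api/orders", {
    headers: { Accept: "application/json" },
  });
  if (!response.ok) {
    throw new Error(`Linked backend returned ${response.status}`);
  }
  return (await response.json()) as unknown[];
}
```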