JS AI Build-a-thon Setup in 5 Easy Steps

TL;DR: You're 5 steps away from an AI adventure. Set up your project repo, follow the quests, build cool stuff, and level up. Everything's automated, community-backed, and designed to help you actually learn AI, using the skills you already have. Let's build the future. One quest at a time. Join the Build-a-thon | Chat on Discord
Use AI for Free with GitHub Models and TypeScript!

Learn how to use AI for free with GitHub Models! Test models like GPT-4o without paying for APIs or setting up infrastructure. This step-by-step guide shows how to integrate GitHub Models with TypeScript in the Microblog AI Remix project. Start exploring AI for free today!
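As a taste of what the guide covers, here is a minimal sketch of calling a GitHub Models endpoint from TypeScript with the openai npm package. The endpoint URL and model name follow common GitHub Models samples and are assumptions here, not code from the Microblog AI Remix project; authentication uses a GitHub personal access token exposed as GITHUB_TOKEN.

import OpenAI from "openai";

// Assumption: the GitHub Models inference endpoint and a GPT-4o model name.
// Swap these for whatever the GitHub Models catalog shows for your account.
const client = new OpenAI({
  baseURL: "https://models.inference.ai.azure.com",
  apiKey: process.env.GITHUB_TOKEN, // a GitHub personal access token
});

async function main(): Promise<void> {
  const response = await client.chat.completions.create({
    model: "gpt-4o",
    messages: [
      { role: "system", content: "You are a helpful assistant." },
      { role: "user", content: "Suggest three microblog post ideas about TypeScript." },
    ],
  });
  console.log(response.choices[0]?.message?.content);
}

main().catch(console.error);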
Real-Time AI Streaming with Azure OpenAI and SignalR

TL;DR
We'll build a real-time AI app where Azure OpenAI streams responses and SignalR broadcasts them live to an Angular client. Users see answers appear incrementally, just like ChatGPT, while Azure SignalR Service handles scale. You'll learn the architecture, streaming code, Angular integration, and optional enhancements like typing indicators and multi-agent scenarios.

Why This Matters
Modern users expect instant feedback. Waiting for a full AI response feels slow and breaks engagement. Streaming responses:
- Reduces perceived latency: Users see content as it's generated.
- Improves UX: Mimics ChatGPT's typing effect.
- Keeps users engaged: Especially for long-form answers.
- Scales for enterprise: Azure SignalR Service handles thousands of concurrent connections.

What you'll build
- A SignalR Hub that calls Azure OpenAI with streaming enabled and forwards partial output to clients as it arrives.
- An Angular client that connects over WebSockets/SSE to the hub and renders partial content with a typing indicator.
- An optional Azure SignalR Service layer for scalable connection management (thousands to millions of long-lived connections).
References: SignalR hosting & scale; Azure SignalR Service concepts.

Architecture
The hub calls Azure OpenAI with streaming enabled (await foreach over updates) and broadcasts partials to clients. Azure SignalR Service (optional) offloads connection scale and removes sticky-session complexity in multi-node deployments.
References: Streaming code pattern; scale/ARR affinity; Azure SignalR integration.

Prerequisites
- Azure OpenAI resource with a deployed model (e.g., gpt-4o or gpt-4o-mini)
- .NET 8 API + ASP.NET Core SignalR backend
- Angular 16+ frontend (using @microsoft/signalr)

Step-by-Step Implementation

1) Backend: ASP.NET Core + SignalR

Install packages

dotnet add package Microsoft.AspNetCore.SignalR
dotnet add package Azure.AI.OpenAI --prerelease
dotnet add package Azure.Identity
dotnet add package Microsoft.Extensions.AI
dotnet add package Microsoft.Extensions.AI.OpenAI --prerelease
# Optional (managed scale): Azure SignalR Service
dotnet add package Microsoft.Azure.SignalR

Using DefaultAzureCredential (Entra ID) avoids storing raw keys in code and is the recommended auth model for Azure services.
Program.cs

var builder = WebApplication.CreateBuilder(args);
builder.Services.AddSignalR();
// To offload connection management to Azure SignalR Service, uncomment:
// builder.Services.AddSignalR().AddAzureSignalR();
builder.Services.AddSingleton<AiStreamingService>();

var app = builder.Build();
app.MapHub<ChatHub>("/chat");
app.Run();

AiStreamingService.cs - streams content from Azure OpenAI

using Microsoft.Extensions.AI;
using Azure.AI.OpenAI;
using Azure.Identity;

public class AiStreamingService
{
    private readonly IChatClient _chatClient;

    public AiStreamingService(IConfiguration config)
    {
        var endpoint = new Uri(config["AZURE_OPENAI_ENDPOINT"]!);
        var deployment = config["AZURE_OPENAI_DEPLOYMENT"]!; // e.g., "gpt-4o-mini"

        var azureClient = new AzureOpenAIClient(endpoint, new DefaultAzureCredential());
        _chatClient = azureClient.GetChatClient(deployment).AsIChatClient();
    }

    public async IAsyncEnumerable<string> StreamReplyAsync(string userMessage)
    {
        var messages = new List<ChatMessage>
        {
            ChatMessage.CreateSystemMessage("You are a helpful assistant."),
            ChatMessage.CreateUserMessage(userMessage)
        };

        await foreach (var update in _chatClient.CompleteChatStreamingAsync(messages))
        {
            // Only text parts; ignore tool calls/annotations
            var chunk = string.Join("", update.Content
                .Where(p => p.Kind == ChatMessageContentPartKind.Text)
                .Select(p => ((TextContent)p).Text));

            if (!string.IsNullOrEmpty(chunk))
                yield return chunk;
        }
    }
}

Modern .NET AI extensions (Microsoft.Extensions.AI) expose a unified streaming pattern via CompleteChatStreamingAsync.

ChatHub.cs - pushes partials to the caller

using Microsoft.AspNetCore.SignalR;

public class ChatHub : Hub
{
    private readonly AiStreamingService _ai;
    public ChatHub(AiStreamingService ai) => _ai = ai;

    // Client calls: connection.invoke("AskAi", prompt)
    public async Task AskAi(string prompt)
    {
        var messageId = Guid.NewGuid().ToString("N");
        await Clients.Caller.SendAsync("typing", messageId, true);

        await foreach (var partial in _ai.StreamReplyAsync(prompt))
        {
            await Clients.Caller.SendAsync("partial", messageId, partial);
        }

        await Clients.Caller.SendAsync("typing", messageId, false);
        await Clients.Caller.SendAsync("completed", messageId);
    }
}

2) Frontend: Angular client with @microsoft/signalr

Install the SignalR client

npm i @microsoft/signalr

Create a SignalR service (Angular)

// src/app/services/ai-stream.service.ts
import { Injectable } from '@angular/core';
import * as signalR from '@microsoft/signalr';
import { BehaviorSubject, Observable } from 'rxjs';

@Injectable({ providedIn: 'root' })
export class AiStreamService {
  private connection?: signalR.HubConnection;
  private typing$ = new BehaviorSubject<boolean>(false);
  private partial$ = new BehaviorSubject<string>('');
  private completed$ = new BehaviorSubject<boolean>(false);

  get typing(): Observable<boolean> { return this.typing$.asObservable(); }
  get partial(): Observable<string> { return this.partial$.asObservable(); }
  get completed(): Observable<boolean> { return this.completed$.asObservable(); }

  async start(): Promise<void> {
    this.connection = new signalR.HubConnectionBuilder()
      .withUrl('/chat') // same origin; use absolute URL if CORS
      .withAutomaticReconnect()
      .configureLogging(signalR.LogLevel.Information)
      .build();

    this.connection.on('typing', (_id: string, on: boolean) => this.typing$.next(on));
    this.connection.on('partial', (_id: string, text: string) => {
      // Append incremental content
      this.partial$.next((this.partial$.value || '') + text);
    });
    this.connection.on('completed', (_id: string) =>
      this.completed$.next(true));

    await this.connection.start();
  }

  async ask(prompt: string): Promise<void> {
    // Reset state per request
    this.partial$.next('');
    this.completed$.next(false);
    await this.connection?.invoke('AskAi', prompt);
  }
}

Angular component

// src/app/components/ai-chat/ai-chat.component.ts
import { Component, OnInit } from '@angular/core';
import { AiStreamService } from '../../services/ai-stream.service';

@Component({
  selector: 'app-ai-chat',
  templateUrl: './ai-chat.component.html',
  styleUrls: ['./ai-chat.component.css']
})
export class AiChatComponent implements OnInit {
  prompt = '';
  output = '';
  typing = false;
  done = false;

  constructor(private ai: AiStreamService) {}

  async ngOnInit() {
    await this.ai.start();
    this.ai.typing.subscribe(on => this.typing = on);
    this.ai.partial.subscribe(text => this.output = text);
    this.ai.completed.subscribe(done => this.done = done);
  }

  async send() {
    this.output = '';
    this.done = false;
    await this.ai.ask(this.prompt);
  }
}

HTML Template

<!-- src/app/components/ai-chat/ai-chat.component.html -->
<div class="chat">
  <div class="prompt">
    <input [(ngModel)]="prompt" placeholder="Ask me anything..." />
    <button (click)="send()">Send</button>
  </div>
  <div class="response">
    <pre>{{ output }}</pre>
    <div class="typing" *ngIf="typing">Assistant is typing...</div>
    <div class="done" *ngIf="done">Completed</div>
  </div>
</div>

Streaming modes, content filters, and UX
Azure OpenAI streaming interacts with content filtering in two ways:
- Default streaming: The service buffers output into content chunks and runs content filters before each chunk is emitted; you still stream, but not necessarily token-by-token.
- Asynchronous Filter (optional): The service returns token-level updates immediately and runs filters asynchronously. You get ultra-smooth streaming but must handle delayed moderation signals (e.g., redaction or halting the stream).

Best practices
- Append partials in small batches client-side to avoid DOM thrash; finalize formatting on "completed". (A small sketch of this batching appears at the end of this article.)
- Log full messages server-side only after completion to keep histories consistent (mirrors agent frameworks).

Security & compliance
- Auth: Prefer Microsoft Entra ID (DefaultAzureCredential) to avoid key sprawl; use RBAC and Managed Identities where possible.
- Secrets: Store Azure SignalR connection strings in Key Vault and rotate periodically; never hardcode.
- CORS & cross-domain: When hosting the frontend and hub on different origins, configure CORS and use absolute URLs in withUrl(...).

Connection management & scaling tips
- Persistent connection load: SignalR consumes TCP resources; separate heavy real-time workloads or use Azure SignalR to protect other apps.
- Sticky sessions (self-hosted): Required in most multi-server scenarios unless WebSockets-only + SkipNegotiation applies; Azure SignalR removes this requirement.

Learn more
- AI-Powered Group Chat sample (ASP.NET Core)
- Azure OpenAI .NET client (auth & streaming)
- SignalR JavaScript Client
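To make the client-side batching practice above concrete, here is a minimal sketch (the class name and selector are illustrative, not part of the sample app) that buffers incoming "partial" chunks and writes them to the DOM at most once per animation frame:

// Illustrative sketch: collect streamed chunks and flush them once per frame.
class PartialBatcher {
  private buffer = "";
  private scheduled = false;

  constructor(private target: HTMLElement) {}

  append(chunk: string): void {
    this.buffer += chunk;
    if (this.scheduled) return;
    this.scheduled = true;
    requestAnimationFrame(() => {
      // One DOM write per frame instead of one per chunk.
      this.target.textContent = (this.target.textContent ?? "") + this.buffer;
      this.buffer = "";
      this.scheduled = false;
    });
  }
}

// Usage with the hub events from this article (connection is a HubConnection):
// const batcher = new PartialBatcher(document.querySelector("pre")!);
// connection.on("partial", (_id: string, text: string) => batcher.append(text));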
AI Repo of the Week: Generative AI for Beginners with JavaScript

Introduction
Ready to explore the fascinating world of Generative AI using your JavaScript skills? This week's featured repository, Generative AI for Beginners with JavaScript, is your launchpad into the future of application development. Whether you're just starting out or looking to expand your AI toolbox, this open-source GitHub resource offers a rich, hands-on journey. It includes interactive lessons, quizzes, and even time-travel storytelling featuring historical legends like Leonardo da Vinci and Ada Lovelace. Each chapter combines narrative-driven learning with practical exercises, helping you understand foundational AI concepts and apply them directly in code. It's immersive, educational, and genuinely fun.

What You'll Learn
1. Foundations of Generative AI and LLMs: Start with the basics: What is generative AI? How do large language models (LLMs) work? This chapter lays the groundwork for how these technologies are transforming JavaScript development.
2. Build Your First AI-Powered App: Walk through setting up your environment and creating your first AI app. Learn how to configure prompts and unlock the potential of AI in your own projects.
3. Prompt Engineering Essentials: Get hands-on with prompt engineering techniques that shape how AI models respond. Explore strategies for crafting prompts that are clear, targeted, and effective.
4. Structured Output with JSON: Learn how to guide the model to return structured data formats like JSON, which is critical for integrating AI into real-world applications. (A tiny sketch of the idea appears at the end of this article.)
5. Retrieval-Augmented Generation (RAG): Go beyond static prompts by combining LLMs with external data sources. Discover how RAG lets your app pull in live, contextual information for more intelligent results.
6. Function Calling and Tool Use: Give your LLM new powers! Learn how to connect your own functions and tools to your app, enabling more dynamic and actionable AI interactions.
7. Model Context Protocol (MCP): Dive into MCP, a new standard for organizing prompts, tools, and resources. Learn how it simplifies AI app development and fosters consistency across projects.
8. Enhancing MCP Clients with LLMs: Build on what you've learned by integrating LLMs directly into your MCP clients. See how to make them smarter, faster, and more helpful.
More chapters are coming soon; watch the repo for updates!

Companion App: Interact with History
Experience the power of generative AI in action through the companion web app, where you can chat with historical figures and witness how JavaScript brings AI to life in real time.

Conclusion
Generative AI for Beginners with JavaScript is more than a course; it's an adventure into how storytelling, coding, and AI can come together to create something fun and educational. Whether you're here to upskill, experiment, or build the next big thing, this repository is your all-in-one resource to get started with confidence. Jump into the future of development: check out the repo and start building with AI today!
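As a small taste of the structured-output chapter, here is a minimal sketch of asking an OpenAI-compatible chat API to answer in JSON. The client setup, model name, and schema are illustrative assumptions, not code from the repository:

import OpenAI from "openai";

// Assumes any OpenAI-compatible endpoint (GitHub Models, Azure OpenAI, a local model, ...).
const client = new OpenAI();

async function summarizeAsJson(text: string): Promise<{ title: string; tags: string[] }> {
  const response = await client.chat.completions.create({
    model: "gpt-4o-mini", // placeholder model name
    response_format: { type: "json_object" }, // request a JSON object back
    messages: [
      {
        role: "system",
        content: 'Reply only with JSON shaped like {"title": string, "tags": string[]}.',
      },
      { role: "user", content: text },
    ],
  });
  return JSON.parse(response.choices[0]?.message?.content ?? "{}");
}

// Example: summarizeAsJson("Ada Lovelace published the first algorithm.").then(console.log);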
Add speech input & output to your app with the free browser APIs

One of the amazing benefits of modern machine learning is that computers can reliably turn text into speech, or transcribe speech into text, across multiple languages and accents. We can then use those capabilities to make our web apps more accessible for anyone who has a situational, temporary, or chronic issue that makes typing difficult. That describes so many people - for example, a parent holding a squirmy toddler in their hands, an athlete with a broken arm, or an individual with Parkinson's disease.

There are two approaches we can use to add speech capabilities to our apps:
- Use the built-in browser APIs: the SpeechRecognition API and SpeechSynthesis API.
- Use a cloud-based service, like the Azure Speech API.

Which one to use? The great thing about the browser APIs is that they're free and available in most modern browsers and operating systems. The drawback of the APIs is that they're often not as powerful and flexible as cloud-based services, and the speech output often sounds more robotic. There are also a few niche browser/OS combos where the built-in APIs don't work. That's why we decided to add both options to our most popular RAG chat solution, to give developers the option to decide for themselves. However, in this post, I'm going to show you how to add speech capabilities using the free built-in browser APIs, since free APIs are often easier to get started with and it's important to do what we can to improve the accessibility of our apps.

The GIF below shows the end result, a chat app with both speech input and output buttons. All of the code described in this post is part of openai-chat-vision-quickstart, so you can grab the full code yourself after seeing how it works.

Speech input with SpeechRecognition API
To make it easier to add a speech input button to any app, I'm wrapping the functionality inside a custom HTML element, SpeechInputButton. First I construct the speech input button element with an instance of the SpeechRecognition API, making sure to use the browser's preferred language if any are set:

class SpeechInputButton extends HTMLElement {
  constructor() {
    super();
    this.isRecording = false;
    const SpeechRecognition =
      window.SpeechRecognition || window.webkitSpeechRecognition;
    if (!SpeechRecognition) {
      this.dispatchEvent(
        new CustomEvent("speecherror", {
          detail: { error: "SpeechRecognition not supported" },
        })
      );
      return;
    }
    this.speechRecognition = new SpeechRecognition();
    this.speechRecognition.lang = navigator.language || navigator.userLanguage;
    this.speechRecognition.interimResults = false;
    this.speechRecognition.continuous = true;
    this.speechRecognition.maxAlternatives = 1;
  }

Then I define the connectedCallback() method that will be called whenever this custom element has been added to the DOM. When that happens, I define the inner HTML to render a button and attach event listeners for both mouse and keyboard events. Since we want this to be fully accessible, keyboard support is important.
connectedCallback() {
  this.innerHTML = `
    <button class="btn btn-outline-secondary" type="button" title="Start recording (Shift + Space)">
      <i class="bi bi-mic"></i>
    </button>`;
  this.recordButton = this.querySelector('button');
  this.recordButton.addEventListener('click', () => this.toggleRecording());
  document.addEventListener('keydown', this.handleKeydown.bind(this));
}

handleKeydown(event) {
  if (event.key === 'Escape') {
    this.abortRecording();
  } else if (event.key === ' ' && event.shiftKey) { // Shift + Space
    event.preventDefault();
    this.toggleRecording();
  }
}

toggleRecording() {
  if (this.isRecording) {
    this.stopRecording();
  } else {
    this.startRecording();
  }
}

The majority of the code is in the startRecording function. It sets up a listener for the "result" event from the SpeechRecognition instance, which contains the transcribed text. It also sets up a listener for the "end" event, which is triggered either automatically after a few seconds of silence (in some browsers) or when the user ends the recording by clicking the button. Finally, it sets up a listener for any "error" events. Once all listeners are ready, it calls start() on the SpeechRecognition instance and styles the button to be in an active state.

startRecording() {
  if (this.speechRecognition == null) {
    this.dispatchEvent(
      new CustomEvent("speech-input-error", {
        detail: { error: "SpeechRecognition not supported" },
      })
    );
  }

  this.speechRecognition.onresult = (event) => {
    let input = "";
    for (const result of event.results) {
      input += result[0].transcript;
    }
    this.dispatchEvent(
      new CustomEvent("speech-input-result", {
        detail: { transcript: input },
      })
    );
  };

  this.speechRecognition.onend = () => {
    this.isRecording = false;
    this.renderButtonOff();
    this.dispatchEvent(new Event("speech-input-end"));
  };

  this.speechRecognition.onerror = (event) => {
    if (this.speechRecognition) {
      this.speechRecognition.stop();
      if (event.error == "no-speech") {
        this.dispatchEvent(
          new CustomEvent("speech-input-error", {
            detail: { error: "No speech was detected. Please check your system audio settings and try again." },
          })
        );
      } else if (event.error == "language-not-supported") {
        this.dispatchEvent(
          new CustomEvent("speech-input-error", {
            detail: { error: "The selected language is not supported. Please try a different language." },
          })
        );
      } else if (event.error != "aborted") {
        this.dispatchEvent(
          new CustomEvent("speech-input-error", {
            detail: { error: "An error occurred while recording. Please try again: " + event.error },
          })
        );
      }
    }
  };

  this.speechRecognition.start();
  this.isRecording = true;
  this.renderButtonOn();
}

If the user stops the recording using the keyboard shortcut or button click, we call stop() on the SpeechRecognition instance. At that point, anything the user had said will be transcribed and become available via the "result" event.

stopRecording() {
  if (this.speechRecognition) {
    this.speechRecognition.stop();
  }
}

Alternatively, if the user presses the Escape keyboard shortcut, we instead call abort() on the SpeechRecognition instance, which stops the recording and does not send any previously untranscribed speech over.
abortRecording() {
  if (this.speechRecognition) {
    this.speechRecognition.abort();
  }
}

Once the custom HTML element is fully defined, we register it with the desired tag name, speech-input-button:

customElements.define("speech-input-button", SpeechInputButton);

To use the custom speech-input-button element in a chat application, we add it to the HTML for the chat form:

<speech-input-button></speech-input-button>
<input id="message" name="message" type="text" rows="1"></input>

Then we attach an event listener for the custom events dispatched by the element, and we update the input text field with the transcribed text:

const speechInputButton = document.querySelector("speech-input-button");
speechInputButton.addEventListener("speech-input-result", (event) => {
  messageInput.value += " " + event.detail.transcript.trim();
  messageInput.focus();
});

You can see the full custom HTML element code in speech-input.js and the usage in index.html. There's also a fun pulsing animation for the button's active state in styles.css.

Speech output with SpeechSynthesis API
Once again, to make it easier to add a speech output button to any app, I'm wrapping the functionality inside a custom HTML element, SpeechOutputButton. When defining the custom element, we specify an observed attribute named "text", to store whatever text should be turned into speech when the button is clicked.

class SpeechOutputButton extends HTMLElement {
  static observedAttributes = ["text"];

In the constructor, we check to make sure the SpeechSynthesis API is supported, and remember the browser's preferred language for later use.

constructor() {
  super();
  this.isPlaying = false;
  const SpeechSynthesis = window.speechSynthesis || window.webkitSpeechSynthesis;
  if (!SpeechSynthesis) {
    this.dispatchEvent(
      new CustomEvent("speech-output-error", {
        detail: { error: "SpeechSynthesis not supported" },
      })
    );
    return;
  }
  this.synth = SpeechSynthesis;
  this.lngCode = navigator.language || navigator.userLanguage;
}

When the custom element is added to the DOM, I define the inner HTML to render a button and attach mouse and keyboard event listeners:

connectedCallback() {
  this.innerHTML = `
    <button class="btn btn-outline-secondary" type="button">
      <i class="bi bi-volume-up"></i>
    </button>`;
  this.speechButton = this.querySelector("button");
  this.speechButton.addEventListener("click", () =>
    this.toggleSpeechOutput()
  );
  document.addEventListener('keydown', this.handleKeydown.bind(this));
}

The majority of the code is in the toggleSpeechOutput function. If the speech is not yet playing, it creates a new SpeechSynthesisUtterance instance, passes it the "text" attribute, and sets the language and audio properties. It attempts to use a voice that's optimal for the desired language, but falls back to "en-US" if none is found. It attaches event listeners for the start and end events, which will change the button's style to look either active or inactive. Finally, it tells the SpeechSynthesis API to speak the utterance.

toggleSpeechOutput() {
  if (!this.isConnected) {
    return;
  }
  const text = this.getAttribute("text");
  if (this.synth != null) {
    if (this.isPlaying || text === "") {
      this.stopSpeech();
      return;
    }

    // Create a new utterance and play it.
    const utterance = new SpeechSynthesisUtterance(text);
    utterance.lang = this.lngCode;
    utterance.volume = 1;
    utterance.rate = 1;
    utterance.pitch = 1;

    let voice = this.synth
      .getVoices()
      .filter((voice) => voice.lang === this.lngCode)[0];
    if (!voice) {
      voice = this.synth
        .getVoices()
        .filter((voice) => voice.lang === "en-US")[0];
    }
    utterance.voice = voice;
    if (!utterance) {
      return;
    }

    utterance.onstart = () => {
      this.isPlaying = true;
      this.renderButtonOn();
    };
    utterance.onend = () => {
      this.isPlaying = false;
      this.renderButtonOff();
    };

    this.synth.speak(utterance);
  }
}

When the user no longer wants to hear the speech output, indicated either via another press of the button or by pressing the Escape key, we call cancel() from the SpeechSynthesis API.

stopSpeech() {
  if (this.synth) {
    this.synth.cancel();
    this.isPlaying = false;
    this.renderButtonOff();
  }
}

Once the custom HTML element is fully defined, we register it with the desired tag name, speech-output-button:

customElements.define("speech-output-button", SpeechOutputButton);

To use this custom speech-output-button element in a chat application, we construct it dynamically each time that we've received a full response from an LLM, and call setAttribute to pass in the text to be spoken:

const speechOutput = document.createElement("speech-output-button");
speechOutput.setAttribute("text", answer);
messageDiv.appendChild(speechOutput);

You can see the full custom HTML element code in speech-output.js and the usage in index.html. This button also uses the same pulsing animation for the active state, defined in styles.css.

Acknowledgments
I want to give a huge shout-out to John Aziz for his amazing work adding speech input and output to the azure-search-openai-demo, as that was the basis for the code I shared in this blog post.
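Both custom elements above call renderButtonOn() and renderButtonOff() helpers that aren't shown in the excerpts; the real implementations live in speech-input.js and speech-output.js. As a rough sketch of the idea (the CSS class name here is an assumption, not the one used in styles.css):

// Minimal sketch: the "on"/"off" render helpers just toggle a CSS class
// that drives the pulsing active style. "speech-active" is a made-up class name.
function renderButtonState(element: HTMLElement, active: boolean): void {
  const button = element.querySelector("button");
  button?.classList.toggle("speech-active", active);
}

// Inside the custom elements, the methods would then reduce to:
//   renderButtonOn()  { renderButtonState(this, true); }
//   renderButtonOff() { renderButtonState(this, false); }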
Benchmarking Local AI Models

Introduction
Selecting the right AI model for your application requires more than reading benchmark leaderboards. Published benchmarks measure academic capabilities such as question answering, reasoning, and coding, but your application has specific requirements: latency budgets, hardware constraints, quality thresholds. How do you know if Phi-4 provides acceptable quality for your document summarization use case? Will Qwen2.5-0.5B meet your 100ms response time requirement? Does your edge device have sufficient memory for Phi-3.5 Mini?

The answer lies in empirical testing: running actual models on your hardware with your workload patterns. This article demonstrates building a comprehensive model benchmarking platform using FLPerformance, Node.js, React, and Microsoft Foundry Local. You'll learn how to implement scientific performance measurement, design meaningful benchmark suites, visualize multi-dimensional comparisons, and make data-driven model selection decisions. Whether you're evaluating models for production deployment, optimizing inference costs, or validating hardware specifications, this platform provides the tools for rigorous performance analysis.

Why Model Benchmarking Requires Purpose-Built Tools
You cannot assess model performance by running a few manual tests and noting the results. Scientific benchmarking demands controlled conditions, statistically significant sample sizes, multi-dimensional metrics, and reproducible methodology. Here is why purpose-built tooling is essential.

Performance is multi-dimensional. A model might excel at throughput (tokens per second) but suffer at latency (time to first token). Another might generate high-quality outputs slowly. Your application might prioritize consistency over average performance: a model with variable response times (high p95/p99 latency) creates poor user experiences even if averages look good. Measuring all dimensions simultaneously enables informed tradeoffs.

Hardware matters enormously. Benchmark results from NVIDIA A100 GPUs don't predict performance on consumer laptops. NPU acceleration changes the picture again. Memory constraints affect which models can even load. Test on your actual deployment hardware or comparable specifications to get actionable results.

Concurrency reveals bottlenecks. A model handling one request excellently might struggle with ten concurrent requests. Real applications experience variable load; measuring only single-threaded performance misses critical scalability constraints. Controlled concurrency testing reveals these limits.

Statistical rigor prevents false conclusions. Running a prompt once and noting the response time tells you nothing about the performance distribution. Was this result typical? An outlier? You need dozens or hundreds of trials to establish p50/p95/p99 percentiles, understand variance, and detect stability issues.

Comparison requires controlled experiments. Different prompts, different times of day, different system loads: all introduce confounding variables. Scientific comparison runs identical workloads across models sequentially, controlling for external factors.

Architecture: Three-Layer Performance Testing Platform
FLPerformance implements a clean separation between orchestration, measurement, and presentation.

The frontend React application provides model management, benchmark configuration, test execution, and results visualization.
Users add models from the Foundry Local catalog, configure benchmark parameters (iterations, concurrency, timeout values), launch test runs, and view real-time progress. The results dashboard displays comparison tables, latency distribution charts, throughput graphs, and "best model for..." recommendations.

The backend Node.js/Express server orchestrates tests and captures metrics. It manages the single Foundry Local service instance, loads/unloads models as needed, executes benchmark suites with controlled concurrency, measures comprehensive metrics (TTFT, TPOT, total latency, throughput, error rates), and persists results to JSON storage. WebSocket connections provide real-time progress updates during long benchmark runs.

Foundry Local SDK integration uses the official foundry-local-sdk npm package. The SDK manages the service lifecycle (starting, stopping, health checks) and handles model operations (downloading, loading into memory, unloading). It provides OpenAI-compatible inference APIs for consistent request formatting across models.

The architecture supports simultaneous testing of multiple models by loading them one at a time, running identical benchmarks, and aggregating results for comparison:

User initiates benchmark run
  -> Backend receives {models: [...], suite: "default", iterations: 10}
  -> For each model:
       1. Load model into Foundry Local
       2. Execute benchmark suite
          - For each prompt in suite:
            * Run N iterations
            * Measure TTFT, TPOT, total time
            * Track errors and timeouts
            * Calculate tokens/second
       3. Aggregate statistics (mean, p50, p95, p99)
       4. Unload model
  -> Store results with metadata
  -> Return comparison data to frontend
  -> Visualize performance metrics

Implementing Scientific Measurement Infrastructure
Accurate performance measurement requires instrumentation that captures multiple dimensions without introducing measurement overhead:

// src/server/benchmark.js
import { performance } from 'perf_hooks';

export class BenchmarkExecutor {
  constructor(foundryClient, options = {}) {
    this.client = foundryClient;
    this.options = {
      iterations: options.iterations || 10,
      concurrency: options.concurrency || 1,
      timeout_ms: options.timeout_ms || 30000,
      warmup_iterations: options.warmup_iterations || 2
    };
  }

  async runBenchmarkSuite(modelId, prompts) {
    const results = [];

    // Warmup phase (exclude from results)
    console.log(`Running ${this.options.warmup_iterations} warmup iterations...`);
    for (let i = 0; i < this.options.warmup_iterations; i++) {
      await this.executeMeasuredPrompt(modelId, prompts[0].text);
    }

    // Actual benchmark runs
    for (const prompt of prompts) {
      console.log(`Benchmarking prompt: ${prompt.id}`);
      const measurements = [];

      for (let i = 0; i < this.options.iterations; i++) {
        const measurement = await this.executeMeasuredPrompt(
          modelId,
          prompt.text
        );
        measurements.push(measurement);

        // Small delay between iterations to stabilize
        await sleep(100);
      }

      results.push({
        prompt_id: prompt.id,
        prompt_text: prompt.text,
        measurements,
        statistics: this.calculateStatistics(measurements)
      });
    }

    return {
      model_id: modelId,
      timestamp: new Date().toISOString(),
      config: this.options,
      results
    };
  }

  async executeMeasuredPrompt(modelId, promptText) {
    const measurement = {
      success: false,
      error: null,
      ttft_ms: null,   // Time to first token
      tpot_ms: null,   // Time per output token
      total_ms: null,
      tokens_generated: 0,
      tokens_per_second: 0
    };

    try {
      const startTime = performance.now();
      let firstTokenTime = null;
      let tokenCount = 0;

      // Streaming completion to measure TTFT
      const stream = await
        this.client.chat.completions.create({
          model: modelId,
          messages: [{ role: 'user', content: promptText }],
          max_tokens: 200,
          temperature: 0.7,
          stream: true
        });

      for await (const chunk of stream) {
        if (chunk.choices[0]?.delta?.content) {
          if (firstTokenTime === null) {
            firstTokenTime = performance.now();
            measurement.ttft_ms = firstTokenTime - startTime;
          }
          tokenCount++;
        }
      }

      const endTime = performance.now();
      measurement.total_ms = endTime - startTime;
      measurement.tokens_generated = tokenCount;

      if (tokenCount > 1 && firstTokenTime) {
        // TPOT = time after first token / (tokens - 1)
        const timeAfterFirstToken = endTime - firstTokenTime;
        measurement.tpot_ms = timeAfterFirstToken / (tokenCount - 1);
        measurement.tokens_per_second = 1000 / measurement.tpot_ms;
      }

      measurement.success = true;
    } catch (error) {
      measurement.error = error.message;
      measurement.success = false;
    }

    return measurement;
  }

  calculateStatistics(measurements) {
    const successful = measurements.filter(m => m.success);
    const total = measurements.length;

    if (successful.length === 0) {
      return { success_rate: 0, error_rate: 1.0, sample_size: total };
    }

    const ttfts = successful.map(m => m.ttft_ms).sort((a, b) => a - b);
    const tpots = successful.map(m => m.tpot_ms).filter(v => v !== null).sort((a, b) => a - b);
    const totals = successful.map(m => m.total_ms).sort((a, b) => a - b);
    const throughputs = successful.map(m => m.tokens_per_second).filter(v => v > 0);

    return {
      success_rate: successful.length / total,
      error_rate: (total - successful.length) / total,
      sample_size: total,
      ttft: {
        mean: mean(ttfts),
        median: percentile(ttfts, 50),
        p95: percentile(ttfts, 95),
        p99: percentile(ttfts, 99),
        min: Math.min(...ttfts),
        max: Math.max(...ttfts)
      },
      tpot: tpots.length > 0 ? {
        mean: mean(tpots),
        median: percentile(tpots, 50),
        p95: percentile(tpots, 95)
      } : null,
      total_latency: {
        mean: mean(totals),
        median: percentile(totals, 50),
        p95: percentile(totals, 95),
        p99: percentile(totals, 99)
      },
      throughput: {
        mean_tps: mean(throughputs),
        median_tps: percentile(throughputs, 50)
      }
    };
  }
}

function mean(arr) {
  return arr.reduce((sum, val) => sum + val, 0) / arr.length;
}

function percentile(sortedArr, p) {
  const index = Math.ceil((sortedArr.length * p) / 100) - 1;
  return sortedArr[Math.max(0, index)];
}

function sleep(ms) {
  return new Promise(resolve => setTimeout(resolve, ms));
}

This measurement infrastructure captures:
- Time to First Token (TTFT): Critical for perceived responsiveness; users notice delays before output begins.
- Time Per Output Token (TPOT): Determines generation speed after the first token and therefore throughput.
- Total latency: End-to-end time; matters for batch processing and high-volume scenarios.
- Tokens per second: Overall throughput metric, useful for capacity planning.
- Statistical distributions: The mean alone masks variability; p95/p99 reveal tail latencies that impact user experience.
- Success/error rates: Stability metrics; some models time out or crash under load.

(A short usage sketch of BenchmarkExecutor follows this list.)
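To show how the executor above is driven, here is a minimal usage sketch. The endpoint URL, model name, and import path are illustrative assumptions; in FLPerformance the client is wired up through the Foundry Local SDK rather than hard-coded like this.

import OpenAI from "openai";
// BenchmarkExecutor is the class defined in src/server/benchmark.js above.
import { BenchmarkExecutor } from "./benchmark.js";

async function main(): Promise<void> {
  // Any OpenAI-compatible client works, because the executor only calls
  // client.chat.completions.create(...). Endpoint and model are placeholders.
  const client = new OpenAI({
    baseURL: "http://localhost:5273/v1", // illustrative local endpoint
    apiKey: "local", // local endpoints typically ignore the key
  });

  const executor = new BenchmarkExecutor(client, {
    iterations: 10,
    warmup_iterations: 2,
    timeout_ms: 30000,
  });

  const prompts = [
    { id: "short-factual", text: "What is the capital of France?" },
    { id: "medium-explanation", text: "Explain how photosynthesis works in 3-4 sentences." },
  ];

  const report = await executor.runBenchmarkSuite("phi-4-mini", prompts);
  for (const r of report.results) {
    console.log(r.prompt_id, r.statistics);
  }
}

main().catch(console.error);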
Designing Meaningful Benchmark Suites
Benchmark quality depends on prompt selection. Generic prompts don't reflect real application behavior. Design suites that mirror actual use cases:

// benchmarks/suites/default.json
{
  "name": "default",
  "description": "General-purpose benchmark covering diverse scenarios",
  "prompts": [
    {
      "id": "short-factual",
      "text": "What is the capital of France?",
      "category": "factual",
      "expected_tokens": 5
    },
    {
      "id": "medium-explanation",
      "text": "Explain how photosynthesis works in 3-4 sentences.",
      "category": "explanation",
      "expected_tokens": 80
    },
    {
      "id": "long-reasoning",
      "text": "Analyze the economic factors that led to the 2008 financial crisis. Discuss at least 5 major causes with supporting details.",
      "category": "reasoning",
      "expected_tokens": 250
    },
    {
      "id": "code-generation",
      "text": "Write a Python function that finds the longest palindrome in a string. Include docstring and example usage.",
      "category": "coding",
      "expected_tokens": 150
    },
    {
      "id": "creative-writing",
      "text": "Write a short story (3 paragraphs) about a robot learning to paint.",
      "category": "creative",
      "expected_tokens": 200
    }
  ]
}

This suite covers multiple dimensions:
- Length variation: Short (5 tokens), medium (80), and long (250) outputs test models across output ranges.
- Task diversity: Factual recall, explanation, reasoning, code, and creative writing reveal capability breadth.
- Token predictability: Expected token counts enable throughput calculations.

For production applications, create custom suites matching your actual workload:

{
  "name": "customer-support",
  "description": "Simulates actual customer support queries",
  "prompts": [
    { "id": "product-question", "text": "How do I reset my password for the customer portal?" },
    { "id": "troubleshooting", "text": "I'm getting error code 503 when trying to upload files. What should I do?" },
    { "id": "policy-inquiry", "text": "What is your refund policy for annual subscriptions?" }
  ]
}

Visualizing Multi-Dimensional Performance Comparisons
Raw numbers don't reveal insights; visualization makes patterns obvious. The frontend implements several comparison views.

Comparison Table shows side-by-side metrics:

// frontend/src/components/ResultsTable.jsx
export function ResultsTable({ results }) {
  return (
    <table>
      <thead>
        <tr>
          <th>Model</th>
          <th>TTFT (ms)</th>
          <th>TPOT (ms)</th>
          <th>Throughput (tok/s)</th>
          <th>P95 Latency</th>
          <th>Error Rate</th>
        </tr>
      </thead>
      <tbody>
        {results.map(result => (
          <tr key={result.model_id}>
            <td>{result.model_id}</td>
            <td>{result.stats.ttft.median.toFixed(0)} (p95: {result.stats.ttft.p95.toFixed(0)})</td>
            <td>{result.stats.tpot?.median.toFixed(1) || 'N/A'}</td>
            <td>{result.stats.throughput.median_tps.toFixed(1)}</td>
            <td>{result.stats.total_latency.p95.toFixed(0)} ms</td>
            <td className={result.stats.error_rate > 0.05 ? 'error' : 'success'}>
              {(result.stats.error_rate * 100).toFixed(1)}%
            </td>
          </tr>
        ))}
      </tbody>
    </table>
  );
}

Latency Distribution Chart reveals performance consistency:

// Using Chart.js for visualization
import { Bar } from 'react-chartjs-2'; // assumes the react-chartjs-2 wrapper

export function LatencyChart({ results }) {
  const data = {
    labels: results.map(r => r.model_id),
    datasets: [
      {
        label: 'Median (p50)',
        data: results.map(r => r.stats.total_latency.median),
        backgroundColor: 'rgba(75, 192, 192, 0.5)'
      },
      {
        label: 'p95',
        data: results.map(r => r.stats.total_latency.p95),
        backgroundColor: 'rgba(255, 206, 86, 0.5)'
      },
      {
        label: 'p99',
        data: results.map(r => r.stats.total_latency.p99),
        backgroundColor: 'rgba(255, 99, 132, 0.5)'
      }
    ]
  };

  return <Bar data={data} />;
}

Recommendations Engine synthesizes multi-dimensional comparison:

export function generateRecommendations(results) {
  const recommendations = [];

  // Find fastest TTFT (best perceived responsiveness)
  const fastestTTFT = results.reduce((best, r) =>
    r.stats.ttft.median < best.stats.ttft.median ?
    r : best
  );
  recommendations.push({
    category: 'Fastest Response',
    model: fastestTTFT.model_id,
    reason: `Lowest median TTFT: ${fastestTTFT.stats.ttft.median.toFixed(0)}ms`
  });

  // Find highest throughput
  const highestThroughput = results.reduce((best, r) =>
    r.stats.throughput.median_tps > best.stats.throughput.median_tps ? r : best
  );
  recommendations.push({
    category: 'Best Throughput',
    model: highestThroughput.model_id,
    reason: `Highest tok/s: ${highestThroughput.stats.throughput.median_tps.toFixed(1)}`
  });

  // Find most consistent (lowest p95-p50 spread)
  const mostConsistent = results.reduce((best, r) => {
    const spread = r.stats.total_latency.p95 - r.stats.total_latency.median;
    const bestSpread = best.stats.total_latency.p95 - best.stats.total_latency.median;
    return spread < bestSpread ? r : best;
  });
  recommendations.push({
    category: 'Most Consistent',
    model: mostConsistent.model_id,
    reason: 'Lowest latency variance (p95-p50 spread)'
  });

  return recommendations;
}

Key Takeaways and Benchmarking Best Practices
Effective model benchmarking requires scientific methodology, comprehensive metrics, and application-specific testing. FLPerformance demonstrates that rigorous performance measurement is accessible to any development team. Critical principles for model evaluation:
- Test on target hardware: Results from cloud GPUs don't predict laptop performance.
- Measure multiple dimensions: TTFT, TPOT, throughput, and consistency all matter.
- Use statistical rigor: Single runs mislead; capture distributions with adequate sample sizes.
- Design realistic workloads: Generic benchmarks don't predict your application's behavior.
- Include warmup iterations: Model loading and JIT compilation affect early measurements.
- Control concurrency: Real applications handle multiple requests; test at realistic loads.
- Document methodology: Reproducible results require documented procedures and configurations.

The complete benchmarking platform with model management, measurement infrastructure, visualization dashboards, and comprehensive documentation is available at github.com/leestott/FLPerformance. Clone the repository and run the startup script to begin evaluating models on your hardware.

Resources and Further Reading
- FLPerformance Repository - Complete benchmarking platform
- Quick Start Guide - Setup and first benchmark run
- Microsoft Foundry Local Documentation - SDK reference and model catalog
- Architecture Guide - System design and SDK integration
- Benchmarking Best Practices - Methodology and troubleshooting
How to Integrate Playwright MCP for AI-Driven Test Automation

Test automation has come a long way, from scripted flows to self-healing and now AI-driven testing. With the introduction of Model Context Protocol (MCP), Playwright can now interact with AI models and external tools to make smarter testing decisions. This guide walks you through integrating MCP with Playwright in VS Code, starting from the basics, enabling you to build smarter, adaptive tests today.

What Is Playwright MCP?
- Playwright: An open-source framework for web testing and automation. It supports multiple browsers (Chromium, Firefox, and WebKit) and offers robust features like auto-wait and screenshot capture, along with great tooling such as Codegen and Trace Viewer.
- MCP (Model Context Protocol): A protocol that enables external tools to communicate with AI models or services in a structured, secure way.

By combining Playwright with MCP, you unlock:
- AI-assisted test generation.
- Dynamic test data.
- Smarter debugging and adaptive workflows.

Why Integrate MCP with Playwright?
- AI-powered test generation: Reduce manual scripting.
- Dynamic context awareness: Tests adapt to real-time data.
- Improved debugging: AI can suggest fixes for failing tests.
- Smarter locator selection: AI helps pick stable, reliable selectors to reduce flaky tests.
- Natural language instructions: Write or trigger tests using plain English prompts.

Getting Started in VS Code

Prerequisites
- Node.js
  - Download: nodejs.org
  - Minimum version: v18.0.0 or higher (recommended: latest LTS)
  - Check version: node --version
- Playwright
  - Install Playwright: npm install @playwright/test

Step 1: Create Project Folder

mkdir playwrightMCP-demo
cd playwrightMCP-demo

Step 2: Initialize Project

npm init playwright@latest

Step 3: Install MCP Server for VS Code
Navigate to GitHub - microsoft/playwright-mcp: Playwright MCP server and click "Install server" for VS Code. Then search for 'MCP: Open user configuration' (type ">mcp" in the search box). You will see that a file named mcp.json has been created in your user -> app data folder, containing the server details:

{
  "servers": {
    "playwright": {
      "command": "npx",
      "args": [
        "@playwright/mcp@latest"
      ],
      "type": "stdio"
    }
  },
  "inputs": []
}

Alternatively, install an MCP server directly from the GitHub MCP server registry using the Extensions view in VS Code.

Verify installation: Open Copilot Chat → select Agent Mode → click Configure Tools → confirm microsoft/playwright-mcp appears in the list.

Step 4: Create a Simple Test Using MCP
Once your project and MCP setup are ready in VS Code, you can create a simple test that demonstrates MCP's capabilities. MCP can help in multiple scenarios; below is an example of test generation using AI.

Scenario: AI-Assisted Test Generation - use natural language prompts to generate Playwright tests automatically.

Test Scenario: Validate that a user can switch the Playwright documentation language dropdown to Python, search for "Frames," and navigate to the Frames documentation page. Confirm that the page heading correctly displays "Frames."

Sample Prompt to Use in VS Code (Copilot Agent Mode):
Create a Playwright automated test in JavaScript that verifies navigation to the 'Frames' documentation page following the steps below, and be specific about locators to avoid strict mode violation errors.
1. Navigate to the Playwright documentation.
2. Select "Python" from the dropdown options labelled "Node.js".
3. Type the keyword "Frames" into the search box.
4. Click the search result for the Frames documentation page.
5. Verify that the page header reads "Frames".
6. Log success or provide a failure message with details.

Copilot will generate the test automatically in your tests folder; a sketch of what such a test might look like follows.
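For illustration, a generated test might look roughly like the following. The locators and URL are assumptions about the playwright.dev page structure; the code Copilot actually produces for you will differ:

import { test, expect } from '@playwright/test';

// Illustrative sketch only: the selectors below are guesses, not Copilot output.
test('navigates to the Frames documentation page', async ({ page }) => {
  await page.goto('https://playwright.dev/docs/intro');

  // Switch the docs language dropdown from Node.js to Python.
  await page.getByRole('button', { name: 'Node.js' }).click();
  await page.getByRole('link', { name: 'Python', exact: true }).click();

  // Search for "Frames" and open the matching result.
  await page.getByRole('button', { name: 'Search' }).click();
  await page.getByPlaceholder('Search docs').fill('Frames');
  await page.getByRole('link', { name: /Frames/ }).first().click();

  // Verify the page heading.
  await expect(page.getByRole('heading', { name: 'Frames', level: 1 })).toBeVisible();
});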
Step 5: Run Test

npx playwright test

Conclusion
Integrating Playwright with MCP in VS Code helps you build smarter, adaptive tests without adding complexity. Start small, follow best practices, and scale as you grow.

Note: Installation steps may vary depending on your environment. Refer to MCP Registry · GitHub for the latest instructions.
Introducing the JS AI Build-a-thon

We're entering a future where AI-first and agentic developer experiences will shape how we build, and you don't want to be left behind. This isn't your average hackathon. It's a hands-on, quest-driven learning experience designed for developers, packed with:
- Interactive quests that guide you step by step, from your first prototype to production-ready apps
- Community-powered support via our dedicated Discord and local, community-led study jams
- Showcase moments to share your journey, get inspired, and celebrate what you build

Whether you're just starting your AI journey or looking to sharpen your skills with frameworks like LangChain.js, tools like the Azure AI Foundry and AI Toolkit Extensions, or diving deeper into agentic app design, this is your moment to start building.