developer
7913 TopicsOn‑Device AI with Windows AI Foundry
From “waiting” to “instant”- without sending data away AI is everywhere, but speed, privacy, and reliability are critical. Users expect instant answers without compromise. On-device AI makes that possible: fast, private and available, even when the network isn’t - empowering apps to deliver seamless experiences. Imagine an intelligent assistant that works in seconds, without sending a text to the cloud. This approach brings speed and data control to the places that need it most; while still letting you tap into cloud power when it makes sense. Windows AI Foundry: A Local Home for Models Windows AI Foundry is a developer toolkit that makes it simple to run AI models directly on Windows devices. It uses ONNX Runtime under the hood and can leverage CPU, GPU (via DirectML), or NPU acceleration, without requiring you to manage those details. The principle is straightforward: Keep the model and the data on the same device. Inference becomes faster, and data stays local by default unless you explicitly choose to use the cloud. Foundry Local Foundry Local is the engine that powers this experience. Think of it as local AI runtime - fast, private, and easy to integrate into an app. Why Adopt On‑Device AI? Faster, more responsive apps: Local inference often reduces perceived latency and improves user experience. Privacy‑first by design: Keep sensitive data on the device; avoid cloud round trips unless the user opts in. Offline capability: An app can provide AI features even without a network connection. Cost control: Reduce cloud compute and data costs for common, high‑volume tasks. This approach is especially useful in regulated industries, field‑work tools, and any app where users expect quick, on‑device responses. Hybrid Pattern for Real Apps On-device AI doesn’t replace the cloud, it complements it. Here’s how: Standalone On‑Device: Quick, private actions like document summarization, local search, and offline assistants. Cloud‑Enhanced (Optional): Large-context models, up-to-date knowledge, or heavy multimodal workloads. Design an app to keep data local by default and surface cloud options transparently with user consent and clear disclosures. Windows AI Foundry supports hybrid workflows: Use Foundry Local for real-time inference. Sync with Azure AI services for model updates, telemetry, and advanced analytics. Implement fallback strategies for resource-intensive scenarios. Application Workflow Code Example 1. Only On-Device: Tries Foundry Local first, falls back to ONNX if foundry_runtime.check_foundry_available(): # Use on-device Foundry Local models try: answer = foundry_runtime.run_inference(question, context) return answer, source="Foundry Local (On-Device)" except Exception as e: logger.warning(f"Foundry failed: {e}, trying ONNX...") if onnx_model.is_loaded(): # Fallback to local BERT ONNX model try: answer = bert_model.get_answer(question, context) return answer, source="BERT ONNX (On-Device)" except Exception as e: logger.warning(f"ONNX failed: {e}") return "Error: No local AI available" 2. Hybrid approach: On-device first, cloud as last resort def get_answer(question, context): """ Priority order: 1. Foundry Local (best: advanced + private) 2. ONNX Runtime (good: fast + private) 3. Cloud API (fallback: requires internet, less private) # in case of Hybrid approach, based on real-time scenario """ if foundry_runtime.check_foundry_available(): # Use on-device Foundry Local models try: answer = foundry_runtime.run_inference(question, context) return answer, source="Foundry Local (On-Device)" except Exception as e: logger.warning(f"Foundry failed: {e}, trying ONNX...") if onnx_model.is_loaded(): # Fallback to local BERT ONNX model try: answer = bert_model.get_answer(question, context) return answer, source="BERT ONNX (On-Device)" except Exception as e: logger.warning(f"ONNX failed: {e}, trying cloud...") # Last resort: Cloud API (requires internet) if network_available(): try: import requests response = requests.post( '{BASE_URL_AI_CHAT_COMPLETION}', headers={'Authorization': f'Bearer {API_KEY}'}, json={ 'model': '{MODEL_NAME}', 'messages': [{ 'role': 'user', 'content': f'Context: {context}\n\nQuestion: {question}' }] }, timeout=10 ) answer = response.json()['choices'][0]['message']['content'] return answer, source="Cloud API (Online)" except Exception as e: return "Error: No AI runtime available", source="Failed" else: return "Error: No internet and no local AI available", source="Offline" Demo Project Output: Foundry Local answering context-based questions offline : The Foundry Local engine ran the Phi-4-mini model offline and retrieved context-based data. : The Foundry Local engine ran the Phi-4-mini model offline and mentioned that there is no answer. Practical Use Cases Privacy-First Reading Assistant: Summarize documents locally without sending text to the cloud. Healthcare Apps: Analyze medical data on-device for compliance. Financial Tools: Risk scoring without exposing sensitive financial data. IoT & Edge Devices: Real-time anomaly detection without network dependency. Conclusion On-device AI isn’t just a trend - it’s a shift toward smarter, faster, and more secure applications. With Windows AI Foundry and Foundry Local, developers can deliver experiences that respect user specific data, reduce latency, and work even when connectivity fails. By combining local inference with optional cloud enhancements, you get the best of both worlds: instant performance and scalable intelligence. Whether you’re creating document summarizers, offline assistants, or compliance-ready solutions, this approach ensures your apps stay responsive, reliable, and user-centric. References Get started with Foundry Local - Foundry Local | Microsoft Learn What is Windows AI Foundry? | Microsoft Learn https://devblogs.microsoft.com/foundry/unlock-instant-on-device-ai-with-foundry-local/The Diagonal Suite: Gentle thunking goes a long way!
I've become a big advocate for gentle thunking - using thunks to delay eager evaluation wherever possible in generalized Lambda development. The timings are quicker and the logic is cleaner. On the other hand, thunking the results of MAP, BYROW, or BYCOL - especially when it leads to rows of thunks - tends to introduce recombination overhead and complexity. I think thunking is often dismissed as “too complex,” and that’s understandable if someone’s first exposure involves unwrapping a row of thunks. When used gently thunking becomes indispensable. Typically, I introduce the thunks after the initial benchmarking to see the difference in the calculation times and the after is always quicker. To illustrate, I’ll share The Diagonal Suite - a collection of functions where thunking is used at every opportunity. Simple, clean, deferred logic. What are your thoughts on gentle thunking? Where have you found it helpful/harmful in your own Lambda development? //The Diagonal Suite - Version 1.0 - 10/27/2025 //Author: Patrick H. //Description: Directional traversal and diagonal logic for 2D arrays. // Functions: // • Traverseλ - Directional traversal engine // • ByDiagλ - Diagonal-based aggregation // • DiagMapλ - Wrapper for diagonal matrix extraction // • DiagIndexλ - Targeted diagonal extraction // • Staircaseλ - Construct diagonal staircases from a vector or 2D array //──────────────────────────────────────────────────────────── //------------------------------------------------------------------------------------------- //Traverseλ - Directional Axis Remapper //------------------------------------------------------------------------------------------- //The selected axis is remapped to the top-left traversal order. //Accepted directions: // "NE" or 1 → Northeast (↗) // "SE" or 2 → Southeast (↘) // "SW" or 3 → Southwest (↙) //Parameters: //array → 2D input array (scalars not accepted) //new_axis → Axis direction ("NE", "SE", "SW" or 1–3) Traverseλ = LAMBDA( array, new_axis, //Input validation IF(OR(ROWS(array)=1,COLUMNS(array)=1), "#2D-ARRAY!", IF(AND(ISNUMBER(new_axis),OR(new_axis<=0,new_axis>3)),"#AXIS!", LET( //Dimensions i, ROWS(array), j, COLUMNS(array), //Axis traversal indices (deferred) x_NE, LAMBDA(SEQUENCE(j,,1,0)*SEQUENCE(,i)), y_NE, LAMBDA(SEQUENCE(j,,j,-1)*SEQUENCE(,i,1,0)), x_SE, LAMBDA(SEQUENCE(i,,i,-1)*SEQUENCE(,j,1,0)), y_SE, LAMBDA(SEQUENCE(i,,j,0)+SEQUENCE(,j,0,-1)), x_SW, LAMBDA(SEQUENCE(j,,i,0)+SEQUENCE(,i,0,-1)), y_SW, LAMBDA(SEQUENCE(j,,1)*SEQUENCE(,i,1,0)), //Axis mode selection mode, IF(ISNUMBER(new_axis),new_axis, SWITCH(new_axis,"NE",1,"SE",2,"SW",3,1)), //Index selection x, CHOOSE(mode,x_NE,x_SE,x_SW), y, CHOOSE(mode,y_NE,y_SE,y_SW), //Unwrap indices and get results result, INDEX(array,x(),y()), result ) ))); //------------------------------------------------------------------------------------------- //ByDiagλ - Diagonal-based aggregation //------------------------------------------------------------------------------------------- //Apply an ETA function or Lambda to diagonals //Parameters: //array → 2D input array (scalars not accepted) //[function] → ETA function or Lambda applied to diagonals //[row_wise_stack?] → Optional: Display results as a vertical stack ByDiagλ = LAMBDA( array, [function], [row_wise_stack?], //Check array input ValidateDiagλ(array,,function,row_wise_stack?, LET( //Optional parameters No_Function, ISOMITTED(function), No_row_wise_stack,ISOMITTED(row_wise_stack?), //Dimensions i, ROWS(array), j, COLUMNS(array), //Diagonal count k, MIN(i,j), //Indices - deferred r, LAMBDA(SEQUENCE(k)*SEQUENCE(,j,1,0)), y, LAMBDA(SEQUENCE(k)+SEQUENCE(,j,0,1)), c, LAMBDA(IF(y()>j,NA(),y())), //Unwrap indices, shape, and aggregate result, IFNA(INDEX(array,r(),c()),""), shaped, IF(No_row_wise_stack,result,TRANSPOSE(result)), final, IF(No_Function,shaped, IF(No_row_wise_stack,BYCOL(shaped,function), BYROW(shaped,function))), final ))); //------------------------------------------------------------------------------------------- //DiagMapλ - Wrapper (Calls ByDiagλ) to extract diagonals as 2D matrix //------------------------------------------------------------------------------------------- //Calls ByDiagλ to extract the diagonals from a 2D array. //Parameters: *Please see ByDiagλ for descriptions.** DiagMapλ = LAMBDA( array, [row_wise_stack?], ByDiagλ(array,,row_wise_stack?) ); //------------------------------------------------------------------------------------------- //DiagIndexλ - Targeted diagonal extraction //------------------------------------------------------------------------------------------- //Extract a diagonal or anti-diagonal vector from a 2D array. //Parameters: //array → 2D input array (scalars not accepted) //col_index → Column number to start from. Negative = anti-diagonal DiagIndexλ = LAMBDA( array, col_index, //Input checks ValidateDiagλ(array,col_index,,, LET( //Dimensions i, ROWS(array), j, COLUMNS(array), //Diagonal direction: +1 = SE, –1 = SW s, SIGN(col_index), //Determine diagonal length based on bounds k, IF(s>0, MIN(i, j + 1 - col_index), MIN(i, ABS(col_index))), start, IF(s<0,ABS(col_index),col_index), //Indices - deferred x, LAMBDA(SEQUENCE(k)), y, LAMBDA(SEQUENCE(k,,start,s)), //Unwrap indices and extract vector deliver, INDEX(array,x(),y()), deliver ))); //------------------------------------------------------------------------------------------- //Staircaseλ — Construct diagonal staircases from a vector or 2D array //------------------------------------------------------------------------------------------- //Parameters: //array → Input array (flattened to vector row-wise) //block_size → Number of rows/columns per staircase block //[block_offset] → Optional padding between staircases //[IsHorizontal?] → Optional toggle for column-wise orientation //[IsAntiDiag?] → Optional toggle to display staircase anti-diagonal. Staircaseλ = LAMBDA( array, block_size, [block_offset], [IsHorizontal?], [IsAntiDiag?], //Check inputs ValidateStaircaseλ(array,block_size,block_offset, LET( //Check optional parameters no_Block_Offset, ISOMITTED(block_offset), zero_Offset, block_offset=0, col_offset, IF(No_Block_Offset,0,block_offset), IsVertical?, ISOMITTED(IsHorizontal?), Not_Anti_Diag, ISOMITTED(IsAntiDiag?), //Convert to vector and get dimensions flat, TOCOL(array), k, COUNTA(flat), seq, LAMBDA(SEQUENCE(k)), V, TOROW(EXPAND(WRAPROWS(seq(),block_size),, block_size+block_offset,0)), width, COLUMNS(V), //Anchors and indices - deferred i, LAMBDA(SEQUENCE(block_size)*SEQUENCE(,width,1,0)), col_arr, LAMBDA(IF(Not_Anti_Diag,SEQUENCE(,width), SEQUENCE(,width,width,-1))), j, LAMBDA(MOD(col_arr(),block_size+block_offset)), j_, LAMBDA(IF((no_Block_Offset)+(zero_Offset), IF(j()=0,block_size,j()),j())), idx, LAMBDA(IF(i()=j_(),V,NA())), //Obtain results, shape, and calculate result, DROP(IFNA(INDEX(flat,idx()),""),,-col_offset), final, IF(IsVertical?,TRANSPOSE(result),result), final ))); //---------------------Error Handling & Validation--------------------------- //Validates inputs for Staircaseλ. Please see Staircaseλ for parameter //descriptions. ValidateStaircaseλ = LAMBDA( array, block_size, [block_offset], [on_valid], LET( //Checks NotArray,TYPE(array)<>64, Invalid_block_size, OR(ISTEXT(block_size),block_size<=0,block_size>COUNTA(array)), Invalid_block_offset, OR(ISTEXT(block_offset),block_offset<0), //Logic gate IF(NotArray, "#NOT-ARRAY!", IF(Invalid_block_size, "#BLOCK-SIZE!", IF(Invalid_block_offset,"#BLOCK-OFFSET", on_valid)))) ); //---------------------Error Handling & Validation--------------------------- //Validate inputs for ByDiagλ, DiagMapλ, and DiagIndexλ. //*Please see those functions for parameter descriptions.* ValidateDiagλ= LAMBDA( array, [col_index], [function], [row_wise_stack?], [on_valid], LET( //---Checks--- //Array input IsArray?, TYPE(array)=64, Not_Array, NOT(IsArray?), //Col_index No_Col_Index, ISOMITTED(col_index), Col_Index_Included, NOT(No_Col_Index), Not_Valid_Col_Index?, NOT(AND(col_index<>0, ABS(col_index)<=COLUMNS(array))), //Function No_Function, ISOMITTED(function), Function_Included, NOT(No_Function), Invalid_Function?, AND(ISERROR(BYROW({1,1},function))), //Shaping input RowWiseStack?, NOT(ISOMITTED(row_wise_stack?)), //Deterine which function is being validated DiagIndex, Col_Index_Included, ByDiag, AND(No_Col_Index, Function_Included), DiagMap, AND(No_Col_Index, No_Function), //Logic gates //DiagIndexλ a, IF(Not_Array, "#NOT-ARRAY!", IF(Not_Valid_Col_Index?,"#COLUMN-INDEX!", on_valid)), //ByDiagλ b, IF(Not_Array, "#NOT-ARRAY!", IF(Invalid_Function?, "#FUNCTION!", on_valid)), //DiagMapλ c, IF(Not_Array, "#NOT-ARRAY!", on_valid), //Logic gate selection decide, IF(DiagIndex,a, IF(DiagMap,c, IF(ByDiag,b, "#UNROUTED!"))), decide )); //End of The Diagonal Suite - Version 1.0 //Author: Patrick H.159Views1like7CommentsUnderstanding Small Language Modes
Small Language Models (SLMs) bring AI from the cloud to your device. Unlike Large Language Models that require massive compute and energy, SLMs run locally, offering speed, privacy, and efficiency. They’re ideal for edge applications like mobile, robotics, and IoT.New To The Community
Hello ISV Success Community, As a new member to the community, I decided to start with an introduction. I’m Aubin Bakana, Founder and CTIO (Chief Technology & Innovation Officer) of Baobab Logix LTD, now based in Leeds, UK. I lead technology and innovation, driving solutions that are secure, ethical, and intelligent. At Baobab Logix, we’re building future-ready platforms that empower businesses to thrive. One of our flagship innovations is KeepComm Intel, a next-generation contact center solution that combines AI-driven automation with human-in-the-loop architecture for speed, accuracy, and trust. As a Microsoft Azure Solutions Architect Expert, Harvard CS50 certified, and full-stack developer, I bring deep technical expertise and a vision for sustainable innovation. I’m here to connect, collaborate, and share insights with fellow innovators who believe in shaping technology for good. Looking forward to engaging with this amazing community and exploring opportunities to build impactful solutions together! thank you.9Views1like0CommentsPNP Search Results sort by title
Hi. I have a PNP Search results webpart, that brings me all the Document Libraries in SharePoint site (that are not the standard ones Like site assets, pages etc) I want the results to be sorted, but i tried with sort column and it does not work, I want it to show in alphabetical order If i use "edit sort order" and select by title, it gives me an error I added a RefinableString where i have the title but it is still not sorting it How can i get this list sorted without having to add another column for the Title (with out the link) THank you16Views0likes0CommentsCSOM: My “one query to rule them all” plan backfired — got “Request uses too many resources” 😅
Hey folks, So I’ve been playing around with CSOM (Client-Side Object Model), feeling fancy about making one super-efficient query that loads everything I need at once. Something like this: clientContext.Load( clientContext.Web, w => w.SiteUsers.Include(...), w => w.Lists.Include(...), w => w.SiteGroups.Include(...) ); Basically, my thought was: “Why make multiple calls when I can just get everything in one go?” But CSOM had other plans. Instead of being impressed, it hit me with the dreaded: "Request uses too many resources." Even when I tried to be nice and limit the properties, it still said “Nope.” 🙃 So now I’m wondering: Is it actually more efficient (and safer) to create a new ClientContext for each object I want to query (like Web, SiteUsers, Lists, etc.)? Or am I just thinking about this the wrong way and missing some batching trick? Would love to hear how others handle this — or if there’s a secret sauce to making CSOM not freak out when you ask for too much.8Views0likes0CommentsCSOM batch request: “Request uses too many resources” when loading multiple sites with GetSiteByUrl
I’m currently exploring CSOM (Client-Side Object Model) and analyzing how property values for client objects such as Web and Site are requested and returned in XML format. I used Fiddler to inspect the request and response bodies. I’m now trying to implement a batch-based query where I load multiple sites with a limited set of properties using the GetSiteByUrl method. However, I’m running into this issue: When I load 10 sites, I get the error: “Request uses too many resources.” When I reduce the number to 8 sites, it works fine. I also compared the bytes sent and bytes received for different batch sizes (screenshot attached). So my question is: Is there a specific limitation on the total request size (bytes sent/received) or number of operations in a single CSOM batch request? If yes, is there any official documentation or guidance on how to determine these limits? Thanks in advance!16Views0likes0CommentsPlease tell me how to disable the Pin Copilot message
Morning! I wrote a message yesterday but nobody replied, so here's another one so it doesn't get lost Can somebody tell me how to disable the annoying "Pin Copilot Chat" popup? every morning I have to say "Maybe Later" when I really mean to say NEVER IN A THOUSAND YEARS54Views0likes1CommentStep-by-Step: Setting Up GitHub Student and GitHub Copilot as an Authenticated Student Developer
To become an authenticated GitHub Student Developer, follow these steps: create a GitHub account, verify student status through a school email or contact GitHub support, sign up for the student developer pack, connect to Copilot and activate the GitHub Student Developer Pack benefits. The GitHub Student Developer Pack offers 100s of free software offers and other benefits such as Azure credit, Codespaces, a student gallery, campus experts program, and a learning lab. Copilot provides autocomplete-style suggestions from AI as you code. Visual Studio Marketplace also offers GitHub Copilot Labs, a companion extension with experimental features, and GitHub Copilot for autocomplete-style suggestions. Setting up your GitHub Student and GitHub Copilot as an authenticated Github Student Developer399KViews14likes16Comments