tips and tricks
844 Topics

The Difference Between Declarative Agents and Autonomous Agents in Copilot Studio
Artificial intelligence is rapidly transforming the way businesses automate tasks, improve customer experiences, and increase productivity. One of the most exciting platforms leading this transformation is Microsoft’s Copilot Studio, a low-code environment that enables organizations to create intelligent AI-powered assistants and workflows.
https://dellenny.com/the-difference-between-declarative-agents-and-autonomous-agents-in-copilot-studio/
20 Views, 0 likes, 0 Comments

Topics and Nodes in Microsoft Copilot Studio: The Foundation of Intelligent Conversations
Artificial intelligence is changing the way businesses communicate with customers and employees. From automating support requests to handling internal workflows, conversational AI tools are becoming an essential part of modern digital transformation. One platform leading this evolution is Microsoft Copilot Studio.
https://dellenny.com/topics-and-nodes-in-microsoft-copilot-studio-the-foundation-of-intelligent-conversations/
14 Views, 0 likes, 0 Comments

Enterprise AI Agent Architecture Patterns Using Copilot Studio
Artificial Intelligence is no longer limited to chatbots answering customer queries or virtual assistants scheduling meetings. Enterprises are now building intelligent AI agents capable of automating workflows, orchestrating business processes, integrating enterprise systems, and assisting employees in real time. As organizations scale their AI adoption, architecture becomes the defining factor between isolated experiments and enterprise-wide transformation.
https://dellenny.com/enterprise-ai-agent-architecture-patterns-using-copilot-studio/
27 Views, 0 likes, 0 Comments

How are you connecting OKR tracking to Copilot workflows?
My manager asked me to figure out how we can use Copilot to help with our OKR process. Right now we track objectives in a SharePoint list and it's a mess. People forget to update progress, key results are disconnected from daily work, and nobody looks at it until the end-of-quarter scramble. Has anyone found a way to bring OKR tracking closer to where people actually work in Teams/Copilot? Or is there a Teams app that handles this better than a spreadsheet?
52 Views, 2 likes, 1 Comment

Building an On-Device Voice Assistant with Microsoft Foundry Local
Why on-device voice still matters

Most "voice AI" tutorials assume your audio leaves the machine. You ship a WAV to Whisper-API, your transcript to GPT-4, and a synthesized response back over the wire. That works, but it also means three round trips, three per-token bills, and three places your user's voice gets logged.

The new wave of small, hardware-optimised models changes the trade-off. NVIDIA's Nemotron Speech Streaming En 0.6B is a 600M-parameter streaming ASR model published into the Microsoft Foundry Local catalog. Paired with a small chat model like qwen2.5-0.5b or phi-4-mini, you can run the entire capture → transcribe → reason → respond loop in-process on a developer laptop, with no API keys and no network egress.

This post walks through how the fl-nemotron sample does it, the SDK pitfalls we hit on the way, and the design decisions that made the pipeline reliable.

What we're building

A browser-hosted assistant served by FastAPI at http://127.0.0.1:8000. The page captures microphone audio, posts it to /api/transcribe, then streams the chat reply back over Server-Sent Events from /api/chat. All inference runs locally through two Foundry Local models loaded into the same process.

The shape of the pipeline:

    Microphone (browser MediaRecorder)
        │  WebM/Opus blob
        ▼
    Client-side WAV encoder (16 kHz, mono, PCM-16)
        │  multipart/form-data
        ▼
    FastAPI /api/transcribe
        │
        ▼
    Nemotron Speech Streaming En 0.6B (Foundry Local audio client)
        │  transcript text
        ▼
    Chat LLM e.g. qwen2.5-0.5b (Foundry Local chat client)
        │  streamed tokens
        ▼
    FastAPI /api/chat → SSE → browser bubble

The version that bit us: foundry-local-sdk >= 1.1.0

Before any code, the single most important fact about this project:

The Nemotron Speech Streaming model only appears in the Foundry Local 1.1.x catalog. Older SDKs (0.5.x / 0.6.x) cannot resolve the alias nemotron-speech-streaming-en-0.6b and fail with "model not found".
The module name also changed in 1.1.0: it is now foundry_local_sdk (with the _sdk suffix), not foundry_local. The pip wheel for foundry-local-core is bundled, so there is no separate MSI / winget install to worry about.

Pin it explicitly:

    pip install --upgrade "foundry-local-sdk>=1.1.0,<2"

And verify before anything else:

    python -c "import importlib.metadata as m; print('sdk', m.version('foundry-local-sdk'))"
    # expect: sdk 1.1.0

Loading both models from one manager

The 1.1.x SDK exposes a single FoundryLocalManager that owns the runtime. Each loaded model gives you back a per-model OpenAI-compatible client: get_chat_client() for text models and get_audio_client() for ASR. There is no need to bring your own openai Python package; the SDK ships its own thin client.

The wrapper used in the repo (src/foundry_client.py) does this:

    from foundry_local_sdk import Configuration, FoundryLocalManager

    # One manager owns the runtime for the whole process.
    FoundryLocalManager.initialize(Configuration(app_name="fl-nemotron"))
    manager = FoundryLocalManager.instance

    chat_model = manager.load_model("qwen2.5-0.5b")
    stt_model = manager.load_model("nemotron-speech-streaming-en-0.6b")

    chat_client = chat_model.get_chat_client()
    audio_client = stt_model.get_audio_client()

Both models are downloaded on first use into the Foundry Local cache and stay resident for the lifetime of the process. On a laptop with 16 GB RAM, the combined working set sits comfortably under 4 GB.

The transcription surprise

The first naive approach was the obvious one:

    with open(wav_path, "rb") as f:
        result = audio_client.transcribe(file=f, model="nemotron-speech-streaming-en-0.6b")

That call fails on Nemotron. The bundled ONNX Runtime GenAI in foundry-local-core does not register the nemotron_speech multi-modal model type that the standard AudioClient.transcribe() path tries to instantiate. The error surfaces as a cryptic model-type registration failure deep inside the native runtime.
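Both failure modes described so far (the "model not found" error from an old SDK and the registration failure in the native runtime) only show up at runtime, so it can help to rule out the first one at startup. The following is a minimal sketch of such a guard, not code from the repo: `parse_version`, `satisfies_pin`, and `check_installed` are hypothetical helper names, and the only library call used is the standard `importlib.metadata.version`. The bounds mirror the `>=1.1.0,<2` pin quoted earlier.

```python
from importlib import metadata

MIN_VERSION = (1, 1, 0)  # lower bound from the pin quoted in the post
MAX_MAJOR = 2            # exclusive upper bound, mirroring "<2"


def parse_version(text):
    """Turn '1.1.0' into (1, 1, 0); suffixes such as 'rc1' are ignored."""
    parts = []
    for piece in text.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits or 0))
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def satisfies_pin(version_text):
    """True when the version falls inside >=1.1.0,<2."""
    version = parse_version(version_text)
    return MIN_VERSION <= version and version[0] < MAX_MAJOR


def check_installed(dist_name="foundry-local-sdk"):
    """Return (ok, message) without importing the SDK itself."""
    try:
        found = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return False, f"{dist_name} is not installed"
    if not satisfies_pin(found):
        return False, f"{dist_name} {found} is outside >=1.1.0,<2"
    return True, f"{dist_name} {found} ok"
```

Calling `check_installed()` before loading any model would let the app fail with a readable message instead of a catalog lookup error. The parser is deliberately simple; a project that already depends on the `packaging` library could use its version parsing instead.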
The fix is to use the streaming session API instead, a different native entry point (core_interop.start_audio_stream) that the streaming model does support. The repo isolates this in src/_nemotron_live.py:

    import threading
    import wave

    def transcribe_wav_live(audio_client, wav_path, *, language="en"):
        # Read the format fields and raw PCM frames from the WAV file.
        with wave.open(str(wav_path), "rb") as w:
            sample_rate = w.getframerate()
            channels = w.getnchannels()
            sample_width = w.getsampwidth()
            pcm = w.readframes(w.getnframes())

        session = audio_client.create_live_transcription_session()
        session.settings.sample_rate = sample_rate
        session.settings.channels = channels
        session.settings.bits_per_sample = sample_width * 8
        session.settings.language = language
        session.start()

        # Feed PCM in ~100 ms chunks from a worker thread, then stop.
        bytes_per_sec = sample_rate * channels * sample_width
        chunk_bytes = max(bytes_per_sec // 10, 1024)

        def _pusher():
            try:
                for offset in range(0, len(pcm), chunk_bytes):
                    session.append(pcm[offset:offset + chunk_bytes])
            finally:
                # Always stop, even on error, so get_stream() terminates.
                session.stop()

        threading.Thread(target=_pusher, daemon=True).start()

        # Drain results on the calling thread while the worker pushes audio.
        parts = []
        for resp in session.get_stream():
            for cp in getattr(resp, "content", []) or []:
                text = getattr(cp, "text", "") or getattr(cp, "transcript", "") or ""
                if text:
                    parts.append(text)
        return " ".join(p.strip() for p in parts if p.strip()).strip()

Three things to notice:

- Push from a worker thread, read on the calling thread. session.append() is a blocking write into the native stream and session.get_stream() is a blocking generator. Run one in a worker thread so the other can drain in parallel; otherwise you deadlock the session.
- Chunk to ~100 ms. Smaller chunks (e.g. 10 ms) spend more time crossing the FFI boundary than transcribing; larger chunks (e.g. 1 s) hold back partial results and hurt perceived latency.
- Always session.stop(). Without it the generator never terminates and the request hangs.

The other transcription surprise: browsers don't send WAV

Inside the browser, MediaRecorder defaults to audio/webm; codecs=opus.
That's great for size but bad for our STT model, which expects a 16-bit mono PCM WAV at a known sample rate. Decoding WebM/Opus server-side would require ffmpeg as a runtime dependency, which is exactly the kind of friction this project exists to remove.

The cleaner solution is to encode WAV on the client. AudioContext.decodeAudioData already understands WebM/Opus, so the page can decode the recording, resample to 16 kHz, mix to mono, and emit a PCM-16 WAV blob in roughly 30 lines of JavaScript:

    // Inside src/static/index.html
    async function webmToWav(blob) {
      const ctx = new (window.AudioContext || window.webkitAudioContext)({ sampleRate: 16000 });
      const buf = await ctx.decodeAudioData(await blob.arrayBuffer());
      // Mix to mono
      const ch = buf.numberOfChannels;
      const mono = new Float32Array(buf.length);
      for (let c = 0; c < ch; c++) {
        const data = buf.getChannelData(c);
        for (let i = 0; i < data.length; i++) mono[i] += data[i] / ch;
      }
      return encodeWav(mono, 16000);
    }

    function encodeWav(samples, sampleRate) {
      const buffer = new ArrayBuffer(44 + samples.length * 2);
      const view = new DataView(buffer);
      // RIFF header
      writeStr(view, 0, "RIFF");
      view.setUint32(4, 36 + samples.length * 2, true);
      writeStr(view, 8, "WAVE");
      // fmt chunk
      writeStr(view, 12, "fmt ");
      view.setUint32(16, 16, true);             // PCM chunk size
      view.setUint16(20, 1, true);              // PCM format
      view.setUint16(22, 1, true);              // mono
      view.setUint32(24, sampleRate, true);
      view.setUint32(28, sampleRate * 2, true); // byte rate
      view.setUint16(32, 2, true);              // block align
      view.setUint16(34, 16, true);             // bits per sample
      // data chunk
      writeStr(view, 36, "data");
      view.setUint32(40, samples.length * 2, true);
      // PCM-16 samples
      let o = 44;
      for (let i = 0; i < samples.length; i++, o += 2) {
        const s = Math.max(-1, Math.min(1, samples[i]));
        view.setInt16(o, s < 0 ?
          s * 0x8000 : s * 0x7FFF, true);
      }
      return new Blob([view], { type: "audio/wav" });
    }

    // writeStr is not shown in the excerpt; a minimal version writes ASCII bytes.
    function writeStr(view, offset, str) {
      for (let i = 0; i < str.length; i++) view.setUint8(offset + i, str.charCodeAt(i));
    }

Now the server's /api/transcribe endpoint just writes the bytes to a temp file and hands the path to transcribe_wav_live(), with no audio decoding libraries on the Python side.

Wiring it into FastAPI

The server (src/app.py) is deliberately small. The notable detail is that the same process holds both Foundry Local model handles for its entire lifetime, so there is no warm-up cost per request:

    import os
    import tempfile

    from fastapi import File, UploadFile
    from fastapi.responses import StreamingResponse

    # app, ChatRequest, _ai_client and _sse are defined elsewhere in the module (not shown).

    @app.post("/api/transcribe")
    async def transcribe(audio: UploadFile = File(...)):
        data = await audio.read()
        with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as f:
            f.write(data)
            path = f.name
        try:
            text = _ai_client.transcribe(path)
        finally:
            os.unlink(path)  # delete the per-request audio once transcribed
        return {"text": text}

    @app.post("/api/chat")
    async def chat(req: ChatRequest):
        if req.stream:
            return StreamingResponse(
                _sse(_ai_client.stream_completion(req.messages)),
                media_type="text/event-stream",
            )
        return {"text": _ai_client.chat_completion(req.messages)}

Streaming uses Server-Sent Events because they are trivially supported in both fetch() and the FastAPI runtime, and they don't require a WebSocket upgrade through any proxy a developer might have in front of localhost.

What it looks like

The repo includes screenshots of the running UI: a welcome screen with both models loaded, a streamed haiku reply, an inline code block with copy-to-clipboard, and the recording state for the microphone.

Performance, honestly

This is a small-model, CPU-friendly stack. On an Arm64 Surface running the x64 SDK under emulation:

- First model load (cold cache): tens of seconds (downloads ~600 MB for Nemotron and ~400 MB for qwen2.5-0.5b).
- Subsequent loads (warm cache): a few seconds per model.
- End-to-end transcription of a 5-second utterance: well under a second after warm-up.
- First chat token from qwen2.5-0.5b: typically 200–500 ms; full short reply within 1–2 s.
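Figures like time-to-first-token are easy to reproduce on your own hardware. The sketch below is not part of the sample; it assumes only that the streamed reply can be consumed as a Python iterable of text chunks (for example, whatever `_ai_client.stream_completion(...)` yields, which is an assumption about the repo's wrapper). `time_stream` and `fake_stream` are hypothetical names, and `fake_stream` stands in for a real model so the block runs on its own.

```python
import time


def time_stream(tokens, clock=time.perf_counter):
    """Consume an iterable of text chunks and report latency figures.

    Returns the joined text, seconds until the first chunk arrived,
    and total seconds. first_token_s is None for an empty stream.
    """
    start = clock()
    first = None
    parts = []
    for tok in tokens:
        if first is None:
            first = clock() - start
        parts.append(tok)
    total = clock() - start
    return {"text": "".join(parts), "first_token_s": first, "total_s": total}


def fake_stream(words, delay=0.01):
    """Stand-in for a streamed chat reply; sleeps before each chunk."""
    for w in words:
        time.sleep(delay)
        yield w


if __name__ == "__main__":
    stats = time_stream(fake_stream(["local ", "voice ", "loop"]))
    print(f"first token: {stats['first_token_s'] * 1000:.0f} ms, "
          f"total: {stats['total_s'] * 1000:.0f} ms")
```

Passing the clock as a parameter keeps the helper testable with a fake clock; with the default `time.perf_counter` it measures wall-clock latency as seen by the caller, which includes any model warm-up on the first request.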
On x64 silicon with a recent CPU the numbers improve substantially, and the SDK will pick the best execution provider it finds (CPU / DirectML / CUDA) for each model.

Trade-offs to know about

- Model quality. qwen2.5-0.5b is a 500M-parameter model. It is fast and small enough to ship on a laptop, but it is not GPT-4. Swap in phi-4-mini or mistral-nemo-12b-instruct if you have the RAM and want better reasoning; the wrapper accepts any chat alias in the Foundry Local catalog.
- STT is English-only here. The current Nemotron streaming model in the catalog is ...-en-0.6b. Multilingual variants are likely to follow.
- Browser microphone needs a real browser. Headless / automated browsers (Playwright, Puppeteer) deny getUserMedia by default. Open the page in Edge / Chrome / Firefox to grant the permission and capture audio for real.
- No agent framework yet. This sample is deliberately a single-turn loop over a chat client: there is no tool calling, planning, or multi-agent orchestration. Adding the Microsoft Agent Framework on top would be a natural next step for richer behaviour.

Responsible AI considerations

Running locally removes the cloud-egress class of privacy concerns, but it does not remove responsibility:

- Disclose recording. The browser prompts for mic permission; your UI should make it obvious when capture is active. The sample shows a red ⏹ button and a "Recording…" banner for that reason.
- Don't log raw audio. The sample writes audio to a per-request NamedTemporaryFile and deletes it after transcription. Treat the WAV as sensitive data even when it never leaves the device.
- Small models hallucinate. A 0.5B chat model is great for snappy local replies, but unsuitable for high-stakes answers. Pair it with retrieval, ground it on your own data, or escalate to a larger model when accuracy matters.

Try it

1. Clone github.com/leestott/fl-nemotron.
2. Run ./setup.ps1 (or ./setup.sh) to create a virtualenv and install the pinned SDK.
3. Run python scripts/prefetch.py nemotron-speech-streaming-en-0.6b qwen2.5-0.5b to download both models.
4. Run .venv\Scripts\uvicorn.exe app:app --app-dir src --port 8000
5. Open http://127.0.0.1:8000 in a real browser and click the 🎤 button.

Where to go next

- Foundry Local documentation: official docs for the runtime, catalog, and SDK.
- microsoft/Foundry-Local: upstream samples and issue tracker.
- NVIDIA Nemotron model family: background on the speech and language models being published into the catalog.
- leestott/fl-nemotron: the full source for this post.

Key takeaways

- Pin foundry-local-sdk >= 1.1.0. Earlier SDKs cannot see the Nemotron Speech Streaming model.
- Use the LiveAudioTranscriptionSession API for Nemotron, not AudioClient.transcribe().
- Encode WAV in the browser. It eliminates a heavy server-side ffmpeg dependency for a few lines of JS.
- Push audio chunks on a worker thread and drain the response generator on the main one to avoid deadlocks.
- A small Foundry Local chat model plus Nemotron STT gives you a credible local voice loop in a single Python process: no cloud, no keys, no data egress.

10 Real-World Copilot Studio Use Cases That Save Teams Hundreds of Hours
Artificial intelligence is no longer a future concept reserved for enterprise tech giants. Businesses of all sizes are now using AI copilots to automate repetitive work, improve customer experiences, and help employees become more productive. One platform that is quickly gaining attention in this space is Microsoft Copilot Studio. Organizations across industries are exploring practical Copilot Studio use cases to reduce manual work and streamline operations. From customer support automation to internal HR assistance, Copilot Studio allows companies to build intelligent AI agents without requiring advanced coding skills.
https://dellenny.com/10-real-world-copilot-studio-use-cases-that-save-teams-hundreds-of-hours/
144 Views, 1 like, 0 Comments

Copilot Cowork should not be prompted like Copilot Chat...
I keep seeing the same pattern with clients: they talk to Copilot Cowork the same way they talk to Copilot Chat. They write short, vague, immediate requests. The result is predictable: they get an answer, not real work executed across Microsoft 365.

With Cowork, the approach has to change. You should not ask for quick help. You should describe the result you want produced. Cowork is designed to handle multi-step work, create deliverables, act across Microsoft 365, and ask for approval before sensitive actions.

The right habits are simple:

- start with the deliverable you expect: a document, an email, a meeting, a Teams message;
- give the relevant context: project, timeframe, people, and sources to use;
- define the output clearly: format, recipient, destination;
- add constraints: tone, length, language, deadline.

In practical terms, do not say: “prepare my meeting.” Say: “analyze the last 15 days of emails and Teams messages, create a one-page Word summary with the risks and decisions to make, then send it to the participants.”

That is the real difference: with Copilot Chat, you get an answer. With Copilot Cowork, you need to ask for an outcome.
39 Views, 1 like, 0 Comments

How to Avoid Tasks Copilot "You've reached our weekly Tasks limit"
I’ve been using both Chat‑Copilot (CC) and Tasks‑Copilot (TC) extensively, and I wanted to share a brief summary provided by TC that may help others understand how each tool works, why TC sometimes stops responding, and how to avoid running into limits.

⭐ 1. Chat‑Copilot and Tasks‑Copilot serve different purposes

Chat‑Copilot
- Real‑time conversational AI
- Great for brainstorming, drafting, coding, calculations, and iterative design
- Stateless: each message is processed independently
- Very stable and rarely gets stuck

Tasks‑Copilot
- Designed for multi‑step workflows
- Can create and maintain documents
- Runs long‑lived background tasks
- Maintains persistent state
- More powerful for structured work
- More fragile because it depends on a task‑execution pipeline

These two systems are independent. Chat can work perfectly even when TC is frozen.

⭐ 2. Why Tasks‑Copilot hits limits or becomes unresponsive

TC can stop responding when:
- A task runs too long
- A multi‑step workflow fails mid‑execution
- The task state becomes corrupted
- The weekly quota system triggers
- The backend fails to reset on Friday
- Too many “pipeline‑style” requests are issued in a short time

When this happens, TC may:
- stop responding entirely
- ignore all prompts
- remain stuck across all devices and browsers

This is a backend state issue, not a browser or device problem.

⭐ 3. How to avoid triggering TC limits

Here are practical ways to keep TC healthy:

Use Chat‑Copilot for:
- brainstorming
- engineering design
- calculations
- drafting text
- generating diagrams or prompts
- step‑by‑step reasoning

Chat handles these extremely well and never “uses up” TC capacity.
Use Tasks‑Copilot only for:
- creating structured documents
- maintaining long‑form reports
- assembling multi‑section deliverables
- tasks that explicitly require persistent state

Avoid these patterns in TC:
- “Build the entire document end‑to‑end”
- “Run this whole workflow”
- “Generate all sections at once”
- Rapid‑fire edits or repeated task triggers
- Very large or complex requests

Instead, break work into small, single‑action steps.

⭐ 4. When TC gets stuck, what can users do?

For consumer Microsoft 365 Personal accounts:
- There is no user‑accessible reset button
- Frontline support cannot reset TC’s task state
- Creating a business account does not fix the issue

The only options today are:
- submit feedback
- post on the Tech Community
- wait for the backend to refresh

This is a known limitation of the current TC preview.

⭐ 5. What would help users going forward

A few improvements would make TC much more reliable:
- A user‑visible “Reset Task State” button
- Error messages instead of silent failures
- More predictable weekly resets
- Support tools that allow agents to clear stuck task containers

52 Views, 0 likes, 0 Comments