Microsoft Developer Community Blog

Managing Token Consumption with GitHub Copilot for Azure

Julia_Muiruri
Apr 01, 2025

Introduction

AI engineers often face challenges that require creative solutions. One such challenge is managing token consumption when using large language models. For example, you may observe heavy token consumption from a single client app or user and realize that, at that rate, the shared quota for other client applications relying on the same OpenAI backend will be depleted quickly. Ideally, preventing this shouldn't require spending hours reading documentation or watching tutorials.

Enter GitHub Copilot for Azure.

GitHub Copilot for Azure

Instead of diving into extensive documentation, we can leverage GitHub Copilot for Azure directly within VS Code. By invoking Copilot with @azure, we can describe our issue in natural language. For our example, we might say: "Some users of my app are consuming too many tokens, which will affect the tokens left for my other services. I need to limit the number of tokens a user can consume." Refer to the video above for more context.

Invoking GitHub Copilot for Azure using @azure

GitHub Copilot in Action

GitHub Copilot pulls relevant information from https://learn.microsoft.com/ and suggests Azure services that can help. We can continue the chat conversation with follow-up questions such as, "What happens if a user exceeds their token limit?"

Follow-up chat conversation with GitHub Copilot for Azure

This response from GitHub Copilot accurately describes the specific feature we need, along with the expected behavior: user requests are blocked from reaching the backend, and users receive a "too many requests" warning. Exactly what we need.

At this point, it felt like I was having a 1:1 chat with docs 🙃

Implementation

To implement this, we ask GitHub Copilot for an example of enforcing the token limit policy in Azure. It references the documentation on Microsoft Learn and provides a policy statement. Since we're not fully conversant with the product, we continue using Copilot to help with the implementation. Although GitHub Copilot Chat cannot directly update our code, we can switch to GitHub Copilot Edits, provide some custom instructions in natural language, and watch as GitHub Copilot makes the necessary changes, which we then review and accept or decline.
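
For reference, the kind of policy statement Copilot surfaces here is Azure API Management's azure-openai-token-limit policy, documented on Microsoft Learn. The snippet below is an illustrative sketch rather than a copy of Copilot's output: the counter key and the 500 tokens-per-minute value are placeholder choices you would tune for your own app.

```xml
<policies>
    <inbound>
        <base />
        <!-- Illustrative: cap each caller (keyed by APIM subscription) at 500 tokens per minute -->
        <azure-openai-token-limit
            counter-key="@(context.Subscription.Id)"
            tokens-per-minute="500"
            estimate-prompt-tokens="true"
            remaining-tokens-header-name="remaining-tokens" />
    </inbound>
    <backend>
        <base />
    </backend>
    <outbound>
        <base />
    </outbound>
    <on-error>
        <base />
    </on-error>
</policies>
```

Keying the counter on the API Management subscription ID limits each calling application independently; if you need per-user limits instead, you could key on a user identifier from the request, such as a claim in a JWT.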

GitHub Copilot Edits implementing policy suggestions to code

Testing and Deployment

After implementing the policy, we redeploy the application using the Azure Developer CLI (azd) and restart the app and API to test. Now, if a user sends another prompt after hitting the applied token limit, their request is terminated with a warning that the allocated limit has been exceeded, along with instructions on what to do next.
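
To sanity-check the limit after redeploying, a tiny client script is handy. This sketch is not from the post: the endpoint URL and payload shape are placeholders for whatever your app exposes, and it simply sends prompts until the token limit policy starts rejecting requests with HTTP 429 ("too many requests").

```python
# Minimal test sketch (placeholder endpoint and payload): send prompts until the
# token limit policy starts rejecting requests with HTTP 429.
import requests

ENDPOINT = "https://<your-apim-gateway>/api/chat"  # placeholder for your deployed route


def send_prompt(prompt: str) -> None:
    response = requests.post(ENDPOINT, json={"message": prompt}, timeout=30)
    if response.status_code == 429:
        # The policy blocks the call once the caller's token allowance is spent.
        retry_after = response.headers.get("Retry-After", "a moment")
        print(f"Token limit exceeded; retry after {retry_after} seconds.")
    else:
        response.raise_for_status()
        print(response.json())


if __name__ == "__main__":
    for i in range(20):  # enough prompts to exhaust a small per-minute limit
        send_prompt(f"Test prompt {i}")
```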

Conclusion

Managing token consumption effectively is just one of the many ways GitHub Copilot for Azure can assist developers. Download and install the extension today to try it out yourself. If you have any scenarios you'd like to see us cover, drop them in the comments, and we'll feature them.

See you in the next blog!
