GPT-4o Support and New Token Management Feature in Azure API Management
We're happy to announce new features coming to Azure API Management that enhance your experience with GenAI APIs. Our latest release brings expanded support for GPT-4o models, including text- and image-based input, across all GenAI Gateway capabilities. Additionally, we're expanding our token limit policy with a token quota capability to give you even more control over your token consumption.

Token quota

This extension of the token limit policy is designed to help you manage token consumption more effectively when working with large language models (LLMs). Key benefits of token quota:

- Flexible quotas: In addition to rate limiting, set token quotas on an hourly, daily, weekly, or monthly basis to manage token consumption across clients, departments, or projects.
- Cost management: Protect your organization from unexpected token usage costs by aligning quotas with your budget and resource allocation.
- Enhanced visibility: In combination with the emit-token-metric policy, track and analyze token usage patterns to make informed adjustments based on real usage trends.

With this new capability, you can empower your developers to innovate while maintaining control over consumption and costs. It's the right balance of flexibility and responsible consumption for your AI projects. Learn more about token quota in our documentation.

GPT-4o support

GPT-4o integrates text and images in a single model, enabling it to handle multiple content types simultaneously. Our latest release enables you to take advantage of the full power of GPT-4o with expanded support across all GenAI Gateway capabilities in Azure API Management. Key benefits:

- Cost efficiency: Control and attribute costs with token monitoring, limits, and quotas. Return cached responses for semantically similar prompts.
- High reliability: Enable geo-redundancy and automatic failovers with load balancing and circuit breakers.
- Developer enablement: Replace custom backend code with built-in policies. Publish AI APIs for consumption.
- Enhanced governance and monitoring: Centralize monitoring and logs for your AI APIs.

Phased rollout and availability

We're excited about these new features and want to ensure you have the most up-to-date information about their availability. As with any major update, we're implementing a phased rollout strategy to ensure safe deployment across our global infrastructure. As a result, some of your services may not have these updates until the deployment is complete. These new features will be available first in the new SKUv2 of Azure API Management, followed by the SKUv1 rollout towards the end of 2024.

Conclusion

These new features in Azure API Management represent a step forward in managing and governing your use of GPT-4o and other LLMs. By providing greater control, visibility, and traffic management capabilities, we're helping you unlock the full potential of Generative AI while keeping resource usage in check. We're excited about the possibilities these new features bring and are committed to expanding their availability. As we continue our phased rollout, we appreciate your patience and encourage you to keep an eye out for updates.
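To make the token limit policy described above more concrete, here is a hedged sketch of an APIM inbound policy fragment. The azure-openai-token-limit and azure-openai-emit-token-metric policies and the attributes shown are existing, documented APIM policies; the specific counter key and limit values are illustrative, and the attribute names for the new hourly/daily/weekly/monthly quota windows are not shown here — consult the linked documentation for the exact quota syntax.

```xml
<policies>
  <inbound>
    <base />
    <!-- Per-subscription token rate limiting. The new quota capability
         extends this policy with period-based windows (hourly, daily,
         weekly, monthly); see the documentation for the exact attributes. -->
    <azure-openai-token-limit
        counter-key="@(context.Subscription.Id)"
        tokens-per-minute="5000"
        estimate-prompt-tokens="true"
        remaining-tokens-header-name="x-remaining-tokens" />
    <!-- Emit token metrics so quota adjustments can be based on real usage. -->
    <azure-openai-emit-token-metric namespace="genai">
      <dimension name="Subscription ID" />
    </azure-openai-emit-token-metric>
  </inbound>
</policies>
```

Combined, the two policies give you enforcement (the limit/quota) and the visibility (the metrics) needed to tune that enforcement over time.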
Announcing AI building blocks in Logic Apps (Consumption)

We're thrilled to announce that the Azure OpenAI and AI Search connectors, along with the Parse Document and Chunk Text actions, are now available in the Logic Apps Consumption SKU! These capabilities, already available in the Logic Apps Standard SKU, can now be leveraged in serverless, pay-as-you-go workflows to build powerful AI-driven applications with cost efficiency and flexibility.

What's new in Consumption SKU?

This release brings almost all the advanced AI capabilities from Logic Apps Standard to the Consumption SKU, enabling lightweight, event-driven workflows that automatically scale with your needs. Here's a summary of the operations now available:

Azure OpenAI connector operations
- Get Completions: Generate text with Azure OpenAI's GPT models for tasks such as summarization, content creation, and more.
- Get Embeddings: Generate vector embeddings from text for advanced scenarios like semantic search and knowledge mining.

AI Search connector operations
- Index Document: Add or update a single document in an AI Search index.
- Index Multiple Documents: Add or update multiple documents in an AI Search index in one operation.

*Note: The Vector Search operation for enabling the retrieval pattern will be highlighted in an upcoming release in December.*

Parse Document and Chunk Text actions

Under the Data Operations connector:
- Parse Document: Extract structured data from uploaded files like PDFs or images.
- Chunk Text: Split large text blocks into smaller chunks for downstream processing, such as generating embeddings or summaries.

Demo workflow: Automating document ingestion with AI

To showcase these capabilities, here's an example workflow that automates document ingestion, processing, and indexing:

- Trigger: Start the workflow with an HTTP request or an event like a file upload to Azure Blob Storage.
- Get Blob Content: Retrieve the document to be processed.
- Parse Document: Extract structured information, such as key data points from a service agreement.
- Chunk Text: Split the document content into smaller, manageable text chunks.
- Generate Embeddings: Use the Azure OpenAI connector to create vector embeddings for the text chunks.
- Select Array: Compose the inputs to pass to the indexing operation.
- Index Data: Store the embeddings and metadata for downstream applications, like search or analytics.

Why choose Consumption SKU?

With this release, the Logic Apps Consumption SKU allows you to:
- Build smarter, scalable workflows: Leverage advanced AI capabilities without upfront infrastructure costs.
- Pay only for what you use: Ideal for event-driven workloads where cost-efficiency is key.
- Integrate seamlessly: Combine AI capabilities with hundreds of existing Logic Apps connectors.

What's next?

In December, we'll be announcing the Vector Search operation for the AI Search connector, enabling retrieval capability in the Logic Apps Consumption SKU and bringing feature parity with the Standard SKU. This will allow you to perform advanced search scenarios by matching queries with contextually similar content. Stay tuned for updates!
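For intuition about what the Chunk Text step in the demo workflow does, here is a minimal Python sketch of fixed-size chunking with overlap. The chunk size and overlap values are illustrative assumptions, not the connector's actual defaults:

```python
def chunk_text(text: str, max_chars: int = 300, overlap: int = 50) -> list[str]:
    """Split text into fixed-size chunks with a small overlap.

    Overlapping chunks help downstream embedding and search steps keep
    context that would otherwise be cut off at a chunk boundary.
    """
    if max_chars <= overlap:
        raise ValueError("max_chars must be larger than overlap")
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + max_chars, len(text))
        chunks.append(text[start:end])
        if end == len(text):
            break
        start = end - overlap  # step back to create the overlap
    return chunks
```

Each chunk would then be passed to an embeddings call (the Get Embeddings operation above) before being indexed.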
Announcing API Management and API Center Community Live Stream on Thursday, December 12

We're thrilled to announce a community stand-up: a live-stream event for users of Azure API Management and API Center, hosted on YouTube. Join us for an engaging session where we'll delve into the latest industry trends, product updates, and best practices.

Event Details
- Date: Thursday, 12 December 2024
- Time: 8 AM PST / 11 AM EST
- Format: Live stream on YouTube

What to Expect
- Azure API Management and API Center updates and a deep dive into Microsoft Ignite announcements: Discover the latest features in our services, including the shared workspace gateway, the Premium v2 tier, enhancements to GenAI gateway capabilities, and more. Learn how these advancements can benefit your organization and enhance your API management practices.
- Insights into the API industry: Our product team will share their perspectives on new developments in the API industry.
- Interactive Q&A session: Do you have a burning question about our products, or are you looking to provide feedback? This is your chance! Join our live Q&A session to get answers directly from our team.
- Networking opportunities: Connect with fellow API management practitioners in the chat, exchange ideas, and learn from each other's experiences.

How to Join

Simply tune into our live stream on the Microsoft Azure Developers channel on YouTube at the scheduled date and time. You can select the "Notify me" button to receive a reminder before the event starts. Don't miss out on this exciting opportunity to engage with our product team and fellow API Management and API Center users. Mark your calendars, and we'll see you there!

Optimizing Vector Similarity Search on Azure Data Explorer – Performance Update
This post is co-authored by Anshul_Sharma (Senior Program Manager, Microsoft).

This blog is an update of Optimizing Vector Similarity Searches at Scale. We continue to improve the performance of vector similarity search in Azure Data Explorer (Kusto). We present the new functions and policies that maximize performance, and the resulting search times. The following table presents the search time for the top 3 most similar vectors to a supplied vector:

# of vectors | Total time [sec.]
25,000 | 0.03
50,000 | 0.035
100,000 | 0.047
200,000 | 0.062
400,000 | 0.094
800,000 | 0.125
1,600,000 | 0.14
3,200,000 | 0.15
6,400,000 | 0.19
12,800,000 | 0.35
25,600,000 | 0.55
51,200,000 | 1.1
102,400,000 | 2.3
204,800,000 | 3.9
409,600,000 | 7.6

This benchmark was done on a medium-sized Kusto cluster (containing 29 nodes), searching for the most similar vectors in a table of Azure OpenAI embedding vectors. Each vector was generated using the 'text-embedding-ada-002' embedding model and contains 1536 coefficients. These are the steps to achieve the best performance for similarity search:

1. Use series_cosine_similarity(), the new optimized native function, to calculate cosine similarity.
2. Set the encoding of the embeddings column to Vector16, the new 16-bit encoding of the vector coefficients (instead of the default 64-bit).
3. Store the embedding vectors table on all nodes with at least one shard per processor. This can be achieved by limiting the number of embedding vectors per shard: alter ShardEngineMaxRowCount in the sharding policy and RowCountUpperBoundForMerge in the merging policy. Suppose our table contains 1M vectors and our Kusto cluster has 20 nodes, each with 16 processors. The table's shards should then contain at most 1000000/(20*16) = 3125 rows.
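The shard-sizing arithmetic above generalizes to any cluster shape. As a quick sanity check, here is a small Python helper (a sketch for illustration, not an official tool) that computes the maximum rows per shard for a given table size and cluster:

```python
import math

def max_rows_per_shard(total_rows: int, nodes: int, processors_per_node: int) -> int:
    """Maximum rows per shard so that there is at least one shard per processor.

    Implements the rule of thumb above: total_rows / (nodes * processors).
    Rounding down guarantees enough shards to cover every processor.
    """
    total_processors = nodes * processors_per_node
    return max(1, math.floor(total_rows / total_processors))

# The example above: 1M vectors on 20 nodes with 16 processors each.
print(max_rows_per_shard(1_000_000, 20, 16))  # → 3125
```

The result feeds directly into the ShardEngineMaxRowCount and RowCountUpperBoundForMerge values set in the policy commands that follow.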
These are the KQL commands to create the empty table and set the required policies and encoding:

.create table embedding_vectors(vector_id:long, vector:dynamic) // more columns can be added

.alter-merge table embedding_vectors policy sharding '{ "ShardEngineMaxRowCount" : 3125 }'

.alter-merge table embedding_vectors policy merge '{ "RowCountUpperBoundForMerge" : 3125 }'

.alter column embedding_vectors.vector policy encoding type = 'Vector16'

Now we can ingest the vectors into the table. And here is a typical search query:

let searched_vector = repeat(0, 1536); // to be replaced with a real embedding vector
embedding_vectors
| extend similarity = series_cosine_similarity(vector, searched_vector)
| top 10 by similarity desc

The current semantic search times enable the use of ADX as an embedding vector storage platform for RAG (Retrieval Augmented Generation) scenarios and beyond. We continue to improve vector search performance, so stay tuned!
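For intuition about what series_cosine_similarity() computes per row, here is a plain Python reference implementation (a sketch for illustration only; the Kusto native function is far faster and operates on the Vector16-encoded column):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors.

    1.0 means identical direction, 0.0 orthogonal, -1.0 opposite.
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # → 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # → 0.0
```

The search query above simply evaluates this quantity between the stored vector column and the supplied vector, then keeps the rows with the highest values.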
Azure API Management Turns 10: Celebrating a Decade of Customer-Driven Innovation and Success

This September marks a truly special occasion: Azure API Management turns 10! Since our launch in 2014, we've been on an incredible journey, transforming how businesses connect, scale, and secure their digital ecosystems. As the first cloud provider to integrate API management into its platform, Azure has led the way in helping organizations seamlessly navigate the evolving digital landscape.

Designing and running a Generative AI Platform based on Azure AI Gateway
Are you in a platform team that has been tasked with building an AI platform to serve the needs of your internal consumers? What does that mean? It's a daunting challenge to be handed, and even harder if you're operating in a highly regulated environment. As enterprises scale out their usage of Generative AI past a few initial use cases, they will face a new set of challenges: scaling, onboarding, security, and compliance, to name a few. In this article we outline a set of common requirements and provide a reference implementation for an AI Platform.