Great updates! I was experimenting with the `llm-semantic-cache-lookup` policy and would like to share my feedback with the PG team.
Currently, we use `log-to-eventhub` to send the response downstream, where we calculate completion_prompt, total_prompt, and input_prompt for chargeback. Once we start using `llm-semantic-cache-lookup`, a response may be served from the cache, so we need to skip those calculations; otherwise, the usage could be mistakenly attributed to the OpenAI instances for chargeback. It would be helpful if `llm-semantic-cache-lookup` exposed an output property (as a context variable) that indicates whether the request was a cache hit or a cache miss. With that variable, we could adjust the downstream logic appropriately based on hit/miss.
e.g.,

```xml
<llm-semantic-cache-lookup
    score-threshold="0.05"
    embeddings-backend-id="embeddings-backend"
    output-variable-cache-hit="name of a variable that is set to true on a cache hit and false on a cache miss">
    <vary-by>expression to partition caching</vary-by>
</llm-semantic-cache-lookup>
```
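For instance, downstream in the outbound section the new variable could gate the existing telemetry. This is only a rough sketch of how we would consume it, assuming the proposed attribute sets a Boolean context variable; the variable name `semantic-cache-hit` and logger id `chargeback-logger` are hypothetical placeholders:

```xml
<outbound>
    <!-- Sketch: only emit token-usage telemetry when the response came from the model
         backend, not from the semantic cache. Names below are placeholders. -->
    <choose>
        <when condition="@(!context.Variables.GetValueOrDefault<bool>("semantic-cache-hit"))">
            <log-to-eventhub logger-id="chargeback-logger">@{
                // Forward the raw response body so prompt/completion/total token counts
                // can still be computed downstream for chargeback.
                return context.Response.Body.As<string>(preserveContent: true);
            }</log-to-eventhub>
        </when>
    </choose>
</outbound>
```

On a cache hit, the logging step would simply be skipped (or the event tagged as cached), so the usage is not attributed to the OpenAI backends.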
This would be helpful not only for chargeback but also for many other use cases.
Thank you for considering this feedback!
Jay