Introduction
Recommendation engines play a vital role in enhancing user experiences by providing personalized suggestions, and they have proven to be an effective strategy for turning engagement into business value.
The technical objective of a recommendation engine is to filter and present the most relevant items from vast datasets while respecting business constraints. This process includes steps like data collection, preprocessing, model training, and deployment. Advanced techniques such as embeddings and cosine similarity are used to determine the most relevant results for recommendations.
This blog explores the design and implementation of a recommendation engine. It examines the challenges faced by traditional systems and how modern approaches can overcome them, with the aim of building a robust, scalable recommendation engine suitable for various domains.
Background / Problem Scenario
Traditional recommendation systems often fall short due to their reliance on basic filtering techniques and limited understanding of user behaviour, resulting in poor recommendations and user dissatisfaction.
The main issue is that traditional recommendation engines struggle to analyse large datasets and understand the relationships between items, leading to a mismatch between user preferences and recommendations. Additionally, the need for real-time, personalized suggestions adds complexity.
To address this, we need a recommendation engine that leverages advanced AI techniques like embeddings and cosine similarity to accurately filter relevant results. This engine should be scalable, capable of handling vast amounts of data, and able to provide quick, relevant recommendations.
We have implemented a similar solution on our Microsoft Career Site, which has been scaled to provide job recommendations to internal users in over 100 countries across the globe. We have observed a significant lift in conversion: job applications coming through recommendations convert at 1.6 times the rate of job search.
This solution is not just limited to a career site but can be adopted for a variety of recommendation scenarios such as e-commerce, social media, e-learning platforms, media streaming platforms, travel and hospitality, healthcare, retail and much more.
Key Features
- Semantic Understanding: By using embeddings, the engine captures the semantic meaning of items, leading to more relevant recommendations.
- Agility and Customizability: The ranking can be customized by adjusting the weights assigned to the different signals.
- Scalability: Azure AI Search provides scalable storage and efficient retrieval of embeddings, making the system suitable for large datasets.
- Real-time Recommendations: The use of cosine similarity allows for quick computation of similarity scores, enabling real-time recommendations.
- Flexibility: The system can be adapted to various domains, such as e-commerce, content streaming, and social media, by training domain-specific embedding models.
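The weight-based customization mentioned above can be illustrated with a minimal sketch that blends multiple signal scores into one ranking score; the signal names and weight values here are illustrative, not the production configuration:

```python
def blended_score(signals, weights):
    """Combine normalized per-signal scores (each in 0..1) using tunable weights.

    Missing signals default to 0; the result is normalized by the weight total,
    so the weights do not need to sum to 1.
    """
    total = sum(weights.values())
    return sum(weights[name] * signals.get(name, 0.0) for name in weights) / total

# Example: favor semantic similarity, but still reward recency and popularity.
weights = {"semantic": 0.7, "recency": 0.2, "popularity": 0.1}
score = blended_score({"semantic": 0.9, "recency": 0.5, "popularity": 1.0}, weights)
```

Re-ranking experiments then become a matter of changing the weight dictionary rather than the pipeline itself.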
Working Principle
- Raw Data Conversion: The recommendation engine converts raw data into Named Entity Recognition (NER) output using OpenAI. The NER output is simply a JSON document that follows a pre-defined schema.
- Vector Embeddings: The NER output is then converted into vector embeddings using OpenAI.
- Vector Database: Used to store embeddings and query them efficiently. We chose Azure AI Search as our vector database.
- User Interaction: When a user interacts with the system, their preferences are also converted into embeddings.
- Cosine Similarity: In technical terms, cosine similarity measures the angle between the user's embedding and the item embeddings. In simple terms, it is a technique that generates a score indicating how closely an item matches a given sample.
- Recommendation: This process identifies the most similar items, ensuring recommendations are based on the semantic similarity of items rather than just surface-level features. It also applies additional filters to the results based on preferences the user has shared via the feedback loop.
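The scoring step above can be sketched in a few lines of plain Python. This is an illustration of cosine similarity and top-k ranking, not the production implementation (in our setup, the similarity search runs inside Azure AI Search):

```python
import math

def cosine_similarity(a, b):
    """Angle-based score in [-1, 1]; 1 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def top_matches(user_vec, item_vecs, k=3):
    """Rank item embeddings by similarity to the user's embedding."""
    scored = [(item_id, cosine_similarity(user_vec, vec))
              for item_id, vec in item_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]
```

Real embeddings have hundreds or thousands of dimensions, but the computation is exactly this, applied at scale by the vector database.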
Data Flow
- NER Generation: From existing structured or unstructured data, NER (Named Entity Recognition) output is generated using OpenAI with a prompt-engineering approach.
- Embeddings Generation: The NER output is then processed with OpenAI to generate embeddings.
- Azure AI Search: The generated embeddings are stored in Azure AI Search.
- Recommendation Generation: Using vector queries and cosine similarity calculations, a set of matching results is generated. These results are then filtered further based on user feedback collected via the feedback loop and served as recommendations.
- Feedback Loop: Collects user feedback, which is used to refine the final calculated results.
- Azure Premium Storage: Used for caching results to improve performance. When considering caching solutions for our recommendation engine, several factors come into play:
- Redis Cache Limitations: Redis can struggle with larger response sizes of around 1.5 MB.
- Cost Efficiency: Blob-based caching is often more cost-effective compared to Redis.
- Document DB Constraints: The maximum response size is usually capped at a few MB, which may not scale for larger result datasets. Scaling up a document database can also be costly.
- Response Time Goals: Our aim is to significantly reduce response times without incurring high costs for ultra-fast API responses.
- Performance Metrics: For 25 job recommendations in our pre-production environment, the response time was around 600 ms, which meets our SLA.
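The data flow above can be sketched end to end. This is a hedged sketch assuming the `openai` and `azure-search-documents` Python SDKs; the schema fields, embedding model, index field name (`contentVector`), and function names are illustrative, not the production code:

```python
import json

# Illustrative pre-defined NER schema (the real schema is domain-specific).
NER_SCHEMA_FIELDS = ["title", "skills", "location", "seniority"]

def build_ner_prompt(raw_text):
    """Prompt asking the model to emit NER output as JSON in a fixed schema."""
    return (
        "Extract the following fields from the text and reply with JSON "
        f"containing exactly these keys: {', '.join(NER_SCHEMA_FIELDS)}.\n\n"
        f"Text:\n{raw_text}"
    )

def parse_ner(response_text):
    """Validate that the model's reply matches the pre-defined schema."""
    ner = json.loads(response_text)
    missing = [f for f in NER_SCHEMA_FIELDS if f not in ner]
    if missing:
        raise ValueError(f"NER reply missing fields: {missing}")
    return ner

def recommend(openai_client, search_client, ner, k=25):
    """Embed the NER JSON and run a vector query against Azure AI Search."""
    from azure.search.documents.models import VectorizedQuery
    emb = openai_client.embeddings.create(
        model="text-embedding-3-small",  # model choice is illustrative
        input=json.dumps(ner),
    ).data[0].embedding
    vq = VectorizedQuery(vector=emb, k_nearest_neighbors=k, fields="contentVector")
    return list(search_client.search(search_text=None, vector_queries=[vq]))
```

The feedback-based filtering and blob caching would wrap around `recommend`, trimming and storing its results before they are served.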
Considerations for Engineering Standards
Security
- Disable Secrets / Local Auth: Disable local authentication and secrets/connection strings for all Azure Services to enhance security and prevent unauthorized access. Use Managed Identity wherever applicable and possible.
- Firewall: Consider limiting the IP range accessibility of the databases to reduce the risk of unauthorized access. You can also prefer to use Virtual Network to restrict access.
- Rate Limiting: Implement rate limiting to prevent throttling and ensure fair usage of OpenAI resources.
- Encryption: Ensure all data at rest and in transit is encrypted to protect sensitive information.
- Identity and Access Management (IAM): Implement strict IAM policies to control who can access what resources.
- Security Audits: Regularly conduct security audits to identify and mitigate vulnerabilities.
- Incident Response Plan: Develop and maintain an incident response plan to quickly address security breaches.
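The rate-limiting point above can be illustrated with a simple token bucket placed in front of OpenAI calls; the capacity and refill numbers are illustrative, and a production system would typically use a distributed limiter instead of an in-process one:

```python
import time

class TokenBucket:
    """Allow at most `capacity` calls per `period` seconds, with smooth refill."""

    def __init__(self, capacity, period, clock=time.monotonic):
        self.capacity = capacity
        self.refill_rate = capacity / period  # tokens added per second
        self.tokens = float(capacity)
        self.clock = clock
        self.last = clock()

    def allow(self):
        """Return True and consume a token if a call is permitted right now."""
        now = self.clock()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill_rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

Callers check `bucket.allow()` before each OpenAI request and queue or reject the call when it returns False, keeping usage under the service quota.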
Quality
- Comprehensive Testing for Quality NER: Set up an extensive testing environment to guarantee high-quality Named Entity Recognition (NER) outputs. With high-quality NER, the overall quality and reliability of the entire system improves significantly. In our scenario, we developed an automated tool that feeds bulk datasets to generate and test the quality of the NER output. Some manual quality testing is also required to ensure the results do not capture any bias based on language, colour, ethnicity, etc.
- Unit Testing: Make use of Unit Testing framework to ensure consistent and thorough testing of all code changes.
- Build Verification Testing (BVT): Perform automated BVT to ensure that the build is stable and meets the basic requirements before proceeding to more rigorous testing.
Performance
- Result Caching: Implement caching mechanisms to store frequently accessed data and improve response times.
- Multi-Region Load Balancing: Distribute traffic across multiple regions to enhance performance and ensure high availability.
- Load Testing: Conduct load testing to evaluate system performance under high traffic conditions and identify potential bottlenecks. We considered JMeter for load testing in our scenario.
- Database Optimization: Optimize database queries and indexing to improve performance. Also ensure the database is appropriately scaled to handle the required load.
- Content Delivery Network (CDN): Use CDNs to reduce latency and improve load times for users globally.
- Scalability Testing: Test the system’s ability to scale up or down based on demand.
- Resource / SKU Allocation: Efficiently allocate resources to ensure optimal performance under varying loads.
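The result-caching idea can be illustrated with a small TTL cache keyed by the query. This in-process sketch stands in for the blob-based caching described earlier; the TTL value and key format are illustrative:

```python
import time

class TTLCache:
    """Tiny in-process stand-in for blob-based result caching."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._store = {}

    def get(self, key):
        """Return the cached value, or None if absent or expired."""
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires = entry
        if self.clock() >= expires:
            del self._store[key]
            return None
        return value

    def set(self, key, value):
        self._store[key] = (value, self.clock() + self.ttl)
```

With blob storage, `get`/`set` become blob reads and writes, but the lookup-before-compute pattern around the recommendation call stays the same.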
Prompt Engineering in OpenAI
- OpenAI Model Selection: Extensive rounds of testing may be required to identify the optimal model for your use case. New, higher-performing models are emerging almost every quarter. Ensure a thorough validation is done before you plan to switch to a new model.
- Context Awareness: Ensure your prompts consider user preferences, history, and current context for personalized recommendations where applicable. In our case, this did not apply.
- Clarity and Brevity: Keep prompts clear and concise to avoid user confusion and encourage quick responses.
- Dynamic Adjustments: Adapt your prompts based on user feedback and changing preferences to keep recommendations relevant.
- Avoid Bias: Enrich your prompts to avoid any kind of bias in results.
- Feedback Loops: Implement prompts that actively seek user feedback to continually refine and improve the recommendation system.
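The dynamic-adjustment and bias points above can be sketched as a prompt builder that folds user feedback and a bias guard into the prompt; the function name and wording are illustrative, not our production prompts:

```python
def build_recommendation_prompt(base_instruction, feedback_exclusions):
    """Fold user feedback (e.g. 'not interested' tags) and a bias guard
    into the recommendation prompt."""
    prompt = base_instruction
    if feedback_exclusions:
        prompt += ("\nDo not include items related to: "
                   + ", ".join(sorted(feedback_exclusions)))
    prompt += ("\nBase the answer only on the provided data; do not infer "
               "attributes such as gender, ethnicity, or age.")
    return prompt
```

As the feedback loop collects new exclusions, the prompt adapts on the next request without any redeployment.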
Deployment & Release
- Feature Flighting: Gradually roll out new features to a subset of users to test and gather feedback before full deployment.
- Blue-Green Deployment: Use blue-green deployment strategies to minimize downtime and reduce the risk during updates.
- CI/CD Pipelines: Implement Continuous Integration and Continuous Deployment pipelines to automate testing and deployment processes, ensuring faster and more reliable releases.
- Rollback Strategies: Develop rollback strategies to quickly revert to a previous version in case of issues during deployment.
- Infra as Code: Use Bicep or a similar approach to define your infrastructure as code.
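Feature flighting as described above is commonly implemented as deterministic percentage bucketing: hash the user and feature IDs into a bucket from 0 to 99, and enable the feature for buckets below the rollout percentage. A minimal sketch (function name and bucketing scheme are illustrative):

```python
import hashlib

def in_flight(user_id, feature, rollout_percent):
    """Deterministically place each (feature, user) pair in a bucket 0..99.

    The same user always lands in the same bucket for a given feature, so
    raising rollout_percent only ever adds users, never flips existing ones.
    """
    digest = hashlib.sha256(f"{feature}:{user_id}".encode()).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100
    return bucket < rollout_percent
```

Hashing the feature name together with the user ID keeps the buckets independent across features, so one user is not always first (or last) for every rollout.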
Challenges Anticipated
- AI Hallucinations: Preventing AI-generated hallucinations. This can be mitigated with well-crafted prompts and rigorous testing against malicious prompts.
- Quality Assurance: Maintaining rigorous quality testing protocols.
- NER Extraction Accuracy: Improving the precision of Named Entity Recognition (NER) through better prompts.
- Data Privacy and Compliance: Upholding data privacy standards and conducting thorough reviews.
Conclusion: Why Should You Consider This Approach?
- Easy Integration with Azure AI Search: One of the biggest advantages of using Azure AI Search is how easy it is to integrate. You don't need to spend a lot of time setting up complex infrastructure. Instead, you can focus on fine-tuning your recommendation algorithms. Azure AI Search comes with built-in support for vector search, making it simpler to implement advanced recommendation systems.
- Scalability: Azure AI Search is designed to handle large datasets efficiently. This means your recommendation engine can grow alongside your user base without losing performance. The platform can manage high query volumes and large-scale data indexing, ensuring your system stays responsive and reliable as it scales.
- Vector-Based Search Benefits: Traditional filtering techniques often fall short in capturing the true meaning behind user preferences. Vector-based search, on the other hand, understands the semantic relationships between items, leading to more accurate and relevant recommendations. This results in a better user experience, as the suggestions are more aligned with what users are actually looking for.
- Cost Efficiency: Choosing the right caching strategies, like Azure Premium Storage blob-based caching over Redis, can help you save costs while maintaining performance. This is especially important for large-scale deployments where budget management is crucial. Blob storage is a cost-effective solution for storing large amounts of data.
- Real-World Impact: Implementing a recommendation engine like this can have a significant impact on user engagement and business outcomes. For instance, personalized job recommendations on the Microsoft Global Career Site have led to improved candidate engagement and a 1.6x increase in conversion rates. Delivering relevant content quickly enhances user experience and drives important business metrics like retention and conversion.
Contributors:
Ashish Mathur, Jayesh Kudukken Thazhath, Ashudeep Reshi, Bipul Raman, Swadhin Nayak, Sivakamy Lakshminarayanan, Prachi Nautiyal, Priyanka Kumari, Abhishek Mishra, Satya Vamsi Gadikoyila
Updated Feb 11, 2025
Version 1.0
Bipul Raman, Microsoft
AI - Azure AI services Blog