machine learning

99 Topics

Unlocking Smarter Search How to Use Azure AI Search & Azure OpenAI Service Together
In the era of large language models and AI-powered experiences, simply running a keyword search isn’t enough. Users expect conversational, context-aware responses, grounded in real data. That’s where combining Azure’s search infrastructure with generative AI becomes a game-changer. By using Azure AI Search as the retrieval layer and Azure OpenAI Service as the generation layer, you can build applications that understand natural language, fetch relevant documents, and respond with rich, accurate, and contextual answers. In this blog post, we’ll walk through how to achieve that end-to-end, highlight best practices, and give you a blueprint to apply in your own environment. https://dellenny.com/unlocking-smarter-search-how-to-use-azure-ai-search-azure-openai-service-together/
JohnNaguib
Oct 23, 2025 Place Azure
20Views
0likes
0Comments
Data Vault 2.0 Warehouse Automation on Azure
This is the series of 'Blog Articles' on the topic "Data Vault 2.0 on Azure" where we start from 'What?' and then slowly dwell into 'How To?' implement DV 2.0 on Azure Data Platform Technologies.
Naveed-Hussain
Oct 10, 2025 Place Analytics on Azure Blog
9.1KViews
0likes
1Comment
Defining the Raw Data Vault with Artificial Intelligence
This Article is Authored By Michael Olschimke, co-founder and CEO at Scalefree International GmbH. The Technical Review is done by Ian Clarke, Naveed Hussain – GBBs (Cloud Scale Analytics) for EMEA at Microsoft The Data Vault concept is used across the industry to build robust and agile data solutions. Traditionally, the definition (and subsequent modelling) of the Raw Data Vault, which captures the unmodified raw data, is done manually. This work demands significant human intervention and expertise. However, with the advent of artificial intelligence (AI), we are witnessing a paradigm shift in how we approach this foundational task. This article explores the transformative potential of leveraging AI to define the Raw Data Vault, demonstrating how intelligent automation can enhance efficiency, accuracy, and scalability, ultimately unlocking new levels of insight and agility for organizations. Note that this article describes a solution to AI-generated Raw Data Vault models. However, the solution is not limited to Data Vault, but allows the definition of any data-driven, schema-on-read model to integrate independent data sets in an enterprise environment. We discuss this towards the end of this article. Metadata-Driven Data Warehouse Automation In the early days of Data Vault, all engineering was done manually: an engineer would analyse the data sources and their datasets, come up with a Raw Data Vault model in an E/R tool or Microsoft Visio, and then develop both the DDL code (CREATE TABLE) and the ELT / ETL code (INSERT INTO statements). However, Data Vault follows many patterns. Hubs look very similar (the difference lies in the business keys) and are loaded similarly. We discussed these patterns in previous articles of this series, for example, when covering the Data Vault model and implementation. In most projects where Data Vault entities are created and loaded manually, a data engineer eventually develops the idea of creating a metadata-driven Data Vault generator due to these existing patterns. The effort to build a generator is too considerable, and most projects are better off using an off-the-shelf solution such as Vaultspeed. These tools come with a metadata repository and a user interface for setting up the metadata and code templates required to generate the Raw Data Vault (and often subsequent layers). We have discussed Vaultspeed in previous articles of this series. By applying the code templates to the metadata defined by the user, the actual code for the physical model is generated for a data platform, such as Microsoft Fabric. The code templates define the appearance of hubs, links, and satellites, as well as how they are loaded. The metadata defines which hubs, links, and satellites should exist to capture the incoming data set consistently. Manual development often introduces mistakes and errors that result in deviations in code quality. By generating the data platform code, deviations from the defined templates are not possible (without manual intervention), thus raising the overall quality. But the major driver for most project teams is to increase productivity. Instead of manually developing code, they generate the code. Metadata-driven generation of the Raw Data Vault is standard practice in today's projects. Today’s project tasks have therefore changed: while engineers still need to analyse the source data sets and develop a Raw Data Vault model, they no longer create the code (DDL/ELT). Instead, they set up the metadata that represents the Raw Data Vault model in the tool of their choice. Each data warehouse automation tool comes with its specific features, limitations, and metadata formats. The data engineer/modeler must understand how to transfer the Raw Data Vault model into the data warehouse automation tool by correctly setting up the metadata. This is also true for Vaultspeed; the data modeler can set up the metadata either through the user interface or via the SDK. This is the most labour-intensive task concerning the Raw Data Vault layer. It also requires experts who not only know Data Vault modelling but also know (or can analyse) the source systems' data and understand the selected data warehouse automation solution. Additionally, Data Vault is not equal to Data Vault in many cases, as it allows for a very flexible interpretation of how to model a Data Vault, which also leads to quality issues. But what if the organization has no access to such experts? What if budgets are limited, time is of the essence, or there are no available experts in sufficient numbers in the field? As Data Vault experts, we can debate the value of Data Vault as much as we want, but if there are no experts capable of modeling it, the debate will remain inconclusive. And what if this problem is only getting worse? In the past, a few dozen source tables might have been sufficient to be processed by the data platform. Today, several hundred source tables could be considered a medium-sized data platform. Tomorrow, there will be thousands of source tables. The reason? There is not only an exponential growth in the volume of data to be produced and processed, but it also comes with an exponential growth in the complexity of data shape. The source of this exponential growth in data shape comes from more complex source databases, APIs that produce and deliver semi-structured JSON data, and, ultimately, more complex business processes and an increasing amount of generated and available data that needs to be analysed for meaningful business results. Generating the Data Vault using Artificial Intelligence Increasingly, this data is generated using artificial intelligence (AI) and still requires integration, transformation, and analysis. The issue is that the number of data engineers, data modelers, and data scientists is not growing exponentially. Universities around the world only produce a limited number of these roles, and some of us would like to retire one day. Based on our experience, the increase in these roles is linear at best. Even if you argue for exponential growth in these roles, it is evident that there is no debate about a growing gap between the increasing data volume and the people who should analyse it. This gap cannot be closed by humans in the future. Even in a world where all kids want to become and eventually work in a data role. Sorry for all the pilots, police officers, nurses, doctors, etc., there is no way for you to retire without the whole economy imploding. Therefore, the only way to close the gap is through the use of artificial intelligence. It is not about reducing the data roles. It's about making them efficient so that they can deal with the growing data shape (and not just the volume). For a long time, it was common sense in the industry that, if an artificial intelligence could generate or define the Raw Data Vault, it would be an assisting technology. The AI would make recommendations, for example, such as which hubs or links to model and which business keys to use. The human data modeler would make the final decision, with input from the AI. But what if the AI made the final decision? What would it look like? What if one could attach data sources to the AI platform and the AI would analyze the source datasets, come up with a Raw Data Vault model, and load that model into Vaultspeed or another data warehouse automation tool, know the source system’s data, know Data Vault modelling, and understand the selected data warehouse automation? These questions were posed by Michael Olschimke, a Data Vault and AI expert, when initially considering the challenge. He researched the distribution of neural networks on massively parallel processing (MPP) clusters to classify unstructured data at Santa Clara University in Silicon Valley. This prior AI research, combined with the knowledge he accumulated in the Data Vault, enabled him to build a solution that later became known as Flow.BI. Flow.BI as a Generative AI to Define the Raw Data Vault The solution is simple, at least from the outside: attach a few data sources, let the AI do the rest. Flow.BI supports several data sources already, including Microsoft SQL Server and derivatives, such as Synapse and Fabric, as long as a JDBC driver is available, Flow.BI should eventually be able to analyze the data source. And the AI doesn’t care if the data originates from a CRM system, such as Microsoft Dynamics, or an e-commerce platform; it's just data. There are no provisions in the code to deal with specific datasets, at least for now. The goal of Flow.BI is to produce a valid, that is, consistent and integrated, enterprise data model. Typically, this follows a Data Vault design, but it's not limited to that (we’ll discuss this later in the article). This is achieved by following a strict data-driven approach that imitates the human data modeler. Flow.BI needs data to make decisions, just like its human counterpart. Source entities with no data will be ignored. It only requires some metadata, such as the available entities and their columns. Datatypes are nice-to-have; primary keys and foreign keys would improve the target model, just like entity and column descriptions. But they are not required to define a valid Raw Data Vault model. Humans write this text, and as such, we like to influence the result of the modelling exercise. Flow.BI is appreciating this by offering many options for the human data modeler to influence the engine. Some of them will be discussed in this article, but there are many more already available and more to come. Flow.BI’s user interface is kept as lean and straightforward as possible: the solution is designed so that the AI should take the lead and model the whole Raw Data Vault. The UI’s purpose is to interact with human data modelers, allowing them to influence the results. That’s what many screens are related to - and the configuration of the security system. A client can have multiple instances, which result in independent Data Vault models. This is particularly useful when dealing with independent data platforms, such as those used by HR, the compliance department, or specific business use cases, or when creating the raw data foundation for data products within a data mesh. In this case, a Flow.BI instance equals a data product. But don’t underestimate the complexity of Flow.BI: The frontend is used to manage a large number of compute clusters that implement scalable agents to work on defining the Raw Data Vault. The platform is implementing full separation of data and processing, not only by client but also by instance. Mapping Raw Data to Organizational Ontology The very first step in the process is to identify the concepts in the attached datasets. For this purpose, there is a concept classifier that analyses the data and recognizes datasets and their classified concepts that it has seen in the past. A common requirement of clients is that they would like to leverage their organizational requirements in this process. While Flow.BI doesn’t know a client’s ontology; it is possible to override (and in some cases, complete) the concept classifications and refer to concepts from the organizational ontology. By doing so, Flow.BI will integrate the source system’s raw data into the organization's ontology. It will not create a logical Data Vault, which is where the Data Vault model reflects the desired business, but instead model the raw data as the business uses it, and therefore follow the data-driven Data Vault modeling principles that Michael Olschimke has taught to thousands of students over the years at Scalefree. Flow.BI also allows the definition of a multi-tenant Data Vault model, where source systems either provide multi-tenant data or are assigned to a specific tenant. In both cases, the integrated enterprise data model will be extended to allow queries across multiple tenants or within a single tenant, depending on the information consumer’s needs. Ensuring Security and Privacy Flow.BI was designed with security and privacy in mind. From a design perspective, this has two aspects: Security and privacy in the service itself, to protect client solutions and related assets Security and privacy are integral to the defined model, allowing for the effective utilization of Data Vault’s capabilities in addressing security and privacy requirements, such as satellite splits. While Flow.BI is using a shared architecture; all data and metadata storage and processing are separated by client and instance. However, this is often not sufficient for clients as they hesitate to share their highly sensitive data with a third party. For this reason, Flow.BI allows two critical features: Local data storage: instead of storing client data on Flow.BI infrastructure, the client provides an Azure Data Lake Storage to be used for storing the data. Local data processing: A Docker container can be deployed into the client’s infrastructure to access the client's data sources, extract the data, and process it. When using both options, only metadata, such as entity and column names, constraints, and descriptions, are shared with Flow.BI. No data is transferred from the client’s infrastructure to Flow.BI. The metadata is secured on Flow.BI’s premises as if it were actual data: row-level security separates the metadata by instance, and roles and permissions are defined per client who can access the metadata and what they can do with it. But security and privacy are not limited to the service itself. The defined model also utilizes the security and privacy features of Data Vault. For example, it enables the classification of source columns based on security and privacy. The user can set up security and privacy classes and apply them to the influence screen for both. By doing so, the column classifications are used when defining the Raw Data Vault and can later be used to implement a satellite split in the physical model (if necessary). An upcoming release will include an AI model for classifying columns based on privacy, utilizing data and metadata to automate this task. Tackling Multilingual Challenges A common challenge for clients is navigating multilingual data environments. Many data sources use English entity and column names, but there are systems using metadata in a different language. Also, the assumption that the data platform should use English metadata is not always correct. Especially in government clients, the use of the official language is mandatory. Both options, translating the source metadata to English (the default within Flow.BI) and translating the defined target model into any target language, are supported by Flow.BI’s translations tab on the influence screen: The tab utilizes an AI translator to fully automatically translate the incoming table names, column names, and concept names. However, the user can step in and override the translation to improve it to their needs. All strings of the source metadata and the defined model are passed through the translation module. It is also possible to reuse existing translations for a growing list of popular data sources. This feature enables readable names for satellites and their attributes (as well as hubs and links), resulting in a significantly improved user experience for the defined Raw Data Vault. Generating the Physical Model You should have noticed by now that we consistently discuss the defined Raw Data Vault model. Flow.BI is not generating the physical model, that is, the CREATE TABLE and INSERT INTO statements for the Raw Data Vault. Instead, it “just” defines the hubs, links, and satellites required for capturing all incoming data from the attached data sources, including business key selection, satellite splits, and special entity types, such as non-historized links and their satellites, multi-active satellites, hierarchical links, effectivity satellites, and reference tables. Video on Generating Physical Models This logical model (not to be confused with “logical Data Vault modelling”) is then provided to our growing number of ISV partner solutions that will consume our defined model, set up the required metadata in their tool, and generate the physical model. As a result, Flow.BI acts as a team member that analyses your organizational data sources and their data, knows how to model the Raw Data Vault, and how to set up metadata in the tool of your choice. The metadata is provided by Flow.BI can be used to model the landing zone/staging area (either on a data lake or a relational database such as Microsoft Fabric) and the Raw Data Vault in a data-driven Data Vault architecture, which is the recommended practice. With this in mind, Flow.BI is not a competition to Vaultspeed or your other existing data warehouse automation solution, but a valid extension that integrates with your existing tool stack. This makes it much easier to justify the introduction of Flow.BI to the project. Going Beyond Data Vault Flow.BI is not limited to the definition of Data Vault models. While it has been designed with the Data Vault concepts in mind, a customizable expert system is used to define the Data Vault model. Although the expert system is not yet publicly available, it has already been implemented and is in use for every model generation. This expert system enables the implementation of alternative data models, provided they adhere to data-driven, schema-on-read principles. Data Vault is such an example, but many others are possible, as well: Customized Data Vault models Inmon-style enterprise models in third-normal form (3NF, if no business logic is required Kimball-style analytical models with facts and dimensions, again without business logic Semi-structured JSON and XML document collections Key-value stores “One Big Table (OBT)” models “Many Big Related Table (MBRT)” models Okay, we’ve just invented the MBRT model as we're writing the article, but you get the idea: many large, fully denormalized tables with foreign–key relationships between each other. If you've developed your data-driven model, please get in touch with us. About the Authors Michael Olschimke is co-founder and CEO of Flow.BI, a generative AI that defines integrated enterprise data models, such as (but not limited to) Data Vault. Michael has trained thousands of industry data warehousing professionals, taught academic classes, and published regularly on topics around data platforms, data engineering, and Data Vault. He has over two decades of experience in information technology, with a specialization in business intelligence topics, artificial intelligence and data platforms. <<< Back to Blog Series Title Page
Naveed-Hussain
Oct 10, 2025 Place Analytics on Azure Blog
184Views
0likes
0Comments
Enhancing Copilot Bots with Azure OpenAI Services
In an era where conversational AI is rapidly moving from novelty to necessity, enterprises are turning to powerful tools that allow them to build bots and copilots that are not just reactive, but smart, context-aware, and deeply integrated with business data. Microsoft’s Copilot ecosystem combined with Azure’s OpenAI Services offers a compelling pathway to supercharge bots with advanced capabilities. This post explores how to enhance Copilot bots using Azure OpenAI Services: what features are available, what benefits they bring, how to implement them, and challenges to watch out for. https://dellenny.com/enhancing-copilot-bots-with-azure-openai-services/
JohnNaguib
Oct 09, 2025 Place Azure
23Views
0likes
0Comments
How Agentic AI Works and How to Build It in Azure
Agentic AI refers to systems that go beyond simple question-answering or rule-based automation. These systems are autonomous, goal-oriented, and adaptive — meaning they can plan, act, and learn with minimal human oversight. https://dellenny.com/how-agentic-ai-works-and-how-to-build-it-in-azure/
JohnNaguib
Oct 07, 2025 Place Azure
35Views
0likes
0Comments
How Azure AI is Revolutionizing Supply Chain Forecasting and Inventory
In today’s fast-paced global marketplace, supply chain efficiency can make or break a business. Companies face constant challenges such as demand fluctuations, supplier disruptions, and shifting customer expectations. Traditional forecasting methods—often reliant on historical data and rigid models—are no longer enough. This is where Azure AI is stepping in, transforming supply chain forecasting and inventory management with intelligent, adaptive, and real-time solutions. https://dellenny.com/how-azure-ai-is-revolutionizing-supply-chain-forecasting-and-inventory/
JohnNaguib
Sep 29, 2025 Place Azure
22Views
0likes
0Comments
Exploring the Core Components of Microsoft Fabric A Unified Data Platform
As data continues to be the new oil, organizations are increasingly seeking robust platforms that can simplify and unify their data landscape. Enter Microsoft Fabric—a next-generation data platform introduced by Microsoft that brings together all the data and analytics tools needed in the modern enterprise, integrated into a single, SaaS-based solution. In this post, we’ll break down the key components of Microsoft Fabric, explain how they work together, and highlight why this platform is a game-changer for data professionals, developers, and decision-makers alike. https://dellenny.com/exploring-the-core-components-of-microsoft-fabric-a-unified-data-platform/
JohnNaguib
Jul 25, 2025 Place Azure
102Views
0likes
0Comments
Unlocking Innovation with Azure AI Foundry Agent Service
In today’s AI-driven landscape, the ability to build, orchestrate, and operationalize intelligent agents at scale is becoming increasingly critical for organizations seeking to leverage AI as a core capability. Microsoft’s Azure AI Foundry Agent Service, introduced as part of the Azure AI Studio ecosystem, is a game-changing platform designed to empower developers and enterprises to build sophisticated multi-agent AI systems with minimal friction. https://dellenny.com/unlocking-innovation-with-azure-ai-foundry-agent-service/
JohnNaguib
Jul 25, 2025 Place Azure
51Views
0likes
0Comments
Exploring the Synergy Between Microsoft Fabric and Azure Machine Learning Studio
The data landscape continues to evolve at an unprecedented pace. As organizations strive to become more data-driven, the integration of platforms and tools becomes increasingly critical. Microsoft Fabric, the new all-in-one analytics platform, is reshaping how businesses approach data analytics and AI. A key part of this transformation is its growing synergy with Azure Machine Learning Studio. In this blog post, we’ll explore what Microsoft Fabric is, its role in the modern data stack, and how it integrates with Azure Machine Learning to enable powerful machine learning (ML) workflows. https://dellenny.com/exploring-the-synergy-between-microsoft-fabric-and-azure-machine-learning-studio/
JohnNaguib
Jul 25, 2025 Place Azure
70Views
0likes
0Comments
Built a Real-Time Azure AI + AKS + DevOps Project – Looking for Feedback
Hi everyone, I recently completed a real-time project using Microsoft Azure services to build a cloud-native healthcare monitoring system. The key services used include: Azure AI (Cognitive Services, OpenAI) Azure Kubernetes Service (AKS) Azure DevOps and GitHub Actions Azure Monitor, Key Vault, API Management, and others The project focuses on real-time health risk prediction using simulated sensor data. It's built with containerized microservices, infrastructure as code, and end-to-end automation. GitHub link (with source code and documentation): https://github.com/kavin3021/AI-Driven-Predictive-Healthcare-Ecosystem I would really appreciate your feedback or suggestions to improve the solution. Thank you!
Kavindhiran
Jul 24, 2025 Place Azure
128Views
0likes
2Comments