machine learning
94 TopicsExploring the Core Components of Microsoft Fabric A Unified Data Platform
As data continues to be the new oil, organizations are increasingly seeking robust platforms that can simplify and unify their data landscape. Enter Microsoft Fabric—a next-generation data platform introduced by Microsoft that brings together all the data and analytics tools needed in the modern enterprise, integrated into a single, SaaS-based solution. In this post, we’ll break down the key components of Microsoft Fabric, explain how they work together, and highlight why this platform is a game-changer for data professionals, developers, and decision-makers alike. https://dellenny.com/exploring-the-core-components-of-microsoft-fabric-a-unified-data-platform/91Views0likes0CommentsUnlocking Innovation with Azure AI Foundry Agent Service
In today’s AI-driven landscape, the ability to build, orchestrate, and operationalize intelligent agents at scale is becoming increasingly critical for organizations seeking to leverage AI as a core capability. Microsoft’s Azure AI Foundry Agent Service, introduced as part of the Azure AI Studio ecosystem, is a game-changing platform designed to empower developers and enterprises to build sophisticated multi-agent AI systems with minimal friction. https://dellenny.com/unlocking-innovation-with-azure-ai-foundry-agent-service/45Views0likes0CommentsExploring the Synergy Between Microsoft Fabric and Azure Machine Learning Studio
The data landscape continues to evolve at an unprecedented pace. As organizations strive to become more data-driven, the integration of platforms and tools becomes increasingly critical. Microsoft Fabric, the new all-in-one analytics platform, is reshaping how businesses approach data analytics and AI. A key part of this transformation is its growing synergy with Azure Machine Learning Studio. In this blog post, we’ll explore what Microsoft Fabric is, its role in the modern data stack, and how it integrates with Azure Machine Learning to enable powerful machine learning (ML) workflows. https://dellenny.com/exploring-the-synergy-between-microsoft-fabric-and-azure-machine-learning-studio/49Views0likes0CommentsBuilt a Real-Time Azure AI + AKS + DevOps Project – Looking for Feedback
Hi everyone, I recently completed a real-time project using Microsoft Azure services to build a cloud-native healthcare monitoring system. The key services used include: Azure AI (Cognitive Services, OpenAI) Azure Kubernetes Service (AKS) Azure DevOps and GitHub Actions Azure Monitor, Key Vault, API Management, and others The project focuses on real-time health risk prediction using simulated sensor data. It's built with containerized microservices, infrastructure as code, and end-to-end automation. GitHub link (with source code and documentation): https://github.com/kavin3021/AI-Driven-Predictive-Healthcare-Ecosystem I would really appreciate your feedback or suggestions to improve the solution. Thank you!93Views0likes2CommentsScaling Smart with Azure: Architecture That Works
Hi Tech Community! I’m Zainab, currently based in Abu Dhabi and serving as Vice President of Finance & HR at Hoddz Trends LLC a global tech solutions company headquartered in Arkansas, USA. While I lead on strategy, people, and financials, I also roll up my sleeves when it comes to tech innovation. In this discussion, I want to explore the real-world challenges of scaling systems with Microsoft Azure. From choosing the right architecture to optimizing performance and cost, I’ll be sharing insights drawn from experience and I’d love to hear yours too. Whether you're building from scratch, migrating legacy systems, or refining deployments, let’s talk about what actually works.55Views0likes1CommentAnnouncing the availability of Azure Databricks connector in Azure AI Foundry
At Microsoft, Databricks Data Intelligence Platform is available as a fully managed, native, first party Data and AI solution called Azure Databricks. This makes Azure the optimal cloud for running Databricks workloads. Because of our unique partnership, we can bring you seamless integrations leveraging the power of the entire Microsoft ecosystem to do more with your data. Azure AI Foundry is an integrated platform for Developers and IT Administrators to design, customize, and manage AI applications and agents. Today we are excited to announce the public preview of the Azure Databricks connector in Azure AI Foundry. With this launch you can build enterprise-grade AI agents that reason over real-time Azure Databricks data while being governed by Unity Catalog. These agents will also be enriched by the responsible AI capabilities of Azure AI Foundry. Here are a few ways this seamless integration can benefit you and your organization: Native Integration: Connect to Azure Databricks AI/BI Genie from Azure AI Foundry Contextual Answers: Genie agents provide answers grounded in your unique data Supports Various LLMs: Secure, authenticated data access Streamlined Process: Real-time data insights within GenAI apps Seamless Integration: Simplifies AI agent management with data governance Multi-Agent workflows: Leverages Azure AI agents and Genie Spaces for faster insights Enhanced Collaboration: Boosts productivity between business and technical users To further democratize the use of data for those in your organization aren't directly interacting with Azure Databricks, you can also take it one step further with Microsoft Teams and AI/BI Genie. AI/BI Genie enables you to get deep insights from your data using your natural language without needing to access Azure Databricks. Here you see an example of what an agent built in AI Foundry using data from Azure Databricks available in Microsoft Teams looks like We'd love to hear your feedback as you use the Azure Databricks connector in AI Foundry. Try it out today – to help you get started, we’ve put together some samples here.542Views0likes0CommentsPower BI & Azure Databricks: Smarter Refreshes, Less Hassle
We are excited to extend the deep integration between Azure Databricks and Microsoft Power BI with the Public Preview of the Power BI task type in Azure Databricks Workflows. This new capability allows users to update and refresh Power BI semantic models directly from their Azure Databricks workflows, ensuring real-time data updates for reports and dashboards. By leveraging orchestration and triggers within Azure Databricks Workflows, organizations can improve efficiency, reduce refresh costs, and enhance data accuracy for Power BI users. Power BI tasks seamlessly integrate with Unity Catalog in Azure Databricks, enabling automated updates to tables, views, materialized views, and streaming tables across multiple schemas and catalogs. With support for Import, DirectQuery, and Dual Storage modes, Power BI tasks provide flexibility in managing performance and security. This direct integration eliminates manual processes, ensuring Power BI models stay synchronized with underlying data without requiring context switching between platforms. Built into Azure Databricks Lakeflow, Power BI tasks benefit from enterprise-grade orchestration and monitoring, including task dependencies, scheduling, retries, and notifications. This streamlines workflows and improves governance by utilizing Microsoft Entra ID authentication and Unity Catalog suite of security and governance offerings. We invite you to explore the new Power BI tasks today and experience seamless data integration—get started by visiting the [ADB Power BI task documentation].1.9KViews0likes2CommentsAnthropic State-of-the-Art Models Available to Azure Databricks Customers
Our customers now have greater model choices with the arrival of Anthropic Claude 3.7 Sonnet in Azure Databricks. Databricks is announcing a partnership with Anthropic to integrate their state-of-the-art models into Databricks Data Intelligence Platform as a native offering, starting with Claude 3.7 Sonnet http://databricks.com/blog/anthropic-claude-37-sonnet-now-natively-available-databricks. With this announcement, Azure customers can use Claude Models directly in Azure Databricks. Foundation model REST API reference - Azure Databricks | Microsoft Learn With Anthropic models available in Azure Databricks, customers can use the Claude "think" tool with business data optimized promote to guide Claude efficiently perform complex tasks. With Claude models in Azure Databricks, enterprises can deliver domain-specific, high quality AI agents more efficiently. As an integrated component of the Azure Databricks Data Intelligence Platform, Anthropic Claude models benefit from comprehensive end-to-end governance and monitoring throughout the entire data and AI lifecycle with Unity Catalog. With Claude models, we remain committed to providing customers with model flexibility. Through the Azure Databricks Data Intelligence Platform, customers can securely connect to any model provider and select the most suitable model for their needs. They can further enhance these models with enterprise data to develop domain-specific, high-quality AI agents, supported by built-in custom evaluation governance across both data and models.8.8KViews2likes0CommentsDetermining sizing requirements for GPU enabled Azure VM
Greetings, We are trying to determine the correct VM sizing requirement for our AI workload, which is used for NLP processing. This workload does not require any training, but will only be used for inference. We have the following software configuration: a C# application that is heavily multithreaded using a lot of socket I/O. The application has concentrated bursts where 10-20 threads are fired concurrently to perform tasks (mostly socket I/O). This app communicates via dedicated sockets to: a Python application which performs various NLP tasks. This app is also multithreaded to handle multiple incoming requests from the .NET app. This app sends queries to a local LLM (model size will vary based on query type). We estimate we will need to support sub-second performance (at the very least) on a 7B parameter model. Ultimately, we may need to go to larger model sizes if accuracy is insufficient. The amount of text passed to the LLM will range from 300-3000 tokens. In short, we need: a) a CPU with sufficient cores to handle multiple concurrent threads on the .NET side. The app will have 5 or 6 background threads running continuously, and sudden bursts of activity which will require a minimum of 10-20 threads to run shorter-lived tasks. b) a GPU with sufficient VRAM to handle at the very least, a 7B parameter model. Ultimately, we may need to support larger models to perform the same task due to insufficient accuracy. We need the ideal configuration of GPU/VRAM and CPU/RAM to handle these tasks, and also, potentially, larger LLM sizes of up to 14B or 70B parameters. We are looking at the NC-series VMs, with a budget of about $1,000/month (see https://learn.microsoft.com/en-us/answers/questions/2150959/determining-sizing-requirements-for-gpu-enabled-az?comment=question). Any feedback on the optimal configuration in terms of CPU/GPU would be greatly appreciated. Thank you in advance.1.1KViews0likes2CommentsHow to Create an AI Model for Streaming Data
A Practical Guide with Microsoft Fabric, Kafka and MLFlow Intro In today’s digital landscape, the ability to detect and respond to threats in real-time isn’t just a luxury—it’s a necessity. Imagine building a system that can analyze thousands of user interactions per second, identifying potential phishing attempts before they impact your users. While this may sound complex, Microsoft Fabric makes it possible, even with streaming data. Let’s explore how. In this hands-on guide, I’ll walk you through creating an end-to-end AI solution that processes streaming data from Kafka and employs machine learning for real-time threat detection. We’ll leverage Microsoft Fabric’s comprehensive suite of tools to build, train, and deploy an AI model that works seamlessly with streaming data. Why This Matters Before we dive into the technical details, let’s explore the key advantages of this approach: real-time detection, proactive protection, and the ability to adapt to emerging threats. Real-Time Processing: Traditional batch processing isn’t enough in today’s fast-paced threat landscape. We need immediate insights. Scalability: With Microsoft Fabric’s distributed computing capabilities, our solution can handle enterprise-scale data volumes. Integration: By combining streaming data processing with AI, we create a system that’s both intelligent and responsive. What We’ll Build I’ve created a practical demonstration that showcases how to: Ingest streaming data from Kafka using Microsoft Fabric’s Eventhouse Clean and prepare data in real-time using PySpark Train and evaluate an AI model for phishing detection Deploy the model for real-time predictions Store and analyze results for continuous improvement The best part? Everything stays within the Microsoft Fabric ecosystem, making deployment and maintenance straightforward. Azure Event Hub Start by creating an Event Hub namespace and a new Event Hub. Azure Event Hubs have Kafka endpoints ready to start receiving Streaming Data. Create a new Shared Access Signature and utilize the Python i have created. You may adopt the Constructor to your own idea. import uuid import random import time from confluent_kafka import Producer # Kafka configuration for Azure Event Hub config = { 'bootstrap.servers': 'streamiot-dev1.servicebus.windows.net:9093', 'sasl.mechanisms': 'PLAIN', 'security.protocol': 'SASL_SSL', 'sasl.username': '$ConnectionString', 'sasl.password': 'Endpoint=sb://<replacewithyourendpoint>.servicebus.windows.net/;SharedAccessKeyName=RootManageSharedAccessKey;SharedAccessKey=xxxxxxx', } # Create a Kafka producer producer = Producer(config) # Shadow traffic generation def generate_shadow_payload(): """Generates a shadow traffic payload.""" subscriber_id = str(uuid.uuid4()) # Weighted choice for subscriberData if random.choices([True, False], weights=[5, 1])[0]: subscriber_data = f"{random.choice(['John', 'Mark', 'Alex', 'Gordon', 'Silia' 'Jane', 'Alice', 'Bob'])} {random.choice(['Doe', 'White', 'Blue', 'Green', 'Beck', 'Rogers', 'Fergs', 'Coolio', 'Hanks', 'Oliver', 'Smith', 'Brown'])}" else: subscriber_data = f"https://{random.choice(['example.com', 'examplez.com', 'testz.com', 'samplez.com', 'testsite.com', 'mysite.org'])}" return { "subscriberId": subscriber_id, "subscriberData": subscriber_data, } # Delivery report callback def delivery_report(err, msg): """Callback for delivery reports.""" if err is not None: print(f"Message delivery failed: {err}") else: print(f"Message delivered to {msg.topic()} [partition {msg.partition()}] at offset {msg.offset()}") # Topic configuration topic = 'streamio-events1' # Simulate shadow traffic generation and sending to Kafka try: print("Starting shadow traffic simulation. Press Ctrl+C to stop.") while True: # Generate payload payload = generate_shadow_payload() # Send payload to Kafka producer.produce( topic=topic, key=str(payload["subscriberId"]), value=str(payload), callback=delivery_report ) # Throttle messages (1500ms) producer.flush() # Ensure messages are sent before throttling time.sleep(1.5) except KeyboardInterrupt: print("\nSimulation stopped.") finally: producer.flush() You can run this from your Workstation, an Azure Function or whatever fits your case. Architecture Deep Dive: The Three-Layer Approach When building AI-powered streaming solutions, thinking in layers helps manage complexity. Let’s break down our architecture into three distinct layers: Bronze Layer: Raw Streaming Data Ingestion At the foundation of our solution lies the raw data ingestion layer. Here’s where our streaming story begins: A web service generates JSON payloads containing subscriber data These events flow through Kafka endpoints Data arrives as structured JSON with key fields like subscriberId, subscriberData, and timestamps Microsoft Fabric’s Eventstream captures this raw streaming data, providing a reliable foundation for our ML pipeline and stores the payloads in Eventhouse Silver Layer: The Intelligence Hub This is where the magic happens. Our silver layer transforms raw data into actionable insights: The EventHouse KQL database stores and manages our streaming data Our ML model, trained using PySpark’s RandomForest classifier, processes the data SynapseML’s Predict API enables seamless model deployment A dedicated pipeline applies our ML model to detect potential phishing attempts Results are stored in Lakehouse Delta Tables for immediate access Gold Layer: Business Value Delivery The final layer focuses on making our insights accessible and actionable: Lakehouse tables store cleaned, processed data Semantic models transform our predictions into business-friendly formats Power BI dashboards provide real-time visibility into phishing detection Real-time dashboards enable immediate response to potential threats The Power of Real-Time ML for Streaming Data What makes this architecture particularly powerful is its ability to: Process data in real-time as it streams in Apply sophisticated ML models without batch processing delays Provide immediate visibility into potential threats Scale automatically as data volumes grow Implementing the Machine Learning Pipeline Let’s dive into how we built and deployed our phishing detection model using Microsoft Fabric’s ML capabilities. What makes this implementation particularly interesting is how it combines traditional ML with streaming data processing. Building the ML Foundation First, let’s look at how we structured the training phase of our machine learning pipeline using PySpark: Training Notebook Connect to Eventhouse Load the data from pyspark.sql import SparkSession # Initialize Spark session (already set up in Fabric Notebooks) spark = SparkSession.builder.getOrCreate() # Define connection details kustoQuery = """ SampleData | project subscriberId, subscriberData, ingestion_time() """ # Replace with your desired KQL query kustoUri = "https://<eventhousedbUri>.z9.kusto.fabric.microsoft.com" # Replace with your Kusto cluster URI database = "Eventhouse" # Replace with your Kusto database name # Fetch the access token for authentication accessToken = mssparkutils.credentials.getToken(kustoUri) # Read data from Kusto using Spark df = spark.read \ .format("com.microsoft.kusto.spark.synapse.datasource") \ .option("accessToken", accessToken) \ .option("kustoCluster", kustoUri) \ .option("kustoDatabase", database) \ .option("kustoQuery", kustoQuery) \ .load() # Show the loaded data print("Loaded data:") df.show() Separate and flag Phishing payload Load it with Spark from pyspark.sql.functions import col, expr, when, udf from urllib.parse import urlparse # Define a UDF (User Defined Function) to extract the domain def extract_domain(url): if url.startswith('http'): return urlparse(url).netloc return None # Register the UDF with Spark extract_domain_udf = udf(extract_domain) # Feature engineering with Spark df = df.withColumn("is_url", col("subscriberData").startswith("http")) \ .withColumn("domain", extract_domain_udf(col("subscriberData"))) \ .withColumn("is_phishing", col("is_url")) # Show the transformed data df.show() Use Spark ML Lib to Train the model Evaluate the Model from pyspark.sql.functions import col from pyspark.ml.feature import Tokenizer, HashingTF, IDF from pyspark.ml.classification import RandomForestClassifier from pyspark.ml import Pipeline from pyspark.ml.evaluation import MulticlassClassificationEvaluator # Ensure the label column is of type double df = df.withColumn("is_phishing", col("is_phishing").cast("double")) # Tokenizer to break text into words tokenizer = Tokenizer(inputCol="subscriberData", outputCol="words") # Convert words to raw features using hashing hashingTF = HashingTF(inputCol="words", outputCol="rawFeatures", numFeatures=100) # Compute the term frequency-inverse document frequency (TF-IDF) idf = IDF(inputCol="rawFeatures", outputCol="features") # Random Forest Classifier rf = RandomForestClassifier(labelCol="is_phishing", featuresCol="features", numTrees=10) # Build the ML pipeline pipeline = Pipeline(stages=[tokenizer, hashingTF, idf, rf]) # Split the dataset into training and testing sets train_data, test_data = df.randomSplit([0.7, 0.3], seed=42) # Train the model model = pipeline.fit(train_data) # Make predictions on the test data predictions = model.transform(test_data) # Evaluate the model's accuracy evaluator = MulticlassClassificationEvaluator( labelCol="is_phishing", predictionCol="prediction", metricName="accuracy" ) accuracy = evaluator.evaluate(predictions) # Output the accuracy print(f"Model Accuracy: {accuracy}") Add Signature to AI Model from mlflow.models.signature import infer_signature from pyspark.sql import Row # Select a sample for inferring signature sample_data = train_data.limit(10).toPandas() # Create a Pandas DataFrame for schema inference input_sample = sample_data[["subscriberData"]] # Input column(s) output_sample = model.transform(train_data.limit(10)).select("prediction").toPandas() # Infer the signature signature = infer_signature(input_sample, output_sample) Run – Publish Model and Log Metric: Accuracy import mlflow from mlflow import spark # Start an MLflow run with mlflow.start_run() as run: # Log the Spark MLlib model with the signature mlflow.spark.log_model( spark_model=model, artifact_path="phishing_detector", registered_model_name="PhishingDetector", signature=signature # Add the inferred signature ) # Log metrics like accuracy mlflow.log_metric("accuracy", accuracy) print(f"Model logged successfully under run ID: {run.info.run_id}") Results and Impact Our implementation achieved: 81.8% accuracy in phishing detection Sub-second prediction times for streaming data Scalable processing of thousands of events per second Yes, that's a good start ! Now let's continue our post by explaining the deployment and operation phase of our ML solution: From Model to Production: Automating the ML Pipeline After training our model, the next crucial step is operationalizing it for real-time use. We’ve implemented one Pipeline with two activities that process our streaming data every 5 minutes: All Streaming Data Notebook # Main prediction snippet from synapse.ml.predict import MLFlowTransformer # Apply ML model for phishing detection model = MLFlowTransformer( inputCols=["subscriberData"], outputCol="predictions", modelName="PhishingDetector", modelVersion=3 ) # Transform and save all predictions df_with_predictions = model.transform(df) df_with_predictions.write.format('delta').mode("append").save("Tables/phishing_predictions") Clean Streaming Data Notebook # Filter for non-phishing data only non_phishing_df = df_with_predictions.filter(col("predictions") == 0) # Save clean data for business analysis non_phishing_df.write.format("delta").mode("append").save("Tables/clean_data") Creating Business Value What makes this architecture particularly powerful is the seamless transition from ML predictions to business insights: Delta Lake Integration: All predictions are stored in Delta format, ensuring ACID compliance Enables time travel and data versioning Perfect for creating semantic models Real-Time Processing: 5-minute refresh cycle ensures near real-time threat detection Automatic segregation of clean vs. suspicious data Immediate visibility into potential threats Business Intelligence Ready: Delta tables are directly compatible with semantic modeling Power BI can connect to these tables for live reporting Enables both historical analysis and real-time monitoring The Power of Semantic Models With our data now organized in Delta tables, we’re ready for: Creating dimensional models for better analysis Building real-time dashboards Generating automated reports Setting up alerts for security teams Real-Time Visualization Capabilities While Microsoft Fabric offers extensive visualization capabilities through Power BI, it’s worth highlighting one particularly powerful feature: direct KQL querying for real-time monitoring. Here’s a glimpse of how simple yet powerful this can be: SampleData | where EventProcessedUtcTime > ago(1m) // Fetch rows processed in the last 1 minute | project subscriberId, subscriberData, EventProcessedUtcTime This simple KQL query, when integrated into a dashboard, provides near real-time visibility into your streaming data with sub-minute latency. The visualization possibilities are extensive, but that’s a topic for another day. Conclusion: Bringing It All Together What we’ve built here is more than just a machine learning model – it’s a complete, production-ready system that: Ingests and processes streaming data in real-time Applies sophisticated ML algorithms for threat detection Automatically segregates clean from suspicious data Provides immediate visibility into potential threats The real power of Microsoft Fabric lies in how it seamlessly integrates these different components. From data ingestion through Eventhouse ad Lakehouse, to ML model training and deployment, to real-time monitoring – everything works together in a unified platform. What’s Next? While we’ve focused on phishing detection, this architecture can be adapted for various use cases: Fraud detection in financial transactions Quality control in manufacturing Customer behavior analysis Anomaly detection in IoT devices The possibilities are endless with our imagination and creativity! Stay tuned for the Git Repo where all the code will be shared ! References Get Started with Microsoft Fabric Delta Lake in Fabric Overview of Eventhouse CloudBlogger: A guide to innovative Apps with MS Fabric315Views0likes0Comments