Data Science
Sentinel Notebook: Guided Hunting - Domain Generation Algorithm (DGA) Detection
Overview

This notebook, titled "Guided Hunting - Domain Generation Algorithm (DGA) Detection", provides a framework for investigating anomalous network activity by identifying domains generated by algorithms, which are often used by malware to evade detection. It integrates data from Log Analytics (DeviceNetworkEvents) and employs Python-based tools and libraries such as msticpy, pandas, and scikit-learn for analysis and visualization. DGA detection is crucial for cybersecurity because it helps identify and mitigate threats like botnets and malware that use dynamically generated domains for command-and-control communication, making it a key component of proactive threat hunting and network defense.

Link: https://github.com/GonePhishing402/SentinelNotebooks/blob/main/DGA_Detection_ManagedIdentity.ipynb

What is a Domain Generation Algorithm and How Do You Detect It?

A Domain Generation Algorithm (DGA) is a technique used by malware to create numerous domain names for communicating with command-and-control (C2) servers, ensuring continued operation even if some domains are blocked. DGAs evade static signature detection by dynamically generating unpredictable domain names, making it hard for traditional security methods to identify and blacklist them. Machine learning models can detect DGAs effectively by analyzing patterns and features in domain names, leveraging techniques such as deep learning to adapt to new variants and identify anomalies that static methods may miss.

How to Run the Notebook

Log in with Managed Identity

This notebook requires you to authenticate with a managed identity. The managed identity can be created from the Azure portal and must have the following RBAC roles:
- Sentinel Contributor
- Log Analytics Contributor
- AzureML Data Scientist
- AzureML Compute Operator

Replace [CLIENT_ID] with the client ID of your managed identity. You can obtain it from the Azure portal under Managed Identities -> Select the identity -> Overview. Note: the notebook will also work if you authenticate as an Azure user with the CLI method instead.

Import Libraries

This code block imports the necessary libraries and sets the "credential" variable to use ManagedIdentityCredential().

Setup msticpyconfig.yaml

This section loads the msticpyconfig.yaml for use later in the notebook. Ensure the file is set up and present in your current working directory before running the notebook.

Setup QueryProvider

The query provider is set up for Azure Sentinel. This does not need to be changed unless you want to use a different query provider from msticpy.

Connect to Sentinel

This code block connects to Sentinel with the managed identity, using the workspace specified in your msticpyconfig.yaml. You should see "connected" after running it.

DGA Model Creation

This code block uses CountVectorizer() and MultinomialNB() to create a model called dga_model.joblib and saves it to the path specified in the "model_filename" variable. It is important to change this path to match your environment. You must give the algorithm labeled data to learn from in order for it to be effective. Download the domain.csv located here and upload it to your current working directory in the Azure Machine Learning workspace: DGA_Detection/data/domain.csv at master · hmaccelerate/DGA_Detection. You must also change line 10 in this code block so that "labeled_domains_df" points to the domain.csv in your environment. A minimal sketch of what such a training block can look like follows.
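The notebook itself contains the authoritative code; the following is only a minimal sketch of the CountVectorizer/MultinomialNB training flow described above. The column names ("domain", "label"), the file paths, the train/test split, and the use of a scikit-learn Pipeline with character n-grams are illustrative assumptions rather than the notebook's exact cell contents.

import joblib
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Assumed paths and column names -- adjust these for your environment.
labeled_domains_df = pd.read_csv("domain.csv")   # e.g. columns: "domain", "label"
model_filename = "dga_model.joblib"              # path where the model is saved

X_train, X_test, y_train, y_test = train_test_split(
    labeled_domains_df["domain"],
    labeled_domains_df["label"],
    test_size=0.2,
    random_state=42,
)

# Character n-grams capture the "randomness" of DGA domains better than word tokens.
dga_model = Pipeline([
    ("vectorizer", CountVectorizer(analyzer="char", ngram_range=(2, 4))),
    ("classifier", MultinomialNB()),
])
dga_model.fit(X_train, y_train)

print(f"Model accuracy: {dga_model.score(X_test, y_test):.3f}")
joblib.dump(dga_model, model_filename)
print(f"Model saved to {model_filename}")

Wrapping the vectorizer and classifier in a single Pipeline keeps vectorization consistent between training and the later scoring step.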
Once you run the code block, you should see the model saved along with the model accuracy. The accuracy will vary depending on the data you give it.

Apply dga_model.joblib to Sentinel Data

This code block takes the model generated in the previous block and runs it against the data we specify in the "query" variable. It uses domain names from the DeviceNetworkEvents table of MDE events; "parse_json" is used in the KQL to extract the appropriate sub-field needed for this search. When the model runs against the data, it tries to determine whether any domains in your environment are associated with domain generation algorithms (DGA). If the "IsDGA" column contains the value "True", the model has determined that the characteristics of that domain match those of a DGA.

Output All Results to CSV

This code block writes all of the results above to a CSV called "dgaresults.csv". Change the "output_path" variable to match your environment.

Filter DGA Results to CSV

This code block writes just the DGA-flagged results to a CSV called "dgaresults2.csv". Change the "output_path" variable to match your environment. A minimal sketch of this scoring-and-export flow appears after the investigation notes below.

How to Investigate these Results Further

You can take the domains that match DGA behavior and find the correlating IP addresses to see whether they match any threat intelligence. Correlate findings with other security logs, such as firewall or endpoint data, to uncover patterns of malicious behavior. This approach helps pinpoint potential threats and enables proactive mitigation. We can also create logic apps to automate follow-on analysis of these notebooks; this will be covered in a later blog.
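For reference, here is a minimal sketch of the scoring-and-export flow described above. The KQL text, the "Domain" column, the predicted label value, and the output paths are illustrative assumptions; the notebook's own cells define the exact query, the managed-identity connection, and the variable names.

import joblib
from msticpy.common.wsconfig import WorkspaceConfig
from msticpy.data import QueryProvider

# Assumption: in the notebook this provider is connected with the managed identity
# credential from the earlier cells; a default workspace connection is shown here
# only to keep the example self-contained.
qry_prov = QueryProvider("AzureSentinel")
qry_prov.connect(WorkspaceConfig())

# Simplified stand-in for the notebook's "query" variable.
query = """
DeviceNetworkEvents
| where TimeGenerated > ago(1d)
| extend Domain = tostring(parse_json(AdditionalFields).host)
| where isnotempty(Domain)
| distinct Domain
"""
results_df = qry_prov.exec_query(query)

# Score each domain with the model trained earlier. The predicted label values
# depend on how the training CSV labels its classes (e.g. "dga" vs. "legit").
dga_model = joblib.load("dga_model.joblib")
results_df["IsDGA"] = dga_model.predict(results_df["Domain"]) == "dga"

# Write all results, then only the DGA-flagged ones; adjust paths as needed.
results_df.to_csv("dgaresults.csv", index=False)
results_df[results_df["IsDGA"]].to_csv("dgaresults2.csv", index=False)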
The (Amateur) Data Science Body of Knowledge

Whether you're interested in becoming a Data Scientist or a Data Engineer, or just want to work with the techniques they use, this article will help you find resources for whichever path you choose. At the very least, you'll gain valuable insight into the Data Science field and how you can use the technologies and knowledge to create a very compelling solution.

A Data Science Process, Documentation, and Project Template You Can Use in Your Solutions
In most of the Data Science and AI articles, blogs and papers I read, the focus is on a particular algorithm or math angle to solving a puzzle. And that's awesome - we need LOTS of those. However, even if you figure those out, you have to use them somewhere. You have to run that on some sort of cloud or local system, you have to describe what you're doing, you have to distribute an app, import some data, check a security angle here and there, communicate with a team... you know, DevOps. In this article, I'll show you a complete process, procedures, and free resources to manage your Data Science project from beginning to end.

Cross Subscription Database Restore for SQL Managed Instance Database with TDE enabled using ADF
Our customers require daily refreshes of their production database to the non-production environment. The database, approximately 600GB in size, has Transparent Data Encryption (TDE) enabled in production. Disabling TDE before performing a copy-only backup is not an option, as it would take hours to disable and re-enable. To meet customer needs, we use a customer-managed key stored in Key Vault. Azure Data Factory is then utilized to schedule and execute the end-to-end database restore process.

Create and Deploy Azure SQL Managed Instance Database Project integrated with Azure DevOps CICD
Integrating database development into continuous integration and continuous deployment (CI/CD) workflows is a best practice for Azure SQL Managed Instance database projects. Automating the process through a deployment pipeline is always recommended; this automation ensures that ongoing deployments seamlessly align with your continuous local development efforts, eliminating the need for additional manual intervention. This article guides you through the step-by-step process of creating a new Azure SQL Managed Instance database project, adding objects to it, and setting up a CI/CD deployment pipeline in Azure DevOps.

Prerequisites
- Visual Studio 2022 Community, Professional, or Enterprise
- Azure DevOps environment
- Contributor permission within Azure DevOps
- Sysadmin server role within the Azure SQL Managed Instance

Step 1

Open Visual Studio and click Create a new project. Search for SQL Server and select SQL Server Database Project. Provide the project name and the folder path where the .dacpac file will be stored, then click Create.

Step 2

Import the database schema from an existing database. Right-click on the project and select Import. You will see three options: Data-Tier Application (.dacpac), Database, and Script (.sql). In this case, I am using the Database option and importing from an Azure SQL Managed Instance. To proceed, you will encounter a screen that allows you to provide a connection string. You can choose to select a database from local, network, or Azure sources, depending on your needs. Alternatively, you can directly enter the server name, authentication type, and credentials to connect to the database server. Once connected, select the desired database to import and include in your project.

Step 3

Configure the import settings. There are several options available, each designed to optimize the process and ensure seamless integration:
- Import application-scoped objects: imports tables, views, stored procedures, and similar objects.
- Import reference logins: imports login-related objects.
- Import permissions: imports the related permissions.
- Import database settings: imports the database settings.
- Folder structure: lets you choose the folder structure for database objects in your project.
- Maximum files per folder: limits the number of files per folder.

Click Start, which will show a progress window. Click Finish to complete the step.

Step 4

To ensure a smooth deployment process, start by incorporating any necessary post-deployment scripts into your project. These scripts are crucial for executing tasks that must be completed after the database has been deployed, such as performing data migrations or applying additional configurations. To compile the database project in Visual Studio, right-click on the project and select Build. This compiles the project and generates a .dacpac file, ensuring that your database project is ready for deployment. When building the project, you might face warnings and errors that need careful debugging and resolution; common issues include missing references, syntax errors, or configuration mismatches. After addressing all warnings and errors, rebuild the project to create the .dacpac file. This file contains the database schema and is essential for deployment. Ensure that any post-deployment scripts are integrated into the project; they will run after the database deployment and perform any additional necessary tasks.
To ensure all changes are tracked and can be deployed through your CI/CD pipeline, commit the entire codebase, including the sqlproj file and any post-deployment scripts, to your branch in Azure DevOps. This step guarantees that every modification is documented and ready for deployment.

Step 5

Create an Azure DevOps pipeline to deploy the database project.

Step 6

To ensure the YAML file builds the SQL project and publishes the DACPAC file to the artifact folder of the pipeline, include the following stages.

stages:
- stage: Build
  jobs:
  - job: BuildJob
    displayName: 'Build Stage'
    steps:
    - task: VSBuild@1
      displayName: 'Build SQL Server Database Project'
      inputs:
        solution: $(solution)
        platform: $(buildPlatform)
        configuration: $(buildConfiguration)
    - task: CopyFiles@2
      inputs:
        SourceFolder: '$(Build.SourcesDirectory)'
        Contents: '**\*.dacpac'
        TargetFolder: '$(Build.ArtifactStagingDirectory)'
        flattenFolders: true
    - task: PublishPipelineArtifact@1
      inputs:
        targetPath: '$(Build.ArtifactStagingDirectory)'
        artifact: 'dacpac'
        publishLocation: 'pipeline'

- stage: Deploy
  jobs:
  - job: Deploy
    displayName: 'Deploy Stage'
    pool:
      name: 'Pool'
    steps:
    - task: DownloadPipelineArtifact@2
      inputs:
        buildType: current
        artifact: 'dacpac'
        path: '$(Build.ArtifactStagingDirectory)'
    - task: PowerShell@2
      displayName: 'upgrade sqlpackage'
      inputs:
        targetType: 'inline'
        script: |
          # use evergreen or specific dacfx msi link below
          wget -O DacFramework.msi "https://aka.ms/dacfx-msi"
          msiexec.exe /i "DacFramework.msi" /qn
    - task: SqlAzureDacpacDeployment@1
      inputs:
        azureSubscription: '$(ServiceConnection)'
        AuthenticationType: 'servicePrincipal'
        ServerName: '$(ServerName)'
        DatabaseName: '$(DatabaseName)'
        deployType: 'DacpacTask'
        DeploymentAction: 'Publish'
        DacpacFile: '$(Build.ArtifactStagingDirectory)/*.dacpac'
        IpDetectionMethod: 'AutoDetect'

Step 7

To execute any pre- or post-deployment SQL scripts during deployment, install the required dependencies on the build agent, obtain an access token, and then run the scripts.

# install all necessary dependencies onto the build agent
- task: PowerShell@2
  name: install_dependencies
  inputs:
    targetType: inline
    script: |
      # Download and Install Azure CLI
      write-host "Installing AZ CLI..."
      Invoke-WebRequest -Uri https://aka.ms/installazurecliwindows -OutFile .\AzureCLI.msi
      Start-Process msiexec.exe -Wait -ArgumentList "/I AzureCLI.msi /quiet"
      Remove-Item .\AzureCLI.msi
      write-host "Done."

      # prepend the az cli path for future tasks in the pipeline
      write-host "Adding AZ CLI to PATH..."
      write-host "##vso[task.prependpath]C:\Program Files (x86)\Microsoft SDKs\Azure\CLI2\wbin"
      $currentPath = (Get-Item -path "HKCU:\Environment").GetValue('Path', '', 'DoNotExpandEnvironmentNames')
      if (-not $currentPath.Contains("C:\Program Files (x86)\Microsoft SDKs\Azure\CLI2\wbin")) {
        setx PATH ($currentPath + ";C:\Program Files (x86)\Microsoft SDKs\Azure\CLI2\wbin")
      }
      if (-not $env:path.Contains("C:\Program Files (x86)\Microsoft SDKs\Azure\CLI2\wbin")) {
        $env:path += ";C:\Program Files (x86)\Microsoft SDKs\Azure\CLI2\wbin"
      }
      write-host "Done."

      # install necessary PowerShell modules
      write-host "Installing necessary PowerShell modules..."
      Get-PackageProvider -Name nuget -force
      if ( -not (Get-Module -ListAvailable -Name Az.Resources) ) { install-module Az.Resources -force }
      if ( -not (Get-Module -ListAvailable -Name Az.Accounts) ) { install-module Az.Accounts -force }
      if ( -not (Get-Module -ListAvailable -Name SqlServer) ) { install-module SqlServer -force }
      write-host "Done."
- task: AzureCLI@2
  name: run_sql_scripts
  inputs:
    azureSubscription: '$(ServiceConnection)'
    scriptType: ps
    scriptLocation: inlineScript
    inlineScript: |
      # get access token for SQL
      $token = az account get-access-token --resource https://database.windows.net --query accessToken --output tsv
      # configure OELCore database
      Invoke-Sqlcmd -AccessToken $token -ServerInstance '$(ServerName)' -Database '$(DatabaseName)' -inputfile '.\pipelines\config-db.dev.sql'

Automated Continuous integration and delivery – CICD in Azure Data Factory
In Azure Data Factory, continuous integration and delivery (CI/CD) involves transferring Data Factory pipelines across different environments such as development, test, UAT, and production. This process leverages Azure Resource Manager templates to store the configurations of various ADF entities, including pipelines, datasets, and data flows. This article provides a detailed, step-by-step guide on how to automate deployments using the integration between Data Factory and Azure Pipelines.

Prerequisites
- Azure Data Factory, with multiple ADF environments set up for the different stages of development and deployment.
- Azure DevOps, the platform for managing code repositories, pipelines, and releases.
- Git integration: ADF connected to a Git repository (Azure Repos or GitHub).
- ADF Contributor and Azure DevOps Build Administrator permissions.

Step 1

Establish a dedicated Azure DevOps Git repository specifically for Azure Data Factory within the designated Azure DevOps project.

Step 2

Integrate Azure Data Factory (ADF) with the Azure DevOps Git repository created in the first step.

Step 3

Create a developer feature branch in the Azure DevOps Git repository created in the first step. Select the created feature branch from ADF to start development.

Step 4

Begin the development process. For this example, I create a test pipeline "pl_adf_cicd_deployment_testing" and save all.

Step 5

Submit a pull request from the developer feature branch to main.

Step 6

Once the pull requests are merged from the developer's feature branch into the main branch, publish the changes from the main branch to the ADF publish branch. The ARM templates (JSON files) will be updated and made available in the adf-publish branch within the Azure DevOps ADF repository.

Step 7

ARM templates can be customized to accommodate different configurations for the Development, Testing, and Production environments. This customization is typically done through the ARMTemplateParametersForFactory.json file, where you specify environment-specific values such as linked services, environment variables, managed links, and so on. For example, in a Testing environment the storage account might be named teststorageaccount, whereas in a Production environment it could be prodstorageaccount.

To create an environment-specific parameters file:
- In the Azure DevOps ADF Git repo, open the main branch > linkedTemplates folder and copy "ARMTemplateParametersForFactory.json".
- Create a parameters_files folder under the root path.
- Paste ARMTemplateParametersForFactory.json into the parameters_files folder and rename it for the target environment, for example prod-adf-parameters.json.
- Update each environment-specific parameter value.

Step 8

To create the Azure DevOps CI/CD pipeline, use the following code, and ensure you update the variables to match your environment before running it. This allows you to deploy from one ADF environment to another, such as from Test to Production.
name: Release-$(rev:r)

trigger:
  branches:
    include:
    - adf_publish

variables:
  azureSubscription: <Your subscription>
  SourceDataFactoryName: <Test ADF>
  DeployDataFactoryName: <PROD ADF>
  DeploymentResourceGroupName: <PROD ADF RG>

stages:
- stage: Release
  displayName: Release Stage
  jobs:
  - job: Release
    displayName: Release Job
    pool:
      vmImage: 'windows-2019'
    steps:
    - checkout: self

    # Stop ADF Triggers
    - task: AzurePowerShell@5
      displayName: Stop Triggers
      inputs:
        azureSubscription: '$(azureSubscription)'
        ScriptType: 'InlineScript'
        Inline: |
          $triggersADF = Get-AzDataFactoryV2Trigger -DataFactoryName "$(DeployDataFactoryName)" -ResourceGroupName "$(DeploymentResourceGroupName)"
          if ($triggersADF.Count -gt 0) {
            $triggersADF | ForEach-Object {
              Stop-AzDataFactoryV2Trigger -ResourceGroupName "$(DeploymentResourceGroupName)" -DataFactoryName "$(DeployDataFactoryName)" -Name $_.name -Force
            }
          }
        azurePowerShellVersion: 'LatestVersion'

    # Deploy ADF using ARM Template and environment-specific JSON parameters
    - task: AzurePowerShell@5
      displayName: Deploy ADF
      inputs:
        azureSubscription: '$(azureSubscription)'
        ScriptType: 'InlineScript'
        Inline: |
          New-AzResourceGroupDeployment `
            -ResourceGroupName "$(DeploymentResourceGroupName)" `
            -TemplateFile "$(System.DefaultWorkingDirectory)/$(SourceDataFactoryName)/ARMTemplateForFactory.json" `
            -TemplateParameterFile "$(System.DefaultWorkingDirectory)/parameters_files/prod-adf-parameters.json" `
            -Mode "Incremental"
        azurePowerShellVersion: 'LatestVersion'

    # Restart ADF Triggers
    - task: AzurePowerShell@5
      displayName: Restart Triggers
      inputs:
        azureSubscription: '$(azureSubscription)'
        ScriptType: 'InlineScript'
        Inline: |
          $triggersADF = Get-AzDataFactoryV2Trigger -DataFactoryName "$(DeployDataFactoryName)" -ResourceGroupName "$(DeploymentResourceGroupName)"
          if ($triggersADF.Count -gt 0) {
            $triggersADF | ForEach-Object {
              Start-AzDataFactoryV2Trigger -ResourceGroupName "$(DeploymentResourceGroupName)" -DataFactoryName "$(DeployDataFactoryName)" -Name $_.name -Force
            }
          }
        azurePowerShellVersion: 'LatestVersion'

Triggering the Pipeline

The Azure DevOps CI/CD pipeline triggers automatically whenever the adf_publish branch is updated, which happens when changes merged into the main branch are published. Additionally, it can be initiated manually or set to run on a schedule for periodic deployments, providing flexibility and ensuring that updates are deployed efficiently and consistently.

Monitoring and Rollback

To monitor pipeline execution, use the Azure DevOps pipeline dashboards. If a rollback is necessary, you can revert to previous versions of the ARM templates or pipelines in Azure DevOps and redeploy the changes.

Responsible Synthetic Data Creation for Fine-Tuning with RAFT Distillation
This blog will explore the process of crafting responsible synthetic data, evaluating it, and using it for fine-tuning models. We'll also dive into Azure AI's RAFT distillation recipe, a novel approach to generating synthetic datasets using Meta's Llama 3.1 model and UC Berkeley's Gorilla project.

Evaluating Language Models with Azure AI Studio: A Step-by-Step Guide
Evaluating language models is a crucial step in achieving this goal. By assessing the performance of language models, we can identify areas of improvement, optimize their performance, and ensure that they are reliable and accurate. However, evaluating language models can be a challenging task, requiring significant expertise and resources.

Creating a Kubernetes Application for Azure SQL Database
Modern application development has multiple challenges. From selecting a "stack" of front-end through data storage and processing from several competing standards, through ensuring the highest levels of security and performance, developers are required to ensure the application scales and performs well and is supportable on multiple platforms. For that last requirement, bundling up the application into Container technologies such as Docker and deploying multiple Containers onto the Kubernetes platform is now de rigueur in application development. In this example, we'll explore using Python, Docker Containers, and Kubernetes - all running on the Microsoft Azure platform. Using Kubernetes means that you also have the flexibility of using local environments or even other clouds for a seamless and consistent deployment of your application, and allows for multi-cloud deployments for even higher resiliency. We'll also use Microsoft Azure SQL Database for a service-based, scalable, highly resilient and secure environment for the data storage and processing. In fact, in many cases, other applications are often using Microsoft Azure SQL Database already, and this sample application can be used to further leverage and enrich that data. This example is fairly comprehensive in scope, but uses the simplest application, database and deployment to illustrate the process. You can adapt this sample to be far more robust, even including leveraging the latest technologies for the returned data. It's a useful learning tool to create a pattern for other applications.