This article provides a hands-on guide for IT administrators to configure an Azure Stream Analytics Job running in a Dedicated Stream Analytics Cluster with Managed Private Endpoints for secure Blob Storage connectivity. It uses Managed Identity for authentication, ensures private networking, and includes a sample job for testing with step-by-step instructions and best practices. This approach helps organizations meet zero-trust and compliance requirements by eliminating public network exposure.
Introduction
Modern data pipelines demand security and compliance. Azure Stream Analytics supports Dedicated Clusters and Managed Private Endpoints, enabling jobs to run in isolated environments and connect privately to Azure resources without exposing them to the public internet.
Key benefits:
- Private connectivity via Azure Private Link.
- Managed Identity for secure, keyless authentication.
- Dedicated clusters for predictable performance and isolation.
Architecture Overview
At a high level, the Stream Analytics job runs inside a dedicated Stream Analytics cluster and reaches the Blob Storage account over a Managed Private Endpoint, authenticating with a user-assigned managed identity, so no traffic traverses the public internet.
Prerequisites
- An Azure subscription.
- A Blob Storage account with public network access disabled and two containers configured for input and output.
- A user-assigned managed identity.
Implementation Steps
1. Assign Managed Identity
- Navigate to Storage Account → Access Control (IAM).
- Add role assignment:
  - Role: Storage Blob Data Contributor
  - Scope: Storage Account
  - Assign to: your user-assigned managed identity.
Note: In this example, the role assignment is applied at the Storage Account level, which grants access to all containers within the account. If needed, you can scope the assignment to an individual container for more granular control.
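If container-level scope is preferred, a minimal Terraform sketch might look like the following. This assumes the resources defined in the automation section later in this article; the assignment name is illustrative.

# Illustrative: scope the role assignment to a single container instead of the account.
resource "azurerm_role_assignment" "input_container" {
  # In azurerm provider 3.x the container's ARM resource ID is exposed as
  # resource_manager_id; in 4.x the container's id attribute is already the ARM ID.
  scope                = azurerm_storage_container.input.resource_manager_id
  role_definition_name = "Storage Blob Data Contributor"
  principal_id         = azurerm_user_assigned_identity.example.principal_id
}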
2. Configure Stream Analytics Job
In the Azure portal, go to Stream Analytics jobs and create a new job if one does not already exist:
  - Name: e.g., stream-job
  - Hosting environment: Cloud
  - Streaming units: 1
  - Managed identity: enable managed identity and assign the user-assigned managed identity to the job.
3. Configure Stream Analytics Cluster
The Stream Analytics Cluster resource provides dedicated compute for running multiple jobs securely and at scale.
- In the Azure portal, go to Stream Analytics clusters.
- Select your cluster or create a new one:
  - Name: e.g., stream-cluster
  - Streaming units: 12 (the minimum cluster size in the portal's SU V2 sizing)
  - Location: same region as your Blob Storage account and Stream Analytics job
4. Add Stream Analytics Job to Cluster
Once the Stream Analytics Cluster resource is provisioned, go to Stream Analytics Cluster → Settings → Stream Analytics jobs and add your Stream Analytics job.
5. Add Managed Private Endpoint
From Stream Analytics Cluster → Settings → Managed Private Endpoints, add a new Managed Private Endpoint:
- Select Blob Storage as the target resource.
- Approve the Private Endpoint connection on the target Storage Account (Networking → Private endpoint connections).
- Confirm the Managed Private Endpoint is fully provisioned before starting the job.
6. Configure Blob Input
- Input alias: InputStream
- Container: input-container
- Authentication mode: Managed Identity (select the user-assigned identity)
- Event serialization format: JSON
- Encoding: UTF-8
7. Configure Blob Output
- Output alias: BlobOutput
- Container: output-container
- Authentication mode: Managed Identity (select the user-assigned identity)
- Path pattern: output/{date}/{time}
- Serialization: JSON
8. Prepare Sample Input
- Create sample-input.json:
[
  {
    "DeviceId": "sensor-001",
    "Temperature": 28.5,
    "Humidity": 65,
    "EventEnqueuedUtcTime": "2025-10-30T10:00:00Z"
  },
  {
    "DeviceId": "sensor-002",
    "Temperature": 30.2,
    "Humidity": 60,
    "EventEnqueuedUtcTime": "2025-10-30T10:01:00Z"
  },
  {
    "DeviceId": "sensor-001",
    "Temperature": 29.0,
    "Humidity": 64,
    "EventEnqueuedUtcTime": "2025-10-30T10:02:00Z"
  },
  {
    "DeviceId": "sensor-003",
    "Temperature": 27.8,
    "Humidity": 70,
    "EventEnqueuedUtcTime": "2025-10-30T10:03:00Z"
  }
]
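If you are scripting the walkthrough, the sample file can also be uploaded with Terraform. A minimal sketch, assuming the resources from the automation section below and that sample-input.json sits next to the configuration; note that this is a data-plane call, so Terraform needs network access to the private storage account:

# Illustrative upload of the sample file to the input container.
resource "azurerm_storage_blob" "sample_input" {
  name                   = "sample-input.json"
  storage_account_name   = azurerm_storage_account.example.name
  storage_container_name = azurerm_storage_container.input.name
  type                   = "Block"
  source                 = "sample-input.json" # local path; adjust as needed
}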
9. Define Query
- Use Test Query in the portal for quick validation.
SELECT
    DeviceId,
    AVG(Temperature) AS AvgTemperature,
    COUNT(*) AS ReadingCount,
    System.Timestamp() AS WindowEndTime
INTO
    BlobOutput
FROM
    InputStream TIMESTAMP BY EventEnqueuedUtcTime
GROUP BY
    DeviceId,
    TumblingWindow(minute, 5)
10. Start and Validate
- Start the job.
- Upload the sample data to the input container: input-container/sample-input.json
- Monitor the job metrics:
  - Input Events
  - Output Events
- Check the output files in output-container. For the sample data above, all four events fall into the single 10:00–10:05 tumbling window, so expect one row per device: sensor-001 with AvgTemperature 28.75 and ReadingCount 2, plus one row each for sensor-002 and sensor-003.
Troubleshooting
- Input Events = 0 → Verify the path pattern and folder structure, and confirm the sample file was uploaded to the configured container.
- Authorization errors → Ensure the Storage Blob Data Contributor role assignment is in place at the Storage Account level or on the individual containers.
- Private connectivity issues → Ensure the Managed Private Endpoint is provisioned and approved, and verify that Test connection succeeds for both the Input and the Output.
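Troubleshooting is easier with diagnostic logs. As a hedged sketch (the Log Analytics workspace and resource names here are illustrative, not part of the walkthrough), the job's Execution and Authoring log categories can be routed to Log Analytics with Terraform:

# Illustrative: send Stream Analytics job logs and metrics to Log Analytics.
resource "azurerm_log_analytics_workspace" "example" {
  name                = "asa-logs" # hypothetical name
  location            = azurerm_resource_group.example.location
  resource_group_name = azurerm_resource_group.example.name
  sku                 = "PerGB2018"
}

resource "azurerm_monitor_diagnostic_setting" "asa_job" {
  name                       = "asa-job-diagnostics"
  target_resource_id         = azurerm_stream_analytics_job.example.id
  log_analytics_workspace_id = azurerm_log_analytics_workspace.example.id

  enabled_log {
    category = "Execution" # runtime errors, e.g. deserialization failures
  }

  enabled_log {
    category = "Authoring" # configuration and authoring operations
  }

  metric {
    category = "AllMetrics" # includes Input Events and Output Events
  }
}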
Automation with Terraform
Here’s a Terraform snippet that automates the key steps. The storage account and containers are included so the configuration is self-contained; if they already exist (per the prerequisites), replace them with data sources or import them into state.
# Configure the AzureRM provider
terraform {
  required_providers {
    azurerm = {
      source = "hashicorp/azurerm"
    }
  }
}

provider "azurerm" {
  features {}
}

# Create a Resource Group
resource "azurerm_resource_group" "example" {
  name     = "asa-rg"
  location = "Central US"
}

# Storage Account with public network access disabled (see Prerequisites).
resource "azurerm_storage_account" "example" {
  name                          = "asastorageexample" # hypothetical; must be globally unique
  resource_group_name           = azurerm_resource_group.example.name
  location                      = azurerm_resource_group.example.location
  account_tier                  = "Standard"
  account_replication_type      = "LRS"
  public_network_access_enabled = false
}

# Creating containers is a data-plane operation; with public network access
# disabled, Terraform must run from a network that can reach the account
# (for example, over a private endpoint).
resource "azurerm_storage_container" "input" {
  name                  = "input-container"
  storage_account_name  = azurerm_storage_account.example.name
  container_access_type = "private"
}

resource "azurerm_storage_container" "output" {
  name                  = "output-container"
  storage_account_name  = azurerm_storage_account.example.name
  container_access_type = "private"
}
# Create a Stream Analytics Cluster (dedicated capacity shared by all jobs in the cluster)
resource "azurerm_stream_analytics_cluster" "example" {
  name                = "asa-cluster1"
  resource_group_name = azurerm_resource_group.example.name
  location            = azurerm_resource_group.example.location
  streaming_capacity  = 36
}
# Create a User Assigned Managed Identity
resource "azurerm_user_assigned_identity" "example" {
  location            = azurerm_resource_group.example.location
  name                = "asa-job-identity"
  resource_group_name = azurerm_resource_group.example.name
}

# Assign Role to the Managed Identity to access the Storage Account
resource "azurerm_role_assignment" "example" {
  scope                = azurerm_storage_account.example.id
  role_definition_name = "Storage Blob Data Contributor"
  principal_id         = azurerm_user_assigned_identity.example.principal_id
}
# Create a Stream Analytics Job using the User Assigned Managed Identity
resource "azurerm_stream_analytics_job" "example" {
  name                                     = "asa-job1"
  resource_group_name                      = azurerm_resource_group.example.name
  location                                 = azurerm_resource_group.example.location
  compatibility_level                      = "1.2"
  data_locale                              = "en-GB"
  events_late_arrival_max_delay_in_seconds = 60
  events_out_of_order_max_delay_in_seconds = 50
  events_out_of_order_policy               = "Adjust"
  output_error_policy                      = "Drop"
  streaming_units                          = 3
  sku_name                                 = "Standard"
  stream_analytics_cluster_id              = azurerm_stream_analytics_cluster.example.id
  type                                     = "Cloud"

  identity {
    type         = "UserAssigned"
    identity_ids = [azurerm_user_assigned_identity.example.id]
  }

  content_storage_policy = "SystemAccount"

  # Compatibility level 1.2 requires System.Timestamp() as a function call.
  transformation_query = <<QUERY
SELECT
    DeviceId,
    AVG(Temperature) AS AvgTemperature,
    COUNT(*) AS ReadingCount,
    System.Timestamp() AS WindowEndTime
INTO
    BlobOutput
FROM
    InputStream TIMESTAMP BY EventEnqueuedUtcTime
GROUP BY
    DeviceId,
    TumblingWindow(minute, 5)
QUERY

  depends_on = [azurerm_role_assignment.example]
}
# Create a Managed Private Endpoint for the cluster to reach the Storage Account.
# The connection must still be approved on the Storage Account
# (Networking → Private endpoint connections), as described in step 5.
resource "azurerm_stream_analytics_managed_private_endpoint" "example" {
  name                          = "asa-mpe-storage"
  resource_group_name           = azurerm_resource_group.example.name
  stream_analytics_cluster_name = azurerm_stream_analytics_cluster.example.name
  target_resource_id            = azurerm_storage_account.example.id
  subresource_name              = "blob"
}
# Configure Input Blob and Output Blob for the job using MSI authentication mode
resource "azurerm_stream_analytics_stream_input_blob" "example" {
  name                      = "InputStream"
  resource_group_name       = azurerm_resource_group.example.name
  stream_analytics_job_name = azurerm_stream_analytics_job.example.name
  storage_account_name      = azurerm_storage_account.example.name
  # Some provider versions still require a key here even with "Msi";
  # it is not used for data access in that mode.
  storage_account_key    = azurerm_storage_account.example.primary_access_key
  storage_container_name = azurerm_storage_container.input.name
  path_pattern           = "" # empty pattern reads from the container root
  date_format            = "yyyy-MM-dd"
  time_format            = "HH"
  authentication_mode    = "Msi"

  serialization {
    type     = "Json"
    encoding = "UTF8"
  }
}
resource "azurerm_stream_analytics_output_blob" "example" {
  name                      = "BlobOutput"
  resource_group_name       = azurerm_resource_group.example.name
  stream_analytics_job_name = azurerm_stream_analytics_job.example.name
  date_format               = "yyyy-MM-dd"
  time_format               = "HH"
  storage_account_name      = azurerm_storage_account.example.name
  # Optional when authentication_mode is "Msi"; shown here for parity with the input.
  storage_account_key    = azurerm_storage_account.example.primary_access_key
  storage_container_name = azurerm_storage_container.output.name
  path_pattern           = "output/{date}/{time}"
  authentication_mode    = "Msi"

  serialization {
    type     = "Json"
    encoding = "UTF8"
    format   = "LineSeparated"
  }
}
# Create a Stream Analytics Job Schedule to start the job
resource "azurerm_stream_analytics_job_schedule" "example" {
  stream_analytics_job_id = azurerm_stream_analytics_job.example.id
  start_mode              = "JobStartTime"

  depends_on = [
    azurerm_stream_analytics_stream_input_blob.example,
    azurerm_stream_analytics_output_blob.example,
  ]
}
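Run terraform init, then terraform plan and terraform apply to provision the stack. After apply completes, approve the Managed Private Endpoint connection on the Storage Account and upload sample-input.json to input-container; the job should then begin writing results to output-container.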
Disclaimer
The information provided in this article is for educational and informational purposes only. While the steps and configurations described are based on best practices for Azure Stream Analytics, they should be validated in your own environment before implementation. Microsoft services and features may change over time; always refer to the official Microsoft documentation at https://learn.microsoft.com/azure/ for the latest guidance. The author assumes no responsibility for any issues arising from the use of this content in production environments.
References
- https://learn.microsoft.com/en-gb/azure/stream-analytics/
- https://registry.terraform.io/providers/hashicorp/azurerm/latest/docs/resources/stream_analytics_cluster