Use cases of Advanced Network Observability for your Azure Kubernetes Service clusters
This blog explores the use cases of Advanced Network Observability for Azure Kubernetes Service (AKS) clusters. It introduces the Advanced Network Observability feature, which brings Hubble's control plane to both Cilium and non-Cilium Linux data planes. The feature provides deep insights into containerized workloads, enabling precise detection and root-cause analysis of network-related issues in Kubernetes clusters. The post also includes customer scenarios that demonstrate the benefits of Advanced Network Observability, such as DNS metrics, network policy drops at the pod level, and traffic imbalance for pods within a workload.
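Once Advanced Network Observability is enabled, the Hubble CLI is a quick way to explore those scenarios interactively. A minimal sketch, assuming Hubble Relay runs in kube-system and the Hubble CLI is installed locally (the exact TLS, namespace, and port-forwarding setup varies by cluster; see the AKS documentation):

# Reach Hubble Relay from your workstation
kubectl port-forward -n kube-system svc/hubble-relay 4245:443 &

# Flows dropped across the cluster, e.g. by a network policy
hubble observe --verdict DROPPED

# DNS traffic for a single namespace, useful when investigating DNS metrics
hubble observe --namespace demo --protocol dns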
Introducing Container Network Logs with Advanced Container Networking Services for AKS

Overview of container network logs

Container network logs offer a comprehensive way to monitor network traffic in AKS clusters. Two modes of support, stored-logs and on-demand logs, provide debugging flexibility with cost optimization. The on-demand mode provides a snapshot of logs, with queries and visualization through the Hubble CLI and UI for specific scenarios, and does not use log storage to persist the logs. The stored-logs mode, when enabled, continuously collects and persists logs based on user-defined filters. Logs can be stored either in Azure Log Analytics (managed) or locally (unmanaged).

Managed storage: Logs are forwarded to Azure Log Analytics for secure, scalable, and compliant storage. This enables advanced analytics, anomaly detection, and historical trend analysis. Both Basic and Analytics table plans are supported for storage.

Unmanaged storage: Logs are stored locally on the host nodes under /var/log/acns/hubble. These logs are rotated automatically at 50 MB to manage storage efficiently, and they can be exported to external logging systems or collectors for further analysis.

Use cases

Connectivity monitoring: Identify and visualize how Kubernetes workloads communicate within the cluster and with external endpoints, helping to resolve application connectivity issues efficiently.

Troubleshooting network errors: Gain deep, granular visibility into dropped packets, misconfigurations, or errors, with details on where and why errors are occurring (TCP/UDP, DNS, HTTP) for faster root-cause analysis.

Security policy enforcement: Detect and analyze suspicious traffic patterns to strengthen cluster security and ensure regulatory compliance.

How it works

Container network logs use eBPF technology with Cilium to capture network flows from AKS nodes. Log collection is disabled by default. Users enable it by defining custom resources (CRs) that specify the types of traffic to monitor, such as namespaces, pods, services, or protocols (a sketch of such a CR follows the CLI example below). The Cilium agent collects and processes this traffic, storing logs in JSON format. These logs can either be retained locally or integrated with Azure Monitor for long-term storage and for advanced analytics and visualization with Azure Managed Grafana.

Fig 1: Container network logs overview

If using managed storage, users enable Azure Monitor log collection with the Azure CLI or ARM templates. Here's a quick example of enabling container network logs on Azure Monitor using the CLI:

az aks enable-addons -a monitoring --enable-high-log-scale-mode -g $RESOURCE_GROUP -n $CLUSTER_NAME

az aks update --enable-acns \
  --enable-retina-flow-logs \
  -g $RESOURCE_GROUP \
  -n $CLUSTER_NAME
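For the filter CRs mentioned above, the shape is roughly as follows. This is a hedged sketch only: the apiVersion and field names are assumptions based on the documented pattern, so consult the container network logs documentation for the exact schema.

apiVersion: acn.azure.com/v1alpha1   # assumed API group/version
kind: RetinaNetworkFlowLog
metadata:
  name: sample-flow-log
spec:
  includefilters:                    # filter name and pod references are illustrative
    - name: client-to-server
      from:
        namespacedPod:
          - default/client
      to:
        namespacedPod:
          - default/server
      protocol:
        - tcp
      verdict:
        - dropped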
Key benefits

Faster issue resolution: Detailed logs enable quick identification of connectivity and performance issues.
Operational efficiency: Advanced filtering reduces data management overhead.
Enhanced application reliability: Proactive monitoring ensures smoother operations.
Cost optimization: Customized logging scopes minimize storage and data ingestion costs.
Streamlined compliance: Comprehensive logs support audits and security requirements.

Observing logs in Azure Managed Grafana dashboards

Users can visualize container network logs in Azure Managed Grafana dashboards, which simplify monitoring and analysis:

Flow logs dashboard: View internal communication between Kubernetes workloads. This dashboard highlights metrics such as total requests, dropped packets, and error rates.
Error logs dashboard: Zoom in on only the logs that show errors, for faster log parsing.
Service dependency graph: Visualize relationships between services, detect bottlenecks, and optimize network flows.

These dashboards provide filtering options to isolate specific logs, such as DNS errors or traffic patterns, enabling efficient root-cause analysis. Summary statistics and top-level metrics further enhance understanding of cluster health and activity.

Fig 2: Azure Managed Grafana dashboard for container network logs

Conclusion

Container network logs for AKS offer a powerful, cost-optimized way to monitor and analyze network activity, enhance troubleshooting and security, and ensure compliance. To get started, enable Advanced Container Networking Services in your AKS cluster and configure custom resources for logging. Visualize your logs in Grafana dashboards and Azure Log Analytics to unlock actionable insights. Learn more here.
Securing Microservices with Cilium and Istio

The adoption of Kubernetes and containerized applications is booming, bringing new challenges in visibility and security. As the landscape of cloud-native applications rapidly evolves, so does the number of sophisticated attacks targeting containerized workloads. Traditional tools often fall short in tracking the usage and traffic flows within these applications. The immutable nature of container images and the short lifespan of containers further necessitate addressing vulnerabilities early in the delivery pipeline.

Comprehensive Security Controls in Kubernetes

Microsoft Azure offers a range of security controls to ensure comprehensive protection across the various layers of a Kubernetes environment. These controls include, but are not limited to:

Cluster security: Features such as private clusters, managed cluster identity, and API server authorized ranges enhance security at the cluster level.
Node and pod security: Hardened bootstrapping, confidential nodes, and pod sandboxing secure the nodes and pods within a cluster.
Network security: Advanced Container Networking Services and Cilium network policies offer granular control over network traffic.
Authentication and authorization: Azure Policy in-cluster enforcement, Entra authentication, and Istio mTLS and authorization policies provide robust identity and access management.
Image scanning: Microsoft Defender for Cloud provides both image and runtime scanning to identify vulnerabilities and threats.

Let's highlight how you can secure microservices while scaling your applications running on Azure Kubernetes Service (AKS), using a service mesh for robust traffic management and network policies for security.

Micro-segmentation with Network Policies

Micro-segmentation is crucial for enhancing security within Kubernetes clusters, allowing for the isolation of workloads and controlled traffic between microservices. Azure CNI powered by Cilium leverages eBPF to provide high-performance networking, security, and observability features. It dynamically inserts eBPF bytecode into the Linux kernel, offering efficient and flexible control over network traffic. Cilium network policies enable network isolation within and across Kubernetes clusters. Cilium also provides an identity-based security model, offers Layer 7 (L7) traffic control, and integrates deep observability for L4 to L7 metrics in Kubernetes clusters (a minimal policy sketch appears at the end of this section). A significant advantage of Azure CNI powered by Cilium is its seamless integration with existing AKS environments, requiring minimal modifications to your infrastructure. Note that Cilium Clusterwide Network Policy (CCNP) is not supported at the time of writing this blog post.

FQDN Filtering with Advanced Container Networking Services (ACNS)

Traditional IP-based policies can be cumbersome to maintain. ACNS allows for DNS-based policies, providing a more granular and user-friendly approach to managing network traffic. This is supported only with Azure CNI powered by Cilium and includes a security-agent DNS proxy, so FQDN resolution keeps working even during upgrades. It's worth noting that with Cilium's L7 enforcement you can control traffic based on HTTP methods, paths, and headers, making it ideal for APIs, microservices, and services that use protocols like HTTP, gRPC, or Kafka. At the time of writing this blog, this capability is not supported via ACNS. More on this in a future blog!
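To make the micro-segmentation idea concrete, here is a minimal CiliumNetworkPolicy sketch that isolates a backend so it accepts traffic only from a frontend on one port. The namespace, labels, and port are hypothetical placeholders:

apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: frontend-to-backend
  namespace: demo          # hypothetical namespace
spec:
  endpointSelector:
    matchLabels:
      app: backend          # the workload being protected
  ingress:
    - fromEndpoints:
        - matchLabels:
            app: frontend   # the only identity allowed in
      toPorts:
        - ports:
            - port: "8080"
              protocol: TCP

Once the policy selects the endpoint, all other ingress to app=backend is dropped; Cilium's identity-based model enforces this by pod identity rather than by IP address, so the rule keeps working as pods are rescheduled.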
AKS Istio Add-On: Mutual TLS (mTLS) and Authorization Policy

Istio enhances the security of microservices through its built-in features, including mutual TLS (mTLS) and authorization policies. The Istiod control plane, acting as a certificate authority, issues X.509 certificates to the Envoy sidecar proxies via the Secret Discovery Service (SDS). Integration with Azure Key Vault allows for secure management of root and intermediate certificates.

The PeerAuthentication custom resource in Istio controls the traffic accepted by workloads. By default it is set to PERMISSIVE to facilitate migration, but it can be set to STRICT to enforce mTLS across the mesh. Istio also supports granular authorization policies, allowing control over IP blocks, namespaces, service accounts, request paths, methods, and headers.

The Istio add-on also supports integration with Azure Key Vault (AKV) and the AKV Secrets Store CSI Driver add-on for plug-in CA certificates, where the root CA lives on a secure machine offline and the intermediate certificates for the Istiod control plane are synced to the cluster by the CSI Driver add-on. Additionally, certificates for the Istio ingress gateway, used for TLS termination or SNI passthrough, can also be stored in AKV.

Defense-in-Depth with Cilium, ACNS and Istio

Combining the capabilities of Cilium's eBPF technology through ACNS and the AKS-managed Istio add-on, AKS provides a defense-in-depth strategy for securing Kubernetes clusters. Azure CNI's Cilium network policies and ACNS FQDN filtering enforce pod-to-pod and pod-to-egress policies at Layers 3 and 4, while Istio enforces STRICT mTLS and Layer 7 authorization policies. This multi-layered approach ensures comprehensive security coverage across all layers of the stack. Now, let's highlight the key steps in achieving this:

Step 1: Create an AKS cluster with Azure CNI (powered by Cilium), ACNS, and the Istio add-on enabled.

az aks create \
  --resource-group $RESOURCE_GROUP \
  --name $CLUSTER_NAME \
  --location $LOCATION \
  --kubernetes-version 1.30.0 \
  --node-count 3 \
  --node-vm-size standard_d16_v3 \
  --enable-managed-identity \
  --network-plugin azure \
  --network-dataplane cilium \
  --network-plugin-mode overlay \
  --pod-cidr 192.168.0.0/16 \
  --enable-asm \
  --enable-acns \
  --generate-ssh-keys

Step 2: Create a Cilium FQDN policy that allows egress traffic to *.google.com while blocking all other external destinations, such as httpbin.org.

Sample policy (fqdn-filtering-policy.yaml):

apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: sleep-network-policy
  namespace: foo
spec:
  endpointSelector:
    matchLabels:
      app: sleep
  egress:
    - toFQDNs:
        - matchPattern: "*.google.com"
    - toEndpoints:
        - matchLabels:
            "k8s:io.kubernetes.pod.namespace": foo
            "k8s:app": helloworld
    - toEndpoints:
        - matchLabels:
            "k8s:io.kubernetes.pod.namespace": kube-system
            "k8s:k8s-app": kube-dns
      toPorts:
        - ports:
            - port: "53"
              protocol: ANY

Apply the policy:

kubectl apply -f fqdn-filtering-policy.yaml
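A quick way to sanity-check the FQDN policy, assuming the sleep workload selected by the policy above runs as a deployment named sleep in namespace foo (the deployment name is an assumption):

# Allowed: matches the *.google.com FQDN rule
kubectl exec -n foo deploy/sleep -- curl -sI --max-time 5 https://www.google.com

# Blocked: not covered by any egress rule, so the connection should time out
kubectl exec -n foo deploy/sleep -- curl -sI --max-time 5 http://httpbin.org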
Step 3: Create an Istio deny-by-default AuthorizationPolicy. This denies all requests across the mesh unless they are specifically authorized by an ALLOW policy.

Sample policy (istio-deny-all-authz.yaml):

apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: allow-nothing
  namespace: aks-istio-system
spec: {}

Apply the policy:

kubectl apply -f istio-deny-all-authz.yaml

Step 4: Deploy an Istio L7 AuthorizationPolicy to explicitly allow HTTP GET requests to the "sample" pod in namespace foo.

Sample policy (istio-L7-allow-policy.yaml):

apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: allow-get-requests
  namespace: foo
spec:
  selector:
    matchLabels:
      app: sample
  action: ALLOW
  rules:
    - to:
        - operation:
            methods: ["GET"]

Apply the policy:

kubectl apply -f istio-L7-allow-policy.yaml

Step 5: Deploy an Istio STRICT mTLS PeerAuthentication resource to enforce that all workloads in the mesh accept only Istio mTLS traffic.

Sample PeerAuthentication (istio-peerauth.yaml):

apiVersion: security.istio.io/v1
kind: PeerAuthentication
metadata:
  name: strict-mtls
  namespace: aks-istio-system
spec:
  mtls:
    mode: STRICT

Apply the policy:

kubectl apply -f istio-peerauth.yaml
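To verify the Layer 7 rule, you can send requests from an in-mesh client; the service name and port below are hypothetical placeholders. With the deny-by-default policy in place, a GET should return 200 while a POST is rejected by the sidecar with HTTP 403 (RBAC: access denied):

# GET is matched by the allow-get-requests policy, expect 200
kubectl exec -n foo deploy/sleep -- \
  curl -s -o /dev/null -w "%{http_code}\n" http://sample.foo.svc.cluster.local:8080/

# POST is not allowed by any rule, expect 403
kubectl exec -n foo deploy/sleep -- \
  curl -s -o /dev/null -w "%{http_code}\n" -X POST http://sample.foo.svc.cluster.local:8080/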
These examples demonstrate how you can manage traffic to specific FQDNs and enforce L7 authorization rules in your AKS cluster.

Conclusion

Traditional IP and perimeter security models are insufficient for the dynamic nature of cloud-native environments. More sophisticated security mechanisms, such as identity-based policies and DNS-name-based rules, are required. Azure CNI powered by Cilium, together with ACNS, provides robust FQDN filtering and Layer 3/4 network policy enforcement. The Istio add-on offers mTLS for identity-based encryption and Layer 7 authorization policies. A defense-in-depth model incorporating both Azure CNI and service mesh mechanisms is recommended for maximizing your security posture. So, give these a try and let us know (Azure Kubernetes Service Roadmap (Public)) how we can evolve our roadmap to help you build the best with Azure.

Credit: Niranjan Shankar, Sr. Software Engineer, Microsoft

Secure, High-Performance Networking for Data-Intensive Kubernetes Workloads

In today's data-driven world, AI and high-performance computing (HPC) workloads demand a robust, scalable, and secure networking infrastructure. As organizations rely on Kubernetes to manage these complex workloads, the need for advanced network performance becomes paramount. In this blog series, we explore how Azure CNI powered by Cilium, built on eBPF technology, is transforming Kubernetes networking. From high throughput and low latency to enhanced security and real-time observability, discover how these cutting-edge advancements are paving the way for secure, high-performance AI workloads. Ready to optimize your Kubernetes clusters?
July 2025 Recap: Azure Database for PostgreSQL

Hello Azure Community,

July delivered a wave of exciting updates to Azure Database for PostgreSQL! From Fabric mirroring support for private networking to cascading read replicas, these new features are all about scaling smarter, performing faster, and building better. This blog covers what's new, why it matters, and how to get started.

Catch Up on POSETTE 2025

In case you missed POSETTE: An Event for Postgres 2025 or couldn't watch all of the sessions live, here's a playlist with the 11 talks all about Azure Database for PostgreSQL. And, if you'd like to dive even deeper, the Ultimate Guide will help you navigate the full catalog of 42 recorded talks published on YouTube.

Feature Highlights

Upsert and Script activity in ADF and Azure Synapse – Generally Available
Power BI Entra authentication support – Generally Available
New Regions: Malaysia West & Chile Central
Latest Postgres minor versions: 17.5, 16.9, 15.13, 14.18 and 13.21
Cascading Read Replica – Public Preview
Private Endpoint and VNet support for Fabric Mirroring – Public Preview
Agentic Web with NLWeb and PostgreSQL
PostgreSQL for VS Code extension enhancements
Improved Maintenance Workflow for Stopped Instances

Upsert and Script activity in ADF and Azure Synapse – Generally Available

We're excited to announce the general availability of the Upsert method and Script activity in Azure Data Factory and Azure Synapse Analytics for Azure Database for PostgreSQL. These new capabilities bring greater flexibility and performance to your data pipelines:

Upsert method: Easily merge incoming data into existing PostgreSQL tables without writing complex logic, reducing overhead and improving efficiency.
Script activity: Run custom SQL scripts as part of your workflows, enabling advanced transformations, procedural logic, and fine-grained control over data operations.

Together, these features streamline ETL and ELT processes, making it easier to build scalable, declarative, and robust data integration solutions using PostgreSQL as either a source or a sink. Visit our documentation guides for the Upsert method and Script activity to learn more.
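In PostgreSQL terms, the upsert pattern behaves like an INSERT ... ON CONFLICT statement: insert a row, or update it when the key already exists. A minimal SQL illustration with a hypothetical table:

-- Insert a row, or update it when the sku key already exists
INSERT INTO inventory (sku, quantity, updated_at)
VALUES ('ABC-123', 42, now())
ON CONFLICT (sku)
DO UPDATE SET quantity   = EXCLUDED.quantity,
              updated_at = EXCLUDED.updated_at;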
Power BI Entra authentication support – Generally Available

You can now use Microsoft Entra ID authentication to connect to Azure Database for PostgreSQL from Power BI Desktop. This update simplifies access management, enhances security, and helps you support your organization's broader Entra-based authentication strategy. To learn more, please refer to our documentation.

New Regions: Malaysia West & Chile Central

Azure Database for PostgreSQL has now launched in Malaysia West and Chile Central. This expanded regional presence brings lower latency, enhanced performance, and data residency support, making it easier to build fast, reliable, and compliant applications right where your users are. This continues our mission to bring Azure Database for PostgreSQL closer to where you build and run your apps. For the full list of regions visit: Azure Database for PostgreSQL Regions.

Latest Postgres minor versions: 17.5, 16.9, 15.13, 14.18 and 13.21

PostgreSQL minor versions 17.5, 16.9, 15.13, 14.18 and 13.21 are now supported by Azure Database for PostgreSQL flexible server. These minor version upgrades are performed automatically as part of the monthly planned maintenance, ensuring your databases are always running the most secure and optimized versions without requiring manual intervention. This release fixes two security vulnerabilities and delivers over 40 bug fixes and improvements. Please refer to the PostgreSQL community announcement for more details about the release.

Cascading Read Replica – Public Preview

Azure Database for PostgreSQL now supports cascading read replicas in public preview. This feature allows you to scale read-intensive workloads more effectively by creating replicas not only from the primary database but also from existing read replicas, enabling two-level replication chains.

With cascading read replicas, you can:

Improve performance for read-heavy applications.
Distribute read traffic more efficiently.
Support complex deployment topologies.

Data replication is asynchronous, and each replica can serve as a source for additional replicas. This setup enhances scalability and flexibility for your PostgreSQL deployments. For more details read the cascading read replicas documentation.
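A hedged sketch of building a two-level chain with the Azure CLI. The server names are placeholders, and the second command assumes the preview accepts an existing replica as the source:

# Level 1: replica created from the primary
az postgres flexible-server replica create \
  --resource-group $RESOURCE_GROUP \
  --replica-name pg-replica-1 \
  --source-server pg-primary

# Level 2: replica created from the level-1 replica (cascading)
az postgres flexible-server replica create \
  --resource-group $RESOURCE_GROUP \
  --replica-name pg-replica-2 \
  --source-server pg-replica-1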
Private Endpoint and VNet Support for Fabric Mirroring – Public Preview

Microsoft Fabric now supports mirroring for Azure Database for PostgreSQL flexible server instances deployed with Virtual Network (VNet) integration or Private Endpoints. This enhancement broadens the scope of Fabric's real-time data replication capabilities, enabling secure and seamless analytics on transactional data, even within network-isolated environments. Previously, mirroring was only available for flexible server instances with public endpoint access. With this update, organizations can now replicate data from Azure Database for PostgreSQL hosted in secure, private networks without compromising on data security, compliance, or performance. This is particularly valuable for enterprise customers who rely on VNets and Private Endpoints for database connectivity from isolated networks. For more details visit the Fabric mirroring with private networking support blog.

Agentic Web with NLWeb and PostgreSQL

We're excited to announce that NLWeb (Natural Language Web), Microsoft's open project for natural language interfaces on websites, now supports PostgreSQL. With this enhancement, developers can leverage PostgreSQL and NLWeb to transform any website into an AI-powered application or Model Context Protocol (MCP) server. This integration allows organizations to use a familiar, robust database as the foundation for conversational AI experiences, streamlining deployment and maximizing data security and scalability. For more details, read the Agentic web with NLWeb and PostgreSQL blog.

PostgreSQL for VS Code extension enhancements

The PostgreSQL for VS Code extension is rolling out new updates with key connection, authentication, and usability improvements. Here's what we improved:

SSH connections - You can now set up SSH tunneling directly in the Advanced Connection options, making it easier to securely connect to private networks without leaving VS Code.
Clearer authentication setup - A new "No Password" option eliminates guesswork when setting up connections that don't require credentials.
Entra ID fixes - Improved default username handling, token refresh, and clearer error feedback for failed connections.
Array and character rendering - Unicode and PostgreSQL arrays now display more reliably and consistently.
Azure Portal flow - Reuses existing connection profiles to avoid duplicates when launching from the portal.

Don't forget to update to the latest version in the Marketplace to take advantage of these enhancements, and visit our GitHub to learn more about this month's release.

Improved Maintenance Workflow for Stopped Instances

We've improved how scheduled maintenance is handled for stopped or disabled PostgreSQL servers. Maintenance is now applied only when the server is restarted (either manually or through the 7-day auto-restart) rather than forcing a restart during the scheduled maintenance window. This change reduces unnecessary disruptions and gives you more control over when updates are applied. You may notice a slightly longer restart time (5–8 minutes) if maintenance is pending. For more information, refer to Applying Maintenance on Stopped/Disabled Instances.

Azure Postgres Learning Bytes 🎓: Set Up HA Health Status Monitoring Alerts

This section walks through setting up HA health status monitoring alerts using the Azure portal. These alerts let you monitor the HA health states of your server effectively.

Step 1: Navigate to the Azure portal and select your Azure Database for PostgreSQL flexible server instance.

Step 2: Create an alert rule. Go to Monitoring > Alerts > Create Alert Rule, then set:
Scope: select your PostgreSQL flexible server.
Condition: choose the signal from the dropdown (CPU percentage, storage percentage, etc.).
Logic: define when the alert should trigger.
Action group: specify where the alert should be sent (email, webhook, etc.).
Add tags, then click "Review + Create".

Step 3: Verify the alert. Check the Alerts tab in Azure Monitor to confirm the alert has been triggered.

For deeper insight into resource health, go to the Azure portal, search for Service Health, and select Resource Health. Choose Azure Database for PostgreSQL Flexible Server from the dropdown and review the health status of your server. For more information, check out the HA Health status monitoring documentation guide.
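The same alert can also be scripted. A sketch with the Azure CLI, assuming the cpu_percent metric and an 80% threshold (both are illustrative; pick the signal that matters to you):

# Look up the server's resource ID to use as the alert scope
SERVER_ID=$(az postgres flexible-server show \
  -g $RESOURCE_GROUP -n $SERVER_NAME --query id -o tsv)

# Create a metric alert that fires when average CPU exceeds 80%
az monitor metrics alert create \
  --name pg-cpu-high \
  --resource-group $RESOURCE_GROUP \
  --scopes $SERVER_ID \
  --condition "avg cpu_percent > 80" \
  --description "CPU above 80% on the PostgreSQL flexible server"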
Conclusion

That's a wrap for our July 2025 feature updates! Thanks for being part of our journey to make Azure Database for PostgreSQL better with every release. We're always working to improve, and your feedback helps us do that.

💬 Got ideas, questions, or suggestions? We'd love to hear from you: https://aka.ms/pgfeedback
📢 Want to stay on top of Azure Database for PostgreSQL updates? Follow the Azure Database for PostgreSQL Blog for the latest announcements, feature releases, and best practices.

Stay tuned for more updates in our next blog!

High-Scale Kubernetes Networking with Azure CNI Powered by Cilium

Kubernetes users have diverse cluster networking needs, but paramount to them all are efficient pod networking and robust security features. Azure CNI (Container Networking Interface) powered by Cilium is a solution that addresses these needs by blending the capabilities of the Azure CNI control plane and Cilium's eBPF dataplane. Cilium enables performant networking and security by leveraging the power of eBPF (extended Berkeley Packet Filter), a revolutionary Linux kernel technology. eBPF enables the execution of custom code within the Linux kernel, providing both flexibility and efficiency. This translates to:

High-performance networking: eBPF enables efficient packet processing, reduced latencies, and improved throughput.
Enhanced security: Azure CNI (AzCNI) powered by Cilium enables DNS-based network security policies, making network traffic easy to manage and secure through Advanced Network Security features.
Better observability: Our eBPF-based Advanced Network Observability suite provides detailed monitoring, tracing, and diagnostics tools for cluster users.

Introducing CiliumEndpointSlice

A performant CNI dataplane is crucial for low-latency, high-throughput pod communication, enhancing distributed application efficiency and user experience. While Cilium's eBPF-powered dataplane provides high-performance networking today, we sought to further enhance its scalability and performance. To do this, we enabled a new feature in the dataplane's configuration, CiliumEndpointSlice, thereby achieving:

Lower traffic load on the Kubernetes control plane, leading to reduced control plane memory consumption and improved performance
Faster pod start-up latencies
Faster in-cluster network latencies for better application performance

In particular, this feature improves how Azure CNI powered by Cilium manages pods. Previously, Cilium managed pods using Custom Resource Definitions (CRDs) called CiliumEndpoints. Each pod has a CiliumEndpoint associated with it, containing information about the pod's status and properties. The Cilium agent, a core component of the dataplane, runs on every node and watches each of these CiliumEndpoints for information about updates to pods. We have observed that this behavior can place significant load on the control plane, leading to performance bottlenecks, especially in larger clusters. To alleviate this load, we are bringing in CiliumEndpointSlice, a feature that batches CiliumEndpoints and their associated updates. This reduces the number of updates propagated to the control plane. Consequently, we greatly reduce the risk of overloading the control plane at scale, ensuring smoother operation of the cluster.
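On a cluster where the feature is active (Kubernetes 1.32 or later on Azure CNI powered by Cilium, per the rollout described above), you can see the batching at work by comparing per-pod endpoints with the slices. A quick sketch:

# One CiliumEndpoint per pod...
kubectl get ciliumendpoints -A | head

# ...batched into far fewer cluster-scoped CiliumEndpointSlices
kubectl get ciliumendpointslices | head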
Performance Testing

We conducted performance testing of Azure CNI powered by Cilium with and without CiliumEndpointSlice enabled. The testing was done on a cluster with the following dimensions:

1000 nodes (Standard_D4_v3)
20,000 pods (i.e., 20 pods per node)
1 service with 4000 backends
800 services with 20 backends each

The test involved repeating the following actions 10 times: creating deployments and services, restarting deployments, and deleting deployments and services. The performance metrics measured are detailed below.

Average APIServer Responsiveness

This metric measures the average latency of responses to LIST requests (one of the most expensive types of requests to the control plane) by the kube-apiserver. With CiliumEndpointSlice enabled, we observed a remarkable 50% decrease in latency, dropping from an average of ~1.5 seconds to ~0.25 seconds! For cluster users, this means much faster processing of queries sent to the kube-apiserver, leading to improved performance.

Pod Startup Latencies

This metric measures the time taken for a pod to be reported as running. Here, an over 60% decrease in pod startup latency was observed with CiliumEndpointSlice enabled, allowing for faster deployment and scaling of applications.

In-Cluster Network Latency

This critical metric measures the latency of pings from a prober pod to a server. An over 80% decrease in latency was observed, which translates directly to better application performance.

Azure CNI powered by Cilium offers a powerful eBPF-based solution for Kubernetes networking and security. With CiliumEndpointSlice enabled from Kubernetes version 1.32 on Azure CNI powered by Cilium clusters, we see further improvements in application and control plane performance. For more information, visit https://learn.microsoft.com/en-us/azure/aks/azure-cni-powered-by-cilium.
Open sourcing legacy Edge

Hello together,

Although I know this place might not be the correct one to ask this kind of question, and I also know that there are different opinions on it, I "just" wanted to get a somewhat official statement from Microsoft: Is MS planning, or at least willing, to open source the legacy version of Microsoft Edge in the future, like they did with ChakraCore?

By Microsoft Edge (legacy) I mean everything, including the UI, the *.pdf and *.epub reader, and the EdgeHTML engine, as well as the remaining private part of Chakra. Personally, I'd really like to see those things; possibly "read-only open source" like MS did with the .NET Framework…