Azure Hardware Infrastructure
Behind the Azure AI Foundry: Essential Azure Infrastructure & Cost Insights
What is Azure AI Foundry?

Azure AI Studio has been renamed to Azure AI Foundry. Azure AI Foundry is a unified AI development platform where teams can efficiently manage AI projects, deploy and test generative AI models, integrate data for prompt engineering, define workflows, and implement content security filters. This powerful tool enhances AI solutions with a wide range of functionality; it is a one-stop shop for everything you need for AI development.

Azure AI Hubs are collaborative workspaces for developing and managing AI solutions. To use AI Foundry's features effectively, you need at least one Azure AI Hub. An Azure AI Hub can host multiple projects, and each project includes the tools and resources needed to create a specific AI solution. For example, you can set up a project to help data scientists and developers work together on building a custom Copilot business application.

You can use Azure AI Foundry to create an Azure AI Hub, or you can create a hub while creating a new project. This creates an AI Hub resource in your Azure subscription in the resource group you specify, providing a workspace for collaborative AI development. https://ai.azure.com/

Azure Infrastructure

The Azure AI Foundry environment utilizes Azure's robust AI infrastructure to facilitate the development, deployment, and management of AI models across various scenarios. Below is the list of Azure infrastructure required to deploy the environment. Make sure the resource providers below are enabled for your subscription before deploying these Azure resources.

| Azure resource | Resource provider | Kind | Purpose |
|---|---|---|---|
| Azure AI Foundry Hub | Microsoft.MachineLearningServices/workspaces | Hub | Associated with the Azure Machine Learning workspace, this resource serves as a central hub for managing machine learning experiments, models, and data. It provides capabilities for creating, organizing, and collaborating on AI projects. |
| Azure AI Foundry Project | Microsoft.MachineLearningServices/workspaces | Project | Within an Azure AI Foundry Hub, you can create projects. Projects let you organize your work, collaborate with others, and track experiments related to specific tasks or use cases, providing a structured environment for AI development. |
| Azure OpenAI Service | Microsoft.CognitiveServices/accounts | AI Services | The model-as-a-service endpoint provider, including GPT-4/4o and ADA text embedding models. |
| Azure AI Search | Microsoft.Search/searchServices | Search Service | Creates indexes on your data and provides search capabilities for your projects. |
| Azure Storage Account | Microsoft.Storage/storageAccounts | Storage | Associated with the Azure AI Foundry workspace. Stores artifacts for your projects (e.g., flows and evaluations). |
| Azure Key Vault | Microsoft.KeyVault/vaults | Key Vault | Associated with the Azure AI Foundry workspace. Stores secrets such as connection strings for resource connections. |
| Azure Container Registry (optional) | Microsoft.ContainerRegistry/registries | Container Registry | Stores Docker images created when using a custom runtime for prompt flow. |
| Azure Application Insights | Microsoft.Insights/components | Monitoring | An Application Insights instance associated with the Azure AI Foundry workspace, used for application-level logging in deployed prompt flows. |
| Log Analytics Workspace (optional) | Microsoft.OperationalInsights/workspaces | Monitoring | Used for log storage and analysis. |
| Event Grid | Microsoft.Storage/storageAccounts/providers/extensionTopics | Event Grid System Topic | Automates workflows by triggering actions in response to events across Azure services, ensuring dynamic and efficient operations in an Azure AI solution. |

AI Foundry Environment: Azure Portal view and AI Foundry portal view. All dependent resources are connected to the hub, and some resources (Azure OpenAI and Azure AI Search) can be shared across projects.
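The hub-and-project structure described above can also be provisioned from the command line. As a rough sketch, the helper below assembles Azure CLI (ML extension) commands; the `--kind` and `--hub-id` flags and all resource names are assumptions to verify against the current CLI documentation before use.

```python
def az_ml_workspace_cmd(name: str, resource_group: str, kind: str,
                        hub_id: str = "") -> str:
    """Assemble an `az ml workspace create` command for an AI Foundry
    hub, or for a project attached to an existing hub."""
    cmd = (f"az ml workspace create --name {name} "
           f"--resource-group {resource_group} --kind {kind}")
    if hub_id:
        # Projects reference their parent hub by resource ID.
        cmd += f" --hub-id {hub_id}"
    return cmd

# Hypothetical names, for illustration only:
hub_cmd = az_ml_workspace_cmd("my-ai-hub", "rg-ai-foundry", "hub")
project_cmd = az_ml_workspace_cmd("copilot-project", "rg-ai-foundry",
                                  "project", hub_id="<hub-resource-id>")
```

Building the command string in code like this makes it easy to keep hub and project names consistent across environments.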
Pricing

Since Azure AI Foundry is assembled from multiple Azure services, pricing depends on your architectural decisions and usage. When building your own Azure AI solution, it's essential to consider the costs it accrues in Azure AI Foundry. Costs are incurred in the following areas:

1. Compute hours and tokens: Unlike fixed monthly costs, Azure AI hubs, Azure OpenAI, and projects are billed based on compute hours and tokens used. Be mindful of resource utilization to avoid unexpected charges.
2. Networking costs: By default, the hub network configuration is public, but if you want to secure the Azure AI Hub, there are costs associated with data transfer.
3. Additional resources: Beyond AI services, consider other Azure resources such as Azure Key Vault, Storage, Application Insights, and Event Grid. These services charge based on transactions and data volume.

AI Foundry Cost Pane

In the Azure Pricing Calculator you can now find the upfront monthly cost of the resources under the Example Scenarios tab in the Azure AI Foundry scenario. This cost calculation feature is now generally available. You can also use Cost Management and Azure resource tags for a detailed resource-level cost breakdown.

Please note that adding vector search in AI Search requires an Azure OpenAI embedding model; the text-embedding-ada-002 (version 2) model will be deployed if it is not already. Adding vector embeddings will incur usage on your account. Vector search itself is available in all Azure AI Search tiers in all regions at no extra charge.

If you need to group the costs of these different services together, it is recommended to create hubs in one or more dedicated resource groups and subscriptions in your Azure environment. You can navigate to your resource group's cost estimation from "view cost of resources" in Azure AI Foundry.
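Because model billing is usage-based, it helps to sanity-check token spend before deploying. A minimal sketch of the arithmetic follows; the per-1,000-token prices are placeholder assumptions, so take real rates from the Azure pricing page.

```python
def estimate_token_cost(input_tokens: int, output_tokens: int,
                        price_in_per_1k: float, price_out_per_1k: float) -> float:
    """Estimate model usage cost from token counts.
    Prices are per 1,000 tokens; the rates passed in below are
    illustrative placeholders, not published Azure prices."""
    cost = (input_tokens / 1000) * price_in_per_1k \
         + (output_tokens / 1000) * price_out_per_1k
    return round(cost, 2)

# e.g. 1M prompt tokens and 200K completion tokens at illustrative rates
monthly = estimate_token_cost(1_000_000, 200_000,
                              price_in_per_1k=0.01, price_out_per_1k=0.03)
# 1,000 * 0.01 + 200 * 0.03 = 16.0
```

Plugging your projected monthly token volumes into a calculation like this, alongside the Pricing Calculator's scenario estimate, gives an early warning before utilization surprises you on the bill.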
Azure Pricing Calculator

To learn more about Azure AI Foundry pricing, see Azure AI Foundry - Pricing | Microsoft Azure.

Conclusion

Azure AI Foundry enables a path forward for enterprises serious about AI transformation: not just experiments, but scalable, governable, cost-predictable, and responsible AI systems built on the robust infrastructure of the Azure cloud. This integration helps you meet business goals while providing a competitive edge in an AI-driven market.

Resources and getting started with Azure AI

Azure AI Portfolio: Explore Azure AI.
Azure AI Infrastructure: Microsoft AI at Scale. Azure AI Infrastructure.
Azure OpenAI Service: Azure OpenAI Service documentation. Explore the playground and customization in the Azure AI Foundry portal.
Copilot Studio: Copilot Learning Hub. Step 1: Understand Copilot. Step 2: Adopt Copilot. Step 3: Extend Copilot. Step 4: Build Copilot. Stay up to date on Copilot: What's new in Copilot Studio.
GPT-4.5 Model Request: MS form link. Please note this is currently limited to the US region only, as Azure AI Infrastructure is undergoing significant advancements, continually evolving to meet the demands of modern technology and innovation.

Mt Diablo - Disaggregated Power Fueling the Next Wave of AI Platforms
AI platforms have quickly pushed the industry from rack power near 20 kilowatts to a hundred kilowatts and beyond in the span of just a few years. To enable the largest accelerator pod size within a physical rack domain, and to enable scalability between platforms, we are moving to a disaggregated power rack architecture. Our disaggregated power rack is known as Mt Diablo and comes in both 48-volt and 400-volt flavors. This shift lets us devote more of the server rack to AI accelerators while giving us the flexibility to scale power to meet the needs of today's platforms and the platforms of the future. This forward-thinking strategy enables us to move faster and foster collaboration to power the world's most complex AI systems.

Resiliency Best Practices You Need for Your Blob Storage Data
Maintaining Resiliency in Azure Blob Storage: A Guide to Best Practices

Azure Blob Storage is a cornerstone of modern cloud storage, offering scalable and secure solutions for unstructured data. However, maintaining resiliency in Blob Storage requires careful planning and adherence to best practices. In this blog, I'll share practical strategies to ensure your data remains available, secure, and recoverable under all circumstances.

1. Enable Soft Delete for Accidental Recovery (Most Important)

Mistakes happen, but soft delete can be your safety net. It allows you to recover deleted blobs within a specified retention period:
- Configure a soft delete retention period in Azure Storage.
- Regularly monitor your blob storage to ensure that critical data is not permanently removed by mistake.

Enabling soft delete in Azure Blob Storage does not carry any additional cost for the feature itself. However, it can affect your storage costs because deleted data is retained for the configured retention period, which means:
- The retained data contributes to total storage consumption during the retention period.
- You are charged according to the pricing tier of the data (Hot, Cool, or Archive) for the duration of retention.

2. Utilize Geo-Redundant Storage (GRS)

Geo-redundancy ensures your data is replicated across regions to protect against regional failures:
- Choose RA-GRS (Read-Access Geo-Redundant Storage) for read access to secondary replicas in the event of a primary region outage.
- Assess your workload's RPO (Recovery Point Objective) and RTO (Recovery Time Objective) needs to select the appropriate redundancy.

3. Implement Lifecycle Management Policies

Efficient storage management reduces costs and ensures long-term data availability:
- Set up lifecycle policies to transition data between Hot, Cool, and Archive tiers based on usage.
- Automatically delete expired blobs to save on costs while keeping your storage organized.

4. Secure Your Data with Encryption and Access Controls

Resiliency is incomplete without robust security. Protect your blobs using:
- Encryption at rest: Azure automatically encrypts data using server-side encryption (SSE). Consider enabling customer-managed keys for additional control.
- Access policies: Implement Shared Access Signatures (SAS) and stored access policies to restrict access and enforce expiration dates.

5. Monitor and Alert for Anomalies

Stay proactive by leveraging Azure's monitoring capabilities:
- Use Azure Monitor and Log Analytics to track storage performance and usage patterns.
- Set up alerts for unusual activities, such as sudden spikes in access or deletions, to detect potential issues early.

6. Plan for Disaster Recovery

Ensure your data remains accessible even during critical failures:
- Create snapshots of critical blobs for point-in-time recovery.
- Enable backup for blobs and turn on the immutability feature.
- Test your recovery process regularly to ensure it meets your operational requirements.

7. Apply Resource Locks

Adding Azure locks to your Blob Storage account provides an additional layer of protection by preventing accidental deletion or modification of critical resources.

8. Educate and Train Your Team

Operational resilience often hinges on user awareness:
- Conduct regular training sessions on Blob Storage best practices.
- Document and share a clear data recovery and management protocol with all stakeholders.

9. Critical Tip: Do Not Create New Containers with Deleted Names During Recovery

If a container or blob storage is deleted for any reason and recovery is being attempted, it's crucial not to create a new container with the same name immediately. Doing so can significantly hinder the recovery process by overwriting backend pointers, which are essential for restoring the deleted data. Always ensure that no new containers are created using the same name during the recovery attempt to maximize the chances of successful restoration.
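The lifecycle management policies from the practices above boil down to age-based tier transitions. Here is a minimal Python sketch of that decision rule; the 30- and 90-day thresholds are illustrative assumptions, not Azure defaults, and in production you would set them declaratively in the storage account's lifecycle policy rather than in code.

```python
def choose_tier(days_since_last_modified: int,
                cool_after: int = 30, archive_after: int = 90) -> str:
    """Mimic a lifecycle management rule: blobs untouched for longer
    than each threshold move to a cheaper tier. Thresholds here are
    example values to tune for your own access patterns."""
    if days_since_last_modified >= archive_after:
        return "Archive"
    if days_since_last_modified >= cool_after:
        return "Cool"
    return "Hot"
```

Running your blob inventory through a rule like this before enabling a policy is a cheap way to preview how much data would land in each tier.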
Wrapping It Up

Azure Blob Storage offers an exceptional platform for scalable and secure storage, but its resiliency depends on following best practices. By enabling features like soft delete, implementing redundancy, securing data, and proactively monitoring your storage environment, you can ensure that your data is resilient to failures and recoverable in any scenario.

Protect your Azure resources with a lock - Azure Resource Manager | Microsoft Learn
Data redundancy - Azure Storage | Microsoft Learn
Overview of Azure Blobs backup - Azure Backup | Microsoft Learn

Liquid Cooling in Air Cooled Data Centers on Microsoft Azure
With the advent of artificial intelligence and machine learning (AI/ML), hyperscale datacenters are increasingly accommodating AI accelerators at scale, demanding higher power at higher density than is customary in traditionally air-cooled facilities. As Microsoft continues to expand our growing datacenter fleet to enable the world's AI transformation, we face a need to develop methods for utilizing air-cooled datacenters to provide liquid cooling capabilities. Additionally, increasing per-rack density for AI accelerators necessitates the use of standalone liquid-to-air heat exchangers to support legacy datacenters that are typically not equipped with the infrastructure for direct-to-chip (DTC) liquid cooling.

Azure Extended Zones: Optimizing Performance, Compliance, and Accessibility
Azure Extended Zones are small-scale Azure extensions located in specific metros or jurisdictions to support low-latency and data-residency workloads. They enable users to run latency-sensitive applications close to end users while maintaining compliance with data residency requirements, all within the Azure ecosystem.

Unleashing GitHub Copilot for Infrastructure as Code
Introduction

The world of infrastructure management is always changing. Teams want solutions that work, can handle scale, and won't let them down. As more companies switch to cloud-based systems and adopt Infrastructure as Code (IaC), the role of infrastructure professionals is becoming even more important, and they face new challenges in setting everything up and keeping it running smoothly.

The Challenges Faced by Infrastructure Professionals

- Complexity of IaC: Managing infrastructure through code introduces a layer of complexity. Infrastructure professionals often grapple with the intricate syntax and structure required by tools like Terraform and PowerShell. This complexity can lead to errors, delays, and increased cognitive load.
- Consistency across environments: Achieving consistency across multiple environments (development, testing, and production) poses a significant challenge. Maintaining uniformity in configurations is crucial for ensuring the reliability and stability of the deployed infrastructure.
- Learning curve: The learning curve associated with IaC tools and languages can be steep for those new to the domain. As teams grow and diversify, onboarding members with varying levels of expertise becomes a hurdle.
- Time-consuming development cycles: Crafting infrastructure code manually is a time-consuming process. Infrastructure professionals often find themselves reinventing the wheel, writing boilerplate code, and handling repetitive tasks that could be automated.

Unleashing GitHub Copilot for Infrastructure as Code

In response to these challenges, leveraging GitHub Copilot to generate infrastructure code is helping to revolutionize the way infrastructure is written, addressing the pain points experienced by professionals in the field.
The Significance of GitHub Copilot for Infra

- Code generation with accuracy: Copilot harnesses the power of machine learning to interpret the intent behind prompts and swiftly generate precise infrastructure code. It understands the context of infrastructure tasks, allowing professionals to express their requirements in natural language and receive corresponding code suggestions.
- Streamlining the IaC development process: By automating the generation of infrastructure code, Copilot significantly streamlines the IaC development process. Infrastructure professionals can focus on higher-level design decisions and business logic rather than wrestling with syntax intricacies.
- Consistency across environments and projects: GitHub Copilot helps ensure consistency across environments by generating standardized code snippets. Whether deploying resources in a development, testing, or production environment, it helps maintain uniformity in configurations.
- Accelerating onboarding and learning: For new team members and those less familiar with IaC, GitHub Copilot serves as an invaluable learning aid. It provides real-time examples and best practices, fostering a collaborative environment where knowledge is shared seamlessly.
- Efficiency and time savings: The efficiency gains brought about by GitHub Copilot are substantial. Infrastructure professionals can see a dramatic reduction in development cycles, allowing for faster iteration and deployment of infrastructure changes.

Copilot in Action

Prerequisites:
1. Install the latest version of Visual Studio Code: https://code.visualstudio.com/download
2. Have a GitHub Copilot license (a personal free trial or your company/enterprise GitHub account), install the Copilot extension, and sign in from Visual Studio Code: https://docs.github.com/en/copilot/quickstart
3. Install the PowerShell extension for VS Code, as we are going to use PowerShell for our IaC sample.

Below is the PowerShell code generated using VS Code & GitHub Copilot.
It demonstrates how to create a simple Azure VM. We're employing a straightforward prompt with #, with the underlying code automatically generated within the VS Code editor. Another example creates an Azure VM with a VM scale set, specifying minimum and maximum instance counts; the prompt used with # is shown in the example below. The PowerShell script generated above can be executed either from the local system or from the Azure Portal Cloud Shell. Similarly, we can create Terraform and DevOps code using this infra Copilot.

Conclusion

In summary, GitHub Copilot is a big deal in the world of infrastructure as code. It helps professionals overcome challenges and brings about a more efficient and collaborative way of working. The examples we've looked at show how it works, what technologies it uses, and how it can be used in real life. This guide aims to give infrastructure professionals the information they need to improve how they practice infrastructure as code.