Introduction: The AI Lifecycle in the Cloud and Its Risks
As organizations increasingly adopt AI to drive innovation, more of the development and deployment of AI models, applications, and agents is taking place in the cloud. Leading cloud platforms make it easy to build, train, and deploy AI systems at scale, offering powerful compute, seamless integrations, and collaborative tools. However, this shift also introduces new security challenges at every stage of the development lifecycle.
Whether you're training an AI model or deploying an AI application or agent, the AI development lifecycle in the cloud spans multiple stages: data collection, model training, fine-tuning pipelines, and the deployment of AI applications and agents. If attackers compromise even one part of this lifecycle, they can put the entire AI system and the business operations it supports at risk.
What adds to the complexity of this landscape is the rapid evolution of cloud-based AI platforms. New features are released at a fast pace, often outpacing the maturity of existing security controls - leaving gaps that attackers can exploit.
This blog examines the risks associated with each phase of the AI development lifecycle in the cloud – whether it’s models, applications, or agents. We’ll explore how attackers can abuse each phase, and how Microsoft Defender for Cloud helps organizations reduce AI posture risks with AI Security Posture Management across their multicloud environment.
Understanding the Threat Landscape Across the AI Lifecycle
Whether it’s poisoning training data, stealing proprietary models, or hijacking deployed AI systems to manipulate outputs, securing the cloud-based AI development lifecycle requires a comprehensive understanding of the risks associated with every phase. Let’s explore how attackers can target various stages of the AI development lifecycle and the specific consequences of those compromises.
Data and training
It all begins with data, which is often the most valuable and the most vulnerable asset. Whether it's customer records, transaction logs, emails, or images, this data is used to train models that will eventually make decisions on behalf of the organization. In cloud AI environments, such data is typically stored in cloud storage.
If attackers gain access to a storage account that holds training data, whether through misconfigured storage or overly permissive cloud account permissions, the consequences can be severe. For instance, they might inject poisoned or manipulated data into the training set, subtly altering the behavior of the model. In one scenario, they could bias a credit scoring model to approve fraudulent applications. In another, they could insert a hidden backdoor - causing the model to behave normally most of the time but output incorrect or malicious predictions when triggered by a specific input.
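One way to notice this kind of tampering is to compare the training files against a trusted snapshot of their hashes before each training run. The sketch below is a minimal illustration under assumptions: it works on a local directory rather than a real cloud storage account, the function names are hypothetical, and the trusted manifest is assumed to be stored somewhere an attacker with access to the data cannot modify.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(data_dir: Path) -> dict:
    """Map each file's relative path to its digest (the trusted snapshot)."""
    return {
        p.relative_to(data_dir).as_posix(): sha256_of_file(p)
        for p in sorted(data_dir.rglob("*"))
        if p.is_file()
    }


def find_tampering(data_dir: Path, trusted: dict) -> dict:
    """Compare the files currently on disk against the trusted manifest."""
    current = build_manifest(data_dir)
    return {
        # Same path, different content: a possible poisoning attempt.
        "modified": sorted(k for k in trusted if k in current and current[k] != trusted[k]),
        # Files that were not part of the approved dataset.
        "added": sorted(k for k in current if k not in trusted),
        # Approved files that have disappeared.
        "removed": sorted(k for k in trusted if k not in current),
    }
```

A training job could call find_tampering first and refuse to start if any of the three lists is non-empty. This only detects changes made after the snapshot was taken; it does not tell you whether the original data was clean.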
Once the data is prepared, it flows into the training pipeline: a critical but often overlooked attack surface. This pipeline automates the full training workflow: ingesting data, executing transformation scripts, spinning up GPU-powered training jobs, and saving the resulting model. If attackers infiltrate this pipeline, they can gain persistent control over the AI system. For example, they could modify preprocessing scripts to inject subtle distortions into the data, or they might replace a model artifact with a manipulated one that appears legitimate but behaves maliciously under specific conditions. Since pipelines often run with elevated permissions and can access cloud storage, compute resources, and secrets, they also become convenient pivot points for lateral movement across cloud infrastructure.
Model Artifacts & Registries
Once trained, models in the cloud are typically stored in model registries or artifact repositories. These are often considered secure because they’re not directly exposed to users. However, they represent a high-value target. Attackers who gain access to stored models can steal intellectual property, especially if the model architecture or parameters represent years of R&D. In addition to theft, an attacker might attempt to delete critical models to disrupt business operations. Even more concerning, they could upload a malicious model in place of a legitimate one. Such a model could be designed to behave in subtly incorrect ways, introduce biases, leak data during inference, or provide manipulated outputs that mislead downstream systems and users. This type of tampering not only undermines trust in AI systems but can also have serious operational and security consequences.
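A common defense against a swapped model artifact is to sign the artifact when it is produced and verify the signature before loading it. The following is a minimal sketch using an HMAC from the Python standard library. The function names are hypothetical, and it assumes the signing key lives in a secrets manager that the registry's writers cannot read; real deployments often use asymmetric signatures instead.

```python
import hashlib
import hmac


def sign_artifact(artifact: bytes, key: bytes) -> str:
    """Compute an HMAC-SHA256 signature over the serialized model bytes."""
    return hmac.new(key, artifact, hashlib.sha256).hexdigest()


def verify_artifact(artifact: bytes, key: bytes, expected_signature: str) -> bool:
    """Return True only if the artifact matches the signature recorded at training time."""
    actual = sign_artifact(artifact, key)
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(actual, expected_signature)
```

In this sketch, the training pipeline would call sign_artifact once and store the result alongside the model's metadata, and the serving code would call verify_artifact and refuse to load any model for which it returns False.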
Model Fine-tuning
In addition to full model training, many organizations rely on fine-tuning: a process where a pre-trained foundation model is adapted using domain-specific data. Fine-tuning offers a faster and more cost-effective path to building specialized models, but it also introduces new attack vectors. Fine-tuning inherits all the risks of traditional training, plus a few more.
For instance, attackers can target fine-tuning jobs or the associated fine-tuning files (e.g., in storage buckets) to manipulate the behavior of a pre-trained model without raising suspicion. By injecting poisoned fine-tuning data, they can create task-specific vulnerabilities, such as altering outputs related to a particular customer or product.
The risk is especially high because fine-tuned models are often deployed directly into production environments. This means attackers don’t need to compromise the full model training workflow to achieve impact - they can introduce malicious behavior just by manipulating a smaller, faster process with fewer controls. Given this, securing fine-tuning pipelines and datasets is just as critical as protecting full-scale training jobs.
Model Inference & Endpoints
After deployment, models are exposed to the outside world through inference endpoints, typically REST APIs that receive input data and return predictions, decisions, text, or other outputs.
The main risk at this stage is unbounded consumption. This occurs when attackers or even legitimate users are able to perform excessive, uncontrolled requests, especially with resource-intensive models like Large Language Models (LLMs). Such abuse can lead to denial of service (DoS), inflated operational costs, and overall service degradation. In cloud environments, where resource usage drives cost and performance, this kind of exploitation can have serious financial and operational impacts.
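Rate limiting is one of the standard mitigations for unbounded consumption. The sketch below shows a token bucket, a widely used rate-limiting technique, in plain Python. It is an illustration only: the class name and parameters are hypothetical, it keeps state in memory for a single caller, and a production endpoint would more likely rely on the quota and throttling features of its API gateway or cloud platform.

```python
import time


class TokenBucket:
    """Allow bursts up to `capacity`, refilled at `refill_per_second`."""

    def __init__(self, capacity, refill_per_second, clock=time.monotonic):
        self.capacity = float(capacity)
        self.refill_per_second = float(refill_per_second)
        self._clock = clock  # injectable so the limiter can be tested deterministically
        self.tokens = self.capacity
        self._last_refill = self._clock()

    def allow(self, cost=1.0):
        """Return True and spend `cost` tokens if the request fits the budget."""
        now = self._clock()
        elapsed = max(0.0, now - self._last_refill)
        # Refill for the time that has passed, never beyond the bucket capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        self._last_refill = now
        if cost <= self.tokens:
            self.tokens -= cost
            return True
        return False
```

For LLM endpoints, the cost argument could be set in proportion to the requested number of output tokens, so that expensive requests drain the budget faster than cheap ones.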
In addition to consumption-based abuse, attackers with access to a poorly secured endpoint may attempt destructive actions such as deleting the endpoint to disrupt availability and business operations, or deploying a different model to the endpoint, potentially replacing trusted outputs with manipulated or malicious ones.
Securing inference endpoints is critical to maintaining the integrity, availability, and cost-effectiveness of AI services in the cloud.
The rise of AI Agents and apps
AI agents, autonomous LLM-driven systems that can search, retrieve, write code, execute workflows, and make decisions, are rapidly becoming a central component in modern AI systems. Unlike traditional models that simply return predictions or text, agents are designed to perform complex, goal-oriented tasks by autonomously chaining multiple actions, tools, and reasoning steps.
They can interact with external systems, call APIs, query databases, invoke tools like code execution environments or vector stores, and even communicate with other agents. This growing autonomy and connectivity unlock powerful capabilities - but it also introduces a new and expanding attack surface.
One of the biggest concerns with AI agents is the amplification of existing risks. Vulnerabilities like prompt injection, which might have limited impact in a basic chatbot, can become far more dangerous when exploited in an agent that has access to tools and can take real actions. A single malicious input could cause an agent to leak sensitive information, perform unintended operations, or invoke tools in harmful ways.
In addition, attackers with access to the agent itself, whether through compromised cloud account permissions or leaked API keys, can access the agent's tools, change the agent’s behavior by manipulating its instructions, or delete it to disrupt business.
As the adoption of AI agents grows, it's critical for organizations to integrate security thinking into their design and deployment. This includes implementing strict controls on agent permissions, monitoring and logging agent behavior, hardening agent tools and APIs, and applying layered protections against manipulation and misuse.
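To make the ideas of strict permissions and logging concrete, here is a minimal sketch of a gateway that sits between an agent and its tools. All names are hypothetical and it is not tied to any agent framework: the gateway only runs tools that appear on an explicit allow-list and records every attempt, allowed or denied.

```python
class ToolNotPermitted(Exception):
    """Raised when an agent requests a tool outside its allow-list."""


class ToolGateway:
    """Mediates every tool call an agent makes."""

    def __init__(self, tools, allowed):
        self._tools = dict(tools)          # tool name -> callable
        self._allowed = frozenset(allowed)  # names this agent may invoke
        self.audit_log = []                 # (tool name, decision) tuples

    def call(self, tool_name, **kwargs):
        permitted = tool_name in self._allowed and tool_name in self._tools
        # Log before executing so denied attempts are visible too.
        self.audit_log.append((tool_name, "allowed" if permitted else "denied"))
        if not permitted:
            raise ToolNotPermitted(tool_name)
        return self._tools[tool_name](**kwargs)
```

With a gateway like this, a prompt injection that persuades the agent to request a destructive tool still fails at the permission check, and the denied attempt remains in the audit log for investigation.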
Model and Agent Dependencies
Cloud-based AI systems increasingly rely on external data sources and tools to perform complex tasks accurately. For example, retrieval-augmented generation (RAG) models depend on grounding data from document stores or vector databases to generate up-to-date, context-aware responses. Similarly, AI agents may be configured to interact with APIs, databases, cloud functions, or internal systems as part of their reasoning or execution loop. These dependencies act as the AI system's supply chain, where a breach in one part can undermine the integrity of the entire system. If attackers tamper with the grounding data, a model’s output can be intentionally skewed or poisoned. Likewise, if the tools an agent depends on - such as a cloud automation function - are compromised or misconfigured, the agent could execute malicious actions or leak sensitive information. Securing these dependencies is essential, as attackers may exploit trust in the AI supply chain to manipulate behavior, exfiltrate data, or pivot deeper into the cloud infrastructure.
Across all these components, one theme is clear: the interconnected nature of AI in the cloud means that a single weak link can compromise the entire lifecycle. Data corruption can lead to model failure. Pipeline compromise can lead to infrastructure access. Endpoint manipulation can lead to silent data leaks. This is why AI security posture must be end-to-end - from data to deployment.
Securing AI in the cloud – it all starts with visibility
AI Security Posture Management (AI-SPM), part of Microsoft Defender for Cloud's CNAPP solution, provides security from code to deployed AI models, applications and agents. It offers comprehensive visibility into AI assets, including data assets, models, endpoints, and agents. By identifying vulnerabilities and misconfigurations, AI-SPM enables organizations to reduce risks and to detect and respond to threats against AI applications.
Reduce AI application risks with Defender for Cloud
By leveraging its agentless detection capabilities, Defender for Cloud uncovers misconfigurations and attack paths that could be exploited to compromise AI components at every stage of the lifecycle outlined above. These insights empower security teams to focus on critical risks and address them effectively, minimizing the overall risk. For example, as illustrated in Figure 1, an attack path can demonstrate how an attacker might utilize a virtual machine with a high-severity vulnerability to gain access to an organization's AI platform. This visualization helps security admin teams to take preventative actions, safeguarding the AI environment from potential breaches.
Figure 1: The attack path demonstrates how an attacker might utilize a virtual machine with a high-severity vulnerability to gain access to an organization's AI platform
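Conceptually, an attack path is a route through a graph of resources and the access relationships between them. The sketch below illustrates that idea with a breadth-first search over a toy graph. It is not how Defender for Cloud computes attack paths, and the resource names and edges are made up for illustration.

```python
from collections import deque


def find_attack_path(edges, start, target):
    """Return the shortest chain of resources from start to target, or None."""
    graph = {}
    for source, destination in edges:
        graph.setdefault(source, []).append(destination)

    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == target:
            return path
        for neighbor in graph.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None


# Hypothetical environment: each edge means "can reach or can authenticate as".
edges = [
    ("internet", "vulnerable-vm"),
    ("vulnerable-vm", "managed-identity"),
    ("managed-identity", "ai-platform"),
    ("vulnerable-vm", "log-store"),
]
path = find_attack_path(edges, "internet", "ai-platform")
```

Breaking any single edge on the returned path, for example by patching the virtual machine or narrowing the identity's permissions, removes that route to the AI platform.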
The AI-SPM capabilities in Defender for Cloud also support multicloud resources. In another example, as shown in Figure 2, the attack path illustrates how an attacker can exploit a vulnerable GCP compute instance to gain access to a custom model deployment in Vertex AI. This scenario underscores the importance of securing every layer of the AI environment, including cloud infrastructure and compute resources, to prevent unauthorized access to sensitive AI components.
Figure 2: The attack path demonstrates how an attacker might utilize a vulnerable GCP compute instance to gain access to a custom model deployment in Vertex AI
In yet another scenario, as depicted in Figure 3, an attacker might exploit a vulnerable GCP compute instance not only to access the model itself, but also to target the data used to train the AI model. This type of data poisoning attack could alter model and application behavior, potentially skewing outputs, introducing bias, or corrupting downstream processes. Such attacks emphasize the critical need to secure data integrity across all stages of the AI lifecycle, from ingestion and training pipelines to active deployment. Safeguarding the data layer is as vital as securing the underlying infrastructure to ensure that AI applications remain trustworthy and resilient against threats.
Figure 3: The attack path demonstrates how an attacker might utilize a vulnerable GCP compute instance to gain access to training data
Summary: Build AI Security from the Ground Up
To address these challenges across the whole cloud AI development lifecycle, Microsoft Defender for Cloud provides a suite of security tools tailored for AI workloads. By enabling AI Security Posture Management (AI-SPM) within the Defender CSPM plan in Defender for Cloud, organizations gain comprehensive multicloud posture visibility and risk prioritization across platforms such as Azure AI Foundry, OpenAI services, AWS Bedrock, and GCP Vertex AI. This multicloud approach helps identify and mitigate critical vulnerabilities and potential attack paths, creating a unified and secure AI ecosystem.
Additionally, Defender for AI Services introduces a runtime protection plan specifically designed for custom-built AI applications. This plan extends the security coverage to AI models deployed on Azure AI Foundry and OpenAI services, safeguarding the entire lifecycle - from code to runtime.
Together, these integrated solutions empower enterprises to build, deploy, and operate AI technologies securely, even within a diverse and evolving threat landscape.
To learn more about Security for AI with Defender for Cloud, visit our website and documentation.