Healthcare and Life Sciences Blog

Reimagining Cancer R&D with Agentic AI Using GigaTIME in Microsoft Discovery

Apr 22, 2026

By Alberto Santamaria-Pang, Principal AI Data Scientist & Alexander Mehmet Ersoy, Dir. Industry Advisory HLS

@Alberto Santamaria-Pang,
Principal AI Data Scientist, Industry Solutions Engineering Healthcare
Adjunct Faculty at Johns Hopkins School of Medicine
@Alexander Mehmet Ersoy,
Dir. Industry Advisory, Healthcare & Life Sciences 

1. Introduction: From images to insight in modern oncology

What if we could characterize every single cell in a tumor not just by how it looks under the microscope, but by the biological signals that shape how it behaves, how it evades the immune system, and how it responds to therapy? This question sits at the heart of modern oncology and precision medicine. Advances in artificial intelligence and spatial biology are rapidly lowering the barrier to understanding cancer at cellular and molecular resolution, supporting research into more precise, more personalized, and ultimately more effective treatments. Immuno-oncology already offers a glimpse of what becomes possible when therapy is guided by biology rather than averages. For example, the FDA approval of tisagenlecleucel for relapsed or refractory B-cell acute lymphoblastic leukemia was supported by an overall remission rate of 82.5%, underscoring how meaningful outcomes can be when treatment aligns with the right biological signals [1]. The challenge is scale: how do we make this type of biologically informed decision-making feasible across millions of patients, diverse tumor types, and real-world clinical settings?

Two recent Microsoft innovations help address that challenge at different layers of the R&D stack: the GigaTIME AI framework (a model and workflow for virtual mIF generation from routine pathology) and the Microsoft Discovery platform (the agentic R&D platform that orchestrates data, tools, and AI agents). In this post, we introduce GigaTIME in general (including a practical tutorial on how the model can be used), and then show how GigaTIME could be used within the Discovery platform as one tool that helps accelerate precision oncology discovery.

2. GigaTIME: Scaling tumor microenvironment insight from routine pathology

A routine hematoxylin and eosin (H&E) slide is a common, cost-efficient diagnostic tool used to understand the specifics of a patient’s oncological condition. It is like a high-resolution photograph of a complex cellular community. An H&E slide captures structure, morphology, and organization in remarkable detail, but it cannot fully reveal how cells are communicating or which molecular programs are active beneath the surface. This is why multiplex immunofluorescence (mIF) and related spatial proteomics assays have become so valuable in oncology research: they reveal protein patterns linked to immune identity, checkpoint signaling, proliferation, and tumor context. Their broad use, however, is limited by cost and throughput, which makes large-scale tumor immune microenvironment analysis difficult [2]. GigaTIME provides an important bridge. It translates routine H&E pathology slides into virtual mIF images across 21 protein channels, making it possible to infer spatially resolved, biologically meaningful virtual mIF patterns from a much more accessible input. In this blog, we focus on what that means at the tissue level: how to interpret selected virtual mIF signals, how to localize them in cellular context, and why that matters for understanding tumor–immune interactions in oncology [3].

Figure 1. GigaTIME Workflow schematic.

2.1 Reading virtual mIF signals in context

To make the virtual mIF panel easier to interpret, it helps to think of the tissue as two interacting compartments: the tumor compartment, where malignant growth and tumor-associated programs dominate, and the stroma or host compartment, where immune cells, vasculature, and connective tissue either resist, reshape, or sometimes enable tumor progression. The most important biology often happens at the boundary between these two worlds. Rather than reading the panel as a flat list of proteins, we can read it as a guide to tumor geography, immune access, checkpoint context, proliferation, and tissue infrastructure.


Table 1. Selected markers produced in GigaTIME.

| Marker | What it represents | Why it matters biologically |
| --- | --- | --- |
| CK | Tumor-rich epithelial regions | Defines where the tumor compartment is located. |
| DAPI | Cell nuclei | Anchors localization at the single-cell level. |
| CD8 | Cytotoxic T cells | Helps assess whether immune cells are infiltrating tumor regions. |
| CD68 | Macrophage-associated signal | Highlights myeloid context at tumor borders or within tumor-rich tissue. |
| PD-1 / PD-L1 | Checkpoint-associated signaling | Provides context on whether immune activity may be locally restrained. |
| Ki67 | Proliferation | Indicates whether tumor-rich regions are actively cycling. |
| CD34 | Vasculature | Helps interpret access routes and stromal context around the tumor. |

In this blog, we focus on a small set of markers that are especially useful for reading tumor geography, immune access, checkpoint biology, proliferation, and vascular organization. To make that concrete, we implemented a practical notebook that shows how the GigaTIME model can be deployed as an endpoint, used for inference on H&E patches, and combined with single-cell localization to support downstream phenotyping and interpretation. The main point is not any one marker in isolation, but how marker combinations organize in space and help us ask more meaningful questions about tumor–host interaction.

2.2 From H&E to virtual mIF: how GigaTIME works

The starting point is a sample-level H&E patch from the test dataset, paired with a compressed label file that contains binary marker masks and cell-segmentation scaffolds used downstream. The workflow is intentionally practical: load the H&E input, generate or reuse GigaTIME predictions, visualize selected virtual mIF channels, refine those predictions with single-cell localization, and summarize the results as virtual phenotypes and per-marker counts [4].

At the model-output stage, GigaTIME produces a multi-channel spatial prediction stack from the H&E patch. In the notebook, each channel can be visualized as a virtual mIF map indicating where the model predicts marker-associated signal in the tissue. However, these raw virtual mIF maps are not yet cell phenotypes. To make them biologically interpretable, the notebook converts dense predictions into cell-aware assignments. It uses labels_dapi for nuclear regions and labels_dapi_expanded for expanded cell regions, then computes the fraction of positive pixels within each segmented region. Marker positivity is assigned only when the overlap exceeds a threshold, with localization adjusted according to expected marker biology, such as nuclear versus non-nuclear signal [5].
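The cell-aware assignment described above can be sketched in a few lines. This is a minimal illustration rather than the notebook's implementation: it uses plain nested lists instead of image arrays for readability, the function names are hypothetical, and the 0.5 threshold is an assumed value chosen only for the example. The names labels_dapi and labels_dapi_expanded follow the text; Ki67 is scored on the nuclear region and CD8 on the expanded cell region, reflecting nuclear versus non-nuclear localization.

```python
from collections import defaultdict


def positive_fraction_per_cell(labels, marker_mask):
    """Map each cell id to the fraction of its pixels that are marker-positive.

    labels: 2D list of ints, one id per segmented region, 0 = background.
    marker_mask: 2D list of 0/1 values with the same shape as labels.
    """
    total = defaultdict(int)
    positive = defaultdict(int)
    for label_row, mask_row in zip(labels, marker_mask):
        for cell_id, is_positive in zip(label_row, mask_row):
            if cell_id == 0:  # skip background pixels
                continue
            total[cell_id] += 1
            positive[cell_id] += 1 if is_positive else 0
    return {cell_id: positive[cell_id] / total[cell_id] for cell_id in total}


def call_positive_cells(labels, marker_mask, threshold=0.5):
    """Return the ids of cells whose positive fraction exceeds the threshold.

    The default threshold is an illustrative assumption, not a GigaTIME value.
    """
    fractions = positive_fraction_per_cell(labels, marker_mask)
    return {cell_id for cell_id, frac in fractions.items() if frac > threshold}


# Toy 3x6 patch with two cells: nuclei, then expanded cell regions.
labels_dapi = [
    [0, 1, 1, 0, 0, 0],
    [0, 1, 1, 0, 2, 2],
    [0, 0, 0, 0, 2, 2],
]
labels_dapi_expanded = [
    [1, 1, 1, 1, 2, 2],
    [1, 1, 1, 1, 2, 2],
    [1, 1, 1, 0, 2, 2],
]
# Hypothetical binarized predictions for a nuclear and a membrane marker.
ki67_mask = [
    [0, 1, 1, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 0],
]
cd8_mask = [
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 0, 1, 1],
    [1, 0, 0, 0, 1, 0],
]

ki67_positive = call_positive_cells(labels_dapi, ki67_mask)          # nuclear region
cd8_positive = call_positive_cells(labels_dapi_expanded, cd8_mask)   # expanded region
print(ki67_positive, cd8_positive)
```

In a real pipeline the same logic would typically run on array data, and both the threshold and the choice of region per marker would need to be tuned and validated for the study at hand.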

This same localization scaffold also supports validation. Because the reference files provide binarized marker masks together with shared nuclei and expanded-cell labels, predicted signal and reference signal can be compared in the same segmented cellular space rather than only as unstructured image intensities. Once virtual mIF maps are tied to individual nuclei or cell regions, they become both quantitative and spatial, supporting measurements of infiltration, compartment-specific localization, and per-marker cell counts that can be aggregated across samples. You can access the tutorial here: https://aka.ms/gigatime-sample.
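As a rough illustration of comparing predicted and reference signal in the same segmented cellular space, the sketch below scores agreement at the cell level once both have been reduced to sets of marker-positive cell ids. The source does not state which metrics the notebook reports; precision, recall, and F1 are assumptions chosen because they are common for this kind of comparison, and the function name is hypothetical.

```python
def cell_level_agreement(predicted_positive, reference_positive, all_cells):
    """Compare predicted and reference marker calls over the same set of cells.

    All three arguments are sets of cell ids from the shared segmentation.
    Ratios with an empty denominator are reported as 0.0.
    """
    tp = len(predicted_positive & reference_positive)
    fp = len(predicted_positive - reference_positive)
    fn = len(reference_positive - predicted_positive)
    tn = len(all_cells) - tp - fp - fn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0
    return {
        "tp": tp, "fp": fp, "fn": fn, "tn": tn,
        "precision": precision, "recall": recall, "f1": f1,
    }


# Toy example: six segmented cells, three called positive by each source.
all_cells = {1, 2, 3, 4, 5, 6}
predicted = {1, 2, 3}
reference = {2, 3, 4}
scores = cell_level_agreement(predicted, reference, all_cells)
print(scores)
```

Because the counts are per cell rather than per pixel, they can be summed across patches before computing the ratios, which is one way to aggregate results across samples.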

 

Figure 2. Example H&E patch and virtual mIF output.

2.3 Virtual phenotyping

Once the virtual mIF maps have been localized to segmented cells, they can be interpreted as spatial phenotypes rather than diffuse prediction maps. In this tutorial, we use a limited sample dataset to demonstrate how these localized overlays can be reproduced and read biologically in practice. The goal is not to make broad claims from a small set of examples, but to show how virtual phenotyping connects marker prediction, cellular localization, and tumor microenvironment interpretation. In real applications, this type of workflow would typically require additional fine-tuning and validation to account for imaging conditions, tissue context, cohort composition, and study-specific marker panels.

At a high level, the figures in this section can be read through four themes: tumor–immune interaction, immune system structure, immune checkpoint biology, and stromal and vascular context. These themes translate localized virtual mIF signals into biologically meaningful spatial patterns. Rather than reading each marker in isolation, we can read how marker combinations organize near tumor-rich regions, immune niches, and tissue boundaries. These same concepts are already used in modern oncology, where immune infiltration, immune organization, checkpoint signaling, and vascular or stromal remodeling all shape how therapies are developed and interpreted [6–9].

| Theme | Biological interpretation | Example marker trends | Therapy relevance |
| --- | --- | --- | --- |
| Tumor–immune interaction [6] | Tumor-rich compartment is being accessed by immune cells, shaped by myeloid cells, and actively proliferating. | • Higher CD8 near CK-rich regions suggests immune infiltration; • CD68 concentrated at the tumor border suggests a myeloid interface or barrier; • Higher Ki67 within CK-rich regions suggests active tumor proliferation. | Higher intratumoral CD8 is generally favorable for anti-tumor immunity; border-restricted CD68 may reflect a suppressive interface; high Ki67 in CK-rich regions is generally unfavorable because it suggests active tumor growth. |
| Immune system structure [7] | Immune compartment appears coordinated, sparse, balanced, or spatially segregated. | • CD3 and CD20 co-localized in organized clusters suggest structured lymphoid neighborhoods; • Balanced CD4 and CD8 distributions suggest a coordinated immune context; • Fragmented or separated patterns suggest a less organized response. | Organized lymphoid structure and balanced adaptive immune populations are generally favorable; fragmented or sparse immune organization may indicate weaker local immune coordination. |
| Immune checkpoint biology [8] | Immune cells are present but may be locally restrained by inhibitory signaling. | • CD8 overlapping with PD-L1 suggests immune presence in a potentially suppressive niche; • CD3 overlapping with PD-1 suggests T cells in a checkpoint-associated state consistent with local restraint. | Context-dependent: this may indicate a restrained immune response that could be relevant to checkpoint blockade, but not automatically a positive or negative finding in isolation. |
| Stromal and vascular context [9] | Tissue structure supports access, creates barriers, or concentrates inflammatory niches. | • CD34 aligned near CK-rich regions suggests vascular routes close to tumor compartments; • Tryptase and CD68 clustered in stromal or perivascular regions suggest innate inflammatory niches that may shape local signaling and access. | Context-dependent: vascular proximity can support access, while stromal or perivascular inflammatory niches may either facilitate response or reinforce barriers depending on the broader microenvironment. |

Table 2. Quick guide to interpreting virtual phenotyping themes.

2.3.1 Tumor–immune interaction

Figure 3. Tumor–immune interaction.

We begin with a central question in the tumor microenvironment: can immune cells reach the tumor? In Figure 3, the CK-centered overlays provide a compact way to read this biology. CK + CD8 shows tumor-rich regions alongside cytotoxic T-cell signal, allowing us to ask whether immune cells are infiltrating tumor nests, remaining at the border, or being excluded from the tumor core. CK + CD68 adds macrophage context and helps highlight whether myeloid cells are embedded within tumor-rich regions or concentrated at the tumor–stroma interface. CK + Ki67 complements these immune overlays by showing whether the same tumor-rich regions also display strong proliferative activity.

Read together, these panels provide a concise illustrative summary of tumor geography, immune access, myeloid interface biology, and growth state. Are immune cells entering the malignant compartment, or is access limited? Are macrophages mixing with tumor cells or forming a border-associated niche? Are tumor-rich regions relatively quiescent, or are they actively cycling? Even in a tutorial setting, this combination of overlays shows how virtual markers can move beyond visualization and support structured interpretation of the tumor immune microenvironment.
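One simple way to put a number on the first of these questions is to ask what fraction of CD8-positive cells sit close to a CK-positive cell. The sketch below assumes cell centroids and per-cell marker calls are already available; the function name, the centroid dictionary, and the 5-pixel radius are all hypothetical choices for illustration, and the brute-force distance search is only suitable for small patches.

```python
import math


def fraction_near_tumor(centroids, immune_cells, tumor_cells, radius):
    """Fraction of immune cells whose centroid lies within `radius` of any tumor cell.

    centroids: dict mapping cell id to an (x, y) position in pixels.
    immune_cells, tumor_cells: sets of cell ids (for example CD8+ and CK+ cells).
    Returns 0.0 when there are no immune cells to score.
    """
    if not immune_cells:
        return 0.0
    near = 0
    for immune_id in immune_cells:
        # Brute-force nearest-neighbor check against every tumor cell.
        if any(
            math.dist(centroids[immune_id], centroids[tumor_id]) <= radius
            for tumor_id in tumor_cells
        ):
            near += 1
    return near / len(immune_cells)


# Toy example: two CK+ cells and three CD8+ cells, one of them far away.
centroids = {1: (0, 0), 2: (10, 0), 3: (12, 0), 4: (100, 100), 5: (11, 3)}
ck_positive = {1, 2}
cd8_positive = {3, 4, 5}
infiltration = fraction_near_tumor(centroids, cd8_positive, ck_positive, radius=5)
print(round(infiltration, 3))
```

A value near 1 is consistent with immune cells sitting in or next to tumor-rich regions, while a value near 0 is consistent with exclusion. Distinguishing border from core would need an explicit tumor-region boundary, which this sketch does not model.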

2.3.2 Immune system structure

Figure 4. Immune system structure.

Virtual phenotyping is also useful for understanding how immune populations are organized beyond the tumor border itself. In Figure 4, overlays such as CD3 + CD20 and CD4 + CD8 provide a view into the composition and organization of the lymphoid compartment. Rather than asking only whether immune cells are present, these panels help us ask whether the immune landscape appears coordinated, sparse, balanced, or spatially segregated. This matters because immune presence alone does not fully capture immune effectiveness; spatial arrangement can suggest very different biological states.

2.3.3 Immune checkpoint biology

Figure 5. Immune checkpoint biology.

Checkpoint biology provides another layer of interpretation that is especially relevant in immuno-oncology. In Figure 5, overlays such as CD8 + PD-L1 and CD3 + PD-1 help connect immune presence with local regulatory signals. These panels are useful because they show that immune cells may be present in the tissue and still not be fully effective if their activity is being restrained by checkpoint-associated biology. Spatial overlap between T-cell markers and checkpoint-associated signal does not, by itself, prove immune exhaustion or therapeutic response, but it can provide context that is consistent with restrained or suppressed immune activity.
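For readers who want a numeric counterpart to these overlays, the sketch below computes per-cell co-positivity: the share of cells positive for one marker that are also positive for another. Treating "overlap" as co-positivity within the same segmented cell is an assumption made for this example; pixel-level or neighborhood-based overlap are equally reasonable readings. The function name and the toy marker calls are hypothetical.

```python
def co_positive_fraction(marker_calls, base_marker, partner_marker):
    """Share of base-marker-positive cells that are also partner-marker-positive.

    marker_calls: dict mapping marker name to a set of positive cell ids.
    Returns 0.0 when no cell is positive for the base marker.
    """
    base = marker_calls[base_marker]
    if not base:
        return 0.0
    return len(base & marker_calls[partner_marker]) / len(base)


# Toy per-cell calls; the ids refer to cells in a shared segmentation.
marker_calls = {
    "CD8": {1, 2, 3, 4},
    "PD-L1": {3, 4, 9},
    "CD3": {1, 2, 3, 4, 5},
    "PD-1": {5},
}
cd8_pdl1 = co_positive_fraction(marker_calls, "CD8", "PD-L1")
cd3_pd1 = co_positive_fraction(marker_calls, "CD3", "PD-1")
print(cd8_pdl1, cd3_pd1)
```

As with the overlays themselves, a high co-positive fraction only provides context; it does not establish exhaustion or predict response.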

2.3.4 Stromal and vascular context

Figure 6. Stromal and vascular context.

The tumor microenvironment is also shaped by the surrounding tissue infrastructure. In Figure 6, overlays such as CD34 + CK and Tryptase + CD68 help reveal how vessels, stromal niches, and innate immune populations are positioned relative to tumor-rich regions. These patterns matter because immune access, tumor expansion, and local signaling are all influenced by the organization of the supporting tissue around the tumor. By including vascular and stromal context, the notebook helps show how virtual markers can support a more complete spatial interpretation of tumor–host interaction.

These examples show how virtual phenotyping transforms raw virtual mIF maps into interpretable spatial summaries of the tumor microenvironment. After localization, the outputs are no longer just probability maps; they become cell-aware patterns that can be read in terms of immune infiltration, tumor growth, checkpoint context, stromal organization, and compartment-specific localization.
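A compact summary of such cell-aware patterns might look like the sketch below, which turns per-marker sets of positive cells into per-marker counts and a tally of combined phenotypes. The phenotype labels (marker names joined with "+") and the function name are conventions invented for this example, not the notebook's output format.

```python
from collections import Counter


def summarize_phenotypes(marker_calls, all_cells):
    """Return (per-marker positive counts, counts of combined phenotypes).

    marker_calls: dict mapping marker name to a set of positive cell ids.
    all_cells: set of every segmented cell id, including marker-negative cells.
    """
    per_marker = {marker: len(cells) for marker, cells in marker_calls.items()}
    phenotypes = Counter()
    for cell_id in all_cells:
        positive = sorted(m for m, cells in marker_calls.items() if cell_id in cells)
        # Cells with no positive marker are grouped under "negative".
        label = "".join(f"{marker}+" for marker in positive) or "negative"
        phenotypes[label] += 1
    return per_marker, phenotypes


# Toy example with five cells.
marker_calls = {"CK": {1, 2}, "Ki67": {2}, "CD8": {3, 4}, "PD-1": {4}}
all_cells = {1, 2, 3, 4, 5}
per_marker, phenotypes = summarize_phenotypes(marker_calls, all_cells)
print(per_marker)
print(dict(phenotypes))
```

Because both outputs are plain counts, they can be added across patches or slides to build sample-level or cohort-level summaries.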

The goal of these examples is reproducibility and interpretation rather than broad biological generalization. The limited dataset is useful because it makes the workflow easy to follow and the figures easy to inspect, but real deployment would require additional tuning, validation, and adaptation for the target imaging workflow and marker set. Even with that caveat, this workflow illustrates the practical value of GigaTIME: virtual mIF predictions become most useful when they are localized, contextualized, and interpreted as part of a spatial system rather than as isolated channels.

3. Microsoft Discovery: Transform the end‑to‑end discovery process from hypothesis generation to simulation, evaluation, iteration, and design

Microsoft Discovery is designed as an enterprise agentic AI platform. It is built around a graph-based knowledge engine and teams of specialized AI agents that collaborate with scientists throughout the discovery cycle, from literature reasoning and hypothesis formation to simulation and iterative learning. With Microsoft Discovery, teams can:

  • Accelerate end‑to‑end research with autonomous, multi‑agent systems that conduct literature analysis, scientific reasoning, simulation, and tool execution at scale
  • Unify institutional knowledge through GraphRAG‑powered Bookshelves that transform proprietary documents and scientific data into structured, queryable knowledge graphs
  • Scale advanced computation on Azure supercomputing infrastructure to support large‑scale simulation, modeling, and design‑space exploration
  • Collaborate with confidence in enterprise‑grade workspaces featuring built‑in RBAC, managed identities, and full data sovereignty

 

Figure 7. Microsoft Discovery.

Importantly, Discovery does not treat AI outputs as final answers. Instead, it embeds them into an explicit scientific reasoning loop, where:

  • Knowledge is represented as contextual, versioned graphs rather than static text
  • Conflicting evidence and assumptions are surfaced, not hidden
  • AI agents specialize, adapt, and learn across iterations
  • Researchers remain in control, with traceable sources and explainable steps

All outputs are intended to support, not replace, expert scientific and clinical judgment.

 

Figure 8. Microsoft Discovery Scientific Reasoning Loop.

Built on Microsoft Azure, Microsoft Discovery orchestrates teams of specialized AI agents using a graph-based knowledge engineering framework and can leverage AI models available through Microsoft Foundry. The platform integrates advanced AI, high-performance computing (HPC) and quantum capabilities, and can connect insights back to the physical world to enable continuous experimentation and refinement. Meanwhile, Microsoft Discovery remains fully extensible to an organization’s own models, agents, tools, and datasets while meeting stringent enterprise requirements for trust, governance, security and compliance.

 

Figure 9. Enterprise Agentic R&D Platform Microsoft Discovery.

4. Using GigaTIME within Microsoft Discovery for precision oncology R&D

Microsoft Discovery is the overall agentic R&D platform. GigaTIME is one of the many AI tools that can be used on the Discovery platform: it generates population-scale, spatially resolved tumor microenvironment features from routine pathology, and those features can then be connected to downstream reasoning, validation, and iteration.

When GigaTIME runs as a standalone notebook or point solution, the pipeline is often held together by ad hoc storage, cross-team handoffs, and manual input/output tracking (for example, whole slide images and patches in one location, predictions in another, single-cell localization outputs elsewhere, and downstream analyses in separate scripts).

In Microsoft Discovery, the pipeline is reshaped into governed ingestion, model execution, post-processing/feature extraction, and iterative reasoning, so that each stage produces typed, versioned inputs for the next instead of “files you have to hunt down”. Operationalizing GigaTIME in Discovery shifts the day-to-day experience from “run a model, then assemble context elsewhere” to “ask, explore, and iterate in one governed workspace”. In addition, Microsoft Discovery provides a comprehensive suite of tools that transform data from sources such as a science catalog and AI models into actionable insights and validated findings. These tools include intelligent multi-agent orchestration, a cognitive discovery engine, a bookshelf, high-performance compute and validation of hypotheses, scientific reasoning, and an iteration framework.
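To make the idea of typed, versioned stage outputs concrete, here is a generic sketch of the pattern. It is not the Microsoft Discovery API and does not describe how the platform stores data: the Artifact class, the run_stage helper, the stage names, and the content-hash versioning scheme are all hypothetical and serve only to illustrate how each stage's output can carry a type and a traceable version.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Artifact:
    """A typed, immutable stage output (hypothetical, for illustration only)."""
    kind: str               # e.g. "he_patch", "virtual_mif", "cell_calls"
    payload: str            # JSON-serialized content of this artifact
    parent: Optional[str]   # version of the upstream artifact, None for raw inputs

    @property
    def version(self) -> str:
        # Content-derived id: changes whenever kind, lineage, or payload changes.
        raw = f"{self.kind}|{self.parent}|{self.payload}".encode("utf-8")
        return hashlib.sha256(raw).hexdigest()[:12]


def run_stage(kind, func, upstream):
    """Apply `func` to the upstream payload and wrap the result as a new Artifact."""
    result = func(json.loads(upstream.payload))
    return Artifact(
        kind=kind,
        payload=json.dumps(result, sort_keys=True),
        parent=upstream.version,
    )


def fake_inference(patch):
    # Stand-in for model execution; returns made-up channel names only.
    return {"patch_id": patch["patch_id"], "channels": ["CK", "CD8"]}


patch = Artifact("he_patch", json.dumps({"patch_id": "demo-001"}), None)
predictions = run_stage("virtual_mif", fake_inference, patch)
print(predictions.kind, predictions.version, "derived from", predictions.parent)
```

With this pattern, a downstream result can always be traced back through its parent versions to the input it came from, which is the property the paragraph above contrasts with manually tracked files.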

Within the Discovery platform, researchers can build customized analytics workflows for image ingestion, model inference, and visualization, and these can become standardized building blocks rather than one-off analyses. Because the platform is extensible, teams can integrate additional models from Microsoft Foundry, third-party tools, or in-house pipelines alongside GigaTIME, creating a governed, end-to-end tumor immune phenotyping and discovery workflow.

Figure 10. Microsoft Discovery platform using GigaTIME as an R&D tool (alongside other models, data sources, and R&D capabilities).

In the future, we expect Discovery to empower the research community to explore several other R&D applications by incorporating new models like GigaTIME alongside additional tools, datasets, experimental systems, and domain knowledge, including:

  • Exploring tumor responses to immunotherapy treatment by linking them to spatial immune context
  • Supporting drug-discovery research by connecting spatial phenotypes to molecular pathways and targets
  • Helping researchers generate hypotheses about candidate biomarkers and therapeutic targets by contextualizing population-scale signals against prior evidence in a knowledge graph
  • Informing research on treatment stratification using cell-aware spatial signatures beyond bulk averages

GigaTIME and Microsoft Discovery are intended for research and development purposes. They are not medical devices and are not intended to diagnose, prevent, monitor, predict, prognose, treat, or alleviate any disease or condition. Any clinical application would require separate validation and applicable regulatory clearance.

5. From tutorial to platform-scale impact

Virtual phenotyping of the tumor immune microenvironment with GigaTIME shows that virtual mIF outputs are most useful when they are localized, contextualized, and interpreted as part of a spatial system rather than as isolated channels. When integrated into Microsoft Discovery, these outputs form the foundation for scalable, auditable, and collaborative oncology R&D.

With this integration, Microsoft Discovery reflects a broader shift in how AI is applied to science. The objective is no longer simply to run individual models or analyses faster, but to help evolve how R&D is conducted by embedding reasoning, learning, and orchestration directly into the scientific process. In this way, outputs from tools like GigaTIME can be translated into testable hypotheses and validated decisions.

Ultimately, this is about providing tools that can help researchers examine complex systems, structure their reasoning, and iterate on their analyses.

Microsoft Discovery is now available in preview. To take the next step and try out the platform with GigaTIME and other Microsoft first-party or third-party models available through Microsoft Foundry:

Microsoft Discovery expanded preview announcement: https://aka.ms/MicrosoftDiscoveryBlog

Learn and practice how Microsoft Discovery can help scientists and engineers transform research and development at https://aka.ms/microsoftdiscovery

Follow our tutorial notebook to understand how to deploy GigaTIME using the Microsoft Foundry model catalog, reproduce the results described here, and understand how to use it for your own workloads: https://aka.ms/gigatime-sample

Access the GigaTIME model card to learn model details and access deployment.

 

This post contains forward-looking statements regarding potential future capabilities, research directions, and applications of GigaTIME and Microsoft Discovery. These statements reflect current plans and expectations, are subject to change without notice, and do not constitute a commitment to deliver any functionality, feature, code, or service. Actual results may differ.

Special thanks to the Microsoft cross-functional team for their great support:

@Jeya Maria Jose Valanarasu, Sr. Scientist, Microsoft Research Health Futures
@Naoto Usuyama, Principal Researcher at Microsoft Research Health Futures
@Hao Qiu, Data Scientist, HLS Frontiers
@Ivan Tarapov, Senior Director, Multimodal Healthcare AI at Microsoft
@Saumil Shrivastava, Principal Product Manager, Microsoft Foundry
@Bella Chan, Principal Product Manager, Microsoft Discovery
@Ash Jogalekar, Senior Program Manager, Microsoft Discovery
@Nihit Pokhrel, Senior Product Manager, Microsoft Discovery
@Lily Kim, General Manager, Microsoft Discovery
@Samuel De Freitas Martins, Senior Director, Strategy and Partnerships
@Mu Wei, Principal Applied Science Manager, Health and Life Sciences
@Hoifung Poon, General Manager, Microsoft Research Health Futures

References

[1] U.S. Food and Drug Administration. FDA approves tisagenlecleucel for B-cell ALL and tocilizumab for cytokine release syndrome. 2017.

[2] Valanarasu JMJ, et al. Multimodal AI generates virtual population for tumor microenvironment modeling. Cell. 2026.

[3] Valanarasu JMJ, et al. Multimodal AI generates virtual population for tumor microenvironment modeling. Cell. 2026.

[4] Sood A, et al. Comparison of Multiplexed Immunofluorescence Imaging to Chromogenic Immunohistochemistry of Skin Biomarkers in Response to Monkeypox Virus Infection. Viruses. 2020;12(8):787.

[5] Santamaria-Pang A, et al. Automated Phenotyping via Cell Auto Training (CAT) on the Cell DIVE Platform. IEEE BIBM. 2019.

[6] Brummel K, Eerkens AL, de Bruyn M, et al. Tumour-infiltrating lymphocytes: from prognosis to treatment selection. British Journal of Cancer. 2023.

[7] Zhao L, Jin S, Wu H. Tertiary lymphoid structures in diseases: immune mechanisms and therapeutic advances. Signal Transduction and Targeted Therapy. 2024.

[8] Sun Q, Hong Z, Zhang C, et al. Immune checkpoint therapy for solid tumours: clinical dilemmas and future trends. Signal Transduction and Targeted Therapy. 2023.

[9] Choi Y, Jung K. Normalization of the tumor microenvironment by harnessing vascular and immune modulation to achieve enhanced cancer therapy. Experimental & Molecular Medicine. 2023.

Updated Apr 22, 2026
Version 2.0