Healthcare and Life Sciences Blog

Image Search Series Part V: Building Histopathology Image Search with Prov-GigaPath

Mar 12, 2026

@Alberto Santamaria-Pang, Principal AI Data Scientist, Healthcare ISE and Adjunct Faculty, Johns Hopkins School of Medicine

@Asma Ben Abacha, Senior Applied Scientist, HLS AI

@Manoj Kumar, Director HLS, Data & AI HLS Frontiers AI

@Jameson Merkow, Principal Applied Data Scientist

@Mu Wei

@Ivan Tarapov, Senior Director, Multimodal Healthcare AI at Microsoft

1. Introduction

In earlier posts, we showed how to build a practical 2D medical image search system: take an image, turn it into an embedding with a foundation model, and use similarity search to find the closest prior cases [1]. We also demonstrated why radiology + pathology together matters for cancer workflows, where imaging findings and tissue evidence complement each other and can be combined in a single pipeline [2,3]. But in real clinical practice, prediction alone isn’t enough. Doctors routinely need to pull up similar prior cases across modalities, compare patterns, and check whether what appears on MRI lines up with what is confirmed under the microscope.

This post focuses on making that workflow practical for pathology. Using a pathology foundation model (Prov-GigaPath) as a retrieval backbone, we convert pathology images into searchable embeddings, build an index, and return the most similar slides in seconds—using the same retrieval pattern introduced in the image search series [1]. Because this approach fits naturally alongside radiology representations used in multimodal pipelines, it helps close the radiology–pathology gap and supports diagnostic concordance with evidence clinicians can directly review [2,3]. An overview of the end-to-end workflow is shown in Figure 1.

Figure 1. Histopathology image search workflow with linked radiology (MRI) context.

2. Histopathology Data

Even with strong foundation models, clinicians still face a practical problem: they need to compare current cases against prior cases across radiology and pathology, not just receive a prediction. In real workflows, diagnostic concordance often hinges on questions like: “Do these MRI findings match what we see in the tissue?” and “Have we seen a similar slide pattern before, and what did it correspond to on imaging?” Our tutorial 2d_pathology_image_search.ipynb addresses this gap by treating pathology as a search problem: extract embeddings from pathology images, index them, and retrieve the most similar prior cases so clinicians can review evidence rather than rely only on model outputs.

A second problem is interoperability. Clinical systems evolve quickly, and the retrieval layer must remain usable even as models change. The architecture in this workflow is intentionally simple and model-agnostic: any foundation model that produces embeddings can plug into the same pipeline (embed → index → retrieve). In this tutorial we use pathology embeddings from Prov-GigaPath and take advantage of an existing radiology–pathology mapping (MRI linked to pathology cases) to make retrieval more impactful: once a relevant pathology case is retrieved, the corresponding radiology context can also be surfaced to support concordance. In this notebook the mapping already exists, but in practice the same idea can be extended to multi-modal indexing, where both pathology and radiology embeddings are indexed (separately or in an aligned space) so that search can pull relevant information across modalities as part of a single workflow.
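The embed → index → retrieve pattern is short enough to sketch end to end. The snippet below is a minimal, model-agnostic stand-in: random vectors take the place of real foundation-model embeddings, and brute-force L2 search stands in for a flat FAISS index (it is the same computation `IndexFlatL2` performs); the function names are illustrative, not from the notebook.

```python
import numpy as np

def l2_search(corpus: np.ndarray, query: np.ndarray, k: int = 5):
    """Brute-force L2 nearest-neighbor search over an (n, d) corpus —
    the same logic a flat FAISS index (IndexFlatL2) applies."""
    dists = np.linalg.norm(corpus - query, axis=1)
    ids = np.argsort(dists)[:k]          # indices of the k closest rows
    return ids, dists[ids]

# Stand-in corpus of 100 random 1536-D "pathology embeddings"
rng = np.random.default_rng(0)
corpus = rng.standard_normal((100, 1536)).astype("float32")

# Querying with a corpus vector: that row should rank first at distance 0
ids, dists = l2_search(corpus, corpus[7], k=3)
print(ids[0])  # → 7
```

Swapping in a real index is a two-line change (`faiss.IndexFlatL2(d)` plus `index.add(corpus)`), which is why the pipeline stays usable as models change: only the embedding step is model-specific.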

For this tutorial, we use pre-computed pathology embeddings derived from TCGA-GBMLGG, a curated cohort of 170 subjects with H&E-stained histopathology slides and tumor Grade labels (0/1/2). We split the cohort into ~80% training (to build the FAISS index) and ~20% test (to evaluate retrieval performance), with each image represented as a 1536-dimensional embedding generated by GigaPath (Table 1).
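To make the split concrete, here is a small sketch of the ~80/20 partition under the numbers above (170 subjects, 1536-D embeddings, grades 0/1/2). The arrays are random placeholders for the real pre-computed embeddings and labels; only the shapes and the split logic mirror the tutorial.

```python
import numpy as np

# Placeholder arrays standing in for the pre-computed TCGA-GBMLGG data:
# one 1536-D GigaPath embedding and one grade label (0/1/2) per subject.
rng = np.random.default_rng(42)
n_subjects = 170
embeddings = rng.standard_normal((n_subjects, 1536)).astype("float32")
grades = rng.integers(0, 3, size=n_subjects)

# ~80/20 split: train rows feed the FAISS index, test rows become queries
perm = rng.permutation(n_subjects)
n_train = int(0.8 * n_subjects)
train_ids, test_ids = perm[:n_train], perm[n_train:]
print(len(train_ids), len(test_ids))  # → 136 34
```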

Table 1. TCGA-GBMLGG dataset summary and embedding configuration

| Property | Value |
|---|---|
| Total subjects | 170 |
| Split | ~80% train / ~20% test |
| Tumor grades | Grade 0, Grade 1, Grade 2 |
| Image type | H&E-stained histopathology slides |
| Embedding dimension | 1536 (GigaPath) |

3. Building the Image Search Engine

To build the pathology search engine, we follow the same practical steps described in the 2D image search blog: (1) turn each image into an embedding using a foundation model, (2) build a vector index (we use FAISS) over those embeddings, and (3) retrieve the nearest neighbors for any new query image. Concretely, we take a pathology image (typically a tile/patch from a whole-slide image), run it through the pathology foundation model to produce a spatial feature map, and then apply adaptive pooling to convert that variable-sized feature map into a fixed-length embedding. Adaptive pooling matters because it guarantees a consistent embedding shape even when patch sizes or resolutions vary. Without that, indexing and distance comparisons become brittle and hard to scale.
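The pooling step can be illustrated in a few lines. The sketch below uses global average pooling over the spatial dimensions — what `torch.nn.AdaptiveAvgPool2d(1)` computes for a 1×1 output — as a simplified stand-in; the notebook's exact pooling configuration may differ, but the point is the same: variable-sized feature maps always collapse to a fixed-length vector.

```python
import numpy as np

def adaptive_pool_embedding(feature_map: np.ndarray) -> np.ndarray:
    """Collapse a (C, H, W) spatial feature map into a fixed-length
    C-dimensional vector by averaging over the spatial axes, so the
    embedding shape is the same regardless of H and W."""
    return feature_map.mean(axis=(1, 2))

# Two patches at different resolutions still yield same-length embeddings
small = np.random.rand(1536, 7, 7)
large = np.random.rand(1536, 14, 14)
print(adaptive_pool_embedding(small).shape,
      adaptive_pool_embedding(large).shape)  # → (1536,) (1536,)
```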

Once we can reliably generate embeddings, the rest of the search engine is straightforward: we compute embeddings for the pathology corpus, build a FAISS index (e.g., flat L2 for a baseline), and then run query → embedding → nearest neighbors to retrieve similar pathology cases. Example retrieval results across tumor grades (0–2) are shown in Figure 2. To make “similarity” more clinically meaningful, we optionally apply a lightweight adapter, implemented as a small MLP, on top of the foundation embeddings. In the notebook, the adapter takes 1536-D GigaPath embeddings as input (in_channels=1536) and produces a compact 254-D representation (adapter_emb_size=254), trained with a simple 3-class objective (num_class=3, Grades 0/1/2). This is intentionally lightweight compared with retraining the foundation model: we only learn a small mapping from embeddings to a better-aligned space, then rebuild the FAISS index using the adapted vectors (gigapath_adapter_features) to improve retrieval relevance. The effect of this optimization is visualized in Figure 3, which contrasts the baseline embedding space with the adapter-optimized space.
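To show the adapter's shape rather than its training, here is a forward-pass-only sketch using the dimensions named above (in_channels=1536, adapter_emb_size=254, num_class=3). The one-hidden-layer architecture and random weights are our simplification for illustration, not the notebook's exact module; in the notebook the weights come from cross-entropy training on the grade labels.

```python
import numpy as np

class Adapter:
    """Sketch of the lightweight adapter: an MLP mapping 1536-D GigaPath
    embeddings to a compact 254-D space, with a 3-class (grade) head
    used only during training."""
    def __init__(self, in_channels=1536, adapter_emb_size=254,
                 num_class=3, seed=0):
        rng = np.random.default_rng(seed)
        self.w1 = rng.standard_normal((in_channels, adapter_emb_size)) * 0.02
        self.w2 = rng.standard_normal((adapter_emb_size, num_class)) * 0.02

    def embed(self, x):
        # Adapted 254-D vectors: these are what get re-indexed with FAISS
        return np.maximum(x @ self.w1, 0.0)  # linear layer + ReLU

    def logits(self, x):
        # Classification head, used only to shape the space during training
        return self.embed(x) @ self.w2

adapter = Adapter()
batch = np.random.rand(4, 1536).astype("float32")
print(adapter.embed(batch).shape, adapter.logits(batch).shape)  # → (4, 254) (4, 3)
```

After training, the only change to the search engine is rebuilding the FAISS index over `adapter.embed(corpus)` instead of the raw embeddings.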

 

Figure 2. Nearest-neighbor retrieval examples for Grade 0, Grade 1, and Grade 2 queries.

 

Figure 3. Embedding space before and after lightweight adapter optimization.

4. Results

We evaluated pathology image retrieval using cancer Grade (0/1/2) as the clinical label. For each query pathology tile/patch in the test set, we searched a FAISS index built from the training set embeddings and computed Precision@K, defined as the fraction of the top-K retrieved items that share the same Grade as the query. In the notebook, we evaluate K = [1, 3, 5], comparing baseline GigaPath embeddings with the refined adapter-informed embeddings.
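Precision@K as defined here is a one-liner. The toy example below computes it for a hypothetical Grade-1 query whose five nearest neighbors have grades [1, 1, 2, 1, 0]; the numbers are illustrative, not taken from the evaluation.

```python
import numpy as np

def precision_at_k(retrieved_grades: np.ndarray, query_grade: int, k: int) -> float:
    """Fraction of the top-k retrieved items sharing the query's grade."""
    return float(np.mean(retrieved_grades[:k] == query_grade))

# Grades of the 5 nearest neighbors for a hypothetical Grade-1 query
neighbors = np.array([1, 1, 2, 1, 0])
print(precision_at_k(neighbors, query_grade=1, k=1))            # → 1.0
print(round(precision_at_k(neighbors, query_grade=1, k=3), 4))  # → 0.6667
print(precision_at_k(neighbors, query_grade=1, k=5))            # → 0.6
```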

 

Table 2. Overall retrieval precision before and after refinement

Overall Precision@K using (i) baseline GigaPath embeddings and (ii) refined adapter-informed embeddings.

| Embedding space | Precision@1 | Precision@3 | Precision@5 |
|---|---|---|---|
| Baseline (GigaPath) | 0.5795 | 0.5593 | 0.5739 |
| Refined (adapter-informed) | 0.7727 | 0.7967 | 0.7689 |

Overall retrieval quality improves substantially after refinement (Table 2), with consistent gains across all K values, indicating that nearest neighbors become more aligned with Grade-consistent similarity.

 

Table 3. Precision by cancer Grade before and after refinement

Precision@K stratified by pathology cancer Grade (0/1/2) for baseline vs refined embeddings.

| Grade | Baseline P@1 | Baseline P@3 | Baseline P@5 | Refined P@1 | Refined P@3 | Refined P@5 |
|---|---|---|---|---|---|---|
| 0 | 0.5000 | 0.5000 | 0.5500 | 0.6250 | 0.6667 | 0.6250 |
| 1 | 0.3636 | 0.3030 | 0.3091 | 0.8182 | 0.8485 | 0.7818 |
| 2 | 0.8750 | 0.8750 | 0.8625 | 0.8750 | 0.8750 | 0.9000 |

Performance differs by Grade (Table 3). Baseline retrieval is strongest for Grade 2, moderate for Grade 0, and weakest for Grade 1, suggesting Grade 1 is the most challenging cohort under raw embeddings. After refinement, Grade 1 improves markedly across all K values, while Grade 2 remains high and improves slightly at deeper retrieval (P@5).

 

Table 4. Absolute improvement in precision after refinement

Absolute change (Δ = refined − baseline) in Precision@K, overall and by Grade.

| Cohort | Δ Precision@1 | Δ Precision@3 | Δ Precision@5 |
|---|---|---|---|
| Overall | +0.1932 | +0.2374 | +0.1951 |
| Grade 0 | +0.1250 | +0.1667 | +0.0750 |
| Grade 1 | +0.4545 | +0.5455 | +0.4727 |
| Grade 2 | +0.0000 | +0.0000 | +0.0375 |

 

As summarized in Table 4, the improvements are driven primarily by Grade 1 (ΔP@1 = +0.4545; ΔP@3 = +0.5455; ΔP@5 = +0.4727). Note on Grade 2: ΔP@1 and ΔP@3 are 0.0000 because baseline retrieval for Grade 2 is already high (P@1 = 0.8750, P@3 = 0.8750; Table 3), so the adapter does not change top-1/top-3 correctness for that cohort. The improvement appears at deeper retrieval (ΔP@5 = +0.0375), suggesting the adapter mainly refines ranking beyond the top few results rather than increasing an already strong “hit rate.”

Collectively, these results indicate that the refined embedding space makes similarity more grade-consistent, which is exactly what diagnostic concordance workflows need: when clinicians retrieve “similar” pathology cases, they want those neighbors to reflect clinically relevant groupings (here, tumor grade), and to remain interpretable when linked to the corresponding radiology context. The fact that Grade 1 benefits most is also plausible from a pathology standpoint: intermediate grades often show more overlap in morphology with both lower and higher grades (i.e., less separable visual patterns), while higher grades may exhibit more distinctive features that are easier to retrieve correctly even without refinement. In that sense, the lightweight adapter acts as a targeted calibration step: shaping the embedding space so that ambiguous, overlapping cases (like Grade 1) are pulled closer to the right neighbors.



 

Figure 4. Histopathology (H&E) retrieval with linked radiology (MRI) context.

 

 

5. Conclusion

We built a practical histopathology image search engine using a simple, reusable pattern: generate Prov-GigaPath embeddings from pathology tiles (with adaptive pooling to produce fixed-length vectors), index them with FAISS, and retrieve nearest neighbors for any query. This matters because retrieval makes foundation models actionable for clinicians, returning similar prior examples that can be reviewed and compared, rather than only producing a prediction score. The implementation is lightweight and interoperable: once embeddings are available, the remaining steps are standard vector indexing and search, and the same design naturally extends to multimodal workflows by linking retrieved H&E cases to their corresponding MRI context (or by indexing radiology and pathology embeddings side-by-side for cross-modal lookup).

The Microsoft healthcare AI models, including MedImageInsight, available in the Microsoft Foundry model catalog, are intended for research and model development exploration. The models are not designed or intended to be deployed in clinical settings as-is nor for use in the diagnosis or treatment of any health or medical condition, and the individual models’ performances for such purposes have not been established. You bear sole responsibility and liability for any use of the healthcare AI models, including verification of outputs and incorporation into any product or service intended for a medical purpose or to inform clinical decision-making, compliance with applicable healthcare laws and regulations, and obtaining any necessary clearances or approvals. 

 

References

[1] Image Search Series Part 1: Chest X-ray lookup with MedImageInsight. Microsoft Community Hub.
[2] Cancer Survival with Radiology-Pathology Analysis and Healthcare AI Models in Azure AI Foundry. Microsoft Healthcare & Life Sciences Blog.
[3] Adapter training notebook (MedImageInsight). Microsoft healthcareai-examples GitHub repository (azureml/medimageinsight/adapter-training.ipynb).
