Azure Architecture Blog

Designing a Medallion Framework — A Decision Guide

Subhajit1994
Microsoft
Apr 24, 2026

Why “medallion” is more about decisions than diagrams

Everyone draws the same picture: Bronze → Silver → Gold. Three boxes, three arrows. Done.

What that picture hides is the dozen design decisions you have to make inside each box — and the ones you make at the boundaries between them. Get those right and onboarding the 200th table feels like onboarding the 2nd. Get them wrong and you’ll be rewriting the framework in 18 months.

This post is a generic walkthrough of how to think about a medallion framework on Databricks (or any other platform): what each layer should own, where the responsibilities blur, and a few opinionated patterns I’ve found worth defending.

The classic template

Bronze → Silver → Gold. Three layers, broadly: raw ingestion, cleaned and conformed business entities, and consumer-ready marts.

This template is intentionally vague — and that’s the point. The same three labels can describe a framework for a 10-table marketing pipeline and a 2,000-table enterprise lakehouse. The differences are in how you tweak the template to match your project.

This post walks through the questions that drive those tweaks. There isn’t a single right answer for any of them — only the answer that fits your project’s requirements.

How to read this guide

For each architectural choice, I’ll frame it as:

  1. The question — the requirement you need to clarify
  2. The options — the realistic ways to answer it
  3. When each option fits — what kind of project picks which option

Use this to make your tradeoffs explicit. Document the answers in your design doc. They’ll inform a hundred downstream decisions.

Question 1 — Do you need a Staging layer?

A Staging (stg_*) layer is a transient zone that holds just the current run’s data before it lands in Bronze.

Options:

  • No staging. Source → Bronze directly.
  • Staging as a transient table per object, overwritten every run.
  • Staging as a checkpointed zone (e.g., Auto Loader checkpoints + raw files in a landing path).

When to pick which: the decision usually comes down to failure isolation and incremental capture clarity. If both are non-issues, you can skip staging.
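The failure-isolation argument for a transient staging zone can be sketched in a few lines. This is a hypothetical illustration, not Databricks API code: `extract` and `promote` stand in for the real source read and Bronze write.

```python
def run_with_staging(extract, promote):
    """Transient staging: land the current run's data in a scratch area first,
    and promote to Bronze only if extraction fully succeeds.
    Failure isolation: a partial extract never touches Bronze.
    Function names are illustrative."""
    staging = []           # overwritten every run
    for row in extract():  # may raise partway through
        staging.append(row)
    promote(staging)       # reached only on a complete extract
    return len(staging)
```

If the source fails mid-extract, the exception surfaces before `promote` runs, so Bronze is left exactly as the previous run left it.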

Question 2 — How “raw” should Bronze be?

This is the single biggest tweak point in the medallion architecture. The textbook says “Bronze = raw bytes.” Real projects often deviate.

Options:

  • A. Strictly raw. Source schema preserved exactly. All columns as STRING. No casting, no trimming.
  • B. Lightly cleaned. Strong typing, whitespace trimmed, null normalization (“”, “N/A” → NULL), audit columns added. Schema stable.
  • C. Cleansed + minor enrichment. Above plus reference data lookups, basic standardization (e.g., country codes), key normalization.

When to pick which: a useful rule of thumb is that the more sources and consumers you have, the cleaner Bronze should be. The cost of not cleaning compounds with every notebook downstream.

If you choose B or C, you’ve shifted some traditional Silver responsibilities into Bronze. That’s fine — just be explicit about it so Silver’s contract changes accordingly.
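Option B's cleaning rules can be made concrete with a small sketch. In a real pipeline these would be Spark column expressions; here they are plain Python, and the audit column names (`_ingested_at`, `_run_id`) are illustrative, not a standard.

```python
from datetime import datetime, timezone

# Tokens treated as NULL under option B ("lightly cleaned") Bronze.
NULL_TOKENS = {"", "n/a", "na", "null"}

def clean_value(raw):
    """Trim whitespace and normalize null-ish tokens to None."""
    if raw is None:
        return None
    stripped = raw.strip()
    return None if stripped.lower() in NULL_TOKENS else stripped

def to_bronze_row(source_row, run_id):
    """Apply light cleaning to every column and append audit columns."""
    cleaned = {k: clean_value(v) for k, v in source_row.items()}
    cleaned["_ingested_at"] = datetime.now(timezone.utc).isoformat()
    cleaned["_run_id"] = run_id
    return cleaned
```

The point is that the rules are applied once, uniformly, before any consumer sees the data, instead of being re-implemented in every downstream notebook.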

Question 3 — What does Silver actually own?

Silver is the most overloaded layer in any medallion framework. Decide upfront which responsibilities Silver owns and which it defers to other layers.

How to decide what Silver owns:

  • If Silver is the only layer business users query, give it more — including light history and aggregations. (Common in smaller projects.)
  • If you have a strong Gold layer with multiple marts, keep Silver narrow: business entities only, current state.
  • If you have multiple consuming teams with different needs, push everything consumer-specific to Gold and keep Silver as the shared canonical model.

The clearest signal that Silver is overloaded: you have one Silver table per source table. Silver should be organized by business entity, not by source. If they line up 1:1, you’ve effectively built “Bronze with cleaning” and skipped Silver’s real value.

Question 4 — Is Gold one zone or several?

The default picture shows Gold as one box. In real projects it often splits.

Options:

  • Single Gold zone. Marts and history live together.
  • Gold-Reporting + Gold-History. Reporting marts (denormalized, aggregated, fast) separated from historized snapshots (SCD2, point-in-time, append-mostly).
  • Gold per consumer. Separate zones per business unit, dashboard family, or external API.

The cost of splitting Gold is some duplication and more pipelines. The benefit is independent SLAs — your dashboard refresh isn’t held hostage by your audit history rebuild.

Question 5 — Load patterns: FullLoad vs DeltaLoad vs CDC

Per source table, decide the load pattern. This decision drives staging design, watermark management, and merge logic.

It’s normal to mix patterns inside the same framework. The metadata-driven approach below makes this trivial — load pattern is just a column in your config table.
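The core of a DeltaLoad is watermark management: filter to rows newer than the stored watermark, then persist the new high-water mark for the next run. A minimal sketch, with illustrative column and variable names:

```python
def delta_load(rows, watermark_col, last_watermark):
    """Select only rows newer than the stored watermark, and return the
    new high-water mark to persist for the next run."""
    new_rows = [r for r in rows if r[watermark_col] > last_watermark]
    # If nothing is new, keep the old watermark rather than resetting it.
    new_watermark = max((r[watermark_col] for r in new_rows),
                        default=last_watermark)
    return new_rows, new_watermark
```

FullLoad is the degenerate case (no watermark, replace everything); CDC replaces the filter with the change feed the source emits.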

Question 6 — How metadata-driven should the framework be?

Options:

  • Code-per-table. One notebook per ingestion. Simple, easy to reason about, scales poorly.
  • Hybrid. Generic ingestion notebooks for common patterns, custom notebooks for exceptions.
  • Fully metadata-driven. Generic notebooks for every layer, behavior driven entirely by metadata tables.

When to pick which: a fully metadata-driven framework has higher upfront cost but flattens the per-table cost dramatically. The break-even point is usually around 30–50 tables.
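What "behavior driven entirely by metadata tables" means in practice: onboarding a table is a config-row insert, not a new notebook. A hypothetical config shape (table and column names are invented for illustration):

```python
# One row per source table. Because load_pattern is just a column,
# mixing FullLoad / DeltaLoad / CDC is a data change, not a code change.
CONFIG = [
    {"table": "customers", "load_pattern": "FullLoad",  "watermark_col": None},
    {"table": "orders",    "load_pattern": "DeltaLoad", "watermark_col": "updated_at"},
    {"table": "payments",  "load_pattern": "CDC",       "watermark_col": "_commit_ts"},
]

def plan_run(config):
    """Generic driver: dispatch each table to the handler its metadata names."""
    handlers = {"FullLoad": "run_full", "DeltaLoad": "run_delta", "CDC": "run_cdc"}
    return [(row["table"], handlers[row["load_pattern"]]) for row in config]
```

In a real framework the handler strings would be generic notebooks or functions, and `CONFIG` would live in a metadata table rather than in code.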

Question 7 — Orchestration shape

How do you fan out work across tables?

Options:

  • Sequential. One table at a time. Simple, slow.
  • Parallel pool. ThreadPoolExecutor or Databricks Workflows fan-out. Tables run concurrently, no inter-table dependencies.
  • DAG. Dependency-aware execution. Required when tables depend on each other.

Per-layer guidance: the decision driver is whether tables in that layer depend on each other. If they don’t, don’t pay the DAG complexity tax.
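The parallel-pool option can be sketched with `ThreadPoolExecutor`. This is a simplified stand-in: `load_table` here is a placeholder for the real per-table ingestion (for example, a child notebook run).

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def load_table(table_name):
    # Placeholder for the real per-table ingestion.
    return f"{table_name}: ok"

def run_layer(tables, max_workers=4):
    """Parallel-pool orchestration: fan tables out across a bounded pool.
    Valid only when tables in the layer have no inter-table dependencies."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(load_table, t): t for t in tables}
        for fut in as_completed(futures):
            results[futures[fut]] = fut.result()
    return results
```

The bounded `max_workers` is the knob you vary per environment (lower in dev, higher in prod); a DAG engine replaces this only when dependencies force ordering.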

Question 8 — Failure handling and retries

Options to decide on:

  • Retry scope. Per statement, per child notebook, per master run, none.
  • Retry counts. Per layer? Per table? Per environment?
  • Backoff. Fixed, linear, exponential.
  • Failure semantics. Fail-fast (stop on first failure) or best-effort (continue and report at the end).

When to pick which: a good default for most projects is process-level retry (the master retries the failed child), exponential backoff, a per-layer max retry count, and fail-fast within a child.
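That default can be sketched in a few lines. The `sleep` parameter is injectable purely so the sketch is testable without real waiting; names are illustrative.

```python
import time

def run_with_retry(task, max_retries=3, base_delay=1.0, sleep=time.sleep):
    """Process-level retry with exponential backoff.
    max_retries would typically come from per-layer (or per-environment)
    config rather than a hardcoded default."""
    attempt = 0
    while True:
        try:
            return task()
        except Exception:
            attempt += 1
            if attempt > max_retries:
                raise  # fail-fast once the budget is exhausted
            sleep(base_delay * (2 ** (attempt - 1)))  # 1s, 2s, 4s, ...
```

Best-effort semantics would catch the final raise per table and carry on, reporting failures at the end of the run instead.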

Question 9 — Observability: how much do you log?

Decide what every run captures:

  • Execution status, start/end timestamps, duration
  • Row counts per activity (source read, staging write, target write)
  • MERGE metrics (inserted, updated, deleted)
  • Watermark used and watermark captured
  • Retry attempts
  • Error message (truncated)

Options for storage:

  • Logs in source-side metadata DB (e.g., Azure SQL). Easy to query with SQL, integrates with monitoring tools.
  • Logs in a Delta table in the lakehouse. Native to Databricks, queryable with Spark.
  • Logs in both. Source-side for ops dashboards, Delta for analytics on the pipeline itself.

Whatever you pick, make count validation a first-class output. The moment counts mismatch, you want to know — not three reports later.
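The per-run record can be sketched as a simple structure with count validation built in. Field names here are illustrative, not a standard schema:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class RunLog:
    """One row per table per run, written to whichever log store you chose."""
    table: str
    run_id: str
    status: str = "running"
    source_count: int = 0
    target_count: int = 0
    retry_attempts: int = 0
    error: Optional[str] = None

    def counts_match(self) -> bool:
        # Count validation as a first-class output:
        # surface mismatches immediately, not three reports later.
        return self.source_count == self.target_count
```

Whether this lands in Azure SQL, a Delta table, or both, the shape stays the same; only the writer changes.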

Question 10 — Schema evolution policy

The cheapest decision to defer and the most painful one to retrofit.

Decide which changes are allowed automatically at each layer — for example: new columns, dropped columns, type changes, renames.

Where to enforce:

  • At Bronze ingestion — fail loudly if source schema changes in a disallowed way
  • At Silver — handle by transformation; new Bronze columns don’t auto-flow to Silver
  • At Gold — strict contracts; consumers depend on the shape

The change in contract strictness per layer reflects the audience. Bronze is forgiving (data engineers see issues); Gold is strict (consumers can’t tolerate surprises).
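A "fail loudly at Bronze ingestion" gate can be sketched as a schema diff against the last known schema. This is a hypothetical helper, comparing column names only; a real gate would also compare types.

```python
def check_schema_change(old_cols, new_cols, allow_adds=True, allow_drops=False):
    """Compare the incoming schema to the last known one and report
    disallowed changes. Defaults mirror a forgiving-Bronze policy:
    new columns allowed, dropped columns not."""
    added = set(new_cols) - set(old_cols)
    dropped = set(old_cols) - set(new_cols)
    violations = []
    if added and not allow_adds:
        violations.append(f"added: {sorted(added)}")
    if dropped and not allow_drops:
        violations.append(f"dropped: {sorted(dropped)}")
    return violations  # empty list means the change is allowed
```

The same function enforces a strict Gold contract by calling it with `allow_adds=False`; only the policy flags differ per layer, not the mechanism.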

Question 11 — Idempotency and replay

Can you re-run yesterday’s load and get the same result?

Options:

  • Idempotent by run_id. Re-running the same run_id is a no-op or produces identical output.
  • Idempotent by data. Re-running with the same source data produces identical output (regardless of run_id).
  • Non-idempotent. Replays may produce different results (e.g., timestamps based on current_timestamp()).

Recommendation: aim for data-idempotent in every layer. Concretely:

  • Staging: overwrite-per-run → idempotent by construction.
  • Bronze: keyed MERGE → idempotent.
  • Silver: pure transformation of Bronze inputs → idempotent.
  • Gold: pure transformation of Silver inputs → idempotent.

If you can’t replay a layer cleanly, that’s a design bug worth fixing early.
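The "keyed MERGE → idempotent" claim is easy to demonstrate with a dict-backed stand-in for a Delta table: upserting the same batch twice leaves the target in the same state as upserting it once.

```python
def keyed_merge(target, updates, key="id"):
    """Upsert rows into a dict-backed 'table' keyed on `key`.
    A simplified stand-in for a Delta MERGE: matched rows are
    overwritten, unmatched rows are inserted."""
    for row in updates:
        target[row[key]] = row
    return target
```

Contrast this with an append-only write, where replaying a batch would duplicate rows — that design fails the data-idempotency test above.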

Question 12 — Environment topology

How many environments? How do they differ?

Common patterns:

  • Dev / Test / Stage / Prod, with separate workspaces and data.
  • Per-developer dev, shared Test/Stage, isolated Prod.

What changes between environments (drive these from config):

  • Source connection strings
  • Target storage paths / catalog names
  • Retry counts (often higher in prod)
  • Parallelism (often lower in dev to save cost)
  • Logging verbosity
  • Data masking rules

Keep code identical across environments. Differences live in environment-scoped config (dev.yml, test.yml, stage.yml, prod.yml) loaded at runtime.
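The runtime-resolution step can be sketched as follows. The config keys, values, and the `PIPELINE_ENV` variable name are all invented for illustration; in practice these would be the YAML files named above, loaded with a YAML parser.

```python
import os

# Code is identical across environments; only this config differs.
ENV_CONFIG = {
    "dev":  {"catalog": "lh_dev",  "max_retries": 1, "parallelism": 2},
    "test": {"catalog": "lh_test", "max_retries": 2, "parallelism": 4},
    "prod": {"catalog": "lh_prod", "max_retries": 3, "parallelism": 8},
}

def load_config(env=None):
    """Resolve the environment at runtime (e.g. from an env var or a
    job parameter) and return its settings."""
    env = env or os.environ.get("PIPELINE_ENV", "dev")
    return ENV_CONFIG[env]
```

Note the pattern the table illustrates: retries go up and parallelism goes up toward prod, while dev stays cheap — exactly the per-environment differences listed above.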

Putting it together — three example shapes

The same framework, three different projects, three different shapes:

Shape A — Small marketing analytics project

  • 15 tables, single source, weekly batch
  • No staging — source is reliable, volumes small
  • Bronze: lightly cleaned — analysts query it directly
  • Silver: full ownership including light history and aggregations (no separate Gold needed)
  • Gold: optional, only for the executive dashboard
  • Code-per-table, sequential orchestration, fail-fast, minimal logging

Shape B — Mid-size enterprise data platform

  • 80 tables, 5 source systems, daily batch with some hourly
  • Staging as transient table for Delta Loads
  • Bronze: lightly cleaned + audit columns
  • Silver: business entities (Customer, Policy, Claim), DAG orchestration
  • Gold: split into Reporting + History zones
  • Hybrid metadata-driven (generic ingestion, custom transforms), per-layer retry, structured count logs

Shape C — Large multi-tenant Lakehouse

  • 500+ tables, 20+ source systems, mixed batch/streaming
  • Staging zone with file-level checkpoints (Auto Loader)
  • Bronze: strictly raw + a parallel Bronze-Curated layer for cleansed views
  • Silver: shared canonical model, narrow scope
  • Gold: per-consumer zones with independent SLAs
  • Fully metadata-driven, DAG everywhere, multi-store logging, strict schema contracts

Notice none of these are “wrong.” They’re calibrated to the project.

A short checklist for your own framework

Before writing code, write down your answers to:

  1. Do we need a Staging layer? Why?
  2. How clean is Bronze? What’s allowed and what’s not?
  3. What does Silver own? Where does it stop?
  4. Is Gold one zone or multiple? How are they divided?
  5. Which load patterns do we support? Per table or universal?
  6. How metadata-driven? Where do exceptions live?
  7. What’s the orchestration shape per layer?
  8. What’s our retry and failure policy per layer?
  9. What does every run log? Where?
  10. What’s our schema evolution policy per layer?
  11. Are all layers data-idempotent?
  12. What changes per environment, and what stays the same?

If you have an answer for each, you have a framework design. If you skip any, you have a framework that will surprise you in production.

Closing thought

The medallion architecture isn’t a prescription — it’s a vocabulary. Bronze, Silver, Gold give you words to describe responsibilities. The actual responsibilities are yours to assign, based on what your project actually needs.

Tweak deliberately. Document your tweaks. And revisit them when the project’s requirements change — because they will.

Updated Apr 23, 2026
Version 1.0