
SQL Server Blog
7 MIN READ

mssql-python 1.4: Bulk Copy Arrives - Load Millions of Rows at Native Speed

DavidLevy
Microsoft
Mar 18, 2026

We're excited to announce the release of mssql-python 1.4.0, the latest version of Microsoft's official Python driver for SQL Server, Azure SQL Database, and SQL databases in Fabric. This release delivers one of our most-requested features, Bulk Copy (BCP), alongside spatial type support, important bug fixes, and developer experience improvements.

pip install --upgrade mssql-python

 

The headline: Bulk Copy is here

If you're moving large volumes of data into SQL Server, whether you're building ETL pipelines, loading data warehouse staging tables, ingesting IoT telemetry, or seeding databases for testing, the new bulkcopy() API is purpose-built for you. It provides the same high-throughput data loading capability that tools like bcp.exe and SqlBulkCopy in .NET have offered for years, now available natively from Python.

Why bulk copy matters

Traditional row-by-row inserts, even batched with executemany(), carry per-statement overhead: parsing, plan compilation, and individual round-trips for each row or batch. Bulk copy uses SQL Server's native bulk insert protocol (TDS bulk load), which:

  • Streams rows directly into the target table with minimal protocol overhead
  • Bypasses query parsing - there's no SQL statement to compile
  • Batches intelligently - you control the batch size, or let the server optimize it
  • Supports server-side options like table locks, constraint checking, trigger firing, and identity preservation

For large datasets, the performance difference can be dramatic.
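To picture the per-batch flushing, here's a minimal, generic sketch of splitting a row stream into fixed-size batches. This is an illustration of the batching idea only, not the driver's Rust internals:

```python
from itertools import islice

def batches(rows, batch_size):
    """Split any row iterable into lists of at most batch_size rows."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, batch_size))
        if not chunk:
            return
        yield chunk
```

Each yielded chunk corresponds to one bulk-load batch on the wire; the final chunk may be smaller than batch_size.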

How it works

The API lives on the cursor object, so it fits naturally into the DB API 2.0 workflow you already know:

import mssql_python

conn = mssql_python.connect(
    "SERVER=myserver.database.windows.net,1433;"
    "DATABASE=mydb;"
    "UID=myuser;PWD=mypassword;"
    "Encrypt=yes;"
)
cursor = conn.cursor()

# Your data - any iterable of tuples or lists
rows = [
    (1, "Alice", "alice@example.com"),
    (2, "Bob", "bob@example.com"),
    (3, "Carol", "carol@example.com"),
    # ... millions more
]

result = cursor.bulkcopy("dbo.Users", rows)

print(f"Loaded {result['rows_copied']} rows "
      f"in {result['batch_count']} batches "
      f"({result['elapsed_time']:.2f}s)")

That's it. One method call to bulk-load your data.
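Because the data argument accepts any iterable, you can also pass a generator and stream rows without ever materializing them in memory. Here's a sketch; the CSV layout and file name are made up for illustration:

```python
import csv
import io

def user_rows(fileobj):
    """Lazily yield (id, name, email) tuples from an open CSV file object."""
    for rec in csv.reader(fileobj):
        yield (int(rec[0]), rec[1], rec[2])

# With a live connection, the generator streams straight into bulkcopy,
# so millions of rows never have to sit in memory at once:
# result = cursor.bulkcopy("dbo.Users", user_rows(open("users.csv", newline="")))
```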

Full control when you need it

The bulkcopy() method exposes the full range of SQL Server bulk copy options:

result = cursor.bulkcopy(
    table_name="dbo.Users",
    data=rows,
    batch_size=10000,           # 10k rows per batch (0 = server optimal)
    timeout=120,                # Give large loads time to finish, but still fail on blocking
    column_mappings=["UserID", "FirstName", "Email"],  # Explicit column targeting
    keep_identity=True,         # Preserve identity values from source
    check_constraints=True,     # Enforce constraints during load
    table_lock=True,            # Table-level lock for maximum throughput
    keep_nulls=True,            # Preserve NULLs instead of column defaults
    fire_triggers=True,         # Execute INSERT triggers
    use_internal_transaction=True,  # Transaction per batch for recoverability
)

Column mappings support two formats. The simple format maps columns by position:

# Position in list = source column index
column_mappings=["UserID", "FirstName", "Email"]

The advanced format uses explicit index-to-column tuples, which lets you skip or reorder source columns:

# (source_index, target_column_name) - skip index 2, reorder freely
column_mappings=[(0, "UserID"), (1, "FirstName"), (3, "Email")]
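To make the routing concrete, here's a pure-Python illustration of which source values the advanced mapping above selects. This mimics the mapping semantics for clarity; it is not driver code, and the sample row is made up:

```python
# A wide source row: index 2 (a signup date) has no target column
source_row = ("u-1001", "Alice", "2026-01-05", "alice@example.com")
mappings = [(0, "UserID"), (1, "FirstName"), (3, "Email")]

# Which value lands in which target column:
routed = {target: source_row[i] for i, target in mappings}
# {'UserID': 'u-1001', 'FirstName': 'Alice', 'Email': 'alice@example.com'}
```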

 

Powered by Rust under the hood

The bulk copy engine is implemented in mssql-py-core, a companion Rust library that handles the TDS bulk load protocol. When you call bulkcopy(), the driver:

  1. Parses your existing connection string and translates it for the Rust layer
  2. Opens a dedicated connection through mssql-py-core (separate from your DDBC query connection)
  3. Acquires an Entra ID token if needed
  4. Streams your data iterator directly to the Rust bulk copy engine
  5. Returns a result dictionary with rows_copied, batch_count, and elapsed_time

The Python logging integration is performance-aware: the logger is only passed to the Rust layer when debug logging is active, so there's zero overhead in production.

Security is built-in: credentials are scrubbed from memory in the finally block, and error messages are sanitized to prevent credential leakage in stack traces.
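As an illustration of that sanitization idea, a helper along these lines (hypothetical, not the driver's actual implementation) could mask credentials before a message reaches a stack trace:

```python
import re

def scrub_secrets(message: str) -> str:
    """Mask password-style key/value pairs in a connection-string fragment."""
    return re.sub(r"(PWD|Password)\s*=\s*[^;]*", r"\1=****", message, flags=re.IGNORECASE)
```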

Spatial type support: geography, geometry, and hierarchyid

Version 1.4 adds support for SQL Server's spatial and hierarchical types: geography, geometry, and hierarchyid. These CLR user-defined types are now handled natively by the driver.

Reading spatial data

Spatial columns are returned as bytes (the raw CLR binary representation). To get human-readable output, use SQL Server's built-in conversion methods:

# Insert a geography point (WGS 84)
cursor.execute(
    "INSERT INTO Locations (point) VALUES (geography::STGeomFromText(?, 4326))",
    "POINT(-122.349 47.651)"
)

# Read as WKT text
cursor.execute("SELECT point.STAsText() FROM Locations")
row = cursor.fetchone()
# row[0] = "POINT (-122.349 47.651)"

# Use spatial methods server-side
cursor.execute("""
    SELECT a.point.STDistance(b.point) AS distance_meters
    FROM Locations a CROSS JOIN Locations b
    WHERE a.id = 1 AND b.id = 2
""")

 

Writing spatial data

The driver auto-detects WKT (Well-Known Text) geometry strings. If a parameter value starts with POINT, LINESTRING, or POLYGON, it's automatically mapped to the correct SQL type:

# All standard WKT types are supported
cursor.execute(
    "INSERT INTO Routes (path) VALUES (geography::STGeomFromText(?, 4326))",
    "LINESTRING(-122.349 47.651, -122.340 47.660, -122.330 47.670)"
)

cursor.execute(
    "INSERT INTO Zones (boundary) VALUES (geometry::STGeomFromText(?, 0))",
    "POLYGON((0 0, 100 0, 100 100, 0 100, 0 0))"
)
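The auto-detection described above can be approximated with a simple prefix check. This is an illustrative heuristic, not the driver's actual logic, which may differ in detail:

```python
WKT_PREFIXES = ("POINT", "LINESTRING", "POLYGON")

def looks_like_wkt(value):
    """Heuristic: does a parameter value look like a WKT geometry string?"""
    return isinstance(value, str) and value.lstrip().upper().startswith(WKT_PREFIXES)
```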

 

HierarchyId for tree structures

hierarchyid is SQL Server's built-in type for representing tree structures such as org charts, file systems, and bills of materials:

# Insert a node
cursor.execute(
    "INSERT INTO OrgChart (node, name) VALUES (hierarchyid::Parse(?), ?)",
    "/1/2/3/", "Engineering Lead"
)

# Query the hierarchy
cursor.execute("SELECT node.ToString(), node.GetLevel(), name FROM OrgChart")
# ("/1/2/3/", 3, "Engineering Lead")

# Find ancestors
cursor.execute("SELECT node.GetAncestor(1).ToString() FROM OrgChart WHERE name = 'Engineering Lead'")
# "/1/2/"

 

Output converters

For advanced use cases, you can register custom converters to automatically transform the raw binary representation:

import shapely.wkb

def parse_geography(value):
    """Convert CLR binary to a shapely geometry (example)."""
    if value is None:
        return None
    # Your deserialization logic here. Note that SQL Server's CLR
    # serialization format is not raw WKB, so a real converter would
    # translate it first (or use a library that understands the format).
    return shapely.wkb.loads(value)

conn.add_output_converter(bytes, parse_geography)

# Now all bytes columns are automatically converted
cursor.execute("SELECT point FROM Locations")
row = cursor.fetchone()
# row[0] is now a shapely geometry object

 

Bug fixes

VARCHAR encoding fix

Fixed an issue where VARCHAR columns would fail to fetch when the data length exactly equaled the declared column size and the data contained non-ASCII characters in the CP1252 code page. This subtle edge case could surface with European-language text (accented characters, currency symbols, and so on) that exactly fills a string column.

Segmentation fault fix

Resolved a segfault that occurred when interleaving fetchmany() and fetchone() calls on the same cursor. This affected patterns like:

batch = cursor.fetchmany(100)
# ... process batch ...
next_row = cursor.fetchone()  # Previously could segfault

This is now safe to use in all combinations.

Date/time type code alignment

Aligned date/time type code mappings with the ODBC 18 driver source, correctly mapping SQL_SS_TIME2 (-154) and SQL_SS_DATETIMEOFFSET (-155). This improves compatibility with tools and frameworks that inspect cursor.description type codes.

Developer experience improvements

PEP 561 type checking support

The driver now ships with a py.typed marker file, enabling full static type checking in tools like mypy, Pyright, and IDE type inspectors. Combined with the existing .pyi stub file, you get accurate autocompletion and type validation for the entire mssql-python API.

Devcontainer for contributors

A new devcontainer configuration makes it easy to spin up a fully configured development environment for contributing to the driver. Just open the repo in VS Code or GitHub Codespaces and you're ready to go.

Azure SQL Database in CI

The PR validation pipeline now tests against Azure SQL Database in addition to on-premises SQL Server, ensuring that every change is validated against the Azure SQL service before merge.

The road to 1.4

For context, here's how the driver has evolved over its GA releases:

Release | Date | Highlights
--------|------|-----------
1.0.0 | November 2025 | GA release - DDBC architecture, Entra ID auth, connection pooling, DB API 2.0 compliance
1.1.0 | December 2025 | Parameter dictionaries, Connection.closed property, Copilot prompts
1.2.0 | January 2026 | Param-as-dict, non-ASCII path handling, fetchmany fixes
1.3.0 | January 2026 | Initial BCP implementation (internal), SQLFreeHandle segfault fix
1.4.0 | February 2026 | BCP public API, spatial types, Rust core upgrade, encoding & stability fixes

BCP was introduced as an internal implementation in 1.3 and has been hardened, expanded, and promoted to a fully public API in 1.4, with Entra ID support, explicit parameters, column mappings, logging integration, and comprehensive validation.

What's next

Looking ahead, the roadmap includes:

  • Asynchronous query execution with asyncio support
  • Vector datatype support for SQL Server's native vector type
  • Table-Valued Parameters (TVPs) for passing tabular data to stored procedures

Get started

pip install --upgrade mssql-python

 

We'd love your feedback. Try the new bulk copy API, let us know how it performs on your workloads, and file issues for anything you run into. This driver is built for the Python data community, and your input directly shapes what comes next.

Updated Mar 17, 2026